Monday, July 3, 2017

Giraph Error: Could not find or load main class org.apache.giraph.yarn.GiraphApplicationMaster

Very old post from 2014 that got lost in my drafts. Posting so hopefully this helps out someone.

Often Google acts like magic for me: type in my error, and out pops the solution. Not so for a Giraph error I recently hit. Hopefully this post lets Google work like magic for someone else :)

After installing Giraph on a BigTop 0.7 VM, I was able to run the benchmark, which takes no input and writes no output, but nothing more complicated.

This works:
hadoop jar /usr/share/doc/giraph-1.0.0.5/giraph-examples-1.0.0-for-hadoop-2.0.6-alpha-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.zkList=127.0.0.1:2181 -libjars /usr/lib/giraph/giraph-1.0.0-for-hadoop-2.0.6-alpha-jar-with-dependencies.jar -e 1 -s 3 -v -V 50 -w 1

But this:
hadoop jar /usr/share/doc/giraph-1.0.0.5/giraph-examples-1.0.0-for-hadoop-2.0.6-alpha-jar-with-dependencies.jar org.apache.giraph.GiraphRunner -Dgiraph.zkList=127.0.0.1:2181 -libjars /usr/lib/giraph/giraph.jar org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/acoleman/giraphtest/tiny_graph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/acoleman/giraphtest/shortestpathsC2 -ca SimpleShortestPathsVertex.source=2 -w 1

does not.

Looking at the latest container logs with:
cat $(ls -1rtd $(ls -1rtd /var/log/hadoop-yarn/containers/application_* | tail -1)/container_* | tail -1)/*

I find:
Error: Could not find or load main class org.apache.giraph.yarn.GiraphApplicationMaster
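
The log-hunting one-liner above is dense, so here is the same idea unpacked into a small shell function. Treat it as a sketch: the function name latest_container_dir is made up for this post, and it assumes the same log layout as above (application_* directories containing container_* directories).

```shell
# Print the most recently modified container directory inside the most
# recently modified YARN application directory.
# The function name is hypothetical; the default path is the one used above.
latest_container_dir() {
    log_root="${1:-/var/log/hadoop-yarn/containers}"

    # ls -t sorts by modification time, -r reverses the order so the newest
    # entry is last, and -d lists the directories themselves.
    latest_app=$(ls -1rtd "$log_root"/application_* 2>/dev/null | tail -1) || true
    [ -n "$latest_app" ] || return 1

    latest_container=$(ls -1rtd "$latest_app"/container_* 2>/dev/null | tail -1) || true
    [ -n "$latest_container" ] || return 1

    printf '%s\n' "$latest_container"
}
```

With that defined, cat "$(latest_container_dir)"/* dumps the same files as the one-liner.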

I beat my head against the wall adding jars to -libjars and to -yj, and copying jars into every directory I could find.

I stumbled across
http://mail-archives.apache.org/mod_mbox/giraph-user/201312.mbox/%3C198091226.KO6f1kuK42@chronos7%3E

which gives the answer. If https://issues.apache.org/jira/browse/GIRAPH-814 hasn't been applied, then mapreduce.application.classpath has to be hard set or Giraph simply won't work.

vi /etc/hadoop/conf.pseudo/mapred-site.xml
  <property>
    <name>mapreduce.application.classpath</name>
    <value>/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*,/usr/lib/giraph/giraph-1.0.0-for-hadoop-2.0.6-alpha-jar-with-dependencies.jar</value>
  </property>

I did not need to restart yarn-resourcemanager or yarn-nodemanager for this to get picked up.
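
To double-check what actually ended up in the file, something like the following can list the classpath entries one per line. This is a rough sketch, not a real XML parser: it assumes the <name> and <value> elements look like the snippet above, and the function name classpath_entries is hypothetical.

```shell
# Print each comma-separated entry of mapreduce.application.classpath
# from a Hadoop *-site.xml file, one per line.
# Crude text matching only; the function name is made up for illustration.
classpath_entries() {
    conf="$1"
    awk '
        /<name>mapreduce\.application\.classpath<\/name>/ { found = 1 }
        found && /<value>/    { invalue = 1 }
        invalue               { buf = buf $0 }
        invalue && /<\/value>/ { exit }
        END {
            # Strip everything outside <value>...</value>.
            sub(/.*<value>/, "", buf)
            sub(/<\/value>.*/, "", buf)
            n = split(buf, parts, ",")
            for (i = 1; i <= n; i++) {
                # Trim surrounding whitespace from each entry.
                gsub(/^[ \t]+|[ \t]+$/, "", parts[i])
                if (parts[i] != "") print parts[i]
            }
        }
    ' "$conf"
}
```

For example, classpath_entries /etc/hadoop/conf.pseudo/mapred-site.xml | grep giraph should print the Giraph jar path if the property was saved correctly.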

DropWizard and Hive (and/or Impala)

I have a small DropWizard/D3.js/jqGrid application to visualize the results of some analysis. I had been taking the results of the analysis from HDFS and shoveling them into MySQL (with Sqoop) to examine samples. This worked well enough that I wanted to go straight to the source. With DropWizard, it should be easy enough to wrap my data in a Hive external table and use the Hive JDBC driver instead of MySQL.

If you are already familiar with DropWizard and just need an example, examine the pom.xml and config-hive.yaml files in my example application on GitHub.

To pull in Hive JDBC and its dependencies, add to pom.xml:

       
  <dependency>
   <groupId>org.apache.hive</groupId>
   <artifactId>hive-jdbc</artifactId>
   <version>1.1.0</version>
   <exclusions>
    <exclusion>
     <groupId>org.slf4j</groupId>
     <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
    <exclusion>
     <groupId>com.sun.jersey</groupId>
     <artifactId>*</artifactId>
    </exclusion>
   </exclusions>
  </dependency>
  <dependency>
   <groupId>org.apache.hadoop</groupId>
   <artifactId>hadoop-common</artifactId>
   <version>2.6.0</version>
   <exclusions>
    <exclusion>
     <groupId>org.slf4j</groupId>
     <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
    <exclusion>
     <groupId>com.sun.jersey</groupId>
     <artifactId>*</artifactId>
    </exclusion>
   </exclusions>
  </dependency>
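
Exclusions are easy to get subtly wrong (a typo in a group or artifact name silently excludes nothing), so it can be worth checking what Maven actually resolves. Here is a small sketch that filters the output of mvn dependency:tree for the artifacts the pom.xml above is trying to keep out. The function name leaked_exclusions is hypothetical, and the patterns only cover the two exclusions shown here.

```shell
# Read `mvn dependency:tree` output on stdin and print any lines that name
# an artifact the pom.xml meant to exclude.
# Exit status 0 means at least one excluded artifact is still present.
# The function name is made up for illustration.
leaked_exclusions() {
    grep -E 'org\.slf4j:slf4j-log4j12:|com\.sun\.jersey:'
}
```

Run it from the project directory as mvn dependency:tree | leaked_exclusions; no output means neither slf4j-log4j12 nor anything from the com.sun.jersey group is being pulled in.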