Monday, July 3, 2017

Giraph Error: Could not find or load main class org.apache.giraph.yarn.GiraphApplicationMaster

A very old post from 2014 that got lost in my drafts. Posting it now in the hope that it helps someone out.

Often Google acts like magic for me: type in my error, and out pops the solution. Not so for a Giraph error I recently hit. Hopefully this post lets Google work like magic for someone else :)

After installing Giraph on a BigTop 0.7 VM, I was able to run the PageRank benchmark, which takes no input or output, but nothing more complicated.

This works:
hadoop jar /usr/share/doc/giraph-1.0.0.5/giraph-examples-1.0.0-for-hadoop-2.0.6-alpha-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.zkList=127.0.0.1:2181 -libjars /usr/lib/giraph/giraph-1.0.0-for-hadoop-2.0.6-alpha-jar-with-dependencies.jar -e 1 -s 3 -v -V 50 -w 1

But this:
hadoop jar /usr/share/doc/giraph-1.0.0.5/giraph-examples-1.0.0-for-hadoop-2.0.6-alpha-jar-with-dependencies.jar org.apache.giraph.GiraphRunner -Dgiraph.zkList=127.0.0.1:2181 -libjars /usr/lib/giraph/giraph.jar org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/acoleman/giraphtest/tiny_graph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/acoleman/giraphtest/shortestpathsC2 -ca SimpleShortestPathsVertex.source=2 -w 1

does not.

Looking at the latest container logs with:
cat $(ls -1rtd $(ls -1rtd /var/log/hadoop-yarn/containers/application_* | tail -1)/container_* | tail -1)/*

I find:
Error: Could not find or load main class org.apache.giraph.yarn.GiraphApplicationMaster

I beat my head against the wall trying to add jars to -libjars and to -yj, and copying jars into every directory I could find.

I stumbled across
http://mail-archives.apache.org/mod_mbox/giraph-user/201312.mbox/%3C198091226.KO6f1kuK42@chronos7%3E

which gives the answer: if the patch from https://issues.apache.org/jira/browse/GIRAPH-814 hasn't been applied, then mapreduce.application.classpath has to be set explicitly (with the Giraph jar included) or Giraph simply won't work.

vi /etc/hadoop/conf.pseudo/mapred-site.xml
  <property>
    <name>mapreduce.application.classpath</name>
    <value>/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*,/usr/lib/giraph/giraph-1.0.0-for-hadoop-2.0.6-alpha-jar-with-dependencies.jar</value>
  </property>

I did not need to restart yarn-resourcemanager or yarn-nodemanager for this to get picked up.

DropWizard and Hive (and/or Impala)

I have a small DropWizard/D3.js/jqGrid application to visualize the results of some analysis. I had been taking the results from HDFS and shoveling them into MySQL (with Sqoop) to examine samples. That worked well enough that I wanted to go straight to the source, and with DropWizard it should be easy enough to wrap the data in a Hive external table and use the Hive JDBC driver instead of the MySQL one.

If you are already familiar with DropWizard and just need an example, examine the pom.xml and config-hive.yaml files in my example application on GitHub.

To pull in Hive JDBC and its dependencies, add to pom.xml:

       
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.1.0</version>
    <exclusions>
      <exclusion>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
      </exclusion>
      <exclusion>
        <groupId>com.sun.jersey</groupId>
        <artifactId>*</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
    <exclusions>
      <exclusion>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
      </exclusion>
      <exclusion>
        <groupId>com.sun.jersey</groupId>
        <artifactId>*</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
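
Once the driver and its dependencies are on the classpath, the query path is just plain JDBC against HiveServer2. Here is a minimal sketch; this is not the code from my app (which pulls its connection settings from config-hive.yaml), and the host, user, and table names below are placeholders to adjust for your cluster:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class HiveJdbcSmokeTest {
    public static void main(String[] args) throws Exception {
      // Older hive-jdbc builds don't always self-register, so load the driver explicitly.
      Class.forName("org.apache.hive.jdbc.HiveDriver");

      // Placeholder HiveServer2 endpoint -- adjust host, port, and database for your cluster.
      String url = "jdbc:hive2://hiveserver2.example.com:10000/default";

      try (Connection conn = DriverManager.getConnection(url, "myuser", "");
           Statement stmt = conn.createStatement();
           ResultSet rs = stmt.executeQuery("SELECT * FROM my_external_table LIMIT 10")) {
        while (rs.next()) {
          // Columns are whatever the external table defines; print the first one as a sanity check.
          System.out.println(rs.getString(1));
        }
      }
    }
  }

In the DropWizard app the same url, user, and driver class simply move into config-hive.yaml instead of being hard-coded.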

      
 



Sunday, June 18, 2017

Pinebook!

Not Hadoop-related, but awesome all the same. A few months ago I stumbled on PINE64's website and saw the Pinebook, a Linux arm64 laptop. That and a PocketCHIP made a great late-birthday, early-Father's-Day set of presents.

Build and shipping take a couple of months, shipping was almost a third of the laptop's cost, and the performance and keyboard quality are exactly what you would expect :) But it is still a fun bit of hardware.

If you decide to get one, make sure to add on a USB-to-H-barrel power cord (or make your own). The Pinebook does come with a power supply, but there's no point in carting around yet another wall wart when the Pinebook happily charges off a phone charger.

Mine powered right up into Xenial. I'm normally RH-based, since everywhere I've been employed in the last couple of decades has been, so it's nice to jump back to something Debian-based.

aarch64 wasn't in mainline Rust yet, but it was in the nursery, so
curl -sSf https://raw.githubusercontent.com/rust-lang-nursery/rustup.rs/master/rustup-init.sh | bash
worked just fine and got me up and running with Rust.




Update: Hackaday has a great write-up. I didn't experience any of the screen issues they had since I have the 14", but the page has a good teardown and an overview of performance (which is not much :) ).

Thursday, February 23, 2017

Writing ORC files is easier than a few years ago

Several years ago I was asked to compare writing the Parquet and ORC file formats from standalone Java (without using the Hadoop libraries). At the time ORC had not yet been separated from Hive, and it was much more involved than writing Parquet from Java. It looks like that changed in 2015, but I only revisited the issue within the past few months.

To build ORC:
Download the current release (currently 1.3.2)
tar xzvf orc-1.3.2.tar.gz && cd ./orc-1.3.2/
cd ./java
mvn package

ls -la ./tools/target/orc-tools-1.3.2-uber.jar

A simple example of writing is:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

/*
 * Basic code from https://orc.apache.org/docs/core-java.html#writing-orc-files
 * Using Core Java - Writing ORC Files
 *
 * orc-tools-X.Y.Z-uber.jar is required in the runtime classpath for io/airlift/compress/Decompressor
 *
 * Creates myfile.orc AND .myfile.orc.crc, fails if myfile.orc exists.
 *
 * awcoleman@gmail.com
 */
public class WriteORCFileWORCCore {

  public WriteORCFileWORCCore() throws IllegalArgumentException, IOException {
    String outfilename = "/tmp/myfile.orc";
    Configuration conf = new Configuration(false);

    /*
     * Writer is in orc-core-1.2.1.jar and has dependencies on the
     * Hadoop HDFS client libs
     */
    TypeDescription schema = TypeDescription.fromString("struct<x:int,y:int>");
    Writer writer = OrcFile.createWriter(new Path(outfilename),
        OrcFile.writerOptions(conf).setSchema(schema));

    /*
     * VectorizedRowBatch and LongColumnVector are in hive-storage-api-2.1.1-pre-orc.jar
     */
    VectorizedRowBatch batch = schema.createRowBatch();
    LongColumnVector x = (LongColumnVector) batch.cols[0];
    LongColumnVector y = (LongColumnVector) batch.cols[1];

    for (int r = 0; r < 10000; ++r) {
      int row = batch.size++;
      x.vector[row] = r;
      y.vector[row] = r * 3;
      // If the batch is full, write it out and start over.
      if (batch.size == batch.getMaxSize()) {
        writer.addRowBatch(batch);
        batch.reset();
      }
    }

    // Write the last partial batch out and close the writer
    writer.addRowBatch(batch);
    writer.close();

    // Output info to console
    System.out.println("Wrote " + writer.getNumberOfRows() + " records to ORC file "
        + (new Path(outfilename).toString()));
  }

  public static void main(String[] args) throws IllegalArgumentException, IOException {
    @SuppressWarnings("unused")
    WriteORCFileWORCCore mainobj = new WriteORCFileWORCCore();
  }
}
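
The same pattern extends to other column types; each ORC type maps to a ColumnVector subclass. As a quick sketch (not part of the example above; the class name, file name, and schema here are made up for illustration), a string column is handled with BytesColumnVector:

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class WriteORCWithStringColumn {
  public static void main(String[] args) throws IllegalArgumentException, IOException {
    Configuration conf = new Configuration(false);
    TypeDescription schema = TypeDescription.fromString("struct<name:string,id:int>");
    Writer writer = OrcFile.createWriter(new Path("/tmp/myfile_strings.orc"),
        OrcFile.writerOptions(conf).setSchema(schema));

    VectorizedRowBatch batch = schema.createRowBatch();
    BytesColumnVector name = (BytesColumnVector) batch.cols[0];
    LongColumnVector id = (LongColumnVector) batch.cols[1];

    // Some hive-storage-api versions need the shared byte buffer initialized before setVal().
    name.initBuffer();

    for (int r = 0; r < 1000; ++r) {
      int row = batch.size++;
      // setVal copies the bytes for this row into the column's shared buffer.
      name.setVal(row, ("name-" + r).getBytes(StandardCharsets.UTF_8));
      id.vector[row] = r;
      if (batch.size == batch.getMaxSize()) {
        writer.addRowBatch(batch);
        batch.reset();
      }
    }
    if (batch.size > 0) {
      writer.addRowBatch(batch);
    }
    writer.close();
  }
}

If the byte arrays can be shared rather than copied, setRef() is the cheaper alternative to setVal().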

And a simple example of reading is:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.RecordReader;
import org.apache.orc.TypeDescription;

/*
 * Basic code from https://orc.apache.org/docs/core-java.html#reading-orc-files
 * Using Core Java - Reading ORC Files
 *
 * Reads ORC file written by WriteORCFileWORCCore
 *
 * orc-tools-X.Y.Z-uber.jar is required in the runtime classpath for io/airlift/compress/Decompressor
 *
 * awcoleman@gmail.com
 */
public class ReadORCFileWORCCore {

  public ReadORCFileWORCCore() throws IllegalArgumentException, IOException {
    String infilename = "/tmp/myfile.orc";
    Configuration conf = new Configuration(false);

    Reader reader = OrcFile.createReader(new Path(infilename),
        OrcFile.readerOptions(conf));
    RecordReader rows = reader.rows();
    VectorizedRowBatch batch = reader.getSchema().createRowBatch();

    // Some basic info about the ORC file
    TypeDescription schema = reader.getSchema();
    long numRecsInFile = reader.getNumberOfRows();
    System.out.println("Reading ORC file " + (new Path(infilename).toString()));
    System.out.println("ORC file schema: " + schema.toJson());
    System.out.println("Number of records in ORC file: " + numRecsInFile);

    while (rows.nextBatch(batch)) {
      System.out.println("Processing batch of records from ORC file. Number of records in batch: " + batch.size);
      LongColumnVector field1 = (LongColumnVector) batch.cols[0];
      LongColumnVector field2 = (LongColumnVector) batch.cols[1];
      for (int r = 0; r < batch.size; ++r) {
        int field1rowr = (int) field1.vector[r];
        int field2rowr = (int) field2.vector[r];
        System.out.println("In this batch, for row " + r + ", field1 is: " + field1rowr + " and field2 is: " + field2rowr);
      }
    }
    rows.close();
  }

  public static void main(String[] args) throws IllegalArgumentException, IOException {
    @SuppressWarnings("unused")
    ReadORCFileWORCCore mainObj = new ReadORCFileWORCCore();
  }
}
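
One thing this simple reader glosses over is null handling: a ColumnVector only populates isNull[] when noNulls is false, and when isRepeating is set, row 0 stands in for every row in the batch. Here is a small hedged helper (the class and method names are mine, not from the ORC API) that a read loop could use before trusting vector[r]:

import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;

public class ColumnVectorUtil {

  // True if row r of this column is null.
  public static boolean isNullAt(ColumnVector col, int r) {
    // When isRepeating is set, row 0 describes every row in the batch.
    int idx = col.isRepeating ? 0 : r;
    return !col.noNulls && col.isNull[idx];
  }

  // Read a long value, respecting isRepeating, or return the default when the row is null.
  public static long longAt(LongColumnVector col, int r, long dflt) {
    if (isNullAt(col, r)) {
      return dflt;
    }
    return col.vector[col.isRepeating ? 0 : r];
  }
}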