Thursday, February 23, 2017

Writing ORC files is easier than it was a few years ago

Several years ago I was asked to compare writing the Parquet and ORC file formats from standalone Java (without using the Hadoop libraries). At the time ORC had not yet been separated from Hive, so writing it was much more involved than writing Parquet from Java. It looks like that changed in 2015, but I only revisited the issue within the past few months.

To build ORC:
Download the current release (1.3.2 at the time of writing)
tar xzvf orc-1.3.2.tar.gz && cd ./orc-1.3.2/
cd ./java
mvn package

ls -la ./tools/target/orc-tools-1.3.2-uber.jar

A simple example of writing is:
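
What follows is a minimal sketch of a writer, assuming the orc-core 1.3.x Java API (OrcFile, TypeDescription, Writer) and the VectorizedRowBatch classes from Hive's storage-api. The class name, the two-column schema, and the output path are made up for illustration. Note that the API still takes Hadoop's Configuration and Path classes, so those need to be on the classpath; compiling and running against the uber jar built above (javac -cp orc-tools-1.3.2-uber.jar ...) is assumed to be enough.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class OrcWriteExample {

    // Writes rowCount rows of (id, name) to the given local file.
    // Assumption: the target file does not exist yet; by default the
    // writer is expected to refuse to overwrite an existing file.
    static void write(String file, int rowCount) throws IOException {
        Configuration conf = new Configuration();
        // Hypothetical schema, chosen only for this example.
        TypeDescription schema =
                TypeDescription.fromString("struct<id:int,name:string>");
        Writer writer = OrcFile.createWriter(new Path(file),
                OrcFile.writerOptions(conf).setSchema(schema));

        // Rows are written in column-oriented batches, not one at a time.
        VectorizedRowBatch batch = schema.createRowBatch();
        LongColumnVector id = (LongColumnVector) batch.cols[0];
        BytesColumnVector name = (BytesColumnVector) batch.cols[1];

        for (int r = 0; r < rowCount; r++) {
            int row = batch.size++;
            id.vector[row] = r;
            byte[] bytes = ("name-" + r).getBytes(StandardCharsets.UTF_8);
            // setRef stores a reference; each row gets its own array,
            // which stays untouched until the batch is flushed.
            name.setRef(row, bytes, 0, bytes.length);

            // Flush the batch once it is full and start over.
            if (batch.size == batch.getMaxSize()) {
                writer.addRowBatch(batch);
                batch.reset();
            }
        }
        // Flush whatever is left in the last, partially filled batch.
        if (batch.size != 0) {
            writer.addRowBatch(batch);
            batch.reset();
        }
        writer.close();
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "example.orc";
        write(file, 10000);
    }
}
```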


And a simple example of reading is:
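
Again a minimal sketch under the same assumptions (orc-core 1.3.x API, Hadoop's Configuration and Path on the classpath). It expects a file with a struct<id:int,name:string> schema, which is a hypothetical layout picked for illustration, and the class and method names are made up as well.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.RecordReader;

public class OrcReadExample {

    // Prints every (id, name) row in the file and returns the row count.
    // Assumes column 0 is an integer type and column 1 is a string.
    static long printRows(String file) throws IOException {
        Configuration conf = new Configuration();
        Reader reader = OrcFile.createReader(new Path(file),
                OrcFile.readerOptions(conf));

        // The schema is stored in the file, so the batch can be built from it.
        VectorizedRowBatch batch = reader.getSchema().createRowBatch();
        RecordReader rows = reader.rows();
        long count = 0;

        while (rows.nextBatch(batch)) {
            LongColumnVector id = (LongColumnVector) batch.cols[0];
            BytesColumnVector name = (BytesColumnVector) batch.cols[1];
            for (int r = 0; r < batch.size; r++) {
                // A repeating vector keeps its single value in slot 0.
                int i = id.isRepeating ? 0 : r;
                int n = name.isRepeating ? 0 : r;

                String idValue = (id.noNulls || !id.isNull[i])
                        ? Long.toString(id.vector[i]) : "null";
                String nameValue = (name.noNulls || !name.isNull[n])
                        ? new String(name.vector[n], name.start[n],
                                name.length[n], StandardCharsets.UTF_8)
                        : "null";

                System.out.println(idValue + "\t" + nameValue);
                count++;
            }
        }
        rows.close();
        return count;
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "example.orc";
        System.out.println(printRows(file) + " rows");
    }
}
```

Reading works batch by batch, just like writing: nextBatch refills the same VectorizedRowBatch until the file is exhausted, so memory use stays bounded regardless of file size.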