Sunday, July 6, 2014

Custom Writable

I have never tackled a custom Writable before. I am a huge Avro (http://avro.apache.org/) fan, so I usually try to get my data converted to Avro early. A discussion got me interested in tackling one, and since I had my Bouncy Castle ASN.1 Hadoop example open, I extended it into a basic custom Writable example.

This thread by Oded Rosen was invaluable:
http://mail-archives.apache.org/mod_mbox/hadoop-general/201005.mbox/%3CAANLkTinzP8-nnGg8Q5aaJ8gXCCg6Som7e8Xarc_2PGDD@mail.gmail.com%3E
(also at http://osdir.com/ml/general-hadoop-apache/2010-05/msg00073.html if above is down)

I put the code in package com.awcoleman.BouncyCastleGenericCDRHadoopWithWritable on GitHub.

The basics from the thread above and a bit of other reading are:
If your class will only be used as a value and not a key, implement the Writable interface.
If your class will be used as a key (and possibly a value), implement the WritableComparable interface (which extends Writable).

A Writable must have 3 things:
An empty constructor. There can be other constructors with arguments, but there must be a no-argument one as well.
An overridden write method to write variables out.
An overridden readFields method to populate an object from a previous write method output.

Hadoop reuses Writable objects, so clear all fields at the start of readFields to avoid carrying over stale values from the previous record. The sketch below illustrates this.
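Here is a minimal sketch of a value-only Writable. The class and field names (CallRecordWritable, duration, endpoints) are made up for illustration, not taken from my GitHub example. The endpoints list shows why clearing state in readFields matters:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Writable;

public class CallRecordWritable implements Writable {

    private long duration;
    private List<String> endpoints = new ArrayList<String>();

    // Required no-argument constructor; Hadoop creates instances by reflection.
    public CallRecordWritable() {}

    public CallRecordWritable(long duration, List<String> endpoints) {
        this.duration = duration;
        this.endpoints = endpoints;
    }

    public long getDuration() { return duration; }

    public List<String> getEndpoints() { return endpoints; }

    // Serialize fields in a fixed order.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(duration);
        out.writeInt(endpoints.size());
        for (String e : endpoints) {
            out.writeUTF(e);
        }
    }

    // Deserialize in the same order write used.
    @Override
    public void readFields(DataInput in) throws IOException {
        // Clear state first: Hadoop reuses this object between records, so
        // entries from the previous record would otherwise pile up in the list.
        endpoints.clear();
        duration = in.readLong();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            endpoints.add(in.readUTF());
        }
    }
}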

WritableComparable adds to Writable:
An overridden hashCode method, used to partition keys across reducers.
An overridden compareTo method to define the sort order of keys.
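For a key type, a sketch might look like this (again with made-up names, here a RecordIdWritable keyed on a single long):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class RecordIdWritable implements WritableComparable<RecordIdWritable> {

    private long recordId;

    public RecordIdWritable() {}

    public RecordIdWritable(long recordId) { this.recordId = recordId; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(recordId);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        recordId = in.readLong();
    }

    // Defines the sort order of keys during the shuffle.
    @Override
    public int compareTo(RecordIdWritable other) {
        return Long.compare(recordId, other.recordId);
    }

    // The default HashPartitioner uses hashCode to pick a reducer, so equal
    // keys must produce equal hash codes.
    @Override
    public int hashCode() {
        return (int) (recordId ^ (recordId >>> 32));
    }

    // equals should be consistent with compareTo and hashCode.
    @Override
    public boolean equals(Object o) {
        return (o instanceof RecordIdWritable)
                && recordId == ((RecordIdWritable) o).recordId;
    }
}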

The advice given in the 'How to write a complex Writable' thread adds:
Override the equals method, keeping it consistent with compareTo and hashCode.
Implement RawComparator for your type. This post (http://vangjee.wordpress.com/2012/03/30/implementing-rawcomparator-will-speed-up-your-hadoop-mapreduce-mr-jobs-2/) has an example that extends WritableComparator, which implements RawComparator.
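Following that advice, a raw comparator for the hypothetical RecordIdWritable above could extend WritableComparator and compare the serialized bytes directly, skipping deserialization during the sort:

import org.apache.hadoop.io.WritableComparator;

public class RecordIdRawComparator extends WritableComparator {

    public RecordIdRawComparator() {
        super(RecordIdWritable.class);
    }

    // Compare the serialized forms directly. The key is a single long written
    // with writeLong, so read 8 bytes from the start of each buffer.
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        long id1 = readLong(b1, s1);
        long id2 = readLong(b2, s2);
        return Long.compare(id1, id2);
    }

    static {
        // Register the comparator so Hadoop picks it up for this key type.
        WritableComparator.define(RecordIdWritable.class, new RecordIdRawComparator());
    }
}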

In my example on GitHub, I only tested Writable, since I pull individual fields out and wrap them as Text or LongWritable for the keys. A sketch of that usage pattern follows.
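Something like this hypothetical mapper (it assumes an input format whose record reader hands the map the custom Writable as the value, as a custom record reader would): the custom Writable stays on the value side while built-in types carry the keys.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CdrFieldMapper
        extends Mapper<LongWritable, CallRecordWritable, Text, LongWritable> {

    private final Text outKey = new Text();
    private final LongWritable outVal = new LongWritable();

    @Override
    protected void map(LongWritable offset, CallRecordWritable record, Context context)
            throws IOException, InterruptedException {
        // Pull a single field out of the custom Writable and wrap it as the
        // key, so only built-in comparable types ever act as keys.
        String endpoint = record.getEndpoints().isEmpty()
                ? "unknown" : record.getEndpoints().get(0);
        outKey.set(endpoint);
        outVal.set(record.getDuration());
        context.write(outKey, outVal);
    }
}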

