Monday, May 26, 2014

Processing ASN.1 Call Detail Records with Hadoop (using Bouncy Castle)

In these posts I describe using the Bouncy Castle Java library to process Call Detail Records (CDRs) in ASN.1 format (encoded as DER). The same process should work for any ASN.1 data encoded as DER.

I hope to replicate this with an ASN.1 Java compiler (BinaryNotes), but right now BinaryNotes does not handle indefinite-length encodings. An ASN.1 compiler generates the Java classes from the ASN.1 specification, so I would not have to create the classes that hold the data by hand.

Creating some data

First, I need a specification I can work with and post. I created a "Simple Generic CDR" ASN.1 specification/schema:

GenericCDR-Schema DEFINITIONS IMPLICIT TAGS ::=
BEGIN
GenericCallDataRecord ::= SEQUENCE {
    recordNumber [APPLICATION 2] IMPLICIT INTEGER,
    callingNumber [APPLICATION 8] IMPLICIT UTF8String (SIZE(1..20)),
    calledNumber [APPLICATION 9] IMPLICIT UTF8String (SIZE(1..20)),
    startDate [APPLICATION 16] IMPLICIT UTF8String (SIZE(8)),
    startTime [APPLICATION 18] IMPLICIT UTF8String (SIZE(6)),
    duration [APPLICATION 19] IMPLICIT INTEGER
}
END

For production data, the ASN.1 specification (also called grammar) would come from the vendor producing the data.
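To make the IMPLICIT APPLICATION tags in the schema above concrete, here is a minimal sketch in plain Java (standard library only, no Bouncy Castle) that hand-encodes one record as DER. The class and method names (GenericCdrDerSketch, encode, writeTlv, applicationTag) are made up for illustration and are not taken from the code in the later posts. It assumes tag numbers below 31 and relies on an IMPLICIT tag replacing the universal tag, so APPLICATION 2 on a primitive type becomes the single identifier byte 0x42. The sample values in main are the ones used for the first record later in this post.

```java
import java.io.ByteArrayOutputStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: hand-encodes one GenericCallDataRecord as DER.
final class GenericCdrDerSketch {

    // Identifier byte for a primitive APPLICATION-class tag (class bits 01, low-tag form).
    static int applicationTag(int number) {
        if (number < 0 || number > 30) {
            throw new IllegalArgumentException("this sketch only handles tag numbers 0..30");
        }
        return 0x40 | number;
    }

    // DER length: short form below 128, otherwise 0x80|n followed by n length bytes.
    static void writeLength(ByteArrayOutputStream out, int length) {
        if (length < 0x80) {
            out.write(length);
            return;
        }
        byte[] len = BigInteger.valueOf(length).toByteArray();
        int start = (len[0] == 0) ? 1 : 0; // drop the sign byte BigInteger may add
        out.write(0x80 | (len.length - start));
        out.write(len, start, len.length - start);
    }

    // Writes one tag-length-value triple.
    static void writeTlv(ByteArrayOutputStream out, int identifier, byte[] value) {
        out.write(identifier);
        writeLength(out, value.length);
        out.write(value, 0, value.length);
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // BigInteger.toByteArray() is minimal two's complement, which is what a DER INTEGER holds.
    static byte[] integer(long value) {
        return BigInteger.valueOf(value).toByteArray();
    }

    static byte[] encode(long recordNumber, String callingNumber, String calledNumber,
                         String startDate, String startTime, long duration) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        writeTlv(body, applicationTag(2), integer(recordNumber));
        writeTlv(body, applicationTag(8), utf8(callingNumber));
        writeTlv(body, applicationTag(9), utf8(calledNumber));
        writeTlv(body, applicationTag(16), utf8(startDate));
        writeTlv(body, applicationTag(18), utf8(startTime));
        writeTlv(body, applicationTag(19), integer(duration));

        // The record itself is an untagged SEQUENCE: universal tag 16, constructed = 0x30.
        ByteArrayOutputStream record = new ByteArrayOutputStream();
        writeTlv(record, 0x30, body.toByteArray());
        return record.toByteArray();
    }

    public static void main(String[] args) {
        byte[] der = encode(1, "15555550100", "15555550101", "20131016", "134534", 65);
        StringBuilder hex = new StringBuilder();
        for (byte b : der) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(der.length + " bytes: " + hex.toString().trim());
    }
}
```

For these values the output is 52 bytes beginning 30 32 42 01 01: the SEQUENCE header, then recordNumber under its APPLICATION 2 tag.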

The awesome OSS Nokalva people have an online schema checker/compiler and data encoder/decoder. (If you are looking for support, I think you can fairly easily swap out Bouncy Castle for OSS Nokalva's ASN.1 Tools for Java, but I haven't tried it.)

To create data, paste the above schema into the Schema textbox at asn1-playground.oss.com and press Compile. "Compiled successfully." should show up below the textbox. If not, the Console Output textbox on the page should give some clues to the problem.

Next, provide some text-formatted data to encode. In the Data: Encode textbox, paste in:

first-cdr GenericCallDataRecord ::=
{
    recordNumber 1,
    callingNumber "15555550100",
    calledNumber "15555550101",
    startDate "20131016",
    startTime "134534",
    duration 65
}
second-cdr GenericCallDataRecord ::=
{
    recordNumber 2,
    callingNumber "15555550102",
    calledNumber "15555550104",
    startDate "20131016",
    startTime "134541",
    duration 52
}
third-cdr GenericCallDataRecord ::=
{
    recordNumber 3,
    callingNumber "15555550103",
    calledNumber "15555550102",
    startDate "20131016",
    startTime "134751",
    duration 62
}
fourth-cdr GenericCallDataRecord ::=
{
    recordNumber 4,
    callingNumber "15555550104",
    calledNumber "15555550102",
    startDate "20131016",
    startTime "134901",
    duration 72
}
fifth-cdr GenericCallDataRecord ::=
{
    recordNumber 5,
    callingNumber "15555550101",
    calledNumber "15555550100",
    startDate "20131016",
    startTime "135134",
    duration 32
}
Then press Encode. The Console Output box should show 0 errors. To download the DER-encoded data, press the DER link below the Data: Encode textbox. The XML link is also worth downloading, since it is a human-readable representation of the same data.
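As a quick sanity check on the downloaded DER file before bringing in Bouncy Castle, the tag-length-value structure can be walked with a few lines of plain Java. This is only a sketch under some assumptions: the class name DerTlvDump is made up, the file path is whatever name the download was saved under (passed as the first argument), and only low-tag-number, definite-length encodings are handled, which is all the schema above needs.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: lists the TLV structure of DER data without decoding values.
final class DerTlvDump {
    private static final String[] CLASSES = {"UNIVERSAL", "APPLICATION", "CONTEXT", "PRIVATE"};

    // Returns one line per TLV, indented by nesting depth.
    static List<String> describe(byte[] data) {
        List<String> lines = new ArrayList<>();
        walk(data, 0, data.length, 0, lines);
        return lines;
    }

    private static void walk(byte[] data, int start, int end, int depth, List<String> lines) {
        int pos = start;
        while (pos < end) {
            int identifier = data[pos++] & 0xFF;
            int tagNumber = identifier & 0x1F;
            if (tagNumber == 0x1F) {
                throw new IllegalArgumentException("high-tag-number form is not handled in this sketch");
            }
            boolean constructed = (identifier & 0x20) != 0;

            int length = data[pos++] & 0xFF;
            if (length == 0x80) {
                throw new IllegalArgumentException("indefinite length is not valid DER");
            }
            if (length > 0x80) {
                // Long form: low 7 bits give the number of length bytes that follow.
                int count = length & 0x7F;
                if (count > 3) {
                    throw new IllegalArgumentException("length too large for this sketch");
                }
                length = 0;
                for (int i = 0; i < count; i++) {
                    length = (length << 8) | (data[pos++] & 0xFF);
                }
            }
            if (pos + length > end) {
                throw new IllegalArgumentException("value runs past the end of its container");
            }

            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                line.append("  ");
            }
            line.append(CLASSES[identifier >>> 6]).append(' ').append(tagNumber)
                .append(constructed ? " constructed" : " primitive")
                .append(", length ").append(length);
            lines.add(line.toString());

            if (constructed) {
                walk(data, pos, pos + length, depth + 1, lines); // descend into SEQUENCE contents
            }
            pos += length;
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the downloaded DER file (the file name is up to you).
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String line : describe(data)) {
            System.out.println(line);
        }
    }
}
```

For each record, the dump should list a UNIVERSAL 16 (SEQUENCE) entry followed by the six APPLICATION tags from the schema: 2, 8, 9, 16, 18 and 19.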

I put the files from all the encoding options in the asn1data folder of the GitHub repo for this post.

In the next posts I create a standalone decoder, then a Hadoop InputFormat and RecordReader, and finally run a Hadoop job to process the ASN.1 DER-encoded data created above.

Update: Links to Part 1, Part 2, Part 3.

1 comment:

  1. As said in the last lines. Where I can get the links for the next steps.
