Setting up my environment to build packages to run on Hadoop on Azure

It shouldn’t have taken as long as it did, and I have only myself to blame, but it took some time before I finally figured out what I needed to do to set up an environment on my laptop for building Map/Reduce programs in Java to run on Hadoop on Azure. Here’s the set-up I have –

I downloaded and extracted Eclipse 3.6.1 (Helios) from http://archive.eclipse.org/eclipse/downloads/ to my Program Files (x86) directory (it could have been anywhere, of course).

I then downloaded the Hadoop Eclipse plug-in (hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar) and placed it in Eclipse’s plugins folder – I found it here.

Following the excellent instructions in the YDN tutorial (despite the version mismatch), I was able to confirm that the plug-in loads fine and looks as it should. However, given the current lack of proper authentication in Hadoop, Hadoop on Azure does not allow connecting to the cluster from the outside (that would introduce the risk of somebody else connecting to the cluster, as all it takes is guessing the username), which meant it was not actually possible for me to connect to the cluster from the Map/Reduce Locations panel, or indeed through the HDFS node in the Project Explorer.

It also appears that the plug-in lags behind core development, and the project templates are not up to date with the most recent changes to the Hadoop classes. That’s not too much of a problem, though, as there isn’t much code in the templates and it can easily be replaced or corrected.

The bit that, due to my lack of experience with Java and Eclipse, took infinitely longer than it should have was figuring out that this alone is not enough to build a Map/Reduce project.

Copying the code from the WordCount sample, I kept getting errors on most of my imports until I finally figured out what should have been very obvious – I needed hadoop-core-0.20.203.0.jar and commons-cli-1.2.jar. The former can be found at http://mirror.catn.com/pub/apache/hadoop/common/hadoop-0.23.0/ and the latter at http://commons.apache.org/cli/download_cli.cgi, although both (and others) also exist on the cluster, so I could RDP into it and use SkyDrive to transfer them over.
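For what it’s worth, once the two jars are on the build path (or the classpath, if compiling outside Eclipse), the compile step is straightforward. An illustrative Windows command-line sketch, assuming the jars sit next to the source file – not something you need if you build through Eclipse:

```
javac -classpath "hadoop-core-0.20.203.0.jar;commons-cli-1.2.jar" -d classes WordCount.java
```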

That was pretty much it – I could then create a new project, create a new class, paste in the contents of WordCount.java from the sample provided, export the JAR file, and use it to submit a new job on Hadoop on Azure.
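For anyone who hasn’t looked at the sample: the logic in WordCount boils down to a map phase that emits a (word, 1) pair per token and a reduce phase that sums the pairs per word. A plain-Java sketch of that flow – no Hadoop dependency, and the class and method names are mine, not the sample’s:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountSketch {

    // Stands in for the Mapper (tokenise, emit (word, 1))
    // and the Reducer (sum the 1s per word) in one pass.
    public static Map<String, Integer> countWords(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                // merge() adds 1 to the existing count, or starts at 1.
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] input = { "hello hadoop", "hello azure" };
        Map<String, Integer> counts = countWords(input);
        System.out.println(counts.get("hello"));  // 2
        System.out.println(counts.get("hadoop")); // 1
    }
}
```

In the real sample the same two steps are split into a Mapper and a Reducer class wired together by a Job driver, which is what lets Hadoop distribute them across the cluster.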

What took me so long?!

Next step would be to be able to test things locally, but I don’t think I’ll go there just yet…


About Yossi Dahan
I work as a principal consultant in the CTO office of Solidsoft - a Microsoft partner in the UK with a strong focus on cloud, hybrid and integration based solutions. I spend my days working with both our customers and our project teams, helping them explore the possibilities that technology enables and how to derive value from them.

One Response to Setting up my environment to build packages to run on Hadoop on Azure

  1. John says:

    Thanks for the info. What did you do about the versions mismatch from YDN tutorial? I play their hadoop image on virtual machine and use hadoop-0.20.3-dev-eclipse-plugin.jar on eclipse. The view shows up but hadoop side logs says “Incorrect header or version mismatch from 192.168.253.1:56615 got version 3 expected version 2”. Do you have to install a newer version of hadoop? Thanks.
