Setting up my environment to build packages to run on Hadoop on Azure
May 7, 2012
It shouldn’t have taken long, and I have only myself to blame, but it took some time before I finally figured out how to set up an environment on my laptop for building Map/Reduce programs in Java to run on Hadoop on Azure. Here’s the set-up I have –
I downloaded Eclipse 3.6.1 (Helios) from http://archive.eclipse.org/eclipse/downloads/ and extracted it to my Program Files (x86) directory (it could have been anywhere, of course)
I then downloaded the Hadoop Eclipse plug-in (hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar) and placed it in the plug-ins folder in Eclipse – I found it here
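Installing the plug-in is just a matter of dropping the jar into Eclipse’s plugins folder. As a sketch (the `ECLIPSE_HOME` path is hypothetical – use wherever Eclipse was extracted; the `touch` stands in for the actual downloaded jar):

```shell
# Sketch: drop the Hadoop Eclipse plug-in jar into Eclipse's plugins folder.
# ECLIPSE_HOME is a hypothetical location -- adjust to your install directory.
ECLIPSE_HOME="${ECLIPSE_HOME:-$HOME/eclipse}"
PLUGIN_JAR="hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar"

touch "$PLUGIN_JAR"                        # stand-in for the real download
mkdir -p "$ECLIPSE_HOME/plugins"
cp "$PLUGIN_JAR" "$ECLIPSE_HOME/plugins/"  # Eclipse picks it up on next start
```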
Following the excellent instructions in the YDN tutorial (despite the version mismatch), I was able to confirm that the plug-in loads fine and looks as it should. However, given the current lack of proper authentication in Hadoop, Hadoop on Azure does not allow connecting to the cluster from the outside (that would introduce the risk of somebody else connecting to the cluster, as all it takes is guessing the username). This meant I could not actually connect to the cluster from the Map/Reduce locations panel, or indeed through the HDFS node in the project explorer.
It also appears that the plug-in lags behind the core development: the project templates are not up to date with the most recent changes to the Hadoop classes. That’s not too much of a problem, though, as there’s not much code in the templates and it can easily be replaced or corrected.
The bit that took far longer than it should have – due to my lack of experience with Java and Eclipse – was figuring out that this alone is not enough to build a Map/Reduce project…
Copying the code from the WordCount sample, I kept getting errors on most of my imports until I finally figured out what should have been obvious – I needed hadoop-core-0.20.203.0.jar and commons-cli-1.2.jar on the build path. The former can be found at http://mirror.catn.com/pub/apache/hadoop/common/hadoop-0.23.0/ and the latter at http://commons.apache.org/cli/download_cli.cgi, although both (and others) also exist on the cluster, so I could RDP into it and use SkyDrive to transfer them over.
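In Eclipse, jars added to the build path end up as library entries in the project’s .classpath file (the same thing the Build Path dialog writes for you). Assuming both jars are copied into a lib folder inside the project – a hypothetical layout – the relevant entries look something like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<classpath>
	<classpathentry kind="src" path="src"/>
	<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
	<!-- The two jars the WordCount imports need; lib/ is a hypothetical location -->
	<classpathentry kind="lib" path="lib/hadoop-core-0.20.203.0.jar"/>
	<classpathentry kind="lib" path="lib/commons-cli-1.2.jar"/>
	<classpathentry kind="output" path="bin"/>
</classpath>
```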
That was pretty much it – I could then create a new project, create a new class, paste in the contents of WordCount.java from the sample provided, export the JAR file and use it to submit a new job on Hadoop on Azure.
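For reference, the core of what the WordCount sample computes – tokenise each line in the map step, sum the per-word counts in the reduce step – can be sketched in plain Java. This uses no Hadoop classes at all; it’s just to illustrate the logic, not the actual sample code:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the logic behind the WordCount sample:
// the "map" step splits each line into words, the "reduce" step
// sums the counts per word. Not the actual Hadoop code.
class WordCountSketch {
    static Map<String, Integer> countWords(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String line : lines) {
            // map step: emit each whitespace-separated token
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                // reduce step: sum the 1s emitted for this word
                Integer previous = counts.get(word);
                counts.put(word, previous == null ? 1 : previous + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countWords(Arrays.asList("hello world", "hello hadoop"));
        System.out.println(counts.get("hello")); // prints 2
    }
}
```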
What took me so long?!
Next step would be to be able to test things locally, but I don’t think I’ll go there just yet…