Moving my .net Word Count Map/Reduce to Azure

Of course, the next step after running my first .net Map Reduce job on HDInsight Server, and then working through getting from unstructured data to Power View, had to be trying it out on Azure as well, so I provisioned myself a test cluster on HadoopOnAzure and opened a remote desktop session into it.

I needed to get my files onto it (my WordCountSample.dll and all the files from the MRLib folder in the project), so I zipped them up, stored them on SkyDrive and downloaded them on the other end (love SkyDrive).

After extracting the folder on the Azure node, testing the word count sample was simply a case of uploading the source file (I used the interactive web console with fs.put() for that), opening the Hadoop command prompt (there's a shortcut on the desktop) and running the job exactly as I did previously.
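
For reference, the round trip looks roughly like this; fs.put() pops up an upload dialog in the Interactive JavaScript console, and the run command mirrors the MRRunner invocation from the earlier post (I'm quoting the switch from memory, so treat it as illustrative rather than exact):

    In the Interactive JavaScript console (browser):
      js> fs.put()                        (opens a dialog to upload the input file into HDFS)

    In the Hadoop command prompt (desktop shortcut):
      MRRunner -dll WordCountSample.dll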

As expected, the two work just the same and I can easily move my jobs between them untouched.

To test the aviation weather report scenario I wanted more data than I cared to upload manually, so I wrote a small Azure Worker role that downloaded the latest METAR reports hourly and stored them as blobs in a container on Azure.
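
For the curious, the role's Run loop boils down to something like the sketch below. To be clear, this is not the actual code: the NOAA cycle-file URL, the "metar" container name and the "StorageConnectionString" setting name are all placeholders of mine, and it targets the classic Microsoft.WindowsAzure.StorageClient library that shipped with the Azure SDK at the time.

    using System;
    using System.Net;
    using System.Threading;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.ServiceRuntime;
    using Microsoft.WindowsAzure.StorageClient;

    public class WorkerRole : RoleEntryPoint
    {
        public override void Run()
        {
            // The connection string comes from the role's configuration settings
            var account = CloudStorageAccount.Parse(
                RoleEnvironment.GetConfigurationSettingValue("StorageConnectionString"));
            var container = account.CreateCloudBlobClient().GetContainerReference("metar");
            container.CreateIfNotExist();

            while (true)
            {
                // NOAA publishes hourly METAR "cycle" files named 00Z.TXT..23Z.TXT
                // (the URL is an assumption; substitute whatever source you use)
                string cycle = DateTime.UtcNow.ToString("HH") + "Z";
                string url =
                    "http://weather.noaa.gov/pub/data/observations/metar/cycles/" + cycle + ".TXT";

                using (var web = new WebClient())
                {
                    string reports = web.DownloadString(url);

                    // One blob per hour so successive runs never overwrite earlier data
                    string blobName = DateTime.UtcNow.ToString("yyyyMMdd-HH") + ".txt";
                    container.GetBlobReference(blobName).UploadText(reports);
                }

                Thread.Sleep(TimeSpan.FromHours(1));
            }
        }
    }

A real role would obviously want error handling around the download; the point here is just the blob-per-hour layout, which also keeps the input nicely splittable for the map phase.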

I then configured my cluster with my Azure storage account and was able to run the job directly on the data in Azure Storage, which is really cool in my view.
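
If memory serves, once the storage account is associated with the cluster its blobs become addressable through the asv:// scheme (Azure Storage Vault), so you can sanity-check access from the Hadoop command prompt and then simply pass an asv:// path as the job's input instead of an HDFS one; the container name here is just my example:

    hadoop fs -lsr asv://metar/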

I could store lots of data cheaply, then provision a cluster, run the processing, store the results back in Azure Storage as well, and finally get rid of the cluster.