Hide-and-seek with Cloud Service containers for Virtual Machines on Azure

From several conversations I had recently, it transpired that there's some confusion around the relevance of Cloud Services when using Virtual Machines on Azure.
Initially this surprised me, but then it was pointed out to me that, in fact, the Windows Azure Management Portal does a very good job of fostering this confusion.

I’ll get to why in a moment but first let me start with this –

The simplest way to think about how a Virtual Machine is deployed to Azure is to compare it to the good old VM Role –

Just as the VM Role used to be, a Windows Azure Virtual Machine is deployed as a role with a single instance within a cloud service. This is quite apparent when you look at it in the old portal – here's an example of a VM I created using the standard quick-create, as seen in the old portal –

[screenshot: the old portal showing the VM as a single role instance within a cloud service]

Ok – there's plenty the platform does differently behind the scenes when managing VMs vs. Web or Worker roles, but from a deployment-model point of view, a cloud service is created with a single instance of a role which happens to be a Virtual Machine.

So – why is the portal confusing? Well – because, in what I suspect was an attempt to simplify things that ended up making them rather more confusing, when you deploy a VM the management portal chooses to hide the cloud service's existence –

I'm guessing that the thinking was that with a single VM the cloud service is rather irrelevant – conceptually, the role and the instance are one, and all the management operations and monitoring are done at the instance level, so there was no real need to surface the fact that a cloud service exists.

Unfortunately, things start to get more confusing as soon as you delete your VM, because at some point during the delete operation the corresponding Cloud Service will suddenly appear, and eventually, when the delete operation completes, it will be empty –

[screenshot: the now-empty cloud service appearing in the portal after the VM is deleted]

It appears largely for one reason – so that it can be deleted; otherwise a phantom service would remain in the subscription with no way to manage it through the UI.
BTW – the API and PowerShell (and some dialogues, as you will see shortly) seem to always reveal the real details, and the cloud service that is clearly visible in the old portal is also fully accessible through these. In fact – the description given to the cloud service says it all –

[screenshot: the auto-generated description of the cloud service]
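If you want to see this for yourself, a couple of lines of PowerShell will do it; here is a minimal sketch using the Service Management ('Azure') PowerShell cmdlets, with hypothetical service and VM names –

    # List the cloud services in the subscription; the implicitly created
    # container for the VM shows up here even while the portal hides it
    Get-AzureService | Select-Object ServiceName, Label, Description

    # Drill into a specific service to see the role instance (the VM) inside it
    Get-AzureVM -ServiceName "mycloudservice1234" -Name "myvm"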

So – why doesn't the platform, which had kindly created this cloud service for me, also kindly remove it? Well – I suspect that this is because it cannot assume that the container will only ever contain the original VM. How come? Let's look at another case in which the cloud service suddenly appears –

If you deploy a second VM from the Gallery and choose to connect it to an existing virtual machine –

[screenshot: the gallery wizard offering to connect the new VM to an existing virtual machine]

(notice how the supposedly non-existent cloud service suddenly makes an appearance?)

What you are actually asking the platform to do is deploy a new instance, of a new role, into the same cloud service, and that's a whole different story! Now – not only does the cloud service have a meaning that is greater than either VM's definition, but deleting the first VM clearly cannot delete the cloud service, and so – shortly after the second VM is requested – the cloud service will make its formal appearance in the portal –

[screenshot: the cloud service now listed in the portal]
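Incidentally, deploying a second VM into an existing cloud service can also be scripted; here is a rough sketch, again assuming the Service Management cmdlets, with hypothetical names, image and credentials (exact parameter names may vary slightly between versions of the module) –

    # Build the configuration for a new role/VM and deploy it into the
    # EXISTING cloud service rather than creating a new one
    New-AzureVMConfig -Name "myvm2" -InstanceSize Small -ImageName $imageName |
        Add-AzureProvisioningConfig -Windows -AdminUsername "azureadmin" -Password $password |
        New-AzureVM -ServiceName "mycloudservice1234"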

and again – looking at how things appear in the old portal can help get a somewhat clearer behind-the-scenes view – one service, two roles, one instance of each –

[screenshot: the old portal showing one cloud service with two roles, one instance of each]

In the new portal, this is reflected like this (which doesn't tell the full story) –

[screenshot: the new portal's view of the two VMs]

Interestingly, although not wholly surprisingly, removing the second VM does not re-hide the cloud service and so it is possible to ‘see in the wild’ a Cloud Service with a single VM in it –

[screenshot: a cloud service containing a single VM, after the second VM has been removed]

And so – given this seemingly erratic behaviour, even if there is some method behind the madness, it is not surprising that people get confused.
Personally, I would have much preferred it if things were straightforward, black and white – you create a VM, you get the corresponding Cloud Service. Simples.

But there you go – nobody asked me, so at least I hope this helps shed some light on the subject.

Side note – all of this also helps explain the very long and repetitive names given to disks – the naming convention is [cloud service]-[VM name]-[disk number]-[timestamp]
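Listing the disks in the subscription shows these generated names nicely; another quick sketch with the Service Management cmdlets (the names you see will obviously be your own) –

    # Each DiskName follows the [cloud service]-[VM name]-[disk number]-[timestamp]
    # pattern; AttachedTo shows which role/VM the disk currently belongs to
    Get-AzureDisk | Select-Object DiskName, AttachedTo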

Reflection on HDInsight

After a few days of ‘playing’ with HDInsight, both server and service, it was time to think back and take stock.

I had no exposure to Hadoop before I started playing with it on Azure a few months back, and I am still, by all accounts, a complete novice, but having spent the last few days on HDInsight there are a few interesting observations I thought I’d share –

Lowering the barrier of entry

Personally, and this is a very subjective viewpoint, this is perhaps the greatest wow factor of all – until now, if I wanted to "get into the game" with Hadoop, I had to get comfortable with Java (nothing against it, but I'm not), I had to have Eclipse, and everything had to be just right.
If I wanted to run anything locally, I had to get comfortable with running Hadoop in Cygwin and all sorts of things (but frankly – test clusters on Azure have been a great experience, more on that shortly).

Now – with HDInsight Server I can install Hadoop on Windows with a click from the Web Platform Installer; I can get the latest version of our Hadoop distribution (currently in preview) onto my laptop, in minutes, with zero config.

I can then use Visual Studio and .net, both of which I'm very familiar with, to do pretty much all the development I need; I no longer need Eclipse and I don't really need to use Java.

This is bound to significantly lower the barrier of entry to handling big data for a lot of people. Is this the beginning of Hadoop for the masses? ;-)

Enabling reuse

The other thing that became apparent very quickly as I was building various scenarios in my tests is that, now that I'm developing in .net, I can not only build on all the knowledge and experience I've accumulated over the years, I can actually build on a lot of code I already have.
When I processed the METAR data, I already had the parser I developed for my Windows 8 application; I did not need to re-write anything, it just slotted in.
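To make that a little more concrete – one of the simplest ways to reuse existing .net code is Hadoop streaming, where the mapper and reducer are just console executables that read stdin and write stdout, so an existing parser can sit inside a small .exe. Here is a rough sketch of submitting such a job from a PowerShell prompt on the cluster; the jar path, HDFS paths and executable names are made up for illustration –

    # Submit a streaming job whose mapper and reducer are .net console apps;
    # -file ships the local executables to the cluster alongside the job
    hadoop jar C:\Hadoop\lib\hadoop-streaming.jar `
        -input "/input/metar" `
        -output "/output/metar" `
        -mapper "MetarMapper.exe" `
        -reducer "MetarReducer.exe" `
        -file "C:\jobs\MetarMapper.exe" `
        -file "C:\jobs\MetarReducer.exe"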

Speaking to a few architects and developers over the last week or so, these two points resonated very well – so many people want to get into the thick of things but are somewhat intimidated (as I was) or, quite simply, the cost of implementation is too high.

Choice between cloud and on-premise

When I started, I used hadooponazure.com exclusively, because I did not want to run Hadoop on Cygwin and I did not have access to our server deployment. Now pretty much everyone has access to both. This choice is quite powerful – I can see people like me running development instances locally to benefit from quick coding iterations, but running production clusters in the cloud, benefitting from the cost effectiveness of scaling in the cloud, not to mention on-demand clusters that can be removed when the processing is done.

I could also easily imagine the reverse – teams using the cloud as a test bed, on test data, before running on a production cluster, avoiding the need to maintain several environments on-premises.

Sticking with the community

Hadoop has a large ecosystem, and it's great that we can remain part of it. Many of these projects are being worked on to stabilize and improve them on Windows, with others to come in the future, but it seems that many 'just work' even now. It is simply great that the decision was taken not to re-invent the wheel here.

So far I’ve been using HDFS and Map/Reduce, but also Hive quite significantly and a bit of Mahout, and I know others have been trying out Oozie and other projects on it too.

I've been browsing the Apache repositories a little and it seems to me that the contributions made are very well received and go beyond benefiting just those who choose to run Hadoop on Windows, and that's great too! :-)

Connecting with the rest of the stack

But of course, technology is here to serve a purpose, and when it comes to data, there are already protocols and interfaces that systems and users use very effectively.
People, fundamentally, don't want to change the way they consume data – big or small – they want to be able to leverage Hadoop but remain within familiar tools and technologies such as SQL Server with Analysis Services, Reporting Services, Power View, and tools such as Excel with PivotTables, data mining add-ins, etc.

It was great to see how easy it is to get from Hadoop to PowerPivot and Power View (even better told by Denny Lee), as well as to read a white paper discussing how to leverage Hadoop from SSIS.

Of course there are many other benefits; I'm being really selfish here and have put down the four that struck me most for my immediate needs. I suspect many organisations will value the ability to run on Windows with all the management story that comes with it (whilst others, I'm sure, won't care), and there are some very important capabilities that still need covering – Active Directory integration, for example, for better security, or System Centre integration for better monitoring. But these are for another day :-)
