Windows Azure HPC Job Scheduler

HPC Server 2008 R2 SP1 added the ability to burst into Azure. David Chappell wrote a good white paper on the combination of Windows HPC Server and Windows Azure.

Now, with the introduction of SP3, we have announced the ability to run the scheduler directly on Azure, with no need for on-premises software.

To learn more, take a look at Getting Started with Application Deployment with the Windows Azure HPC Scheduler.

“I have a cloud”

In all the initial conversations I’m having with customers about the Windows Azure Platform, I go over what I believe cloud computing to be, which often includes this definition from the National Institute of Standards and Technology (NIST) (bolds are mine):

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.

I also often use a metaphor for cloud computing that I heard not long ago and really liked – that of a water tap:

  • When you approach a tap, you expect to be able to open and close it yourself – this is the principle of self-provisioning
  • You expect that the more you open it, the more water you are going to get – in other words, immediate scalability
  • You expect to pay only for the water you use, with zero cost when the tap is fully closed – metering and chargeback
  • And you expect the utility company to share the water in the most efficient way possible, so that you get it at the lowest possible cost – resource pooling
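The metering-and-chargeback principle above can be sketched in a few lines of code. This is a minimal illustration, not an actual Azure billing calculation, and the rate is a made-up figure:

```python
# Illustrative pay-per-use metering: the charge is strictly proportional
# to consumption, with zero cost when nothing is used (the tap is closed).
# RATE_PER_HOUR is a hypothetical example rate, not a real Azure price.

RATE_PER_HOUR = 0.12  # hypothetical charge per compute-hour


def metered_cost(hours_used: float) -> float:
    """Return the charge for the given usage; no usage means no cost."""
    return round(RATE_PER_HOUR * hours_used, 2)


print(metered_cost(0))    # tap fully closed: 0.0
print(metered_cost(100))  # pay only for what was consumed: 12.0
```

The key property, as with the water tap, is that the function passes through zero: an idle consumer owes nothing.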

I think this metaphor is very useful when discussing what one should expect from a cloud platform, and it helps distinguish traditional virtual hosting farms from cloud platforms.

This is important because in many of these conversations, somebody will, at some point, suggest that their company already has a cloud platform, delivered through virtualisation, third-party hosting, etc. Practically every time, though, when we explore this point a bit further and dig deeper into the capabilities of the Windows Azure platform versus those currently available to the company, significant differences emerge – often enough to suggest that public cloud can add a lot of value.

Private cloud, in its current form, isn’t an overhead-free solution. Whilst it’s true that very large organisations can spread the overhead costs of purchasing and maintaining server farms across many business units, and that those able to make large technological investments can achieve a reasonably high level of automation, overheads still exist. For most organisations, these appear as significant upfront costs in purchasing (i.e. CAPEX) as well as substantial ongoing costs in running servers and software (i.e. OPEX).

A public cloud offering at the scale of Microsoft’s spreads this overhead over many customers and removes any upfront investment: the cost is strictly on a pay-for-what-you-use basis at a well-known rate, and there is no cost for resources you don’t use.

Furthermore, the scale of Microsoft’s investment in the technology in those data centres, driven by the scale of adoption, means that the level of automation achieved is incredibly high – no human interaction is required to provision services, to update them, or even to overcome issues in a running service.

This high level of automation helps keep the cost of the services offered low – certainly lower than most, if not all, private cloud solutions – but it also allows us to provide some capabilities that are unique to the platform, and this is where the majority of the platform’s value lies. Consider a couple of examples:

When you hand your application to the Windows Azure Fabric to be deployed, the fabric decides where to deploy it. Several factors come into play in this decision, one of which is the concept of failure domains:

Assuming you’ve asked the controller to deploy your application to at least two instances, it ensures that the deployment has no single point of failure: the instances will be deployed on separate hosts, in separate racks, with separate network switches and power supplies. Short of the data centre blowing up, you are pretty much guaranteed that at least one of your instances will remain alive. This is much more than most organisations have the ability to do themselves.
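The instance count is declared in the service configuration file that accompanies your deployment; a minimal fragment asking for two instances might look like the following (the service and role names here are made up for illustration):

```xml
<?xml version="1.0" encoding="utf-8"?>
<ServiceConfiguration serviceName="MyCloudService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="WebRole1">
    <!-- At least two instances, so the fabric can place them
         across separate failure domains -->
    <Instances count="2" />
  </Role>
</ServiceConfiguration>
```

Note that you only state how many instances you want; the placement across hosts, racks, and switches is entirely the fabric’s concern.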

But it doesn’t stop there: the controller also constantly monitors the state of the instances it ‘owns’, and if one does fail, it immediately starts deploying a replacement elsewhere. The net result is that within minutes of any failure, your application resumes the same level of resiliency it had initially – and all of this is done with no human intervention.
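The monitor-and-redeploy behaviour described above can be sketched as a simple reconciliation step. This is a conceptual illustration only – not actual fabric controller code – and the host names and function are invented for the example:

```python
# Conceptual sketch of the controller's reconciliation step: given the
# desired instance count, the instances still healthy, and spare hosts
# (ideally in other failure domains), top the deployment back up to the
# desired count with no human intervention.

def reconcile(desired_count, healthy, spare_hosts):
    """Return the healthy instance list topped back up to desired_count."""
    instances = list(healthy)
    spares = list(spare_hosts)  # copy, so the caller's list is untouched
    while len(instances) < desired_count and spares:
        # redeploy onto a spare host from a different failure domain
        instances.append(spares.pop(0))
    return instances


# One of two instances has failed; the controller restores resiliency.
print(reconcile(2, ["host-A"], ["host-B", "host-C"]))  # ['host-A', 'host-B']
```

In the real platform this loop runs continuously and automatically; the point of the sketch is simply that the desired state, not the failure, drives the system.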

These are just a couple of examples of the sort of things we can do on our platform that private cloud initiatives can’t easily achieve today. This is at the heart of the strong proposition that is public cloud.

And so, whilst it is true that many organisations have built strong virtualisation platforms that provide many cloud-like capabilities, I don’t think they are quite comparable to the capabilities of Windows Azure. Being able to self-provision is great, but being able to do so while the platform considers aspects such as failure domains and performance characteristics is better. Private clouds can offer dynamic scaling, but they are more sensitive to capacity planning than a massive-scale public cloud offering. Virtualisation helps increase utilisation, but hosting many, many customers across geographies and industries, and cleverly analysing multi-tenancy patterns, means that utilisation is greater still.
