Manipulating data in Windows Azure Table using PowerPivot

A customer asked a very good question yesterday – is it possible to use PowerPivot to analyse data stored in Windows Azure Table Storage?

Given that Table Storage is exposed through OData, which is, as you’d expect, a data source fully supported by PowerPivot, my gut feeling was that this should not be a problem and, in fact, a very interesting scenario – many customers use Windows Azure Storage to store vast quantities of data, given how cost-effective it is and the scale that is possible, but ultimately data is there to be used, and PowerPivot is an amazing tool for processing it – the two together make a powerful pair.

Looking at it more closely, though, I stumbled into a small hurdle – whilst PowerPivot has supported Azure DataMarket out of the box for some time now, and it supports working with OData feeds in general, I don’t believe it currently supports working with Azure Table Storage directly, the stumbling block being the SharedKey authentication mechanism.
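(To illustrate the hurdle – every request to Table Storage must carry an Authorization header signed with the account key; a rough sketch of the Shared Key Lite signing used by the Table service is below, purely to show the shape of it. The helper and its parameters are illustrative, not part of any client library, but it should make clear why a generic OData consumer such as PowerPivot cannot simply be pointed at the table endpoint.)

using System;
using System.Security.Cryptography;
using System.Text;

static class SharedKeyLiteSample
{
    // builds the 'Authorization: SharedKeyLite account:signature' header value for a
    // Table service request; 'canonicalizedResource' is e.g. "/myaccount/roysched"
    public static string BuildAuthorizationHeader(string accountName, string base64AccountKey,
                                                  string rfc1123Date, string canonicalizedResource)
    {
        // for the Table service, the Shared Key Lite string-to-sign is simply
        // the x-ms-date value followed by the canonicalized resource
        var stringToSign = rfc1123Date + "\n" + canonicalizedResource;

        using (var hmac = new HMACSHA256(Convert.FromBase64String(base64AccountKey)))
        {
            var signature = Convert.ToBase64String(
                hmac.ComputeHash(Encoding.UTF8.GetBytes(stringToSign)));
            return "SharedKeyLite " + accountName + ":" + signature;
        }
    }
}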

However, this is too useful to give up on, so I looked for a workaround, and the most obvious one was to take the man-in-the-middle approach and publish a WCF Data Service onto an Azure Web Role (it doesn’t have to be a Web Role, of course, but it makes perfect sense), which would expose a ‘standard’ OData feed to be consumed by PowerPivot and would get the data from Table Storage. Simples.

To do that I needed some data in Azure Tables, so I decided to use Cerebrata’s Cloud Storage Studio to upload the pubs database to Windows Azure Storage – quite a cool and useful feature of their product if you ask me!
(Right-click on the ‘Tables’ node, choose ‘Upload Relational Database’ and follow the steps in the short wizard)

I decided to publish data from the roysched table, only because it had the most rows in it; to do that I created a class that represented a roysched entity –

// DataServiceKeyAttribute lives in the System.Data.Services.Common namespace
// (System.Data.Services.Client assembly)
[DataServiceKey("RowKey", "PartitionKey")]
public class roysched
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public DateTime Timestamp { get; set; }
    public string title_ID { get; set; }
    public int lorange { get; set; }
    public int hirange { get; set; }
    public int royalty { get; set; }

    public roysched()
    {
    }

    public roysched(string partitionKey, string rowKey, DateTime timestamp, string titleId, int lorange, int hirange, int royalty)
    {
        PartitionKey = partitionKey;
        RowKey = rowKey;
        Timestamp = timestamp;
        title_ID = titleId;
        this.lorange = lorange;
        this.hirange = hirange;
        this.royalty = royalty;
    }
}

You will notice the DataServiceKey attribute I’ve added – this is needed for the WCF Data Services reflection provider to figure out which field (or combination of fields, as is the case here) can be used as the identity of the entity, as I’ve blogged here.

With that done I needed to create a context class to be used by the WCF Data Service; this class reads the data from Azure and ‘re-publishes’ it as the feed behind the service. This is where the majority of the logic would generally go, but as you can expect I’ve kept it to a minimum for the purpose of this demonstration.

public class PubsContext
{
    public IQueryable<roysched> list
    {
        get
        {
            // read the storage account details from the role configuration
            var account = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");

            // AzurePubsContext is the TableServiceContext used to query Table Storage
            var context = new AzurePubsContext(account.TableEndpoint.ToString(), account.Credentials);
            return context.AzureList;
        }
    }
}
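For completeness – the AzurePubsContext class referred to above is not shown in this post; a minimal sketch of what it could look like (assuming the entities were uploaded to a table named ‘roysched’ – your table name may differ) would be along these lines –

public class AzurePubsContext : TableServiceContext
{
    public AzurePubsContext(string baseAddress, StorageCredentials credentials)
        : base(baseAddress, credentials)
    {
    }

    // expose the 'roysched' table as a queryable set of roysched entities
    public IQueryable<roysched> AzureList
    {
        get { return CreateQuery<roysched>("roysched"); }
    }
}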

One thing to note is that whilst technically I could have exposed the TableServiceContext I used to access Windows Azure Storage directly, I did not do that, following the guidance that can be found here.

Also bear in mind, as these samples often go, that this is by no means the best or most efficient way of doing things, but I did want to keep things as simple as possible and focus on the concept rather than the lines of code – in real, production code I would almost certainly not want to create the Azure TableServiceContext on every call!
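As a hint of what a small improvement might look like – a sketch only, not the code used in this post – the CloudStorageAccount could be resolved once rather than on every request:

public class PubsContext
{
    // resolve the storage account once; note this assumes the configuration
    // setting publisher has already been set up in Application_Start
    private static readonly CloudStorageAccount Account =
        CloudStorageAccount.FromConfigurationSetting("DataConnectionString");

    public IQueryable<roysched> list
    {
        get
        {
            var context = new AzurePubsContext(Account.TableEndpoint.ToString(), Account.Credentials);
            return context.AzureList;
        }
    }
}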

The last ‘big’ piece of the puzzle is creating the data service itself – adding a WCF Data Service item to the project adds a handy template in which only the context class and the entity set access rule for the ‘list’ property need to be updated (see the code below) –

public class SomeDataService : DataService<PubsContext>
{
    public static void InitializeService(DataServiceConfiguration config)
    {
        // expose the 'list' property of PubsContext as a read-only entity set
        config.SetEntitySetAccessRule("list", EntitySetRights.AllRead);
        config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
    }
}

To get everything working I needed to make a couple more small changes –

I needed to define the DataConnectionString configuration setting in the csdef file and add the connection string value pointing at my Azure Storage account (or the local emulator); this is easily done through the Visual Studio UI.

Last – I needed to put the code to set the configuration setting publisher in Global.asax’s Application_Start handler; this is pretty standard for any project deployed on Azure –

// This code sets up a handler to update CloudStorageAccount instances when their
// corresponding configuration settings change in the service configuration file.
CloudStorageAccount.SetConfigurationSettingPublisher((configName, configSetter) =>
{
    // Provide the configSetter with the initial value
    configSetter(RoleEnvironment.GetConfigurationSettingValue(configName));
});

…and voilà – calling this service exposed the information from Windows Azure as a basic OData feed, easily consumable from PowerPivot –

[Screenshot: the OData feed consumed in PowerPivot]

One last thing to bear in mind, of course, is that I kept my service completely open for anonymous access, which you’d probably not want to do in real life, but as this is now a standard WCF Data Service the normal configuration applies, and PowerPivot supports both SSPI and basic authentication.

Error in WCF Data Service

I’ve been working on a small demo for a customer involving a WCF Data Service (blog post to come, hopefully) when my newly developed, and very simple, service insisted on returning – “The server encountered an error processing the request. See server logs for more details.”

This happened before any of my code executed directly, which baffled me for a short bit, but then adding the [System.ServiceModel.ServiceBehavior(IncludeExceptionDetailInFaults = true)] attribute to the data service class revealed a little more detail –

The server encountered an error processing the request. The exception message is ‘On data context type ‘AzurePubsContext’, there is a top IQueryable property ‘list’ whose element type is not an entity type. Make sure that the IQueryable property is of entity type or specify the IgnoreProperties attribute on the data context type to ignore this property.’

My AzurePubsContext class did not have any public properties that did not return IQueryable, so the message was not pointing directly at the problem, but it was clear that I had an issue with the entity I was using.

I was trying to represent pubs’ roysched table, migrated to Azure Table Storage, and I suspect my problem was that there was no obvious key that WCF Data Services could use.

Specifying the [DataServiceKey(“PartitionKey”,”RowKey”)] attribute on my entity class sorted this quickly enough.
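For completeness, the fix looks like this on the entity class (properties trimmed here – the full class is in the related post above) –

// the composite key tells the WCF Data Services reflection provider
// how to identify each entity
[DataServiceKey("PartitionKey", "RowKey")]
public class roysched
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    // ...remaining properties omitted
}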

An SLA Detente?

The latest ‘cloud computing’ newsletter pointed at Alistair Croll’s Top 12 Cloud Trends Of 2012, which makes an interesting read; particularly interesting, I found, was his ‘Trend No 8: An SLA Detente’ (I had to look détente up; turns out it means relaxation…).

In his article Alistair suggests, and I paraphrase, that customers’ expectations of the SLAs provided by cloud vendors are unrealistic, and he makes the point that car makers don’t provide insurance as part of the deal to purchase the car; rather, customers buy the insurance they deem necessary separately.

This got me thinking – the man does have a point. To a point.

I would suggest the analogy stretches further, and that talking about the warranty is actually a better viewpoint – consider you’re a logistics business in need of a fleet: you would do your research and settle on a vehicle from a maker with a good track record of reliability and service.

You might pay a premium for these, but you assess the risk and decide that buying a truck from a reliable manufacturer is worth more than buying one from a less robust maker. I don’t think anybody believes it is possible to buy a truck with ‘five nines’ availability, and even businesses that rely on these accept a certain amount of downtime. Of course we expect a good maker to have a good service network that will see our truck fixed in the shortest time possible (and at the first attempt), and we would almost certainly expect a courtesy car/van/truck whilst ours is in the garage, but if it takes half a day to sort this out, in the main, we accept the fact.

Now – I don’t suggest that cloud platforms are equivalent to cars; certainly when thinking of the Windows Azure platform I would think closer to the space shuttle – with tons of redundancy and a lot of ‘big brains’ behind it – and so the chances of failure are indeed much smaller than those of a car or a truck, and recovery is much faster. In large IT operations we tend to aim for the mythical 100% availability and try to hit that five-nines promise, largely because we believe we can, but I do think I agree with Alistair that organisations increasingly feel more than comfortable with the SLAs on offer, accepting the cost/benefit analysis behind them and – more importantly – accepting that in most cases they could not have achieved better, or even similar, themselves!

I also agree with, and would like to emphasise, Alistair’s last point on this topic, which is that cloud vendors, and certainly the Windows Azure platform, provide a lot of capabilities that allow solutions to respond to any issues that emerge, and so building highly available solutions is actually much easier on Windows Azure than it is on-premises, or – as he put it – “In 2012, we’ll realize that the providers have been trying to tell us something: You can have any SLA you want, as long as you code it yourself and find a way to turn risk into economic value.”
