Manipulating data in Windows Azure Table using PowerPivot

A customer asked yesterday a very good question – is it possible to use PowerPivot analyse data stored in Windows Azure Table Storage.

Given that Table Storage is exposed through oData, which is, as you’d expect, a data source fully supported by PowerPivot, my gut feeling was that this should not be a problem and, in fact, a very interesting scenario – many customers use Windows Azure Storage to store vast quantities of data, given how cost-effective it is and the scale that is possible, but ultimately data is there to be used, and PowerPivot is an amazing tool to process data – the two together make a powerful pair.

Looking at it closer, though, I stumbled into a small hurdle – whilst PowerPivot had support for working with data from Azure DataMarket out of the box for some time now and it supports working with oData feeds, I don’t believe it supports, currently, working with Azure Table Storage directly, the stumbling block being the ShareKey authentication mechanism.

However, this is too useful to give up, so I looked at a workaround, and the most obvious one was to take the man-in-the-middle approach and to publish a WCF Data Service onto an Azure Web Role (doesn’t have to be, of course, but makes perfect sense), which would expose a ‘standard’ oData feed to be consumed by PowerPivot and would get the data from the Table Storage. simples.

To do that I needed some data in Azure Tables and so I decided to use Cerebrata’s Cloud Storage Studio to upload the pubs database to Window Azure Storage – quite a cool and useful feature of their product if you ask me!
(Right click on the ‘Table’s’ node, choose ‘Upload Relational Database’ and follow the steps in the short wizard)

I decided to publish data from the roysched table, only because it had the most rows in it; to do that I create a class that represented a roysched entity –

[DataServiceKey("RowKey", "PartitionKey")]
    public class roysched
        public string PartitionKey {get;set;}
        public string RowKey {get;set;}
        public DateTime Timestamp {get;set;}
        public string title_ID {get;set;}
        public int lorange {get;set;}
        public int hirange {get;set;}
        public int royalty {get;set;}

                public roysched()

        public roysched(string partitionKey, string rowKey, DateTime timestamp, string titleId, int lorange, int hirange, int royalty)
            PartitionKey = partitionKey;
            RowKey = rowKey;
            Timestamp = timestamp;
            title_ID = titleId;
            this.lorange = lorange;
            this.hirange = hirange;
            this.royalty = royalty;

You will notice the DataServiceKey attribute I’ve added – this is needed for the Entity Framework to figure out which fields (or combination of keys, as is the case here) can be used as the identity of the entity as I’ve blogged here

With that done I needed to create a context class to be used by the WCF Data Service, this class will read data from Azure and ‘re-publish’ data as the feed behind the data, this is where the majority of the logic would generally go, but as you can expect I’ve kept this to the minimum for the purpose of this demonstration.

public class PubsContext
        public IQueryable<roysched> list
                var account = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
                var context = new AzurePubsContext(account.TableEndpoint.ToString(), account.Credentials);
                return context.AzureList;

One thing to note is that whilst technically I could expose the TableServiceContext I’ve used to access Windows Azure Storage directly, I did not do that, following from the guidance that can be found here

Also bear in mind, as these samples often go, this is by no means the best or most efficient way of doing things, but I did want to keep things as simple as possible to focus on the concept rather than lines of code – in a real, production, code I would almost certainly not want to create the Azure TableServiceContext on every call!

The last ‘big’ piece in the puzzle is creating the data service itself – adding a WCF Data Service item to the project adds a handy template in which only the context class and list property are needed to be updated (highlighted in the code below)

public class SomeDataService : DataService<PubsContext>
    public static void InitializeService(DataServiceConfiguration config)
        config.SetEntitySetAccessRule("list", EntitySetRights.AllRead);
        config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;

To get everything working I needed to do a couple more small changes –

I needed to define the DataConnectionString configuration setting in the csdef file and add the connection string value pointing at my Azure Storage (or the local emulator), this is easily done through the Visual Studio UI.

Last – I needed to put the code to initialise set the configuration setting publisher in the Global.asax’ Application_Start handler, this is pretty standard for any project deployed on Azure –

// This code sets up a handler to update CloudStorageAccount instances when their corresponding
            // configuration settings change in the service configuration file.
            CloudStorageAccount.SetConfigurationSettingPublisher((configName, configSetter) =>
                // Provide the configSetter with the initial value

…and voila – calling this service exposed the information from Windows Azure as a basic oData feed, easily consumable from PowerPivot  –


One last thing to bear in mind, of course, is that I kept my service completely open for anonymous access, which you’d probably not want to do in real life, but as this is now a standard WCF Data Service than the normal configuration applies, and PowerPivot will support both SSPI and basic authentication)

%d bloggers like this: