Manipulating data in Windows Azure Table using PowerPivot

A customer asked yesterday a very good question – is it possible to use PowerPivot analyse data stored in Windows Azure Table Storage.

Given that Table Storage is exposed through oData, which is, as you’d expect, a data source fully supported by PowerPivot, my gut feeling was that this should not be a problem and, in fact, a very interesting scenario – many customers use Windows Azure Storage to store vast quantities of data, given how cost-effective it is and the scale that is possible, but ultimately data is there to be used, and PowerPivot is an amazing tool to process data – the two together make a powerful pair.

Looking at it closer, though, I stumbled into a small hurdle – whilst PowerPivot had support for working with data from Azure DataMarket out of the box for some time now and it supports working with oData feeds, I don’t believe it supports, currently, working with Azure Table Storage directly, the stumbling block being the ShareKey authentication mechanism.

However, this is too useful to give up, so I looked at a workaround, and the most obvious one was to take the man-in-the-middle approach and to publish a WCF Data Service onto an Azure Web Role (doesn’t have to be, of course, but makes perfect sense), which would expose a ‘standard’ oData feed to be consumed by PowerPivot and would get the data from the Table Storage. simples.

To do that I needed some data in Azure Tables and so I decided to use Cerebrata’s Cloud Storage Studio to upload the pubs database to Window Azure Storage – quite a cool and useful feature of their product if you ask me!
(Right click on the ‘Table’s’ node, choose ‘Upload Relational Database’ and follow the steps in the short wizard)

I decided to publish data from the roysched table, only because it had the most rows in it; to do that I create a class that represented a roysched entity –

[DataServiceKey("RowKey", "PartitionKey")]
    public class roysched
    {
        public string PartitionKey {get;set;}
        public string RowKey {get;set;}
        public DateTime Timestamp {get;set;}
        public string title_ID {get;set;}
        public int lorange {get;set;}
        public int hirange {get;set;}
        public int royalty {get;set;}

                public roysched()
        {
        }

        public roysched(string partitionKey, string rowKey, DateTime timestamp, string titleId, int lorange, int hirange, int royalty)
        {
            PartitionKey = partitionKey;
            RowKey = rowKey;
            Timestamp = timestamp;
            title_ID = titleId;
            this.lorange = lorange;
            this.hirange = hirange;
            this.royalty = royalty;
        }
    }

You will notice the DataServiceKey attribute I’ve added – this is needed for the Entity Framework to figure out which fields (or combination of keys, as is the case here) can be used as the identity of the entity as I’ve blogged here

With that done I needed to create a context class to be used by the WCF Data Service, this class will read data from Azure and ‘re-publish’ data as the feed behind the data, this is where the majority of the logic would generally go, but as you can expect I’ve kept this to the minimum for the purpose of this demonstration.

public class PubsContext
    {
        public IQueryable<roysched> list
        {
            get
            {
                var account = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
                var context = new AzurePubsContext(account.TableEndpoint.ToString(), account.Credentials);
                return context.AzureList;
            }
        }
    }

One thing to note is that whilst technically I could expose the TableServiceContext I’ve used to access Windows Azure Storage directly, I did not do that, following from the guidance that can be found here

Also bear in mind, as these samples often go, this is by no means the best or most efficient way of doing things, but I did want to keep things as simple as possible to focus on the concept rather than lines of code – in a real, production, code I would almost certainly not want to create the Azure TableServiceContext on every call!

The last ‘big’ piece in the puzzle is creating the data service itself – adding a WCF Data Service item to the project adds a handy template in which only the context class and list property are needed to be updated (highlighted in the code below)

public class SomeDataService : DataService<PubsContext>
{
    public static void InitializeService(DataServiceConfiguration config)
    {
        config.SetEntitySetAccessRule("list", EntitySetRights.AllRead);
        config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
    }
}

To get everything working I needed to do a couple more small changes –

I needed to define the DataConnectionString configuration setting in the csdef file and add the connection string value pointing at my Azure Storage (or the local emulator), this is easily done through the Visual Studio UI.

Last – I needed to put the code to initialise set the configuration setting publisher in the Global.asax’ Application_Start handler, this is pretty standard for any project deployed on Azure –

// This code sets up a handler to update CloudStorageAccount instances when their corresponding
            // configuration settings change in the service configuration file.
            CloudStorageAccount.SetConfigurationSettingPublisher((configName, configSetter) =>
            {
                // Provide the configSetter with the initial value
                configSetter(RoleEnvironment.GetConfigurationSettingValue(configName));
            });

…and voila – calling this service exposed the information from Windows Azure as a basic oData feed, easily consumable from PowerPivot  –

image

One last thing to bear in mind, of course, is that I kept my service completely open for anonymous access, which you’d probably not want to do in real life, but as this is now a standard WCF Data Service than the normal configuration applies, and PowerPivot will support both SSPI and basic authentication)

About Yossi Dahan
I work as a cloud solutions architect in the Azure team at Microsoft UK. I spend my days working with customers helping be successful in the cloud with Microsoft Azure.

2 Responses to Manipulating data in Windows Azure Table using PowerPivot

  1. J says:

    I think you might left out the definition of AzurePubsContext

  2. Peter Reid says:

    Interesting, I came across the same problem but went the proxy road to solve it
    http://blog.kloud.com.au/2012/09/19/visualise-azure-table-storage-with-excel-and-fiddler/
    Does this solution handle the paging differences also?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: