Microsoft Azure Media Indexer basics – Part 1

One of the most recent additions to Microsoft Azure Media Services is the Media Indexer, which came out of Microsoft Research’s MAVIS project.

In short, the Indexer is an Azure Media Services processor that performs speech-to-text on video or audio assets. It produces both closed caption files and an Audio Indexer Blob (AIB) file: a binary file containing the extracted text with specific location information and match ranking.

The indexer can be used simply to automate the production of closed captions for audio and video files, and this could be further enhanced by using Bing translation to create closed captions in multiple languages.

It does, however, support a much broader use for the enterprise. In addition to the media indexer processor, the team has also released an add-on to SQL Server that plugs AIB file processing into SQL Server’s full-text search capability. This allows clients to query AIB contents, stored in a varbinary column in a SQL table, for text in the originating spoken content.

Making speech in audio and video files searchable is incredibly powerful, with uses for compliance and auditing (for example in financial services), proactive fraud detection, sentiment analysis for call centres and many more scenarios.

Indexing an asset involves several steps:

  1. Create an input asset in Media Services
  2. Submit a job with the indexer processor. This will produce an output asset with multiple files: closed captions, keywords XML and AIB.
  3. Download the output asset’s files
  4. Load the AIB file into a table in SQL Server (configured for full-text search with the add-on installed)

With the AIB file loaded, clients are ready to run full-text search queries against the AIB column as needed.

Optionally, the client (or a service wrapping the database, see the coming post on that) can use the AIB Retriever class, supplied with the SQL add-on, to further analyse the AIB of a given record found through full-text search. This identifies the specific locations in the asset where the queried text was found, along with the matching score for each location.

In this first post I’ll run through the first three steps: the basics of running the media indexer. A second post will walk through the configuration of a SQL Server environment to support the search, and a third will discuss the execution of the searches themselves.

1. Creating an input asset

The first step is to create the input asset the job should work on, as in other Media Services scenarios. An example can be found here.

The one thing to note is that the asset could be a media asset or a simple file containing one or more URLs to source content.
This second option is very interesting as it allows indexing media without having to upload it to the content store first, as long as it is accessible via a URL, either anonymously or using basic authentication (a configuration file can be used to provide the credentials, more on this shortly).

2. Submitting the indexer job

Submitting a job starts with finding the correct processor. For the media indexer the processor name is “Azure Media Indexer”, and the following method can return the latest version of it:

    private static IMediaProcessor GetLatestMediaProcessorByName(string mediaProcessorName)
    {
        // ToList() runs the query first so the version ordering happens in memory
        var processor = _context.MediaProcessors
                                .Where(p => p.Name == mediaProcessorName)
                                .ToList()
                                .OrderBy(p => new Version(p.Version))
                                .LastOrDefault();
        if (processor == null)
            throw new ArgumentException(string.Format("Unknown media processor: {0}", mediaProcessorName));
        return processor;
    }
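
The ordering by `new Version(p.Version)` in the method above is what makes "latest" mean the numerically highest version rather than the alphabetically last string. The following is a minimal, standalone sketch of that difference. It does not touch Media Services at all; the class name, method names and version numbers are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VersionOrderingDemo
{
    // Returns the highest version, comparing the strings as System.Version
    // values (numerically, component by component).
    public static string LatestByVersion(IEnumerable<string> versions)
    {
        return versions.OrderBy(v => new Version(v)).LastOrDefault();
    }

    // Returns the "highest" version using ordinal string comparison.
    // Shown only to illustrate the pitfall.
    public static string LatestByString(IEnumerable<string> versions)
    {
        return versions.OrderBy(v => v, StringComparer.Ordinal).LastOrDefault();
    }

    public static void Main()
    {
        // Made-up version numbers, not real processor versions.
        var versions = new[] { "1.1", "1.2", "1.10" };

        Console.WriteLine(LatestByVersion(versions)); // prints 1.10
        Console.WriteLine(LatestByString(versions));  // prints 1.2
    }
}
```

With plain string ordering, "1.10" sorts before "1.2", so an older processor could be picked once version numbers reach two digits.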

With a processor at hand, a job can be created:

IJob job = _context.Jobs.Create("Index asset " + assetName);

A task is then added to the job. The task name is the first parameter, followed by the indexer processor and the contents of an optional configuration file, and the input and output assets are linked to the task:

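ITask task = job.Tasks.AddNew("Indexing task", indexer, configuration, TaskOptions.None);
// inputAsset is the IAsset created in step 1 (the variable name is assumed here)
task.InputAssets.Add(inputAsset);
task.OutputAssets.AddNew(assetName + ".Indexed", AssetCreationOptions.None);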

The configuration file is discussed here. If it is empty then the default behaviour will be applied, which is to process the first file in the asset provided. The configuration file can, however, include a list of assets to index and/or the name of an asset which is a text file containing a URL list to process (a simple list of URLs delimited by CR-LF). It can also include the username and password to use to access the URLs if basic authentication is needed.
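
As a rough illustration of the URL list mentioned above, here is a minimal sketch that writes a CR-LF delimited list of URLs to a local text file, which could then be uploaded as the input asset. The URLs, the class name and the file name `urls.txt` are placeholders I have made up; check the indexer documentation for any naming requirements on the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class UrlListFile
{
    // Joins the URLs with CR-LF, the delimiter described in the text.
    public static string BuildContent(IEnumerable<string> urls)
    {
        return string.Join("\r\n", urls);
    }

    public static void Main()
    {
        // Placeholder URLs for illustration only.
        var urls = new[]
        {
            "http://example.com/media/interview1.mp4",
            "http://example.com/media/interview2.wav"
        };

        // Hypothetical file name and location.
        var path = Path.Combine(Path.GetTempPath(), "urls.txt");
        File.WriteAllText(path, BuildContent(urls));
        Console.WriteLine("Wrote " + path);
    }
}
```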

3. Downloading the output asset

Once the job has been submitted (by calling job.Submit()) and has completed, the output asset’s files can be downloaded:

IAsset indexedAsset = _context.Assets.Where(a => a.Name == asset).FirstOrDefault();
foreach (IAssetFile file in indexedAsset.AssetFiles)
    file.Download(Path.Combine(localFolder, file.Name));

This should result in four files downloaded for each input asset:

Two closed caption files in two different formats (ttml and smi), a keywords file (kw.xml) and, crucially, the AIB file.
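
A quick sanity check after the download is to confirm that all four outputs are present. The sketch below works purely on file names, so it can be run without Media Services. The class name, the example file names and the `.aib` extension for the AIB file are assumptions on my part.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexerOutputCheck
{
    // Suffixes of the four expected output files; ".aib" is an assumption.
    private static readonly string[] ExpectedSuffixes = { ".ttml", ".smi", ".kw.xml", ".aib" };

    // Returns the expected suffixes for which no file name was found.
    public static List<string> MissingOutputs(IEnumerable<string> fileNames)
    {
        var names = fileNames.ToList();
        return ExpectedSuffixes
            .Where(suffix => !names.Any(n => n.EndsWith(suffix, StringComparison.OrdinalIgnoreCase)))
            .ToList();
    }

    public static void Main()
    {
        // Made-up file names; in practice these would come from the local folder.
        var downloaded = new[] { "talk.ttml", "talk.smi", "talk.kw.xml" };

        foreach (var suffix in MissingOutputs(downloaded))
            Console.WriteLine("Missing output: " + suffix); // prints Missing output: .aib
    }
}
```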

The closed caption files are very useful if you want to have CC when playing back the media file (Windows Media Player, for example, will automatically look for and use an smi file with the same name as the original media file) or if you want to read the parsed text to assess the quality of the parsing.

The AIB file, however, is where the real power lies. It holds the parsed text in binary format, together with accuracy scoring and location information, and it is what drives the full-text search capability.

In the next post I’ll discuss how to prepare SQL Server to support loading and searching the AIB contents.

About Yossi Dahan
I work as a cloud solutions architect in the Azure team at Microsoft UK. I spend my days working with customers, helping them be successful in the cloud with Microsoft Azure.
