Updated: 03/25/09 11:22:03 AM (GMT+1)

Media Decoder Framework

This page contains my notes on the media decoder framework in Firefox/Fennec. It is written as my own notes while I work on updating the GStreamer decoder integration that was made some time ago.

What is this about?

This is about embedded video and audio on web pages. One way of doing this is to have a media element on the page, more specifically a video element or an audio element.

More specifically it's about playing the embedded media to the user of the browser.

What is a codec?

When you store a video clip, you need to choose a format to store it in. The formats we think about in our daily life might be DVD or Blu-ray, but the data on these discs is actually encoded by a codec.

Encoding is the process of transforming the pixel and audio samples that describe light and air pressure into a data stream.

For most codecs some sort of compression takes place in this process, to limit the amount of data the stream has to carry. Describing how this is done is out of scope for this page.

The stream coming from the codec is sometimes stored, like on a DVD or a hard drive, and sometimes it's not stored, only transferred, like with TV or radio broadcasts.

How does it work in a browser?

If the creator of a website wants a page to include a video clip that can be seen by the reader of the page, they can include some special tags that tell the browser which video to play, the size of the video on the screen, etc.
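For example, the markup might look something like this (the file name and dimensions here are made up for illustration):

```html
<video src="clip.ogg" width="320" height="240" controls autoplay>
  Fallback text shown by browsers that don't support the video element.
</video>
```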

When the browser parses these tags and starts to stream the video, it checks whether it has a codec installed that can handle that specific type of video.

If a matching codec is found, it is used to convert the stream of data sent from the web server to the browser from the encoded format back into images and audio samples.

What happens in Mozilla code?

(NOTE: The following is my interpretation of what is happening in the code, if it gets “approved” by someone responsible for the code, this comment will be removed)

nsVideoDocument::StartDocumentLoad(..., in nsIChannel, ...)

For one reason or another a nsVideoDocument::StartDocumentLoad call is made. The StartDocumentLoad function is defined in the nsDocument base class, and is one of those function calls where the parameters are not properly described :(

What's important for us is that it comes with nsIChannel and nsIStreamListener parameters. The nsIChannel is what the stream that we will be getting data from is flowing through. The nsIStreamListener is an “out” parameter, and I assume that on return from the function it will point to the listener used to receive the transferred data.

In the nsVideoDocument::StartDocumentLoad function we find the next interesting thing: a call to nsVideoDocument::CreateSyntheticVideoDocument. Here we find the channel and a stream listener as parameters; the channel is the same one that was given as an argument to the function, while the listener's origin is somehow related to the nsVideoDocument, but out of scope here.

nsVideoDocument::CreateSyntheticVideoDocument(in nsIChannel, out nsIStreamListener)

Here a “synthetic” document is created: a video element is created and some properties are set (autoplay and controls are set to true). The element is then asked to load its content by calling nsHTMLMediaElement::LoadWithChannel; this call gets the nsIChannel and nsIStreamListener parameters forwarded. The element is then added to the body content of the video document.

nsHTMLMediaElement::LoadWithChannel(in nsIChannel, out nsIStreamListener)

Here the code aborts any existing loads and initializes the decoder by calling nsHTMLMediaElement::InitializeDecoderForChannel, again forwarding the channel and stream listener. Finally it dispatches a “loadstart” event to notify whoever is concerned that the loading has started.

nsHTMLMediaElement::InitializeDecoderForChannel(in nsIChannel, out nsIStreamListener)

First, and maybe most importantly, the MIME type of the content is retrieved, and from this the decoder is created by a call to nsHTMLMediaElement::CreateDecoder, which implicitly stores the decoder in the mDecoder member. This decoder is then asked to load the content, given the channel and listener as parameters, by calling the pure virtual function nsMediaDecoder::Load on it. The concrete type of the decoder is determined by the call to CreateDecoder, but its base class needs to be nsMediaDecoder.

If the content shouldn't be paused, the Play function is then called on the decoder.

For a description of the GStreamer nsMediaDecoder, follow this link

nsHTMLMediaElement::CreateDecoder(in nsACString)

The in parameter here is the MIME type from the stream in the channel.

The decoder is determined by asking each of the decoders whether it can handle the given MIME type. (We might want to add some logic here so the user can choose which decoder to use if multiple decoders can handle the MIME type.) For GStreamer, the function that determines this is IsGStreamerType; it is local to the source file with the implementation of the nsHTMLMediaElement functions, but it isn't a member of the class itself.

If no suitable decoder is found the function returns false.
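The “ask each decoder in turn” dispatch can be sketched as a first-match walk over a list of type-check functions. This is my simplified model, not the actual implementation; the Ogg and Wave checks stand in for the per-decoder type predicates like IsGStreamerType:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// One entry per available decoder: a predicate that claims MIME types,
// and a name standing in for the decoder class it would create.
struct DecoderEntry {
    std::function<bool(const std::string&)> canHandle;
    std::string name;
};

static bool IsOggType(const std::string& aType) {
    return aType == "video/ogg" || aType == "audio/ogg" ||
           aType == "application/ogg";
}

static bool IsWaveType(const std::string& aType) {
    return aType == "audio/x-wav" || aType == "audio/wav";
}

// Models the shape of nsHTMLMediaElement::CreateDecoder: return the
// first decoder whose predicate claims the type, or "" if none does
// (the real function returns false in that case).
std::string CreateDecoder(const std::string& aType) {
    static const std::vector<DecoderEntry> entries = {
        { IsOggType,  "OggDecoder"  },
        { IsWaveType, "WaveDecoder" },
    };
    for (const auto& entry : entries)
        if (entry.canHandle(aType))
            return entry.name;
    return "";  // no suitable decoder found
}
```

Note that a first-match walk gives the decoders a fixed priority order; user-selectable preference (as speculated above) would mean reordering or filtering this list.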

IsGStreamerType(in nsACString)

This function needs to identify the different mime types that the installed GStreamer decoders can handle – TODO: add the code that can do this
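Until that code is written, one simple placeholder is a static whitelist of container types the installed GStreamer plugins are expected to handle. This is my sketch under that assumption — the entries are examples, not an exhaustive or authoritative list — and the fuller solution would query the GStreamer registry for the installed demuxer/decoder element factories instead of hard-coding anything:

```cpp
#include <cassert>
#include <string>

// Sketch only: a static whitelist of MIME types. The real function
// should ask the GStreamer registry what is actually installed rather
// than hard-coding the list; these entries are illustrative examples.
bool IsGStreamerType(const std::string& aType) {
    static const char* const kTypes[] = {
        "video/mp4",
        "video/quicktime",
        "video/x-msvideo",
        "audio/mpeg",
    };
    for (const char* type : kTypes)
        if (aType == type)
            return true;
    return false;
}
```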