Fun with QtWebKit HTML5 Video

Mar 31 2013 Published by under Open Source, Qt

Solving the QtWebKit HTML5 Video DirectShow Problem

A while back I was given the task of fixing the problems that our development team was having with playing H.264 or WebM video on Windows in a QWebView widget using the HTML5 <video> tag. The application in question is a hybrid of a traditional desktop application and a web-based application, and there is a need to be able to use the HTML5 video capabilities of WebKit in one of the application’s components, in order to deliver some training content.

Had I known how big of a headache it was going to be, I might have just set my hair on fire and run screaming from the room at the beginning. But instead, being blissfully ignorant, I said, “Sure, I can take a look at that.” Oh, by the way, this was my very first day of work at Enthought (within the first hour or so, if I remember correctly) and also the first time I had dived deeply into the Qt and PySide code. It was also the first time I had worked with the DirectShow API and the first time in many years that I had worked much with COM. Yeah, I don’t think setting my hair on fire and screaming would have made a big enough statement.

Anyway, the purpose of this article is to discuss some of the problems I ran into and to show you how I solved or worked around them. On that first day I got a few high-level reports of the problems people were having. I heard things like:

  • It doesn’t work at all on some computers and sorta works on others.
  • When it works it crashes upon reloading the page.
  • It’s probably just a codec problem.
  • I don’t think it can ever work because of _____________.
  • It should work out of the box because of _____________.
  • Etc.

First things first

So the first thing I did, of course, was ask the smartest guy I know about it: Google. I uncovered many tales of woe from people experiencing the same problems, hopes that it would be fixed in Qt 5, and disappointments on discovering that it wasn’t. And here and there were a few little tidbits and clues about how to make things work.

Background

One of the components of the Qt toolkit is the Phonon library, which provides various classes related to streaming and playing media, and until recently QtWebKit used Phonon to embed media players in web pages. It was decided that the Phonon API was a higher level API than many multimedia applications would need, so the QtMultiMediaKit API was started as a lower-level replacement and Phonon was deprecated. The Windows QtWebKit code was ported to use QtMultiMediaKit instead of Phonon and with the Qt 4.8 release it no longer uses the Phonon back-end.

However, at about the same time the transition of Qt from Nokia to Digia happened, and the development of QtMultiMediaKit (as part of the qt-mobility libraries and plugins) was paused in a not quite completed state, so it hasn’t been fully incorporated into the Qt distribution yet. This means that out of the box QtWebKit is not able to play HTML5 media on Windows, because the code for the multimedia plugin it expects to use is not included. QtWebKit’s HTML5 media features on Windows are basically caught in a gap between past and future technologies. I believe that QtWebKit is still using Phonon on the other platforms.

How to build Qt + qt-mobility

So the first thing that needed to be done to solve this problem was to figure out how to build Qt plus the qt-mobility libraries, such that QtWebKit is able to use the multimedia plugins for displaying video. I quickly found out that this is a classic chicken-and-egg problem: qt-mobility needs an existing Qt build so that it can use the classes Qt provides, but the Qt build also needs to have qt-mobility present so that it knows to include the code that will use the multimedia plugins. So to make this work we will need to use two chickens to get an egg. In other words, we’ll need to build Qt twice.

But first, here are some prerequisites:

  • Since my end goal is to build PySide for Python 2.7, I used MS Visual Studio 2008 as the compiler.
  • A fairly recent Windows SDK is needed, since the one with VS 2008 doesn’t have new enough DirectShow support. So I used version 7.1 of the Windows SDK. I initially went down the rat-hole of trying to install the DirectX SDK to use with the older platform SDK included with VS 2008, but that just caused more trouble and I never got that build fully working. Just use the 7.1 SDK instead and you can avoid wasting a few days like I did.
  • Get and install the OpenSSL library
  • I recommend using JOM instead of nmake for the build, as it is able to parallelize the build steps to take advantage of multiple processor cores if you have them. It is fully nmake compatible, and greatly reduces the time needed to build Qt. (From several hours to around 40 minutes for one of my computers.) I copied it to “nmake.exe” and put it in a location on the PATH that is found before the Visual Studio nmake.
  • Some parts of the configuration process need a working Perl interpreter. I had troubles getting it to work with the cygwin perl that I already had installed, so I installed Strawberry Perl instead. Make sure it is found first on the PATH.
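Since two of the prerequisites above depend on which executable is found first on the PATH (the JOM copy named “nmake.exe” and Strawberry Perl), it can be worth checking that before starting a long build. Here is a minimal sketch of such a check. The helper name `first_on_path` and the list of extensions it tries are my own choices for illustration, not part of any Qt tooling.

```python
import os


def first_on_path(program, path_value, extensions=(".exe", ".bat", ".cmd", "")):
    """Return the first file called `program` (plus one of `extensions`)
    found while walking the PATH-style string left to right, or None."""
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        for ext in extensions:
            candidate = os.path.join(directory, program + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    # Show which nmake and perl the build would pick up.
    for tool in ("nmake", "perl"):
        print(tool, "->", first_on_path(tool, os.environ.get("PATH", "")))
```

If the path printed for nmake is the Visual Studio one rather than the JOM copy, or perl resolves to the cygwin one, the PATH order needs adjusting.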

The next step is configuring and building Qt. To save time you can tell it to skip building the WebKit components for this first build, and then turn it back on for the second build after qt-mobility has been built.

Follow the regular Qt build instructions for setting up the environment and such. For example I set QTDIR to the root of the Qt source tree, added $QTDIR/bin to the PATH, and set QMAKESPEC to win32-msvc2008. I used the stock qt-everywhere-opensource-src-4.8.4 tarball.
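If you drive the build from a script instead of a shell, the same three settings can be expressed as an environment dictionary. This is only a sketch: the function name `qt_build_env` is made up for illustration, and putting the Qt bin directory at the front of the PATH is my assumption (the important part is just that it is on the PATH).

```python
import os


def qt_build_env(qt_source_dir, base_env=None, qmakespec="win32-msvc2008"):
    """Return a copy of the environment with QTDIR, QMAKESPEC and PATH
    set up as described above. The input mapping is not modified."""
    env = dict(os.environ if base_env is None else base_env)
    env["QTDIR"] = qt_source_dir
    env["QMAKESPEC"] = qmakespec
    # Assumption: prepend <QTDIR>/bin so the freshly built tools win.
    qt_bin = os.path.join(qt_source_dir, "bin")
    env["PATH"] = qt_bin + os.pathsep + env.get("PATH", "")
    return env
```

The resulting dictionary can be passed as the `env` argument of `subprocess.check_call` when running configure and nmake.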

Next go to your Qt source tree and run configure, followed by running nmake. I’m using a cygwin bash shell, so if you are using a stock Windows cmd.exe shell or something else then you may have to adapt a few things. Here is how I run configure:

./configure.exe -release -opensource -platform $QMAKESPEC \
       -qt-zlib -qt-libpng -qt-libmng -qt-libtiff -qt-libjpeg \
       -openssl -I ${OPENSSL_BASE}include -L ${OPENSSL_BASE}lib \
       -nomake demos -nomake examples \
       -no-webkit
nmake

The next step is to build qt-mobility. You can fetch the code from the project’s git repository at http://qt.gitorious.org/qt-mobility. I run configure in the qt-mobility source tree like this:

cmd.exe /c configure.bat -prefix $QTDIR -no-wmf -release \
       -modules "sensors multimedia"
nmake
nmake install

Let’s break that down a little. I used the $QTDIR value as the prefix so when the qt-mobility libraries are installed they will be in the same place as the Qt libraries, which means that it’s easier for the Qt code and other applications as they do not have to do anything extra to find them. On the other hand, it clutters up the Qt tree a little and when doing a “nmake clean” there I have to clean up qt-mobility stuff and a few other things by hand.

I used the -no-wmf flag (No Windows Media Framework) because we need our application to work on XP and WMF is not available there. Plus, although WMF is supposed to be “the future” it isn’t all there yet and DirectShow is still more capable.

The -modules flag tells configure to set up the build for only the sensors and multimedia components of qt-mobility. Those are the only modules needed for the QtMultiMediaKit library and plugins that we want.

Finally, we run “nmake” followed by “nmake install”.

The 2nd chicken

The final step is to reconfigure and rebuild Qt. Just run the same configure command as before, substituting “-webkit” in place of “-no-webkit” and then run nmake again.
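To summarize the whole two-chicken sequence in one place, here is a rough sketch that builds the list of commands for all three stages. The flags are copied from the configure commands shown earlier. The function names and the default `openssl_base` path are placeholders of mine, so substitute your own locations. The sketch only builds the plan and does not run anything.

```python
def qt_configure_args(webkit, qmakespec="win32-msvc2008",
                      openssl_base="C:/OpenSSL/"):
    """Qt configure command line; only the last flag differs between passes.
    openssl_base is a hypothetical path and must end with a slash."""
    args = [
        "./configure.exe", "-release", "-opensource", "-platform", qmakespec,
        "-qt-zlib", "-qt-libpng", "-qt-libmng", "-qt-libtiff", "-qt-libjpeg",
        "-openssl", "-I", openssl_base + "include", "-L", openssl_base + "lib",
        "-nomake", "demos", "-nomake", "examples",
    ]
    args.append("-webkit" if webkit else "-no-webkit")
    return args


def mobility_configure_args(qtdir):
    """qt-mobility configure command line, installing into the Qt tree."""
    return ["cmd.exe", "/c", "configure.bat", "-prefix", qtdir, "-no-wmf",
            "-release", "-modules", "sensors multimedia"]


def build_plan(qt_src, mobility_src):
    """Return (working directory, command) pairs in the order to run them."""
    return [
        (qt_src, qt_configure_args(webkit=False)),   # first chicken
        (qt_src, ["nmake"]),
        (mobility_src, mobility_configure_args(qt_src)),
        (mobility_src, ["nmake"]),
        (mobility_src, ["nmake", "install"]),
        (qt_src, qt_configure_args(webkit=True)),    # second chicken
        (qt_src, ["nmake"]),
    ]


if __name__ == "__main__":
    for cwd, command in build_plan("qt-src", "qt-mobility-src"):
        print(cwd, ":", " ".join(command))
```

Each pair could be handed to `subprocess.check_call(command, cwd=cwd)` once the paths are real, but printing the plan first is a cheap way to double-check the flags.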

Bugs

But wait, there’s more!

At this point we were able to play WebM and H.264 video in a QWebView widget using the HTML5 video tag, but it was still crashing hard when reloading the page, or when navigating a page or two away from the page with the video. Not good.

After much debugging, experimenting, rebuilding and cursing I found the problem. I won’t go into a complete explanation of DirectShow here; to be informative enough it would take a huge amount of text, and this article is already too long. The nutshell version of the pertinent bits is that DirectShow constructs a “graph” of “filter components” that take the media stream as input, split it into audio and video streams if necessary, and run it through various transformation components until it is in whatever format the output devices require. Depending on the DirectShow components that are installed, the format of the source, and the needs of the output, very different graphs can be constructed. Here are a couple of simple ones that I was working with:

  

DirectShow components, like most of COM, use a reference counting pattern to manage the life-cycle of the components. Every time some other component wants to hold on to a reference to an object it increments that object’s reference count, and when it is done it releases its reference and the count is decremented. When the reference count reaches zero the object deletes itself.

I found that one of the DirectShow components in QtMultiMediaKit was not following that pattern, and it was being explicitly deleted from the class that created it. Via some debugging code I found that the reference count of this component, the VideoSurfaceFilter class shown as “VideoOutput” in the graphs above, was still around 3 or 4 when that deletion happened. That meant that although that class instance was gone, there were still 3 or 4 other components that thought it still existed. When QtWebKit cleaned up the resources for the page, either on reload or after the next page was loaded, the filter graph was released, one of those other components tried to access the now invalid VideoSurfaceFilter, and the application crashed.
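To make the failure mode a bit more concrete, here is a toy model of it in Python. This is not the qt-mobility code and it is not real COM. The class `RefCounted`, its methods and the `simulate` function are invented for illustration, and the sketch assumes the creator holds the first reference. It just shows why deleting an object whose count is still above zero blows up later, when the other holders release it.

```python
class RefCounted:
    """Toy stand-in for a COM-style reference counted object."""

    def __init__(self, name):
        self.name = name
        self.ref_count = 1   # assumption: the creator holds the first reference
        self.alive = True

    def _check(self):
        if not self.alive:
            raise RuntimeError("use after free: " + self.name)

    def add_ref(self):
        self._check()
        self.ref_count += 1
        return self.ref_count

    def release(self):
        self._check()
        self.ref_count -= 1
        if self.ref_count == 0:
            self.alive = False   # the object "deletes itself"
        return self.ref_count

    def destroy_explicitly(self):
        """The buggy pattern: the owner deletes it whatever the count is."""
        self.alive = False


def simulate(buggy):
    video_output = RefCounted("VideoOutput")
    for _ in range(3):
        video_output.add_ref()              # three graph components hold it
    if buggy:
        video_output.destroy_explicitly()   # count is still 4 here
    else:
        video_output.release()              # owner drops only its own reference
    try:
        for _ in range(3):
            video_output.release()          # the graph is torn down later
    except RuntimeError as exc:
        return "crash: %s" % exc
    return "clean shutdown"


if __name__ == "__main__":
    print(simulate(buggy=True))
    print(simulate(buggy=False))
```

In the buggy case the first release during teardown hits the dead object, which is the toy equivalent of the crash on page reload. In the fixed case the owner just releases its own reference, and the object goes away when the last holder lets go.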

My qt-mobility changes fixing this have been submitted to the Qt bug tracker and you can see it here: https://bugreports.qt-project.org/browse/QTMOBILITY-2091

For the record, here is a simple little application I used for testing:

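import sys
from PySide import QtCore, QtGui, QtWebKit

# Default test page containing an HTML5 <video> element.
TESTURL = "http://camendesign.com/code/video_for_everybody/test.html"

app = QtGui.QApplication(sys.argv)
# Enable plugins globally before the view is created.
QtWebKit.QWebSettings.globalSettings().setAttribute(
         QtWebKit.QWebSettings.PluginsEnabled, True)
view = QtWebKit.QWebView()
# Load the URL given on the command line, or fall back to the test page.
url = sys.argv[1] if len(sys.argv) > 1 else TESTURL
view.load(QtCore.QUrl(url))
view.show()
sys.exit(app.exec_())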
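If you would rather test against your own media than a remote page, a small helper can generate a page with a <video> tag offering both a WebM and an H.264 source. This is just a sketch: the function name `video_page` and the clip file names in the usage note are placeholders, not files that ship with anything.

```python
from html import escape


def video_page(sources, width=640):
    """Build a minimal HTML5 page with one <video> element.
    `sources` is a list of (url, mime_type) pairs, tried in order."""
    source_tags = "\n".join(
        '    <source src="%s" type="%s">'
        % (escape(url, quote=True), escape(mime, quote=True))
        for url, mime in sources
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<body>\n"
        '  <video controls width="%d">\n'
        "%s\n"
        "    Your browser cannot play this video.\n"
        "  </video>\n"
        "</body>\n</html>\n"
    ) % (width, source_tags)


if __name__ == "__main__":
    print(video_page([("clip.webm", "video/webm"),
                      ("clip.mp4", "video/mp4")]))
```

The generated string can be saved to a file and passed to the test application as a URL, or handed to the view directly with `view.setHtml(...)` along with a base URL for resolving the clip paths.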

Working Codecs

But wait! There is still more!

As alluded to above, DirectShow is not an all-in-one solution. It is a collection of components that are plugged in to a filter graph, where each can provide just one part of the transformation of the media’s ones and zeros into audio sound waves and dancing pixels on the screen. And as with any collection, more components from various 3rd-party sources can be added to the collection. These other components can enhance existing capabilities, or even add new capabilities to the system.

For example, out of the box Windows is not able to decode and render WebM media. It is able to decode and render H.264 audio/video streams, but it doesn’t know how to split those streams out from the typical container formats used today, such as .MP4 files. Installing and registering some 3rd-party DirectShow filters can fill functionality gaps such as these.
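Here is a toy model of that gap-filling idea. It treats each filter as something that accepts one format and produces another, and searches for a chain from the source format to what the renderer needs. To be clear, this is not how DirectShow actually builds its graphs (the real thing negotiates media types between pins and is considerably more involved), it ignores audio entirely, and all of the filter and format names are made up for the example.

```python
from collections import deque


def find_chain(filters, source_format, target_format):
    """Breadth-first search for the shortest chain of filters that turns
    source_format into target_format. `filters` maps a filter name to an
    (accepts, produces) pair. Returns a list of names, or None."""
    queue = deque([(source_format, [])])
    seen = {source_format}
    while queue:
        fmt, chain = queue.popleft()
        if fmt == target_format:
            return chain
        for name, (accepts, produces) in filters.items():
            if accepts == fmt and produces not in seen:
                seen.add(produces)
                queue.append((produces, chain + [name]))
    return None


if __name__ == "__main__":
    # Hypothetical "out of the box" filters: a decoder but no splitter.
    installed = {
        "H264 decoder": ("h264", "yuv"),
        "Color converter": ("yuv", "rgb32"),
    }
    print(find_chain(installed, "mp4", "rgb32"))

    # Registering a 3rd-party splitter fills the gap.
    installed["MP4 splitter"] = ("mp4", "h264")
    print(find_chain(installed, "mp4", "rgb32"))
```

With only the decoder and converter registered there is no way to get from the container to the renderer, so the search comes back empty. Once a splitter is registered a complete chain exists.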

For our application we want to be able to include a set of DirectShow filters with our installer, so we can be sure that our customers have at least basic functionality on their systems and that our application can work out of the box. In order to do that we needed something with a permissive license, and the OpenCodecs package from Xiph.org fit the bill. They provide filters that can handle WebM video streams and use a BSD-style license so we can distribute it without risk of GPL infection. It still has some issues though.

The filter pack that I had the best results with was the LAV filters from https://code.google.com/p/lavfilters/. It is able to transform the video streams directly to the input format required by qt-mobility’s VideoSurfaceFilter, so fewer transformation steps are required. Compare the two graphs above to see the difference. As nearly any engineer will tell you, fewer transformations of anything is almost always better than more transformations. However, it is licensed under the GPL and we felt that it is still too much of a grey area for us to distribute it along with a non-GPL’d application. Since we don’t link to it directly and it is only accessed via operating system services, using it at runtime is fine. Distributing it as part of our installer, such that it looks like it is part of the whole thing, would probably ring a bell somewhere and start some lawyers salivating. But we will be suggesting that our users install it themselves if they experience problems.

Conclusion

Yes, there is still even more, including some other changes and enhancements that I’ve been making to other related projects along the way. But I won’t go into details here. If they become significant enough I’ll probably write another blog post or two.

To conclude this article I’d just like to mention that despite the problems I’ve been dealing with I have been very impressed with the Qt source code and its capabilities. I can tell that a lot of thought went into the design and implementation, and I look forward to being able to contribute more to it and also to PySide.
