Bug 56206 - Multithreaded H264 video decoding ineffective on Windows
Status: NEW
Alias: None
Product: GStreamer SDK
Classification: Unclassified
Component: General
Version: 2013.6
Hardware: All All
Importance: medium normal
Assignee: bugs
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-19 21:35 UTC by Mario Kleiner
Modified: 2013-07-11 01:05 UTC



Attachments
Fixes error from git pre-commit hook style checker. (1.05 KB, patch)
2013-01-17 00:37 UTC, Mario Kleiner
Add new property "thread-types" for decoding method. (3.96 KB, patch)
2013-01-17 00:39 UTC, Mario Kleiner
Add new property "thread-types" for decoding method (v2) (4.32 KB, patch)
2013-01-17 20:00 UTC, Mario Kleiner

Description Mario Kleiner 2012-10-19 21:35:46 UTC
Hi,

when playing back video with GStreamer, my software sets the 'max-threads' option of any video codec that supports it to zero (=auto) or a value > 1. This works perfectly fine on Linux or OSX when tested with the ffdec_h264 video codec. Both the number of threads created and the performance of video decoding / CPU utilization scale with the number of threads. E.g., on an 8-core system, setting max-threads to 0 or 8 will use 8 threads for the 8 cores, put about 90% load on each core and provide a speedup of a factor of 5-6, compared to the single-threaded default setting of max-threads = 1.
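
As a rough illustration of the numbers above, here is a minimal C sketch. Both helpers are hypothetical and not part of GStreamer or libav: `resolve_thread_count` models the 'max-threads' convention (0 = auto, assumed here to mean one thread per core), and `parallel_efficiency` divides the observed speedup by the thread count.

```c
/* Hypothetical helper, not GStreamer code: models the 'max-threads'
 * convention described above. 0 is assumed to mean "auto", i.e. one
 * thread per core; any other value is taken literally. */
int resolve_thread_count(int max_threads, int n_cores)
{
    if (max_threads == 0)
        return n_cores;
    return max_threads;
}

/* Parallel efficiency: observed speedup divided by the number of
 * threads. 1.0 would mean perfect scaling. */
double parallel_efficiency(double speedup, int n_threads)
{
    return speedup / (double) n_threads;
}
```

With 8 threads and a speedup of 5-6, this gives an efficiency of roughly 0.63 to 0.75, which is the kind of scaling that is missing in the Windows case described next.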

When I try to execute exactly the same code on the same video file and hardware with the Windows SDK (all releases 2005.5, 2005.7, 2005.9, both 32-Bit and 64-Bit), the number of threads increases as on Linux or OSX, but CPU utilization stays low, apparently loading only 1 core, and performance stays exactly the same as in the single-threaded case.

So something on Windows makes the threading totally ineffective, as if all threads would serialize their execution on some shared/contended resource. I also tried the same with ffmpeg.exe and vlc, which both use the ffmpeg h264 decoder. ffmpeg.exe showed the same problem, whereas vlc seems to be able to utilize all cores.

It looks like some low-level detail of how the h264 decoder is set up or similar.

Any ideas?

thanks,
-mario
Comment 1 Andoni Morales Alastruey 2012-10-23 23:47:43 UTC
The decoder is set up in the same way on all platforms, using thread_type=FF_THREAD_SLICE and thread_count=max_threads, and this issue can be reproduced on all platforms too, including Linux.

The problem here is that we are not compiling libav with hardware acceleration support. We will try to fix that for the next release.
Comment 2 Andoni Morales Alastruey 2012-10-24 00:22:47 UTC
(In reply to comment #1)
> The problem here is that we are not compiling libav with hardware
> acceleration support. We will try to fix that for the next release.
Forget this comment about hardware acceleration
Comment 3 Andoni Morales Alastruey 2012-10-24 00:49:26 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > The problem here is that we are not compiling libav with hardware
> > acceleration support. We will try to fix that for the next release.
> Forget this comment about hardware acceleration

I believe that the system's gst-ffmpeg was not explicitly setting FF_THREAD_SLICE and was therefore using FF_THREAD_FRAME too; this was changed afterwards because it was not stable enough. Frame-based multithreaded decoding scales much better across threads than slice-based decoding, which only allows multiple threads to decode a single frame.

Anyway, I think we should enable hardware-accelerated decoders in gst-ffmpeg.
Comment 4 Mario Kleiner 2012-10-28 18:55:45 UTC
Oh, I'm a fool! You're right, they behave the same. I tricked myself into thinking it was a Windows-only problem.

This is what happened:

1. I didn't test the SDK on Linux, but used the gstreamer packages that come with Ubuntu 12.04 LTS and 12.10. Those show very good multi-threading performance at least with ffdec_h264. I saw large speedups when testing on dual-core, quad-core and 8-core machines, e.g., around 5-6 fold on the 8-core machine. This was both with standard HD material (1920 x 1080p) and with special footage we use (1920 x 2160 p "top-bottom" packed stereo movies).

2. I used the SDK on Windows, with no measurable speedup, as reported.

3. On OSX I was testing with both the SDK and with a GStreamer installation from the Homebrew package manager, compiled from source. I switched between both for comparison and mixed up the results! The Homebrew version shows good multi-threading speedup, the GStreamer SDK doesn't.

So the multi-threading definitely helps a lot if set up properly. Whatever settings current Ubuntu uses for their packages, they are very effective.

Just for context: Our OSS app is used for scientific experiments and has slightly exotic requirements, e.g., that video footage needs to be presented at 1920x2160p or higher resolution at >= 100 frames per second, with very precise control over presentation timing. We can't use GPU-based video decoding, as our material exceeds the specs of current GPUs' video decoders, e.g., on NVidia cards. Their performance would also be insufficient for us, and we need the video buffers in host memory in between, so CPU decoding with multi-threading is what we need. It works well on Ubuntu with their GStreamer packages, but we'd love to be able to do the same via the SDK on other platforms.

Thanks a lot,
-mario
Comment 5 Mario Kleiner 2012-11-30 15:00:37 UTC
Retested with GStreamer SDK 2012.11. Multi-threading is still completely ineffective, despite the expected number of threads being created. This was confirmed on OSX 10.7.5 and Windows 7 with the 2012.11 SDK and the ffdec_h264 decoder.

The GStreamer builds distributed by the OSX Homebrew project and with Linux distributions like Ubuntu 11.10, 12.04, 12.10 provide highly effective multi-threading under the same conditions.

thanks,
-mario
Comment 6 Andoni Morales Alastruey 2012-12-01 17:27:19 UTC
Frame-based multithreading was disabled upstream because it causes corruptions in some clips:
http://cgit.freedesktop.org/gstreamer/gst-ffmpeg/commit/?h=0.10&id=57c7f592689ea9110c4d6ec3b3090bbe1649eea3

This release now supports rebuilding a single project with the SDK:
http://docs.gstreamer.com/display/GstSDK/Building+from+source+using+Cerbero

You can now rebuild gst-ffmpeg with this commit reverted and check whether it works properly for you with your clips.
Comment 7 Mario Kleiner 2012-12-04 16:55:51 UTC
(In reply to comment #6)
> Frame-based multithreading was disabled upstream because it causes
> corruptions in some clips:
> http://cgit.freedesktop.org/gstreamer/gst-ffmpeg/commit/?h=0.10&id=57c7f592689ea9110c4d6ec3b3090bbe1649eea3
> 
> For this release we support now re-building a single project with the SDK:
> http://docs.gstreamer.com/display/GstSDK/Building+from+source+using+Cerbero
> 
> You can now rebuild gst-ffmpeg reverting this commit and check if it works
> properly for you with your clips.

Ok. I'll try. The Homebrew source-code version which shows good multi-threading performance on MacOSX doesn't have that commit applied. I do remember there was one clip that showed some corruption, although the majority worked flawlessly. That's why I have "max-threads" == 1 by default and allow opting into multi-threading depending on the application. What confuses me a bit is that before the commit you mentioned, there was a commit at the end of 2011 which makes "max-threads" default to 0 == auto, but the 2012.11 SDK reports a default of 1 == single-threaded when I run gst-inspect-0.10 ffdec_h264, so apparently that commit was somehow omitted from the SDK build? Is there some special branch or tag that marks exactly which commits go into a specific SDK release?

But I don't want to maintain my own forks/builds of GStreamer, that would be a nightmare, so I'll try to prepare an upstream patch asap that adds a new property for choosing between slice, frame, or frame + slice threading, and which defaults to slice threading only, like now. Does a property name of "thread-types" sound good? Is the SDK built from the latest head of the 0.10 branch?

Thanks,
-mario
Comment 8 Andoni Morales Alastruey 2012-12-04 18:03:30 UTC
A new property to select the threading model defaulting to slice-based threading sounds good to me.
For the SDK we use the following repositories:
http://cgit.freedesktop.org/gstreamer-sdk/

All the work in this branch is upstreamed, so there shouldn't be any difference between the 2 branches.
Comment 9 Sebastian Dröge (slomo) 2012-12-10 11:00:31 UTC
(In reply to comment #8)
> A new property to select the threading model defaulting to slice-based
> threading sounds good to me.
> For the SDK we use the following repositories:
> http://cgit.freedesktop.org/gstreamer-sdk/
> 
> All the work in the this branch is upstreamed so there shouldn't be any
> difference between the 2 branches.

There are, there are many more changes in the upstream 0.10 branch.


Anyway, a property for selecting the threading model would be a bit useful, yes. The only problem with that is that it can't be used in a useful way in decodebin/playbin scenarios.
Ideally the best threading model should be chosen automatically. Does anybody know the advantages and disadvantages of the different models?
Comment 10 Andoni Morales Alastruey 2012-12-10 11:28:04 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > A new property to select the threading model defaulting to slice-based
> > threading sounds good to me.
> > For the SDK we use the following repositories:
> > http://cgit.freedesktop.org/gstreamer-sdk/
> > 
> > All the work in the this branch is upstreamed so there shouldn't be any
> > difference between the 2 branches.
> 
> There are, there are many more changes in the upstream 0.10 branch.
> 
> 
> Anyway, a property for selecting the threading model would be a bit useful,
> yes. The only problem with that is that it can't be used in a useful way in
> decodebin/playbin scenarios.

But that's already an issue, since the max-threads property defaults to 1 and multi-threaded decoding is therefore disabled by default. If your application wants to use multithreaded decoding, it would just have to set two properties instead of one.

The threading model to use might vary depending on the use case. Frame-based multithreading usually adds a bigger latency (around one frame per thread) but scales much better over threads. Also, from what I have read on libav's mailing list, frame-based multithreading does not support resolution changes in the stream.
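
To make the latency tradeoff concrete, here is a minimal sketch. It assumes the "around one frame per thread" rule of thumb holds exactly, which real decoders may not do, and the function name is hypothetical.

```c
/* Rough model of the extra decode latency added by frame threading,
 * assuming exactly one frame of delay per decoding thread.
 * Illustrative only; not taken from libav or gst-ffmpeg. */
double frame_threading_latency_ms(int n_threads, double fps)
{
    double frame_duration_ms = 1000.0 / fps;
    return n_threads * frame_duration_ms;
}
```

Under this model, 8 threads at 100 frames per second would add about 80 ms, and 4 threads at 25 frames per second about 160 ms. Slice threading works within a single frame and so adds no such per-thread delay.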
Comment 11 Mario Kleiner 2013-01-17 00:37:45 UTC
Created attachment 73172 [details] [review]
Fixes error from git pre-commit hook style checker.
Comment 12 Mario Kleiner 2013-01-17 00:39:23 UTC
Created attachment 73174 [details] [review]
Add new property "thread-types" for decoding method.

Adds a new property to gst-ffmpeg plugin: "thread-types" allows selection of multi-threaded decoding method between: slice threading, frame threading or both. The property defaults to slice threading, which was the hard-coded setting in the past.
Comment 13 Mario Kleiner 2013-01-17 00:51:27 UTC
Ok, as the attached patches say, this should do the trick. Patches are against gst-ffmpeg from the sdk-0.10.13 branch of the gstreamer-sdk repository. Tested on an OSX 10.7 system. My users and I would be very happy if this could be merged for the next SDK release.

Will this get upstreamed by you, or would I have to send separate patches to upstream?

FWIW my quick testing with the patch for the ffdec_h264 codec with different H264 videos showed:

Works "well" for all files with the current default of FF_THREAD_SLICE. However, I couldn't find a file where I could get any speedup in multi-threaded playback vs. single-threaded. So it is a safe but ineffective choice.

Using FF_THREAD_FRAME or both frame + slice threading gave different results on different files. On all files I got a good speedup as soon as frame threading was in use, as expected. On some files, I saw playback artifacts for a few seconds after the start of playback, but then artifact-free playback for the remainder of the movie. On other files, e.g., HD movie trailers, I didn't see any artifacts.

So at least as far as my testing with H264 playback goes, frame threading is not a safe choice for all movies, but a very effective choice for the movies with which it works.

Thanks,
-mario
Comment 14 Sebastian Dröge (slomo) 2013-01-17 10:01:03 UTC
Comment on attachment 73174 [details] [review]
Add new property "thread-types" for decoding method.

Review of attachment 73174 [details] [review]:
-----------------------------------------------------------------

::: ext/ffmpeg/gstffmpegdec.c
@@ +302,5 @@
> +
> +  if (!ffmpegdec_thread_types_type) {
> +    static const GEnumValue ffmpegdec_thread_types[] = {
> +      {FF_THREAD_FRAME, "1", "Frame"},
> +      {FF_THREAD_SLICE, "2", "Slice"},

This should be
{FF_THREAD_SLICE, "Slice", "slice"}  (and equivalent for frame)

@@ +309,5 @@
> +    };
> +
> +    ffmpegdec_thread_types_type =
> +        g_enum_register_static ("GstFFMpegDecThreadTypes",
> +        ffmpegdec_thread_types);

Use a GFlags type here and drop the frame|slice value above
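
Why a flags type makes the combined value unnecessary can be sketched in plain C. The constants below are local stand-ins that mirror the values of libav's FF_THREAD_FRAME (1) and FF_THREAD_SLICE (2); the helper name is hypothetical and this is not the actual patch.

```c
/* Local stand-ins mirroring libav's FF_THREAD_FRAME (1) and
 * FF_THREAD_SLICE (2). Each value occupies its own bit. */
enum {
    THREAD_TYPE_FRAME = 1 << 0,
    THREAD_TYPE_SLICE = 1 << 1
};

/* Hypothetical helper: describes which threading types are set in a
 * bitmask. Because the two values are separate bits, OR-ing them
 * already expresses "frame + slice" without a third named value. */
const char *thread_types_describe(unsigned int flags)
{
    int frame = (flags & THREAD_TYPE_FRAME) != 0;
    int slice = (flags & THREAD_TYPE_SLICE) != 0;

    if (frame && slice)
        return "frame+slice";
    if (frame)
        return "frame";
    if (slice)
        return "slice";
    return "none";
}
```

An enum property has to list every allowed combination by name, whereas a flags property only needs one entry per bit.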
Comment 15 Sebastian Dröge (slomo) 2013-01-17 10:09:20 UTC
(In reply to comment #13)

> Will this get upstreamed by you, or would i have to send separate patches to
> upstream?

We will upstream it. If upstream does not accept it, it will not be part of the SDK either.
Comment 16 Mario Kleiner 2013-01-17 20:00:15 UTC
Created attachment 73197 [details] [review]
Add new property "thread-types" for decoding method (v2)

Changes in v2 as recommended by review of Sebastian Dröge:

* Use GFlags instead of GEnum for property.
* Fix name, nick-name of the flags.

Also:

* Update commit message with description of use case. Profiling showed that, e.g., ffdec_h264 does not support the default slice threading and falls back to single-threaded decode if FF_THREAD_FRAME is not enabled. The property is also useful because different performance vs. latency tradeoffs exist for pure playback clients vs. streaming clients vs. video-conferencing apps.
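
The fallback described in the last point can be modelled with a small sketch. This is a hypothetical model of the observed behaviour, not code from libav or gst-ffmpeg: the macro names, the function name, and the idea of a per-codec "supported" mask are all assumptions made for illustration.

```c
/* Local stand-ins with the same values as libav's FF_THREAD_FRAME (1)
 * and FF_THREAD_SLICE (2). */
#define TT_FRAME 1u
#define TT_SLICE 2u

/* Hypothetical model: if none of the requested threading types is
 * usable by the codec, decoding is effectively single-threaded, no
 * matter how many threads were requested via max-threads. */
int effective_parallelism(unsigned int requested,
                          unsigned int codec_supported,
                          int max_threads)
{
    if ((requested & codec_supported) == 0)
        return 1;
    return max_threads;
}
```

Under the assumption that the H264 decoder in this build only benefits from frame threading, requesting `TT_SLICE` alone yields 1, while `TT_FRAME` or `TT_FRAME | TT_SLICE` yields the full thread count, which matches the profiling result above.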
Comment 17 Mario Kleiner 2013-01-17 20:06:19 UTC
Btw, I think the artifacts I observed on some of my H264 videos pretty much match the description reported in bug #57923 - "h264parse of GStreamer SDK behaves differently from vanilla one."

For me, playing the same video in GStreamer SDK on OSX with frame threading shows the artifacts, but doing the same on Ubuntu Linux 12.04 with the system's default GStreamer packages didn't show any artifacts or other problems.

So maybe it has a similar cause, except that the video plays fine on the SDK with single threading but has a few seconds of trash at the beginning of a playback session in multi-threaded mode.
Comment 18 Olivier Crête 2013-03-25 21:32:23 UTC
1. For low latency applications (such as video calls), one must use the slice mode as we want more or less zero latency. It would be amazing to be able to communicate latency requirements throughout the pipeline somehow.

2. Is the corruption still happening in newer libav/ffmpeg versions? I noticed the SDK is using a pretty old snapshot.

3. To set the properties in a decodebin-like scenario, maybe we should put the FsElementAddedNotifier from Farstream in the core (to be able to set properties on arbitrary subelements).
Comment 19 Mario Kleiner 2013-07-03 17:00:54 UTC
Hi, the problem persists in the 2013.6 SDK. Is there some hope that my patch, which resolves the problem and was reviewed and cleaned up, gets moved forward?

thanks,
-mario
Comment 20 Mario Kleiner 2013-07-11 01:05:11 UTC
To answer my own question: This upstream commit by Sebastian Droege...

http://cgit.freedesktop.org/gstreamer/gst-ffmpeg/commit/?id=2d2c9b1aac6f2fa3a1a7c8a9ed46b76cefe228c8

.. obsoletes my patches and should solve my problems :)

Could you please merge that commit into the next SDK release?

Thanks,
-mario

