This is on an Intel + Radeon laptop, so I need to run encoding with GStreamer with DRI_PRIME=1.

Here is an example video: http://www.sample-videos.com/video/mp4/720/big_buck_bunny_720p_1mb.mp4

DRI_PRIME is doing a good job of waking up the GPU from runpm when needed for encoding via VAAPI and OMX, but for comparison I'll run glxgears both times.

I'm encoding the mentioned example video with VAAPI with this exact command:

$ time DRI_PRIME=1 LIBVA_DRIVER_NAME=radeonsi gst-launch-1.0 -e filesrc location=big_buck_bunny_720p_1mb.mp4 ! qtdemux ! h264parse ! avdec_h264 ! queue ! videoconvert ! queue ! video/x-raw,format=NV12 ! vaapih264enc ! h264parse ! matroskamux ! filesink location=output.mkv

For low GPU stress I run the gst pipeline while glxgears with vsync is running:

$ DRI_PRIME=1 glxgears

Result: 0.75s user 0.33s system 2% cpu 52.779 total

For higher GPU stress I run the gst pipeline while glxgears without vsync is running:

$ DRI_PRIME=1 vblank_mode=0 glxgears

Result: 0.99s user 0.28s system 43% cpu 2.928 total

I also tried a very similar pipeline with OMX:

$ time DRI_PRIME=1 gst-launch-1.0 -e filesrc location=big_buck_bunny_720p_1mb.mp4 ! qtdemux ! h264parse ! avdec_h264 ! queue ! videoconvert ! queue ! video/x-raw,format=NV12 ! omxh264enc ! h264parse ! matroskamux ! filesink location=output.mkv

Low GPU stress: 0.96s user 0.24s system 19% cpu 6.298 total
High GPU stress: 1.10s user 0.24s system 141% cpu 0.949 total

Overall OMX encoding does a lot better, but the gap is still large, and it's still below "real time" for the 5-second video.
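For reference, the wall-clock times above translate into encode speed factors relative to the clip length (arithmetic only; the ~5 s duration is the reported length of the sample video):

```shell
# Speed factor = clip duration / wall-clock encode time (clip assumed ~5 s).
for t in 52.779 2.928 6.298 0.949; do
    awk -v d=5 -v t="$t" 'BEGIN { printf "wall %ss -> %.2fx real time\n", t, d/t }'
done
```

So VAAPI under low GPU stress runs at roughly 0.09x real time, while OMX under high GPU stress exceeds 5x real time.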
Yeah, that is a known issue. The current VA-API implementation waits for the result after sending a single frame to the hardware. The OpenMAX implementation pipelines the whole thing and waits for a result after sending multiple frames to the hardware to chew on. So with OpenMAX the hardware is always busy, while with VA-API it constantly turns on/off.
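A toy model of why that matters (illustrative numbers only, not actual driver code): if each frame costs SUBMIT ms of CPU work and ENCODE ms of hardware time, waiting for every frame serializes both costs, while keeping the hardware fed hides the encode time behind the next submission:

```shell
# Toy model (assumed numbers): 150 frames, 1 ms to submit, 5 ms to encode.
FRAMES=150 SUBMIT=1 ENCODE=5
# Per-frame wait (current VA-API path): submit, then block until the result.
sync_ms=$(( FRAMES * (SUBMIT + ENCODE) ))
# Pipelined (OpenMAX path): hardware stays busy; only the last encode is exposed.
pipe_ms=$(( FRAMES * SUBMIT + ENCODE ))
echo "sync: ${sync_ms} ms, pipelined: ${pipe_ms} ms"
```

The per-frame version also gives power management a chance to clock the hardware down between every frame, which compounds the slowdown.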
So maybe there is also some DPM-type issue on your system. Rather than running gears, maybe you can force the GPU clocks to high somewhere. My setup is very different, but I would do:

echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
I put the issue in DRM/radeon instead of Mesa/radeonsi because I thought it would be related to power management. I tried

echo high > /sys/class/drm/card1/device/power_dpm_force_performance_level
echo performance > /sys/class/drm/card1/device/power_dpm_state

but it makes no difference; it's still just as slow.
Good point, but no the problem is clearly in the VA-API state tracker.
Well, his OMX test is 6x slower as well without load (though the test video is very short). So I think that in addition to the VA-API issue he is seeing some PRIME + HD 7970M DPM problem. Though maybe forcing the CPUs to high and re-testing would help rule out cpufreq messing things up.
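To rule out cpufreq, something like the following prints the writes one would run as root to pin every core's governor to "performance" (a sketch assuming the standard Linux cpufreq sysfs layout and a 4-core machine; adjust the CPU list to yours):

```shell
# Print (rather than execute) the sysfs writes to pin cpufreq governors.
# Run the printed commands as root, re-run the encode test, then restore
# the previous governor (often "ondemand" or "schedutil").
for cpu in 0 1 2 3; do
    echo "echo performance > /sys/devices/system/cpu/cpu${cpu}/cpufreq/scaling_governor"
done
```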
I should open my eyes while reading. Indeed, that is way too much to be explained by the VA-API problems.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1235.