Bug 71448 - [UVD] qvdpautest is very slow on radeonsi (HD 7950)
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2013-11-09 23:17 UTC by darkbasic
Modified: 2014-02-01 18:07 UTC

Attachments
dmesg (184.16 KB, text/plain)
2013-11-11 11:38 UTC, darkbasic
Fix. (968 bytes, text/plain)
2014-01-30 17:57 UTC, Christian König
dmesg output (70.53 KB, text/plain)
2014-01-30 19:21 UTC, Vladimir Usikov
Xorg log (46.23 KB, text/plain)
2014-01-30 19:22 UTC, Vladimir Usikov

Description darkbasic 2013-11-09 23:17:56 UTC
http://bpaste.net/show/148239/

Kernel is 3.13 (~agd5f drm-next-3.13).
The whole graphics stack is from git, except xorg-server, which is 1.14.3.
Comment 1 Vladimir Usikov 2013-11-10 09:18:27 UTC
ArchLinux x32; kernel 3.12; llvm - svn; mesa - git; xorg-server 1.14.4;
Radeon HD 7950

qvdpautest 0.5.2
AMD Phenom(tm) 9550 Quad-Core Processor
Unknown GPU

VDPAU API version : 1
VDPAU implementation : G3DVL VDPAU Driver Shared Library version 1.0


MPEG DECODING (1920x1080): 19 frames/s
MPEG DECODING (1280x720): 19 frames/s
H264 DECODING (1920x1080): 15 frames/s
H264 DECODING (1280x720): 16 frames/s
MPEG4 DECODING (1920x1080): 15 frames/s

MIXER WEAVE (1920x1080): 3293 frames/s
MIXER BOB (1920x1080): 3878 fields/s
MIXER TEMPORAL (1920x1080): 3884 fields/s
MIXER TEMPORAL + IVTC (1920x1080): 3881 fields/s
MIXER TEMPORAL + SKIP_CHROMA (1920x1080): 3895 fields/s
MIXER TEMPORAL_SPATIAL (1920x1080): 3881 fields/s
MIXER TEMPORAL_SPATIAL + IVTC (1920x1080): 3885 fields/s
MIXER TEMPORAL_SPATIAL + SKIP_CHROMA (1920x1080): 3888 fields/s
MIXER TEMPORAL_SPATIAL (720x576 video to 1920x1080 display): 3439 fields/s

MULTITHREADED MPEG DECODING (1920x1080): 76 frames/s
MULTITHREADED MIXER TEMPORAL (1920x1080): 3930 fields/s
Comment 2 Alex Deucher 2013-11-10 13:36:32 UTC
Make sure dpm is enabled. Add radeon.dpm=1 to the kernel command line in GRUB.
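
To confirm that the option actually reached the running kernel, the boot command line can be inspected. The following is a minimal sketch and not part of the original report: the helper name `radeon_dpm_setting` is hypothetical, and keeping the last occurrence of a repeated option is an assumption. It says nothing about the driver's built-in default when the option is absent.

```python
from typing import Optional


def radeon_dpm_setting(cmdline: str) -> Optional[str]:
    """Return the value of radeon.dpm found in a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("radeon.dpm="):
            # Assumption: if the option is repeated, the last value is kept.
            value = token.split("=", 1)[1]
    return value


# Example with a made-up command line; on a live system the string would
# come from reading /proc/cmdline.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 radeon.dpm=1 quiet"
print(radeon_dpm_setting(example))
```

A return value of `None` only means the option was not passed explicitly.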
Comment 3 Mike Lothian 2013-11-10 13:38:47 UTC
I'm not sure if it's related but make sure your xserver is patched to work with the latest mesa fd1b24a93e ("glx: Add support for the new DRI")
Comment 4 Alex Deucher 2013-11-10 14:01:40 UTC
(In reply to comment #0)
> http://bpaste.net/show/148239/

In the future, please attach the output rather than referring to an external site that may go away at some point.

> 
> kernel is 3.13 (~agd5f drm-next-3.13).
> The whole graphic stack is from git except xorg-server which is 1.14.3.

From your log:
FATAL: get_bits failed : No backend implementation could be loaded.!!    

There's some problem with your build.
Comment 5 Christian König 2013-11-11 08:12:34 UTC
(In reply to comment #4)
> FATAL: get_bits failed : No backend implementation could be loaded.!!    
> 
> There's some problem with your build.

That message is normal, just a function we haven't implemented yet.

But I agree the numbers look like you are on the bootup clocks for UVD/graphics or something is going wrong with dpm.
Comment 6 darkbasic 2013-11-11 10:13:25 UTC
dpm is enabled, of course (I set radeon.dpm=1, and 3.13 should have dpm enabled by default, AFAIK).
When using UVD with dpm set to auto, it switches between the lowest and the highest state again and again.
*Anyway*, when I ran the attached benchmark I forced dpm to "high" before starting.
Comment 7 darkbasic 2013-11-11 10:17:23 UTC
Mike, I'm upgrading to 1.14.4 and patching with "glx: Add support for the new DRI loader entrypoint": http://cgit.freedesktop.org/xorg/xserver/commit/?id=7ecfab47eb221dbb996ea6c033348b8eceaeb893
Comment 8 darkbasic 2013-11-11 11:37:46 UTC
I applied 'glx: Add support for the new DRI loader entrypoint' to xorg-server-1.14.4 and updated the rest of the graphics stack to the latest snapshot from git master: nothing changed.

While the test was running I got a "Bus error":

Fontconfig warning: "/etc/fonts/conf.d/50-user.conf", line 14: reading configurations from ~/.fonts.conf is deprecated.
qvdpautest 0.5.2
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
Unknown GPU

VDPAU API version : 1
VDPAU implementation : G3DVL VDPAU Driver Shared Library version 1.0

FATAL: get_bits failed : No backend implementation could be loaded.!!

MPEG DECODING (1920x1080): 8 frames/s
MPEG DECODING (1280x720): 5 frames/s
Bus error


Also, I noticed that even though I ran "echo high > /sys/devices/pci0000:00/0000:00:1c.6/0000:03:00.0/power_dpm_force_performance_level", I still get lots of power state switching in dmesg.

Please see attached dmesg.

I also noticed lots of "HDMI: ELD buf size is 0, force 128" and "HDMI: invalid ELD data byte 0" in my dmesg. Maybe something audio related? Monitor is attached using DVI, not HDMI.
Comment 9 darkbasic 2013-11-11 11:38:35 UTC
Created attachment 89017 [details]
dmesg

dmesg after running qvdpautest
Comment 10 Vadim Girlin 2013-11-11 11:44:36 UTC
This is probably related to dpm and gpu clocks - if I run "vblank_mode=0 glxgears" in parallel with the benchmark the results are significantly better for me:

w/o gears:

MPEG DECODING (1920x1080): 13 frames/s
MPEG DECODING (1280x720): 13 frames/s
H264 DECODING (1920x1080): 12 frames/s
H264 DECODING (1280x720): 13 frames/s

with gears:

MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 118 frames/s
H264 DECODING (1920x1080): 51 frames/s
H264 DECODING (1280x720): 92 frames/s 

(In reply to comment #6)
> *Anyway* when I did run the attached benchmark I forced dpm to "high" before
> starting.

Setting power_dpm_force_performance_level to "high" doesn't really work for me in this case - AFAICS the driver resets it back to "auto" when the benchmark starts, probably when switching to uvd state.
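
To put a number on the difference, the two result sets above can be compared test by test. This is only an illustrative sketch: the function names are hypothetical, and the regular expression assumes result lines of the form shown in this report (`NAME (WxH): N frames/s` or `fields/s`).

```python
import re

# Matches result lines such as "MPEG DECODING (1920x1080): 13 frames/s".
RESULT_RE = re.compile(r"^\s*(.+?):\s*(\d+)\s+(?:frames|fields)/s\s*$")


def parse_results(text):
    """Map each qvdpautest test name to its reported rate."""
    results = {}
    for line in text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            results[match.group(1)] = int(match.group(2))
    return results


def speedups(baseline, loaded):
    """Ratio loaded/baseline for every test present in both runs."""
    return {name: loaded[name] / baseline[name]
            for name in baseline
            if name in loaded and baseline[name] > 0}


WITHOUT_GEARS = """\
MPEG DECODING (1920x1080): 13 frames/s
MPEG DECODING (1280x720): 13 frames/s
H264 DECODING (1920x1080): 12 frames/s
H264 DECODING (1280x720): 13 frames/s
"""

WITH_GEARS = """\
MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 118 frames/s
H264 DECODING (1920x1080): 51 frames/s
H264 DECODING (1280x720): 92 frames/s
"""

ratios = speedups(parse_results(WITHOUT_GEARS), parse_results(WITH_GEARS))
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

For the figures in this comment the ratios fall between about 4x and 9x, depending on codec and resolution.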
Comment 11 Grigori Goronzy 2013-11-11 12:04:00 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > FATAL: get_bits failed : No backend implementation could be loaded.!!    
> > 
> > There's some problem with your build.
> 
> That message is normal, just a function we haven't implemented yet.
> 

As far as I can see it's actually illegal API usage in qvdpautest. It's trying to read from uninitialized video surfaces, which is not guaranteed to work. Swapping around the order of tests so that it does the PutBits test first fixes it.
Comment 12 Christian König 2013-11-11 12:15:49 UTC
(In reply to comment #8)
> Also I noticed that despite I did "echo high >
> /sys/devices/pci0000:00/0000:00:1c.6/0000:03:00.0/
> power_dpm_force_performance_level" I still get lots of power states
> switching in dmesg.

As Vadim correctly noted forcing any power state doesn't work here, because we need to switch to the UVD power state anyway.

BTW: Is this a regression?
Comment 13 Christian König 2013-11-11 12:17:01 UTC
(In reply to comment #10)
> This is probably related to dpm and gpu clocks - if I run "vblank_mode=0
> glxgears" in parallel with the benchmark the results are significantly
> better for me:
> 
> w/o gears:
> 
> MPEG DECODING (1920x1080): 13 frames/s
> MPEG DECODING (1280x720): 13 frames/s
> H264 DECODING (1920x1080): 12 frames/s
> H264 DECODING (1280x720): 13 frames/s
> 
> with gears:
> 
> MPEG DECODING (1920x1080): 77 frames/s
> MPEG DECODING (1280x720): 118 frames/s
> H264 DECODING (1920x1080): 51 frames/s
> H264 DECODING (1280x720): 92 frames/s 

^^ very valuable comment, thx.

So do I get that right that generating 3D load affects UVD decoding performance here?
Comment 14 Christian König 2013-11-11 12:20:13 UTC
(In reply to comment #11)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > FATAL: get_bits failed : No backend implementation could be loaded.!!    
> > > 
> > > There's some problem with your build.
> > 
> > That message is normal, just a function we haven't implemented yet.
> > 
> 
> As far as I can see it's actually illegal API usage in qvdpautest. It's
> trying to read from uninitialized video surfaces, which is not guaranteed to
> work. Swapping around the order of tests so that it does the PutBits test
> first fixes it.

Thx for the info. qvdpautest is badly written in many respects (takes too much time, is inaccurate, etc.). It would be nice if somebody could sit down and either write something new from scratch or start to improve it.

Some rather simple command-line tool with a couple of options for testing different decoding profiles and output methods should be perfectly sufficient.
Comment 15 darkbasic 2013-11-11 12:42:59 UTC
Here is mine while running glxgears:

MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 117 frames/s
H264 DECODING (1920x1080): 16 frames/s
H264 DECODING (1280x720): 91 frames/s
Profile unsupported.
MPEG4 DECODING (1920x1080): 72 frames/s

No, it isn't a regression: it never worked for me.
Comment 16 Vadim Girlin 2013-11-11 12:50:14 UTC
(In reply to comment #13)
> So do I get that right that generating 3D load affects UVD decoding
> performance here?

Yes, I think the benchmark results differ because glxgears triggers a higher GPU power level. Probably the benchmark alone simply doesn't provide enough load for that, or maybe something is wrong with the dpm logic. Here is what I see while running the benchmark:

w/o gears:
power level 0    sclk: 45000 mclk: 120000 vddc: 900 vddci: 975 pcie gen: 2
with gears:
power level 2    sclk: 100000 mclk: 120000 vddc: 1219 vddci: 975 pcie gen: 2
uvd clocks are the same in both cases:
uvd    vclk: 72000 dclk: 56000

when the system is completely idle I see the following values:
uvd    vclk: 0 dclk: 0
power level 0    sclk: 30000 mclk: 15000 vddc: 825 vddci: 850 pcie gen: 2
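
The two "power level" lines captured during the benchmark can be compared field by field to see exactly what glxgears changes. A small sketch, assuming the line layout shown above; the helper names are hypothetical and the values are kept as the raw integers printed by the driver, without unit conversion.

```python
import re

# Field layout copied from the lines quoted in this comment.
LEVEL_RE = re.compile(
    r"power level (?P<level>\d+)\s+sclk: (?P<sclk>\d+) mclk: (?P<mclk>\d+) "
    r"vddc: (?P<vddc>\d+) vddci: (?P<vddci>\d+)"
)


def parse_power_level(line):
    """Parse one 'power level' line into a dict of integers."""
    match = LEVEL_RE.search(line)
    if match is None:
        raise ValueError(f"not a power level line: {line!r}")
    return {key: int(value) for key, value in match.groupdict().items()}


def changed_fields(before, after):
    """Names of the fields whose values differ between two parsed lines."""
    return sorted(key for key in before if before[key] != after[key])


WITHOUT_GEARS = "power level 0    sclk: 45000 mclk: 120000 vddc: 900 vddci: 975 pcie gen: 2"
WITH_GEARS = "power level 2    sclk: 100000 mclk: 120000 vddc: 1219 vddci: 975 pcie gen: 2"

print(changed_fields(parse_power_level(WITHOUT_GEARS),
                     parse_power_level(WITH_GEARS)))
# -> ['level', 'sclk', 'vddc']
```

Only the level, sclk and vddc differ between the two runs; mclk and vddci are identical.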
Comment 17 darkbasic 2013-11-11 13:36:21 UTC
Anyway, even with glxgears it's far from perfect. See this video: http://www.youtube.com/watch?v=aM3aRiKgxwM
Comment 18 Alex Deucher 2013-11-11 13:57:53 UTC
With kernel 3.13, the driver retains the user selected performance level across state changes.  Additionally, when using a UVD state, the sclk and mclk are always forced to their highest levels.  This isn't reflected in the debugfs output since that just prints the unpatched power state.  Does plain video playback work ok (i.e., not qvdpautest)?
Comment 19 darkbasic 2013-11-11 14:08:13 UTC
Alex, from the tests I did, 3.13 doesn't seem to behave the way you described.

With power state set to "high", desktop effect OFF and no glxgears:
MPEG DECODING (1920x1080): 5 frames/s
MPEG DECODING (1280x720): 21 frames/s
H264 DECODING (1920x1080): 8 frames/s
H264 DECODING (1280x720): 5 frames/s

With power state set to "high", desktop effect OFF and glxgears:
MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 117 frames/s
H264 DECODING (1920x1080): 16 frames/s
H264 DECODING (1280x720): 91 frames/s

With power state set to "high", desktop effect ON and glxgears:
MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 117 frames/s
H264 DECODING (1920x1080): 51 frames/s
H264 DECODING (1280x720): 91 frames/s

It seems even glxgears wasn't able to keep the highest power state in all tests: enabling desktop effects was enough to keep a higher state in the second-to-last test.

What do you mean by "plain video playback"? If you mean without using vdpau it works flawlessly.
Comment 20 Alex Deucher 2013-11-11 14:39:02 UTC
(In reply to comment #19)
> 
> What do you mean by "plain video playback"? If you mean without using vdpau
> it works flawlessly.

Just play a video with vdpau using mplayer or some other app that supports vdpau.  I'm wondering if perhaps the way qvdpautest works causes the driver to switch between power states too often so the clocks never get a chance to stabilize.
Comment 21 darkbasic 2013-11-11 14:44:49 UTC
No, I have the very same problem with mplayer2 + vdpau and adobe flash + vdpau.
Comment 22 Vladimir Usikov 2013-11-11 16:15:09 UTC
(In reply to comment #14)
> Some rather stupid command-line tool with a couple of options for testing
> different decoding profile and output methods should be perfectly sufficient.

mplayer -benchmark ?

http://www.w6rz.net/1080p25.zip

kwin desktop effect ON and no glxgears
mplayer -vo gl -benchmark -nosound 1080p25.ts
...
BENCHMARKs: VC:  79.531s VO:  27.184s A:   0.000s Sys:   2.558s =  109.273s
BENCHMARK%: VC: 72.7814% VO: 24.8773% A:  0.0000% Sys:  2.3413% = 100.0000%

kwin desktop effect ON and no glxgears
mplayer -benchmark -vo vdpau -vc ffmpeg12vdpau -nosound 1080p25.ts
...
BENCHMARKs: VC:   2.425s VO:  38.371s A:   0.000s Sys:   3.190s =   43.986s
BENCHMARK%: VC:  5.5141% VO: 87.2335% A:  0.0000% Sys:  7.2523% = 100.0000%

kwin desktop effect ON and glxgears
mplayer -benchmark -vo vdpau -vc ffmpeg12vdpau -nosound 1080p25.ts
...
BENCHMARKs: VC:   2.449s VO:  38.074s A:   0.000s Sys:   3.748s =   44.271s
BENCHMARK%: VC:  5.5325% VO: 86.0010% A:  0.0000% Sys:  8.4665% = 100.0000%
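
The `BENCHMARK%` line is just the `BENCHMARKs` line divided by the total, which makes it easy to compare runs programmatically. A minimal sketch, assuming the line format printed above; the function names are hypothetical.

```python
import re

# Per-component times, e.g. "VC:  79.531s", and the total after "=".
PART_RE = re.compile(r"\b(VC|VO|A|Sys):\s*([\d.]+)s")
TOTAL_RE = re.compile(r"=\s*([\d.]+)s")


def parse_benchmark(line):
    """Return ({component: seconds}, total_seconds) for a BENCHMARKs line."""
    parts = {name: float(value) for name, value in PART_RE.findall(line)}
    total_match = TOTAL_RE.search(line)
    total = float(total_match.group(1)) if total_match else sum(parts.values())
    return parts, total


def shares(parts, total):
    """Percentage of the total time spent in each component."""
    return {name: 100.0 * seconds / total for name, seconds in parts.items()}


GL_RUN = "BENCHMARKs: VC:  79.531s VO:  27.184s A:   0.000s Sys:   2.558s =  109.273s"
VDPAU_RUN = "BENCHMARKs: VC:   2.425s VO:  38.371s A:   0.000s Sys:   3.190s =   43.986s"

for label, line in (("gl", GL_RUN), ("vdpau", VDPAU_RUN)):
    parts, total = parse_benchmark(line)
    percent = shares(parts, total)
    print(label, f"total={total:.3f}s",
          " ".join(f"{name}={value:.1f}%" for name, value in percent.items()))
```

In these runs the VC (video codec) share drops from roughly 73% with `-vo gl` to under 6% with vdpau, while most of the remaining time is accounted to VO.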
Comment 23 darkbasic 2013-11-11 16:25:27 UTC
I used to use the mplayer benchmark before switching to qvdpautest. Unfortunately the results are not comparable unless other people have the very same videos (which tend to go offline after some months).
qvdpautest is the right way to go in my opinion; we just need someone to fix the remaining bugs.
Comment 24 darkbasic 2013-11-11 18:21:50 UTC
kwin desktop effects ON and glxgears
mplayer2 -benchmark -vo vdpau -vc ffmpeg12vdpau -nosound 1080p25.ts

BENCHMARKs: VC:   5.073s VO: 105.829s A:   0.000s Sys:   3.263s =  114.164s
BENCHMARK%: VC:  4.4433% VO: 92.6989% A:  0.0000% Sys:  2.8578% = 100.0000%

Mplayer better shows the behaviour I wanted to show with the previous video: it takes *A LOT* of time to start the benchmark, but then it's quite fast. At least while glxgears is running: without glxgears it takes ages.

Here is a second video showing the lag I'm talking about: http://www.youtube.com/watch?v=BDhB61U9S0A
As you can see, once it starts, decoding is quite fast.
Comment 25 darkbasic 2014-01-28 22:10:55 UTC
DPM still doesn't work with 3.14-rc0 :(:(:(
Comment 26 Alex Deucher 2014-01-28 22:25:49 UTC
Does forcing the power state to high help?  As root:

echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
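
Since comment 10 reports the level being reset to "auto" once the benchmark starts, it can help to read the file back while the test is running. A rough sketch, assuming the `card0` path from the command above; the helper names are hypothetical.

```python
from pathlib import Path

# Path taken from the echo command above; card0 may differ on other systems.
LEVEL_FILE = Path("/sys/class/drm/card0/device/power_dpm_force_performance_level")


def read_level(path=LEVEL_FILE):
    """Return the currently forced performance level, e.g. 'high' or 'auto'."""
    return path.read_text().strip()


def was_reset(level_before, level_during):
    """True if the level read during the run differs from the one set before."""
    return level_before.strip() != level_during.strip()


# Example with literal values; on a live system both would come from read_level().
print(was_reset("high", "auto"))
```

Calling `read_level()` once right after the `echo` and again while qvdpautest is decoding shows whether the forced level survived the switch to the UVD state.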
Comment 27 darkbasic 2014-01-28 23:20:55 UTC
Power state high doesn't help.

qvdpautest + 3.14-rc0 + AUTO + KWIN OFF:

MPEG DECODING (1920x1080): 8 frames/s
MPEG DECODING (1280x720): 9 frames/s
H264 DECODING (1920x1080): 8 frames/s
H264 DECODING (1280x720): 8 frames/s
Profile unsupported.
MPEG4 DECODING (1920x1080): 8 frames/s

qvdpautest + 3.14-rc0 + HIGH + KWIN OFF: the same

qvdpautest + 3.14-rc0 + HIGH + KWIN ON + glxgears:

MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 117 frames/s
H264 DECODING (1920x1080): 51 frames/s
H264 DECODING (1280x720): 90 frames/s
Profile unsupported.
MPEG4 DECODING (1920x1080): 71 frames/s

Fortunately I noticed a great improvement with 3.14: I don't have the huge lag before the start of the benchmark and glxgears doesn't freeze anymore like in http://www.youtube.com/watch?v=BDhB61U9S0A
Comment 28 Christian König 2014-01-30 17:57:46 UTC
Created attachment 93072 [details]
Fix.

Sorry that it took me so long to find this. It's a rather simple issue: the IRQ support for UVD on SI wasn't activated.

With this patch in place I now get 52fps with 1080p H264 decoding.
Comment 29 Vladimir Usikov 2014-01-30 19:20:23 UTC
(In reply to comment #28)
> Created attachment 93072 [details]
> Fix.
> 
> Sorry that it took me so long to find this. It's a rather simple issue that
> the IRQ support for UVD on SI wasn't activated.
> 
> With this patch in place I now get 52fps with 1080p H264 decoding.

With this patch I get gpu lockup and Xorg crash. kernel 3.13.1, mesa-git, llvm-svn, Archlinux x86.
Comment 30 Vladimir Usikov 2014-01-30 19:21:03 UTC
Created attachment 93084 [details]
dmesg output
Comment 31 Vladimir Usikov 2014-01-30 19:22:25 UTC
Created attachment 93085 [details]
Xorg log
Comment 32 Alex Deucher 2014-01-30 19:44:59 UTC
(In reply to comment #29)
> With this patch I get gpu lockup and Xorg crash. kernel 3.13.1, mesa-git,
> llvm-svn, Archlinux x86.

Are you sure this lockup isn't caused by some other upgrade you did such as mesa?  The patch works fine here.
Comment 33 darkbasic 2014-01-30 22:19:17 UTC
I confirm the patch works on top of drm-next:

MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 117 frames/s
H264 DECODING (1920x1080): 51 frames/s
H264 DECODING (1280x720): 91 frames/s
Profile unsupported.
MPEG4 DECODING (1920x1080): 72 frames/s

Any chance to get it merged in time for 3.14?
Comment 34 Alex Deucher 2014-01-30 22:20:41 UTC
(In reply to comment #33)
> Any chance to get it merged in time for 3.14?

Yes, it'll show up in 3.14 and the stable kernels.
Comment 35 Christian König 2014-01-31 09:05:32 UTC
Sounds like we can close this bug.

The GPU lockup of the GFX ring seems to be unrelated; please open a new bug report if that's really a regression.
Comment 36 Vladimir Usikov 2014-01-31 13:02:12 UTC
(In reply to comment #32)
> (In reply to comment #29)
> > With this patch I get gpu lockup and Xorg crash. kernel 3.13.1, mesa-git,
> > llvm-svn, Archlinux x86.
> 
> Are you sure this lockup isn't caused by some other upgrade you did such as
> mesa?  The patch works fine here.

Yes, I am sure. I ran qvdpautest before and after upgrading the kernel. But now I have run the test again several times and all is fine, no more GPU lockups.

If this happens again I will open a new bug.
Comment 37 Christian König 2014-02-01 10:47:21 UTC
Ok, thanks a lot. Looks like we can close it.
Comment 38 darkbasic 2014-02-01 14:05:44 UTC
We need to reopen, it hangs for me too.

Here is a video which hangs 100% of the time:
https://mega.co.nz/#!eQhSjJQR!EEe8-taN5IspIu-RW0WQzmvKzc5fkCn282kS5ugZ_as

Play with
mplayer2 -vo vdpau, -vc ffmpeg12vdpau,ffwmv3vdpau,ffvc1vdpau,ffh264vdpau,ffodivxvdpau, PlanetEarthBirds.mkv
Comment 39 Christian König 2014-02-01 16:45:02 UTC
As already mentioned, please open a new bug report, because that's clearly a different issue.
Comment 40 darkbasic 2014-02-01 18:07:29 UTC
Here it is: https://bugs.freedesktop.org/show_bug.cgi?id=74335

