Summary: [UVD] qvdpautest is very slow on radeonsi (HD 7950)
Product: DRI
Component: DRM/Radeon
Status: RESOLVED FIXED
Severity: normal
Priority: medium
Version: XOrg git
Hardware: Other
OS: All
Reporter: darkbasic <darkbasic>
Assignee: Default DRI bug account <dri-devel>
CC: grantipak
Attachments: dmesg (attachment 89017), Fix (attachment 93072), dmesg output (attachment 93084), Xorg log (attachment 93085)
Description (darkbasic, 2013-11-09 23:17:56 UTC)
ArchLinux x32; kernel 3.12; llvm - svn; mesa - git; xorg-server 1.14.4; Radeon HD 7950

qvdpautest 0.5.2
AMD Phenom(tm) 9550 Quad-Core Processor
Unknown GPU
VDPAU API version : 1
VDPAU implementation : G3DVL VDPAU Driver Shared Library version 1.0

MPEG DECODING (1920x1080): 19 frames/s
MPEG DECODING (1280x720): 19 frames/s
H264 DECODING (1920x1080): 15 frames/s
H264 DECODING (1280x720): 16 frames/s
MPEG4 DECODING (1920x1080): 15 frames/s
MIXER WEAVE (1920x1080): 3293 frames/s
MIXER BOB (1920x1080): 3878 fields/s
MIXER TEMPORAL (1920x1080): 3884 fields/s
MIXER TEMPORAL + IVTC (1920x1080): 3881 fields/s
MIXER TEMPORAL + SKIP_CHROMA (1920x1080): 3895 fields/s
MIXER TEMPORAL_SPATIAL (1920x1080): 3881 fields/s
MIXER TEMPORAL_SPATIAL + IVTC (1920x1080): 3885 fields/s
MIXER TEMPORAL_SPATIAL + SKIP_CHROMA (1920x1080): 3888 fields/s
MIXER TEMPORAL_SPATIAL (720x576 video to 1920x1080 display): 3439 fields/s
MULTITHREADED MPEG DECODING (1920x1080): 76 frames/s
MULTITHREADED MIXER TEMPORAL (1920x1080): 3930 fields/s

Make sure dpm is enabled: add radeon.dpm=1 to the kernel command line in grub.

I'm not sure if it's related, but make sure your xserver is patched to work with the latest mesa: fd1b24a93e ("glx: Add support for the new DRI").

(In reply to comment #0)
> http://bpaste.net/show/148239/

In the future, please attach the output rather than referring to an external site that may go away at some point.

> kernel is 3.13 (~agd5f drm-next-3.13).
> The whole graphic stack is from git except xorg-server which is 1.14.3.

From your log:

FATAL: get_bits failed : No backend implementation could be loaded.!!

There's some problem with your build.

(In reply to comment #4)
> FATAL: get_bits failed : No backend implementation could be loaded.!!
>
> There's some problem with your build.

That message is normal, just a function we haven't implemented yet. But I agree the numbers look like you are on the bootup clocks for UVD/graphics, or something is going wrong with dpm.

dpm is enabled, of course (because I set radeon.dpm=1 and because 3.13 should have dpm enabled by default, AFAIK). When using UVD with dpm set to auto it switches from the lowest state to the highest lots of times, again and again. *Anyway*, when I ran the attached benchmark I forced dpm to "high" before starting.

Myke

I'm upgrading to 1.14.4 and patching with "glx: Add support for the new DRI loader entrypoint": http://cgit.freedesktop.org/xorg/xserver/commit/?id=7ecfab47eb221dbb996ea6c033348b8eceaeb893

I applied 'glx: Add support for the new DRI loader entrypoint' to xorg-server-1.14.4 and updated the rest of the graphics stack to the latest snapshot from git master: nothing changes. While the test was running I got a "Bus error":

Fontconfig warning: "/etc/fonts/conf.d/50-user.conf", line 14: reading configurations from ~/.fonts.conf is deprecated.
qvdpautest 0.5.2
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
Unknown GPU
VDPAU API version : 1
VDPAU implementation : G3DVL VDPAU Driver Shared Library version 1.0
FATAL: get_bits failed : No backend implementation could be loaded.!!
MPEG DECODING (1920x1080): 8 frames/s
MPEG DECODING (1280x720): 5 frames/s
Errore di bus

Also, I noticed that despite doing "echo high > /sys/devices/pci0000:00/0000:00:1c.6/0000:03:00.0/power_dpm_force_performance_level" I still get lots of power state switching in dmesg. Please see the attached dmesg. I also noticed lots of "HDMI: ELD buf size is 0, force 128" and "HDMI: invalid ELD data byte 0" in my dmesg. Maybe something audio related? The monitor is attached using DVI, not HDMI.
Created attachment 89017 [details]
dmesg
dmesg after running qvdpautest
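For reference, here is a minimal shell sketch of the dpm setup and monitoring discussed in the comments above. It assumes a GRUB 2 install, a single Radeon card exposed as card0, debugfs mounted at its default location, and root privileges; adjust paths for your system.

    # 1. Enable dpm at boot: append radeon.dpm=1 to GRUB_CMDLINE_LINUX_DEFAULT
    #    in /etc/default/grub, then regenerate the config and reboot.
    grub-mkconfig -o /boot/grub/grub.cfg

    # 2. After rebooting, confirm dpm initialised and (optionally) force the
    #    highest performance level for benchmarking.
    dmesg | grep -i dpm
    echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level

    # 3. Watch the power level and engine/memory clocks while a benchmark runs;
    #    this debugfs file is where readings like "power level 0 sclk: 45000"
    #    quoted further down in this bug come from.
    watch -n 1 cat /sys/kernel/debug/dri/0/radeon_pm_info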
This is probably related to dpm and gpu clocks: if I run "vblank_mode=0 glxgears" in parallel with the benchmark the results are significantly better for me.

w/o gears:

MPEG DECODING (1920x1080): 13 frames/s
MPEG DECODING (1280x720): 13 frames/s
H264 DECODING (1920x1080): 12 frames/s
H264 DECODING (1280x720): 13 frames/s

with gears:

MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 118 frames/s
H264 DECODING (1920x1080): 51 frames/s
H264 DECODING (1280x720): 92 frames/s

(In reply to comment #6)
> *Anyway*, when I ran the attached benchmark I forced dpm to "high" before
> starting.

Setting power_dpm_force_performance_level to "high" doesn't really work for me in this case: AFAICS the driver resets it back to "auto" when the benchmark starts, probably when switching to the uvd state.

(In reply to comment #5)
> (In reply to comment #4)
> > FATAL: get_bits failed : No backend implementation could be loaded.!!
> >
> > There's some problem with your build.
>
> That message is normal, just a function we haven't implemented yet.

As far as I can see it's actually illegal API usage in qvdpautest. It's trying to read from uninitialized video surfaces, which is not guaranteed to work. Swapping around the order of the tests so that it does the PutBits test first fixes it.

(In reply to comment #8)
> Also, I noticed that despite doing "echo high >
> /sys/devices/pci0000:00/0000:00:1c.6/0000:03:00.0/power_dpm_force_performance_level"
> I still get lots of power state switching in dmesg.

As Vadim correctly noted, forcing any power state doesn't work here, because we need to switch to the UVD power state anyway.

BTW: Is this a regression?

(In reply to comment #10)
> This is probably related to dpm and gpu clocks: if I run "vblank_mode=0
> glxgears" in parallel with the benchmark the results are significantly
> better for me.
>
> w/o gears:
>
> MPEG DECODING (1920x1080): 13 frames/s
> MPEG DECODING (1280x720): 13 frames/s
> H264 DECODING (1920x1080): 12 frames/s
> H264 DECODING (1280x720): 13 frames/s
>
> with gears:
>
> MPEG DECODING (1920x1080): 77 frames/s
> MPEG DECODING (1280x720): 118 frames/s
> H264 DECODING (1920x1080): 51 frames/s
> H264 DECODING (1280x720): 92 frames/s

^^ Very valuable comment, thanks. So do I get that right that generating 3D load affects UVD decoding performance here?

(In reply to comment #11)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > FATAL: get_bits failed : No backend implementation could be loaded.!!
> > >
> > > There's some problem with your build.
> >
> > That message is normal, just a function we haven't implemented yet.
>
> As far as I can see it's actually illegal API usage in qvdpautest. It's
> trying to read from uninitialized video surfaces, which is not guaranteed to
> work. Swapping around the order of the tests so that it does the PutBits test
> first fixes it.

Thanks for the info. qvdpautest is badly written in many aspects (it takes too much time, is inaccurate, etc.). It would be nice if somebody could sit down and either write something new from scratch or start improving it. Some rather stupid command-line tool with a couple of options for testing different decoding profiles and output methods should be perfectly sufficient.

Here is mine while running glxgears:

MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 117 frames/s
H264 DECODING (1920x1080): 16 frames/s
H264 DECODING (1280x720): 91 frames/s
Profile unsupported.
MPEG4 DECODING (1920x1080): 72 frames/s

No, it isn't a regression: it never worked for me.
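A quick way to reproduce the with/without-3D-load comparison above is sketched below. It assumes qvdpautest and glxgears are in the PATH and that the Radeon card drives the current X display; vblank_mode=0 makes glxgears render unthrottled, which keeps the GPU busy.

    # Run the benchmark once on an otherwise idle GPU ...
    qvdpautest

    # ... and once with a trivial 3D load running in the background.
    vblank_mode=0 glxgears &
    qvdpautest
    kill %1    # stop glxgears again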
(In reply to comment #13)
> So do I get that right that generating 3D load affects UVD decoding
> performance here?

Yes, I think the explanation for the difference in benchmark results is that glxgears triggers a higher gpu power level; probably the benchmark alone simply doesn't provide enough load for that, or maybe something is wrong with the dpm logic.

Here is what I see while running the benchmark:

w/o gears:
power level 0    sclk: 45000 mclk: 120000 vddc: 900 vddci: 975 pcie gen: 2

with gears:
power level 2    sclk: 100000 mclk: 120000 vddc: 1219 vddci: 975 pcie gen: 2

uvd clocks are the same in both cases:
uvd    vclk: 72000 dclk: 56000

When the system is completely idle I see the following values:
uvd    vclk: 0 dclk: 0
power level 0    sclk: 30000 mclk: 15000 vddc: 825 vddci: 850 pcie gen: 2

Anyway, even with glxgears it's far from perfect. See this video: http://www.youtube.com/watch?v=aM3aRiKgxwM

With kernel 3.13, the driver retains the user-selected performance level across state changes. Additionally, when using a UVD state, the sclk and mclk are always forced to their highest levels. This isn't reflected in the debugfs output since that just prints the unpatched power state. Does plain video playback work ok (i.e., not qvdpautest)?

Alex, from the tests I did 3.13 doesn't seem to behave the way you described.

With the power state set to "high", desktop effects OFF and no glxgears:

MPEG DECODING (1920x1080): 5 frames/s
MPEG DECODING (1280x720): 21 frames/s
H264 DECODING (1920x1080): 8 frames/s
H264 DECODING (1280x720): 5 frames/s

With the power state set to "high", desktop effects OFF and glxgears:

MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 117 frames/s
H264 DECODING (1920x1080): 16 frames/s
H264 DECODING (1280x720): 91 frames/s

With the power state set to "high", desktop effects ON and glxgears:

MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 117 frames/s
H264 DECODING (1920x1080): 51 frames/s
H264 DECODING (1280x720): 91 frames/s

It seems even glxgears wasn't able to keep the highest power state in all tests: enabling desktop effects was enough to keep a higher state in the second-to-last test.

What do you mean by "plain video playback"? If you mean without using vdpau, it works flawlessly.

(In reply to comment #19)
> What do you mean by "plain video playback"? If you mean without using vdpau,
> it works flawlessly.

Just play a video with vdpau using mplayer or some other app that supports vdpau. I'm wondering if perhaps the way qvdpautest works causes the driver to switch between power states too often, so the clocks never get a chance to stabilize.

No, I have the very same problem with mplayer2 + vdpau and Adobe Flash + vdpau.

(In reply to comment #14)
> Some rather stupid command-line tool with a couple of options for testing
> different decoding profiles and output methods should be perfectly sufficient.

mplayer -benchmark? http://www.w6rz.net/1080p25.zip

kwin desktop effects ON and no glxgears
mplayer -vo gl -benchmark -nosound 1080p25.ts
...
BENCHMARKs: VC: 79.531s VO: 27.184s A: 0.000s Sys: 2.558s = 109.273s
BENCHMARK%: VC: 72.7814% VO: 24.8773% A: 0.0000% Sys: 2.3413% = 100.0000%

kwin desktop effects ON and no glxgears
mplayer -benchmark -vo vdpau -vc ffmpeg12vdpau -nosound 1080p25.ts
...
BENCHMARKs: VC: 2.425s VO: 38.371s A: 0.000s Sys: 3.190s = 43.986s
BENCHMARK%: VC: 5.5141% VO: 87.2335% A: 0.0000% Sys: 7.2523% = 100.0000%

kwin desktop effects ON and glxgears
mplayer -benchmark -vo vdpau -vc ffmpeg12vdpau -nosound 1080p25.ts
...
BENCHMARKs: VC: 2.449s VO: 38.074s A: 0.000s Sys: 3.748s = 44.271s
BENCHMARK%: VC: 5.5325% VO: 86.0010% A: 0.0000% Sys: 8.4665% = 100.0000%

I used to use the mplayer benchmark before switching to qvdpautest; unfortunately the results are not comparable because other people would need the very same videos (which tend to go offline after some months). qvdpautest is the right way to go in my opinion, we just need someone to fix the remaining bugs.

kwin desktop effects ON and glxgears
mplayer2 -benchmark -vo vdpau -vc ffmpeg12vdpau -nosound 1080p25.ts
BENCHMARKs: VC: 5.073s VO: 105.829s A: 0.000s Sys: 3.263s = 114.164s
BENCHMARK%: VC: 4.4433% VO: 92.6989% A: 0.0000% Sys: 2.8578% = 100.0000%

mplayer better demonstrates the behaviour I wanted to show with the previous video: it takes *A LOT* of time to start the benchmark, but then it's quite fast, at least while glxgears is running; without glxgears it takes ages. Here is a second video showing the lag I'm talking about: http://www.youtube.com/watch?v=BDhB61U9S0A As you can see, once it starts, decoding is quite fast.

DPM still doesn't work with 3.14-rc0 :(:(:(

Does forcing the power state to high help? As root:

echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level

Forcing the power state to high doesn't help.

qvdpautest + 3.14-rc0 + AUTO + KWIN OFF:

MPEG DECODING (1920x1080): 8 frames/s
MPEG DECODING (1280x720): 9 frames/s
H264 DECODING (1920x1080): 8 frames/s
H264 DECODING (1280x720): 8 frames/s
Profile unsupported.
MPEG4 DECODING (1920x1080): 8 frames/s

qvdpautest + 3.14-rc0 + HIGH + KWIN OFF: the same

qvdpautest + 3.14-rc0 + HIGH + KWIN ON + glxgears:

MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 117 frames/s
H264 DECODING (1920x1080): 51 frames/s
H264 DECODING (1280x720): 90 frames/s
Profile unsupported.
MPEG4 DECODING (1920x1080): 71 frames/s

Fortunately I noticed a great improvement with 3.14: I no longer have the huge lag before the start of the benchmark, and glxgears doesn't freeze anymore like in http://www.youtube.com/watch?v=BDhB61U9S0A

Created attachment 93072 [details]
Fix.
Sorry that it took me so long to find this. It's a rather simple issue: the IRQ support for UVD on SI wasn't activated.
With this patch in place I now get 52fps with 1080p H264 decoding.
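For anyone following along, a minimal sketch of trying out an attached fix like this one; the kernel tree location, patch filename, and config workflow are assumptions, not part of the bug.

    # Apply the patch (attachment 93072) to a kernel tree that contains the
    # radeon UVD/SI code this bug is about (e.g. a 3.13+ or drm-next checkout).
    cd ~/src/linux                      # assumed location of your kernel tree
    patch -p1 < ~/Downloads/fix.patch   # the attachment saved locally
    make olddefconfig && make -j"$(nproc)"
    sudo make modules_install install
    # Reboot into the new kernel, then re-run the benchmark:
    qvdpautest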
(In reply to comment #28)
> Created attachment 93072 [details]
> Fix.
>
> Sorry that it took me so long to find this. It's a rather simple issue: the
> IRQ support for UVD on SI wasn't activated.
>
> With this patch in place I now get 52fps with 1080p H264 decoding.

With this patch I get a gpu lockup and an Xorg crash. Kernel 3.13.1, mesa-git, llvm-svn, Arch Linux x86.

Created attachment 93084 [details]
dmesg output
Created attachment 93085 [details]
Xorg log
(In reply to comment #29)
> With this patch I get a gpu lockup and an Xorg crash. Kernel 3.13.1,
> mesa-git, llvm-svn, Arch Linux x86.

Are you sure this lockup isn't caused by some other upgrade you did, such as mesa? The patch works fine here.

I confirm the patch works on top of drm-next:

MPEG DECODING (1920x1080): 77 frames/s
MPEG DECODING (1280x720): 117 frames/s
H264 DECODING (1920x1080): 51 frames/s
H264 DECODING (1280x720): 91 frames/s
Profile unsupported.
MPEG4 DECODING (1920x1080): 72 frames/s

Any chance to get it merged in time for 3.14?

(In reply to comment #33)
> Any chance to get it merged in time for 3.14?

Yes, it'll show up in 3.14 and the stable kernels.

Sounds like we can close this bug. The GPU lockup of the GFX ring seems to be unrelated; please open up a new bug report if that's really a regression.

(In reply to comment #32)
> (In reply to comment #29)
> > With this patch I get a gpu lockup and an Xorg crash. Kernel 3.13.1,
> > mesa-git, llvm-svn, Arch Linux x86.
>
> Are you sure this lockup isn't caused by some other upgrade you did, such as
> mesa? The patch works fine here.

Yes, I am sure: I ran qvdpautest before and after upgrading the kernel. But now I have run the test again several times and everything is fine, no more gpu lockup. If it happens again I will open a new bug.

Ok, thanks a lot. Looks like we can close it.

We need to reopen, it hangs for me too. Here is a video which hangs 100% of the time: https://mega.co.nz/#!eQhSjJQR!EEe8-taN5IspIu-RW0WQzmvKzc5fkCn282kS5ugZ_as

Play with: mplayer2 -vo vdpau, -vc ffmpeg12vdpau,ffwmv3vdpau,ffvc1vdpau,ffh264vdpau,ffodivxvdpau, PlanetEarthBirds.mkv

As already mentioned, please open up a new bug report, because that's clearly a different issue.