Bug 97988 - [radeonsi] playing back videos with VDPAU exhibits deinterlacing/anti-aliasing issues not visible with VA-API
Summary: [radeonsi] playing back videos with VDPAU exhibits deinterlacing/anti-aliasin...
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi (show other bugs)
Version: git
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-30 14:35 UTC by Kai
Modified: 2017-03-15 13:33 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
mpv --hwdec=vdpau --vo=opengl-hq (2.32 MB, image/png)
2016-09-30 14:35 UTC, Kai
Details
mpv --hwdec=vaapi --vo=opengl-hq (good rendering for reference) (2.06 MB, image/png)
2016-09-30 14:36 UTC, Kai
Details
Weave shader dumps (19.65 KB, text/plain)
2016-10-01 14:38 UTC, Grigori Goronzy
Details
piglit test (1.60 KB, patch)
2016-10-04 15:40 UTC, Marek Olšák
Details | Splinter Review
Mesa part of fix (5.35 KB, patch)
2016-11-07 11:10 UTC, Nicolai Hähnle
Details | Splinter Review
Updated version of attachment 127812 (3.07 KB, patch)
2016-11-11 21:17 UTC, Kai
Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Kai 2016-09-30 14:35:22 UTC
Created attachment 126902 [details]
mpv --hwdec=vdpau --vo=opengl-hq

I noticed, that when I play videos with mpv using VDPAU I'm getting an image that looks like deinterlacing/anti-aliasing is not working, making fine structures/patterns or text look ugly/unreadable. Playing the same video with the same stack but using VA-API doesn't exhibit this issue.

Attached you'll find a screenshot taken with VDPAU when playing a video using
 mpv --hwdec=vdpau --vo=opengl-hq
The only difference for the VA-API playback is using "--hwdec=vaapi". I would say that this is a regression, but I can't remember when it started. :-(

The stack showing this issue is (Debian testing as a base):
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Mesa: Git:master/e4b585f009
libdrm: 2.4.70-1
LLVM: SVN:trunk/r282761 (4.0 devel)
X.Org: 2:1.18.4-1
Linux: 4.7.5
Firmware: firmware-amd-graphics/20160824-1
libclc: Git:master/88b82a6f70
DDX (amdgpu): 1.1.2-1

Let me know, if you need anything else.
Comment 1 Kai 2016-09-30 14:36:17 UTC
Created attachment 126903 [details]
mpv --hwdec=vaapi --vo=opengl-hq (good rendering for reference)
Comment 2 Christian König 2016-09-30 14:37:58 UTC
Does that happen as well with "mpv -hwdec vdpau -vo vdpau" ?

Could as well be a problem in mpv.
Comment 3 Kai 2016-09-30 14:43:24 UTC
(In reply to Christian König from comment #2)
> Does that happen as well with "mpv -hwdec vdpau -vo vdpau" ?

No, it doesn't!

> Could as well be a problem in mpv.

So this is a MPV issue then?
Comment 4 Grigori Goronzy 2016-09-30 14:44:56 UTC
Yeah, this is an issue with the GL interop. I can reproduce it here, so maybe I should take a look at it.
Comment 5 Christian König 2016-09-30 17:48:07 UTC
Could as well be a problem within Mesa.

Grigori if you could take a look that would be great.

Thanks in advance,
Christian.
Comment 6 andreaskem 2016-09-30 18:55:20 UTC
That is strange. This sounds very much like a problem I and another user encountered with an older version of Mesa:

https://github.com/mpv-player/mpv/issues/3317
https://bugs.freedesktop.org/show_bug.cgi?id=96860

Upgrading to Mesa 12.0.0 fixed it, however. I cannot really provide further clues as I am currently not seeing the issue.
Comment 7 Andy Furniss 2016-09-30 20:48:31 UTC
Still not quite right with vaapi or before this, but that's a different hard to see issue without a res test of the correct height.

This one seems to start with -

commit 16be87c904293c2e53d50cc3519789a604a6a33b
Author: Marek Olšák <marek.olsak@amd.com>
Date:   Tue Sep 13 14:25:44 2016 +0200

    radeonsi: get rid of img/buf/sampler descriptor preloading (v2)
    
    26011 shaders in 14651 tests
    Totals:
    SGPRS: 1251920 -> 1152636 (-7.93 %)
    VGPRS: 728421 -> 728198 (-0.03 %)
    Spilled SGPRs: 16644 -> 3776 (-77.31 %)
    Spilled VGPRs: 369 -> 369 (0.00 %)
    Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
    Code Size: 36001064 -> 35835152 (-0.46 %) bytes
    LDS: 767 -> 767 (0.00 %) blocks
    Max Waves: 222221 -> 222372 (0.07 %)
    Wait states: 0 -> 0 (0.00 %)
    
    v2: merge codepaths where possible
Comment 8 Grigori Goronzy 2016-09-30 22:58:22 UTC
Here's a bit of background:

Earlier, mpv used interop with OutputSurfaces. That is, it rendered the video into an RGB surface and then used that for its own procesing (scaling, rendering OSD on top, etc.).

Now recently, mpv has switched to interop with VideoSurfaces. That is, it maps four fields of luma/chroma planes. These are split up into top and bottom field, just as the spec for the interop extension says:

https://www.opengl.org/registry/specs/NV/vdpau_interop.txt

mpv then weaves the fields and does color space conversion, scaling, etc.


Things get wrong with the weaving, and I found some very suspicious code in Mesa:

https://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/state_tracker/st_vdpau.c#n82

Here a sampler is returned based on texture name index, and the same plane sampler is used for top and bottom field. That can't be right at all. Is the sampler possibly mangled somewhere else to make it point to the right fields? It does not look like it, but maybe I'm just missing something.

This really makes me wonder whether VideoSurface interop ever worked correctly.
Comment 9 Grigori Goronzy 2016-09-30 23:34:57 UTC
Sorry, it's something else. The field is set up later by pointing to the right layer. So there's something else going wrong. Andy, I can confirm that reverting to that commit fixes it. I'll keep looking.
Comment 10 Grigori Goronzy 2016-10-01 14:38:46 UTC
Created attachment 126946 [details]
Weave shader dumps

It looks like LLVM miscompiles the weave fragment shader due to the changes to descriptor loading. See the attached file. With these changes, LLVM generates some really odd looking and inefficient code in this case, and the two sample instructions are somehow folded together, which cannot possibly work.

Someone proficient in LLVM can probably figure out what is going wrong.
Comment 11 Marek Olšák 2016-10-01 18:41:03 UTC
Thanks Grigori.

GLSL:
  color = fract(gl_FragCoord.y / 2) < 0.5 ?
             texture(texture0, texcoord0) :
             texture(texture1, texcoord0);

texture0 and texture1 are SMEM loads.

LLVM (SimplifyCFG) transforms it to:
  color = texture(fract(gl_FragCoord.y / 2) < 0.5 ? texture0 : texture1,
                  texcoord0);

That's a nice transformation. You don't have to use 2 SMEM loads, you can just use one SMEM load depending on the result of the condition.

The problem is gl_FragCoord.y is a VGPR and texture0/1 are SGPRs, therefore flat VMEM loads are used to load the descriptors. However, image_sample requires descriptors in SGPRs, so v_readfirstlane is used. That effectively uses the result of the condition from the first active lane, discarding the results from all other lanes. The result would be exactly the same if the compiler did: v_readfirstlane s0, gl_FragCoord.y;

The test case seems pretty trivial I wonder how many other apps are affected.
Comment 12 Tom Stellard 2016-10-03 17:29:38 UTC
(In reply to Marek Olšák from comment #11)
> Thanks Grigori.
> 
> GLSL:
>   color = fract(gl_FragCoord.y / 2) < 0.5 ?
>              texture(texture0, texcoord0) :
>              texture(texture1, texcoord0);
> 
> texture0 and texture1 are SMEM loads.
> 
> LLVM (SimplifyCFG) transforms it to:
>   color = texture(fract(gl_FragCoord.y / 2) < 0.5 ? texture0 : texture1,
>                   texcoord0);
> 
> That's a nice transformation. You don't have to use 2 SMEM loads, you can
> just use one SMEM load depending on the result of the condition.
> 
> The problem is gl_FragCoord.y is a VGPR and texture0/1 are SGPRs, therefore
> flat VMEM loads are used to load the descriptors. However, image_sample
> requires descriptors in SGPRs, so v_readfirstlane is used. That effectively
> uses the result of the condition from the first active lane, discarding the
> results from all other lanes. The result would be exactly the same if the
> compiler did: v_readfirstlane s0, gl_FragCoord.y;
> 
> The test case seems pretty trivial I wonder how many other apps are affected.

I think the best solution here would be to teach the backend how to do a scalar select based on value of vccz.
Comment 13 Marek Olšák 2016-10-03 17:55:57 UTC
(In reply to Tom Stellard from comment #12)
> 
> I think the best solution here would be to teach the backend how to do a
> scalar select based on value of vccz.

You are missing the point. The condition code (VCC) isn't the same across all threads. The problem is that the conditional assignment is transformed into a form that makes descriptor load addresses non-uniform (dependent on a VGPR). There is nothing the AMDGPU backend can do about it. It's not a problem in the backend.
Comment 14 Christian König 2016-10-04 12:27:15 UTC
(In reply to Marek Olšák from comment #13)
> (In reply to Tom Stellard from comment #12)
> > 
> > I think the best solution here would be to teach the backend how to do a
> > scalar select based on value of vccz.
> 
> You are missing the point. The condition code (VCC) isn't the same across
> all threads. The problem is that the conditional assignment is transformed
> into a form that makes descriptor load addresses non-uniform (dependent on a
> VGPR). There is nothing the AMDGPU backend can do about it. It's not a
> problem in the backend.

Well the backend could iterate over all the variants to handle that, but I have strong doubts that this would result in optimal performance.

So I think the best solution for now would be to mark the texture intrinsic in a way which disallows such optimizations.
Comment 15 Marek Olšák 2016-10-04 15:40:28 UTC
Created attachment 126996 [details] [review]
piglit test

The attached test reproduces the issue. CFGSimplificationPass is the culprit.

Note that CFGSimplificationPass also doesn't preserve the !amdgpu.uniform metadata when transforming the loads, which is a change in behavior.
Comment 16 Nicolai Hähnle 2016-10-04 16:01:59 UTC
Looks like we'd need the opposite of the convergent attribute. And that attribute should only apply to one argument of the texture sampling intrinsic (i.e. not to the texture coordinate). I don't know if something like that already exists in LLVM.
Comment 17 Andy Furniss 2016-10-25 09:46:41 UTC
(In reply to Tom Stellard from comment #12)
> (In reply to Marek Olšák from comment #11)

> > The test case seems pretty trivial I wonder how many other apps are affected.

I don't know the answer to that, but I've just noticed that a debug build of mesa will catch the issue with mpv -

mpv: state_tracker/st_sampler_view.c:481: st_get_texture_sampler_view_from_stobj: Assertion `stObj->base.MinLayer == view->u.tex.first_layer' failed.
Comment 18 Marek Olšák 2016-10-25 10:19:54 UTC
(In reply to Andy Furniss from comment #17)
> mpv: state_tracker/st_sampler_view.c:481:
> st_get_texture_sampler_view_from_stobj: Assertion `stObj->base.MinLayer ==
> view->u.tex.first_layer' failed.

That's unrelated.
Comment 19 Andy Furniss 2016-10-25 12:46:33 UTC
(In reply to Marek Olšák from comment #18)
> (In reply to Andy Furniss from comment #17)
> > mpv: state_tracker/st_sampler_view.c:481:
> > st_get_texture_sampler_view_from_stobj: Assertion `stObj->base.MinLayer ==
> > view->u.tex.first_layer' failed.
> 
> That's unrelated.

Oops, yea, should have checked better - just it needed the same version/command line with mpv as this bug to trigger, but it does start later =

commit e5cc84dd43be066c1dd418e32f5ad258e31a150a

Author: Brian Paul <brianp@vmware.com>
Date:   Fri Sep 30 16:41:46 2016 -0600

    st/mesa: optimize pipe_sampler_view validation
Comment 20 Nicolai Hähnle 2016-11-07 11:10:14 UTC
Created attachment 127812 [details] [review]
Mesa part of fix

LLVM patch https://reviews.llvm.org/D26348 together with the attached patch for Mesa fix this bug.
Comment 21 Andy Furniss 2016-11-07 11:38:18 UTC
(In reply to Nicolai Hähnle from comment #20)
> Created attachment 127812 [details] [review] [review]
> Mesa part of fix
> 
> LLVM patch https://reviews.llvm.org/D26348 together with the attached patch
> for Mesa fix this bug.

Nice, will test later, but is there a actually a nice way to get a patch from Phabricator that will apply from top level with git apply?

I usually fail with "download raw diff" and have to search/sort out the paths by hand.
Comment 22 Andy Furniss 2016-11-07 13:40:25 UTC
(In reply to Andy Furniss from comment #21)
> (In reply to Nicolai Hähnle from comment #20)
> > Created attachment 127812 [details] [review] [review] [review]
> > Mesa part of fix
> > 
> > LLVM patch https://reviews.llvm.org/D26348 together with the attached patch
> > for Mesa fix this bug.
> 
> Nice, will test later

Working OK for me.
Comment 23 Dieter Nützel 2016-11-07 14:20:20 UTC
(In reply to Andy Furniss from comment #21)
> (In reply to Nicolai Hähnle from comment #20)
> > Created attachment 127812 [details] [review] [review] [review]
> > Mesa part of fix
> > 
> > LLVM patch https://reviews.llvm.org/D26348 together with the attached patch
> > for Mesa fix this bug.
> 
> Nice, will test later, but is there a actually a nice way to get a patch
> from Phabricator that will apply from top level with git apply?
> 
> I usually fail with "download raw diff" and have to search/sort out the
> paths by hand.

Hello Andy,

how did you solved this?
Comment 24 Nicolai Hähnle 2016-11-07 14:58:59 UTC
Andy, I usually use the arc command line tool for this kind of stuff. Thanks for testing!
Comment 25 Andy Furniss 2016-11-07 15:09:08 UTC
(In reply to Dieter Nützel from comment #23)
> (In reply to Andy Furniss from comment #21)
> > (In reply to Nicolai Hähnle from comment #20)
> > > Created attachment 127812 [details] [review] [review] [review] [review]
> > > Mesa part of fix
> > > 
> > > LLVM patch https://reviews.llvm.org/D26348 together with the attached patch
> > > for Mesa fix this bug.
> > 
> > Nice, will test later, but is there a actually a nice way to get a patch
> > from Phabricator that will apply from top level with git apply?
> > 
> > I usually fail with "download raw diff" and have to search/sort out the
> > paths by hand.
> 
> Hello Andy,
> 
> how did you solved this?

Turned out to be easier than I remembered from my previous few efforts when (rarely) applying patches for llvm from there.

All I needed to do, was wherever there is a +++ or --- followed by a path, add a leading / to the path.

I guess there is a cool command line way like Nicolai says - but I would probably end up hosing the whole thing if I tried like that :-)
Comment 26 Kai 2016-11-11 21:17:59 UTC
Created attachment 127922 [details] [review]
Updated version of attachment 127812 [details] [review]

(In reply to Nicolai Hähnle from comment #20)
> Created attachment 127812 [details] [review]
> Mesa part of fix
> 
> LLVM patch https://reviews.llvm.org/D26348 together with the attached patch
> for Mesa fix this bug.

I can confirm, that this seems to fix the issue for me. You can have my
 Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>

Note: I had to update attachment 127812 [details] [review] to the attached version in order to apply it on top of current Mesa master. See below for the full stack details.

(In reply to Andy Furniss from comment #21)
> (In reply to Nicolai Hähnle from comment #20)
> > Created attachment 127812 [details] [review] [review] [review]
> > Mesa part of fix
> > 
> > LLVM patch https://reviews.llvm.org/D26348 together with the attached patch
> > for Mesa fix this bug.
> 
> Nice, will test later, but is there a actually a nice way to get a patch
> from Phabricator that will apply from top level with git apply?
> 
> I usually fail with "download raw diff" and have to search/sort out the
> paths by hand.

patch -p0 is your friend. (If you use quilt: quilt import -p0)


The stack used to verify the issue is fixed was (Debian testing as a base):
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Mesa: Git:master/3ff9f8c532 + the modified version of attachment 127812 [details] [review]
libdrm: 2.4.71-1
LLVM: SVN:trunk/r282761 (4.0 devel) + <https://reviews.llvm.org/D26348?download=true>
X.Org: 2:1.18.4-2
Linux: 4.8.7
Firmware: Git:master/a179db9791
libclc: Git:master/520743b0b7
DDX (amdgpu): 1.1.2-1
Comment 27 Kai 2017-02-06 12:35:54 UTC
Ping.

This is still an issue with a current stack (Debian testing as a base), unless I include attachment 127922 [details] [review] in my Mesa build and the LLVM patch <https://reviews.llvm.org/D26348?download=true> in my LLVM build):
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Mesa: Git:master/7c5629a269 + attachment 127922 [details] [review] and revert of 7b32ae4df5bc19c378598d6a950a6019fa64ece6 (see bug 99542)
libdrm: Git:master/d4b8344363 (tag libdrm-2.4.75)
LLVM: SVN:trunk/r294119 (5.0 devel) + <https://reviews.llvm.org/D26348?download=true>
X.Org: 2:1.19.1-4
Linux: 4.9.8
Firmware (firmware-amd-graphics): 20160824-1
libclc: Git:master/2ec7d80d5e
DDX (xserver-xorg-video-amdgpu): 1.2.0-1+b1
Comment 28 Kai 2017-02-06 13:01:14 UTC
(In reply to Kai from comment #27)
> Mesa: Git:master/7c5629a269 + attachment 127922 [details] [review] and revert of 7b32ae4df5bc19c378598d6a950a6019fa64ece6 (see bug 99542)

The Mesa revision given was wrong, I meant 02264bc6f9.
Comment 29 Kai 2017-03-05 20:09:37 UTC
Ping.

@nha: is there any progress on this? Or more specifically the LLVM change D26348?
Comment 30 Samuel Pitoiset 2017-03-15 13:33:51 UTC
Just pushed a workaround for this issue in Mesa.

https://cgit.freedesktop.org/mesa/mesa/commit/?id=7751ed39e40e08e5aa0633d018c9f25ad17f9bb0


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct.