Bug 99158

Summary: vdpau segfaults and gpu locks with kodi on R9285
Product: Mesa
Component: Drivers/Gallium/radeonsi
Reporter: Andy Furniss <adf.lists>
Assignee: Default DRI bug account <dri-devel>
QA Contact: Default DRI bug account <dri-devel>
CC: maraeo
Status: RESOLVED FIXED
Severity: normal
Priority: medium
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments: some segfaults; some gpu locks

Description Andy Furniss 2016-12-20 11:22:39 UTC
Created attachment 128583 [details]
some segfaults

This is an initial report with no bisect, as I have no clue yet when it started.

There seems to be an unlucky timing situation with kodi that may segfault radeonsi or lock the gpu when playing videos with vdpau h/w decode + temporal vdpau deint, which are the default kodi settings.

I can't reproduce with mplayer or mpv.

It seems to need HD interlaced + deint, but that may be just because it changes timing.

Running with VDPAU_TRACE=1 so far avoids the crash/lock, as does setting cpus to perf.

Running kodi git, but it's reproducible with 11-month-old git as well.

It's easiest to provoke by starting kodi with a file on the command line, but it is also possible when starting from the running kodi menu. Moving the mouse after clicking the file, so that the overlay instantly renders over the video, possibly improves the chance of a crash/lock.

Attaching some segfaults and locks.

I am running git mesa/llvm/kernel, and the crash is rare enough that I may have missed it for ages given the amount I test kodi. Additionally, the best way to provoke it (repeated command line start) only started being possible again recently due to a kodi bug.

The segfaults are a bit random.
Comment 1 Andy Furniss 2016-12-20 11:23:09 UTC
Created attachment 128584 [details]
some gpu locks
Comment 2 Michel Dänzer 2016-12-20 14:41:44 UTC
Can you try reproducing it in valgrind and see if that raises some interesting errors?
Comment 3 Andy Furniss 2016-12-22 00:00:02 UTC
Couldn't get it to crash with valgrind.

Bisect not 100% sure as it seemed to get harder to provoke going back in time, but it does seem to need temporal deint and I got -

first bad commit: [d0d5f7600c2e8ab8d0c153787185f7a534753edd]
Revert "st/vdpau: use linear layout for output surfaces"

The revert reverts from current head, so I will test for a while like that.
Comment 4 Andy Furniss 2016-12-29 11:03:57 UTC
(In reply to Andy Furniss from comment #3)
> Couldn't get it to crash with valgrind.
> 
> Bisect not 100% sure as it seemed to get harder to provoke going back in
> time, but it does seem to need temporal deint and I got -
> 
> first bad commit: [d0d5f7600c2e8ab8d0c153787185f7a534753edd]
> Revert "st/vdpau: use linear layout for output surfaces"
> 
> The revert reverts from current head, so I will test for a while like that.

I've been running OK with the revert reverted for some time now, testing with a script that repeatedly starts kodi, so it does seem to be this commit that causes some timing issue with vdpau temporal deint.
Comment 5 Christian König 2017-01-04 09:02:57 UTC
Yeah, and that actually makes sense: by using linear layout you also disable DCC.

Does using R600_DEBUG=nodcc help as well?

Marek, please take a look at the backtraces in the first post. It sounds like one thread is getting the handle of a texture, which results in a DCC decompress.

Could it be that we have a race with that?

Regards,
Christian.
Comment 6 Marek Olšák 2017-01-04 10:18:48 UTC
We could get a race if vlVdpOutputSurfaceDMABuf is called from a different thread than the context is normally used with. Can that happen?

You can try to test this:

diff --git a/src/gallium/state_trackers/vdpau/output.c b/src/gallium/state_trackers/vdpau/output.c
index 64574b2..96474fb 100644
--- a/src/gallium/state_trackers/vdpau/output.c
+++ b/src/gallium/state_trackers/vdpau/output.c
@@ -804,7 +804,7 @@ VdpStatus vlVdpOutputSurfaceDMABuf(VdpOutputSurface surface,
    whandle.type = DRM_API_HANDLE_TYPE_FD;
 
    pscreen = vlsurface->surface->texture->screen;
-   if (!pscreen->resource_get_handle(pscreen, vlsurface->device->context,
+   if (!pscreen->resource_get_handle(pscreen, NULL,
                                      vlsurface->surface->texture, &whandle,
                                     PIPE_HANDLE_USAGE_READ_WRITE))
       return VDP_STATUS_NO_IMPLEMENTATION;
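
For readers unfamiliar with the hazard being discussed, here is a minimal sketch of it in plain C with pthreads. It is not Mesa code and not the fix from the later patch series. All names (demo_context, demo_get_handle, demo_run) are hypothetical stand-ins. It only illustrates the two general ways to avoid a race when a handle export can be called from a thread other than the one normally using the context: serialize access to the context with a lock, or pass no context at all so that no shared context state is touched.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum { DEMO_ITERATIONS = 100000 };

/* Hypothetical stand-in for a driver context that is not thread-safe. */
struct demo_context {
    pthread_mutex_t lock;  /* serializes every use of the context */
    long pending_ops;      /* plain, non-atomic state changed by each call */
};

/* Hypothetical export path. With ctx == NULL no context state is touched,
 * so the call is safe from any thread. With a context, the work queued on
 * it (modelled by pending_ops) is done under the lock. */
int demo_get_handle(struct demo_context *ctx)
{
    if (ctx == NULL)
        return 1;
    pthread_mutex_lock(&ctx->lock);
    ctx->pending_ops++;    /* without the lock, two threads could race here */
    pthread_mutex_unlock(&ctx->lock);
    return 1;
}

static void *demo_worker(void *arg)
{
    for (int i = 0; i < DEMO_ITERATIONS; i++)
        demo_get_handle(arg);
    return NULL;
}

/* Runs two threads against one shared context and returns the final count. */
long demo_run(void)
{
    struct demo_context ctx;
    pthread_t threads[2];
    int rc;

    ctx.pending_ops = 0;
    rc = pthread_mutex_init(&ctx.lock, NULL);
    assert(rc == 0);

    for (size_t i = 0; i < 2; i++) {
        rc = pthread_create(&threads[i], NULL, demo_worker, &ctx);
        assert(rc == 0);
    }
    for (size_t i = 0; i < 2; i++) {
        rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
    }

    pthread_mutex_destroy(&ctx.lock);
    return ctx.pending_ops;
}
```

With the lock in place, demo_run() always returns 2 * DEMO_ITERATIONS. Removing the lock/unlock pair makes the increment a data race, the same class of timing-dependent failure as the one suspected here.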
Comment 7 Marek Olšák 2017-01-04 10:49:13 UTC
Nevermind. This series should fix it for VDPAU and VAAPI:

https://patchwork.freedesktop.org/series/17480/
Comment 8 Andy Furniss 2017-01-05 00:00:36 UTC
(In reply to Marek Olšák from comment #7)
> Nevermind. This series should fix it for VDPAU and VAAPI:
> 
> https://patchwork.freedesktop.org/series/17480/

Yea, it's OK with these, thanks.
