Bug 81548

Summary: [HSW] suspend/resume sometimes leaves DRI crippled
Product: DRI Reporter: Tobias Jakobi <liquid.acid>
Component: DRM/Intel Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED INVALID QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: freedesktop, intel-gfx-bugs
Version: XOrg git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
full backtrace (with debug) none

Description Tobias Jakobi 2014-07-19 19:35:23 UTC
Hello,

I got a new laptop, which is still dealing with some "teething problems". After some suspend/resume cycles, DRI support gets crippled in the following sense:
For example, glxgears just displays a black screen and doesn't print any status output. So it's not that it renders something which is then not shown on the screen; something is blocking. Checking dmesg and the Xorg log didn't turn up anything useful: no GPU hang detected or anything else out of the ordinary. I'll check again the next time it happens.

I should be using DRI3 here. Also using PRIME (a dedicated Radeon R9 M265X) shows the same results, no rendering.

Software stack:
xorg-server-1.16.0
mesa git tip
libdrm git tip
vanilla 3.15.6

CPU is an i7-4700HQ, which should make the GPU a hsw/gen7.5 (?)

Uploading dmesg and Xorg log later.

Any ideas how to debug this, should it happen again? I thought about just attaching gdb to glxgears and checking where it's looping for starters.
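
A minimal sketch of the gdb approach mentioned above, assuming gdb is installed and the user is allowed to trace the process. The helper names `gdb_backtrace_cmd` and `dump_backtraces` are made up for illustration; `-p`, `-batch` and `-ex` are standard gdb command-line options.

```python
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a gdb command line that attaches to `pid`, prints the backtrace
    of every thread, and exits (hypothetical helper, not from the report)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtraces(pid):
    """Run gdb against a live process and return its textual output."""
    result = subprocess.run(gdb_backtrace_cmd(pid), capture_output=True, text=True)
    return result.stdout
```

Calling `dump_backtraces()` with the PID of the stuck glxgears process should show which call each thread is blocked in.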


With best wishes,
Tobias
Comment 1 Tobias Jakobi 2014-07-19 19:40:05 UTC
xf86-video-intel: git tip as well
Comment 2 Chris Wilson 2014-07-19 20:00:08 UTC
step 1: disable DRI3
Comment 3 Tobias Jakobi 2014-07-19 20:32:28 UTC
So, you want to know if this also happens with DRI2?
Comment 4 Chris Wilson 2014-07-19 20:46:22 UTC
Yes, the behaviour you describe sounds like broken userspace, and we know that DRI3/Present has a few issues.
Comment 5 Tobias Jakobi 2014-07-28 22:29:23 UTC
OK, the issue happened again.

Using LIBGL_DRI3_DISABLE=1 works, so it's really DRI3 specific.
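
As a rough sketch, the workaround amounts to setting that variable in the environment of the GL client before it starts. The helper names below are hypothetical; only the `LIBGL_DRI3_DISABLE=1` variable itself comes from this report.

```python
import os
import subprocess


def env_without_dri3(base_env=None):
    """Return a copy of the environment with LIBGL_DRI3_DISABLE=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["LIBGL_DRI3_DISABLE"] = "1"
    return env


def run_with_dri3_disabled(argv):
    """Launch `argv` (for example ["glxgears"]) with DRI3 disabled."""
    return subprocess.Popen(argv, env=env_without_dri3())
```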

I attached gdb, here's the bt:
#0  0x00007fa40950507d in poll () from /lib64/libc.so.6
#1  0x00007fa4084f5c22 in ?? () from /usr/lib64/libxcb.so.1
#2  0x00007fa4084ff811 in xcb_wait_for_special_event ()
   from /usr/lib64/libxcb.so.1
#3  0x00007fa40a80babd in ?? () from /usr/lib64/libGL.so.1
#4  0x00007fa40a80c802 in ?? () from /usr/lib64/libGL.so.1
#5  0x00007fa3fd6d54dd in ?? () from /usr/lib64/dri/i965_dri.so
#6  0x00007fa3fd6d5a83 in ?? () from /usr/lib64/dri/i965_dri.so
#7  0x00007fa3fd6da9c6 in ?? () from /usr/lib64/dri/i965_dri.so
#8  0x00007fa3fd57143d in ?? () from /usr/lib64/dri/i965_dri.so
#9  0x0000000000439f7e in ?? ()
#10 0x000000000043a576 in ?? ()
#11 0x000000000043aa2f in ?? ()
#12 0x000000000049d469 in ?? ()
#13 0x000000000049f8b8 in ?? ()
#14 0x0000000000416029 in ?? ()
#15 0x00007fa409445bf5 in __libc_start_main () from /lib64/libc.so.6
#16 0x000000000041653d in ?? ()

Yeah, this is not terribly good (most of my stack isn't built with debug symbols), but from the looks of it, this seems similar to bug #81623

libxcb is git tip though, so the fix proposed in the other bug doesn't work for me (it doesn't appear to work for the WebKit issue either).
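
With a backtrace this sparse, it can help to filter out the frames gdb could not resolve. The sketch below is only an illustration: `symbolized_frames` is a made-up helper, and the pattern handles plain C symbol names like the ones above, not C++ signatures.

```python
import re

# Matches gdb frames such as "#2  0x... in xcb_wait_for_special_event ()".
# Unresolved frames ("?? ()") do not match because "?" is not a valid start.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?([A-Za-z_][\w@.]*) \(")


def symbolized_frames(backtrace):
    """Return (frame_number, function_name) for frames gdb could resolve."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames
```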
Comment 6 Tobias Jakobi 2014-08-16 23:53:20 UTC
Looks like suspend/resume is only one way to trigger this. I can get it to "lock up" more reliably by using HL2.

Then thread 2 (MatQueue0), which looks to be the rendering thread, gets stuck at:
#0  0xf7768d10 in __kernel_vsyscall ()
#1  0xf739afbc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0xf763396c in pthread_cond_wait () from /lib/libc.so.6
#3  0xf7125507 in ?? () from /usr/lib/libxcb.so.1
#4  0xf7131965 in xcb_wait_for_special_event () from /usr/lib/libxcb.so.1
#5  0xf74e9be7 in ?? () from /usr/lib/libGL.so.1
#6  0xf74ea3e3 in ?? () from /usr/lib/libGL.so.1
#7  0xf74eb058 in ?? () from /usr/lib/libGL.so.1
#8  0xf43bd36b in ?? () from /usr/lib/dri/radeonsi_dri.so
#9  0xf43ba240 in ?? () from /usr/lib/dri/radeonsi_dri.so
#10 0xf42f7578 in ?? () from /usr/lib/dri/radeonsi_dri.so
#11 0xf42f8760 in ?? () from /usr/lib/dri/radeonsi_dri.so
#12 0xf42c3e3d in ?? () from /usr/lib/dri/radeonsi_dri.so
#13 0xf42ca145 in ?? () from /usr/lib/dri/radeonsi_dri.so
#14 0xf4173697 in ?? () from /usr/lib/dri/radeonsi_dri.so
#15 0xf589fe13 in ?? ()
   from /mnt/extern/superNova/steam-native/SteamApps/common/Half-Life 2/bin/libtogl.so
Comment 7 Olivier Blin 2014-08-20 15:15:43 UTC
This looks like bug 81623, for which there is a fix in the xcb present extension.
Comment 8 Tobias Jakobi 2014-08-20 15:30:07 UTC
presentproto is git master tip here, so no, this is not fixed.
Comment 9 Tobias Jakobi 2014-09-08 16:46:01 UTC
Updated to vanilla 3.16.2 but the issue is still present.

I can still reliably trigger it with HL2, but only in fullscreen mode. When in window mode it works perfectly.
Comment 10 Tobias Jakobi 2014-09-23 14:06:48 UTC
I enabled a bit of debugging in mesa and got a much better backtrace:
#0  0xf771cd10 in __kernel_vsyscall ()
(gdb) bt
#0  0xf771cd10 in __kernel_vsyscall ()
#1  0xf7348c8c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0xf75e55cc in pthread_cond_wait () from /lib/libc.so.6
#3  0xf70b751f in ?? () from /usr/lib/libxcb.so.1
#4  0xf70c13a5 in xcb_wait_for_special_event () from /usr/lib/libxcb.so.1
#5  0xf7498b49 in dri3_find_back (c=c@entry=0xa410000, priv=priv@entry=0x9fe23f0) at dri3_glx.c:1191
#6  0xf749932b in dri3_get_buffer (format=4099, buffer_type=buffer_type@entry=dri3_buffer_back, loaderPrivate=loaderPrivate@entry=0x9fe23f0, 
    driDrawable=<optimized out>) at dri3_glx.c:1217
#7  0xf7499fd0 in dri3_get_buffers (driDrawable=0xa455110, format=4099, stamp=0x9e706e0, loaderPrivate=0x9fe23f0, buffer_mask=<optimized out>, 
    buffers=0xb8ce6bec) at dri3_glx.c:1394
#8  0xf43a4cf3 in dri_image_drawable_get_buffers (statts_count=<optimized out>, statts=<optimized out>, images=<optimized out>, drawable=<optimized out>)
    at dri2.c:254
#9  dri2_allocate_textures (ctx=0xa455590, drawable=0x9e706e0, statts=0x9ff2b80, statts_count=2) at dri2.c:377
#10 0xf43a199c in dri_st_framebuffer_validate (stctx=0xa579000, stfbi=0x9e706e0, statts=0x9ff2b80, count=2, out=0xb8ce6cc0) at dri_drawable.c:83
#11 0xf42d0378 in st_framebuffer_validate (stfb=stfb@entry=0x9ff2800, st=st@entry=0xa579000) at ../../src/mesa/state_tracker/st_manager.c:200
#12 0xf42d1bb7 in st_manager_validate_framebuffers (st=st@entry=0xa579000) at ../../src/mesa/state_tracker/st_manager.c:862
#13 0xf4292bf6 in st_validate_state (st=st@entry=0xa579000) at ../../src/mesa/state_tracker/st_atom.c:180
#14 0xf429a6e5 in st_BlitFramebuffer (ctx=0x9ef0000, srcX0=0, srcY0=0, srcX1=1920, srcY1=1080, dstX0=0, dstY0=1080, dstX1=1920, dstY1=0, mask=16384, 
    filter=9728) at ../../src/mesa/state_tracker/st_cb_blit.c:94
#15 0xf41358c3 in _mesa_BlitFramebuffer (srcX0=0, srcY0=0, srcX1=1920, srcY1=1080, dstX0=0, dstY0=1080, dstX1=1920, dstY1=0, mask=16384, filter=9728)
    at ../../src/mesa/main/blit.c:509
#16 0xf5831e13 in ?? () from /mnt/extern/superNova/steam-native/SteamApps/common/Half-Life 2/bin/libtogl.so
#17 0xf5832617 in ?? () from /mnt/extern/superNova/steam-native/SteamApps/common/Half-Life 2/bin/libtogl.so
#18 0xf5823c3b in IDirect3DDevice9::Present(_RECT const*, _RECT const*, void*, RGNDATA const*) ()
   from /mnt/extern/superNova/steam-native/SteamApps/common/Half-Life 2/bin/libtogl.so
#19 0xecd6646d in ?? () from /mnt/extern/superNova/steam-native/SteamApps/common/Half-Life 2/bin/shaderapidx9.so
#20 0xf14edfde in ?? () from /mnt/extern/superNova/steam-native/SteamApps/common/Half-Life 2/bin/materialsystem.so
#21 0xf14ee31d in ?? () from /mnt/extern/superNova/steam-native/SteamApps/common/Half-Life 2/bin/materialsystem.so
#22 0xf14ce644 in ?? () from /mnt/extern/superNova/steam-native/SteamApps/common/Half-Life 2/bin/materialsystem.so
#23 0xf14d684a in ?? () from /mnt/extern/superNova/steam-native/SteamApps/common/Half-Life 2/bin/materialsystem.so
#24 0xf14d6697 in ?? () from /mnt/extern/superNova/steam-native/SteamApps/common/Half-Life 2/bin/materialsystem.so
#25 0xf5d0b49a in ?? () from /mnt/extern/superNova/steam-native/SteamApps/common/Half-Life 2/bin/libvstdlib.so
#26 0xf5df29e0 in CThread::ThreadProc(void*) () from /mnt/extern/superNova/steam-native/SteamApps/common/Half-Life 2/bin/libtier0.so
#27 0xf7344f45 in start_thread () from /lib/libpthread.so.0
#28 0xf75d88ee in clone () from /lib/libc.so.6

------------------------------------------------------------

So it looks like the "present idle notify" event from X never arrives here, and xcb_wait_for_special_event then waits forever.
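
The failure mode can be pictured with a toy model. This is not libxcb's implementation, just a sketch with an assumed helper `wait_for_event` built on Python's `queue` module: an unbounded wait on a queue that never receives the expected event never returns, while a bounded wait gives up and reports nothing.

```python
import queue


def wait_for_event(events, timeout=None):
    """Toy stand-in for a blocking event wait.

    With timeout=None this blocks until an event is queued; if none is ever
    queued it never returns. With a timeout it returns None once time is up.
    """
    try:
        return events.get(timeout=timeout)
    except queue.Empty:
        return None


events = queue.Queue()
# No "idle notify" has been queued, so a bounded wait comes back empty.
missing = wait_for_event(events, timeout=0.05)

# Once the event is delivered, the same wait returns it right away.
events.put("idle-notify")
delivered = wait_for_event(events, timeout=0.05)
```

Calling `wait_for_event(events)` without a timeout on an empty queue would hang, which is the situation the backtrace shows in `dri3_find_back`.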
Comment 11 Tobias Jakobi 2014-09-23 14:09:50 UTC
Created attachment 106739 [details]
full backtrace (with debug)
Comment 12 Tobias Jakobi 2014-09-23 14:54:59 UTC
And here's the rest of debug backtrace (into libxcb):
#0  0xf7737d10 in __kernel_vsyscall ()
No symbol table info available.
#1  0xf7364c8c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
No symbol table info available.
#2  0xf76015cc in pthread_cond_wait () from /lib/libc.so.6
No symbol table info available.
#3  0xf70c4fb4 in _xcb_conn_wait (c=0x9f7c000, cond=0x9b51148, vector=0x0, 
    count=0x0)
    at /var/tmp/portage/x11-libs/libxcb-9999/work/libxcb-9999/src/xcb_conn.c:447
        ret = 167231500
        fd = {fd = 0, events = 0, revents = 0}
#4  0xf70c76d8 in xcb_wait_for_special_event (c=c@entry=0x9f7c000, 
    se=0x9b51130)
    at /var/tmp/portage/x11-libs/libxcb-9999/work/libxcb-9999/src/xcb_in.c:716
        event = 0x0
#5  0xf74b4b49 in dri3_find_back (c=c@entry=0x9f7c000, 
    priv=priv@entry=0x9b4e3f0) at dri3_glx.c:1191
        b = <optimized out>
        ev = <optimized out>
        ge = <optimized out>
Comment 13 Tobias Jakobi 2014-09-23 16:08:37 UTC
keithp pointed me in the right direction on IRC. Switching AccelMethod from SNA to UXA removes the issue, so this is most likely an effect of bug #81551.

Please note that the SNA patch proposed there is already applied, so it doesn't fix my issue.
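
For reference, a sketch of how the AccelMethod switch could be expressed as an xorg.conf Device section, assuming the xf86-video-intel driver. The function name and the Identifier string are arbitrary choices for illustration and are not taken from the reporter's configuration.

```python
def intel_device_section(accel_method, identifier="Intel Graphics"):
    """Render an xorg.conf Device section selecting the intel driver's
    acceleration backend (the identifier is an arbitrary, assumed label)."""
    lines = [
        'Section "Device"',
        f'    Identifier "{identifier}"',
        '    Driver "intel"',
        f'    Option "AccelMethod" "{accel_method}"',
        'EndSection',
    ]
    return "\n".join(lines) + "\n"
```

`print(intel_device_section("uxa"))` produces a section that could be placed in a file under the X server's configuration directory.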
Comment 14 Tobias Jakobi 2014-09-23 16:35:58 UTC
Since this is no longer about suspend/resume (as mentioned in the description), I'm closing this one and opening a new (cleaned up) one.
