Bug 27035 - [vblank, suspend/resume] glxgears window black after resuming (S3 and S4) or switching VT back
Status: VERIFIED FIXED
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: All Linux (All)
Importance: high major
Assigned To: Jesse Barnes
Depends on:
Blocks:
Reported: 2010-03-12 00:13 UTC by fangxun
Modified: 2010-07-19 04:21 UTC (History)

See Also:


Attachments
Screenshot showing the problem (87.69 KB, image/jpeg)
2010-03-12 00:13 UTC, fangxun
Xorg log (43.37 KB, text/plain)
2010-03-12 00:14 UTC, fangxun
dmesg_after_resume (38.73 KB, text/plain)
2010-03-12 00:15 UTC, fangxun
disable page flipping but leave events (1007 bytes, patch)
2010-04-12 10:16 UTC, Jesse Barnes
don't sync redirected windows (562 bytes, patch)
2010-06-24 15:37 UTC, Jesse Barnes
another approach to avoiding client hangs at VT switch time (2.33 KB, patch)
2010-06-28 17:03 UTC, Jesse Barnes
keep DRI2 clients suspended at VT switch (3.94 KB, patch)
2010-06-28 17:04 UTC, Jesse Barnes

Description fangxun 2010-03-12 00:13:29 UTC
Created attachment 33978 [details]
Screenshot showing the problem

Platform:       G45
Mesa:           (7.8)54af54277a7a469ed2b9821ef6ed7ed464381f91
Xserver:        (master)f2eacb4646beb25d055de22868f93e6b24f229b6
Xf86_video_intel:(master)318aa9ed799197810e2039dbe3ec51559dcc888c
Libdrm:         (master)04fd3872ee8bd8d5e2c27740508c67c2d51dbc11
Kernel:  (master)60b341b778cc2929df16c0a504c91621b3c6a4ad


Bug detailed description:
-------------------------
Start glxgears on a GNOME desktop, then do S4 (suspend to disk and resume). After the system is restored, glxgears stops printing its FPS output and its window is black. It works fine under plain X (without starting GNOME). The issue happens on all platforms. It is a regression: it works fine with the code from January 18th. I will bisect it next week.


Reproduce steps:
----------------
1. Start X and gnome-session
2. Run glxgears
3. echo disk > /sys/power/state
4. Press the power button to resume
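
For anyone re-running this, the steps above can be partly scripted. The following is only a rough sketch and not part of the original report: the helper names `count_fps_lines` and `gears_recovered` and the log path are hypothetical, and it assumes glxgears writes its periodic "N frames in 5.0 seconds = X FPS" lines to stdout as usual.

```sh
#!/bin/sh
# Sketch only: helper names and the log path are illustrative assumptions.

# Print how many FPS report lines glxgears has written to a log file so far.
count_fps_lines() {
    # A missing log counts as zero reports.
    [ -f "$1" ] || { echo 0; return 0; }
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
    grep -c 'FPS' "$1" || true
}

# Succeed if the log gained new FPS lines since an earlier count.
# $1: log file, $2: count taken before suspend
gears_recovered() {
    log=$1
    before=$2
    after=$(count_fps_lines "$log")
    [ "$after" -gt "$before" ]
}

# Example use (as root, on the test machine; this suspends the system):
#   glxgears > /tmp/glxgears.log 2>&1 &
#   sleep 15
#   before=$(count_fps_lines /tmp/glxgears.log)
#   echo disk > /sys/power/state      # step 3; returns after resume
#   sleep 15
#   gears_recovered /tmp/glxgears.log "$before" || echo "glxgears stalled after resume"
```

The check only looks at whether new FPS lines appear after resume, which matches the "stops printing" symptom; the black window still has to be confirmed by eye.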
Comment 1 fangxun 2010-03-12 00:14:54 UTC
Created attachment 33979 [details]
Xorg log
Comment 2 fangxun 2010-03-12 00:15:28 UTC
Created attachment 33980 [details]
dmesg_after_resume
Comment 3 fangxun 2010-03-17 03:03:08 UTC
Bisect result:

An Xf86_video_intel commit caused this problem.
The last good commit is 4902f546be19e3d5bb47f6c75e2199dc4856c0f4.
After that commit, glxgears failed because of a DRI2 issue until commit 1a76fa5574e8e8f88ac3518a4e4494e1af301dc1. The issue can be reproduced at commit 1a76fa5574e8e8f88ac3518a4e4494e1af301dc1, so I think it is the first bad commit.

commit 1a76fa5574e8e8f88ac3518a4e4494e1af301dc1
Author: Keith Packard <keithp@keithp.com>
Date:   Fri Jan 29 23:28:46 2010 -0800

    Initialize DRI2 info rec version 4 list of driver names

    With DRI2 supporting multiple subsystems, the video driver must
    initialize the list of driver names instead of just passing the single
    driver name used by Mesa. Without this, the X server will fail to
    initialize DRI2 as the numDrivers field in this structure will be
    uninitialized.

    Signed-off-by: Keith Packard <keithp@keithp.com>
Comment 4 Ian Romanick 2010-03-29 18:24:06 UTC
(In reply to comment #3)

I think this commit is a red herring.  It looks like this patch just re-enables the DRI2 paths in the driver.  My guess is that the bug actually lies there.  I also suspect that it has the same root cause as bug #27040 and/or bug #27190.  There is a patch series referenced in those bugs.  I'd like to see this bug tested with that patch series.
Comment 5 fangxun 2010-03-30 00:13:48 UTC
Tested with the patch series; it still fails.

By the way, in recent testing we found that S3 and switching back from console mode may also leave the glxgears window blank.
Comment 6 David Härdeman 2010-03-31 00:49:01 UTC
I've seen the same behaviour for at least 4 - 5 months (from the first time I tested it). Also, my computer doesn't blank the OpenGL app...it reboots (G45 graphics on a DG45FC motherboard), so I'm not sure this is a regression.

Anyways, my bug report (which is probably a dupe of this one) is bug #26451


Comment 7 Jesse Barnes 2010-04-02 08:50:40 UTC
After resume, do you see interrupts coming in for the i915 device (just grep i915 /proc/interrupts)?  It would be good to see where glxgears is blocked in the server too, what was the last request it sent before the hang?
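
A one-off grep shows the counters but not whether they are still moving. A minimal sketch of sampling them twice is below; the function names are made up for illustration, and it assumes the usual /proc/interrupts layout (IRQ number, one count per CPU, then the controller and device names). If the interrupt line is shared with another device, a rising count does not prove the interrupts are from the GPU.

```sh
#!/bin/sh
# Sketch only: function names are illustrative, not from any existing tool.

# Sum the per-CPU counters on every /proc/interrupts line that mentions i915.
# $1: file to read (defaults to the live /proc/interrupts).
i915_irq_total() {
    awk '/i915/ {
        # Field 1 is the IRQ number; the per-CPU counts follow until the
        # first non-numeric field (the interrupt controller name).
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
    }
    END { print total + 0 }' "${1:-/proc/interrupts}"
}

# Succeed if the i915 count grows over an interval (default 2 seconds).
i915_irqs_advancing() {
    first=$(i915_irq_total)
    sleep "${1:-2}"
    second=$(i915_irq_total)
    [ "$second" -gt "$first" ]
}

# Example use after resume:
#   i915_irqs_advancing && echo "i915 interrupts still arriving"
```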
Comment 8 fangxun 2010-04-06 02:47:52 UTC
After resume, I do see interrupts coming in for the i915 device. Backtraces follow:
glxgears Backtrace:
#0  0x00000030de2d4f38 in poll () from /lib64/libc.so.6
#1  0x00007fba53aba88a in _xcb_conn_wait (c=0x1ce9b20, cond=<value optimized out>, vector=0x0, count=0x0) at xcb_conn.c:306
#2  0x00007fba53abc8fc in xcb_wait_for_reply (c=0x1ce9b20, request=2153, e=0x7fff6cd52bb8) at xcb_in.c:390
#3  0x00007fba5444362f in _XReply (dpy=0x1ce9010, rep=0x7fff6cd52c20, extra=0, discard=0) at xcb_io.c:454
#4  0x00007fba547b5e13 in DRI2GetBuffersWithFormat (dpy=0x1ce9010, drawable=<value optimized out>, width=0x1cfbc84, height=0x1cfbc88, attachments=0x7fff6cd52d20, count=2,
    outCount=0x7fff6cd52d5c) at dri2.c:441
#5  0x00007fba547b4729 in dri2GetBuffersWithFormat (driDrawable=<value optimized out>, width=0x1cfbc84, height=0x1cfbc88, attachments=<value optimized out>,
    count=<value optimized out>, out_count=0x7fff6cd52d5c, loaderPrivate=0x1cfbb90) at dri2_glx.c:444
#6  0x00007fba53116cca in intel_update_renderbuffers (context=<value optimized out>, drawable=0x1cfbc50) at intel_context.c:252
#7  0x00007fba53117313 in intel_prepare_render (intel=0x1d023e0) at intel_context.c:395
#8  0x00007fba531359e0 in brw_try_draw_prims (max_index=<value optimized out>, min_index=<value optimized out>, ib=<value optimized out>, nr_prims=<value optimized out>,
    prim=<value optimized out>, arrays=<value optimized out>, ctx=<value optimized out>) at brw_draw.c:340
#9  brw_draw_prims (max_index=<value optimized out>, min_index=<value optimized out>, ib=<value optimized out>, nr_prims=<value optimized out>,
    prim=<value optimized out>, arrays=<value optimized out>, ctx=<value optimized out>) at brw_draw.c:441
#10 0x00007fba531f4fc5 in vbo_exec_DrawArrays (mode=6, start=0, count=4) at vbo/vbo_exec_array.c:525
#11 0x00007fba53274354 in _mesa_meta_Clear (ctx=0x1d023e0, buffers=0) at drivers/common/meta.c:1466
#12 0x00007fba53115a47 in intelClear (ctx=0x1d023e0, mask=<value optimized out>) at intel_clear.c:182
#13 0x000000000040290e in draw () at glxgears.c:252
#14 0x00000000004031af in draw_gears () at glxgears.c:314
#15 draw_frame () at glxgears.c:339
#16 event_loop () at glxgears.c:689
#17 main () at glxgears.c:769

X server Backtrace:
#0  0x00000030de2d6f53 in __select_nocancel () from /lib64/libc.so.6
#1  0x000000000046b1bb in WaitForSomething (pClientsReady=0x38b56b0)
    at WaitFor.c:229
#2  0x0000000000429128 in Dispatch () at dispatch.c:375
#3  0x00000000004217c5 in main (argc=2, argv=0x7fff629107c8,
    envp=<value optimized out>) at main.c:286
Comment 9 Gordon Jin 2010-04-11 20:43:45 UTC
promoting to P1.

Jesse, can you reproduce?
Comment 10 fangxun 2010-04-12 01:41:40 UTC
After some investigation, we found that this issue disappears if page flipping is disabled.
Comment 11 David Härdeman 2010-04-12 04:13:13 UTC
(In reply to comment #10)
> With research we find this issue disappear if pageflip is disabled.

How do you disable pageflip? Option "PageFlip" "false" in xorg.conf seems to be ignored...
Comment 12 fangxun 2010-04-12 04:41:32 UTC
I disabled page flipping in drmmode_display.c (xf86_video_intel component) by forcing has_flipping to 0:

--- a/src/drmmode_display.c
+++ b/src/drmmode_display.c
@@ -1461,6 +1461,7 @@ Bool drmmode_pre_init(ScrnInfoPtr scrn, int fd, int cpp)
        gp.value = &has_flipping;
        (void)drmCommandWriteRead(intel->drmSubFD, DRM_I915_GETPARAM, &gp,
                                  sizeof(gp));
+       has_flipping=0;
        if (has_flipping) {
                xf86DrvMsg(scrn->scrnIndex, X_INFO,
                           "Kernel page flipping support detected, enabling\n");
Comment 13 Jesse Barnes 2010-04-12 10:16:41 UTC
Created attachment 34921 [details] [review]
disable page flipping but leave events

Can you try this patch instead?  It should disable page flipping but leave the vblank event code in place, which could narrow down the problem.
Comment 14 fangxun 2010-04-12 19:42:09 UTC
With your patch, this issue still happens.
Comment 15 Jesse Barnes 2010-04-13 09:19:27 UTC
Ok, so that means it's probably related to the vblank event code.  Thanks for the update.
Comment 16 Jesse Barnes 2010-05-10 11:46:39 UTC
Current 2D driver has some workarounds for vblank event handling & suspend/resume, can you test again with the latest bits?
Comment 17 fangxun 2010-05-13 03:59:19 UTC
Tested on G45 with current bits. It still fails.
Comment 18 Jesse Barnes 2010-06-01 09:54:31 UTC
The X 1.8 branch just got some fixes for issues like this, I'm retesting now with the latest bits to see if I can reproduce.
Comment 19 Jesse Barnes 2010-06-01 10:22:53 UTC
Works for me now on GM45 with current X server master (with the autoconf patch applied) & xf86-video-intel (with the patch from bug 28252 applied).
Comment 20 fangxun 2010-06-04 02:57:47 UTC
Tested on Piketon and GM45. With compiz enabled, it still fails after resuming (S3 and S4) or switching VT back. With compiz disabled, it works when switching VT back but still fails after resuming (S3 and S4).
Comment 21 zhao jian 2010-06-07 03:31:48 UTC
Jesse, it does not seem to work on all platforms. We tested with the newest kernel from the for-linus branch and current code for the other components. It works well on Piketon, but it still fails on GM45. It can't be tested on G45 because the newest kernel gives a black screen at boot, as bug #27733 shows. So I am reopening this bug to track it until it works well on G45 and GM45.
Comment 22 Jesse Barnes 2010-06-07 10:28:39 UTC
Which versions were you running?  GM45 worked for me with compiz; I didn't see any failures.
Comment 23 zhao jian 2010-06-07 18:13:51 UTC
(In reply to comment #22)
> Which versions were you running?  GM45 worked for me with compiz; I didn't see
> any failures.

I tested with the kernel from the for-linus branch (e3a815fcd38043b8f1bb526123d8ab6ae01deb77). The other components were as follows:
Libdrm:         (master)73a42a645201a85ce2fe4fc77754df67e5097fc9
Mesa:           (master)31a74a6df77daea9084c34b86f217f23a55e6b91
Xserver:                (master)5d4e2c594059ffb536c8e506c2623320d3c6a787
Xf86_video_intel:       (master)6db1e5231b7a0e79611f771d4efea686f7849e04
Comment 24 Jesse Barnes 2010-06-24 09:41:45 UTC
If you still see this can you capture some more information?  If you can VT switch after resume, you can probably ssh in as well and gdb the server or glxgears to see what they're waiting for.  If they're stuck in a "poll" or "select" call, please check the /proc/<pid>/wchan file to see what kernel mutex they're waiting on.
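
In case it helps, here is a small sketch for collecting that over ssh. The function name and the `PROC_ROOT` override are hypothetical (the override exists only so the helper can be pointed at a copy of /proc for testing). Note that the wchan file holds the name of the kernel function the task is sleeping in, or 0 if it is not blocked, and on some kernels reading it for another user's process needs root.

```sh
#!/bin/sh
# Sketch only: show_wait_state and PROC_ROOT are illustrative names.
PROC_ROOT=${PROC_ROOT:-/proc}

# Print "<pid> <state> <wchan>" for one process, with "?" for anything
# that cannot be read (process gone, insufficient permissions).
show_wait_state() {
    pid=$1
    # The State: line of /proc/<pid>/status looks like "State:  S (sleeping)".
    state=$(awk '/^State:/ { print $2 }' "$PROC_ROOT/$pid/status" 2>/dev/null) || state=
    # wchan names the kernel function the task is blocked in.
    wchan=$(cat "$PROC_ROOT/$pid/wchan" 2>/dev/null) || wchan=
    printf '%s %s %s\n' "$pid" "${state:-?}" "${wchan:-?}"
}

# Example use (the server binary may be called X or Xorg on your system):
#   for pid in $(pidof Xorg glxgears); do show_wait_state "$pid"; done
```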
Comment 25 Jesse Barnes 2010-06-24 12:24:34 UTC
Ah I see this issue on my GM45 now with master of everything.  Checking it out...
Comment 26 Jesse Barnes 2010-06-24 14:36:26 UTC
Seems I can reproduce it with a simple VT switch too, so something is wrong with the way X consumes DRM events.

Also, ignore comment #24; I see you already collected backtraces.
Comment 27 Jesse Barnes 2010-06-24 15:37:14 UTC
Created attachment 36481 [details] [review]
don't sync redirected windows

I don't know why yet, but somehow running under compiz causes this problem.  If both clients and the compositor are using events, when you VT switch back the client hangs.

This patch worked around the problem for me, can you confirm?
Comment 28 fangxun 2010-06-25 01:56:52 UTC
Yes, I confirm this patch fixes the problem.
Comment 29 Jesse Barnes 2010-06-28 17:03:43 UTC
Created attachment 36587 [details] [review]
another approach to avoiding client hangs at VT switch time

Here's a server patch that should also fix the problem, closer to the root cause this time.
Comment 30 Jesse Barnes 2010-06-28 17:04:14 UTC
Created attachment 36588 [details] [review]
keep DRI2 clients suspended at VT switch

This one isn't strictly necessary, but makes DRI2 behave like GLX across VT switch.
Comment 31 Jesse Barnes 2010-06-30 10:56:43 UTC
Bug fixed in X master:

commit 28e33ae6f69f716ece5d68e63fc52557236c5f6e
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Wed Jun 30 07:59:04 2010 -0700

    OS support: fix writeable client vs IgnoreClient behavior

I'll request that it go into the 1.8 branch as well.
Comment 32 fangxun 2010-07-19 04:21:51 UTC
Works fine with current code, so marking it as verified.