Bug 32288 - [bisected piketon] changing resolution cause ut2004 to hang, with VGA and compiz enabled
Status: VERIFIED FIXED
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: All Linux (All)
Importance: high major
Assigned To: Chris Wilson
Depends on:
Blocks:
Reported: 2010-12-09 23:35 UTC by fangxun
Modified: 2011-06-09 03:43 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg file (33.88 KB, text/plain)
2010-12-09 23:35 UTC, fangxun
dmesg with 'echo t > /proc/sysrq-trigger' (123.16 KB, text/plain)
2010-12-12 23:32 UTC, fangxun
ut2004.txt is dmesg information with drm.debug=0xe on sugarbay (124.48 KB, text/plain)
2011-03-20 20:14 UTC, bo.b.wang

Description fangxun 2010-12-09 23:35:34 UTC
Created attachment 40982 [details]
dmesg file

System Environment:
--------------------------
Arch:             x86_64
Platform:         piketon
Libdrm:           (master)2.4.22-21-g537703fd4805e9cd352965fce642670986822d22
Mesa:             (master)05e534e6c4395269b1ca3a9694a1f437363dd186
Xserver:          (server-1.9-branch)xorg-server-1.9.2.902
Xf86_video_intel: (master)2.13.901-25-g9b967807c2d240488a715509649663aac3583532
Kernel:           (drm-intel-fixes) 1b39d6f37622f1da70aa2cfd38bfff9a52c13e05


Bug detailed description:
------------------------
The game hangs when changing the resolution or exiting the game. It is not a GPU hang. The issue does not occur if compiz is disabled. It is a kernel regression.
Backtrace:
#0  0x0000003c102d7dd8 in poll () from /lib64/libc.so.6
#1  0x00007fcd39cdb87a in _xcb_conn_wait (c=0x1a19ea0, cond=<value optimized out>, vector=0x0, count=0x0) at xcb_conn.c:306
#2  0x00007fcd39cdd57a in xcb_wait_for_event (c=0x1a19ea0) at xcb_in.c:437
#3  0x00007fcd3a146bc8 in _XReadEvents (dpy=0x1a18980) at xcb_io.c:342
#4  0x00007fcd3a13420f in XMaskEvent (dpy=0x1a18980, mask=131072, event=0x7fffaf26e170) at MaskEvent.c:75
#5  0x00007fcd3a74d3dc in ?? ()
#6  0x0000000001a25370 in ?? ()
#7  0x00007fcd3a14685f in _XFreeReplyData (dpy=0x1a18390, rep=0x7fffaf26e170, extra=0, discard=1) at xcb_io.c:490
#8  _XReply (dpy=0x1a18390, rep=0x7fffaf26e170, extra=0, discard=1) at xcb_io.c:648
#9  0x00007fcd3a142013 in XSync (dpy=0x1a23dc0, discard=0) at Sync.c:44
#10 0x0000000090000002 in ?? ()
#11 0x0000000001a18790 in ?? ()
#12 0x0000000001a18390 in ?? ()
#13 0x0000000090000002 in ?? ()
#14 0x00007fcd3a74fcb6 in ?? ()
#15 0x0000000000000000 in ?? ()

Bisect shows the first bad commit is 1b39d6f37622f1da70aa2cfd38bfff9a52c13e05.
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Mon Dec 6 11:20:45 2010 +0000
Commit:     Chris Wilson <chris@chris-wilson.co.uk>
CommitDate: Tue Dec 7 22:46:11 2010 +0000

    drm/i915/dp: Only apply the workaround if the select is still active

    As we may try to power down the link at various times, it is not
    necessarily still coupled with an encoder and so we must be careful not
    to depend upon an operation that is only valid when the link is still
    attached to a pipe.

    Fixes regression in 5bddd17.


Reproduce steps:
----------------
1. run ut2004
2. exit the game or change resolution in game "settings".
Comment 1 Chris Wilson 2010-12-10 03:03:37 UTC
You're doing better than me, I'm hitting the glx DrawableGone crash in the xserver first...
Comment 2 Chris Wilson 2010-12-10 04:59:17 UTC
Can you please do a 'echo t > /proc/sysrq-trigger' and grab the dmesg? Do you see a similar hang with just changing the resolution using xrandr?
Comment 3 fangxun 2010-12-12 23:32:59 UTC
Created attachment 41052 [details]
dmesg with 'echo t > /proc/sysrq-trigger'

I don't see a similar hang with just changing the resolution using xrandr.
Comment 4 Chris Wilson 2010-12-13 04:13:45 UTC
It was a long shot. With lots of silly processes adding to the noise, we only managed to capture that compiz was idle (in poll()) and not what X was doing. Though it should be safe to conclude that ut2004 itself had finished.
Comment 5 Chris Wilson 2010-12-14 04:22:37 UTC
I think I uncovered a related bug on drm-intel-next, where we are waiting for an IRQ with interrupts disabled during modesetting.

The only question is how much of the fix is also applicable to -fixes, and how it might relate to this bug/bisection?
Comment 6 Chris Wilson 2010-12-15 04:28:56 UTC
How widespread is the regression? Have you seen similar failures on the other stable platforms? (After reverting the PIPE_CONTROL removal) Do you see a similar failure on drm-intel-next (which in theory has the related bug fix)?
Comment 7 fangxun 2010-12-16 01:59:29 UTC
Tested on 965gm and capella with the stable kernel (drm-intel-fixes): I don't see similar failures. Tested on piketon with the unstable kernel (drm-intel-next) after reverting the PIPE_CONTROL removal: a similar failure happens. BTW, with drm-intel-next (8d5203ca62539c6ab36a5bc2402c2de1de460e30), i.e. before reverting the PIPE_CONTROL removal, a similar failure also happens.
Comment 8 Chris Wilson 2010-12-16 04:09:52 UTC
Thanks, what's the display connected to the piketon? Any DP?
Comment 9 Chris Wilson 2010-12-16 08:17:32 UTC
I've not reproduced this so far on any platform, the closest to piketon I have is Arrandale+LVDS.
Comment 10 Chris Wilson 2010-12-17 01:34:30 UTC
And I don't see it on SNB either. ;-)
Comment 11 fangxun 2010-12-17 02:26:28 UTC
VGA connected on piketon. I see the similar failures on SNB with VGA connected.
But with DP connected, it works fine both on piketon and SNB.
Comment 12 Chris Wilson 2010-12-17 03:08:42 UTC
* scratches head.

I've got a VGA panel hooked up to the SNB as well. The bisection simply makes no sense, can I ask you to double check?
Comment 13 Chris Wilson 2010-12-17 07:03:47 UTC
Even just to confirm that a revert of 1b39d6f3 fixes ut2004.
Comment 14 fangxun 2010-12-20 01:02:16 UTC
Confirmed that the revert fixes ut2004.
Comment 15 Chris Wilson 2010-12-20 12:38:08 UTC
Any chance this is related to:

commit 541cc966915b6756e54c20eebe60ae957afdb537
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Dec 6 11:24:07 2010 +0000

    drm: Don't try and disable an encoder that was never enabled

    Prevents code that assumes that the encoder is active when asked to be
    disabled from dying a horrible death.

    Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Dave Airlie <airlied@redhat.com>

i.e. does reverting that (which has been identified to cause other modesetting failures) help?
Comment 16 fangxun 2010-12-21 01:05:14 UTC
After reverting 541cc966915b6756e54c20eebe60ae957afdb537, the failure still happens.
Comment 17 Gordon Jin 2011-01-24 22:12:55 UTC
Chris, do you need access to this machine?
Comment 18 Chris Wilson 2011-03-02 02:16:20 UTC
I'm still no wiser as to why a change on what should be an unused code path (for this system) would be causing this regression. Nevertheless there have been the usual bug fixes, and in particular the uninterruptible modesetting fix.
Comment 19 fangxun 2011-03-04 00:35:03 UTC
Retested with the latest code; it still happens on piketon and SNB.
Comment 20 Chris Wilson 2011-03-20 04:08:30 UTC
(In reply to comment #19)
> Retest with latest code, it still happens on piketon and SNB.

Do Jesse's modesetting checks detect anything amiss? Can you update the dmesg (with drm.debug=0xe) and include one for the SNB?

I'm still at a loss as to the cause here - this code should not even be touched for your non-DP system configuration! :|
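
For readers unfamiliar with the drm.debug=0xe value requested above: it is a bitmask of debug categories. The following is a minimal sketch that decodes such a mask. The category names in the table are assumptions recalled from the kernel's DRM debug categories and may differ between kernel versions (bit 0x08 in particular has been used for different categories over time); `decode_drm_debug` is a hypothetical helper, not a kernel or libdrm API.

```python
# Assumed mapping of drm.debug bits to category names. These names are an
# assumption for illustration and may differ between kernel versions.
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "mode",  # assumption: later kernels reuse this bit for another category
}


def decode_drm_debug(mask):
    """Return the assumed category names whose bits are set in mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    # 0xe == 0x2 | 0x4 | 0x8, so bit 0x1 is not set.
    print(decode_drm_debug(0xE))
```

Under these assumptions, 0xe enables every category in the table except the lowest bit (0x01).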
Comment 21 bo.b.wang 2011-03-20 20:14:21 UTC
Created attachment 44643 [details]
ut2004.txt is dmesg information with drm.debug=0xe on sugarbay
Comment 22 Chris Wilson 2011-03-21 00:13:17 UTC
So this is probably the cause of the SNB behaviour:

[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 94239, at 94239], missed IRQ?

I suspect the SNB has a completely different bug to piketon and should be filed separately.
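
A minimal sketch for scanning a saved dmesg for the hangcheck message quoted above. The regular expression is built only from that one quoted line, so the exact format on other kernels is an assumption, and `find_hangcheck_idle` is a hypothetical helper name. When the "waiting on" and "at" numbers are equal, the ring appears to have already reached the awaited value, which is consistent with the "missed IRQ?" hint in the message.

```python
import re

# Pattern derived from the single dmesg line quoted in this report; other
# kernel versions may format the message differently (assumption).
HANGCHECK_RE = re.compile(
    r"\[drm:i915_hangcheck_ring_idle\] \*ERROR\* Hangcheck timer elapsed\.\.\. "
    r"(?P<ring>\w+) ring idle \[waiting on (?P<waiting>\d+), at (?P<at>\d+)\]"
)


def find_hangcheck_idle(dmesg_text):
    """Return (ring, waiting_on, at) tuples for each matching dmesg line."""
    hits = []
    for line in dmesg_text.splitlines():
        match = HANGCHECK_RE.search(line)
        if match:
            hits.append(
                (match.group("ring"), int(match.group("waiting")), int(match.group("at")))
            )
    return hits


if __name__ == "__main__":
    sample = (
        "[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... "
        "blt ring idle [waiting on 94239, at 94239], missed IRQ?"
    )
    for ring, waiting, at in find_hangcheck_idle(sample):
        # Equal values suggest the awaited point was reached without a wakeup.
        print(ring, waiting, at, "possible missed IRQ" if at >= waiting else "")
```

To run it against a real log, read the file contents (for example the attached ut2004.txt) and pass the text to `find_hangcheck_idle`.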
Comment 23 Gordon Jin 2011-03-23 18:56:24 UTC
Bo, please file a separate bug for SNB.
Comment 24 bo.b.wang 2011-03-23 19:07:28 UTC
(In reply to comment #23)
> Bo, please file a separate bug for SNB.

I have reported a new bug. Bug number: 35535
Comment 25 Chris Wilson 2011-05-05 00:10:30 UTC
*crosses fingers*

Is this fixed by:

commit 31acbcc408f412d1ba73765b846c38642be553c3
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Apr 17 06:38:35 2011 +0100

    drm/i915/dp: Be paranoid in case we disable a DP before it is attached
    
    Given that the hardware may be left in a random condition by the BIOS,
    it is conceivable that we then attempt to clear the DP_PIPEB_SELECT bit
    without us ever enabling/attaching the DP encoder to a pipe. Thus
    causing a NULL deference when we attempt to wait for a vblank on that
    crtc.
    
    Reported-and-tested-by: Bryan Christ <bryan.christ@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36314
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36456
    Reported-and-tested-by: Bo Wang <bo.b.wang@intel.com>
    Cc: stable@kernel.org
    Signed-off-by: Keith Packard <keithp@keithp.com>
Comment 26 fangxun 2011-05-06 03:40:33 UTC
Tested on piketon and huronriver, no issue happens.  I need more testing to confirm this.
Comment 27 Gordon Jin 2011-06-08 23:20:56 UTC
Closing.
Comment 28 fangxun 2011-06-09 03:43:06 UTC
Verified with drm-intel-next commit da3cc9202697a44057c1bd3ad685689375f1fe0c and drm-intel-fixes commit 2fb4e61d9471867677c97bf11dba8f1e9dfa7f7c.