Bug 31401 - [bisected XFree86 GPU hang] Running ut2004 causes a GPU hang
Summary: [bisected XFree86 GPU hang] Running ut2004 causes a GPU hang
Status: VERIFIED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: All Linux (All)
Importance: medium normal
Assignee: Carl Worth
QA Contact: Xorg Project Team
Reported: 2010-11-04 18:45 UTC by wang,jinjin
Modified: 2010-11-07 22:25 UTC



Attachments
i915_error_state (933.83 KB, text/plain)
2010-11-04 19:19 UTC, wang,jinjin
Xorg.0.log (29.45 KB, text/plain)
2010-11-04 19:21 UTC, wang,jinjin

Description wang,jinjin 2010-11-04 18:45:38 UTC
System Environment:
Platform:         Piketon
Libdrm:           (master)2.4.22-12-ga52e61b5c888444435929a2770f14109c3a94f2f
Mesa:             (master)d3fcadf8400360f4db45a4deb45b3b260e880b49
Xserver:          (master)xorg-server-1.9.0-184-ga52efb096e166e325deb3d6b502671f339a4fa15
Xf86_video_intel: (master)2.12.902-43-g52b32436b9e14a3e13818f80102150ff5bc3c002
Cairo:            (master)84a7fe8a5c5326d77b0954be439799202e947d6b
Kernel:           (drm-intel-next)46168f39360f419e59952d58cd08a862886ec8cd

Detailed description:
-----------------------------------
After ut2004 finished running, I found GPU hang messages in dmesg.
Before starting ut2004, I had cleared the kernel log with "dmesg -c", so the hang must have been caused by ut2004.
I bisected it and found that the first bad commit was:
commit 8ff37667bf864b771d16a58fc5041cb48408b6a8
Author: Eric Anholt <eric@anholt.net>
Date:   Tue Nov 2 10:36:03 2010 -0700

    Remove the intermittent GEM_THROTTLE call.

Information
--------------------------------------
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  129 (XFree86-VidModeExtension)
  Minor opcode of failed request:  10 (XF86VidModeSwitchToMode)
  Value in failed request:  0x400014
  Serial number of failed request:  203
  Current serial number in output stream:  205
Dmesg:
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 645545 at 645035, next 645546)
[drm:drm_crtc_helper_set_config] *ERROR* failed to set mode on [CRTC:3]

Reproduce steps:
----------------
1. xinit &
2. ut2004-bin 'dm-rankin?spectatoronly=1?numbots=12?quickstart=1?attractcam=1' -benchmark -seconds=77 -ini=/usr/local/games/ut2004demo/Benchmark/default_xpk.ini -exec=../Benchmark/Stuff/botmatchexec.txt
Comment 1 wang,jinjin 2010-11-04 19:19:25 UTC
Created attachment 40057
i915_error_state
Comment 2 wang,jinjin 2010-11-04 19:21:15 UTC
Created attachment 40058
Xorg.0.log
Comment 3 Chris Wilson 2010-11-05 02:10:27 UTC
This is the classic GPU hang on modeset [0x01820000: MI_WAIT_FOR_EVENT] (we switch framebuffers whilst there is still pending rendering in the pipeline). The throttling would have kept that pipeline small enough that we only rarely hit this race before. However, drm-intel-next was supposed to contain magic to fix this without resorting to a GPU reset.

Can you trigger the hang again and attach the contents of /sys/kernel/debug/dri/0/i915_ringbuffer_info?
Comment 4 wang,jinjin 2010-11-05 02:30:00 UTC
I also ran Urban Terror on pk2, and it causes a GPU hang too. OpenArena does not show the issue.

After either Urban Terror or ut2004 finishes running, i915_ringbuffer_info shows:
Ring render ring:
  Head :    00000268
  Tail :    00000268
  Size :    00020000
  Active :  00000268
  Control : 0001f003
  Start :   02001000
Comment 5 Chris Wilson 2010-11-05 02:42:35 UTC
*Sigh*. Another example of the documentation not matching reality.

"RB_WAIT: Indicates that this ring has executed a WAIT_FOR_EVENT instruction and is currently waiting."

Obviously this does not apply if the command was executed via a batchbuffer. Time for plan B.
Comment 6 Chris Wilson 2010-11-05 03:10:02 UTC
I applied:


commit a44a63d2ff6c01c3dc61de6f736dd441ddd25e52
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Nov 5 09:58:45 2010 +0000

    Wait for any pending rendering before switching modes.
    
    A perennial problem we have is the accursed WAIT_FOR_EVENT hangs, which
    occur when we switch the framebuffer before the WAIT_FOR_EVENT completes
    and upsets the GPU.
    
    We have tried more subtle approaches to detect these and fix them up in
    the kernel, to no avail. What we need to do is to delay the framebuffer
    flip until the WAIT completes, which is quite tricky in the kernel
    without new ioctls and round-trips. Instead, apply the big hammer from
    userspace and synchronise all rendering before changing the framebuffer.
    I expect this not to cause noticeable latency on switching modes (far
    less than the actual modeswitch) and should stop these hangs once and
    for all.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31401 (...)
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

to xf86-video-intel, which I think should prevent these hangs (once and for all!)
Comment 7 wang,jinjin 2010-11-07 22:24:46 UTC
I verified with the newest commit; the issue no longer occurs.

