Bug 28785

Summary: [945GM] [pf] GPU hang on 2010Q2
Product: xorg
Component: Driver/intel
Version: 7.5 (2009.10)
Hardware: x86 (IA32)
OS: Linux (All)
Status: RESOLVED FIXED
Severity: normal
Priority: medium
Reporter: Vasily Khoruzhick <anarsoul>
Assignee: Carl Worth <cworth>
QA Contact: Xorg Project Team <xorg-team>
CC: cfeck
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description                                   Flags
i915_error_state from debugfs                 none
kdm.log with some errors from intel driver    none
Xorg.0.log                                    none
part of dmesg output                          none

Description Vasily Khoruzhick 2010-06-28 01:09:19 UTC
Created attachment 36560 [details]
i915_error_state from debugfs

I'm getting a GPU hang with the 2010Q2 package set (i.e. kernel-2.6.34, xf86-video-intel-2.12.0, libdrm-2.4.21, mesa-7.8.2) in KDE4 with OpenGL-based effects enabled. There are no special steps to reproduce; the GPU hangs for no visible reason.
Comment 1 Vasily Khoruzhick 2010-06-28 01:10:08 UTC
Created attachment 36561 [details]
kdm.log with some errors from intel driver
Comment 2 Vasily Khoruzhick 2010-06-28 01:10:47 UTC
Created attachment 36562 [details]
Xorg.0.log
Comment 3 Vasily Khoruzhick 2010-06-28 01:11:29 UTC
Created attachment 36563 [details]
part of dmesg output
Comment 4 Chris Wilson 2010-06-28 01:57:47 UTC
Hmm, the batch buffer implicated seems sane. The GPU state looks like it is doing an instruction flush on the ringbuffer prior to updating the seqno. In short, quite baffling.
Comment 5 Vasily Khoruzhick 2010-06-28 02:05:50 UTC
(In reply to comment #4)
> Hmm, the batch buffer implicated seems sane. The GPU state looks like it is
> doing an instruction flush on the ringbuffer prior to updating the seqno. In
> short, quite baffling.

What about the complaint about the memory domain in kdm.log?
Btw, if you need more info, just ask.
Comment 6 Chris Wilson 2010-06-29 00:32:12 UTC
(In reply to comment #5)
> What about the complaint about the memory domain in kdm.log?

That's just a subsequent error after detecting the hung GPU.
Comment 7 Vasily Khoruzhick 2010-07-05 08:26:10 UTC
This patch http://openelec.git.sourceforge.net/git/gitweb.cgi?p=openelec/openelec;a=blob;f=packages/x11/driver/xf86-video-intel/patches/intel-2.11-no-pageflipping.diff seems to fix the problem for me (no hangs for 1.5 days), so this bug is page-flipping related. I'd like to test the driver with this patch for a few days and then close the bug.
Comment 8 Chris Wilson 2010-07-05 08:42:01 UTC
That's a debugging patch Jesse wrote so that the distributions could actually ship a driver while the root cause was being resolved.

Jesse has also pushed a couple of page-flipping patches for the kernel in 2.6.35-rc4 (and a few more to the xserver).

Thanks for identifying the cause as being page-flipping!
Comment 9 Chris Wilson 2010-07-11 07:14:36 UTC
I think the last pair of patches Jesse pushed for 2.6.35-rc4 resolves the remaining page-flip issues on i945:


commit 1afe3e9d4335bf3bc5615e37243dc8fef65dac8f
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Fri Mar 26 10:35:20 2010 -0700

    drm/i915: gen3 page flipping fixes
    
    Gen3 chips have slightly different flip commands, and also contain a bit
    that indicates whether a "flip pending" interrupt means the flip has
    been queued or has been completed.
    
    So implement support for the gen3 flip command, and make sure we use the
    flip pending interrupt correctly depending on the value of ECOSKPD bit
    0.
    
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Eric Anholt <eric@anholt.net>

commit 83f7fd055eb3f1e843803cd906179d309553967b
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Mon Apr 5 14:03:51 2010 -0700

    drm/i915: don't queue flips during a flip pending event
    
    Hardware will set the flip pending ISR bit as soon as it receives the
    flip instruction, and (supposedly) clear it once the flip completes
    (e.g. at the next vblank).  If we try to send down a flip instruction
    while the ISR bit is set, the hardware can become very confused, and we
    may never receive the corresponding flip pending interrupt, effectively
    hanging the chip.
    
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Eric Anholt <eric@anholt.net>
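
The rule described in the second commit (do not send a flip while the flip pending bit is still set) can be illustrated with a small toy model. This is only a sketch in Python, not the kernel change itself: ToyPipe, submit_flip, vblank and queue_flip are hypothetical names that do not come from the i915 driver, and the model assumes the pending bit clears at the next vblank, as the commit message says it is supposed to. The actual patch may wait for the bit to clear rather than return early.

```python
# Toy model of the rule "never submit a new flip while the flip pending
# bit is still set". All names here are hypothetical and are not taken
# from the i915 driver.


class ToyPipe:
    """Models one display pipe with a flip-pending status bit."""

    def __init__(self):
        self.flip_pending = False  # stands in for the flip pending ISR bit
        self.front_buffer = 0      # buffer currently being scanned out
        self._next_buffer = None   # buffer latched by the queued flip

    def submit_flip(self, buffer_id):
        """Hardware view: receiving the flip command sets the pending bit."""
        self.flip_pending = True
        self._next_buffer = buffer_id

    def vblank(self):
        """Hardware view: the flip completes at vblank and the bit clears."""
        if self.flip_pending:
            self.front_buffer = self._next_buffer
            self._next_buffer = None
            self.flip_pending = False


def queue_flip(pipe, buffer_id):
    """Driver view: refuse to queue while a previous flip is in flight.

    Returns True if the flip was submitted, False if the caller has to
    wait for the pending flip to finish and try again.
    """
    if pipe.flip_pending:
        return False
    pipe.submit_flip(buffer_id)
    return True
```

In this model a second queue_flip call issued before vblank is rejected instead of reaching the (simulated) hardware, which is the situation the commit message says can confuse the chip.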
