Bug 28274

Summary: xscreensaver's glknots hangs GPU (945GME/Pineview)
Product: Mesa Reporter: Stefan Dirsch <sndirsch>
Component: Drivers/DRI/i915 Assignee: Eric Anholt <eric>
Status: RESOLVED DUPLICATE QA Contact:
Severity: normal    
Priority: medium CC: eich, mat, oceans112
Version: unspecified   
Hardware: Other   
OS: All   
See Also: https://bugzilla.novell.com/show_bug.cgi?id=608810
Whiteboard:
i915 platform: i915 features:
Attachments: intel_reg_dumper output
intel_gpu_dump output

Description Stefan Dirsch 2010-05-27 01:26:15 UTC
xscreensaver's glknots hangs the GPU. This occurs on 945GM (8086:27ae), but also on Pineview (8086:a001). It is easy to reproduce (might need two attempts):

/usr/lib/xscreensaver/glknots

==> window remains black, only cursor still updates on screen (even after
    restarting Xserver) ==> requires reboot to fix the issue

Xorg.0.log
[...]
(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
[...]

/var/log/messages:
[...]
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
render error detected, EIR: 0x00000000
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 24709 at 24706)

I don't see that issue with any other GL xscreensaver.
Comment 1 Stefan Dirsch 2010-05-27 01:40:08 UTC
Software stack is:

- libdrm 2.4.17 including
  - libdrm-intel-d1308f4+85fb3e5+e6c136c+04f90a4-getXXX-retry.diff
    * Fix xserver crashes due to race conditions in getResources & co.
      (bnc #590596, 590596, 590596)
  - libdrm-intel-fdcde59-Account-for-potential-pinned-buffers-hogging.patch
    * Fix potential kernel oopses
  - libdrm-commit-4f0f871.diff
    * fixes a bug in the intel support which was causing dramatic
      failures with at least version 2.10 of the xf86-video-intel
      driver (bfo #25475, bfo #25554, bnc #579489)
- xf86-video-intel 2.10.0
- xorg-server 1.6.5
- Mesa 7.7
- Kernel 2.6.32
- DRM kernel drivers (drm/drm_kms_helper/i915) from Kernel 2.6.34rc7
Comment 2 Stefan Dirsch 2010-05-27 01:42:30 UTC
I could try with a newer software stack, but I doubt that this would fix the issue. @Intel: Any chance you could try to reproduce this on one of your Pineview
machines? I mean, Pineview is the current netbook platform from Intel, so it
should still be available to Intel engineers ...
Comment 3 Matthias Hopf 2010-05-27 03:26:51 UTC
Created attachment 35879 [details]
intel_reg_dumper output

intel_reg_snapshot froze the machine completely.
Comment 4 Matthias Hopf 2010-05-27 03:27:53 UTC
Created attachment 35880 [details]
intel_gpu_dump output
Comment 5 Gordon Jin 2010-05-27 18:40:08 UTC
I get different result on my 945GM and Pineview with latest upstream code:

it sometimes works ok, and sometimes aborts with error message:
glknots: intel_batchbuffer.c:164: _intel_batchbuffer_flush: Assertion `used <= batch->buf->size' failed.
Aborted
Comment 6 Stefan Dirsch 2010-05-27 20:37:13 UTC
It could be that we disabled assertions in Mesa, but I can't confirm that at the moment. I might be mixing this up with a different library or another part of the X stack.
Comment 7 Matthias Hopf 2010-05-28 09:10:19 UTC
(In reply to comment #5)
> I get different result on my 945GM and Pineview with latest upstream code:
> 
> it sometimes works ok, and sometimes aborts with error message:
> glknots: intel_batchbuffer.c:164: _intel_batchbuffer_flush: Assertion `used <=
> batch->buf->size' failed.
> Aborted

It looks like you're running a composition manager - we get the same error (apart from it being in line 165) when run under mutter.

Can you please retry without a composition manager?
Comment 8 Chris Wilson 2010-05-31 02:02:56 UTC
commit 8accf0a891c85c7d747c5f7f4a4d8a99adb91b2a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon May 31 09:58:25 2010 +0100

    intel: Initialize batch->reserved_space on allocation
    
    Fixes the assert (and buffer overrun):
    
      glknots: intel_batchbuffer.c:164: _intel_batchbuffer_flush: Assertion
      `used <= batch->buf->size' failed.
    
    Reported in bug:
    
      Bug 28274 - xscreensaver's glknots hangs GPU (945GME/Pineview)
      https://bugs.freedesktop.org/show_bug.cgi?id=28274

I suspect this is the cause of the GPU hang as well.
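
For readers following along, here is a rough sketch of how a wrong reserved_space value could trip the assertion quoted above. It is not the Mesa code: struct batch, TRAILER_BYTES and every function name below are hypothetical, and the "stale" value is simply modelled as 0. It also assumes the space check subtracts reserved_space from the buffer size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical trailer size: bytes the flush path appends to every batch. */
#define TRAILER_BYTES 8

/* Hypothetical, simplified batch buffer; not Mesa's real structure. */
struct batch {
    size_t size;            /* total bytes in the buffer */
    size_t used;            /* bytes written so far */
    size_t reserved_space;  /* bytes held back for the flush trailer */
};

/* Space check done before each write: callers may fill the buffer only
 * up to size - reserved_space, which leaves room for the trailer. */
bool batch_has_room(const struct batch *b, size_t bytes)
{
    return b->used + bytes <= b->size - b->reserved_space;
}

/* Pretend to emit 4-byte commands until the space check refuses. */
size_t batch_fill(struct batch *b)
{
    while (batch_has_room(b, 4))
        b->used += 4;
    return b->used;
}

/* Same shape as the reported assertion: would the trailer still fit? */
bool batch_flush_ok(const struct batch *b)
{
    return b->used + TRAILER_BYTES <= b->size;
}

/* Flush: append the trailer and start over; aborts on an overrun. */
void batch_flush(struct batch *b)
{
    assert(batch_flush_ok(b));
    b->used = 0;
}
```

With reserved_space set to TRAILER_BYTES at allocation, batch_fill() on a 64-byte batch stops at 56 bytes and the flush check passes. With reserved_space left at 0, the same loop fills all 64 bytes and the trailer no longer fits, which is the overrun the assertion catches.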
Comment 9 Stefan Dirsch 2010-05-31 03:08:39 UTC
Chris, in which git repo can we find that commit? It isn't really obvious to us.
Comment 10 Matthias Hopf 2010-05-31 03:23:08 UTC
It's in mesa master.

Chris, I'm thankful that you dug into this issue :-)

Still, this can only be understood as a workaround. In the long term the kernel must be fixed, though this shouldn't be your concern.

A user space program MUST NOT be able to crash hardware, otherwise this is a DoS security issue. The intel drm implementation still seems to lack proper input validation...
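
To illustrate what is meant by input validation, here is a minimal sketch of a range check on a submitted batch. It is purely hypothetical: struct submit_req, its fields and submit_req_valid() are made up for this example, it does not describe what the i915 DRM code does, and checks like these would not by themselves guarantee that the GPU cannot hang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical submission request; field names are illustrative only. */
struct submit_req {
    uint64_t buf_size;  /* size of the buffer object in bytes */
    uint64_t start;     /* offset of the first command in the buffer */
    uint64_t len;       /* number of command bytes to execute */
};

/* Reject requests whose command range is empty, misaligned, or not
 * fully contained in the buffer object. */
bool submit_req_valid(const struct submit_req *r)
{
    if (r->len == 0)
        return false;                     /* nothing to execute */
    if ((r->start | r->len) & 3)
        return false;                     /* this sketch assumes 4-byte commands */
    if (r->start > r->buf_size)
        return false;                     /* start lies outside the buffer */
    if (r->len > r->buf_size - r->start)
        return false;                     /* range check that cannot overflow */
    return true;
}
```

The last check is written as a subtraction rather than start + len > buf_size so that a huge start or len cannot wrap around and pass.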
Comment 11 Tomas M. 2010-05-31 04:24:32 UTC

*** This bug has been marked as a duplicate of bug 26645 ***
Comment 12 Chris Wilson 2010-06-30 01:47:52 UTC
*** Bug 28195 has been marked as a duplicate of this bug. ***
