Bug 86721

Summary: [GEN4] Graphics driver crashes occasionally
Product: Mesa Reporter: Maarten Jacobs <maarten256>
Component: Drivers/DRI/i965 Assignee: Ian Romanick <idr>
Status: RESOLVED DUPLICATE QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium CC: intel-gfx-bugs
Version: unspecified   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: crash dump (/sys/class/drm/card0/error)

Description Maarten Jacobs 2014-11-26 05:20:38 UTC
Recently I started encountering problems where my screen will go dark at the most inopportune moments.

I gathered the following snippet from /var/log/kern.log:

Nov 25 23:34:06 sony-laptop kernel: [  563.992062] [drm] stuck on render ring
Nov 25 23:34:06 sony-laptop kernel: [  563.992070] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Nov 25 23:34:06 sony-laptop kernel: [  563.992073] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Nov 25 23:34:06 sony-laptop kernel: [  563.992076] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Nov 25 23:34:06 sony-laptop kernel: [  563.992078] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Nov 25 23:34:06 sony-laptop kernel: [  563.992080] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Nov 25 23:34:06 sony-laptop kernel: [  563.993312] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x547c000 ctx 0) at 0x547cff0
Nov 25 23:34:06 sony-laptop kernel: [  564.504051] [drm:i915_reset] *ERROR* Failed to reset chip.
Nov 25 23:34:08 sony-laptop kernel: [  565.892050] [drm] GMBUS [i915 gmbus vga] timed out, falling back to bit banging on pin 2

So in following the instructions I am opening a bug on this :). Since this is not a problem related to Radeon (I appear to have an integrated Intel GPU), most other "similar" bugs are not relevant (or do not appear to be).

Unfortunately I had not saved the crash dump (/sys/class/drm/card0/error) before I rebooted my machine, so the last dump appears to have been overwritten.

I'll attach a dump when the problem occurs again - assuming I can recreate it.

Some information on my system:

$ lscpu
Architecture:          i686
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 15
Stepping:              13
CPU MHz:               800.000
BogoMIPS:              2925.93
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K

$ uname -a
Linux sony-laptop 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:08:14 UTC 2014 i686 i686 i686 GNU/Linux

$ cat /etc/lsb-release
DISTRIB_ID=LinuxMint
DISTRIB_RELEASE=17
DISTRIB_CODENAME=qiana
DISTRIB_DESCRIPTION="Linux Mint 17 Qiana"

Please let me know what other information is required.
Comment 1 Maarten Jacobs 2014-11-26 05:36:07 UTC
I managed to recreate the problem (more on that in a bit). The kern.log again indicates that a crash dump was saved in /sys/class/drm/card0/error (like before). However, a listing of that location shows:

$ ls -l /sys/class/drm/card0/
total 0
drwxr-xr-x 4 root root    0 Nov 26 00:15 card0-LVDS-1
drwxr-xr-x 3 root root    0 Nov 26 00:15 card0-VGA-1
-r--r--r-- 1 root root 4096 Nov 26 00:15 dev
lrwxrwxrwx 1 root root    0 Nov 26 00:15 device -> ../../../0000:00:02.0
-rw------- 1 root root    0 Nov 26 00:15 error
drwxr-xr-x 2 root root    0 Nov 26 00:15 power
lrwxrwxrwx 1 root root    0 Nov 26 00:15 subsystem -> ../../../../../class/drm
-rw-r--r-- 1 root root 4096 Nov 26 00:15 uevent

So maybe it didn't save anything. The machine is still in its "darkened" state.

I recreated the issue by selecting a particular icon on the MS Outlook website in Google Chrome. When I select the icon that is supposed to show me the contact list, the graphics driver goes into this state and the screen goes blank.

I have seen the same issue on other occasions, so I know there are other interactions that can cause the same. However this "contact list" function seems to recreate the issue consistently - at least it did so twice in a row.
Comment 2 Chris Wilson 2014-11-26 07:47:41 UTC
ls lies; /sys/class/drm/card0/error is a virtual file and doesn't have a size until it is opened. Just compress it and attach, after reproducing the hang.
Comment 3 Maarten Jacobs 2014-11-26 16:40:20 UTC
Created attachment 110076 [details]
crash dump (/sys/class/drm/card0/error)
Comment 4 Maarten Jacobs 2014-11-26 16:42:30 UTC
Comment on attachment 110076 [details]
crash dump (/sys/class/drm/card0/error)

I added the crash dump to this bug (I think). I executed:

# gzip -c /sys/class/drm/card0/error > crash_dump.gz
gzip: /sys/class/drm/card0/error: file size changed while zipping

I don't know if the error I got from gzip is relevant or not.

Let me know if this does the trick. If not please send instructions (obviously I'm out of my depth here).
Comment 5 Daniel Vetter 2014-11-27 19:23:18 UTC
For next time around:

cat /sys/class/drm/card0/error | gzip -c  > crash_dump.gz

Everything more intelligent than cat will try to seek around in the file and get pissed about the size being 0.

But it worked.
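
The capture step above can be wrapped in a small helper. This is only a sketch: the function name save_gpu_error_state and its two optional arguments are made up for illustration, and the defaults are simply the paths already used in this report. The error file is readable by root only (see the listing in comment 1), so it would need to be run as root.

```shell
# Hypothetical helper (not part of any driver tooling):
#   save_gpu_error_state [SRC] [DEST]
# Streams the i915 error state through cat so that gzip only ever sees a
# pipe and never tries to stat or seek the zero-sized sysfs file.
save_gpu_error_state() {
    src="${1:-/sys/class/drm/card0/error}"   # path taken from the kernel log message
    dest="${2:-crash_dump.gz}"               # output name used earlier in this report

    # Fail early if the source cannot be read (for example, when not root).
    if [ ! -r "$src" ]; then
        echo "cannot read $src (try running as root)" >&2
        return 1
    fi

    cat "$src" | gzip -c > "$dest"
}
```

Calling save_gpu_error_state with no arguments right after reproducing the hang, and before rebooting, should leave crash_dump.gz in the current directory, ready to attach.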
Comment 6 IxI_JOKER_IxI 2014-11-27 20:12:44 UTC
!!!***
Carry this message to developers with them is simply impossible to be written off to foreigners.
!!!***

Why is there no Ukrainian version of the site libra office ?? Although it has long been translated! Add it please.

You have a very intricate system of reporting bugs (errors) Everything is very closed and difficult! Is it not possible to make a more friendly system? No support for the Ukrainian language (

https://translations.documentfoundation.org/uk/website/multiplanet-uk.po

http://uk.libreoffice.org/ - no open((
Comment 7 Matt Turner 2015-03-06 23:28:40 UTC
I suspect this may be another duplicate of bug 80568, which was fixed (worked around) by this commit:

commit c4fd0c9052dd391d6f2e9bb8e6da209dfc7ef35b
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Sat Jan 17 23:21:15 2015 -0800

    i965: Work around mysterious Gen4 GPU hangs with minimal state changes.
    
    Gen4 hardware appears to GPU hang frequently when using Chromium, and
    also when running 'glmark2 -b ideas'.  Most of the error states contain
    3DPRIMITIVE commands in quick succession, with very few state packets
    between them - usually VERTEX_BUFFERS/ELEMENTS and CONSTANT_BUFFER.
    
    I trimmed an apitrace of the glmark2 hang down to two draw calls with a
    glUniformMatrix4fv call between the two.  Either draw by itself works
    fine, but together, they hang the GPU.  Removing the glUniform call
    makes the hangs disappear.  In the hardware state, this translates to
    removing the CONSTANT_BUFFER packet between the two 3DPRIMITIVE packets.
    
    Flushing before emitting CONSTANT_BUFFER packets also appears to make
    the hangs disappear.  I observed a slowdown in glxgears by doing it all
    the time, so I've chosen to only do it when BRW_NEW_BATCH and
    BRW_NEW_PSP are unset (i.e. we haven't done a CS_URB_STATE change or
    already flushed the whole pipeline).
    
    I'd much rather understand the problem, but at this point, I don't see
    how we'd ever be able to track it down further.  We have no real tools,
    and the hardware people moved on years ago.  I've analyzed 20+ error
    states and read every scrap of documentation I could find.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80568
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85367
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Acked-by: Matt Turner <mattst88@gmail.com>
    Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>

It's in git, and backports are in Mesa 10.4.x for x > 3. Please try upgrading to >10.4.3. If it's resolved by such an upgrade, please mark as a duplicate of bug 80568.
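
A rough way to check whether an installed Mesa is past that point is sketched below. The function name mesa_newer_than_10_4_3 is hypothetical, and the comparison relies on GNU sort's -V (version sort) option being available. It only tests for "newer than 10.4.3", so it says nothing about whether a given 10.3.x release carries the backport.

```shell
# Hypothetical helper: succeeds (exit status 0) when the given Mesa
# version string is strictly newer than 10.4.3.
# Assumes GNU sort, which provides -V for version-aware ordering.
mesa_newer_than_10_4_3() {
    fixed_after="10.4.3"
    version="$1"

    # An identical version is not newer.
    [ "$version" != "$fixed_after" ] || return 1

    # If 10.4.3 sorts first, the given version is the newer of the two.
    lowest=$(printf '%s\n%s\n' "$version" "$fixed_after" | sort -V | head -n 1)
    [ "$lowest" = "$fixed_after" ]
}
```

The version to pass in can usually be read from the "OpenGL version string" line printed by glxinfo, assuming that tool is installed; for example, mesa_newer_than_10_4_3 10.4.4 succeeds while mesa_newer_than_10_4_3 10.4.3 does not.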
Comment 8 Matt Turner 2015-05-14 04:58:30 UTC
No reply. Marking as duplicate.

*** This bug has been marked as a duplicate of bug 80568 ***
