Bug 90603 - GPU crash dump (Asus Notebook, Intel GMA 4500)

Summary: GPU crash dump (Asus Notebook, Intel GMA 4500)

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	unspecified
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

Reported:	2015-05-23 10:39 UTC by Peter Klotz
Modified:	2017-07-24 22:46 UTC
CC List:	1 user

Attachments
gzipped content of /sys/class/drm/card0/error (216.92 KB, text/plain) 2015-05-23 10:39 UTC, Peter Klotz

Description Peter Klotz 2015-05-23 10:39:41 UTC

Created attachment 115984
gzipped content of /sys/class/drm/card0/error
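
For anyone reproducing a report like this one, an attachment of this kind can be produced by compressing the error state file named in the kernel message. The following is a minimal Python sketch and is not part of the original report: the helper name `save_gpu_error_state` and the output filename `gpu-error.txt.gz` are hypothetical, the card index may differ on other machines, and reading the file typically requires root.

```python
import gzip
import shutil
from pathlib import Path

# Path taken from the kernel message in this report; the card index may differ.
DEFAULT_ERROR_PATH = "/sys/class/drm/card0/error"


def save_gpu_error_state(source=DEFAULT_ERROR_PATH, dest="gpu-error.txt.gz"):
    """Copy the GPU error state to a gzip-compressed file and return its path."""
    # Stream the copy in chunks, since the dump can be large.
    with open(source, "rb") as src, gzip.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return Path(dest)
```

Calling `save_gpu_error_state()` with no arguments reads the default path and writes the compressed copy to the current directory; both paths can be overridden.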

Got this crash dump today:

Mai 23 09:14:07 host kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Mai 23 09:14:07 host kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Mai 23 09:14:07 host kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Mai 23 09:14:07 host kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Mai 23 09:14:07 host kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Mai 23 09:14:07 host kernel: i915: render error detected, EIR: 0x00000010
Mai 23 09:14:07 host kernel: i915:   IPEIR: 0x00000000
Mai 23 09:14:07 host kernel: i915:   IPEHR: 0x02000000
Mai 23 09:14:07 host kernel: i915:   INSTDONE_0: 0xffffffff
Mai 23 09:14:07 host kernel: i915:   INSTDONE_1: 0xbfbbffff
Mai 23 09:14:07 host kernel: i915:   INSTDONE_2: 0x00000000
Mai 23 09:14:07 host kernel: i915:   INSTDONE_3: 0x00000000
Mai 23 09:14:07 host kernel: i915:   INSTPS: 0x8001e035
Mai 23 09:14:07 host kernel: i915:   ACTHD: 0x37609bac
Mai 23 09:14:07 host kernel: i915: page table error
Mai 23 09:14:07 host kernel: i915:   PGTBL_ER: 0x00000001
Mai 23 09:14:07 host kernel: [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
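
The register values in the log above can be pulled out mechanically when comparing several hangs. The sketch below is an illustration only, assuming the line format shown in this report; the function names `parse_registers` and `set_bits` are hypothetical. It lists which bits are set in a register but does not decode what they mean, since that depends on the hardware generation and the driver's register definitions.

```python
import re

# Matches "NAME: 0x..." pairs such as "IPEHR: 0x02000000" or "EIR: 0x00000010".
REGISTER_RE = re.compile(r"\b([A-Z][A-Z0-9_]*): (0x[0-9a-fA-F]+)")


def parse_registers(log_text):
    """Return a dict mapping register names to integer values."""
    registers = {}
    for line in log_text.splitlines():
        # Only consider lines printed with the "i915:" prefix.
        if "i915:" not in line:
            continue
        for name, value in REGISTER_RE.findall(line):
            registers[name] = int(value, 16)
    return registers


def set_bits(value):
    """Return the indices of the bits set in value, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


sample = (
    "Mai 23 09:14:07 host kernel: i915: render error detected, EIR: 0x00000010\n"
    "Mai 23 09:14:07 host kernel: i915:   PGTBL_ER: 0x00000001\n"
)
regs = parse_registers(sample)
print(set_bits(regs["EIR"]))       # [4]
print(set_bits(regs["PGTBL_ER"]))  # [0]
```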

Environment:

  Arch Linux, kernel 3.14.43-2-lts
  libdrm 2.4.61-1
  xorg-server 1.17.1-5

lspci output:

00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. Device 1862
        Flags: bus master, fast devsel, latency 0, IRQ 49
        Memory at fd000000 (64-bit, non-prefetchable) [size=4M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at bc00 [size=8]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 3
        Kernel driver in use: i915
        Kernel modules: i915

00:02.1 Display controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
        Subsystem: ASUSTeK Computer Inc. Device 1862
        Flags: bus master, fast devsel, latency 0
        Memory at fd500000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: [d0] Power Management version 3

Comment 1 Chris Wilson 2015-05-23 12:07:57 UTC

Any other symptoms? That GPU error should have been non-fatal, and I believe it is finally fixed by:

commit 983d308cb8f602d1920a8c40196eb2ab6cc07bd2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jan 26 10:47:10 2015 +0000

    agp/intel: Serialise after GTT updates
    
    An interesting bug occurs on Pineview through which the root cause is
    that the writes of the PTE values into the GTT is not serialised with
    subsequent memory access through the GTT (when using WC updates of the
    PTE values). This is despite there being a posting read after the GTT
    update. However, by changing the address of the posting read, the memory
    access is indeed serialised correctly.
    
    Whilst we are manipulating the memory barriers, we can remove the
    compiler :memory restraint on the intermediate PTE writes knowing that
    we explicitly perform a posting read afterwards.
    
    v2: Replace posting reads with explicit write memory barriers - in
    particular this is advantages in case of single page objects. Update
    comments to mention this issue is only with WC writes.

Comment 2 Chris Wilson 2015-05-23 12:08:18 UTC

Following my hunch.

Comment 3 Peter Klotz 2015-05-23 13:10:33 UTC

There were no other symptoms, just the crash message in the journal.

Thanks for pointing out that the issue is already fixed (and for the fix itself of course).

Regards, Peter.
