Bug 74193 - [uxa gen4] GPU crash running kernel 3.13
Summary: [uxa gen4] GPU crash running kernel 3.13
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2014-01-29 17:50 UTC by Mathieu
Modified: 2017-07-24 22:56 UTC



Attachments
i915 crash dump (199.65 KB, text/plain)
2014-01-29 17:50 UTC, Mathieu
drm/i915: add reason for capturing the error state (7.91 KB, patch)
2014-02-05 13:41 UTC, Mika Kuoppala

Description Mathieu 2014-01-29 17:50:54 UTC
Created attachment 93008
i915 crash dump

Hello,

I got the attached crash while running kernel version 3.13 (3.13.0 #1 SMP Mon Jan 20 11:36:57 CET 2014 x86_64 x86_64 x86_64 GNU/Linux) with the following Xorg intel driver version:
[    48.945] (II) Module intel: vendor="X.Org Foundation"
[    48.945]    compiled for 1.14.4, module version = 2.21.15
[    48.945]    Module class: X.Org Video Driver
[    48.945]    ABI class: X.Org Video Driver, version 14.1

This is Fedora; the package version is xorg-x11-drv-intel-2.21.15-5.fc20.x86_64.

If you need any more data, feel free to ask.

Cheers,
Matt
Comment 1 Chris Wilson 2014-01-29 18:10:50 UTC
There's no rationale given as to why that GPU dump was captured. Perhaps there is some more information in your dmesg and Xorg.0.log?
Comment 2 Mathieu 2014-01-29 19:14:29 UTC
There's nothing in my Xorg log file.

I mean that if I filter it with grep -v II, there's nothing since my X server started. The server hasn't crashed either:
root       496  0.7  1.4 358280 58744 tty1     Ss+  Jan24  54:21 /usr/bin/Xorg :0 -background none -verbose -auth /run/gdm/auth-for-gdm-hvm5HO/database -seat seat0 -nolisten tcp vt1
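
For anyone wanting to reproduce that check programmatically, the filtering can be approximated in a few lines of C. This is only a sketch: the helper names `is_noteworthy` and `filter_log` are hypothetical, and the code mirrors `grep -v II` literally by dropping any line that contains the substring `II`, which covers the `(II)` informational marker in the Xorg log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if a log line would survive `grep -v II`,
 * i.e. it does not contain the substring "II". */
bool is_noteworthy(const char *line)
{
    assert(line != NULL);
    return strstr(line, "II") == NULL;
}

/* Copy every surviving line from `in` to `out` and return how many
 * were kept. Lines longer than the buffer are processed in chunks,
 * so a very long line may be counted more than once. */
size_t filter_log(FILE *in, FILE *out)
{
    char buf[4096];
    size_t kept = 0;

    assert(in != NULL && out != NULL);
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (is_noteworthy(buf)) {
            fputs(buf, out);
            kept++;
        }
    }
    return kept;
}
```

Calling `filter_log` on an opened Xorg.0.log with `stdout` as the output would print only the warning, error and other non-informational lines, if there are any.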

In dmesg and my kernel logs, the only thing I saw is:

Jan 29 14:26:26 foo kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jan 29 14:26:26 foo kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jan 29 14:26:26 foo kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jan 29 14:26:26 foo kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jan 29 14:26:26 foo kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.


If the crash dump alone isn't enough to work with, I'd say you can go ahead and resolve this ticket.
Comment 3 Chris Wilson 2014-01-29 21:46:45 UTC
The buffer it was writing to at the time of the suspected hang is oddly sized, but the error state indicates that the GPU was neither hung nor had reported an error. I was therefore a little surprised that it decided to report a hang, and had hoped that there was a precursor in the log.

Mika, mind just confirming that the hangcheck code hasn't gone completely mad? And if you cannot spot a problem, let's file this as impossible until proven otherwise.

Mathieu, please keep an eye out for further hangs.
Comment 4 Mathieu 2014-01-30 10:46:33 UTC
I shall be on the lookout.
Comment 5 Mika Kuoppala 2014-02-05 13:41:12 UTC
Created attachment 93447
drm/i915: add reason for capturing the error state
Comment 6 Mika Kuoppala 2014-02-05 13:48:34 UTC
(In reply to comment #3)

> Mika, mind just confirming that the hangcheck code hasn't gone completely
> mad? And if you can not spot a problem, let's file this as impossible until
> proven otherwise.

My hypothesis is that this error state capture was triggered not by
hangcheck but by a command parser error interrupt.

> Mathieu, please keep an eye for further hangs.

If you run into the same hang with the attached patch applied, the
crash dump will contain more clues.
Comment 7 Chris Wilson 2014-02-05 14:12:38 UTC
GPU fault interrupts leave a trail in dmesg and should also be recorded in PGTBL_ER in the dump. That is why I was puzzled: the dump has neither PGTBL_ER nor a hangcheck score.
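
As a rough aid for checking a dump for such a register, here is a minimal C sketch. It assumes, without confirmation from this thread, that registers appear in the dump as lines of the form `NAME: 0x...` starting in the first column. The function name `find_register` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: look for a line starting with "<name>:" in the
 * dump text and parse the hexadecimal value that follows.
 * Assumption: the dump uses "NAME: 0x%08x"-style lines with no
 * leading whitespace. Returns false if no such line exists. */
bool find_register(const char *dump, const char *name, unsigned long *value)
{
    size_t len;
    const char *p;

    assert(dump != NULL && name != NULL && value != NULL);
    len = strlen(name);
    assert(len > 0);

    for (p = dump; (p = strstr(p, name)) != NULL; p += len) {
        bool at_line_start = (p == dump) || (p[-1] == '\n');

        if (at_line_start && p[len] == ':') {
            /* strtoul with base 16 skips whitespace and accepts "0x". */
            *value = strtoul(p + len + 1, NULL, 16);
            return true;
        }
    }
    return false;
}
```

A return value of false for `PGTBL_ER`, or a parsed value of zero, would match the puzzling situation described above.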
Comment 8 Mika Kuoppala 2014-02-06 08:36:01 UTC
In static irqreturn_t i965_irq_handler(int irq, void *arg) we have:

	if (iir & I915_RENDER_COMMAND_PARSER_ERROR_INTERRUPT)
		i915_handle_error(dev, false);

All the other callsites would leave a dmesg trace, but not this one.
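
To make the silent path concrete, here is a small userspace model; it is not kernel code. Every name in it (`MODEL_CMD_PARSER_ERROR`, `struct error_state`, `handle_error`, `irq_handler`) is made up for illustration, and the bit value is an arbitrary placeholder. Passing a reason string only mimics the idea in the title of the attached patch and does not reproduce its implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Placeholder bit for this model only; not taken from the i915 headers. */
#define MODEL_CMD_PARSER_ERROR (1u << 0)

struct error_state {
    bool captured;
    char reason[64]; /* empty string means nobody recorded why */
};

/* Model of an error handler: always captures, and logs only when a
 * reason is supplied. */
void handle_error(struct error_state *st, const char *reason)
{
    assert(st != NULL);
    st->captured = true;
    snprintf(st->reason, sizeof st->reason, "%s", reason ? reason : "");
    if (st->reason[0] != '\0')
        fprintf(stderr, "[model] error state captured: %s\n", st->reason);
}

/* Model of the interrupt path: with `annotate` false the capture is
 * silent, as in the call site quoted above; with `annotate` true the
 * cause is named. */
void irq_handler(struct error_state *st, unsigned int iir, bool annotate)
{
    assert(st != NULL);
    if (iir & MODEL_CMD_PARSER_ERROR)
        handle_error(st, annotate ? "command parser error" : NULL);
}
```

With `annotate` set to false, the model ends up with a captured state and an empty reason, which is the situation the reporter's dump appears to be in.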
Comment 9 Jani Nikula 2014-09-05 12:15:48 UTC
Mathieu, does the problem still persist with recent kernels?
Comment 10 Mathieu 2014-09-07 18:54:56 UTC
(In reply to comment #9)
> Mathieu, does the problem still persist with recent kernels?

It doesn't anymore so I guess this can be marked as resolved.

