Created attachment 93008 [details]
i915 crash dump

Hello,

Running kernel version 3.13 (3.13.0 #1 SMP Mon Jan 20 11:36:57 CET 2014 x86_64 x86_64 x86_64 GNU/Linux) with the Xorg intel driver version:

[ 48.945] (II) Module intel: vendor="X.Org Foundation"
[ 48.945] compiled for 1.14.4, module version = 2.21.15
[ 48.945] Module class: X.Org Video Driver
[ 48.945] ABI class: X.Org Video Driver, version 14.1

This is Fedora, package version xorg-x11-drv-intel-2.21.15-5.fc20.x86_64. I got the attached crash.

If you need any more data, feel free to ask.

Cheers,
Matt

There's no rationale given as to why that GPU dump was captured. Perhaps there is some more information in your dmesg and Xorg.0.log?
There's nothing in my Xorg log file: if I grep -v II, there's nothing since my X server started. It hasn't crashed either:

root 496 0.7 1.4 358280 58744 tty1 Ss+ Jan24 54:21 /usr/bin/Xorg :0 -background none -verbose -auth /run/gdm/auth-for-gdm-hvm5HO/database -seat seat0 -nolisten tcp vt1

In dmesg and my kernel logs, the only thing I saw is:

Jan 29 14:26:26 foo kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jan 29 14:26:26 foo kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jan 29 14:26:26 foo kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jan 29 14:26:26 foo kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jan 29 14:26:26 foo kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.

If you cannot do anything with just the crash dump, I'd say you can go ahead and resolve this ticket.

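The `grep -v II` check above can be made a little more precise by keying on the severity marker the X server puts on each log line. What follows is only a rough sketch: it assumes the usual marker legend printed at the top of Xorg.0.log ((II) informational, (==) default, (--) probed, (**) config file, (++) command line, with (WW), (EE) and (!!) being the ones worth reading). The function name `interesting_lines` is made up for illustration. Unlike `grep -v II`, it also drops continuation lines that carry no marker at all.

```python
import re

# Markers treated as routine noise; anything else (WW, EE, !!, ...) is kept.
# Assumption: the standard legend from the top of Xorg.0.log.
ROUTINE = {"II", "==", "--", "**", "++"}

# A marked log line looks like: "[    48.945] (II) Module intel: ..."
LINE_RE = re.compile(r"^\[\s*\d+\.\d+\]\s+\((..)\)\s")


def interesting_lines(lines):
    """Return the marked lines whose marker is not in ROUTINE."""
    kept = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(1) not in ROUTINE:
            kept.append(line.rstrip("\n"))
    return kept
```

To use it, pass an open file, for example `interesting_lines(open("/var/log/Xorg.0.log", errors="replace"))`. The log path is an assumption and differs between distributions and display managers.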
The buffer it was writing to at the time of the suspected hang is oddly sized, but the error state indicates that the GPU was not hung nor had reported an error, so I was a little surprised that it decided to report a hang and had hoped that there was a precursor in the log. Mika, mind just confirming that the hangcheck code hasn't gone completely mad? And if you cannot spot a problem, let's file this as impossible until proven otherwise. Mathieu, please keep an eye out for further hangs.
I shall be on the lookout.
Created attachment 93447 [details] [review] drm/i915: add reason for capturing the error state
(In reply to comment #3)
> Mika, mind just confirming that the hangcheck code hasn't gone completely
> mad? And if you can not spot a problem, let's file this as impossible until
> proven otherwise.

My hypothesis is that this is not a hangcheck-triggered error state capture but a command parser error interrupt triggering one.

> Mathieu, please keep an eye for further hangs.

Running into the same hang with the attached patch applied will leave more clues in the crash dump.

The GPU fault interrupts leave a trail in dmesg and should also be recorded in PGTBL_ER in the dump. That is why I was puzzled: the dump has neither PGTBL_ER nor a hangcheck score.
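For anyone wanting to repeat that check on their own dump, here is a minimal sketch. It assumes the error state lists registers one per line as `NAME: 0x<hex>` (for example `PGTBL_ER: 0x00000000`); the exact layout varies between kernel versions, and the helper names `read_registers` and `fault_recorded` are hypothetical, not part of any i915 tooling.

```python
import re

# Assumed layout: one register per line, e.g. "PGTBL_ER: 0x00000000".
REG_RE = re.compile(r"^\s*([A-Z][A-Z0-9_]*):\s*0x([0-9a-fA-F]+)\s*$")


def read_registers(text):
    """Map register name -> integer value for every 'NAME: 0x...' line."""
    regs = {}
    for line in text.splitlines():
        m = REG_RE.match(line)
        if m:
            # Keep the first occurrence; later sections may repeat a name.
            regs.setdefault(m.group(1), int(m.group(2), 16))
    return regs


def fault_recorded(regs, names=("PGTBL_ER",)):
    """True if any of the named registers is present and non-zero."""
    return any(regs.get(name, 0) != 0 for name in names)
```

With the dump path from the kernel message in this report, that would be `fault_recorded(read_registers(open("/sys/class/drm/card0/error").read()))`. Reading that file usually requires root.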
In

static irqreturn_t i965_irq_handler(int irq, void *arg)

we have

	if (iir & I915_RENDER_COMMAND_PARSER_ERROR_INTERRUPT)
		i915_handle_error(dev, false);

All the other callsites would leave a dmesg trace, but not this one.

Mathieu, does the problem still persist with recent kernels?
(In reply to comment #9)
> Mathieu, does the problem still persist with recent kernels?

It doesn't anymore, so I guess this can be marked as resolved.