Created attachment 104646
contents of /sys/class/drm/card0/error
I'm getting frequent GPU hangs. Google Chrome appears to be either the cause or a "catalyst" of the bug, because the hangs occur after I have a lot of Chrome tabs open. I'm running Debian "sid". On Linux 3.14 the hangs caused a lot of problems: after a hang I was unable to start Chrome until the system was rebooted. After upgrading to Linux 3.16, I get the following message in the log (but Chrome remains usable without a reboot):
[101452.978884] [drm] GPU HANG: ecode -1:0x00000000, reason: Command parser error, iir 0x07fcc000, action: continue
[101452.978888] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[101452.978890] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[101452.978891] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[101452.978892] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[101452.978894] [drm] GPU crash dump saved to /sys/class/drm/card0/error
I'm attaching the contents of /sys/class/drm/card0/error
Hopefully someone will be able to fix this nasty problem.
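For anyone who wants to pull these messages out of a long dmesg log, here is a minimal Python sketch that splits the "GPU HANG" line into its fields. The field layout is inferred only from the single line quoted above, so the regular expression and the helper name parse_gpu_hang are assumptions, not something taken from the i915 driver.

```python
import re

# Pattern for the "[drm] GPU HANG" line as it appears in this report.
# Assumption: other kernels may format this line differently.
HANG_RE = re.compile(
    r"\[\s*(?P<timestamp>\d+\.\d+)\]\s+\[drm\]\s+GPU HANG:\s+"
    r"ecode\s+(?P<ecode>\S+),\s+"
    r"reason:\s+(?P<reason>.*?),\s+"
    r"iir\s+(?P<iir>0x[0-9a-fA-F]+),\s+"
    r"action:\s+(?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG log line as a dict, or None if no match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["timestamp"] = float(fields["timestamp"])  # seconds since boot
    fields["iir"] = int(fields["iir"], 16)            # register value as an integer
    return fields


if __name__ == "__main__":
    sample = (
        "[101452.978884] [drm] GPU HANG: ecode -1:0x00000000, "
        "reason: Command parser error, iir 0x07fcc000, action: continue"
    )
    print(parse_gpu_hang(sample))
```

Feeding it each line of the dmesg output and keeping the non-None results gives a list of hangs with their reason and action, which makes it easier to compare behaviour across kernel versions.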
Hm, a command parser error on Cantiga... have you tried different versions of the i965 DRI driver in Mesa? Or are you sure it's related to a kernel upgrade? If so, a bisect would really help narrow things down.
I'm using the package libgl1-mesa-dri version 10.2.5-1 from Debian sid. I
can try a different version if you let me know which one I should try.
I'm not at all sure what the cause of this bug is (or whether it is related
to a kernel upgrade). I've been observing this problem for several months
now. It is usually triggered after opening many tabs in Google Chrome (I'm
using Chrome unstable builds, such as 38.0.2125). On Linux 3.14 and below,
the problem manifested as a segfault in the Google Chrome process, after
which Chrome was still running but stuck (its windows became unresponsive).
Restarting Chrome did not help, as it remained unresponsive; I needed a
reboot to restore normal operation. Since upgrading to Linux 3.16, however,
Chrome remains responsive even after a segfault, and I can still use it
(restarting it also works). I don't know where this GPU HANG problem comes
from.
With the hang attached, there really is no cause for it to generate an error - and PGTBL_ER is 0 as well. Most odd. So wrt that hang, I would run memtest overnight and see if there are any faulty RAM cells.
Sounds like a memory issue. Vladimir, please re-open if memtest passes and the system looks otherwise healthy but you still see errors. In that case, please attach an updated error log.