| Field | Value |
|---|---|
| Summary | [drm] GPU HANG: ecode 9:0:0x85dffffb, in compiz [1755], reason: Ring hung, action: reset |
| Product | DRI |
| Component | DRM/Intel |
| Status | CLOSED FIXED |
| Severity | normal |
| Priority | medium |
| Version | XOrg git |
| Hardware | Other |
| OS | Linux (All) |
| Whiteboard | |
| i915 platform | SKL |
| i915 features | GPU hang |
| Reporter | john.stultz |
| Assignee | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| QA Contact | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| CC | abby_millward, anowlcalledjosh, intel-gfx-bugs, k.vrban |
Description
john.stultz
2016-04-14 00:23:57 UTC
Created attachment 122906 [details]
full dmesg
Created attachment 122907 [details]
/sys/class/drm/card0/error log
Oh, this is on my NUC6i5SYH with the latest (v42) BIOS. However, it started happening with the v28 BIOS the system came with. The odd part is that this only cropped up around the start of the month. Prior to that, the system had worked well for months. I've tried jumping back to an older kernel, but that didn't help. I don't see any userspace updates that would have affected the system between the time it was working fine and when it wasn't. If I don't log in, and switch to the console VT, I can use the system w/o issue. But since I need the system for graphical use, it's sort of a brick now.

Also, using the 15.10 live-cd, I've triggered the same problem. Which seems odd, as I don't recall seeing it when I originally installed the machine. I've also run memcheck and didn't find any issues there, so it doesn't seem like the memory in the system has suddenly gone bad.

Using i915.enable_rc6=0 doesn't seem to solve the issue. Nor does intel_pstate=disable. Sometimes those allow the system to run for longer w/o the graphical corruption, but when a hang occurs it's a hard hang, and I can't recover.

Tried the drm-nightly kernel here: http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/2016-04-13-wily/

No real change. Still hangs quickly after logging in.

Just tried booting the Ubuntu 16.04 live-usb installation media. After logging in I see the same blocky graphic glitches and long stalls. The warning in dmesg looks the same there as well.

I did try the latest nightly, as it looks like it has some recent fixes for Skylake GPU hangs.

http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/2016-04-22-wily/

But I still saw the blocky graphic noise after logging in, and then it hung. I rebooted and hopped over to the VT console after logging in, and there it hit some "general protection fault: 0000 [#1] SMP" errors that seem to trigger from generic_permission().
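When comparing the dmesg warning across kernels and live images, as described above, it can help to pull the fields out of the `GPU HANG` line mechanically. The following is a minimal sketch, not part of the original report. The names `GpuHang` and `parse_gpu_hang` are hypothetical, and reading the three `ecode` fields as GPU generation, ring index, and error code is an assumption about the i915 message format, not something stated in this bug.

```python
import re
from typing import NamedTuple, Optional


class GpuHang(NamedTuple):
    """Fields of one i915 'GPU HANG' line (field meanings are assumed)."""
    gen: int       # assumed: GPU generation (9 would match Skylake)
    ring: int      # assumed: index of the ring that hung
    ecode: int     # assumed: driver-computed error code
    process: str   # process named in the message
    pid: int       # PID shown in square brackets
    reason: str
    action: str


# Pattern modelled on the summary line of this bug report.
_HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<ring>\d+):(?P<ecode>0x[0-9a-fA-F]+), "
    r"in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line: str) -> Optional[GpuHang]:
    """Return the parsed fields, or None if the line is not a GPU HANG line."""
    m = _HANG_RE.search(line)
    if m is None:
        return None
    return GpuHang(
        gen=int(m.group("gen")),
        ring=int(m.group("ring")),
        ecode=int(m.group("ecode"), 16),
        process=m.group("process"),
        pid=int(m.group("pid")),
        reason=m.group("reason"),
        action=m.group("action"),
    )
```

Feeding each line of a saved dmesg log through `parse_gpu_hang` and comparing the resulting tuples is one way to confirm that two boots really hit "the same" hang.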
(In reply to john.stultz from comment #7)
> I did try the latest nightly, as it looks like it has some recent fixes for
> skylake gpu hangs.
>
> http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/2016-04-22-wily/
>
> But I still saw the blocky graphic noise after logging in, and then it hung.
> I rebooted and hopped over to the VT console after logging in and there it
> hit some "general protection fault: 0000 [#1] SMP" errors that seem to
> trigger from generic_permission().

GPFs are a far more serious problem. Did anything make it to the syslog, or can you grab a photograph of whatever error remains on screen?

Oh joy. Hopefully this isn't the Linux equivalent of the WHEA error issue these NUCs are seeing in Windows. :/ I'll try to capture a picture here shortly. I do appreciate the feedback!

Hrm. So the GPFs didn't reproduce over the next few boots. I did see the same blocky graphics noise, and this time I did see GPU HANG messages very similar to the original report (though w/o the backtrace now). I dunno what else to do. I'm doing another round of memtest (it made it through successful runs previously, after I started having this issue) just to make sure.

*** Bug 94464 has been marked as a duplicate of this bug. ***

*** Bug 95167 has been marked as a duplicate of this bug. ***

Just FYI here, it seems this is caused by a hardware failure (at least in my case). After reading about the similar, suddenly appearing WHEA errors folks were seeing on Windows with the Skylake NUCs, I went through the RMA process, and the replacement NUC (with a new BIOS and the new -503 model number) does not show the problem with the exact same disk drive.
FYI my hangs completely stopped after adding kernel parameter:

i915.enable_rc6=0

https://wiki.ubuntu.com/Kernel/KernelBootParameters

(In reply to Abby from comment #14)
> FYI my hangs completely stopped after adding kernel parameter:
>
> i915.enable_rc6=0
>
> https://wiki.ubuntu.com/Kernel/KernelBootParameters

Some extra details: I'm running Ubuntu 16.04 on a Dell XPS 13 9350, with a second screen connected via a Dell USB-C port hub.

Created attachment 125760 [details]
cat /sys/class/drm/card0/error
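For anyone trying the `i915.enable_rc6=0` workaround mentioned above, it is worth confirming that the parameter actually reached the running kernel before drawing conclusions. Below is a small sketch under stated assumptions: the helper names `parse_cmdline` and `rc6_disabled` are hypothetical, and the parsing is a simplification that splits on whitespace and does not handle quoted values.

```python
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: tokens are split on whitespace, so quoted values
    containing spaces are not handled.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def rc6_disabled(cmdline: str) -> bool:
    """True if the command line carries i915.enable_rc6=0."""
    return parse_cmdline(cmdline).get("i915.enable_rc6") == "0"
```

On a Linux system the string to check would typically come from reading `/proc/cmdline`, for example `rc6_disabled(open("/proc/cmdline").read())`.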
Marking this resolved+fixed based on comment 13 and comment 15. Sverd - if your problem still exists on the latest kernels (preferably drm-tip from git://anongit.freedesktop.org/git/drm-tip), please create a new bug (or reuse this one if the failure is exactly the same as in the description) with attachments (see https://01.org/linuxgraphics/documentation/how-report-bugs) and a description of the use case you are exercising.