Created attachment 139429 [details]
GPU crash dump

I get GPU hangs on Skylake integrated graphics unless I boot with either i915.modeset=0 or video=vesafb:off.

The hardware is a Dell 7040 with an Intel i7-6700 CPU. Xorg does not crash, but everything except the cursor locks up each time the GPU resets.

Here is the drm message:

[ 41.823218] [drm] GPU HANG: ecode 9:0:0x859ffffb, in Xorg [2092], reason: hang on rcs0, action: reset
[ 41.823219] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 41.823220] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 41.823220] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 41.823220] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 41.823221] [drm] GPU crash dump saved to /sys/class/drm/card0/error

Subsequent to this there are frequent repeated resets:

[ 92.731613] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 100.731570] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 108.731500] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 116.731410] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 606.840675] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 617.816522] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 625.816467] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 633.816354] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0

Kernel: 4.17.0
Xorg server: 1.19.6
Intel driver: 2.99.917+git20171229-1
Mesa: 18.0.0
GuC firmware: i915/skl_guc_ver9_33.bin
HuC firmware: i915/skl_huc_ver01_07_1398.bin

I started out with stock Ubuntu Xenial and progressively upgraded components, to no avail. Turning off modeset/vesafb was an acceptable workaround until I needed dual display support.

The GPU crash dump is attached.
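For anyone triaging a similar log, the spacing of the repeated resets can be measured with a small script. This is only a sketch: the function name `reset_intervals` and the regular expression are assumptions written for this illustration, matched against the "Resetting rcs0 for hang on rcs0" lines quoted in this report, and other kernel versions may word the message differently.

```python
import re

# Matches lines such as:
# [ 92.731613] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
# The pattern is an assumption based on the log quoted in this report.
RESET_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+i915 \S+ Resetting (\w+) for hang on \w+")


def reset_intervals(dmesg_text):
    """Return the seconds elapsed between consecutive engine resets."""
    stamps = [float(m.group(1)) for m in RESET_RE.finditer(dmesg_text)]
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]


# The first four reset lines from the report above.
SAMPLE = """\
[ 92.731613] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 100.731570] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 108.731500] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[ 116.731410] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
"""

if __name__ == "__main__":
    print(reset_intervals(SAMPLE))
```

On the lines quoted above, consecutive resets within a burst are roughly 8 seconds apart.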
Nothing stands out as being an old bug resurfaced; so time for some fresh debugging. What was the old kernel/userspace this reproduced on? i.e. do you still have the original kernel you installed? Could you capture that error state for comparison?
Created attachment 139437 [details] GPU crash dump with 4.4.0 kernel
The old kernel was Ubuntu's 4.4.0-122.146-generic. Based on what is currently in Xenial, I believe the old userspace had:

Intel driver: 2.99.917+git20160325
Mesa: 11.2.0

Let me know if accuracy is critical (I think I can search /var/log/apt/ to find the exact versions).

I booted the old kernel (same upgraded userspace) and reproduced the error. This time Xorg crashed once after login and then started again. The kernel messages were also a little more colorful than before:

[ 54.764325] [drm] stuck on render ring
[ 54.764435] [drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [2102], reason: Engine(s) hung, action: reset
[ 54.764436] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 54.764437] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 54.764438] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 54.764438] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 54.764439] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 54.766472] drm/i915: Resetting chip after gpu hang
[ 56.764626] [drm] RC6 on
[ 72.756913] [drm] stuck on render ring
[ 72.757189] [drm] GPU HANG: ecode 9:0:0x86dffffd, in Xorg [2102], reason: Engine(s) hung, action: reset
[ 72.759030] drm/i915: Resetting chip after gpu hang
[ 73.792203] [drm] RC6 on
[ 75.850856] usb 1-8: reset high-speed USB device number 3 using xhci_hcd
[ 84.749848] [drm] stuck on render ring
[ 84.750103] [drm] GPU HANG: ecode 9:0:0x86dffffd, in Xorg [2102], reason: Engine(s) hung, action: reset
[ 84.751978] drm/i915: Resetting chip after gpu hang
[ 86.748842] [drm] RC6 on
[ 165.726069] [drm] stuck on render ring
[ 165.726437] [drm] GPU HANG: ecode 9:0:0x86dffffd, in Xorg [3066], reason: Engine(s) hung, action: reset
[ 165.728202] drm/i915: Resetting chip after gpu hang
[ 167.724909] [drm] RC6 on
[ 185.713779] [drm] stuck on render ring
[ 185.714098] [drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [3066], reason: Engine(s) hung, action: reset
[ 185.715884] drm/i915: Resetting chip after gpu hang
[ 187.712852] [drm] RC6 on

I uploaded the corresponding error state.
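When comparing hangs across kernels, it can help to tally the ecode values that appear in the "GPU HANG" lines. The following is a minimal sketch, not part of any i915 tooling: `tally_ecodes` and the regular expression are hypothetical names and patterns written for this illustration, based only on the message format quoted in this report.

```python
import re
from collections import Counter

# Pattern assumed from the "GPU HANG" lines quoted in this report, e.g.
# [drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [2102], reason: Engine(s) hung, action: reset
HANG_RE = re.compile(
    r"GPU HANG: ecode ([^,]+), in (\S+) \[(\d+)\], reason: (.+?), action: (\w+)"
)


def tally_ecodes(dmesg_text):
    """Count how often each hang ecode appears in a dmesg capture."""
    return Counter(m.group(1) for m in HANG_RE.finditer(dmesg_text))


# The five GPU HANG lines from the 4.4.0 log above.
SAMPLE = """\
[ 54.764435] [drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [2102], reason: Engine(s) hung, action: reset
[ 72.757189] [drm] GPU HANG: ecode 9:0:0x86dffffd, in Xorg [2102], reason: Engine(s) hung, action: reset
[ 84.750103] [drm] GPU HANG: ecode 9:0:0x86dffffd, in Xorg [2102], reason: Engine(s) hung, action: reset
[ 165.726437] [drm] GPU HANG: ecode 9:0:0x86dffffd, in Xorg [3066], reason: Engine(s) hung, action: reset
[ 185.714098] [drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [3066], reason: Engine(s) hung, action: reset
"""

if __name__ == "__main__":
    for ecode, count in tally_ecodes(SAMPLE).most_common():
        print(ecode, count)
```

The same function also matches the 4.17 message ("reason: hang on rcs0"), so captures from both kernels can be compared side by side.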
Hi, if Chris agrees and it makes sense, you could also try the latest drm-tip: https://cgit.freedesktop.org/drm-tip and send the dmesg captured with drm.debug=0x1e log_buf_len=4M. Please also send a debug dmesg from the kernel you see the issue on.
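Before capturing the dmesg, it may be worth confirming that the requested parameters actually reached the kernel. A minimal sketch, assuming a Linux system where the boot command line is readable from /proc/cmdline; the helper names `missing_params` and `read_cmdline` and the sample command line are hypothetical and exist only for this illustration.

```python
# Parameters requested in this thread for a debug capture.
REQUIRED = ("drm.debug=0x1e", "log_buf_len=4M")


def missing_params(cmdline, required=REQUIRED):
    """Return the required parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in required if param not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Hypothetical command line, used here instead of the live system's.
SAMPLE_CMDLINE = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet drm.debug=0x1e"

if __name__ == "__main__":
    print(missing_params(SAMPLE_CMDLINE))
```

On a real machine, `missing_params(read_cmdline())` returning an empty list suggests both parameters were applied at boot.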
(In reply to Simeon Miteff from comment #2)
> Created attachment 139437 [details]
> GPU crash dump with 4.4.0 kernel

Hmm, also switched to -modesetting. I would make sure that Mesa is up to date (18.0; at least 17.3 to be sure of having the majority of hang fixes for -modesetting on Skylake). But the switch defeats the purpose of testing the old kernel :)
So, maybe try drm-tip then?
No problem. I think I can try that tomorrow.
OK, thanks.
ping, testing drm-tip?
Any updates on testing drm-tip?
Reporter, any updates on this?
(In reply to Simeon Miteff from comment #7)
> No problem. I think I can try that tomorrow.

Hi, do you have an update?
Hi,

So sorry for the delay. I retested with the 4.17 kernel built by Ubuntu from drm-tip as of yesterday, with drm.debug=0x1e log_buf_len=4M as requested. I attach the new dmesg and crash dump.

Regards,
Simeon
Created attachment 139952 [details] dmesg with drm.debug=0x1e log_buf_len=4M
Created attachment 139953 [details] Latest GPU crashdump
Simeon, sorry for the delay. Can you try to reproduce the issue using the latest drm-tip (https://cgit.freedesktop.org/drm-tip) with the kernel parameters drm.debug=0x1e log_buf_len=4M and, if the problem persists, attach the full dmesg from boot? At this point, that will help us proceed with this bug.
The last dumps indicate the problem is in mesa, and many related bugs have been fixed, hopefully yours included.
Sorry guys, I have changed jobs and no longer have access to this machine. If someone else has access to a Dell 7040 with the Intel i7-6700 CPU, maybe they can test?
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1722.