Created attachment 123539 [details]
/sys/class/drm/card0/error.bz2

This could be a dupe of bug #93710 (same hardware, same kernel .config and without explicit intel_iommu=off).

[798005.164877] snd_hda_intel 0000:00:1b.0: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
[800008.420528] [drm] GPU HANG: ecode 6:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
[800008.420592] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[800008.420593] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[800008.420594] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[800008.420595] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[800008.420596] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[848558.214001] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
# uname -a
Linux vostro 4.6.0-rc5-default-pciehp #3 SMP Wed Apr 27 16:57:10 CEST 2016 x86_64 Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz GenuineIntel GNU/Linux
# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz
stepping        : 7
microcode       : 0x1b
cpu MHz         : 3299.999
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
bugs            :
bogomips        : 5586.99
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
...

This is a Dell Vostro 3550 laptop with BIOS A12 (SandyBridge). The external LCD is hooked up via HDMI; the internal LCD panel is turned off.

What could trigger the issue? Both CPU cores have been fully loaded (for a week or so), and from time to time my external screen goes blank after I leave the computer for a while. It may have happened when I came back and woke up the X11 screen, or when I turned the external HDMI display back on with its power button. Both scenarios happen in general and I do not know what I was doing at about that time, so any of these could be related to the issue, but it is a wild guess.
Created attachment 123540 [details]
.config.gz
Created attachment 123541 [details]
dmesg
Created attachment 123542 [details]
/sys/kernel/debug/dri/0/i915_error_state.bz2

Interestingly, I see I have another error state as well?!

# ls -latr /sys/kernel/debug/dri/0/i915_error_state
-rw-r--r-- 1 root root 0 Apr 27 19:05 /sys/kernel/debug/dri/0/i915_error_state
# ls -latr /sys/class/drm/card0/error
-rw------- 1 root root 0 May 7 22:45 /sys/class/drm/card0/error
#
# w
 23:13:14 up 10 days, 6:07, 32 users, load average: 2.97, 3.22, 3.29
...

Is the timestamp on /sys/kernel/debug/dri/0/i915_error_state set at bootup? No, not really:

# grep -a 'syslog-ng starting up' /var/log/messages
Apr 27 16:11:54 vostro syslog-ng[3641]: syslog-ng starting up; version='3.6.2'
Apr 27 16:21:46 vostro syslog-ng[3301]: syslog-ng starting up; version='3.6.2'
Apr 27 16:27:58 vostro syslog-ng[3328]: syslog-ng starting up; version='3.6.2'
Apr 27 16:31:54 vostro syslog-ng[3639]: syslog-ng starting up; version='3.6.2'
Apr 27 17:06:08 vostro syslog-ng[3642]: syslog-ng starting up; version='3.6.2'
# grep -a '^Apr 27 19:0' /var/log/messages
...
[ gives me no clue why i915_error_state was created; syslogd was running, but that is all I can say ]
#

But from /var/log/messages I see that during the previous bootup I had a different issue:

Apr 27 16:12:46 vostro kernel: [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe A FIFO underrun
Apr 27 16:12:46 vostro kernel: [drm:intel_set_pch_fifo_underrun_reporting] *ERROR* uncleared pch fifo underrun on pch transcoder A
Apr 27 16:12:46 vostro kernel: [drm:intel_pch_fifo_underrun_irq_handler] *ERROR* PCH transcoder A FIFO underrun
Apr 27 16:12:46 vostro kernel: [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe B FIFO underrun
Apr 27 16:12:46 vostro kernel: [drm:intel_check_pch_fifo_underruns] *ERROR* pch fifo underrun on pch transcoder B

However, I was inserting and ejecting my ExpressCards into the slot a few seconds after these messages were logged, so that is probably related.
Created attachment 123544 [details]
Xorg.0.log (for currently running kernel)
Both GPU crash dumps describe the same issue.

From the error dump, the hang is in a render ring batch with active head at 0x0006a028 and 0x01800100 (MI_WAIT_FOR_EVENT) as IPEHR (could it be an issue linked to DPMS?).

Moreover, we can note that we have:

ERROR: 0x00000012
  Context page GTT translation generated a fault (GTT entry not valid)
  TLB page VTD translation generated an error

For reference, batch extract (around 0x0006a028):

0x0006a018: 0x11000001: MI_LOAD_REGISTER_IMM
0x0006a01c: 0x00002050: dword 1
0x0006a020: 0x00010001: dword 2
0x0006a024: 0x01800100: MI_WAIT_FOR_EVENT, plane B scan line wait
0x0006a028: 0x11000001: MI_LOAD_REGISTER_IMM
0x0006a02c: 0x00002050: dword 1
0x0006a030: 0x00010000: dword 2
0x0006a034: 0x11000001: MI_LOAD_REGISTER_IMM
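To make the extract easier to read, here is a minimal C sketch, not taken from the kernel or DDX sources, that just reassembles the same dword sequence. The interpretation of register 0x2050 as a ring sleep/PSMI control register written with the masked-write convention, and the exact event bit used for the plane B scan line wait, are assumptions for illustration; only the raw numeric values come from the dump above.

/* Minimal sketch: rebuilds the dword pattern from the batch extract.
 * Assumptions (not from the dump): 0x2050 is a ring sleep/PSMI control
 * register using the masked-write convention (upper 16 bits = write
 * mask), and bit 8 of MI_WAIT_FOR_EVENT selects the plane B scan line
 * window event. */
#include <stdint.h>
#include <stdio.h>

#define MI_LOAD_REGISTER_IMM(n)   (0x11000000u | (n))  /* MI opcode 0x22, n register/value pairs */
#define MI_WAIT_FOR_EVENT         0x01800000u          /* MI opcode 0x03 */
#define MI_WAIT_PLANE_B_SCAN_LINE 0x00000100u          /* assumed event bit */
#define REG_SLEEP_CTL             0x00002050u          /* assumed: ring sleep/PSMI control */

int main(void)
{
    uint32_t batch[] = {
        /* masked write: set bit 0 so the ring is not allowed to sleep
         * while it sits in the scan line wait */
        MI_LOAD_REGISTER_IMM(1), REG_SLEEP_CTL, 0x00010001u,

        /* stall until display plane B reaches the scan line window;
         * if the plane is blanked, this event may never arrive */
        MI_WAIT_FOR_EVENT | MI_WAIT_PLANE_B_SCAN_LINE,

        /* masked write: clear bit 0 again after the wait */
        MI_LOAD_REGISTER_IMM(1), REG_SLEEP_CTL, 0x00010000u,
    };

    for (size_t i = 0; i < sizeof(batch) / sizeof(batch[0]); i++)
        printf("0x%08x\n", (unsigned)batch[i]);
    return 0;
}

Running this prints the same dword values as at 0x0006a018..0x0006a030 above. If plane B had just been blanked (e.g. DPMS off on the external output) while the batch was sitting in that wait, the scan line event would never fire, which would be consistent with the "Kicking stuck wait on render ring" reason in the dmesg of the original report; that link to DPMS is speculation, though.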
Reporter, is this still valid with the latest kernel?
I was actually just trying to connect to bugzilla now. After I baked my CPU for a day or so (the issue seemed to be associated with high CPU load) and did not hit it anymore, I conclude this is fixed in 4.10.8.
(In reply to Martin Mokrejs from comment #8)
> I was actually just trying to connect to bugzilla now. After I baked my CPU
> for a day or so (the issue seemed to be associated with high CPU load) and
> did not hit it anymore, I conclude this is fixed in 4.10.8.

Thanks Martin