Bug 103553

Summary: X crash and reboot
Product: DRI
Reporter: Vaielab <elabelle>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED WORKSFORME
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
Whiteboard:
i915 platform: KBL
i915 features: GPU hang

Description Vaielab 2017-11-03 00:09:35 UTC
I'm using Debian 9 with the MATE desktop 1.16.2. My CPU is an i7-7700K, and I'm also using the CPU for graphics.
About twice a week, X becomes unresponsive for about 20-30 seconds (the mouse pointer still moves, but I can't click on anything), and then X restarts. This closes every open program, and I lose a ton of work. I'm not sure whether it is related, but it usually happens while I'm moving or uncompressing files.

I found multiple error messages in dmesg:

[56352.602322] sd 6:0:0:0: [sdd] Attached SCSI removable disk
[56353.186953] UDF-fs: warning (device sdd1): udf_load_vrs: No anchor found
[56353.186954] UDF-fs: Rescanning with blocksize 2048
[56353.197640] UDF-fs: warning (device sdd1): udf_load_vrs: No anchor found
[56353.197641] UDF-fs: Rescanning with blocksize 2048
[56353.202329] UDF-fs: INFO Mounting volume 'GParted-live', timestamp 2017/10/11 02:08 (1000)
[56544.007285] GPT:Primary header thinks Alt. header is not at the end of the disk.
[56544.007285] GPT:1108031 != 2015231
[56544.007286] GPT:Alternate GPT header not at the end of the disk.
[56544.007286] GPT:1108031 != 2015231
[56544.007286] GPT: Use GNU Parted to correct GPT errors.
[56544.007293]  sdd: sdd1 sdd2 sdd3
[56548.912038] usb 1-7: USB disconnect, device number 9
[61571.500125] perf: interrupt took too long (2527 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[78761.896850] [drm:gen8_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[82649.408506] perf: interrupt took too long (3160 > 3158), lowering kernel.perf_event_max_sample_rate to 63250
[92508.136386] [drm:gen8_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[99004.421060] perf: interrupt took too long (3990 > 3950), lowering kernel.perf_event_max_sample_rate to 50000
[131065.011117] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[131065.284401] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to start channel equalization
[131066.736574] [drm:gen8_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[134329.016759] [drm:gen8_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[141447.752846] [drm:gen8_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[153214.448685] [drm:gen8_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[153627.647726] br0: port 2(vnet0) entered disabled state
[153627.648047] device vnet0 left promiscuous mode
[153627.648052] br0: port 2(vnet0) entered disabled state
[153630.655499] br0: port 2(vnet0) entered blocking state
[153630.655500] br0: port 2(vnet0) entered disabled state
[153630.655594] device vnet0 entered promiscuous mode
[153630.675637] br0: port 2(vnet0) entered blocking state
[153630.675640] br0: port 2(vnet0) entered forwarding state
[153631.251589] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x19@0x900
[153633.806513] kvm: zapping shadow pages for mmio generation wraparound
[153633.807127] kvm: zapping shadow pages for mmio generation wraparound
[168270.816764] [drm:gen8_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[168910.297856] br0: port 2(vnet0) entered disabled state
[168910.298424] device vnet0 left promiscuous mode
[168910.298425] br0: port 2(vnet0) entered disabled state
[172204.688659] [drm:gen8_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[177392.090944] [drm] GPU HANG: ecode 9:0:0x84dfbffc, in Xorg [934], reason: Hang on render ring, action: reset
[177392.090947] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[177392.090948] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[177392.090949] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[177392.090951] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[177392.090952] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[177392.091020] drm/i915: Resetting chip after gpu hang
[177392.091086] [drm] RC6 on
[177392.110049] [drm] GuC firmware load skipped
[177402.106586] drm/i915: Resetting chip after gpu hang
[177402.106650] [drm] RC6 on
[177402.122137] [drm] GuC firmware load skipped
[177402.148921] Chrome_ChildThr[10211]: segfault at c ip 00007f8557e88802 sp 00007f854d9fe670 error 4 in libxul.so[7f85571d2000+43ea000]



I tried to run sudo cat /sys/class/drm/card0/error, but it only printed:
no error state collected

Thank you
Comment 1 Chris Wilson 2017-11-03 10:02:24 UTC
/sys/class/drm/card0/error is stored in memory and so lost on reboot. You need to capture it beforehand, thanks.
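One way to capture the dump before a reboot is to copy it to disk as soon as the hang is logged. The following is a minimal sketch, not an official i915 tool: the function name save_error_state, the output file naming, and the destination directory are assumptions made for illustration. Only the sysfs path and the "no error state collected" message come from this report. Reading the sysfs file typically requires root.

```python
import time
from pathlib import Path

# Path taken from the dmesg output in this report.
ERROR_STATE = Path("/sys/class/drm/card0/error")
# Text the file contains when no dump is available (as seen in this report).
NO_DUMP_MARKER = b"no error state collected"


def save_error_state(source=ERROR_STATE, dest_dir="."):
    """Copy the GPU error state to dest_dir if a dump is present.

    Returns the path of the saved copy, or None when there is nothing to save.
    The helper name and the file naming scheme are hypothetical.
    """
    data = Path(source).read_bytes()
    stripped = data.strip()
    # Compare case-insensitively in case the wording differs between kernels.
    if not stripped or stripped.lower().startswith(NO_DUMP_MARKER):
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"i915-error-{stamp}.txt"
    dest.write_bytes(data)
    return dest
```

Calling save_error_state(dest_dir="/var/tmp") as root right after the "GPU crash dump saved" message appears would leave a file that survives the reboot and can be attached to the bug.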
Comment 2 Elizabeth 2017-11-07 17:46:42 UTC
This looks exactly like bug 103602
Comment 3 Vaielab 2017-11-07 18:20:41 UTC
Hello,
Sorry for the delay. The only time it has crashed since, I was working and had my password manager open. I did not know whether part of the screen contents ends up in the debug dump, so I prefer to wait for the next crash (when I won't have sensitive data open).

I have the same CPU as in bug #103602, but I do not have a second video card in my computer. I am only using the CPU.

I will upgrade my kernel and try intel_iommu=igfx_off in GRUB, as mentioned in bug #103602.
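After editing the GRUB configuration and rebooting, it can help to confirm that the parameter actually reached the kernel. The sketch below is only an illustration: the helper name has_kernel_param is hypothetical, and it matches the parameter as an exact whitespace-separated token, so a combined form such as intel_iommu=on,igfx_off would not be detected.

```python
from pathlib import Path


def has_kernel_param(param, cmdline=None):
    """Return True if `param` is an exact token on the kernel command line.

    When `cmdline` is None, the running kernel's command line is read
    from /proc/cmdline. The helper name is hypothetical.
    """
    if cmdline is None:
        cmdline = Path("/proc/cmdline").read_text()
    return param in cmdline.split()
```

For example, has_kernel_param("intel_iommu=igfx_off") checks the running kernel, while passing a string as the second argument checks an arbitrary command line.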

Thank you
Comment 4 Jani Saarinen 2018-03-29 07:10:23 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but with this we are trying to understand whether the bug is still valid or not.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this is no longer valid, please comment on the bug that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.
Comment 5 Jani Saarinen 2018-04-20 14:43:23 UTC
Closing; please re-open if this still occurs.
