Bug 98049 - intel GPU driver crashes after resume
Summary: intel GPU driver crashes after resume
Status: CLOSED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i915
Version: 11.2
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Ian Romanick
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-05 00:48 UTC by Yuriy
Modified: 2016-10-06 07:02 UTC
CC List: 1 user

See Also:
i915 platform: IVB
i915 features: GPU hang


Attachments
"/sys/class/drm/card0/error" GPU crash dump (2.12 MB, text/plain)
2016-10-05 00:48 UTC, Yuriy

Description Yuriy 2016-10-05 00:48:27 UTC
Created attachment 127011
"/sys/class/drm/card0/error" GPU crash dump

On a Kubuntu system, the suspend/resume process works slowly.
After resume, dmesg shows these errors:

[ 9757.919203] [drm] stuck on render ring
[ 9757.919604] [drm] GPU HANG: ecode 7:0:0x85fffff8, in kscreenlocker_g [18707], reason: Ring hung, action: reset
[ 9757.919605] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 9757.919606] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 9757.919607] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 9757.919607] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 9757.919608] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 9757.921071] drm/i915: Resetting chip after gpu hang
[ 9763.919250] [drm] stuck on render ring
[ 9763.919657] [drm] GPU HANG: ecode 7:0:0x85fffff8, in kscreenlocker_g [18707], reason: Ring hung, action: reset
[ 9763.921762] drm/i915: Resetting chip after gpu hang
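
For triage, the "GPU HANG" lines above can be pulled out of a dmesg capture with a short script. This is only a minimal sketch: the regular expression is written against the two lines quoted in this report, and the field names (gen, ring, code) are my reading of the three colon-separated ecode values, so treat them as assumptions rather than an official format description.

```python
import re

# Matches the "GPU HANG" line layout seen in the excerpt above.
# Group names gen/ring/code are an assumed reading of "ecode 7:0:0x85fffff8".
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\] GPU HANG: "
    r"ecode (?P<gen>\d+):(?P<ring>\d+):(?P<code>0x[0-9a-fA-F]+), "
    r"in (?P<proc>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\S+)"
)


def parse_gpu_hangs(text):
    """Return one dict per GPU HANG line found in the given log text."""
    hangs = []
    for m in HANG_RE.finditer(text):
        hangs.append({
            "timestamp": float(m.group("ts")),
            "gen": int(m.group("gen")),
            "ring": int(m.group("ring")),
            "code": int(m.group("code"), 16),
            "process": m.group("proc"),
            "pid": int(m.group("pid")),
            "reason": m.group("reason"),
            "action": m.group("action"),
        })
    return hangs


# The two hang lines from this report, copied verbatim.
SAMPLE = """\
[ 9757.919203] [drm] stuck on render ring
[ 9757.919604] [drm] GPU HANG: ecode 7:0:0x85fffff8, in kscreenlocker_g [18707], reason: Ring hung, action: reset
[ 9757.921071] drm/i915: Resetting chip after gpu hang
[ 9763.919250] [drm] stuck on render ring
[ 9763.919657] [drm] GPU HANG: ecode 7:0:0x85fffff8, in kscreenlocker_g [18707], reason: Ring hung, action: reset
[ 9763.921762] drm/i915: Resetting chip after gpu hang
"""

if __name__ == "__main__":
    for hang in parse_gpu_hangs(SAMPLE):
        print(hang)
```

Both hangs here report the same ecode and the same process (kscreenlocker_g, pid 18707), which suggests the same workload hung again after the first reset.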




The output of /var/log/syslog for the resume time:

Oct  5 10:06:48 uyras-note kernel: [ 9730.228262] PM: Finishing wakeup.
Oct  5 10:06:48 uyras-note kernel: [ 9730.228264] Restarting tasks ... 
Oct  5 10:06:48 uyras-note kernel: [ 9730.228478] usb 1-1.1: USB disconnect, device number 11
Oct  5 10:06:48 uyras-note kernel: [ 9730.233097] done.
Oct  5 10:06:48 uyras-note kernel: [ 9730.307000] usb 1-1.1: new full-speed USB device number 12 using ehci-pci
Oct  5 10:06:48 uyras-note kernel: [ 9730.401108] usb 1-1.1: New USB device found, idVendor=13d3, idProduct=3402
Oct  5 10:06:48 uyras-note kernel: [ 9730.401129] usb 1-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Oct  5 10:06:48 uyras-note kernel: [ 9730.401133] usb 1-1.1: Product: Bluetooth USB Host Controller
Oct  5 10:06:48 uyras-note kernel: [ 9730.401136] usb 1-1.1: Manufacturer: Atheros Communications
Oct  5 10:06:48 uyras-note kernel: [ 9730.401139] usb 1-1.1: SerialNumber: Alaska Day 2006
Oct  5 10:06:48 uyras-note kernel: [ 9731.523069] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct  5 10:06:48 uyras-note kernel: [ 9731.623563] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Oct  5 10:06:48 uyras-note kernel: [ 9731.623640] ata1.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
Oct  5 10:06:48 uyras-note kernel: [ 9731.623646] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
Oct  5 10:06:48 uyras-note kernel: [ 9731.625256] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Oct  5 10:06:48 uyras-note kernel: [ 9731.625327] ata1.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
Oct  5 10:06:48 uyras-note kernel: [ 9731.625345] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
Oct  5 10:06:48 uyras-note kernel: [ 9731.625953] ata1.00: configured for UDMA/133
Oct  5 10:06:49 uyras-note systemd[1]: Time has been changed
Oct  5 10:06:49 uyras-note systemd-sleep[18756]: Failed to connect to non-global ctrl_ifname: (nil)  error: No such file or directory
Oct  5 10:06:49 uyras-note systemd-sleep[18756]: System resumed.
Oct  5 10:06:49 uyras-note systemd[1]: snapd.refresh.timer: Adding 33min 7.796046s random time.
Oct  5 10:06:49 uyras-note systemd[1]: apt-daily.timer: Adding 7h 58min 53.828604s random time.
Oct  5 10:06:49 uyras-note systemd[14621]: Time has been changed
Oct  5 10:06:49 uyras-note systemd[1]: bluetooth.target: Unit not needed anymore. Stopping.
Oct  5 10:06:49 uyras-note systemd[1]: Stopped target Bluetooth.
Oct  5 10:06:49 uyras-note systemd[1]: Starting Load/Save RF Kill Switch Status...
Oct  5 10:06:49 uyras-note systemd[1563]: Time has been changed
Oct  5 10:06:50 uyras-note bluetoothd[1388]: Endpoint unregistered: sender=:1.146 path=/MediaEndpoint/A2DPSource
Oct  5 10:06:50 uyras-note bluetoothd[1388]: Endpoint unregistered: sender=:1.146 path=/MediaEndpoint/A2DPSink
Oct  5 10:06:50 uyras-note systemd[1]: Started Load/Save RF Kill Switch Status.
Oct  5 10:06:52 uyras-note kernel: [ 9735.464966] usb 1-1.1: USB disconnect, device number 12
Oct  5 10:06:52 uyras-note kernel: [ 9735.663065] usb 1-1.1: new full-speed USB device number 13 using ehci-pci
Oct  5 10:06:52 uyras-note systemd-sleep[18756]: /dev/sda:
Oct  5 10:06:52 uyras-note systemd-sleep[18756]:  setting Advanced Power Management level to 0xfe (254)
Oct  5 10:06:52 uyras-note systemd-sleep[18756]:  APM_level#011= 254
Oct  5 10:06:53 uyras-note systemd-sleep[18903]: /lib/systemd/system-sleep/wpasupplicant failed with error code 255.
Oct  5 10:06:53 uyras-note systemd[1]: Started Suspend.
Oct  5 10:06:53 uyras-note systemd[1]: sleep.target: Unit not needed anymore. Stopping.
Oct  5 10:06:53 uyras-note systemd[1]: Stopped target Sleep.
Oct  5 10:06:53 uyras-note systemd[1]: Reached target Suspend.
Oct  5 10:06:53 uyras-note systemd[1]: suspend.target: Unit is bound to inactive unit systemd-suspend.service. Stopping, too.
Oct  5 10:06:53 uyras-note systemd[1]: Stopped target Suspend.
Oct  5 10:06:53 uyras-note systemd[1]: Started Run anacron jobs at resume.
Oct  5 10:06:53 uyras-note systemd[1]: Started Run anacron jobs.
Oct  5 10:06:53 uyras-note kernel: [ 9736.274698] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
Oct  5 10:06:53 uyras-note NetworkManager[892]: <info>  [1475626013.3095] manager: wake requested (sleeping: yes  enabled: yes)
Oct  5 10:06:53 uyras-note NetworkManager[892]: <info>  [1475626013.3096] manager: waking up...
Oct  5 10:06:53 uyras-note NetworkManager[892]: <info>  [1475626013.3097] device (wlp2s0): state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
Oct  5 10:06:53 uyras-note kernel: [ 9736.292838] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
Oct  5 10:06:53 uyras-note NetworkManager[892]: <info>  [1475626013.4246] device (enp3s0): state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
Oct  5 10:06:53 uyras-note NetworkManager[892]: <info>  [1475626013.4267] manager: NetworkManager state is now DISCONNECTED
Oct  5 10:06:53 uyras-note kernel: [ 9736.389620] IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
Oct  5 10:06:53 uyras-note kernel: [ 9736.390067] IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
Oct  5 10:06:53 uyras-note anacron[18967]: Anacron 2.3 started on 2016-10-05
Oct  5 10:06:54 uyras-note org.kde.KScreen[14710]: kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "LVDS1" ) ( "LVDS1" ) to KScreen::Output(Id: 67 , Name: "LVDS1" ) ( "LVDS1" )
Oct  5 10:06:54 uyras-note org.kde.KScreen[14710]: kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "LVDS1" ) ( "LVDS1" ) to KScreen::Output(Id: 67 , Name: "LVDS1" ) ( "LVDS1" )
Oct  5 10:06:54 uyras-note anacron[18967]: Normal exit (0 jobs run)
Oct  5 10:06:54 uyras-note wpa_supplicant[1084]: dbus: wpa_dbus_get_object_properties: failed to get object properties: (none) none
Oct  5 10:06:54 uyras-note wpa_supplicant[1084]: dbus: Failed to construct signal
Oct  5 10:06:55 uyras-note NetworkManager[892]: <info>  [1475626015.0690] device (wlp2s0): supplicant interface state: starting -> ready
Oct  5 10:06:55 uyras-note NetworkManager[892]: <info>  [1475626015.0691] device (wlp2s0): state change: unavailable -> disconnected (reason 'supplicant-available') [20 30 42]
Oct  5 10:06:55 uyras-note kernel: [ 9738.033980] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
Oct  5 10:06:57 uyras-note kernel: [ 9740.735135] usb 1-1.1: device descriptor read/64, error -110
Oct  5 10:07:08 uyras-note kernel: [ 9751.319174] usb 1-1.1: device not accepting address 13, error -32
Oct  5 10:07:08 uyras-note kernel: [ 9751.391181] usb 1-1.1: new full-speed USB device number 14 using ehci-pci
Oct  5 10:07:08 uyras-note kernel: [ 9751.485533] usb 1-1.1: New USB device found, idVendor=13d3, idProduct=3402
Oct  5 10:07:08 uyras-note kernel: [ 9751.485540] usb 1-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Oct  5 10:07:08 uyras-note kernel: [ 9751.485544] usb 1-1.1: Product: Bluetooth USB Host Controller
Oct  5 10:07:08 uyras-note kernel: [ 9751.485547] usb 1-1.1: Manufacturer: Atheros Communications
Oct  5 10:07:08 uyras-note kernel: [ 9751.485550] usb 1-1.1: SerialNumber: Alaska Day 2006
Oct  5 10:07:08 uyras-note systemd[1]: Starting Load/Save RF Kill Switch Status...
Oct  5 10:07:10 uyras-note kernel: [ 9753.535166] Bluetooth: hci0 command 0x1001 tx timeout
Oct  5 10:07:12 uyras-note kernel: [ 9755.539184] Bluetooth: hci0 command 0x1009 tx timeout
Oct  5 10:07:14 uyras-note kernel: [ 9757.919203] [drm] stuck on render ring
Oct  5 10:07:14 uyras-note kernel: [ 9757.919604] [drm] GPU HANG: ecode 7:0:0x85fffff8, in kscreenlocker_g [18707], reason: Ring hung, action: reset
Oct  5 10:07:14 uyras-note kernel: [ 9757.919605] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Oct  5 10:07:14 uyras-note kernel: [ 9757.919606] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Oct  5 10:07:14 uyras-note kernel: [ 9757.919607] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Oct  5 10:07:14 uyras-note kernel: [ 9757.919607] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Oct  5 10:07:14 uyras-note kernel: [ 9757.919608] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Oct  5 10:07:14 uyras-note kernel: [ 9757.921071] drm/i915: Resetting chip after gpu hang
Oct  5 10:07:20 uyras-note kernel: [ 9763.919250] [drm] stuck on render ring
Oct  5 10:07:20 uyras-note kernel: [ 9763.919657] [drm] GPU HANG: ecode 7:0:0x85fffff8, in kscreenlocker_g [18707], reason: Ring hung, action: reset
Oct  5 10:07:20 uyras-note kernel: [ 9763.921762] drm/i915: Resetting chip after gpu hang
Oct  5 10:07:21 uyras-note org.kde.KScreen[14710]: kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "LVDS1" ) ( "LVDS1" ) to KScreen::Output(Id: 67 , Name: "LVDS1" ) ( "LVDS1" )
Oct  5 10:07:25 uyras-note org.kde.KScreen[14710]: message repeated 15 times: [ kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "LVDS1" ) ( "LVDS1" ) to KScreen::Output(Id: 67 , Name: "LVDS1" ) ( "LVDS1" )]
Oct  5 10:07:28 uyras-note systemd-udevd[18997]: Process '/bin/hciconfig hci0 up' failed with exit code 1.
Oct  5 10:07:28 uyras-note systemd[1]: Reached target Bluetooth.
Oct  5 10:07:28 uyras-note systemd[1]: Started Load/Save RF Kill Switch Status.
Oct  5 10:07:29 uyras-note org.kde.KScreen[14710]: kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "LVDS1" ) ( "LVDS1" ) to KScreen::Output(Id: 67 , Name: "LVDS1" ) ( "LVDS1" )
Oct  5 10:07:32 uyras-note org.kde.KScreen[14710]: message repeated 23 times: [ kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "LVDS1" ) ( "LVDS1" ) to KScreen::Output(Id: 67 , Name: "LVDS1" ) ( "LVDS1" )]
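
To put numbers on the delay, the kernel timestamps in the syslog excerpt can be compared directly. The sketch below assumes the "kernel: [ seconds ]" prefix shown above; the helper names are made up for illustration and only three of the lines from this report are included as sample input.

```python
import re

# Kernel lines in syslog carry a "[ seconds.micros ]" uptime stamp after "kernel:".
KERNEL_TS = re.compile(r"kernel: \[\s*(\d+\.\d+)\]\s+(.*)")


def kernel_events(lines):
    """Return (seconds_since_boot, message) for every kernel syslog line."""
    events = []
    for line in lines:
        m = KERNEL_TS.search(line)
        if m:
            events.append((float(m.group(1)), m.group(2)))
    return events


def times_of(events, needle):
    """Return the timestamps of all events whose message contains needle."""
    return [ts for ts, msg in events if needle in msg]


# Three lines copied from the syslog excerpt in this report.
LINES = [
    "Oct  5 10:06:48 uyras-note kernel: [ 9730.228262] PM: Finishing wakeup.",
    "Oct  5 10:07:14 uyras-note kernel: [ 9757.919203] [drm] stuck on render ring",
    "Oct  5 10:07:20 uyras-note kernel: [ 9763.919250] [drm] stuck on render ring",
]

EVENTS = kernel_events(LINES)
WAKE = times_of(EVENTS, "PM: Finishing wakeup")[0]
STUCK = times_of(EVENTS, "stuck on render ring")

if __name__ == "__main__":
    print("wakeup to first stuck ring: %.2f s" % (STUCK[0] - WAKE))
    print("between stuck-ring reports: %.2f s" % (STUCK[1] - STUCK[0]))
```

On these lines the first "stuck on render ring" message comes roughly 27.7 seconds after "PM: Finishing wakeup", and the second one about 6 seconds after the first.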



Hardware: Asus S300CA, Kubuntu 16.04
Comment 1 yann 2016-10-05 12:28:06 UTC
Improvements have been pushed to the kernel and Mesa that should benefit your system, so please re-test with the latest kernel and Mesa to see whether this issue still occurs.

In parallel, I am assigning this to the Mesa product (please let me know if I am mistaken about this GPU hang).

Kernel: 4.4.0-34-generic
Platform: Ivy Bridge (pci id: 0x0166)
Mesa: [Please confirm your mesa version]

From this error dump, the hang is happening in the render ring batch, with the active head at 0x02dcf6ec and 0x7a000003 (PIPE_CONTROL) as IPEHR.

Batch extract (around 0x02dcf6ec):

0x02dcf6bc:      0x7b000005: 3DPRIMITIVE:
0x02dcf6c0:      0x0000000f:    rect list sequential
0x02dcf6c4:      0x00000003:    vertex count
0x02dcf6c8:      0x00000000:    start vertex
0x02dcf6cc:      0x00000001:    instance count
0x02dcf6d0:      0x00000000:    start instance
0x02dcf6d4:      0x00000000:    index bias
0x02dcf6d8:      0x7a000003: PIPE_CONTROL
0x02dcf6dc:      0x00101c11:    no write, cs stall, render target cache flush, instruction cache invalidate, texture cache invalidate, vf fetch invalidate, depth cache flush,
0x02dcf6e0:      0x00000000:    destination address
0x02dcf6e4:      0x00000000:    immediate dword low
0x02dcf6e8:      0x00000000:    immediate dword high
0x02dcf6ec:      0x78210000: 3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP
0x02dcf6f0:      0x00007d80:    pointer to SF_CLIP viewport
0x02dcf6f4:      0x78240000: 3DSTATE_BLEND_STATE_POINTERS
0x02dcf6f8:      0x00007d41:    pointer to BLEND_STATE at 0x00007d40 (changed)
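
The extract above can be cross-checked by walking the dwords by hand. The sketch below assumes the usual layout of a 3D command header (type in bits 31:29, subtype in 28:27, opcode in 26:24, sub-opcode in 23:16, and a length field in bits 7:0 holding total dwords minus 2), and it assumes the PIPE_CONTROL flag bit positions listed in the table. Both are my assumptions, not taken from this dump, although they agree with the addresses and the flag list printed by the decoder.

```python
def decode_header(dword):
    """Split a command header dword into its fields (assumed layout)."""
    return {
        "type": (dword >> 29) & 0x7,
        "subtype": (dword >> 27) & 0x3,
        "opcode": (dword >> 24) & 0x7,
        "subopcode": (dword >> 16) & 0xFF,
        # Length field is assumed to store (total dwords - 2).
        "total_dwords": (dword & 0xFF) + 2,
    }


# Assumed bit positions; names are copied from the decoder output above.
PIPE_CONTROL_FLAGS = {
    20: "cs stall",
    12: "render target cache flush",
    11: "instruction cache invalidate",
    10: "texture cache invalidate",
    4: "vf fetch invalidate",
    0: "depth cache flush",
}


def pipe_control_flags(dword):
    """Return the names of the assumed flag bits that are set in dword."""
    return [name for bit, name in PIPE_CONTROL_FLAGS.items() if dword & (1 << bit)]


def walk(start, dwords):
    """Return (address, header) for each command, stepping by its length."""
    commands = []
    i = 0
    while i < len(dwords):
        commands.append((start + 4 * i, dwords[i]))
        i += decode_header(dwords[i])["total_dwords"]
    return commands


# Dwords copied from the batch extract, starting at 0x02dcf6bc.
BATCH_START = 0x02DCF6BC
BATCH = [
    0x7B000005, 0x0000000F, 0x00000003, 0x00000000,
    0x00000001, 0x00000000, 0x00000000,              # 3DPRIMITIVE
    0x7A000003, 0x00101C11, 0x00000000, 0x00000000,
    0x00000000,                                      # PIPE_CONTROL
    0x78210000, 0x00007D80,                          # SF_CLIP viewport pointers
    0x78240000, 0x00007D41,                          # BLEND_STATE pointers
]

if __name__ == "__main__":
    for addr, header in walk(BATCH_START, BATCH):
        print("0x%08x: 0x%08x" % (addr, header))
    print(pipe_control_flags(0x00101C11))
```

Under these assumptions the walk lands on 0x02dcf6bc, 0x02dcf6d8, 0x02dcf6ec and 0x02dcf6f4, the same command boundaries as in the extract. The active head (0x02dcf6ec) is therefore the dword immediately after the five-dword PIPE_CONTROL whose header appears in IPEHR.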
Comment 2 Yuriy 2016-10-06 00:30:58 UTC
Thank you!
The update works great!
(new kernel version is 4.4.0-38-generic)

