Bug 93390 - [IVB] GPU hang after resume
Summary: [IVB] GPU hang after resume
Status: RESOLVED WORKSFORME
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Duplicates: 92862
Depends on:
Blocks:
 
Reported: 2015-12-15 19:34 UTC by Marcin Slusarz
Modified: 2016-11-04 00:37 UTC (History)
CC List: 3 users

See Also:
i915 platform: IVB
i915 features: GPU hang


Attachments
/sys/class/drm/card0/error (2.09 MB, text/plain)
2015-12-15 19:35 UTC, Marcin Slusarz
full dmesg (102.34 KB, text/plain)
2015-12-15 19:36 UTC, Marcin Slusarz
dmesg 4.4-rc6 (166.85 KB, text/plain)
2015-12-29 12:54 UTC, Marcin Slusarz
gpu crash dump (2.07 MB, text/plain)
2016-01-12 11:06 UTC, Don Bowman

Description Marcin Slusarz 2015-12-15 19:34:09 UTC
Since upgrading to 4.4-rc4, I have had two GPU hangs after resuming from suspend-to-RAM (s2r).

Relevant snippets:

[drm] stuck on render ring
[drm] GPU HANG: ecode 7:0:0x85ffbff8, in chromium-browse [32615], reason: Ring hung, action: reset
[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[drm] GPU crash dump saved to /sys/class/drm/card0/error
------------[ cut here ]------------
WARNING: CPU: 1 PID: 23642 at drivers/gpu/drm/i915/intel_display.c:11277 intel_mmio_flip_work_func+0x391/0x3d0 [i915]()
WARN_ON(__i915_wait_request(mmio_flip->req, mmio_flip->crtc->reset_counter, false, NULL, &mmio_flip->i915->rps.mmioflips))
Modules linked in:
 drbg ctr ccm rfcomm bnep binfmt_misc arc4 iwldvm mac80211 iwlwifi cfg80211 snd_hda_codec_hdmi snd_hda_codec_idt uvcvideo rtsx_usb btusb snd_hda_codec_generic btrtl btbcm snd_hda_intel btintel snd_hda_codec bluetooth snd_hwdep videobuf2_vmalloc videobuf2_memops snd_hda_core videobuf2_v4l2 videobuf2_core v4l2_common videodev dell_wmi snd_pcm sparse_keymap dell_laptop dcdbas snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq x86_pkg_temp_thermal snd_timer coretemp serio_raw snd_seq_device snd soundcore lpc_ich parport_pc ppdev lp parport hid_generic nouveau psmouse i915 mxm_wmi ttm ahci libahci i2c_algo_bit usbhid drm_kms_helper hid syscopyarea sysfillrect sysimgblt fb_sys_fops drm wmi video
CPU: 1 PID: 23642 Comm: kworker/1:2 Not tainted 4.4.0-rc4 #61
Hardware name: Dell Inc.          Inspiron 7720/04M3YM, BIOS A07 08/16/2012
Workqueue: events intel_mmio_flip_work_func [i915]
 ffffffffa0294538 ffff880118cabca0 ffffffff813cda2c ffff880118cabce8
 ffff880118cabcd8 ffffffff810b4996 ffff880136952900 ffff88013f255740
 ffff88013f25a900 0000000000000040 ffff880136952900 ffff880118cabd38
Call Trace:
 [<ffffffff813cda2c>] dump_stack+0x4e/0x82
 [<ffffffff810b4996>] warn_slowpath_common+0x86/0xc0
 [<ffffffff810b4a1c>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffffa022c2e1>] intel_mmio_flip_work_func+0x391/0x3d0 [i915]
 [<ffffffff810d09b7>] process_one_work+0x1e7/0x640
 [<ffffffff810d0926>] ? process_one_work+0x156/0x640
 [<ffffffff810d15db>] worker_thread+0x4b/0x440
 [<ffffffff810d1590>] ? cancel_delayed_work_sync+0x20/0x20
 [<ffffffff810d1590>] ? cancel_delayed_work_sync+0x20/0x20
 [<ffffffff810d7563>] kthread+0xf3/0x110
 [<ffffffff810d7470>] ? kthread_create_on_node+0x230/0x230
 [<ffffffff817456bf>] ret_from_fork+0x3f/0x70
 [<ffffffff810d7470>] ? kthread_create_on_node+0x230/0x230
---[ end trace 53a2b620365ba5b9 ]---

and after 2nd resume:

[drm] stuck on render ring
[drm] GPU HANG: ecode 7:0:0x85ffbff8, in compiz [2015], reason: Ring hung, action: reset
------------[ cut here ]------------
WARNING: CPU: 1 PID: 20255 at drivers/gpu/drm/i915/intel_display.c:11277 intel_mmio_flip_work_func+0x391/0x3d0 [i915]()
WARN_ON(__i915_wait_request(mmio_flip->req, mmio_flip->crtc->reset_counter, false, NULL, &mmio_flip->i915->rps.mmioflips))
Modules linked in:
 drbg ctr ccm rfcomm bnep binfmt_misc arc4 iwldvm mac80211 iwlwifi cfg80211 snd_hda_codec_hdmi snd_hda_codec_idt uvcvideo rtsx_usb btusb snd_hda_codec_generic btrtl btbcm snd_hda_intel btintel snd_hda_codec bluetooth snd_hwdep videobuf2_vmalloc videobuf2_memops snd_hda_core videobuf2_v4l2 videobuf2_core v4l2_common videodev dell_wmi snd_pcm sparse_keymap dell_laptop dcdbas snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq x86_pkg_temp_thermal snd_timer coretemp serio_raw snd_seq_device snd soundcore lpc_ich parport_pc ppdev lp parport hid_generic nouveau psmouse i915 mxm_wmi ttm ahci libahci i2c_algo_bit usbhid drm_kms_helper hid syscopyarea sysfillrect sysimgblt fb_sys_fops drm wmi video
CPU: 1 PID: 20255 Comm: kworker/1:1 Tainted: G        W       4.4.0-rc4 #61
Hardware name: Dell Inc.          Inspiron 7720/04M3YM, BIOS A07 08/16/2012
Workqueue: events intel_mmio_flip_work_func [i915]
 ffffffffa0294538 ffff880080fffca0 ffffffff813cda2c ffff880080fffce8
 ffff880080fffcd8 ffffffff810b4996 ffff88005bef3480 ffff88013f255740
 ffff88013f25a900 0000000000000040 ffff88005bef3480 ffff880080fffd38
Call Trace:
 [<ffffffff813cda2c>] dump_stack+0x4e/0x82
 [<ffffffff810b4996>] warn_slowpath_common+0x86/0xc0
 [<ffffffff810b4a1c>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffffa022c2e1>] intel_mmio_flip_work_func+0x391/0x3d0 [i915]
 [<ffffffff810d09b7>] process_one_work+0x1e7/0x640
 [<ffffffff810d0926>] ? process_one_work+0x156/0x640
 [<ffffffff810d15db>] worker_thread+0x4b/0x440
 [<ffffffff810d1590>] ? cancel_delayed_work_sync+0x20/0x20
 [<ffffffff810d1590>] ? cancel_delayed_work_sync+0x20/0x20
 [<ffffffff810d7563>] kthread+0xf3/0x110
 [<ffffffff810d7470>] ? kthread_create_on_node+0x230/0x230
 [<ffffffff817456bf>] ret_from_fork+0x3f/0x70
 [<ffffffff810d7470>] ? kthread_create_on_node+0x230/0x230
---[ end trace 53a2b620365ba5ba ]---
drm/i915: Resetting chip after gpu hang
[drm] stuck on render ring
[drm] GPU HANG: ecode 7:0:0x85ffbff8, in compiz [2015], reason: Ring hung, action: reset
------------[ cut here ]------------
WARNING: CPU: 1 PID: 20255 at drivers/gpu/drm/i915/intel_display.c:11277 intel_mmio_flip_work_func+0x391/0x3d0 [i915]()
WARN_ON(__i915_wait_request(mmio_flip->req, mmio_flip->crtc->reset_counter, false, NULL, &mmio_flip->i915->rps.mmioflips))
Modules linked in:
 drbg ctr ccm rfcomm bnep binfmt_misc arc4 iwldvm mac80211 iwlwifi cfg80211 snd_hda_codec_hdmi snd_hda_codec_idt uvcvideo rtsx_usb btusb snd_hda_codec_generic btrtl btbcm snd_hda_intel btintel snd_hda_codec bluetooth snd_hwdep videobuf2_vmalloc videobuf2_memops snd_hda_core videobuf2_v4l2 videobuf2_core v4l2_common videodev dell_wmi snd_pcm sparse_keymap dell_laptop dcdbas snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq x86_pkg_temp_thermal snd_timer coretemp serio_raw snd_seq_device snd soundcore lpc_ich parport_pc ppdev lp parport hid_generic nouveau psmouse i915 mxm_wmi ttm ahci libahci i2c_algo_bit usbhid drm_kms_helper hid syscopyarea sysfillrect sysimgblt fb_sys_fops drm wmi video
CPU: 1 PID: 20255 Comm: kworker/1:1 Tainted: G        W       4.4.0-rc4 #61
Hardware name: Dell Inc.          Inspiron 7720/04M3YM, BIOS A07 08/16/2012
Workqueue: events intel_mmio_flip_work_func [i915]
 ffffffffa0294538 ffff880080fffca0 ffffffff813cda2c ffff880080fffce8
 ffff880080fffcd8 ffffffff810b4996 ffff88005bef3840 ffff88013f255740
 ffff88013f25a900 0000000000000040 ffff88005bef3840 ffff880080fffd38
Call Trace:
 [<ffffffff813cda2c>] dump_stack+0x4e/0x82
 [<ffffffff810b4996>] warn_slowpath_common+0x86/0xc0
 [<ffffffff810b4a1c>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffffa022c2e1>] intel_mmio_flip_work_func+0x391/0x3d0 [i915]
 [<ffffffff810d09b7>] process_one_work+0x1e7/0x640
 [<ffffffff810d0926>] ? process_one_work+0x156/0x640
 [<ffffffff810d15db>] worker_thread+0x4b/0x440
 [<ffffffff810d1590>] ? cancel_delayed_work_sync+0x20/0x20
 [<ffffffff810d1590>] ? cancel_delayed_work_sync+0x20/0x20
 [<ffffffff810d7563>] kthread+0xf3/0x110
 [<ffffffff810d7470>] ? kthread_create_on_node+0x230/0x230
 [<ffffffff817456bf>] ret_from_fork+0x3f/0x70
 [<ffffffff810d7470>] ? kthread_create_on_node+0x230/0x230
---[ end trace 53a2b620365ba5bb ]---
drm/i915: Resetting chip after gpu hang
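
When comparing hang reports such as the ones above, it can help to pull the fields out of each "GPU HANG" line mechanically. The following is a minimal sketch, not part of the original report. The names `HANG_RE`, `GpuHang` and `parse_hang` are hypothetical, and reading the first two ecode fields as hardware generation and ring index is an assumption based on how these lines appear in this bug.

```python
import re
from typing import NamedTuple, Optional

# Pattern for lines like:
#   [drm] GPU HANG: ecode 7:0:0x85ffbff8, in compiz [2015], reason: Ring hung, action: reset
# Assumption: the ecode triple is <generation>:<ring index>:<hash>.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<ring>\d+):(?P<ecode>0x[0-9a-fA-F]+), "
    r"in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


class GpuHang(NamedTuple):
    gen: int      # assumed: hardware generation (7 on the IVB machines here)
    ring: int     # assumed: ring index (0 alongside "stuck on render ring")
    ecode: int    # error code printed by the driver
    comm: str     # process name blamed for the hang
    pid: int
    reason: str
    action: str


def parse_hang(line: str) -> Optional[GpuHang]:
    """Return the parsed fields of a GPU HANG line, or None for other lines."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return GpuHang(
        gen=int(m["gen"]),
        ring=int(m["ring"]),
        ecode=int(m["ecode"], 16),
        comm=m["comm"],
        pid=int(m["pid"]),
        reason=m["reason"],
        action=m["action"],
    )
```

Feeding every dmesg line through `parse_hang` and grouping by `ecode` would show whether reports from different machines share the same error code, as the hangs quoted in this bug do (0x85ffbff8).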
Comment 1 Marcin Slusarz 2015-12-15 19:35:40 UTC
Created attachment 120532 [details]
/sys/class/drm/card0/error
Comment 2 Marcin Slusarz 2015-12-15 19:36:18 UTC
Created attachment 120533 [details]
full dmesg
Comment 3 Marcin Slusarz 2015-12-15 19:37:48 UTC
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09) (prog-if 00 [VGA controller])
	Subsystem: Dell Device 0578
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 30
	Region 0: Memory at f1000000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at 4000 [size=64]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee0100c  Data: 4122
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a4] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: i915
Comment 4 Marcin Slusarz 2015-12-29 12:52:53 UTC
I'm on 4.4-rc6 now, and the hangs occur at least once a day, even without a suspend/resume cycle. Two or three times I also lost the mouse cursor (restarting X restores it).

Do you need any more information?
Comment 5 Marcin Slusarz 2015-12-29 12:54:09 UTC
Created attachment 120725 [details]
dmesg 4.4-rc6
Comment 6 Don Bowman 2016-01-12 11:05:33 UTC
I started seeing this around 4.4-rc7 (it might have been present in rc6, but I don't think so; it was definitely there in rc8).
I am on Kubuntu 15.10 on an Intel i5-3317U, using the Ubuntu kernel-mainline 4.4 build.

It's not clear that suspend/resume is involved. It happens during normal use without a resume, but may also happen on each resume.

It's relatively frequent, about 1-2 times per hour.

$ uname -a
Linux don-s9 4.4.0-040400-generic #201601101930 SMP Mon Jan 11 00:32:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


don@don-s9:~$ dmesg |grep -i drm
[    1.719271] [drm] Initialized drm 1.1.0 20060810
[    1.760871] [drm] Memory usable by graphics device = 2048M
[    1.760875] [drm] Replacing VGA console driver
[    1.767520] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    1.767523] [drm] Driver supports precise vblank timestamp query.
[    1.791601] [drm] Initialized i915 1.6.0 20151010 for 0000:00:02.0 on minor 0
[    1.930190] fbcon: inteldrmfb (fb0) is primary device
[    3.441414] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[30372.607262] [drm] stuck on render ring
[30372.607938] [drm] GPU HANG: ecode 7:0:0x85ffbff8, in chrome [2873], reason: Ring hung, action: reset
[30372.607940] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[30372.607941] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[30372.607942] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[30372.607943] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[30372.607944] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[30372.608009] WARNING: CPU: 0 PID: 11390 at /home/kernel/COD/linux/drivers/gpu/drm/i915/intel_display.c:11289 intel_mmio_flip_work_func+0x38e/0x3d0 [i915]()
[30372.608059]  snd_seq media iwlwifi x86_pkg_temp_thermal intel_powerclamp btrtl btbcm coretemp snd_seq_device snd_timer btintel joydev bluetooth input_leds serio_raw mei_me cfg80211 snd mei soundcore shpchp lpc_ich acpi_als kfifo_buf industrialio kvm_intel kvm mac_hid irqbypass arc4 ppp_mppe parport_pc ppdev lp parport autofs4 btrfs drbg ansi_cprng algif_skcipher af_alg dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear i915 crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper i2c_algo_bit cryptd drm_kms_helper syscopyarea psmouse sysfillrect sysimgblt fb_sys_fops ahci drm libahci r8169 mii wmi fjes video
[30372.610069] drm/i915: Resetting chip after gpu hang
Comment 7 Don Bowman 2016-01-12 11:06:35 UTC
Created attachment 120986 [details]
gpu crash dump

gpu error dump
Comment 8 Julian Andres Klode 2016-01-18 08:27:00 UTC
This seems to be a duplicate of bug 92998
Comment 9 yann 2016-05-20 09:48:15 UTC
*** Bug 92862 has been marked as a duplicate of this bug. ***
Comment 10 yann 2016-09-20 14:18:49 UTC
Improvements have been pushed to the kernel and Mesa that should benefit your system, so please re-test with the latest kernel and Mesa to see whether this issue still occurs.

In the meantime, I am assigning this to the Mesa product (please let me know if I am mistaken about this GPU hang).


From this error dump, the hang is happening in the render ring batch, with the active head at 0x7899c19c and 0x7a000003 (PIPE_CONTROL) as IPEHR.

Kernel: 4.4.0-rc4
Platform: IvyBridge (pci id: 0x0166)
Mesa: [Please confirm your mesa version]


Batch extract (around 0x7899c19c):

0x7899c16c:      0x780f0000: 3DSTATE_SCISSOR_POINTERS
0x7899c170:      0x00007d80:    scissor rect offset
0x7899c174:      0x7a000003: PIPE_CONTROL
0x7899c178:      0x00002000:    no write, depth stall,
0x7899c17c:      0x00000000:    destination address
0x7899c180:      0x00000000:    immediate dword low
0x7899c184:      0x00000000:    immediate dword high
0x7899c188:      0x7a000003: PIPE_CONTROL
0x7899c18c:      0x00100001:    no write, cs stall, depth cache flush,
0x7899c190:      0x00000000:    destination address
0x7899c194:      0x00000000:    immediate dword low
0x7899c198:      0x00000000:    immediate dword high
0x7899c19c:      0x7a000003: PIPE_CONTROL
0x7899c1a0:      0x00002000:    no write, depth stall,
0x7899c1a4:      0x00000000:    destination address
0x7899c1a8:      0x00000000:    immediate dword low
0x7899c1ac:      0x00000000:    immediate dword high
0x7899c1b0:      0x78050005: 3DSTATE_DEPTH_BUFFER
0x7899c1b4:      0x204c17ff:    dword 1
0x7899c1b8:      0x77b24000:    dword 2
0x7899c1bc:      0x0dac5fe0:    dword 3
0x7899c1c0:      0x00000001:    dword 4
0x7899c1c4:      0x00000000:    dword 5
0x7899c1c8:      0x00000000:    dword 6
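
As a cross-check of the extract above, the command headers can be decoded by hand to confirm that the active head (0x7899c19c) sits on a command boundary, at the third PIPE_CONTROL. This is a minimal sketch, not the tool that produced the dump. The names `Header`, `decode_header`, `command_offsets` and `BATCH` are hypothetical, and the bit layout (type in bits 31:29, subtype in 28:27, opcode in 26:24, sub-opcode in 23:16, length in 7:0 with a bias of 2) is an assumption that happens to match the dword counts shown in the extract.

```python
from typing import List, NamedTuple


class Header(NamedTuple):
    cmd_type: int      # bits 31:29 (assumed layout)
    subtype: int       # bits 28:27
    opcode: int        # bits 26:24
    sub_opcode: int    # bits 23:16
    total_dwords: int  # bits 7:0 plus an assumed bias of 2


def decode_header(dword: int) -> Header:
    """Split a 3D command header dword into its fields."""
    return Header(
        cmd_type=(dword >> 29) & 0x7,
        subtype=(dword >> 27) & 0x3,
        opcode=(dword >> 24) & 0x7,
        sub_opcode=(dword >> 16) & 0xFF,
        total_dwords=(dword & 0xFF) + 2,
    )


def command_offsets(start: int, dwords: List[int]) -> List[int]:
    """Walk the batch and return the address of each command header."""
    offsets = []
    i = 0
    while i < len(dwords):
        offsets.append(start + 4 * i)
        i += decode_header(dwords[i]).total_dwords
    return offsets


# Dwords copied from the extract, starting at 0x7899c16c.
BATCH = [
    0x780F0000, 0x00007D80,                                      # 3DSTATE_SCISSOR_POINTERS
    0x7A000003, 0x00002000, 0x00000000, 0x00000000, 0x00000000,  # PIPE_CONTROL
    0x7A000003, 0x00100001, 0x00000000, 0x00000000, 0x00000000,  # PIPE_CONTROL
    0x7A000003, 0x00002000, 0x00000000, 0x00000000, 0x00000000,  # PIPE_CONTROL (active head)
    0x78050005, 0x204C17FF, 0x77B24000, 0x0DAC5FE0,              # 3DSTATE_DEPTH_BUFFER
    0x00000001, 0x00000000, 0x00000000,
]

if __name__ == "__main__":
    for addr in command_offsets(0x7899C16C, BATCH):
        print(hex(addr))
```

Under these assumptions the walk lands on 0x7899c16c, 0x7899c174, 0x7899c188, 0x7899c19c and 0x7899c1b0, so the head address coincides with the depth-stall PIPE_CONTROL that immediately precedes 3DSTATE_DEPTH_BUFFER.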
Comment 11 Marcin Slusarz 2016-09-20 14:24:20 UTC
I can't re-test it, because I got rid of this laptop.
Comment 12 yann 2016-09-20 14:27:13 UTC
(In reply to Marcin Slusarz from comment #11)
> I can't re-test it, because I got rid of this laptop.

Thanks for your feedback, Marcin. Let's let the Mesa team decide how they want to proceed here :)
Comment 13 Matt Turner 2016-11-04 00:37:58 UTC
(In reply to Marcin Slusarz from comment #11)
> I can't re-test it, because I got rid of this laptop.

Okay. Not much we can do.

