Bug 98860 - [SKL] GPU HANG: ecode 9:0:0x84dffff8, in X [2571], reason: Ring hung, action: reset
Status: REOPENED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 98910
Depends on:
Blocks:
 
Reported: 2016-11-26 02:38 UTC by Resuto
Modified: 2016-12-07 17:38 UTC
CC List: 4 users

See Also:
i915 platform: SKL
i915 features: GPU hang


Attachments
(/sys/class/drm/card0/error) attached (93.79 KB, application/gzip)
2016-11-26 02:38 UTC, Resuto

Description Resuto 2016-11-26 02:38:17 UTC
Created attachment 128197
(/sys/class/drm/card0/error) attached

Hello everyone, this is my first bug report so I hope I am doing this right. I have cut the applicable information from dmesg below, and attached /sys/class/drm/card0/error

[  219.706309] ------------[ cut here ]------------
[  219.706316] WARNING: CPU: 3 PID: 20 at drivers/gpu/drm/i915/intel_display.c:11313 intel_mmio_flip_work_func+0x378/0x3c0()
[  219.706318] WARN_ON(__i915_wait_request(mmio_flip->req, mmio_flip->crtc->reset_counter, false, NULL, &mmio_flip->i915->rps.mmioflips))
[  219.706319] Modules linked in:
[  219.706321]  uvcvideo x86_pkg_temp_thermal videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core ath10k_pci
[  219.706328] CPU: 3 PID: 20 Comm: kworker/3:0 Not tainted 4.4.26-gentoo #16
[  219.706329] Hardware name: ASUSTeK COMPUTER INC. X556UAM/X556UAM, BIOS X556UAM.305 07/06/2016
[  219.706332] Workqueue: events intel_mmio_flip_work_func
[  219.706334]  0000000000000000 ffff880169453d30 ffffffff812fed18 ffff880169453d78
[  219.706338]  ffffffff81c09480 ffff880169453d68 ffffffff8104f441 ffff8801691afc80
[  219.706341]  ffff88016ed94c40 00000000000000c0 ffff88016ed99400 ffff8801691afc80
[  219.706343] Call Trace:
[  219.706348]  [<ffffffff812fed18>] dump_stack+0x4d/0x65
[  219.706352]  [<ffffffff8104f441>] warn_slowpath_common+0x81/0xc0
[  219.706355]  [<ffffffff8104f4c7>] warn_slowpath_fmt+0x47/0x50
[  219.706358]  [<ffffffff8100dfab>] ? __switch_to_xtra+0x11b/0x120
[  219.706360]  [<ffffffff8146b598>] intel_mmio_flip_work_func+0x378/0x3c0
[  219.706364]  [<ffffffff81065308>] process_one_work+0x148/0x400
[  219.706366]  [<ffffffff810658d6>] worker_thread+0x46/0x440
[  219.706369]  [<ffffffff81065890>] ? rescuer_thread+0x2d0/0x2d0
[  219.706372]  [<ffffffff8106a3e4>] kthread+0xc4/0xe0
[  219.706374]  [<ffffffff8106a320>] ? kthread_park+0x50/0x50
[  219.706377]  [<ffffffff8181b45f>] ret_from_fork+0x3f/0x70
[  219.706379]  [<ffffffff8106a320>] ? kthread_park+0x50/0x50
[  219.706381] ---[ end trace 1d9ac7d9dd76653c ]---
[  219.708014] drm/i915: Resetting chip after gpu hang
[  221.706203] [drm] RC6 on

I occasionally encounter freezes while using various applications under X11. I can reproduce the freeze 100 percent of the time by visiting https://distrowatch.com/search.php in Firefox.

The GPU hangs within about 1-2 seconds of scrolling through the page. This is not the only situation in which the GPU freezes, but it is the only one I can reproduce on demand.

(Linux ASUS-F556U 4.4.26-gentoo #16 SMP Fri Nov 25 18:41:43 EST 2016 x86_64 Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz GenuineIntel GNU/Linux)

I hope this information helps.
Comment 1 yann 2016-11-28 08:55:27 UTC
Workarounds for SKL and other improvements have been pushed to the kernel and Mesa that should benefit your system, so please re-test with the latest kernel and Mesa to see whether this issue still occurs. Mark the bug as REOPENED if you can reproduce it and RESOLVED/* if you cannot.

In parallel, assigning to the Mesa product.

Kernel:  4.4.26-gentoo
Platform: Skylake (pci id: 0x1916)
Mesa: [Please confirm your mesa version]

From this error dump, the hang is happening in a render ring batch with the active head at 0xf34b942c, with 0x7b000005 (3DPRIMITIVE) as IPEHR.

Batch extract (around 0xf34b942c):

0xf34b940c:      0x78090005: 3DSTATE_VERTEX_ELEMENTS
0xf34b9410:      0x02000000:    buffer 0: invalid, type 0x0000, src offset 0x0000 bytes
0xf34b9414:      0x22220000:    (0.0, 0.0, 0.0, 0.0), dst offset 0x00 bytes
0xf34b9418:      0x02f60000:    buffer 0: invalid, type 0x00f6, src offset 0x0000 bytes
0xf34b941c:      0x11230000:    (X, Y, 0.0, 1.0), dst offset 0x00 bytes
0xf34b9420:      0x02f60004:    buffer 0: invalid, type 0x00f6, src offset 0x0004 bytes
0xf34b9424:      0x11230000:    (X, Y, 0.0, 1.0), dst offset 0x00 bytes
Bad length 7 in (null), expected 6-6
0xf34b9428:      0x7b000005: 3DPRIMITIVE: fail sequential
0xf34b942c:      0x00000000:    vertex count
0xf34b9430:      0x0000000c:    start vertex
0xf34b9434:      0x0000017e:    instance count
0xf34b9438:      0x00000001:    start instance
0xf34b943c:      0x00000000:    index bias
0xf34b9440:      0x00000000: MI_NOOP
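
As a cross-check of the header values in the extract above, here is a minimal Python sketch that splits a render command header into its fields. It assumes the usual 3D command header layout from Intel's programmer's reference manuals (command type in bits 31:29, subtype in 28:27, opcode in 26:24, sub-opcode in 23:16, and a DWord length field in bits 7:0 that is the total length minus 2). The function name decode_gfx_header is hypothetical and is not part of any decoder tool.

```python
def decode_gfx_header(dword: int) -> dict:
    """Split a 32-bit render command header into its bit fields.

    Assumed layout (not taken from the bug report itself):
      31:29 command type, 28:27 subtype, 26:24 opcode,
      23:16 sub-opcode, 7:0 DWord length (total length minus 2).
    """
    return {
        "command_type": (dword >> 29) & 0x7,
        "subtype": (dword >> 27) & 0x3,
        "opcode": (dword >> 24) & 0x7,
        "sub_opcode": (dword >> 16) & 0xFF,
        # The length field is biased by 2, so add it back for the total.
        "total_dwords": (dword & 0xFF) + 2,
    }


# The two command headers that appear in the batch extract.
for name, header in [
    ("3DSTATE_VERTEX_ELEMENTS", 0x78090005),
    ("3DPRIMITIVE", 0x7B000005),
]:
    print(f"{name}: {decode_gfx_header(header)}")
```

Under these assumptions, both headers decode to a total length of 7 dwords, which agrees with the "Bad length 7" that the decoder reports for the 3DPRIMITIVE at 0xf34b9428.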
Comment 2 Resuto 2016-11-28 12:35:44 UTC
My current version of Mesa is 12.0.1.
I downloaded the most recent version of Mesa (13.0.1) and the most recent kernel (4.8.11). After installing both, rebooting into the new kernel, and testing again, the issue remained.

I was able to find a workaround for the time being. I changed the following line in my xorg.conf.d "Device" section from

Driver      "intel"

to

Driver      "modesetting"

After changing this line and restarting Xorg, the issue was no longer present.
I am not sure whether I should still mark this as a bug. I also tested the "modesetting" driver with my old kernel and the older version of Mesa, and it worked under those conditions as well. If the newer kernel and Mesa contain any fixes that benefit my system under the "intel" driver, I have not noticed them yet. I hope this information helps.
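
For anyone wanting to try the same workaround, a complete "Device" section might look like the sketch below. The Identifier value and the idea of placing it in a file such as /etc/X11/xorg.conf.d/20-gpu.conf are assumptions for illustration; only the Driver line comes from this report.

```
# Hypothetical file: /etc/X11/xorg.conf.d/20-gpu.conf
Section "Device"
    # The Identifier is an arbitrary label chosen for this example.
    Identifier  "Intel Graphics"
    # Use the generic modesetting driver instead of "intel".
    Driver      "modesetting"
EndSection
```

Xorg has to be restarted for the change to take effect.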
Comment 3 Matt Turner 2016-11-28 18:17:24 UTC
Perhaps Chris can take a look.
Comment 4 Resuto 2016-11-28 20:50:06 UTC
I have run into another odd problem. If an application is fullscreen and left idle long enough for xscreensaver to start, the same error can occur when the screensaver is dismissed.

The error can also be reproduced if an application is fullscreen and I press Ctrl-Alt-F[1-6] to switch to another tty. Upon returning to the running X server, the GPU hang occurs. I am not sure whether these issues are related; for now I have simply stopped running programs in fullscreen mode and continue to use the "modesetting" driver.
Comment 5 Resuto 2016-11-28 21:45:44 UTC
Just to add: if the GPU hang occurs, the tty (F1-F6) framebuffer no longer updates and looks frozen. Although the tty appears frozen, commands can still be entered. Entering 'startx' brings up my GUI again, but none of the ttys on the system shows any output until a reboot.
Comment 6 yann 2016-11-30 13:31:42 UTC
*** Bug 98910 has been marked as a duplicate of this bug. ***