Bug 98860

Summary:

[SKL] GPU HANG: ecode 9:0:0x84dffff8, in X [2571], reason: Ring hung, action: reset

Product:

xorg

Reporter:

Resuto <jehtorosun>

Component:

Driver/intel

Assignee:

Chris Wilson <chris>

Status:

RESOLVED MOVED

QA Contact:

Intel GFX Bugs mailing list <intel-gfx-bugs>

Severity:

normal

Priority:

medium

CC:

intel-gfx-bugs, jehtorosun, mark.a.janes, vinil

Version:

unspecified

Hardware:

x86-64 (AMD64)

OS:

Linux (All)

Whiteboard:

i915 platform:

SKL

i915 features:

GPU hang

Attachments:

Description	Flags
(/sys/class/drm/card0/error) attached	none

Description Resuto 2016-11-26 02:38:17 UTC

Created attachment 128197
(/sys/class/drm/card0/error) attached

Hello everyone, this is my first bug report, so I hope I am doing this right. I have pasted the relevant part of dmesg below and attached /sys/class/drm/card0/error.

[  219.706309] ------------[ cut here ]------------
[  219.706316] WARNING: CPU: 3 PID: 20 at drivers/gpu/drm/i915/intel_display.c:11313 intel_mmio_flip_work_func+0x378/0x3c0()
[  219.706318] WARN_ON(__i915_wait_request(mmio_flip->req, mmio_flip->crtc->reset_counter, false, NULL, &mmio_flip->i915->rps.mmioflips))
[  219.706319] Modules linked in:
[  219.706321]  uvcvideo x86_pkg_temp_thermal videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core ath10k_pci
[  219.706328] CPU: 3 PID: 20 Comm: kworker/3:0 Not tainted 4.4.26-gentoo #16
[  219.706329] Hardware name: ASUSTeK COMPUTER INC. X556UAM/X556UAM, BIOS X556UAM.305 07/06/2016
[  219.706332] Workqueue: events intel_mmio_flip_work_func
[  219.706334]  0000000000000000 ffff880169453d30 ffffffff812fed18 ffff880169453d78
[  219.706338]  ffffffff81c09480 ffff880169453d68 ffffffff8104f441 ffff8801691afc80
[  219.706341]  ffff88016ed94c40 00000000000000c0 ffff88016ed99400 ffff8801691afc80
[  219.706343] Call Trace:
[  219.706348]  [<ffffffff812fed18>] dump_stack+0x4d/0x65
[  219.706352]  [<ffffffff8104f441>] warn_slowpath_common+0x81/0xc0
[  219.706355]  [<ffffffff8104f4c7>] warn_slowpath_fmt+0x47/0x50
[  219.706358]  [<ffffffff8100dfab>] ? __switch_to_xtra+0x11b/0x120
[  219.706360]  [<ffffffff8146b598>] intel_mmio_flip_work_func+0x378/0x3c0
[  219.706364]  [<ffffffff81065308>] process_one_work+0x148/0x400
[  219.706366]  [<ffffffff810658d6>] worker_thread+0x46/0x440
[  219.706369]  [<ffffffff81065890>] ? rescuer_thread+0x2d0/0x2d0
[  219.706372]  [<ffffffff8106a3e4>] kthread+0xc4/0xe0
[  219.706374]  [<ffffffff8106a320>] ? kthread_park+0x50/0x50
[  219.706377]  [<ffffffff8181b45f>] ret_from_fork+0x3f/0x70
[  219.706379]  [<ffffffff8106a320>] ? kthread_park+0x50/0x50
[  219.706381] ---[ end trace 1d9ac7d9dd76653c ]---
[  219.708014] drm/i915: Resetting chip after gpu hang
[  221.706203] [drm] RC6 on

I occasionally encounter freezes while using various applications under X11. I can reproduce the freeze 100 percent of the time by opening https://distrowatch.com/search.php in Firefox.

The GPU hangs after about 1-2 seconds of scrolling through the page. This is not the only situation in which the GPU freezes, but it is the only one I can reproduce on demand.

(Linux ASUS-F556U 4.4.26-gentoo #16 SMP Fri Nov 25 18:41:43 EST 2016 x86_64 Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz GenuineIntel GNU/Linux)

I hope this information helps.
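
For anyone who wants to cut the same lines out of a longer dmesg capture, here is a minimal Python sketch. It assumes the log uses the `[  seconds.micros]` timestamp prefix shown above; the function name `find_hang_events` and the marker strings are illustrative choices, not part of any i915 tooling.

```python
import re

# Matches a dmesg line with a "[  219.706316] message" style timestamp prefix.
LINE_RE = re.compile(r'^\[\s*(\d+\.\d+)\]\s+(.*)$')

# Substrings treated as hang-related (an assumption based on this report).
MARKERS = ("WARNING:", "GPU HANG", "Resetting chip after gpu hang")


def find_hang_events(log_text):
    """Return (timestamp, message) pairs for lines that contain a marker."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m and any(marker in m.group(2) for marker in MARKERS):
            events.append((float(m.group(1)), m.group(2)))
    return events


SAMPLE = """\
[  219.706309] ------------[ cut here ]------------
[  219.706316] WARNING: CPU: 3 PID: 20 at drivers/gpu/drm/i915/intel_display.c:11313 intel_mmio_flip_work_func+0x378/0x3c0()
[  219.708014] drm/i915: Resetting chip after gpu hang
[  221.706203] [drm] RC6 on
"""

if __name__ == "__main__":
    for stamp, message in find_hang_events(SAMPLE):
        print(stamp, message)
```

In practice the input would be the saved output of `dmesg` rather than the inline sample.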

Comment 1 yann 2016-11-28 08:55:27 UTC

Workarounds for SKL and other improvements have been pushed to the kernel and Mesa that should benefit your system, so please re-test with the latest kernel and Mesa to see whether this issue still occurs. Mark the bug as REOPENED if you can reproduce it, or RESOLVED/* if you cannot.

In parallel, assigning to the Mesa product.

Kernel:  4.4.26-gentoo
Platform: Skylake (pci id: 0x1916)
Mesa: [Please confirm your mesa version]

According to this error dump, the hang occurs in a render ring batch, with the active head at 0xf34b942c and IPEHR 0x7b000005 (3DPRIMITIVE).

Batch extract (around 0xf34b942c):

0xf34b940c:      0x78090005: 3DSTATE_VERTEX_ELEMENTS
0xf34b9410:      0x02000000:    buffer 0: invalid, type 0x0000, src offset 0x0000 bytes
0xf34b9414:      0x22220000:    (0.0, 0.0, 0.0, 0.0), dst offset 0x00 bytes
0xf34b9418:      0x02f60000:    buffer 0: invalid, type 0x00f6, src offset 0x0000 bytes
0xf34b941c:      0x11230000:    (X, Y, 0.0, 1.0), dst offset 0x00 bytes
0xf34b9420:      0x02f60004:    buffer 0: invalid, type 0x00f6, src offset 0x0004 bytes
0xf34b9424:      0x11230000:    (X, Y, 0.0, 1.0), dst offset 0x00 bytes
Bad length 7 in (null), expected 6-6
0xf34b9428:      0x7b000005: 3DPRIMITIVE: fail sequential
0xf34b942c:      0x00000000:    vertex count
0xf34b9430:      0x0000000c:    start vertex
0xf34b9434:      0x0000017e:    instance count
0xf34b9438:      0x00000001:    start instance
0xf34b943c:      0x00000000:    index bias
0xf34b9440:      0x00000000: MI_NOOP
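
The header dwords in the extract above (0x78090005 and 0x7b000005) can be split into their fields by hand. The Python sketch below does that, assuming the usual layout for 3D pipeline commands as I recall it from Intel's public programmer's reference: command type in bits 31:29, sub-type in 28:27, opcode in 26:24, sub-opcode in 23:16, and a length field in bits 7:0 holding the total dword count minus 2. The layout is an assumption to check against the PRM, and the names `GfxHeader` and `decode_3d_header` are made up for illustration.

```python
from typing import NamedTuple


class GfxHeader(NamedTuple):
    command_type: int   # bits 31:29 (3 is assumed to mean GFXPIPE)
    sub_type: int       # bits 28:27
    opcode: int         # bits 26:24
    sub_opcode: int     # bits 23:16
    total_dwords: int   # length field (bits 7:0) plus 2


def decode_3d_header(dword: int) -> GfxHeader:
    """Split a 3D command header dword into its assumed bit fields."""
    return GfxHeader(
        command_type=(dword >> 29) & 0x7,
        sub_type=(dword >> 27) & 0x3,
        opcode=(dword >> 24) & 0x7,
        sub_opcode=(dword >> 16) & 0xFF,
        total_dwords=(dword & 0xFF) + 2,
    )


if __name__ == "__main__":
    for header in (0x78090005, 0x7B000005):
        print(hex(header), decode_3d_header(header))
```

Under that assumed layout both headers declare 7 dwords in total, which would be consistent with the "Bad length 7" message printed by the decoder.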

Comment 2 Resuto 2016-11-28 12:35:44 UTC

My current version of Mesa is 12.0.1.
I downloaded the most recent version of Mesa (13.0.1) and the most recent kernel (4.8.11). After installing both, rebooting into the new kernel, and testing again, the issue remained.

I was able to find a workaround for the time being. I changed the following line in the "Device" section of my xorg.conf.d configuration

Driver      "intel"

to

Driver      "modesetting"

After making this change and restarting Xorg, the issue was no longer present.
I am not sure whether I should still mark this as a bug. I also tested the "modesetting" driver with my old kernel and the older version of Mesa, and it worked under those conditions as well. If there are any fixes in the newer kernel and Mesa that benefit my system under the "intel" driver, I have not come across them yet. I hope this information helps.
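
For anyone wanting to script the same driver switch, here is a rough Python sketch that rewrites the Driver entry only inside "Device" sections of an Xorg configuration text, leaving other sections (for example "InputClass") alone. The function name `switch_device_driver` and the sample configuration are hypothetical; the actual file under xorg.conf.d will differ per system.

```python
import re

SECTION_RE = re.compile(r'^\s*Section\s+"([^"]+)"', re.IGNORECASE)
END_RE = re.compile(r'^\s*EndSection\b', re.IGNORECASE)
DRIVER_RE = re.compile(r'^(\s*Driver\s+)"[^"]*"', re.IGNORECASE)


def switch_device_driver(conf_text, new_driver="modesetting"):
    """Return conf_text with Driver replaced inside Device sections only."""
    out = []
    in_device = False
    for line in conf_text.splitlines():
        section = SECTION_RE.match(line)
        if section:
            in_device = section.group(1).lower() == "device"
        elif END_RE.match(line):
            in_device = False
        elif in_device:
            # Keep the original indentation and spacing before the value.
            line = DRIVER_RE.sub(
                lambda d: d.group(1) + '"' + new_driver + '"', line)
        out.append(line)
    return "\n".join(out)


# Hypothetical sample; identifiers are placeholders.
SAMPLE_CONF = '''Section "Device"
    Identifier  "card0"
    Driver      "intel"
EndSection
Section "InputClass"
    Identifier  "keyboard"
    Driver      "libinput"
EndSection'''

if __name__ == "__main__":
    print(switch_device_driver(SAMPLE_CONF))
```

Xorg has to be restarted for the change to take effect, as described above.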

Comment 3 Matt Turner 2016-11-28 18:17:24 UTC

Perhaps Chris can take a look.

Comment 4 Resuto 2016-11-28 20:50:06 UTC

I have run into another odd problem. If an application is fullscreen and left alone long enough for xscreensaver to start, the same error can occur when the screensaver is dismissed.

The error can also be reproduced if an application is fullscreen and I press Ctrl-Alt-F[1-6] to switch to any other tty. Upon returning to the running X server, the GPU hang occurs. I am not sure whether these issues are related to the original one, though for now I have simply stopped running programs in fullscreen mode, and I still use the "modesetting" driver.

Comment 5 Resuto 2016-11-28 21:45:44 UTC

Just to add: once the GPU hang occurs, the tty (F1-F6) framebuffer no longer updates and looks frozen. Although the tty appears frozen, commands can still be entered. Entering 'startx' brings my GUI up again, but none of the ttys on the system show any output until a reboot.

Comment 6 yann 2016-11-30 13:31:42 UTC

*** Bug 98910 has been marked as a duplicate of this bug. ***

Comment 7 Martin Peres 2019-11-27 13:46:46 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-intel/issues/130.