Bug 88411

Summary: [HSW] stuck on render ring (screen is repeatedly freezing)
Product: DRI Reporter: Jan Viktorin <iviktorin>
Component: DRM/Intel Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED WORKSFORME QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: agg, doa379, intel-gfx-bugs, mort.yao, philippe
Version: XOrg git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard: ReadyForDev
i915 platform: HSW i915 features: GPU hang
Attachments:
Description | Flags
full dmesg from kernel 3.14.28 | none
full dmesg from kernel 3.17.6 | none
/sys/class/drm/card0/error from 3.14.28 | none
/sys/class/drm/card0/error from 3.17.6 (gzipped) | none
cpuinfo | none
lspci -v | none
Xorg.log from 3.14.28 | none
Xorg.log from 3.17.6 | none
/sys/class/drm/card0/error from 4.11.3 | none
/sys/class/drm/card0/error from 4.9.14 | none
/sys/class/drm/card0/error from 4.12.3 | none

Description Jan Viktorin 2015-01-14 17:13:47 UTC
My screen is freezing occasionally. I use Arch Linux with 3.17.6-1-ARCH kernel (and also 3.14.28-1-lts that seems to be much worse). I know about similar issues (bug 85760, bug 85466, bug 77104) but I am not able to tell whether they are related.

dmesg 3.17.6:
[ 2308.373694] [drm] stuck on render ring
[ 2308.374334] [drm] GPU HANG: ecode 0:0x86dffffd, in Xorg.bin [390], reason: Ring hung, action: reset
[ 2308.374337] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 2308.374338] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 2308.374339] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 2308.374340] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 2308.374341] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 3670.555269] [drm] stuck on render ring
[ 3670.555831] [drm] GPU HANG: ecode 0:0x85dffffd, in Xorg.bin [390], reason: Ring hung, action: reset

dmesg 3.14.28:
[ 1111.947940] [drm] stuck on render ring
[ 1111.947946] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 1111.947947] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1111.947948] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1111.947949] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1111.947950] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 4181.641311] [drm] stuck on render ring
[ 5660.939259] [drm] stuck on render ring
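
The "GPU HANG" lines in the dmesg excerpts above carry the fields that matter when comparing hangs across kernels (error code, process, PID, reason). A minimal sketch of extracting them follows; the function name `parse_hangs` and the regular expression are illustrative assumptions written against the lines quoted in this report, not an official i915 log format definition.

```python
import re

# Pattern for the i915 "GPU HANG" line as it appears in this report.
# The ecode has either two fields (0:0x...) or three (7:0:0x...).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[0-9a-fx:]+), "
    r"in (?P<proc>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_hangs(log_text):
    """Return one dict per GPU HANG line found in log_text (hypothetical helper)."""
    hangs = []
    for line in log_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            fields = match.groupdict()
            fields["pid"] = int(fields["pid"])
            hangs.append(fields)
    return hangs


# Sample lines copied from the 3.17.6 dmesg in the description.
sample = """\
[ 2308.373694] [drm] stuck on render ring
[ 2308.374334] [drm] GPU HANG: ecode 0:0x86dffffd, in Xorg.bin [390], reason: Ring hung, action: reset
[ 3670.555831] [drm] GPU HANG: ecode 0:0x85dffffd, in Xorg.bin [390], reason: Ring hung, action: reset
"""

for hang in parse_hangs(sample):
    print(hang["ecode"], hang["proc"], hang["pid"], hang["reason"])
```

Older kernels such as 3.14.28 print only "stuck on render ring" without a "GPU HANG" line, so this sketch finds nothing in that log.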
Comment 1 Jan Viktorin 2015-01-14 17:14:27 UTC
Created attachment 112229 [details]
full dmesg from kernel 3.14.28
Comment 2 Jan Viktorin 2015-01-14 17:14:47 UTC
Created attachment 112230 [details]
full dmesg from kernel 3.17.6
Comment 3 Jan Viktorin 2015-01-14 17:15:23 UTC
Created attachment 112232 [details]
/sys/class/drm/card0/error from 3.14.28
Comment 4 Jan Viktorin 2015-01-14 17:17:17 UTC
Created attachment 112233 [details]
/sys/class/drm/card0/error from 3.17.6 (gzipped)
Comment 5 Jan Viktorin 2015-01-14 17:17:34 UTC
Created attachment 112234 [details]
cpuinfo
Comment 6 Jan Viktorin 2015-01-14 17:18:23 UTC
Created attachment 112236 [details]
lspci -v
Comment 7 Jan Viktorin 2015-01-14 17:20:52 UTC
Created attachment 112238 [details]
Xorg.log from 3.14.28
Comment 8 Jan Viktorin 2015-01-14 17:21:24 UTC
Created attachment 112239 [details]
Xorg.log from 3.17.6
Comment 9 Jani Nikula 2016-04-21 12:15:36 UTC
We seem to have completely neglected this bug. Apologies.

Does the problem persist with latest kernels and userspace?
Comment 10 Jan Viktorin 2016-04-21 14:39:07 UTC
Hello,

yes it does. I use Arch Linux so I've got quite new kernels. All of them were buggy.
But the problem changes over time. Now, with 4.4.5-1-ARCH, it sometimes freezes in a
way that I cannot click anywhere (or the clicks are delayed) but I cannot see any
garbage on the screen anymore (as I could before).

I can see those messages in dmesg:

[931469.554401] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[931469.554411] [drm:intel_pch_fifo_underrun_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun
[931469.563758] [drm:intel_set_cpu_fifo_underrun_reporting [i915]] *ERROR* uncleared fifo underrun on pipe B
[931469.563770] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[931530.957849] [drm] stuck on render ring
[931530.958314] [drm] GPU HANG: ecode 7:0:0x87d7fefa, in firefox [20825], reason: Ring hung, action: reset
[931530.960386] drm/i915: Resetting chip after gpu hang

And yes, the problem is usually connected to a web browser. I prefer Firefox. It sometimes
does not draw itself immediately but with some delay. When playing online videos, it sometimes
stops drawing the video (I can just hear the sound). Chromium usually also behaves strangely
(I cannot confirm at the moment).

Just while I am writing this text, a freeze has happened many times:

[1182811.842634] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[1182811.842643] [drm:intel_pch_fifo_underrun_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun
[1182811.856148] [drm:intel_set_cpu_fifo_underrun_reporting [i915]] *ERROR* uncleared fifo underrun on pipe B
[1182811.856159] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[1185127.973155] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[1185127.973167] [drm:intel_pch_fifo_underrun_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun
[1185127.984698] [drm:intel_set_cpu_fifo_underrun_reporting [i915]] *ERROR* uncleared fifo underrun on pipe B
[1185127.984710] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[1190430.697806] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[1190430.697817] [drm:intel_pch_fifo_underrun_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun
[1190430.707191] [drm:intel_set_cpu_fifo_underrun_reporting [i915]] *ERROR* uncleared fifo underrun on pipe B
[1190430.707202] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
Comment 11 Jani Saarinen 2016-12-09 11:22:25 UTC
Is this issue still seen with latest kernel?
Comment 12 Jan Viktorin 2016-12-09 11:38:33 UTC
Yes. With 4.4.31 LTS it does not happen so often. With 4.8.7, it is nearly unusable for me.

On both kernels, Chromium usually seems to be frozen but when it redraws it seems to work.

Linux version 4.8.7-1-ARCH

Nov 16 15:22:16 pcviktorin kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [498], reason: Hang on render ring, action: reset
Nov 16 15:22:16 pcviktorin kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Nov 16 15:22:16 pcviktorin kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Nov 16 15:22:16 pcviktorin kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Nov 16 15:22:16 pcviktorin kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Nov 16 15:22:16 pcviktorin kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Nov 16 15:22:16 pcviktorin kernel: drm/i915: Resetting chip after gpu hang
Nov 16 15:22:39 pcviktorin kernel: drm/i915: Resetting chip after gpu hang
Nov 16 15:22:53 pcviktorin kernel: drm/i915: Resetting chip after gpu hang

Linux version 4.4.31-1-lts

Nov 18 14:30:18 pcviktorin kernel: [drm] stuck on render ring
Nov 18 14:30:18 pcviktorin kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [342], reason: Ring hung, action: reset
Nov 18 14:30:18 pcviktorin kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Nov 18 14:30:18 pcviktorin kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Nov 18 14:30:18 pcviktorin kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Nov 18 14:30:18 pcviktorin kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Nov 18 14:30:18 pcviktorin kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Nov 18 14:30:18 pcviktorin kernel: drm/i915: Resetting chip after gpu hang
...
Nov 18 17:02:12 pcviktorin kernel: [drm] GPU HANG: ecode 7:0:0x85dffffc, in chromium [23742], reason: Ring hung, action: reset
Nov 18 17:02:12 pcviktorin kernel: drm/i915: Resetting chip after gpu hang
...
Nov 18 17:39:02 pcviktorin kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [342], reason: Ring hung, action: reset
Nov 18 17:39:02 pcviktorin kernel: drm/i915: Resetting chip after gpu hang

Sorry for such a short report; I am connected remotely to that computer right now. When I am less busy, I can provide more details from both kernels if you are interested.
Comment 13 Ricardo 2017-02-23 21:50:41 UTC
removing needinfo status since information has been provided
Comment 14 Jani Saarinen 2017-03-08 07:33:41 UTC
Jan, is this still valid with latest kernel?
Comment 15 Jan Viktorin 2017-03-08 10:14:01 UTC
With 4.4.48-1-lts, I can see the following errors in dmesg:

[2214848.975943] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[2214848.975954] [drm:intel_pch_fifo_underrun_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun
[2214848.980066] [drm:intel_set_cpu_fifo_underrun_reporting [i915]] *ERROR* uncleared fifo underrun on pipe B
[2214848.980077] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun

The screen DOES NOT freeze anymore. At least, I did not notice it for some time now.

What is very strange is that Chromium does not redraw. I can click on buttons and they seem to work (e.g. close tab quits the application); however, the browser does not show any changes.

Please, can you specify the term "latest kernel" next time? Do you mean a certain latest release or the current master? I do not build my own kernels...
Comment 16 Jani Nikula 2017-03-08 10:50:41 UTC
By "latest kernel" we usually mean:

- the latest kernel release from https://kernel.org/, currently v4.10.x
- the latest mainline kernel, currently v4.11-rc1
- the drm-tip branch of https://cgit.freedesktop.org/drm-tip which contains all the very latest graphics changes headed for upstream in upcoming kernels

There's not much point in asking for testing on specific kernel versions, because it's not unusual for the response to come a kernel release or two later. ;)
Comment 17 Jan Viktorin 2017-03-10 15:50:00 UTC
After upgrade to 4.10.1-1-ARCH, the problem is still there.

Mar 10 15:00:41 kernel: Linux version 4.10.1-1-ARCH (builduser@heftig-13232) (gcc version 6.3.1 20170109 (GCC) ) #1 SMP PREEMPT Sun Feb 26 21:08:53 UTC 2017
...
Mar 10 16:17:10 kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [426], reason: Hang on render ring, action: reset
Mar 10 16:17:10 kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Mar 10 16:17:10 kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Mar 10 16:17:10 kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Mar 10 16:17:10 kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Mar 10 16:17:10 kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Mar 10 16:17:10 kernel: drm/i915: Resetting chip after gpu hang
Mar 10 16:17:18 kernel: drm/i915: Resetting chip after gpu hang

The system became unresponsive. After a reboot, I switched to 4.9.13-1-lts and the system crashes while starting the Mate session. I've added stack traces of the Mate applets because they appeared at the same time, but they might be unrelated.

Mar 10 16:20:08 kernel: Linux version 4.9.13-1-lts (builduser@andyrtr) (gcc version 6.3.1 20170109 (GCC) ) #1 SMP Mon Feb 27 21:32:16 CET 2017
...
Mar 10 16:20:59 kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [415], reason: Hang on render ring, action: reset
Mar 10 16:20:59 kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Mar 10 16:20:59 kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Mar 10 16:20:59 kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Mar 10 16:20:59 kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Mar 10 16:20:59 kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Mar 10 16:20:59 kernel: drm/i915: Resetting chip after gpu hang
Mar 10 16:21:27 kernel: drm/i915: Resetting chip after gpu hang
...
Mar 10 16:21:37 kernel: drm/i915: Resetting chip after gpu hang
Mar 10 16:21:37 kernel: mate-settings-d[587]: segfault at 38 ip 00007fc57749a6fd sp 00007ffc5c6b3310 error 4 in libmatemixer-pulse.so[7fc577491000+1f000]
Mar 10 16:21:37 systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Mar 10 16:21:37 systemd[1]: Started Process Core Dump (PID 763/UID 0).
Mar 10 16:21:38 systemd-coredump[764]: Process 587 (mate-settings-d) of user 1000 dumped core.
                                                  Stack trace of thread 587:
                                                  #0  0x00007fc57749a6fd n/a (libmatemixer-pulse.so)
                                                  #1  0x00007fc598f6cf75 g_closure_invoke (libgobject-2.0.so.0)
                                                  #2  0x00007fc598f7ef82 n/a (libgobject-2.0.so.0)
                                                  #3  0x00007fc598f87bcc g_signal_emit_valist (libgobject-2.0.so.0)
                                                  #4  0x00007fc598f87faf g_signal_emit (libgobject-2.0.so.0)
                                                  #5  0x00007fc57749c211 n/a (libmatemixer-pulse.so)
                                                  #6  0x00007fc58c1607e3 n/a (libpulse.so.0)
                                                  #7  0x00007fc57fdbbba1 n/a (libpulsecommon-10.0.so)
                                                  #8  0x00007fc57728dfaf n/a (libpulse-mainloop-glib.so.0)
                                                  #9  0x00007fc598c945a7 g_main_context_dispatch (libglib-2.0.so.0)
                                                  #10 0x00007fc598c94810 n/a (libglib-2.0.so.0)
                                                  #11 0x00007fc598c94b32 g_main_loop_run (libglib-2.0.so.0)
                                                  #12 0x00007fc5998793a7 gtk_main (libgtk-x11-2.0.so.0)
                                                  #13 0x0000000000403dd8 main (mate-settings-daemon)
                                                  #14 0x00007fc5986a8511 __libc_start_main (libc.so.6)
                                                  #15 0x0000000000403e7a _start (mate-settings-daemon)
                                                  
                                                  Stack trace of thread 589:
                                                  #0  0x00007fc59876a67d poll (libc.so.6)
                                                  #1  0x00007fc598c947a6 n/a (libglib-2.0.so.0)
                                                  #2  0x00007fc598c948bc g_main_context_iteration (libglib-2.0.so.0)
                                                  #3  0x00007fc598c94901 n/a (libglib-2.0.so.0)
                                                  #4  0x00007fc598cbc175 n/a (libglib-2.0.so.0)
                                                  #5  0x00007fc598a332e7 start_thread (libpthread.so.0)
                                                  #6  0x00007fc59877454f __clone (libc.so.6)
                                                  
                                                  Stack trace of thread 588:
                                                  #0  0x00007fc59876a67d poll (libc.so.6)
                                                  #1  0x00007fc598c947a6 n/a (libglib-2.0.so.0)
                                                  #2  0x00007fc598c948bc g_main_context_iteration (libglib-2.0.so.0)
                                                  #3  0x00007fc58e1bc4bd n/a (libdconfsettings.so)
                                                  #4  0x00007fc598cbc175 n/a (libglib-2.0.so.0)
                                                  #5  0x00007fc598a332e7 start_thread (libpthread.so.0)
                                                  #6  0x00007fc59877454f __clone (libc.so.6)
                                                  
                                                  Stack trace of thread 590:
                                                  #0  0x00007fc59876a67d poll (libc.so.6)
                                                  #1  0x00007fc598c947a6 n/a (libglib-2.0.so.0)
                                                  #2  0x00007fc598c94b32 g_main_loop_run (libglib-2.0.so.0)
                                                  #3  0x00007fc59927a446 n/a (libgio-2.0.so.0)
                                                  #4  0x00007fc598cbc175 n/a (libglib-2.0.so.0)
                                                  #5  0x00007fc598a332e7 start_thread (libpthread.so.0)
                                                  #6  0x00007fc59877454f __clone (libc.so.6)
                                                  
                                                  Stack trace of thread 600:
                                                  #0  0x00007fc59876a67d poll (libc.so.6)
                                                  #1  0x00007fc598c947a6 n/a (libglib-2.0.so.0)
                                                  #2  0x00007fc598c948bc g_main_context_iteration (libglib-2.0.so.0)
                                                  #3  0x00007fc596eaec7d n/a (libdconf.so.1)
                                                  #4  0x00007fc598cbc175 n/a (libglib-2.0.so.0)
                                                  #5  0x00007fc598a332e7 start_thread (libpthread.so.0)
                                                  #6  0x00007fc59877454f __clone (libc.so.6)
...
Mar 10 16:21:47 kernel: drm/i915: Resetting chip after gpu hang
Mar 10 16:21:47 mate-netspeed-a[711]: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
Mar 10 16:21:47 kernel: mate-volume-con[610]: segfault at 1 ip 00007f4b213bae45 sp 00007fff79c50638 error 6 in libglib-2.0.so.0.5000.3[7f4b2132b000+111000]
...
Mar 10 16:21:48 systemd-coredump[786]: Process 610 (mate-volume-con) of user 1000 dumped core.
                                                  Stack trace of thread 610:
                                                  #0  0x00007f4b213bae45 g_mutex_lock (libglib-2.0.so.0)
                                                  #1  0x00007f4b21373343 g_source_remove_poll (libglib-2.0.so.0)
                                                  #2  0x00007f4b15bdd041 n/a (libpulse-mainloop-glib.so.0)
                                                  #3  0x00007f4b15736c99 n/a (libpulsecommon-10.0.so)
                                                  #4  0x00007f4b1573728e pa_iochannel_free (libpulsecommon-10.0.so)
                                                  #5  0x00007f4b1574b3ad pa_pstream_unlink (libpulsecommon-10.0.so)
                                                  #6  0x00007f4b159999e5 n/a (libpulse.so.0)
                                                  #7  0x00007f4b1599a178 n/a (libpulse.so.0)
                                                  #8  0x00007f4b1599bae5 n/a (libpulse.so.0)
                                                  #9  0x00007f4b1599cfea n/a (libpulse.so.0)
                                                  #10 0x00007f4b15745ba1 n/a (libpulsecommon-10.0.so)
                                                  #11 0x00007f4b15bdcfaf n/a (libpulse-mainloop-glib.so.0)
                                                  #12 0x00007f4b213755a7 g_main_context_dispatch (libglib-2.0.so.0)
                                                  #13 0x00007f4b21375810 n/a (libglib-2.0.so.0)
                                                  #14 0x00007f4b21375b32 g_main_loop_run (libglib-2.0.so.0)
                                                  #15 0x00007f4b21e8d3a7 gtk_main (libgtk-x11-2.0.so.0)
                                                  #16 0x0000000000405700 main (mate-volume-control-applet)
                                                  #17 0x00007f4b20d89511 __libc_start_main (libc.so.6)
                                                  #18 0x00000000004057ba _start (mate-volume-control-applet)
                                                  
                                                  Stack trace of thread 614:
                                                  #0  0x00007f4b20e4b67d poll (libc.so.6)
                                                  #1  0x00007f4b213757a6 n/a (libglib-2.0.so.0)
                                                  #2  0x00007f4b213758bc g_main_context_iteration (libglib-2.0.so.0)
                                                  #3  0x00007f4b21375901 n/a (libglib-2.0.so.0)
                                                  #4  0x00007f4b2139d175 n/a (libglib-2.0.so.0)
                                                  #5  0x00007f4b211142e7 start_thread (libpthread.so.0)
                                                  #6  0x00007f4b20e5554f __clone (libc.so.6)
                                                  
                                                  Stack trace of thread 615:
                                                  #0  0x00007f4b20e4b67d poll (libc.so.6)
                                                  #1  0x00007f4b213757a6 n/a (libglib-2.0.so.0)
                                                  #2  0x00007f4b21375b32 g_main_loop_run (libglib-2.0.so.0)
                                                  #3  0x00007f4b1fbc7446 n/a (libgio-2.0.so.0)
                                                  #4  0x00007f4b2139d175 n/a (libglib-2.0.so.0)
                                                  #5  0x00007f4b211142e7 start_thread (libpthread.so.0)
                                                  #6  0x00007f4b20e5554f __clone (libc.so.6)
                                                  
                                                  Stack trace of thread 780:
                                                  #0  0x00007f4b20e4b67d poll (libc.so.6)
                                                  #1  0x00007f4b159bdee1 n/a (libpulse.so.0)
                                                  #2  0x00007f4b159af6f1 pa_mainloop_poll (libpulse.so.0)
                                                  #3  0x00007f4b159afd8e pa_mainloop_iterate (libpulse.so.0)
                                                  #4  0x00007f4b159afe40 pa_mainloop_run (libpulse.so.0)
                                                  #5  0x00007f4b159bde29 n/a (libpulse.so.0)
                                                  #6  0x00007f4b1575bfe8 n/a (libpulsecommon-10.0.so)
                                                  #7  0x00007f4b211142e7 start_thread (libpthread.so.0)
                                                  #8  0x00007f4b20e5554f __clone (libc.so.6)
                                                  
                                                  Stack trace of thread 781:
                                                  #0  0x00007f4b20e55541 __clone (libc.so.6)
Comment 18 Jani Saarinen 2017-05-24 05:45:47 UTC
Reporter, is this still valid?
Comment 19 Jan Viktorin 2017-06-02 08:07:39 UTC
Hello,

for 4.11.3, there is an improvement: Chromium now runs well. However, there were more freezes, quite short but noticeable.

Jun 01 14:32:43 kernel: Linux version 4.11.3-1-ARCH (builduser@tobias) (gcc version 7.1.1 20170516 (GCC) ) #1 SMP PREEMPT Sun May 28 10:40:17 CEST 2017
...
Jun 01 14:34:33 kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [418], reason: Hang on render ring, action: reset
Jun 01 14:34:33 kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jun 01 14:34:33 kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jun 01 14:34:33 kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel iss
Jun 01 14:34:33 kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jun 01 14:34:33 kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jun 01 14:34:33 kernel: drm/i915: Resetting chip after gpu hang
...
Jun 01 14:34:33 kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [418], reason: Hang on render ring, action: reset
Jun 01 14:34:33 kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jun 01 14:34:33 kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jun 01 14:34:33 kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel iss
Jun 01 14:34:33 kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jun 01 14:34:33 kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jun 01 14:34:33 kernel: drm/i915: Resetting chip after gpu hang
Jun 01 14:38:00 kernel: drm/i915: Resetting chip after gpu hang
Jun 01 14:38:14 kernel: drm/i915: Resetting chip after gpu hang
...
Jun 01 14:41:16 kernel: drm/i915: Resetting chip after gpu hang
Jun 01 14:41:50 kernel: drm/i915: Resetting chip after gpu hang
Jun 01 14:42:10 kernel: drm/i915: Resetting chip after gpu hang
Jun 01 14:43:12 kernel: drm/i915: Resetting chip after gpu hang
...
Jun 01 14:48:08 kernel: drm/i915: Resetting chip after gpu hang
...
Jun 01 14:54:41 kernel: drm/i915: Resetting chip after gpu hang
...
Jun 01 15:01:15 kernel: drm/i915: Resetting chip after gpu hang
Comment 20 Chris Wilson 2017-06-02 08:21:53 UTC
And please attach the latest error state.
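
The error state is read from /sys/class/drm/card0/error (the path printed in the kernel messages), and one of the earlier attachments was uploaded gzipped. A minimal sketch of saving a compressed copy for attaching follows; the function name `save_error_state` and the default output file name are assumptions for illustration, and reading the sysfs file may require root.

```python
import gzip
import shutil


def save_error_state(src="/sys/class/drm/card0/error", dest="i915-error.gz"):
    """Copy the GPU error state to a gzip-compressed file and return its path.

    Hypothetical helper: both default paths are assumptions, adjust as needed.
    """
    with open(src, "rb") as source, gzip.open(dest, "wb") as target:
        # Stream the content so a large dump is not held in memory at once.
        shutil.copyfileobj(source, target)
    return dest

# Example call (not executed here, since it needs the sysfs file to exist):
#   save_error_state(dest="error-4.11.3.gz")
```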
Comment 21 Jan Viktorin 2017-06-02 08:37:29 UTC
Created attachment 131667 [details]
/sys/class/drm/card0/error from 4.11.3
Comment 22 Jan Viktorin 2017-06-07 16:42:57 UTC
Created attachment 131780 [details]
/sys/class/drm/card0/error from 4.9.14
Comment 23 Elizabeth 2017-06-07 21:00:57 UTC
Submitter added logs. Moving bug to reopen status.
Comment 24 Jan Viktorin 2017-07-31 11:24:22 UTC
Created attachment 133144 [details]
/sys/class/drm/card0/error from 4.12.3

Kernel 4.12.3-1-ARCH - usable but irritating...

Jul 31 10:25:50 kernel: microcode: microcode updated early to revision 0x22, date = 2017-01-27
Jul 31 10:25:50 kernel: Linux version 4.12.3-1-ARCH (builduser@nspawn-13499) (gcc version 7.1.1 20170630 (GCC) ) #1 SMP PREEMPT Sat Jul 22 15:32:02 UTC 2017
...
Jul 31 10:27:39 kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [396], reason: Hang on rcs, action: reset
Jul 31 10:27:39 kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jul 31 10:27:39 kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jul 31 10:27:39 kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jul 31 10:27:39 kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jul 31 10:27:39 kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jul 31 10:27:39 kernel: drm/i915: Resetting chip after gpu hang
...
Jul 31 10:36:41 kernel: drm/i915: Resetting chip after gpu hang
...

Jul 31 10:54:46 kernel: drm/i915: Resetting chip after gpu hang
Jul 31 10:55:41 kernel: drm/i915: Resetting chip after gpu hang
Jul 31 10:58:29 kernel: drm/i915: Resetting chip after gpu hang
...
Jul 31 11:05:58 kernel: drm/i915: Resetting chip after gpu hang
...
Jul 31 11:44:55 kernel: drm/i915: Resetting chip after gpu hang
Jul 31 11:45:29 kernel: drm/i915: Resetting chip after gpu hang
...
Jul 31 11:56:20 kernel: drm/i915: Resetting chip after gpu hang
...
Jul 31 11:58:43 kernel: drm/i915: Resetting chip after gpu hang
...
(and many more...)
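
To put a number on how often the resets occur, the "Resetting chip after gpu hang" lines can be turned into intervals. The sketch below is an illustration only: the helper names are hypothetical, the year is not present in the journal lines and has to be supplied, and the excerpt above is elided with "...", so any count derived from it is a lower bound.

```python
from datetime import datetime

RESET_MARK = "Resetting chip after gpu hang"


def reset_times(log_text, year=2017):
    """Return a datetime for every reset line in journal-style log text."""
    times = []
    for line in log_text.splitlines():
        if RESET_MARK not in line:
            continue
        # The first three fields look like "Jul 31 10:27:39"; the year is
        # an assumption passed in by the caller.
        stamp = " ".join(line.split()[:3])
        times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Seconds between consecutive resets."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# Three consecutive lines copied from the 4.12.3 log above.
sample = """\
Jul 31 10:54:46 kernel: drm/i915: Resetting chip after gpu hang
Jul 31 10:55:41 kernel: drm/i915: Resetting chip after gpu hang
Jul 31 10:58:29 kernel: drm/i915: Resetting chip after gpu hang
"""

print(gaps_in_seconds(reset_times(sample)))  # [55, 168]
```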
Comment 25 Elizabeth 2017-08-18 15:14:58 UTC
From error state with 4.12.3:
HEAD and ACTHD are different, so execution was inside the batch.
head = 0x00017e30, wraps = 3
ACTHD: 0x00000000 00617e30
    at ring: 0x00000000
FAULT_REG: 0x000000c7
    Valid
    Invalid and Unloaded PD Fault (PPGTT)
    Address 0x00000000
    Source ID 24
hangcheck action: dead

Last command executed IPEHR: 0x7a000002
ring (rcs) at 0x00000000_00002000; HEAD points to: 0x00000000_00019e30
0x00002000:      0x7a000002: PIPE_CONTROL
0x00002004:      0x00100002:    no write, cs stall, stall at scoreboard, 
0x00002008:      0x00000000:    
0x0000200c:      0x00000000:    
0x00002010:      0x7a000002: PIPE_CONTROL
0x00002014:      0x01154c1e:    qword write, cs stall, tlb invalidate, instruction cache invalidate, texture cache invalidate, vf fetch invalidate, constant cache invalidate, state cache invalidate, stall at scoreboard, 
0x00002018:      0x7fdee080:    
0x0000201c:      0x00000000:    
0x00002020:      0x18802100: MI_BATCH_BUFFER_START
0x00002024:      0x7fde1000:    dword 1
0x00002028:      0x7a000002: PIPE_CONTROL
0x0000202c:      0x001010a1:    no write, cs stall, render target cache flush, PIPE_CONTROL flush, DC flush, depth cache flush, 
0x00002030:      0x7fdee080:    
0x00002034:      0x00000000:    
0x00002038:      0x11000001: MI_LOAD_REGISTER_IMM
0x0000203c:      0x00022040:    dword 1
0x00002040:      0x00000db4:    dword 2
0x00002044:      0x11000001: MI_LOAD_REGISTER_IMM
0x00002048:      0x00012044:    dword 1
0x0000204c:      0x00000db4:    dword 2
0x00002050:      0x11000001: MI_LOAD_REGISTER_IMM
0x00002054:      0x0001a044:    dword 1
0x00002058:      0x00000db4:    dword 2
0x0000205c:      0x00000000: MI_NOOP
0x00002060:      0x10800001: MI_STORE_DATA_INDEX
0x00002064:      0x000000c0:    index
0x00002068:      0x00000db4:    dword
0x0000206c:      0x01000000: MI_USER_INTERRUPT
0x00002070:      0x7a000002: PIPE_CONTROL
0x00002074:      0x00100002:    no write, cs stall, stall at scoreboard, 
0x00002078:      0x00000000:    
0x0000207c:      0x00000000:    
0x00002080:      0x7a000002: PIPE_CONTROL
0x00002084:      0x01154c1e:    qword write, cs stall, tlb invalidate, instruction cache invalidate, texture cache invalidate, vf fetch invalidate, constant cache invalidate, state cache invalidate, stall at scoreboard, 
0x00002088:      0x7fdee080:    
0x0000208c:      0x00000000: 

This sequence repeats through all the log.
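
The numbers in this analysis can be cross-checked with a little arithmetic. The sketch below uses only values quoted in this comment; the bit layout used for the MI_* headers (command type in bits 31:29, opcode in bits 28:23) is stated here as an assumption about the command encoding, and the lookup table covers only commands that appear in this dump.

```python
# Values copied from the 4.12.3 error state quoted above.
RING_BASE = 0x00002000    # "ring (rcs) at 0x00000000_00002000"
HEAD_OFFSET = 0x00017E30  # "head = 0x00017e30"
ACTHD_LOW = 0x00617E30    # low dword of "ACTHD: 0x00000000 00617e30"

head_address = RING_BASE + HEAD_OFFSET
print(hex(head_address))          # 0x19e30, the "HEAD points to" address
print(head_address != ACTHD_LOW)  # True: HEAD and ACTHD differ

# Assumed layout for MI_* commands: type 0 in bits 31:29, opcode in bits 28:23.
MI_OPCODES = {
    0x00: "MI_NOOP",
    0x02: "MI_USER_INTERRUPT",
    0x21: "MI_STORE_DATA_INDEX",
    0x22: "MI_LOAD_REGISTER_IMM",
    0x31: "MI_BATCH_BUFFER_START",
}


def decode_header(dword):
    """Name a command header dword, limited to commands seen in this dump."""
    if dword >> 29 == 0:
        return MI_OPCODES.get((dword >> 23) & 0x3F, "unknown MI command")
    if dword & 0xFFFF0000 == 0x7A000000:
        return "PIPE_CONTROL"
    return "unknown"


for header in (0x7A000002, 0x18802100, 0x11000001, 0x10800001, 0x01000000):
    print(hex(header), decode_header(header))
```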
Comment 26 Elizabeth 2017-12-07 17:28:53 UTC
Hello Jan, just in case, could you try the latest Mesa (17.3) and Xorg? Also, have you tried the intel_iommu=igfx_off or i915_enable_rc6=0 parameters in GRUB?
Thank you.
Comment 27 Jan Viktorin 2017-12-20 12:04:11 UTC
Unfortunately, I could not see any improvement. LTS freezes indefinitely... 4.13.4 freezes occasionally.

I've tried separately:

kernel: Linux version 4.13.4-1-ARCH (builduser@tobias) (gcc version 7.2.0 (GCC)) #1 SMP PREEMPT Thu Sep 28 08:39:52 CEST 2017
kernel: Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=e4f7e7dd-8ae0-4e8b-beea-4185f71cfb4a rw quiet i915_enable_rc6=0

kernel: Linux version 4.13.4-1-ARCH (builduser@tobias) (gcc version 7.2.0 (GCC)) #1 SMP PREEMPT Thu Sep 28 08:39:52 CEST 2017
kernel: Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=e4f7e7dd-8ae0-4e8b-beea-4185f71cfb4a rw quiet intel_iommu=igfx_off

And then I tried both kernels I have with mesa 17.3 (I didn't mix this with the kernel params, should I?):

kernel: Linux version 4.9.67-1-lts (builduser@andyrtr) (gcc version 7.2.1 20171128 (GCC) ) #1 SMP Tue Dec 5 13:07:07 CET 2017

kernel: Linux version 4.13.4-1-ARCH (builduser@tobias) (gcc version 7.2.0 (GCC)) #1 SMP PREEMPT Thu Sep 28 08:39:52 CEST 2017
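
A quick way to confirm which parameters a given boot actually carried is to split the "Command line:" string (or the contents of /proc/cmdline on the running system) into name/value pairs. This is a minimal sketch with a hypothetical helper name; it only reports which tokens are present. Module options are commonly written on the kernel command line with a module prefix, and whether the unprefixed spelling used in these tests took effect is not confirmed anywhere in this thread.

```python
def cmdline_params(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Simplified: splits on whitespace, so quoted values containing
    spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


# Command line copied from the first 4.13.4 test above.
tested = cmdline_params(
    "BOOT_IMAGE=/vmlinuz-linux "
    "root=UUID=e4f7e7dd-8ae0-4e8b-beea-4185f71cfb4a rw quiet i915_enable_rc6=0"
)

print(tested.get("i915_enable_rc6"))  # 0
print("intel_iommu" in tested)        # False
```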
Comment 28 Jan Viktorin 2017-12-20 12:08:11 UTC
...I forgot to mention the xorg version, it's xorg-server 1.19.5-1.
Comment 29 Elizabeth 2017-12-20 15:34:29 UTC
(In reply to Jan Viktorin from comment #27)
> Unfortunately, I could not see any improvement. LTS freezes indefinitely...
> 4.13.4 freezes occasionally.
> ...
> And then I tried both kernels I have with mesa 17.3 (I didn't mix this with
> the kernel params, should I?):
> ...
No, 17.3 by itself is OK. Though it seems that it didn't help :/
Comment 30 Elizabeth 2018-01-18 21:48:27 UTC
May not be related, but which DE are you using?
Also, just in case could you try https://bugs.freedesktop.org/show_bug.cgi?id=104411#c5
Comment 31 Jani Saarinen 2018-03-29 07:11:23 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but with this we are trying to understand whether the bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this is no longer valid, please comment on the bug that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.
Comment 32 Jani Saarinen 2018-04-22 15:28:57 UTC
Closing; please re-open if this still occurs.