Summary: | [HSW] stuck on render ring (screen is repeatedly freezing) | | |
---|---|---|---|
Product: | DRI | Reporter: | Jan Viktorin <iviktorin> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | CLOSED WORKSFORME | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | | |
Priority: | medium | CC: | agg, doa379, intel-gfx-bugs, mort.yao, philippe |
Version: | XOrg git | | |
Hardware: | x86-64 (AMD64) | | |
OS: | Linux (All) | | |
Whiteboard: | ReadyForDev | | |
i915 platform: | HSW | i915 features: | GPU hang |
Description
Jan Viktorin
2015-01-14 17:13:47 UTC
Created attachment 112229 [details]
full dmesg from kernel 3.14.28
Created attachment 112230 [details]
full dmesg from kernel 3.17.6
Created attachment 112232 [details]
/sys/class/drm/card0/error from 3.14.28
Created attachment 112233 [details]
/sys/class/drm/card0/error from 3.17.6 (gzipped)
Created attachment 112234 [details]
cpuinfo
Created attachment 112236 [details]
lspci -v
Created attachment 112238 [details]
Xorg.log from 3.14.28
Created attachment 112239 [details]
Xorg.log from 3.17.6
We seem to have completely neglected this bug. Apologies. Does the problem persist with latest kernels and userspace?

Hello, yes it does. I use Arch Linux so I've got quite new kernels. All of them were buggy. But the problem changes over time. Now, with 4.4.5-1-ARCH, it sometimes freezes in a way that I cannot click anywhere (or the clicks are delayed) but I cannot see any garbage on the screen anymore (as I could before). I can see those messages in dmesg:
[931469.554401] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[931469.554411] [drm:intel_pch_fifo_underrun_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun
[931469.563758] [drm:intel_set_cpu_fifo_underrun_reporting [i915]] *ERROR* uncleared fifo underrun on pipe B
[931469.563770] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[931530.957849] [drm] stuck on render ring
[931530.958314] [drm] GPU HANG: ecode 7:0:0x87d7fefa, in firefox [20825], reason: Ring hung, action: reset
[931530.960386] drm/i915: Resetting chip after gpu hang
And yes, the problem is usually connected to a web browser. I prefer Firefox. It sometimes does not draw itself immediately but with some delay. When playing online videos, it sometimes stops drawing the video (I can just hear the sound). Chromium usually also behaves strangely (I cannot confirm at the moment).
Just while I am writing this text, a freeze has happened many times:
[1182811.842634] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[1182811.842643] [drm:intel_pch_fifo_underrun_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun
[1182811.856148] [drm:intel_set_cpu_fifo_underrun_reporting [i915]] *ERROR* uncleared fifo underrun on pipe B
[1182811.856159] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[1185127.973155] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[1185127.973167] [drm:intel_pch_fifo_underrun_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun
[1185127.984698] [drm:intel_set_cpu_fifo_underrun_reporting [i915]] *ERROR* uncleared fifo underrun on pipe B
[1185127.984710] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[1190430.697806] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[1190430.697817] [drm:intel_pch_fifo_underrun_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun
[1190430.707191] [drm:intel_set_cpu_fifo_underrun_reporting [i915]] *ERROR* uncleared fifo underrun on pipe B
[1190430.707202] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun

Is this issue still seen with latest kernel?

Yes. With 4.4.31 LTS it does not happen so often. With 4.8.7, it is nearly unusable for me. On both kernels, Chromium usually seems to be frozen but when it redraws it seems to work.
Linux version 4.8.7-1-ARCH
Nov 16 15:22:16 pcviktorin kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [498], reason: Hang on render ring, action: reset
Nov 16 15:22:16 pcviktorin kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Nov 16 15:22:16 pcviktorin kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Nov 16 15:22:16 pcviktorin kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Nov 16 15:22:16 pcviktorin kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Nov 16 15:22:16 pcviktorin kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Nov 16 15:22:16 pcviktorin kernel: drm/i915: Resetting chip after gpu hang
Nov 16 15:22:39 pcviktorin kernel: drm/i915: Resetting chip after gpu hang
Nov 16 15:22:53 pcviktorin kernel: drm/i915: Resetting chip after gpu hang
Linux version 4.4.31-1-lts
Nov 18 14:30:18 pcviktorin kernel: [drm] stuck on render ring
Nov 18 14:30:18 pcviktorin kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [342], reason: Ring hung, action: reset
Nov 18 14:30:18 pcviktorin kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Nov 18 14:30:18 pcviktorin kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Nov 18 14:30:18 pcviktorin kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Nov 18 14:30:18 pcviktorin kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Nov 18 14:30:18 pcviktorin kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Nov 18 14:30:18 pcviktorin kernel: drm/i915: Resetting chip after gpu hang
...
Nov 18 17:02:12 pcviktorin kernel: [drm] GPU HANG: ecode 7:0:0x85dffffc, in chromium [23742], reason: Ring hung, action: reset
Nov 18 17:02:12 pcviktorin kernel: drm/i915: Resetting chip after gpu hang
...
Nov 18 17:39:02 pcviktorin kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [342], reason: Ring hung, action: reset
Nov 18 17:39:02 pcviktorin kernel: drm/i915: Resetting chip after gpu hang
Sorry for such a short report, I am connected remotely to that computer right now. When I am less busy, I can provide more details from both kernels if you are interested.

Removing needinfo status since information has been provided.

Jan, is this still valid with latest kernel?

With 4.4.48-1-lts, I can see the following errors in dmesg:
[2214848.975943] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[2214848.975954] [drm:intel_pch_fifo_underrun_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun
[2214848.980066] [drm:intel_set_cpu_fifo_underrun_reporting [i915]] *ERROR* uncleared fifo underrun on pipe B
[2214848.980077] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
The screen DOES NOT freeze anymore. At least, I did not notice it for some time now. What is very strange, Chromium does not redraw. I can click on buttons and they seem to work (e.g. close tab quits the application), however, the browser does not show any changes.
Please, can you specify the term "latest kernel" next time? Do you mean a certain latest release or the current master? I do not build my own kernels...

By "latest kernel" we usually mean:
- the latest kernel release from https://kernel.org/, currently v4.10.x
- the latest mainline kernel, currently v4.11-rc1
- the drm-tip branch of https://cgit.freedesktop.org/drm-tip which contains all the very latest graphics changes headed for upstream in upcoming kernels
There's not much point in asking for testing on specific kernel versions, because it's not unusual for the response to come a kernel release or two later. ;)

After upgrade to 4.10.1-1-ARCH, the problem is still there.
Mar 10 15:00:41 kernel: Linux version 4.10.1-1-ARCH (builduser@heftig-13232) (gcc version 6.3.1 20170109 (GCC) ) #1 SMP PREEMPT Sun Feb 26 21:08:53 UTC 2017
...
Mar 10 16:17:10 kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [426], reason: Hang on render ring, action: reset
Mar 10 16:17:10 kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Mar 10 16:17:10 kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Mar 10 16:17:10 kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Mar 10 16:17:10 kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Mar 10 16:17:10 kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Mar 10 16:17:10 kernel: drm/i915: Resetting chip after gpu hang
Mar 10 16:17:18 kernel: drm/i915: Resetting chip after gpu hang
The system became unresponsive. After reboot, I have switched to 4.9.13-1-lts and the system crashes while starting the Mate session. I've added stack traces of the mate applets because they have appeared at the same time but they might be unrelated.
Mar 10 16:20:08 kernel: Linux version 4.9.13-1-lts (builduser@andyrtr) (gcc version 6.3.1 20170109 (GCC) ) #1 SMP Mon Feb 27 21:32:16 CET 2017
...
Mar 10 16:20:59 kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [415], reason: Hang on render ring, action: reset
Mar 10 16:20:59 kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Mar 10 16:20:59 kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Mar 10 16:20:59 kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Mar 10 16:20:59 kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Mar 10 16:20:59 kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Mar 10 16:20:59 kernel: drm/i915: Resetting chip after gpu hang
Mar 10 16:21:27 kernel: drm/i915: Resetting chip after gpu hang
...
Mar 10 16:21:37 kernel: drm/i915: Resetting chip after gpu hang
Mar 10 16:21:37 kernel: mate-settings-d[587]: segfault at 38 ip 00007fc57749a6fd sp 00007ffc5c6b3310 error 4 in libmatemixer-pulse.so[7fc577491000+1f000]
Mar 10 16:21:37 systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Mar 10 16:21:37 systemd[1]: Started Process Core Dump (PID 763/UID 0).
Mar 10 16:21:38 systemd-coredump[764]: Process 587 (mate-settings-d) of user 1000 dumped core.
Stack trace of thread 587:
#0 0x00007fc57749a6fd n/a (libmatemixer-pulse.so)
#1 0x00007fc598f6cf75 g_closure_invoke (libgobject-2.0.so.0)
#2 0x00007fc598f7ef82 n/a (libgobject-2.0.so.0)
#3 0x00007fc598f87bcc g_signal_emit_valist (libgobject-2.0.so.0)
#4 0x00007fc598f87faf g_signal_emit (libgobject-2.0.so.0)
#5 0x00007fc57749c211 n/a (libmatemixer-pulse.so)
#6 0x00007fc58c1607e3 n/a (libpulse.so.0)
#7 0x00007fc57fdbbba1 n/a (libpulsecommon-10.0.so)
#8 0x00007fc57728dfaf n/a (libpulse-mainloop-glib.so.0)
#9 0x00007fc598c945a7 g_main_context_dispatch (libglib-2.0.so.0)
#10 0x00007fc598c94810 n/a (libglib-2.0.so.0)
#11 0x00007fc598c94b32 g_main_loop_run (libglib-2.0.so.0)
#12 0x00007fc5998793a7 gtk_main (libgtk-x11-2.0.so.0)
#13 0x0000000000403dd8 main (mate-settings-daemon)
#14 0x00007fc5986a8511 __libc_start_main (libc.so.6)
#15 0x0000000000403e7a _start (mate-settings-daemon)
Stack trace of thread 589:
#0 0x00007fc59876a67d poll (libc.so.6)
#1 0x00007fc598c947a6 n/a (libglib-2.0.so.0)
#2 0x00007fc598c948bc g_main_context_iteration (libglib-2.0.so.0)
#3 0x00007fc598c94901 n/a (libglib-2.0.so.0)
#4 0x00007fc598cbc175 n/a (libglib-2.0.so.0)
#5 0x00007fc598a332e7 start_thread (libpthread.so.0)
#6 0x00007fc59877454f __clone (libc.so.6)
Stack trace of thread 588:
#0 0x00007fc59876a67d poll (libc.so.6)
#1 0x00007fc598c947a6 n/a (libglib-2.0.so.0)
#2 0x00007fc598c948bc g_main_context_iteration (libglib-2.0.so.0)
#3 0x00007fc58e1bc4bd n/a (libdconfsettings.so)
#4 0x00007fc598cbc175 n/a (libglib-2.0.so.0)
#5 0x00007fc598a332e7 start_thread (libpthread.so.0)
#6 0x00007fc59877454f __clone (libc.so.6)
Stack trace of thread 590:
#0 0x00007fc59876a67d poll (libc.so.6)
#1 0x00007fc598c947a6 n/a (libglib-2.0.so.0)
#2 0x00007fc598c94b32 g_main_loop_run (libglib-2.0.so.0)
#3 0x00007fc59927a446 n/a (libgio-2.0.so.0)
#4 0x00007fc598cbc175 n/a (libglib-2.0.so.0)
#5 0x00007fc598a332e7 start_thread (libpthread.so.0)
#6 0x00007fc59877454f __clone (libc.so.6)
Stack trace of thread 600:
#0 0x00007fc59876a67d poll (libc.so.6)
#1 0x00007fc598c947a6 n/a (libglib-2.0.so.0)
#2 0x00007fc598c948bc g_main_context_iteration (libglib-2.0.so.0)
#3 0x00007fc596eaec7d n/a (libdconf.so.1)
#4 0x00007fc598cbc175 n/a (libglib-2.0.so.0)
#5 0x00007fc598a332e7 start_thread (libpthread.so.0)
#6 0x00007fc59877454f __clone (libc.so.6)
...
Mar 10 16:21:47 kernel: drm/i915: Resetting chip after gpu hang
Mar 10 16:21:47 mate-netspeed-a[711]: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
Mar 10 16:21:47 kernel: mate-volume-con[610]: segfault at 1 ip 00007f4b213bae45 sp 00007fff79c50638 error 6 in libglib-2.0.so.0.5000.3[7f4b2132b000+111000]
...
Mar 10 16:21:48 systemd-coredump[786]: Process 610 (mate-volume-con) of user 1000 dumped core.
Stack trace of thread 610:
#0 0x00007f4b213bae45 g_mutex_lock (libglib-2.0.so.0)
#1 0x00007f4b21373343 g_source_remove_poll (libglib-2.0.so.0)
#2 0x00007f4b15bdd041 n/a (libpulse-mainloop-glib.so.0)
#3 0x00007f4b15736c99 n/a (libpulsecommon-10.0.so)
#4 0x00007f4b1573728e pa_iochannel_free (libpulsecommon-10.0.so)
#5 0x00007f4b1574b3ad pa_pstream_unlink (libpulsecommon-10.0.so)
#6 0x00007f4b159999e5 n/a (libpulse.so.0)
#7 0x00007f4b1599a178 n/a (libpulse.so.0)
#8 0x00007f4b1599bae5 n/a (libpulse.so.0)
#9 0x00007f4b1599cfea n/a (libpulse.so.0)
#10 0x00007f4b15745ba1 n/a (libpulsecommon-10.0.so)
#11 0x00007f4b15bdcfaf n/a (libpulse-mainloop-glib.so.0)
#12 0x00007f4b213755a7 g_main_context_dispatch (libglib-2.0.so.0)
#13 0x00007f4b21375810 n/a (libglib-2.0.so.0)
#14 0x00007f4b21375b32 g_main_loop_run (libglib-2.0.so.0)
#15 0x00007f4b21e8d3a7 gtk_main (libgtk-x11-2.0.so.0)
#16 0x0000000000405700 main (mate-volume-control-applet)
#17 0x00007f4b20d89511 __libc_start_main (libc.so.6)
#18 0x00000000004057ba _start (mate-volume-control-applet)
Stack trace of thread 614:
#0 0x00007f4b20e4b67d poll (libc.so.6)
#1 0x00007f4b213757a6 n/a (libglib-2.0.so.0)
#2 0x00007f4b213758bc g_main_context_iteration (libglib-2.0.so.0)
#3 0x00007f4b21375901 n/a (libglib-2.0.so.0)
#4 0x00007f4b2139d175 n/a (libglib-2.0.so.0)
#5 0x00007f4b211142e7 start_thread (libpthread.so.0)
#6 0x00007f4b20e5554f __clone (libc.so.6)
Stack trace of thread 615:
#0 0x00007f4b20e4b67d poll (libc.so.6)
#1 0x00007f4b213757a6 n/a (libglib-2.0.so.0)
#2 0x00007f4b21375b32 g_main_loop_run (libglib-2.0.so.0)
#3 0x00007f4b1fbc7446 n/a (libgio-2.0.so.0)
#4 0x00007f4b2139d175 n/a (libglib-2.0.so.0)
#5 0x00007f4b211142e7 start_thread (libpthread.so.0)
#6 0x00007f4b20e5554f __clone (libc.so.6)
Stack trace of thread 780:
#0 0x00007f4b20e4b67d poll (libc.so.6)
#1 0x00007f4b159bdee1 n/a (libpulse.so.0)
#2 0x00007f4b159af6f1 pa_mainloop_poll (libpulse.so.0)
#3 0x00007f4b159afd8e pa_mainloop_iterate (libpulse.so.0)
#4 0x00007f4b159afe40 pa_mainloop_run (libpulse.so.0)
#5 0x00007f4b159bde29 n/a (libpulse.so.0)
#6 0x00007f4b1575bfe8 n/a (libpulsecommon-10.0.so)
#7 0x00007f4b211142e7 start_thread (libpthread.so.0)
#8 0x00007f4b20e5554f __clone (libc.so.6)
Stack trace of thread 781:
#0 0x00007f4b20e55541 __clone (libc.so.6)

Reporter, is this still valid?

Hello, for 4.11.3, there is an improvement: Chromium now runs well. However, there were more short but noticeable freezes.
Jun 01 14:32:43 kernel: Linux version 4.11.3-1-ARCH (builduser@tobias) (gcc version 7.1.1 20170516 (GCC) ) #1 SMP PREEMPT Sun May 28 10:40:17 CEST 2017
...
Jun 01 14:34:33 kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [418], reason: Hang on render ring, action: reset
Jun 01 14:34:33 kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jun 01 14:34:33 kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jun 01 14:34:33 kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel iss
Jun 01 14:34:33 kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jun 01 14:34:33 kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jun 01 14:34:33 kernel: drm/i915: Resetting chip after gpu hang
...
Jun 01 14:34:33 kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [418], reason: Hang on render ring, action: reset
Jun 01 14:34:33 kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jun 01 14:34:33 kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jun 01 14:34:33 kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel iss
Jun 01 14:34:33 kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jun 01 14:34:33 kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jun 01 14:34:33 kernel: drm/i915: Resetting chip after gpu hang
Jun 01 14:38:00 kernel: drm/i915: Resetting chip after gpu hang
Jun 01 14:38:14 kernel: drm/i915: Resetting chip after gpu hang
...
Jun 01 14:41:16 kernel: drm/i915: Resetting chip after gpu hang
Jun 01 14:41:50 kernel: drm/i915: Resetting chip after gpu hang
Jun 01 14:42:10 kernel: drm/i915: Resetting chip after gpu hang
Jun 01 14:43:12 kernel: drm/i915: Resetting chip after gpu hang
...
Jun 01 14:48:08 kernel: drm/i915: Resetting chip after gpu hang
...
Jun 01 14:54:41 kernel: drm/i915: Resetting chip after gpu hang
...
Jun 01 15:01:15 kernel: drm/i915: Resetting chip after gpu hang

And please attach the latest error state.

Created attachment 131667 [details]
/sys/class/drm/card0/error from 4.11.3
Created attachment 131780 [details]
/sys/class/drm/card0/error from 4.9.14
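The kernel messages quoted in this report name /sys/class/drm/card0/error as the crash dump location, and one of the earlier attachments was uploaded gzipped. A minimal sketch of saving the dump that way is shown here. The function name `save_error_state` and the output file name in the comment are illustrative assumptions, not part of any i915 tooling, and reading the file may require root privileges.

```python
import gzip
from pathlib import Path

# Path taken from the "GPU crash dump saved to ..." kernel message in this report.
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_error_state(src: Path, dest: Path) -> int:
    """Copy the error state at `src` into a gzip file at `dest`.

    Returns the number of uncompressed bytes that were read.
    """
    total = 0
    with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
        # sysfs files do not report a useful size, so read in chunks until EOF.
        while chunk := fin.read(1 << 16):
            fout.write(chunk)
            total += len(chunk)
    return total


# Hypothetical usage (the output name is an assumption):
#   save_error_state(ERROR_STATE, Path("i915-error-state.gz"))
```

Copying the dump right after a hang may matter, since the file only holds the state the driver last captured.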
Submitter added logs. Moving bug to reopen status.

Created attachment 133144 [details]
/sys/class/drm/card0/error from 4.12.3
Kernel 4.12.3-1-ARCH - usable but irritating...
Jul 31 10:25:50 kernel: microcode: microcode updated early to revision 0x22, date = 2017-01-27
Jul 31 10:25:50 kernel: Linux version 4.12.3-1-ARCH (builduser@nspawn-13499) (gcc version 7.1.1 20170630 (GCC) ) #1 SMP PREEMPT Sat Jul 22 15:32:02 UTC 2017
...
Jul 31 10:27:39 kernel: [drm] GPU HANG: ecode 7:0:0x85dffffd, in Xorg [396], reason: Hang on rcs, action: reset
Jul 31 10:27:39 kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jul 31 10:27:39 kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jul 31 10:27:39 kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jul 31 10:27:39 kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jul 31 10:27:39 kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jul 31 10:27:39 kernel: drm/i915: Resetting chip after gpu hang
...
Jul 31 10:36:41 kernel: drm/i915: Resetting chip after gpu hang
...
Jul 31 10:54:46 kernel: drm/i915: Resetting chip after gpu hang
Jul 31 10:55:41 kernel: drm/i915: Resetting chip after gpu hang
Jul 31 10:58:29 kernel: drm/i915: Resetting chip after gpu hang
...
Jul 31 11:05:58 kernel: drm/i915: Resetting chip after gpu hang
...
Jul 31 11:44:55 kernel: drm/i915: Resetting chip after gpu hang
Jul 31 11:45:29 kernel: drm/i915: Resetting chip after gpu hang
...
Jul 31 11:56:20 kernel: drm/i915: Resetting chip after gpu hang
...
Jul 31 11:58:43 kernel: drm/i915: Resetting chip after gpu hang
...
(and many more...)
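The journal excerpts in this report follow one pattern: a `GPU HANG` line naming the process, followed by repeated `Resetting chip after gpu hang` lines. A rough sketch for tallying both from saved log text is given here. The `Hang` tuple, the `summarize` function and the regular expression are illustrative assumptions based only on the line formats quoted in this thread, so other kernel versions may need a different pattern.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple


class Hang(NamedTuple):
    ecode: str    # kept as the raw string, e.g. "7:0:0x85dffffd"
    process: str  # process named in the message, e.g. "Xorg"
    pid: int
    reason: str   # e.g. "Hang on rcs" or "Ring hung"


# Pattern derived from the GPU HANG lines quoted in this report (an assumption).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)
RESET_MSG = "Resetting chip after gpu hang"


def summarize(lines: Iterable[str]) -> Tuple[List[Hang], int]:
    """Return the parsed GPU HANG events and the number of reset messages."""
    hangs: List[Hang] = []
    resets = 0
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            hangs.append(
                Hang(match["ecode"], match["process"], int(match["pid"]), match["reason"])
            )
        elif RESET_MSG in line:
            resets += 1
    return hangs, resets
```

Run over a saved kernel log (for example the output of `journalctl -k` written to a file), this gives a per-boot count of resets that is easier to compare across kernel versions than the raw excerpts.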
From error state with 4.12.3:
HEAD and ACTHD are different, so execution was inside the batch.
head = 0x00017e30, wraps = 3
ACTHD: 0x00000000 00617e30
at ring: 0x00000000
FAULT_REG: 0x000000c7
Valid
Invalid and Unloaded PD Fault (PPGTT)
Address 0x00000000
Source ID 24
hangcheck action: dead
Last command executed IPEHR: 0x7a000002
ring (rcs) at 0x00000000_00002000; HEAD points to: 0x00000000_00019e30
0x00002000: 0x7a000002: PIPE_CONTROL
0x00002004: 0x00100002: no write, cs stall, stall at scoreboard,
0x00002008: 0x00000000:
0x0000200c: 0x00000000:
0x00002010: 0x7a000002: PIPE_CONTROL
0x00002014: 0x01154c1e: qword write, cs stall, tlb invalidate, instruction cache invalidate, texture cache invalidate, vf fetch invalidate, constant cache invalidate, state cache invalidate, stall at scoreboard,
0x00002018: 0x7fdee080:
0x0000201c: 0x00000000:
0x00002020: 0x18802100: MI_BATCH_BUFFER_START
0x00002024: 0x7fde1000: dword 1
0x00002028: 0x7a000002: PIPE_CONTROL
0x0000202c: 0x001010a1: no write, cs stall, render target cache flush, PIPE_CONTROL flush, DC flush, depth cache flush,
0x00002030: 0x7fdee080:
0x00002034: 0x00000000:
0x00002038: 0x11000001: MI_LOAD_REGISTER_IMM
0x0000203c: 0x00022040: dword 1
0x00002040: 0x00000db4: dword 2
0x00002044: 0x11000001: MI_LOAD_REGISTER_IMM
0x00002048: 0x00012044: dword 1
0x0000204c: 0x00000db4: dword 2
0x00002050: 0x11000001: MI_LOAD_REGISTER_IMM
0x00002054: 0x0001a044: dword 1
0x00002058: 0x00000db4: dword 2
0x0000205c: 0x00000000: MI_NOOP
0x00002060: 0x10800001: MI_STORE_DATA_INDEX
0x00002064: 0x000000c0: index
0x00002068: 0x00000db4: dword
0x0000206c: 0x01000000: MI_USER_INTERRUPT
0x00002070: 0x7a000002: PIPE_CONTROL
0x00002074: 0x00100002: no write, cs stall, stall at scoreboard,
0x00002078: 0x00000000:
0x0000207c: 0x00000000:
0x00002080: 0x7a000002: PIPE_CONTROL
0x00002084: 0x01154c1e: qword write, cs stall, tlb invalidate, instruction cache invalidate, texture cache invalidate, vf fetch invalidate, constant cache invalidate, state cache invalidate, stall at scoreboard,
0x00002088: 0x7fdee080:
0x0000208c: 0x00000000:
This sequence repeats throughout the log.

Hello Jan, just in case, could you try the latest mesa (17.3) and xorg? Also, have you tried the intel_iommu=igfx_off or i915_enable_rc6=0 parameters in grub? Thank you.

Unfortunately, I could not see any improvement. LTS freezes indefinitely... 4.13.4 freezes occasionally.
I've tried separately:
kernel: Linux version 4.13.4-1-ARCH (builduser@tobias) (gcc version 7.2.0 (GCC)) #1 SMP PREEMPT Thu Sep 28 08:39:52 CEST 2017
kernel: Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=e4f7e7dd-8ae0-4e8b-beea-4185f71cfb4a rw quiet i915_enable_rc6=0

kernel: Linux version 4.13.4-1-ARCH (builduser@tobias) (gcc version 7.2.0 (GCC)) #1 SMP PREEMPT Thu Sep 28 08:39:52 CEST 2017
kernel: Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=e4f7e7dd-8ae0-4e8b-beea-4185f71cfb4a rw quiet intel_iommu=igfx_off

And then I tried both kernels I have with mesa 17.3 (I didn't mix this with the kernel params, should I?):
kernel: Linux version 4.9.67-1-lts (builduser@andyrtr) (gcc version 7.2.1 20171128 (GCC) ) #1 SMP Tue Dec 5 13:07:07 CET 2017
kernel: Linux version 4.13.4-1-ARCH (builduser@tobias) (gcc version 7.2.0 (GCC)) #1 SMP PREEMPT Thu Sep 28 08:39:52 CEST 2017

...I forgot to mention the xorg version, it's xorg-server 1.19.5-1.

(In reply to Jan Viktorin from comment #27)
> Unfortunately, I could not see any improvement. LTS freezes indefinitely...
> 4.13.4 freezes occasionally.
> ...
> And then I tried both kernels I have with mesa 17.3 (I didn't mix this with
> the kernel params, should I?):
> ...
No, 17.3 by itself is OK. Though it seems that it didn't help :/
May not be related, but which DE are you using? Also, just in case, could you try https://bugs.freedesktop.org/show_bug.cgi?id=104411#c5

First of all, sorry about the spam. This is a mass update for our bugs. Sorry if you find this annoying, but with it we are trying to understand whether the bug is still valid. If the bug investigation is still in progress, please ignore this, and I apologize! If you think this is no longer valid, please comment on the bug so that it can be closed. If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still valid there, and if you cannot see the issue there, please comment on the bug.

Closing; please re-open if it still occurs.