Description
Thiago Macieira
2018-05-02 02:36:25 UTC
Please send full dmesg with drm.debug=0x1e from boot to failure?

Chris, Imre, any thoughts?

Do you really want 2 to 3 days' worth of dmesg?

(In reply to Thiago Macieira from comment #2)
> Do you really want 2 to 3 days' worth of dmesg?

No. If we thought it would be relevant, it would be in the error state.

(In reply to Chris Wilson from comment #3)
> (In reply to Thiago Macieira from comment #2)
> > Do you really want 2 to 3 days' worth of dmesg?
>
> No. If we thought it would be relevant, it would be in the error state.

Then please reopen, as Jani closed by saying:
> Please send full dmesg with drm.debug=0x1e from boot to failure?

One bit of information that would be useful here is https://patchwork.freedesktop.org/series/42550/ to differentiate between whether the new ELSP submission was loaded and the seqno write went astray, or if it died without seeing the new request.

Thiago, what do you mean by reopen? What bug specifically?

(In reply to Jani Saarinen from comment #6)
> Thiago, what do you mean by reopen? What bug specifically?

This bug is in NEEDINFO state. That means you're expecting more information from me. If it's not the drm.debug=0x1e for 3 days, then what is it?

(In reply to Chris Wilson from comment #5)
> One bit of information that would be useful here is
> https://patchwork.freedesktop.org/series/42550/ to differentiate between
> whether the new ELSP submission was loaded and the seqno write went astray
> or if it died without seeing the new request.

Applying patches to the kernel means disabling secure boot. I'd rather not, but can do if nothing else solves it.

(In reply to Thiago Macieira from comment #7)
> (In reply to Jani Saarinen from comment #6)
> > Thiago, what do you mean by reopen? What bug specifically?
>
> This bug is in NEEDINFO state. That means you're expecting more information
> from me. If it's not the drm.debug=0x1e for 3 days, then what is it?

Well, I think we were, but it seems Chris does not need that info.
This bug was NEW and has never been in any state other than NEW and NEEDINFO: https://bugs.freedesktop.org/show_activity.cgi?id=106342
>
> (In reply to Chris Wilson from comment #5)
> > One bit of information that would be useful here is
> > https://patchwork.freedesktop.org/series/42550/ to differentiate between
> > whether the new ELSP submission was loaded and the seqno write went astray
> > or if it died without seeing the new request.
>
> Applying patches to the kernel means disabling secure boot. I'd rather not,
> but can do if nothing else solves it.

I don't understand you. So I'm marking this as though you have all the info you need. If you need more, set it back to NEEDINFO and tell me what you need.

Created attachment 139501
card0_error 2018-05-11
dmesg:
[187374.488441] [drm] GPU HANG: ecode 9:0:0x8f5ea223, in chrome [4097], reason: Hang on rcs0, action: reset
[187374.488448] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[187374.488451] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[187374.488454] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[187374.488457] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[187374.488461] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[187374.488508] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[187375.703915] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[187375.703955] i915 0000:00:02.0: Resetting chip after gpu hang
[187376.920268] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[187378.243403] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[187378.590743] asynchronous wait on fence i915:X[2417]/0:15b40 timed out
[187379.566737] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[187379.678708] i915 0000:00:02.0: Failed to reset chip
[187888.334907] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
$ glxinfo
name of display: :0
i965: Failed to submit batchbuffer: Input/output error
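The `drm.debug=0x1e` value requested at the start of this report is a bitmask of DRM debug categories. The sketch below decodes such a mask, assuming the `DRM_UT_*` bit assignments of the 4.16/4.17-era kernel headers (CORE = 0x01 up to LEASE = 0x80); the table and the helpers `drm_debug_has` and `drm_debug_print` are hypothetical illustrations, not a kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Debug categories, assuming the DRM_UT_* values of 4.16/4.17-era kernels. */
static const struct {
    unsigned bit;
    const char *name;
} drm_debug_categories[] = {
    { 0x01, "CORE" },  { 0x02, "DRIVER" }, { 0x04, "KMS" },
    { 0x08, "PRIME" }, { 0x10, "ATOMIC" }, { 0x20, "VBL" },
    { 0x40, "STATE" }, { 0x80, "LEASE" },
};

#define N_DRM_DEBUG_CATEGORIES \
    (sizeof drm_debug_categories / sizeof drm_debug_categories[0])

/* Return 1 if the category called `name` is enabled in `mask`, else 0. */
static int drm_debug_has(unsigned mask, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < N_DRM_DEBUG_CATEGORIES; i++)
        if (strcmp(drm_debug_categories[i].name, name) == 0)
            return (mask & drm_debug_categories[i].bit) != 0;
    return 0; /* unknown category name */
}

/* Print every category enabled by a drm.debug= value. */
static void drm_debug_print(unsigned mask)
{
    printf("drm.debug=0x%x:", mask);
    for (size_t i = 0; i < N_DRM_DEBUG_CATEGORIES; i++)
        if (mask & drm_debug_categories[i].bit)
            printf(" %s", drm_debug_categories[i].name);
    printf("\n");
}
```

For example, `drm_debug_print(0x1e);` prints `drm.debug=0x1e: DRIVER KMS PRIME ATOMIC`, i.e. under these assumptions the requested setting leaves the CORE category off.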
(In reply to Thiago Macieira from comment #10)
> Created attachment 139501
> card0_error 2018-05-11

That one is a regular userspace hang.

With respect to the earlier hangs, we've just applied

commit 77dfedb5be03779f9a5d83e323a1b36e32090105
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri May 11 13:11:45 2018 +0100

    drm/i915/execlists: Use rmb() to order CSB reads

    We assume that the CSB is written using the normal ringbuffer
    coherency protocols, as outlined in kernel/events/ring_buffer.c:

        *   (HW)                               (DRIVER)
        *
        *   if (LOAD ->data_tail) {            LOAD ->data_head
        *                    (A)               smp_rmb()       (C)
        *       STORE $data                    LOAD $data
        *       smp_wmb()    (B)               smp_mb()        (D)
        *       STORE ->data_head              STORE ->data_tail
        *   }

    So we assume that the HW fulfils its ordering requirements (B), and so
    we should use a complimentary rmb (C) to ensure that our read of its
    WRITE pointer is completed before we start accessing the data.

    The final mb (D) is implied by the uncached mmio we perform to inform
    the HW of our READ pointer.

to drm-tip, which may explain why we didn't drain ELSP.

(In reply to Chris Wilson from comment #11)
> (In reply to Thiago Macieira from comment #10)
> > Created attachment 139501
> > card0_error 2018-05-11
>
> That one is a regular userspace hang.

What does that mean? Is it a Mesa bug?

Either way, I don't see how a userspace process should be allowed to do anything that causes other processes to get EIO.

Thiago, can you try, or did you try, the latest drm-tip that includes the patch Chris is referring to above?

(In reply to Francesco Balestrieri from comment #13)
> Thiago, can you or did you try the latest drm-tip that includes the patch
> Chris is referring to above?

I haven't tried that. I'm not a kernel developer, so I don't have a ready-made kernel build. The best I can do is use the latest release from Linus. The commit in question is not even in the latest -rc yet.
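The (A) to (D) pairing in the commit message can be mirrored in userspace with C11 atomics. This is a minimal sketch, assuming a hypothetical single-producer/single-consumer ring: `struct ring`, `ring_push` and `ring_pop` are illustrative names, not i915 code, and acquire/release operations stand in for the kernel's barriers. In i915 the producer is the GPU writing CSB entries; the commit's `rmb()` corresponds to the acquire load marked (C) in `ring_pop`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical SPSC ring illustrating the ring_buffer.c ordering; not i915 code. */
#define RING_SIZE 8u
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

struct ring {
    _Atomic size_t head;      /* producer's WRITE pointer (->data_head) */
    _Atomic size_t tail;      /* consumer's READ pointer (->data_tail) */
    unsigned data[RING_SIZE];
};

/* Producer ("HW" column). Returns false if the ring is full. */
static bool ring_push(struct ring *r, unsigned value)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* (A): the kernel relies on a control dependency here; an acquire load is
     * a stronger, portable stand-in so a slot is not overwritten before the
     * consumer has finished reading it. */
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;
    r->data[head & (RING_SIZE - 1)] = value;                         /* STORE $data */
    atomic_store_explicit(&r->head, head + 1, memory_order_release); /* (B) + STORE ->data_head */
    return true;
}

/* Consumer ("DRIVER" column). Returns false if the ring is empty. */
static bool ring_pop(struct ring *r, unsigned *out)
{
    /* (C): the acquire load orders this read of the WRITE pointer before the
     * data read below; this is the ordering the commit adds with rmb(). */
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    if (tail == head)
        return false;
    *out = r->data[tail & (RING_SIZE - 1)];                          /* LOAD $data */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release); /* (D) + STORE ->data_tail */
    return true;
}
```

If the (C) load were relaxed instead, the C11 memory model would allow the consumer on a weakly ordered CPU to read a slot before the producer's store to it became visible, which is the class of problem the commit guards against.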
(In reply to Thiago Macieira from comment #14)
> (In reply to Francesco Balestrieri from comment #13)
> > Thiago, can you or did you try the latest drm-tip that includes the patch
> > Chris is referring to above?
>
> I haven't tried that. I'm not a kernel developer, so I don't have a
> ready-made kernel build. The best I can do is use the latest release from
> Linus. The commit in question is not even in the latest -rc yet.

OK. For what it's worth, the instructions to build drm-tip are here: https://01.org/linuxgraphics/documentation/build-guide-0

Created attachment 139619
card0_error 2018-05-17
dmesg:
[89982.954152] [drm] GPU HANG: ecode 9:0:0x8adfb5fe, in krunner [2949], reason: Hang on rcs0, action: reset
[89982.954155] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[89982.954156] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[89982.954156] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[89982.954157] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[89982.954157] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[89982.954178] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[89984.169411] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[89984.169486] i915 0000:00:02.0: Resetting chip after gpu hang
[89985.386115] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[89986.148508] asynchronous wait on fence i915:X[2695]/0:171977 timed out
[89986.708558] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[89988.032543] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[89988.140506] i915 0000:00:02.0: Failed to reset chip
The only difference this time is that it did not happen immediately after resuming from hibernation, but after about a minute. I managed to log in and see my desktop. It wasn't until I tried to use krunner that the hang was reported.
I'm now going to go one week without using the USB-C dock, to see whether the hang still happens without it.
Created attachment 139841
card0_error 2018-05-29
Nope, the USB-C dock is not an influence. This GPU hang happened without any USB-C connection.
[203461.307996] [drm] GPU HANG: ecode 9:0:0x22bfff23, in krunner [51086], reason: Hang on rcs0, action: reset
[203461.308001] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[203461.308004] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[203461.308006] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[203461.308009] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[203461.308012] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[203461.308074] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[203462.520867] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[203462.520996] i915 0000:00:02.0: Resetting chip after gpu hang
[203463.739120] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[203464.424718] asynchronous wait on fence i915:X[2357]/0:3f9891 timed out
[203465.063100] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[203466.384830] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[203466.492741] i915 0000:00:02.0: Failed to reset chip
Different message today. No card0/error was generated:
[113192.331640] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[113192.331704] i915 0000:00:02.0: Resetting chip after gpu hang
[113193.547824] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[113194.871950] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[113196.196713] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[113196.302119] i915 0000:00:02.0: Failed to reset chip
[113196.334112] asynchronous wait on fence i915:X[2820]/0:1db556 timed out

Created attachment 140103
card0_error 2018-06-08
Are more of these files useful? Is there any new information to be gleaned from them, or are they all saying the same thing?
Created attachment 140288
card0_error 2018-06-22
4.16.12
[136676.117554] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[136676.117601] i915 0000:00:02.0: Resetting chip after gpu hang
[136677.335666] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[136678.660287] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[136679.982186] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[136680.089493] i915 0000:00:02.0: Failed to reset chip
Created attachment 140582
card0_error 2018-07-11
kernel 4.17.3
[166313.855821] drm: not enough stolen space for compressed buffer (need 50688000 more bytes), disabling. Hint: you may be able to increase stolen memory size in the BIOS to avoid this.
[166320.700776] [drm] GPU HANG: ecode 9:-1:0x00000000, reason: Kicking stuck wait on rcs0, action: reset
[166320.700778] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[166320.700778] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[166320.700779] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[166320.700780] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[166320.700780] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[166320.700796] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[166321.911857] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[166321.911987] i915 0000:00:02.0: Resetting chip after gpu hang
[166323.115741] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[166324.426461] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[166325.735849] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[166325.843762] i915 0000:00:02.0: Failed to reset chip
[166327.091830] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[166327.715066] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
Hi Chris, Francesco,

Do you need any more information to make progress?

Created attachment 140747
card0_error 2018-07-20
4.17.4
[97626.210963] [drm] GPU HANG: ecode 9:0:0x63ec03e1, in plasmashell [2620], reason: Hang on rcs0, action: reset
[97626.210966] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[97626.210967] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[97626.210968] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[97626.210969] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[97626.210970] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[97626.210993] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[97627.414130] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[97627.414279] i915 0000:00:02.0: Resetting chip after gpu hang
[97628.618081] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[97629.926095] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[97631.234083] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[97631.339429] i915 0000:00:02.0: Failed to reset chip
[97632.587457] i915 0000:00:02.0: i915_reset_device timed out, cancelling all in-flight rendering.
[97632.602102] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[97636.507400] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
Created attachment 140944
card0_error 2018-07-30
4.17.6
[200509.348277] [drm] GPU HANG: ecode 9:0:0xa3edbc82, in chrome [4770], reason: Hang on rcs0, action: reset
[200509.348279] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[200509.348280] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[200509.348280] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[200509.348281] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[200509.348282] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[200509.348297] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[200510.549451] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[200510.549500] i915 0000:00:02.0: Resetting chip after gpu hang
[200511.750749] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[200513.061331] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[200514.369250] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[200514.474645] i915 0000:00:02.0: Failed to reset chip
[200515.646641] i915 0000:00:02.0: i915_reset_device timed out, cancelling all in-flight rendering.
[200515.710684] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
Created attachment 140945
card0_error 2018-08-02
4.17.6
[164735.714076] [drm] GPU HANG: ecode 9:0:0x8463451a, in kscreenlocker_g [65753], reason: Hang on rcs0, action: reset
[164735.714078] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[164735.714079] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[164735.714080] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[164735.714080] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[164735.714081] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[164735.714095] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[164736.917478] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[164736.917566] i915 0000:00:02.0: Resetting chip after gpu hang
[164738.119008] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[164739.429550] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[164740.738040] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[164740.842805] i915 0000:00:02.0: Failed to reset chip
[164742.010813] i915 0000:00:02.0: i915_reset_device timed out, cancelling all in-flight rendering.
[164742.098849] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[164744.910725] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
Created attachment 141027
card0_error 2018-08-09
4.7.11
[167358.604501] [drm] GPU HANG: ecode 9:0:0x6140dc79, in plasmashell [2636], reason: Hang on rcs0, action: reset
[167358.604503] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[167358.604504] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[167358.604504] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[167358.604505] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[167358.604506] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[167358.604516] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[167359.806616] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[167359.806759] i915 0000:00:02.0: Resetting chip after gpu hang
[167361.009010] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[167361.826462] asynchronous wait on fence i915:X[2471]/0:1a05d timed out
[167362.316299] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[167363.625181] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[167363.730468] i915 0000:00:02.0: Failed to reset chip
[167364.898452] i915 0000:00:02.0: i915_reset_device timed out, cancelling all in-flight rendering.
[167364.965180] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
That's 4.17.11, not 4.7.11.

Now running 4.18.0, which does contain commit 77dfedb5be03779f9a5d83e323a1b36e32090105. I will report if I still experience issues.

Created attachment 141384
card0_error 2018-08-30
kernel 4.18.0, DMC 1.27. No change in behaviour; I'm still having the exact same problem.
dmesg:
[182585.295906] [drm] GPU HANG: ecode 9:0:0xdd607401, in plasmashell [2643], reason: hang on rcs0, action: reset
[182585.295910] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[182585.295910] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[182585.295911] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[182585.295912] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[182585.295912] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[182585.295971] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[182585.297223] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[182585.297253] i915 0000:00:02.0: Resetting chip for hang on rcs0
[182585.298805] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[182585.407018] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[182585.514988] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[182585.621681] i915 0000:00:02.0: Failed to reset chip
[182585.623039] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
Created attachment 141389
card0_error 2018-08-30 (second of the same day)
dmesg:
[27643.613948] [drm] GPU HANG: ecode 9:0:0x283b3249, in X [2799], reason: hang on rcs0, action: reset
[27643.613949] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[27643.613950] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[27643.613950] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[27643.613951] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[27643.613951] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[27643.613966] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[27643.615198] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[27643.615230] i915 0000:00:02.0: Resetting chip for hang on rcs0
[27643.616507] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[27643.723158] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[27643.831267] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[27643.941964] i915 0000:00:02.0: Failed to reset chip
[27643.943312] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[27645.285925] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
Created attachment 141783
card0_error 2018-09-18
[194075.388443] [drm] GPU HANG: ecode 9:0:0x575ec1a7, in kscreenlocker_g [49285], reason: hang on rcs0, action: reset
[194075.388446] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[194075.388448] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[194075.388449] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[194075.388450] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[194075.388452] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[194075.388481] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[194075.389750] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[194075.389827] i915 0000:00:02.0: Resetting chip for hang on rcs0
[194075.392877] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[194075.503680] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[194075.611830] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[194075.718389] i915 0000:00:02.0: Failed to reset chip
[194075.719758] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
Created attachment 141784
card0_error 2018-09-21
kernel 4.18.8
[110720.786094] [drm] GPU HANG: ecode 9:0:0x61a6fe91, in kwin_x11 [2608], reason: hang on rcs0, action: reset
[110720.786099] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[110720.786101] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[110720.786103] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[110720.786105] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[110720.786107] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[110720.786193] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[110720.787519] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[110720.787622] i915 0000:00:02.0: Resetting chip for hang on rcs0
[110720.789327] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[110720.896415] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[110721.008435] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[110721.119116] i915 0000:00:02.0: Failed to reset chip
[110721.120485] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
Created attachment 141785
card0_error 2018-09-28
kernel 4.18.8
[197950.408173] [drm] GPU HANG: ecode 9:0:0x8fdfbffe, in kmail [42709], reason: hang on rcs0, action: reset
[197950.408179] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[197950.408182] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[197950.408185] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[197950.408187] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[197950.408190] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[197950.408268] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[197950.409563] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[197950.409656] i915 0000:00:02.0: Resetting chip for hang on rcs0
[197950.411037] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[197950.520735] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[197950.628678] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[197950.735409] i915 0000:00:02.0: Failed to reset chip
[197950.736758] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[197951.811342] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
Created attachment 141866
card0_error 2018-10-03
kernel 4.18.8
[227443.174994] [drm] GPU HANG: ecode 9:0:0x8edb2106, in kmail [23122], reason: hang on rcs0, action: reset
[227443.174997] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[227443.174998] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[227443.174999] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[227443.175001] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[227443.175002] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[227443.175034] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[227443.176287] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[227443.176345] i915 0000:00:02.0: Resetting chip for hang on rcs0
[227443.177689] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[227443.283872] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[227443.395812] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[227443.502465] i915 0000:00:02.0: Failed to reset chip
[227443.503735] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
Created attachment 141968
card0_error 2018-10-09
kernel 4.18.9
[89670.191894] [drm] GPU HANG: ecode 9:0:0x00815216, in chrome [5645], reason: hang on rcs0, action: reset
[89670.191898] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[89670.191900] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[89670.191902] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[89670.191903] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[89670.191905] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[89670.191944] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[89670.193228] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[89670.193313] i915 0000:00:02.0: Resetting chip for hang on rcs0
[89670.194772] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[89670.300444] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[89670.408385] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[89670.519134] i915 0000:00:02.0: Failed to reset chip
[89670.520482] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[89671.894458] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
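The hex part of each `ecode` in the GPU HANG lines above appears to be the hung engine's IPEHR XOR its INSTDONE: the IPEHR and INSTDONE values quoted in the next comment reproduce the logged ecodes of the 2018-09-21, 2018-09-28 and 2018-10-03 hangs exactly. The check below is a sketch under that assumption; the helper names are hypothetical, and the pairing of each IPEHR with an INSTDONE value is inferred from which combination matches, not stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: hex part of "GPU HANG: ecode 9:0:0x........" = IPEHR ^ INSTDONE. */
static uint32_t hang_ecode(uint32_t ipehr, uint32_t instdone)
{
    return ipehr ^ instdone;
}

/* XOR is its own inverse, so a logged ecode plus INSTDONE gives back IPEHR. */
static uint32_t ipehr_from_ecode(uint32_t ecode, uint32_t instdone)
{
    return ecode ^ instdone;
}

/* IPEHR/INSTDONE values quoted in the thread, with the ecode logged in dmesg. */
static const struct {
    const char *date;
    uint32_t ipehr, instdone, logged_ecode;
} hang_samples[] = {
    { "2018-09-21", 0x9e79016f, 0xffdffffe, 0x61a6fe91 },
    { "2018-09-28", 0x70004000, 0xffdffffe, 0x8fdfbffe },
    { "2018-10-03", 0x710cdef8, 0xffd7fffe, 0x8edb2106 },
};

/* Print each reconstructed ecode next to the logged one and assert they match. */
static void check_hang_samples(void)
{
    for (size_t i = 0; i < sizeof hang_samples / sizeof hang_samples[0]; i++) {
        uint32_t ecode = hang_ecode(hang_samples[i].ipehr, hang_samples[i].instdone);
        printf("%s: computed 0x%08x, logged 0x%08x\n", hang_samples[i].date,
               (unsigned)ecode, (unsigned)hang_samples[i].logged_ecode);
        assert(ecode == hang_samples[i].logged_ecode);
    }
}
```

Calling `check_hang_samples()` prints lines such as `2018-09-21: computed 0x61a6fe91, logged 0x61a6fe91`. Because XOR is reversible, `ipehr_from_ecode` recovers the IPEHR of a hang from its dmesg ecode once the matching INSTDONE is known.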
Is there light at the end of the tunnel? Is this fixed in any upcoming version?

I assume this is a Mesa bug, so I'm changing the product to Mesa.

(In reply to Lakshmi from comment #37)
> I assume this is a Mesa bug, so I'm changing the product to Mesa.

Considering it's a GPU hang, why do you assume it's a Mesa bug?

(In reply to Thiago Macieira from comment #38)
> (In reply to Lakshmi from comment #37)
> > I assume this is a Mesa bug, so I'm changing the product to Mesa.
>
> Considering it's a GPU hang, why do you assume it's a Mesa bug?

The last 4 error states you added indicate that most of the units of the 3D engine are not busy.
ACTHD does not seem to point to a location in the batch.
IPEHR (the last executed instruction) is fairly weird too:

0x9e79016f (WTH is this?)
0x710cdef8 (Still unknown...)
0x70004000 (MEDIA_VFE_STATE, used for compute, but not present in the batch)

Also, INSTDONE is bonkers:

INSTDONE: 0xffdffffe
  PRB0 Ring Enable: false
  CS Done: false

INSTDONE: 0xffd7fffe
  PRB0 Ring Enable: false
  GAM Done: false
  CS Done: false

Usually Ring Enable is true.

Does that look more like something that happens with display hangs?

There are ioctls returning EIO in a newly launched process, like glxinfo. I'll get the exact ioctl that is failing next time this happens.

To me, that says the problem is inside the kernel. No matter what previous processes did, the kernel ought to honour the new ones.

Created attachment 142158
card0_error 2018-10-23
4.18.12
dmesg:
[409897.549764] [drm] GPU HANG: ecode 9:0:0x57abd315, in chrome [68092], reason: hang on rcs0, action: reset
[409897.549767] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[409897.549767] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[409897.549768] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[409897.549768] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[409897.549769] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[409897.549787] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[409897.551031] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[409897.551066] i915 0000:00:02.0: Resetting chip for hang on rcs0
[409897.552430] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[409897.661286] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[409897.769317] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[409897.875998] i915 0000:00:02.0: Failed to reset chip
[409897.877312] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
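The INSTDONE values quoted above can be decoded with a bit mask. The bit positions used here are inferred from the two decoded samples in the thread rather than taken from documentation: 0xffdffffe has only bits 0 and 21 clear and was reported with Ring Enable and CS Done false, while 0xffd7fffe additionally has bit 19 clear and also reported GAM Done false. Treat the macro names, positions and helpers as assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMED bit positions, inferred from the two samples quoted in the thread. */
#define INSTDONE_PRB0_RING_ENABLE (UINT32_C(1) << 0)
#define INSTDONE_GAM_DONE         (UINT32_C(1) << 19)
#define INSTDONE_CS_DONE          (UINT32_C(1) << 21)

static const struct {
    uint32_t bit;
    const char *name;
} instdone_fields[] = {
    { INSTDONE_PRB0_RING_ENABLE, "PRB0 Ring Enable" },
    { INSTDONE_GAM_DONE,         "GAM Done" },
    { INSTDONE_CS_DONE,          "CS Done" },
};

#define N_INSTDONE_FIELDS (sizeof instdone_fields / sizeof instdone_fields[0])

/* Return 1 if the named field reads false (its bit is clear) in `instdone`. */
static int instdone_is_false(uint32_t instdone, uint32_t field)
{
    assert(field == INSTDONE_PRB0_RING_ENABLE || field == INSTDONE_GAM_DONE ||
           field == INSTDONE_CS_DONE);
    return (instdone & field) == 0;
}

/* Count every clear bit in the register, named or not. */
static int instdone_clear_bits(uint32_t instdone)
{
    int n = 0;
    for (uint32_t x = ~instdone; x != 0; x &= x - 1) /* drop lowest set bit */
        n++;
    return n;
}

/* Print only the named fields that read false, one per line. */
static void instdone_print(uint32_t instdone)
{
    printf("INSTDONE: 0x%08x\n", (unsigned)instdone);
    for (size_t i = 0; i < N_INSTDONE_FIELDS; i++)
        if (instdone_is_false(instdone, instdone_fields[i].bit))
            printf("  %s: false\n", instdone_fields[i].name);
}
```

Under these assumptions, `instdone_print(0xffd7fffe)` lists PRB0 Ring Enable, GAM Done and CS Done as false, in the same order as the decoder output quoted in the thread, and `instdone_clear_bits` confirms that no other bits are clear in either sample.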
strace from glxinfo:
openat(AT_FDCWD, "/etc/drirc", O_RDONLY) = 5</etc/drirc>
read(5</etc/drirc>, "<!--\n\n=========================="..., 4096) = 4096
getrandom("\xb2\x57\x3a\xe1\xb0\xb4\x25\x28", 8, GRND_NONBLOCK) = 8
read(5</etc/drirc>, "tion name=\"allow_glsl_builtin_va"..., 4096) = 4096
read(5</etc/drirc>, "n\" executable=\"AlienIsolation\">\n"..., 4096) = 4096
read(5</etc/drirc>, "lso higher gpu load. -->\n "..., 4096) = 1354
read(5</etc/drirc>, "", 4096) = 0
close(5</etc/drirc>) = 0
openat(AT_FDCWD, "/home/tjmaciei/.drirc", O_RDONLY) = -1 ENOENT (No such file or directory)
getrandom("\x06\x80\xb3\x56\x96\xe9\x0c\x07", 8, GRND_NONBLOCK) = 8
openat(AT_FDCWD, "/etc/drirc", O_RDONLY) = 5</etc/drirc>
read(5</etc/drirc>, "<!--\n\n=========================="..., 4096) = 4096
getrandom("\xca\x88\xbd\x26\xbb\x9e\x85\xfd", 8, GRND_NONBLOCK) = 8
read(5</etc/drirc>, "tion name=\"allow_glsl_builtin_va"..., 4096) = 4096
read(5</etc/drirc>, "n\" executable=\"AlienIsolation\">\n"..., 4096) = 4096
read(5</etc/drirc>, "lso higher gpu load. -->\n "..., 4096) = 1354
read(5</etc/drirc>, "", 4096) = 0
close(5</etc/drirc>) = 0
openat(AT_FDCWD, "/home/tjmaciei/.drirc", O_RDONLY) = -1 ENOENT (No such file or directory)
geteuid() = 1000
getuid() = 1000
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GETPARAM, 0x7ffc087cfc00) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GETPARAM, 0x7ffc087cfbb0) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_EXECBUFFER2, 0x7ffc087cfbb0) = -1 ENOENT (No such file or directory)
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GETPARAM, 0x7ffc087cfc00) = 0
futex(0x7f667f85f4e8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GETPARAM, 0x7ffc087cfc00) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_GET_APERTURE, 0x7ffc087cfca0) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_CREATE, 0x7ffc087cfbd0) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_SET_TILING, 0x7ffc087cfb20) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_SET_DOMAIN, 0x7ffc087cfbc4) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_GEM_CLOSE, 0x7ffc087cfb90) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_REG_READ, 0x7ffc087cfc00) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GETPARAM, 0x7ffc087cfc00) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GETPARAM, 0x7ffc087cfc00) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GETPARAM, 0x7ffc087cfc00) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GETPARAM, 0x7ffc087cfc00) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GETPARAM, 0x7ffc087cfc00) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GETPARAM, 0x7ffc087cfc00) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GET_RESET_STATS, 0x7ffc087cfca0) = 0
brk(0x558cd1dd0000) = 0x558cd1dd0000
brk(0x558cd1df1000) = 0x558cd1df1000
brk(0x558cd1e12000) = 0x558cd1e12000
brk(0x558cd1e33000) = 0x558cd1e33000
brk(0x558cd1e54000) = 0x558cd1e54000
brk(0x558cd1e75000) = 0x558cd1e75000
brk(0x558cd1e96000) = 0x558cd1e96000
brk(0x558cd1eb7000) = 0x558cd1eb7000
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GETPARAM, 0x7ffc087cfc00) = 0
geteuid() = 1000
getuid() = 1000
getuid() = 1000
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 5<socket:[12127798]>
connect(5<socket:[12127798]>, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
sendto(5<socket:[12127798]>, "\2\0\0\0\v\0\0\0\7\0\0\0passwd\0", 19, MSG_NOSIGNAL, NULL, 0) = 19
poll([{fd=5<socket:[12127798]>, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=5, revents=POLLIN|POLLHUP}])
recvmsg(5<socket:[12127798]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="passwd\0", iov_len=7}, {iov_base="\310O\3\0\0\0\0\0", iov_len=8}], msg_iovlen=2, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[6</var/lib/nscd/passwd>]}], msg_controllen=20, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 15
mmap(NULL, 217032, PROT_READ, MAP_SHARED, 6</var/lib/nscd/passwd>, 0) = 0x7f6681489000
close(6</var/lib/nscd/passwd>) = 0
close(5<socket:[12127798]>) = 0
stat("/home/tjmaciei", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/home/tjmaciei/.cache", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/home/tjmaciei/.cache", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/home/tjmaciei/.cache/mesa_shader_cache", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
openat(AT_FDCWD, "/home/tjmaciei/.cache/mesa_shader_cache/index", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 5</home/tjmaciei/dev/cache/mesa_shader_cache/index>
fstat(5</home/tjmaciei/dev/cache/mesa_shader_cache/index>, {st_mode=S_IFREG|0644, st_size=1310728, ...}) = 0
mmap(NULL, 1310728, PROT_READ|PROT_WRITE, MAP_SHARED, 5</home/tjmaciei/dev/cache/mesa_shader_cache/index>, 0) = 0x7f667eb7f000
close(5</home/tjmaciei/dev/cache/mesa_shader_cache/index>) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f667e37e000
mprotect(0x7f667e37f000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x7f667eb7dfb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f667eb7e9d0, tls=0x7f667eb7e700, child_tidptr=0x7f667eb7e9d0) = 3665
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
sched_setscheduler(3665, SCHED_IDLE, [0]) = 0
futex(0x7f667f7b6d80, FUTEX_WAKE_PRIVATE, 2147483647) = 0
openat(AT_FDCWD, "/dev/urandom", O_RDONLY) = 5</dev/urandom>
read(5</dev/urandom>, "\334x\331\366\315wu\265\0364\227\321\363\346r\222", 16) = 16
close(5</dev/urandom>) = 0
brk(0x558cd1ed8000) = 0x558cd1ed8000
getpid() = 3664
getpid() = 3664
getpid() = 3664
getpid() = 3664
getpid() = 3664
mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f667e33d000
poll([{fd=3<socket:[12129015]>, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3<socket:[12129015]>, [{iov_base="\227#Z\3\1\0\0\0\4\0\0\0\1\0\0\0\t\r\0\0006\0\0\0\1\0\0\0\4\0\0\0"..., iov_len=3488}], 1) = 3488
poll([{fd=3<socket:[12129015]>, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvmsg(3<socket:[12129015]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0\247\33\0j\1\0\0\"\0\227\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64
munmap(0x7f667e33d000, 266240) = 0
getpid() = 3664
getpid() = 3664
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_CREATE, 0x7ffc087cfb80) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_SET_DOMAIN, 0x7ffc087cfb74) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_CREATE, 0x7ffc087cfb80) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_SET_DOMAIN, 0x7ffc087cfb74) = 0
brk(0x558cd1ef9000) = 0x558cd1ef9000
openat(AT_FDCWD, "/etc/drirc", O_RDONLY) = 5</etc/drirc>
read(5</etc/drirc>, "<!--\n\n=========================="..., 4096) = 4096
getrandom("\x65\xdd\x4e\xa9\x48\x6e\xaf\x7a", 8, GRND_NONBLOCK) = 8
read(5</etc/drirc>, "tion name=\"allow_glsl_builtin_va"..., 4096) = 4096
read(5</etc/drirc>, "n\" executable=\"AlienIsolation\">\n"..., 4096) = 4096
read(5</etc/drirc>, "lso higher gpu load. -->\n "..., 4096) = 1354
read(5</etc/drirc>, "", 4096) = 0
close(5</etc/drirc>) = 0
openat(AT_FDCWD, "/home/tjmaciei/.drirc", O_RDONLY) = -1 ENOENT (No such file or directory)
brk(0x558cd1f1b000) = 0x558cd1f1b000
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_CREATE, 0x7ffc087cfba0) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_SET_DOMAIN, 0x7ffc087cfb94) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_MMAP, 0x7ffc087cfba0) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_CREATE, 0x7ffc087cfba0) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_SET_DOMAIN, 0x7ffc087cfb94) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_MMAP, 0x7ffc087cfba0) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, 0x7ffc087cfc20) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_CREATE, 0x7ffc087cfbc0) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_SET_DOMAIN, 0x7ffc087cfbb4) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_CREATE, 0x7ffc087cfbb0) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_SET_DOMAIN, 0x7ffc087cfba4) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_MMAP, 0x7ffc087cfbb0) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_CREATE, 0x7ffc087cfb60) = 0
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_SET_DOMAIN, 0x7ffc087cfb54) = 0
poll([{fd=3<socket:[12129015]>, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3<socket:[12129015]>, [{iov_base="\227\"\r\0\3\0`\7\233\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\3\0\0\0\221 \0\0"..., iov_len=56}], 1) = 56
poll([{fd=3<socket:[12129015]>, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvmsg(3<socket:[12129015]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1\1\36\0\0\0\0\0\7\0\240\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 32
getpid() = 3664
getpid() = 3664
getpid() = 3664
recvmsg(3<socket:[12129015]>, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3<socket:[12129015]>, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
getpid() = 3664
mmap(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f668147b000
poll([{fd=3<socket:[12129015]>, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3<socket:[12129015]>, [{iov_base="N\0\4\0\1\0`\7j\1\0\0\22\1\0\0\1\30\f\0\4\0`\7j\1\0\0\0\0\0\0"..., iov_len=72}, {iov_base=NULL, iov_len=0}, {iov_base="", iov_len=0}], 3) = 72
poll([{fd=3<socket:[12129015]>, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3<socket:[12129015]>, [{iov_base="b\0\3\0\4\0\0\0DRI2", iov_len=12}], 1) = 12
poll([{fd=3<socket:[12129015]>, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvmsg(3<socket:[12129015]>, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1\0\"\0\0\0\0\0\1\232w\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 32
ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_EXECBUFFER2, 0x7ffc087d0220) = -1 EIO (Input/output error)
write(2</dev/pts/2>, "i965: Failed to submit batchbuff"..., 55i965: Failed to submit batchbuffer: Input/output error
) = 55
futex(0x558cd1eaea50, FUTEX_WAKE_PRIVATE, 2147483647) = 1
futex(0x558cd1eaea00, FUTEX_WAKE_PRIVATE, 1) = 1
getpid() = 3664
exit_group(1) = ?
As you can see near the end, the ioctl for DRM_IOCTL_I915_GEM_EXECBUFFER2 ends in EIO. This indicates the problem is in the kernel.
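The EIO argument rests on spotting a single failing ioctl among hundreds of lines of strace output. Below is a minimal Python sketch that automates the scan. It assumes strace's one-call-per-line format with file descriptors decorated by path, as produced by `strace -y` (the `4</dev/dri/card0>` form above). The function name `failed_drm_ioctls` is hypothetical. The sketch reports every DRM ioctl that returned -1, so the earlier ENOENT from EXECBUFFER2 shows up as well. Deciding which failure matters is still up to the reader.

```python
import re

# Matches strace lines such as:
# ioctl(4</dev/dri/card0>, DRM_IOCTL_I915_GEM_EXECBUFFER2, 0x7ffc087d0220) = -1 EIO (Input/output error)
IOCTL_FAIL = re.compile(
    r"^ioctl\((?P<fd>\d+)<(?P<path>[^>]*)>, (?P<request>DRM_IOCTL_\w+), [^)]*\)"
    r" = -1 (?P<errno>E[A-Z]+)"
)

def failed_drm_ioctls(lines):
    """Return (request, errno) pairs, in order, for every DRM ioctl that returned -1."""
    failures = []
    for line in lines:
        m = IOCTL_FAIL.match(line.strip())
        if m:
            failures.append((m.group("request"), m.group("errno")))
    return failures
```

A typical run would be `strace -y -o glxinfo.strace glxinfo`, followed by `failed_drm_ioctls(open("glxinfo.strace"))`.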
Created attachment 142266 [details]
card0_error 2018-10-29
4.18.12:
[304462.511265] [drm] GPU HANG: ecode 9:0:0x6f656195, in krunner [3106], reason: hang on rcs0, action: reset
[304462.511267] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[304462.511267] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[304462.511268] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[304462.511268] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[304462.511268] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[304462.511291] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[304462.512522] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[304462.512554] i915 0000:00:02.0: Resetting chip for hang on rcs0
[304462.513873] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[304462.621208] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[304462.729200] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[304462.835854] i915 0000:00:02.0: Failed to reset chip
[304462.837177] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
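Every report in this thread starts with the same `GPU HANG` line. A small parser makes it easier to tabulate the reports by uptime, ecode, process and engine. This sketch assumes the exact message format shown in these 4.18/4.19 logs, which may differ on other kernel versions. `Hang` and `parse_hang` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

HANG_RE = re.compile(
    r"^\[\s*(?P<uptime>\d+\.\d+)\] \[drm\] GPU HANG: ecode (?P<ecode>[0-9a-fx:]+), "
    r"in (?P<comm>.+?) \[(?P<pid>\d+)\], reason: hang on (?P<engine>\w+), "
    r"action: (?P<action>\w+)"
)

class Hang(NamedTuple):
    uptime_s: float  # seconds since boot, taken from the dmesg timestamp
    ecode: str
    comm: str        # process name as logged (the kernel truncates it to 15 chars)
    pid: int
    engine: str      # e.g. rcs0, the render engine
    action: str

def parse_hang(line: str) -> Optional[Hang]:
    """Parse one 'GPU HANG' dmesg line; return None for any other line."""
    m = HANG_RE.match(line.strip())
    if m is None:
        return None
    return Hang(float(m["uptime"]), m["ecode"], m["comm"], int(m["pid"]),
                m["engine"], m["action"])
```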
What version of Mesa are you running?
(In reply to Lionel Landwerlin from comment #43)
> What version of Mesa are you running?
18.1.7 currently.
Created attachment 142352 [details]
card0_error 2018-11-02
kernel 4.18.15
[144096.419245] [drm] GPU HANG: ecode 9:0:0x8adec402, in kmail [124429], reason: hang on rcs0, action: reset
[144096.419249] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[144096.419251] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[144096.419252] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[144096.419253] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[144096.419255] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[144096.419292] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[144096.420545] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[144096.420610] i915 0000:00:02.0: Resetting chip for hang on rcs0
[144096.421946] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[144096.530458] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[144096.638465] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[144096.745251] i915 0000:00:02.0: Failed to reset chip
[144096.746500] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
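Every log in this thread shows the same escalation. The engine reset of rcs0 times out. The driver then falls back to a full chip reset, which also times out several times. The sequence ends with `Failed to reset chip`. An unrecoverable GPU at that point is consistent with the EIO that glxinfo gets from EXECBUFFER2 in the strace above. The sketch below checks a dmesg excerpt for this pattern. It matches only the message strings seen in these logs. `ResetSummary` and `summarize_resets` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

TS = re.compile(r"^\[\s*(\d+\.\d+)\]")  # leading dmesg timestamp

@dataclass
class ResetSummary:
    engine_reset: bool = False        # "Resetting rcs0 for hang ..." seen
    chip_reset: bool = False          # fell back to "Resetting chip for hang ..."
    timeouts: int = 0                 # number of "reset request timeout" errors
    chip_reset_failed: bool = False   # "Failed to reset chip" seen
    seconds_to_failure: Optional[float] = None  # first reset attempt -> failure

def summarize_resets(lines):
    s = ResetSummary()
    first_reset = None
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # skip lines without a dmesg timestamp
        t = float(m.group(1))
        # Check the chip reset first: its text also matches the engine pattern.
        if "Resetting chip for hang" in line:
            s.chip_reset = True
        elif re.search(r"Resetting \w+ for hang", line):
            s.engine_reset = True
        elif "reset request timeout" in line:
            s.timeouts += 1
        elif "Failed to reset chip" in line:
            s.chip_reset_failed = True
            if first_reset is not None and s.seconds_to_failure is None:
                # Round to dmesg's microsecond resolution.
                s.seconds_to_failure = round(t - first_reset, 6)
        if "Resetting" in line and first_reset is None:
            first_reset = t
    return s
```

For the excerpt above, this reports both reset attempts, five timeouts and a failed chip reset about 0.33 s after the first attempt.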
Created attachment 142394 [details]
card0_error 2018-11-06
4.18.15
[190495.326290] [drm] GPU HANG: ecode 9:0:0x60a3ff22, in qtcreator [5792], reason: hang on rcs0, action: reset
[190495.326296] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[190495.326306] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[190495.326309] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[190495.326312] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[190495.326316] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[190495.326362] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[190495.327658] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[190495.327748] i915 0000:00:02.0: Resetting chip for hang on rcs0
[190495.329158] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[190495.436783] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[190495.544748] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[190495.655468] i915 0000:00:02.0: Failed to reset chip
[190495.656807] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[190497.643368] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
Created attachment 142494 [details]
card0_error 2018-11-16
4.18.15
[251467.019461] [drm] GPU HANG: ecode 9:0:0xaedce18e, in chrome [3345], reason: hang on rcs0, action: reset
[251467.019512] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[251467.020747] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[251467.020802] i915 0000:00:02.0: Resetting chip for hang on rcs0
[251467.022269] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[251467.130920] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[251467.238851] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[251467.345543] i915 0000:00:02.0: Failed to reset chip
[251467.346885] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
Are we convinced already that this is NOT a Mesa bug, but an i915/firmware one?
$ ltrace glxinfo
XOpenDisplay(0, 0x7fffcd6ced08, 0x7fffcd6ceae0, 0x7f9883fa2718) = 0x55e492342d00
__printf_chk(1, 0x55e491f86408, 0x55e492343f50, 0name of display: :0
) = 20
glXChooseVisual(0x55e492342d00, 0, 0x55e491f8e200, 0) = 0x55e492371e50
XFree(0x55e492371e50, 0x55e4924b7130, 1, 0) = 1
glXChooseFBConfig(0x55e492342d00, 0, 0x7fffcd6ce9d0, 0x7fffcd6ce960) = 0x55e4923720e0
glXQueryExtensionsString(0x55e492342d00, 0, 1, 6) = 0x55e4923726f0
strstr("GLX_ARB_create_context GLX_ARB_c"..., "GLX_ARB_create_context_profile") = "GLX_ARB_create_context_profile G"...
strlen("GLX_ARB_create_context_profile") = 30
glXGetProcAddress(0x55e491f86124, 0x55e491f86998, 0x55e491f86998, 24) = 0x7f9884201fb0
XSetErrorHandler(0x55e491f82cd0, 0, 5, 0) = 0x7f9883ff00d0
XSetErrorHandler(0x7f9883ff00d0, 1, 0x55e49238cc70, 0) = 0x55e491f82cd0
XSetErrorHandler(0x55e491f82cd0, 0x55e4924aae30, 5, 5) = 0x7f9883ff00d0
XSetErrorHandler(0x7f9883ff00d0, 0, 0x7f9884210320, 1) = 0x55e491f82cd0
glXIsDirect(0x55e492342d00, 0x55e492372f50, 0x7f9884210320, 1) = 1
glXGetVisualFromFBConfig(0x55e492342d00, 0x55e4924aae30, 1, 0) = 0x55e492371e50
XFree(0x55e4923720e0, 0x55e492344ef0, 0, 0x55e4923451a0) = 1
XCreateColormap(0x55e492342d00, 362, 0x55e49234e7e0, 0) = 0x1c00001
XCreateWindow(0x55e492342d00, 362, 0, 0) = 0x1c00004
glXMakeCurrent(0x55e492342d00, 0x1c00004, 0x55e492372f50, 0xeff5i965: Failed to submit batchbuffer: Input/output error
 <no return ...>
+++ exited (status 1) +++
Created attachment 142531 [details]
card0_error 2018-11-20
kernel 4.19.1
[87496.664193] IPv6: ADDRCONF(NETDEV_CHANGE): wlp58s0: link becomes ready
[87498.501292] [drm] GPU HANG: ecode 9:0:0x0020b097, in X [2030], reason: hang on rcs0, action: reset
[87498.501302] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[87498.501307] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[87498.501311] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[87498.501315] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[87498.501320] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[87498.502367] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[87498.504230] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[87498.506822] i915 0000:00:02.0: Resetting chip for hang on rcs0
[87498.509713] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[87498.618498] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[87498.726635] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[87498.832790] i915 0000:00:02.0: Failed to reset chip
[87498.835687] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
Created attachment 142683 [details]
card0_error 2018-12-01
4.19.2
[254508.104715] [drm] GPU HANG: ecode 9:0:0x8fdfbffe, in chrome [5113], reason: hang on rcs0, action: reset
[254508.104720] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[254508.104720] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[254508.104721] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[254508.104722] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[254508.104723] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[254508.105734] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[254508.107470] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[254508.107522] i915 0000:00:02.0: Resetting chip for hang on rcs0
[254508.110271] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[254508.216901] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[254508.325034] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[254508.431224] i915 0000:00:02.0: Failed to reset chip
[254508.434066] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
Created attachment 142684 [details]
card0_error 2018-12-01
Created attachment 142743 [details]
card0_error 2018-12-06
4.19.2:
[247123.117705] [drm] GPU HANG: ecode 9:0:0x4144fc23, in kscreenlocker_g [103789], reason: hang on rcs0, action: reset
[247123.117707] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[247123.117708] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[247123.117709] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[247123.117710] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[247123.117711] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[247123.118721] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[247123.120463] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[247123.121185] i915 0000:00:02.0: Resetting chip for hang on rcs0
[247123.124833] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[247123.234729] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[247123.346730] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[247123.456988] i915 0000:00:02.0: Failed to reset chip
[247123.459741] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
Created attachment 142744 [details]
card0_error_2018-12-07_lenovo_S300
I have a Lenovo S300 (Kaby Lake), kernel 4.19.4, Arch, KDE desktop + Chromium. I have been getting this issue ever since I started using this laptop (a year or so). The crash happens only after resuming from hibernation: sometimes immediately after logging back in, sometimes within minutes.
[56572.472400] [drm] GPU HANG: ecode 9:0:0x893bdd9d, in chromium [15360], reason: hang on rcs0, action: reset
[56572.472402] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[56572.472403] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[56572.472403] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[56572.472404] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[56572.472404] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[56572.473413] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[56572.475145] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[56572.475181] i915 0000:00:02.0: Resetting chip for hang on rcs0
[56572.477939] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[56572.585555] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[56572.692218] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[56572.797101] i915 0000:00:02.0: Failed to reset chip
[56572.799896] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
Moved back from Mesa. This is not a Mesa bug.
Moved back from ASSIGNED, since clearly no one is working on this.
Created attachment 142755 [details]
card0_error 2018-12-08
4.19.2
[49041.720961] [drm] GPU HANG: ecode 9:0:0x848acd64, in X [2032], reason: hang on rcs0, action: reset
[49041.720964] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[49041.720965] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[49041.720965] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[49041.720966] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[49041.720966] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[49041.721986] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[49041.723719] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[49041.723754] i915 0000:00:02.0: Resetting chip for hang on rcs0
[49041.726515] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[49041.833834] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[49041.941847] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[49042.048105] i915 0000:00:02.0: Failed to reset chip
[49042.050856] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
Created attachment 142810 [details]
card0_error 2018-12-13
4.19.7:
[171009.688247] [drm] GPU HANG: ecode 9:0:0x9bfd1292, in X [2016], reason: hang on rcs0, action: reset
[171009.688253] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[171009.688256] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[171009.688258] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[171009.688261] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[171009.688264] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[171009.689304] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[171009.691076] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[171009.691155] i915 0000:00:02.0: Resetting chip for hang on rcs0
[171009.693938] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[171009.802277] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[171009.910299] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[171010.016473] i915 0000:00:02.0: Failed to reset chip
[171010.019319] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[171015.308387] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
Possibly related to bug 108717.
Created attachment 143049 [details]
card0_error 2019-01-09
4.19.11:
[86214.433065] [drm] GPU HANG: ecode 9:0:0xb7dea192, in X [2061], reason: hang on rcs0, action: reset
[86214.433068] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[86214.433069] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[86214.433069] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[86214.433070] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[86214.433071] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[86214.434080] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[86214.435813] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[86214.435852] i915 0000:00:02.0: Resetting chip for hang on rcs0
[86214.438619] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[86214.545725] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[86214.653728] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[86214.759982] i915 0000:00:02.0: Failed to reset chip
[86214.762731] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[86216.711990] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
I've been on kernel 4.20.3 for a few days and have not yet experienced a crash.
(In reply to Romek from comment #60)
> I've been on kernel 4.20.3 for a few days and have not yet experienced a
> crash.
Thanks, that's good to know. openSUSE Tumbleweed has 4.20.0 available now. I'll upgrade and see what happens.
(In reply to Romek from comment #60)
> I've been on kernel 4.20.3 for a few days and have not yet experienced a
> crash.
Uptime is now 8.5 days on 4.20.0, which gives good statistical confidence that it's fixed. Let's wait for 14 days, which would be unheard of.
Uptime is now 11 days on 4.20.0. Statistically speaking, this bug is fixed.
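The "statistically speaking" conclusion can be made somewhat more concrete with a rough sketch. The following assumptions are mine, not the reporters':

- The dmesg timestamp of each hang reported above, excluding the Lenovo S300 report, is the uptime at which that boot hit its first hang.
- All eleven reports come from the same machine.
- Time to hang is exponentially distributed, with mean equal to the sample mean of about 2.3 days.

Under these assumptions, the chance of running d days without a hang is exp(-d / mean). Hangs that were never posted, or boots that ended for other reasons, would bias this estimate.

```python
import math

# Uptime in seconds at each hang reported above (assumption: one boot per hang,
# all from the same machine; the Lenovo S300 report is excluded).
HANG_UPTIMES_S = [409897.55, 304462.51, 144096.42, 190495.33, 251467.02,
                  87498.50, 254508.10, 247123.12, 49041.72, 171009.69, 86214.43]

def p_no_hang(days, uptimes_s=HANG_UPTIMES_S):
    """Probability of surviving `days` without a hang under an exponential model."""
    mean_s = sum(uptimes_s) / len(uptimes_s)  # maximum-likelihood estimate of the mean
    return math.exp(-days * 86400 / mean_s)
```

Under this crude model, 8.5 hang-free days would occur by chance roughly 2.5% of the time, and 11 days roughly 0.9%. That supports the conclusion above, but treat the numbers as an order of magnitude only.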