|Summary:||[SKL] GPU hangs running KiCad|
|Product:||Mesa||Reporter:||Ian Eure <ian>|
|Component:||Drivers/DRI/i965||Assignee:||Intel 3D Bugs Mailing List <intel-3d-bugs>|
|Status:||RESOLVED FIXED||QA Contact:||Intel 3D Bugs Mailing List <intel-3d-bugs>|
|Priority:||medium||CC:||andreas, bgamari, intel-gfx-bugs, luis.botello.ortega, stark3y, trygvis|
|i915 platform:||i915 features:|
GPU crash dump
GPU hang dump on 4.15.12 / Mesa 17.3.7
Description Ian Eure 2017-11-03 05:04:17 UTC
Created attachment 135219 [details]
GPU crash dump

KiCad reliably causes GPU hangs which kill my entire X session. Hovering over the timeline control in VLC often causes a "mini hang," where the system becomes unresponsive for 30s or so, then recovers.

Hardware is a HP Z2 Mini, Intel Graphics, and a 4K display. Software is Debian 9.2 "Stretch", current as of today, 2017-11-02.

I've tried various things like disabling RC6 and using the xf86 driver instead of modesetting, but it continues to crash.

Nov 2 21:54:01 up kernel: [ 141.831194] [drm] GPU HANG: ecode 9:0:0x85dfbfff, in Xwayland , reason: Hang on render ring, action: reset
Nov 2 21:54:01 up kernel: [ 141.831196] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Nov 2 21:54:01 up kernel: [ 141.831197] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Nov 2 21:54:01 up kernel: [ 141.831198] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Nov 2 21:54:01 up kernel: [ 141.831199] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Nov 2 21:54:01 up kernel: [ 141.831200] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Nov 2 21:54:01 up kernel: [ 141.831256] drm/i915: Resetting chip after gpu hang
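When comparing this report against its duplicates, it can help to pull the fields out of the "GPU HANG" kernel line mechanically. The following is a minimal sketch, not part of any i915 or Mesa tooling: the helper name `parse_gpu_hang` and the regular expression are assumptions modeled only on the single log line quoted above, and other kernel versions may format the line differently.

```python
import re

# Pattern modeled on the "[drm] GPU HANG" line quoted in the description.
# This is an assumption based on that one sample, not a documented format.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<process>.+?)\s*, "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the hang fields as a dict, or None if the line is not a hang report."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    sample = (
        "Nov 2 21:54:01 up kernel: [ 141.831194] [drm] GPU HANG: "
        "ecode 9:0:0x85dfbfff, in Xwayland , reason: Hang on render ring, "
        "action: reset"
    )
    print(parse_gpu_hang(sample))
```

Grouping reports by the `process` field makes it easy to see which process was active when each hang was recorded.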
Comment 1 Chris Wilson 2017-11-03 10:04:08 UTC
Kicad related hangs #100648 and #103398.
Comment 2 Mark Janes 2017-11-03 15:26:18 UTC
I wasn't able to reproduce KiCad hangs when I looked at this in bug 100648. Elizabeth/Louis: Can you reproduce this using Ian's more specific use case?
Comment 3 Elizabeth 2017-11-03 18:37:04 UTC
Created attachment 135232 [details]
dmesg_log_kicad_debian

After 15 to 30 minutes in KiCad on a fresh Debian image I managed to reproduce the issue. Mesa and the kernel are the ones that come with Debian plus a "sudo apt-get update". I'm going to try to replicate with the latest Mesa release. Stay tuned.

$ glxinfo | grep "OpenGL version"
OpenGL version string: 3.0 Mesa 13.0.6

$ Xorg -Version
X.Org X Server 1.19.2
Release Date: 2017-03-02
X Protocol Version 11, Revision 0
Build Operating System: Linux 4.9.0-3-amd64 x86_64 Debian
Current Operating System: Linux debian 4.9.0-4-amd64 #1 SMP Debian 4.9.51-1 (2017-09-28) x86_64

Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
Intel Corporation HD Graphics 520 (rev 07) (prog-if 00 [VGA controller])
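Since much of this thread turns on which Mesa release each reporter was running, the version can be extracted from `glxinfo` output rather than copied by hand. This is a small sketch under stated assumptions: the function name `mesa_version` is hypothetical, and the pattern is based only on the "OpenGL version string" line shown above. It captures the dotted numeric part of the version, so a release-candidate suffix would be dropped.

```python
import re


def mesa_version(glxinfo_output):
    """Return the dotted Mesa version from glxinfo text, or None if absent.

    Only the "OpenGL version string:" line is inspected, and only the
    numeric dotted part after the word "Mesa" is captured.
    """
    for line in glxinfo_output.splitlines():
        if line.startswith("OpenGL version string:"):
            match = re.search(r"Mesa (\d+(?:\.\d+)*)", line)
            return match.group(1) if match else None
    return None


if __name__ == "__main__":
    print(mesa_version("OpenGL version string: 3.0 Mesa 13.0.6"))
```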
Comment 4 Elizabeth 2017-11-03 18:38:02 UTC
(In reply to Elizabeth from comment #3)

Forgot to mention: X got killed, but no hang was reported in dmesg.
Comment 5 Mark Janes 2017-11-03 19:45:58 UTC
If you are going to use debian to reproduce a bug, please use the "testing" distribution: https://www.debian.org/devel/debian-installer/ It will have a far newer graphics stack.
Comment 6 Elizabeth 2017-11-08 23:02:20 UTC
Created attachment 135328 [details]
Kern_log_kicad

Reproduced with Mesa 17.2.4 and kernel 4.13.0-1amd64. To reproduce easily in KiCad, download a free demo project (a heavy one), select objects, check the option "Include items on invisible layers", and repeat until the display freezes and the login window is displayed (between 5 and 15 minutes).
Comment 7 Mark Janes 2017-11-08 23:39:32 UTC
Thanks Elizabeth! Can you provide a url for the heavy kicad demo project that you used?
Comment 8 Ian Eure 2017-11-09 02:16:36 UTC
The fastest way to reproduce this is to open a fairly complex project, select several components, and start dragging them. I don't have a project to share, but on the one I'm currently working on, this will produce a crash almost instantly.
Comment 9 Ian Eure 2017-11-09 02:22:50 UTC
Hmm, I tried another project and couldn't repro quickly. I'm willing to share my project privately if that will help; it seems to cause crashes very quickly.
Comment 10 Elizabeth 2017-11-09 15:27:42 UTC
(In reply to Mark Janes from comment #7)
> Thanks Elizabeth!
>
> Can you provide a url for the heavy kicad demo project that you used?

There are a lot of projects here; I used the first one that I found: http://kicad-pcb.org/made-with-kicad/
Comment 11 Matt Turner 2017-11-10 23:36:08 UTC
*** Bug 99986 has been marked as a duplicate of this bug. ***
Comment 12 Matt Turner 2017-11-10 23:36:29 UTC
*** Bug 103373 has been marked as a duplicate of this bug. ***
Comment 13 Matt Turner 2017-11-10 23:36:34 UTC
*** Bug 103398 has been marked as a duplicate of this bug. ***
Comment 14 Matt Turner 2017-11-10 23:36:45 UTC
*** Bug 100648 has been marked as a duplicate of this bug. ***
Comment 15 Matt Turner 2017-11-10 23:41:44 UTC
(In reply to Elizabeth from comment #10)
> (In reply to Mark Janes from comment #7)
> > Thanks Elizabeth!
> >
> > Can you provide a url for the heavy kicad demo project that you used?
> Here are a lot of projects, I used the first that I found:
> http://kicad-pcb.org/made-with-kicad/

Please link to the specific project. I assume that the first one you see on that page is the same one I see, but who knows. Also, within the first one... there are multiple KiCad projects...

Ben (cc'd) and I have tried making an apitrace of KiCad that will reproduce the issue, but we can't get apitrace to make a trace of this application at all.

I notice from all the crash dumps from this and the duplicate bugs that it's not actually KiCad that's active when the hang occurs. It's Xorg or Xwayland...
Comment 16 Elizabeth 2017-11-13 23:43:36 UTC
(In reply to Matt Turner from comment #15)
> ...

Hmmm... I can't seem to find the tstkicad project... I tried with this other one, made sure to save the right link this time, and reproduced the issue: https://github.com/FPGAwars/icezum
Comment 17 Mark Janes 2017-11-14 22:30:35 UTC
I couldn't reproduce this on my kbl debian testing system. I looked at Elizabeth's system, and found that GUC was enabled, which is known to cause GPU instabilities. I can debug further once Elizabeth reproduces this with a stock debian testing installation.
Comment 18 Mark Janes 2017-11-17 22:42:39 UTC
Elizabeth reproduced this on a stock debian SKL system, with linux 4.9 and 4.13. It does not reproduce with SNA; however, there are serious rendering issues when running Kicad with SNA.
Comment 19 Clayton Craft 2017-12-05 00:02:23 UTC
I retested this on 17.3-rc6 using the following attachment, and was *not* able to reproduce the issue after ~15 minutes of moving the schematic around in Eeschema: https://bugs.freedesktop.org/attachment.cgi?id=135390 On 17.2.x, I could reproduce this easily with the above test.sch file and ~10 minutes of moving it around in the Eeschema window. Elizabeth, can you try reproducing this on 17.3-rc6?
Comment 20 Elizabeth 2017-12-05 23:11:07 UTC
(In reply to Clayton Craft from comment #19)
> I retested this on 17.3-rc6 using the following attachment, and was *not*
> able to reproduce the issue after ~15 minutes of moving the schematic around
> in Eeschema: https://bugs.freedesktop.org/attachment.cgi?id=135390
>
> On 17.2.x, I could reproduce this easily with the above test.sch file and
> ~10 minutes of moving it around in the Eeschema window.
>
> Elizabeth, can you try reproducing this on 17.3-rc6?

Hello Clayton, I installed 17.3-rc6 from https://mesa.freedesktop.org/archive/, and couldn't reproduce the issue. With 17.2.5 I was able to reproduce the hang in less than 5min, and after installing 17.3-rc6 it seems to be fixed.
Comment 21 Brian Starkey 2018-03-25 20:40:47 UTC
Created attachment 138349 [details]
GPU hang dump on 4.15.12 / Mesa 17.3.7

I'm still seeing this, dump attached.

* Machine: Thinkpad 13 with Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
* uname -a: Linux glados 4.15.12-1-ARCH #1 SMP PREEMPT Wed Mar 21 15:14:56 UTC 2018 x86_64 GNU/Linux
* Mesa version: 17.3.7

dmesg said I should open a new bug, but as there's already at least 4 others that seems redundant?
Comment 22 Elizabeth 2018-03-26 15:06:22 UTC
(In reply to Brian Starkey from comment #21)
> Created attachment 138349 [details]
> GPU hang dump on 4.15.12 / Mesa 17.3.7
>
> I'm still seeing this, dump attached.
>
> * Machine: Thinkpad 13 with Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
> * uname -a: Linux glados 4.15.12-1-ARCH #1 SMP PREEMPT
> Wed Mar 21 15:14:56 UTC 2018 x86_64 GNU/Linux
> * Mesa version: 17.3.7
>
> dmesg said I should open a new bug, but as there's already at least 4 others
> that seems redundant?

Hello Brian. This specific hang was already verified as fixed by 17.3.6, which means that your issue, even though it is triggered by the same program, must either have a different root cause or be a regression. Could you open a new bug with the steps to reproduce, your configuration, the error state and, if possible, an apitrace of the hang?

To check whether this is a regression, you can test with 17.3.6 and 17.3.7: if the first works properly and the second does not, then it is a regression. Thank you.
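The regression check suggested in this comment (test an older and a newer release and compare the outcomes) can be written down as a small decision helper. This is an illustrative sketch only: the names `version_key` and `classify` and the result labels are hypothetical, and the parser handles plain dotted releases such as "17.3.6" but not release candidates such as "17.3-rc6".

```python
def version_key(release):
    """Turn a dotted release string like "17.3.6" into a sortable tuple."""
    return tuple(int(part) for part in release.split("."))


def classify(results):
    """Classify manual test results.

    `results` maps a release string to True if the hang reproduced on it.
    Returns (label, last_good, first_bad).
    """
    ordered = sorted(results, key=version_key)
    first_bad = next((v for v in ordered if results[v]), None)
    if first_bad is None:
        # The hang was not seen on any tested release.
        return ("not-reproduced", None, None)
    index = ordered.index(first_bad)
    if index == 0:
        # The oldest tested release already hangs, so nothing shows a regression.
        return ("no-known-good", None, first_bad)
    # Every release older than first_bad was good, so the newest of them
    # bounds the regression from below.
    return ("regression", ordered[index - 1], first_bad)


if __name__ == "__main__":
    print(classify({"17.3.6": False, "17.3.7": True}))
```

Because the hang is intermittent, a "did not reproduce" entry only means it was not seen during the test period, so results of this kind are best treated as provisional.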
Comment 23 Brian Starkey 2018-03-26 21:42:28 UTC
Hi Elizabeth, thanks for the advice. I've been running 17.3.6 for a while now and no crash yet. I haven't found a sure-fire way to reproduce it so it's hard to be sure, but I'll be sure to open a new bug when I figure out if it's a regression or also present in 17.3.6.