Created attachment 135219
GPU crash dump
KiCad reliably causes GPU hangs which kill my entire X session.
Hovering over the timeline control in VLC often causes a "mini hang," where the system becomes unresponsive for 30s or so, then recovers.
Hardware is an HP Z2 Mini with Intel graphics and a 4K display. Software is Debian 9.2 "Stretch", current as of today, 2017-11-02.
I've tried various things like disabling RC6 and using the xf86 driver instead of modesetting, but it continues to crash.
Nov 2 21:54:01 up kernel: [ 141.831194] [drm] GPU HANG: ecode 9:0:0x85dfbfff, in Xwayland , reason: Hang on render ring, action: reset
Nov 2 21:54:01 up kernel: [ 141.831196] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Nov 2 21:54:01 up kernel: [ 141.831197] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Nov 2 21:54:01 up kernel: [ 141.831198] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Nov 2 21:54:01 up kernel: [ 141.831199] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Nov 2 21:54:01 up kernel: [ 141.831200] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Nov 2 21:54:01 up kernel: [ 141.831256] drm/i915: Resetting chip after gpu hang
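For anyone collecting the same information: the log above names the crash dump path and asks for it to be attached. A minimal sketch of pulling the ecode out of the kernel log and copying the dump somewhere attachable follows. The function names `hang_ecodes` and `save_crash_dump` and the default output file name are made up for illustration, and reading the sysfs file typically needs root.

```shell
#!/bin/sh
# Hypothetical helper: print the ecode of every i915 "GPU HANG" line read on stdin.
hang_ecodes() {
    sed -n 's/.*GPU HANG: ecode \([^,]*\),.*/\1/p'
}

# Hypothetical helper: copy the crash dump to a regular file.
# $1: dump source (defaults to the path named in the kernel log above)
# $2: destination file (the default name is an arbitrary choice)
save_crash_dump() {
    src=${1:-/sys/class/drm/card0/error}
    dest=${2:-gpu-crash-dump.txt}
    cat "$src" > "$dest"
}
```

Typical use would be `dmesg | hang_ecodes` followed by `save_crash_dump` run as root, ideally right after the hang and before rebooting.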
Possibly related KiCad hangs: bug 100648 and bug 103398.
I wasn't able to reproduce KiCad hangs when I looked at this in bug 100648.
Elizabeth/Louis: Can you reproduce this using Ian's more specific use case?
Created attachment 135232
After 15 to 30 minutes in KiCad on a fresh Debian image, I managed to reproduce the issue. Mesa and the kernel are the ones that ship with Debian, plus a "sudo apt-get update". I'm going to try to replicate this with the latest Mesa release. Stay tuned.
$ glxinfo | grep "OpenGL version"
OpenGL version string: 3.0 Mesa 13.0.6
$ Xorg -Version
X.Org X Server 1.19.2
Release Date: 2017-03-02
X Protocol Version 11, Revision 0
Build Operating System: Linux 4.9.0-3-amd64 x86_64 Debian
Current Operating System: Linux debian 4.9.0-4-amd64 #1 SMP Debian 4.9.51-1 (2017-09-28) x86_64
Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
Intel Corporation HD Graphics 520 (rev 07) (prog-if 00 [VGA controller])
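Since later comments hinge on exactly which Mesa version was installed, a small sketch for extracting just that number from the `glxinfo` line shown above may help when comparing reports. The function name `mesa_version` is hypothetical, and the sketch assumes the "OpenGL version string: ... Mesa X.Y.Z" layout seen in this comment.

```shell
#!/bin/sh
# Hypothetical helper: read glxinfo output on stdin and print only the
# Mesa version taken from the "OpenGL version string" line.
mesa_version() {
    sed -n 's/^OpenGL version string:.* Mesa \([^ ]*\).*/\1/p'
}
```

Usage would be `glxinfo | mesa_version`, reported alongside `uname -r` for the kernel.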
(In reply to Elizabeth from comment #3)
Forgot to mention: X got killed, but no hang was reported in dmesg.
If you are going to use debian to reproduce a bug, please use the "testing" distribution:
It will have a far newer graphics stack.
Created attachment 135328
Reproduced with Mesa 17.2.4 and kernel 4.13.0-1amd64. To reproduce easily in KiCad, download a free (heavy) demo project, select objects, check the option "Include items on invisible layers", and repeat until the display freezes and the login window appears (between 5 and 15 minutes).
Can you provide a url for the heavy kicad demo project that you used?
The fastest way to reproduce this is to open a fairly complex project, select several components, and start dragging them. I don't have a project to share, but on the one I'm currently working on, this will produce a crash almost instantly.
Hmm, I tried another project and couldn't reproduce it quickly. I'm willing to share my project privately if that will help; it seems to cause crashes very quickly.
(In reply to Mark Janes from comment #7)
> Thanks Elizabeth!
> Can you provide a url for the heavy kicad demo project that you used?
Here are a lot of projects, I used the first that I found:
*** Bug 99986 has been marked as a duplicate of this bug. ***
*** Bug 103373 has been marked as a duplicate of this bug. ***
*** Bug 103398 has been marked as a duplicate of this bug. ***
*** Bug 100648 has been marked as a duplicate of this bug. ***
(In reply to Elizabeth from comment #10)
> (In reply to Mark Janes from comment #7)
> > Thanks Elizabeth!
> > Can you provide a url for the heavy kicad demo project that you used?
> Here are a lot of projects, I used the first that I found:
Please link to the specific project. I assume that the first one you see on that page is the same one I see, but who knows. Also, within the first one... there are multiple KiCad projects...
Ben (cc'd) and I have tried making an apitrace of KiCad that will reproduce the issue, but we can't get apitrace to make a trace of this application at all.
I notice from all the crash dumps from this and the duplicate bugs that it's not actually KiCad that's active when the hang occurs. It's Xorg or Xwayland...
(In reply to Matt Turner from comment #15)
Hmmm... I can't seem to find the tstkicad project... I tried with another one, made sure to save the right link this time, and reproduced the issue:
I couldn't reproduce this on my KBL Debian testing system. I looked at Elizabeth's system and found that GuC was enabled, which is known to cause GPU instabilities.
I can debug further once Elizabeth reproduces this with a stock debian testing installation.
Elizabeth reproduced this on a stock Debian SKL system, with Linux 4.9 and 4.13. It does not reproduce with SNA; however, there are serious rendering issues when running KiCad with SNA.
I retested this on 17.3-rc6 using the following attachment, and was *not* able to reproduce the issue after ~15 minutes of moving the schematic around in Eeschema: https://bugs.freedesktop.org/attachment.cgi?id=135390
On 17.2.x, I could reproduce this easily with the above test.sch file and ~10 minutes of moving it around in the Eeschema window.
Elizabeth, can you try reproducing this on 17.3-rc6?
(In reply to Clayton Craft from comment #19)
> I retested this on 17.3-rc6 using the following attachment, and was *not*
> able to reproduce the issue after ~15 minutes of moving the schematic around
> in Eeschema: https://bugs.freedesktop.org/attachment.cgi?id=135390
> On 17.2.x, I could reproduce this easily with the above test.sch file and
> ~10 minutes of moving it around in the Eeschema window.
> Elizabeth, can you try reproducing this on 17.3-rc6?
Hello Clayton, I installed 17.3-rc6 from https://mesa.freedesktop.org/archive/, and couldn't reproduce the issue.
With 17.2.5 I was able to reproduce the hang in less than 5min, and after installing 17.3-rc6 it seems to be fixed.
Created attachment 138349
GPU hang dump on 4.15.12 / Mesa 17.3.7
I'm still seeing this, dump attached.
* Machine: Thinkpad 13 with Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
* uname -a: Linux glados 4.15.12-1-ARCH #1 SMP PREEMPT Wed Mar 21 15:14:56 UTC 2018 x86_64 GNU/Linux
* Mesa version: 17.3.7
dmesg said I should open a new bug, but as there are already at least 4 others, that seems redundant?
(In reply to Brian Starkey from comment #21)
> Created attachment 138349 [details]
> GPU hang dump on 4.15.12 / Mesa 17.3.7
> I'm still seeing this, dump attached.
> * Machine: Thinkpad 13 with Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
> * uname -a: Linux glados 4.15.12-1-ARCH #1 SMP PREEMPT
> Wed Mar 21 15:14:56 UTC 2018 x86_64 GNU/Linux
> * Mesa version: 17.3.7
> dmesg said I should open a new bug, but as there's already at least 4 others
> that seems redundant?
Hello Brian. This specific hang was already verified to be fixed by 17.3.6, which means that your issue, even if it is produced with the same program, must either have a different root cause or be a regression. Could you open a new bug with the steps to reproduce, your configuration, the error state and, if possible, an apitrace of the hang? To check whether this is a regression, you can test with both 17.3.6 and 17.3.7: if the first works properly and the second hangs, then it is a regression. Thank you.
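The comparison described above can be written down as a small decision table. This is only a sketch: the function name `classify_hang` and the result labels `pass` and `hang` are invented for illustration, and because this hang can take many minutes to appear, a `pass` is only as trustworthy as the length of the test run.

```shell
#!/bin/sh
# Hypothetical helper: classify the outcome of testing two Mesa versions.
# $1: result on the older version (e.g. 17.3.6), "pass" or "hang"
# $2: result on the newer version (e.g. 17.3.7), "pass" or "hang"
classify_hang() {
    case "$1:$2" in
        pass:hang) echo "regression" ;;
        hang:hang) echo "present in both" ;;
        hang:pass) echo "fixed in newer" ;;
        pass:pass) echo "not reproduced" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

For example, `classify_hang pass hang` corresponds to the case where 17.3.6 works and 17.3.7 hangs.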
Hi Elizabeth, thanks for the advice. I've been running 17.3.6 for a while now and no crash yet.
I haven't found a sure-fire way to reproduce it, so it's hard to tell, but I'll open a new bug once I figure out whether it's a regression or also present in 17.3.6.