Created attachment 136441: dmesg detailing hang. The GPU hang is reliably reproducible with Mesa 17.3.0 and 17.3.1 by visiting Google Maps in Firefox. Downgrading Mesa to version 17.2.7 fixes the issue.
Created attachment 136442: GPU crash dump
Thanks for your time, Axel. Could you help us bisect the issue to find the culprit commit? Meanwhile, I'll try to reproduce it on my side.
(In reply to Elizabeth from comment #2)
> Thanks for your time Axel,
> Could you help us to bisect the issue to find the culprit commit?
> Meanwhile, I'll try to reproduce by my side.

Certainly, I am glad to help. I will start bisecting within the next few days.
I just finished the git bisect. This issue was introduced with commit ea0d2e98ecb369ab84e78c84709c0930ea8c293a. Unfortunately, I do not know enough about mesa or the Intel drivers to further debug the problem and create a patch myself. However, please let me know if I can provide additional information that would be helpful.
Interesting, thank you for bisecting! So far, I'm not able to reproduce this :( Does it work if you run with INTEL_DEBUG=norbc ?
(In reply to Kenneth Graunke from comment #5)
> Interesting, thank you for bisecting! So far, I'm not able to reproduce
> this :(
>
> Does it work if you run with INTEL_DEBUG=norbc ?

Thank you for working on this. Yes, when setting INTEL_DEBUG=norbc it works without problems.
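For others hitting the hang, the workaround confirmed above can be applied to a single application instead of the whole session. The following is a minimal sketch, not an official Mesa tool: `run_norbc` is a hypothetical helper name, and it assumes INTEL_DEBUG takes a comma-separated flag list, so any flags that are already set are preserved. It only sidesteps the hang; it does not fix the underlying bug.

```shell
# Hypothetical helper: run one command with "norbc" added to INTEL_DEBUG.
# Existing INTEL_DEBUG flags are kept (assumed to be comma-separated).
run_norbc() {
    case ",${INTEL_DEBUG:-}," in
        *,norbc,*) _flags=$INTEL_DEBUG ;;          # flag already present
        ,,)        _flags=norbc ;;                 # variable unset or empty
        *)         _flags="$INTEL_DEBUG,norbc" ;;  # append to existing flags
    esac
    # env applies the variable to this one command only.
    env INTEL_DEBUG="$_flags" "$@"
}

# Example (not run here):
#   run_norbc firefox
```

If the hang disappears with this wrapper and returns without it, that matches the behaviour reported in this thread.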
I very likely have the same bug on my Thinkpad T470 (curiously, Axel Fischer is using a Thinkpad T460) and can confirm almost everything that has already been said.
- The GPU hang occurs reliably when using Mesa 17.3.0 and 17.3.1. Downgrading to 17.2.6 on Arch fixed the issue for me as well.
- Everything runs without any problems as soon as INTEL_DEBUG=norbc is set.
However, for me the GPU hang always occurs within a few seconds of starting Xorg. I have not yet made the effort to do a git bisect. I am attaching my dmesg log and GPU crash dump in the hope that they give further hints. If I can help further, just tell me and I will try to do so.
Created attachment 136523: dmesg gpu hang after a fresh boot
Created attachment 136524: GPU crash dump
Elizabeth, since Ken can't reproduce this, can you please take a look? You might be able to discover the factors that trigger the hang.
I would say this also affects my Dell XPS Precision 9550. Working with Eclipse Kepler causes a hang in a very predictable manner. Running with INTEL_DEBUG=norbc works without hangs. I am also downgrading to 17.2.5 (Debian Testing packages); that is still a work in progress.

Details about the processor:

vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
stepping : 3
microcode : 0xba
cpu MHz : 2600.000
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5184.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
Hello everybody,

Sorry for the delay. I tried to reproduce on a SKL laptop without success. I installed Gentoo, xorg, the default kernel, KDE, Firefox and Mesa. In Firefox I opened Google Maps and used map view, satellite view and street view, and the hang didn't happen.

I'm using:
Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
Gentoo Base System release 2.4.1
Firefox ESR 52.5.2 (64-bit)
X.Org X Server 1.19.5
plasmashell 5.10.5
Qt 5.7.1
KDE Frameworks 5.40.0
kf5-config 1.0
Memory 8 GB

I tried the default kernel and Mesa:
kernel-genkernel-x86_64-4.9.72-gentoo
Mesa 17.2.7 and Mesa 17.3.1

And the latest drm-tip and Mesa:
Linux gentoo 4.15.0-rc8+ #4 SMP Wed Jan 17 15:16:57 CST 2018 x86_64
Mesa 17.3.1 and Mesa 17.4.0-devel

What desktop environment are you using? Did Firefox return any crash log when the hang occurred? Are you using any special capability on Maps? Are you logged in? How long does it take until the device hangs?

Stephan, Alejandro, in your case how can you reliably reproduce the hang?

Thank you.
Hi Elizabeth,

I am on Gentoo's testing branch. I am only using a window manager (herbstluftwm 0.7.0) but no desktop environment. The issue is reproducible with:
Firefox 57.0.4 built against GTK 2.24.31
X.Org X server 1.19.6
Core i7-6600U
24 GiB memory

The device hang happens immediately when loading Google Maps (normal map view). I was logged in when it happened. Firefox did not produce any crash log.

In case it is not reproducible for you, I can try to debug it and see if I can figure out what the problem is. I have a lot of experience with C++, but none with C and no knowledge of Mesa. That might take some time, since I will have to do a lot of reading and learning first.
Hi Elizabeth!

Thanks for your effort in investigating the issue. For me, starting X with my standard configuration always leads to a GPU hang within at most 10 seconds after X starts up. I'm using a Thinkpad T470 with an Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz and 32GB of RAM on Arch Linux. I'm currently using:
linux 4.14.13-1
mesa 17.3.2-2
xorg-server 1.19.6-2

The following information may be useful. As Axel is using herbstluftwm and I am as well, I was curious whether the WM is the issue. And indeed, I get the following interesting result:
1) awesomewm without lemonbar: no crash
2) herbstluftwm with lemonbar: crash
3) herbstluftwm without lemonbar: no crash

In fact, I'm not using the standard lemonbar package from Arch, but the package lemonbar-xft-git from AUR, which is simply a build of https://github.com/krypt-n/bar

In the next few days I will check further combinations, and the lemonbar issue in particular, to narrow down the precise reason for the crash. Sadly, I currently do not have more time to dig deeper, but in the meantime I nevertheless want to share this information.
Axel: please check that this is not a dup of https://bugs.freedesktop.org/show_bug.cgi?id=104214 You should be able to build and test master or the 17.3 release branch, which has the fix ready for the 17.3.1 release.
Created attachment 136865: Simple script which leads to GPU hang when piped into lemonbar-xft

I have now further investigated the reason for the GPU crash on my laptop and could construct the following minimal example.

Steps to reproduce the crash on my laptop:
1) Install https://github.com/krypt-n/bar which gives you the lemonbar executable
2) ./crash_me.sh | lemonbar -f "Droid Sans"

The script pipes lines into lemonbar, which should then show them on the bar.

Further observations:
1) The GPU hang occurs after the first update of lemonbar. In the script this is after 1 second (sleep 1). If this is changed to e.g. sleep 30, the crash occurs after 30 seconds.
2) The crash occurs for some XFT fonts, but not for all. I suspect that the crash originates from XFT font rendering.
3) Changing the echoed text to something different results in no further crash. So oddly, this seems to be a rather finely tuned issue.
4) The crash occurs independently of the chosen X window manager.

If I can help further, I will try to do so.
Thank you, this is very helpful! I can reproduce the lemonbar GPU hang with your script. (To avoid crashing X while debugging, you can run it in Xephyr -glamor as well...) I've determined that always_flush_cache=true, INTEL_DEBUG=reemit, and always_flush_batch=true all work around the problem. We may be missing a flush somewhere. I'm not sure where. Renaming bug to mention CCS and Lemonbar.
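The three workarounds listed above can be compared one after another against the same reproducer. This is a rough sketch under stated assumptions, not part of Mesa: `try_workarounds` is a hypothetical helper, it assumes each setting can be passed as an environment variable exactly as written above, and the nested display number `:1` in the usage comment is arbitrary.

```shell
# Hypothetical helper: run the given command once per workaround setting.
try_workarounds() {
    for _setting in always_flush_cache=true INTEL_DEBUG=reemit always_flush_batch=true; do
        echo "== $_setting =="
        # env sets the variable for this single run only.
        env "$_setting" "$@"
    done
}

# Example (not run here), inside a nested server as suggested above:
#   Xephyr -glamor :1 &
#   export DISPLAY=:1
#   try_workarounds sh -c './crash_me.sh | lemonbar -f "Droid Sans"'
```

Each run has to be ended by hand (for example by closing the bar) before the next setting is tried, since the reproducer is expected to keep feeding the bar.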
This may help: https://patchwork.freedesktop.org/series/36957/
It does not. :(
*** Bug 104383 has been marked as a duplicate of this bug. ***
patch on list: https://patchwork.freedesktop.org/series/37023/
This is fixed by the following commit in master:

commit 20f70ae3858bc213e052a8434f0e637eb36203c4
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Tue Jan 23 23:47:26 2018 -0800

    i965/draw: Set NEW_AUX_STATE when draw aux changes

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104411
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104383
    Fixes: ea0d2e98ecb369ab84e78c84709c0930ea8c293a
    Cc: mesa-stable@lists.freedesktop.org
    Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*** Bug 104462 has been marked as a duplicate of this bug. ***
*** Bug 104689 has been marked as a duplicate of this bug. ***
*** Bug 105184 has been marked as a duplicate of this bug. ***
*** Bug 105314 has been marked as a duplicate of this bug. ***
*** Bug 105195 has been marked as a duplicate of this bug. ***
*** Bug 105315 has been marked as a duplicate of this bug. ***
*** Bug 104394 has been marked as a duplicate of this bug. ***
*** Bug 99867 has been marked as a duplicate of this bug. ***
*** Bug 104991 has been marked as a duplicate of this bug. ***