Summary: | [SKL] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [1085], reason: Hang on render ring, action: reset | ||
---|---|---|---|
Product: | Mesa | Reporter: | Antoine Aubry <antoine> |
Component: | Drivers/DRI/i965 | Assignee: | Antoine Aubry <antoine> |
Status: | RESOLVED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | critical | ||
Priority: | medium | CC: | intel-gfx-bugs, mark.a.janes |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Content of /sys/class/drm/card0/error
Sample kicad project that reproduces the problem |
I experimented with different versions of the kernel. With versions 4.9.0 and 4.9.6, the freeze is followed by every GUI process segfaulting. With kernel 4.8.0, there are still freezes, but after a while the applications become responsive again.

Also, when installing a kernel, update-initramfs generates the following warnings:

W: Possible missing firmware /lib/firmware/i915/kbl_guc_ver9_14.bin for module i915
W: Possible missing firmware /lib/firmware/i915/bxt_guc_ver8_7.bin for module i915

I don't know if they are related to the freezes, though.

The guc is not needed for current hardware -- you can ignore that error.

Can you try with latest version of kernel from drm-tip (https://cgit.freedesktop.org/drm/drm-tip/), xf86-video-intel (https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/) and latest Mesa (https://mesa.freedesktop.org/archive/).

Moreover, if this is still occurring, to confirm what is causing it, can you also try with modesetting driver (https://cgit.freedesktop.org/xorg/driver/xf86-video-modesetting) rather than Intel DDX (ie xf86-video-intel) and let us know the status?

Provide your config and change current status to:
- RESOLVED/* if you cannot reproduce.
- REOPENED otherwise; attach fresh gpu error dump, kernel log & xorg log

* Details:
- Kernel: 4.9.0-040900-generic
- Platform: Skylake (PCI ID: 0x1916, PCI Revision: 0x07, PCI Subsystem: 103c:81a1)
- Mesa : [Please confirm your version]
- xf86-video-intel : [Please confirm your version]

(In reply to yann from comment #4)
> Can you try with latest version of kernel from drm-tip
> (https://cgit.freedesktop.org/drm/drm-tip/), xf86-video-intel
> (https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/) and latest Mesa
> (https://mesa.freedesktop.org/archive/).
>
> Moreover, if this is still occurring, to confirm what is causing it, can you
> also try with modesetting driver

Note that this was occurring with -modesetting.

(In reply to Chris Wilson from comment #5)
> (In reply to yann from comment #4)
> > Can you try with latest version of kernel from drm-tip
> > (https://cgit.freedesktop.org/drm/drm-tip/), xf86-video-intel
> > (https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/) and latest Mesa
> > (https://mesa.freedesktop.org/archive/).
> >
> > Moreover, if this is still occurring, to confirm what is causing it, can you
> > also try with modesetting driver
>
> Note that this was occurring with -modesetting.

Thanks, Chris. Just for my own knowledge/curiosity: is it because this is the default Ubuntu 16.10 configuration, or is this info available elsewhere here?

It is the type of batch buffer in the error state.

(In reply to Chris Wilson from comment #7)
> It is the type of batch buffer in the error state.

Thanks so much :)

(In reply to yann from comment #4)
> Can you try with latest version of kernel from drm-tip
> (https://cgit.freedesktop.org/drm/drm-tip/), xf86-video-intel
> (https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/) and latest Mesa
> (https://mesa.freedesktop.org/archive/).
>
> Moreover, if this is still occurring, to confirm what is causing it, can you
> also try with modesetting driver
> (https://cgit.freedesktop.org/xorg/driver/xf86-video-modesetting) rather
> than Intel DDX (ie xf86-video-intel) and let us know the status?
>
> Provide your config and change current status to:
> - RESOLVED/* if you cannot reproduce.
> - REOPENED otherwise; attach fresh gpu error dump, kernel log & xorg log
>
>
> * Details:
> - Kernel: 4.9.0-040900-generic
> - Platform: Skylake (PCI ID: 0x1916, PCI Revision: 0x07, PCI Subsystem:
> 103c:81a1)
> - Mesa : [Please confirm your version]
> - xf86-video-intel : [Please confirm your version]

I don't mind trying with the latest versions of the components that you mentioned, but I need some help with that.
I am a developer, so I am not completely clueless, but I know nothing about installing drivers that are not already packaged :) However, I am willing to learn. Can you indicate where I can find information on how to compile and install these components? Thanks

So at this stage, Antoine, the issue you are facing is probably linked to Mesa. You may want to start by updating your Mesa version to the latest: check http://www.mesa3d.org/download.html

(In reply to yann from comment #10)
> So at this stage Antoine, the issue you are facing is probably linked to
> Mesa. So you may start to update first your mesa version to the latest:
> check http://www.mesa3d.org/download.html

I'll try that and report back. Thanks

So, I tried compiling mesa and installing it, but I think that I used the wrong prefix on the configure script, because ldconfig still reported that it would resolve the libraries from a different location than /usr/local/lib.

Then I tried to install them using a ppa that supposedly offers the latest versions. After that, my desktop environment (unity) ceased to work. I fiddled a lot, reverted the packages from the ppa and uninstalled the libraries that I had compiled myself, but it was still broken. In the end, I reinstalled the OS :S

The good news is that since I did a fresh install, I won't lose anything more if I need to reinstall again :)

So, before I try again, I may need more information on how to install the libraries. On my setup - Ubuntu 16.10 - each of the libraries seems to be in a different directory. What arguments should I pass to ./configure to ensure that it installs properly?

Also, after installing, how can I confirm that the libraries that are in use are indeed the ones that I compiled?
Thanks

(In reply to Antoine Aubry from comment #12)
> So, I tried compiling mesa and installing it, but I think that I used the
> wrong prefix on the configure script, because ldconfig still reported that
> it would resolve the libraries from a different location than /usr/local/lib.
> Then I tried to install them using a ppa that supposedly offers the latest
> versions. After that, my desktop environment (unity) ceased to work. I
> fiddled a lot, reverted the packages from the ppa and uninstalled the
> libraries that I had compiled myself, but it was still broken. In the end, I
> reinstalled the OS :S
>
> The good news is that since I did a fresh install, I won't lose anything
> more if I need to reinstall again :)
>
> So, before I try again, I may need more information on how to install the
> libraries. On my setup - Ubuntu 16.10 - each of the libraries seems to be in
> a different directory. What arguments should I pass to ./configure to ensure
> that it installs properly ?

Please start with http://www.mesa3d.org/autoconf.html & https://01.org/linuxgraphics/documentation/build-guide-0

> Also, after installing, how can I confirm that the libraries that are in use
> are indeed the ones that I compiled ?
>
> Thanks

Here, check glxinfo ;) https://dri.freedesktop.org/wiki/glxinfo/

Thank you for trying the latest mesa. I'm sorry that it was so painful for you. You can use mesa in the installed path (/usr/local/lib) by setting environment variables before launching the app from the command line:

$ export LD_LIBRARY_PATH=/usr/local/lib
$ export LIBGL_DRIVERS_PATH=/usr/local/lib/dri
$ pcbnew

This application works fine for me on debian testing with mesa's tip and linux 4.9. Is there some activity in the app that provokes your hang?

Thanks for the information. I will try that.

The hangs seem to happen when some part of the schematic is repainted. They tend to happen more while scrolling the view. Their frequency also seems to increase with time.
After a reboot, it usually takes 5 to 10 minutes for the screen to freeze. But after a few freezes, they start to become much more frequent. Maybe it is due to some kind of resource leak.

If you can provide a sample schematic, I can try to scroll around and make it happen.

Created attachment 129284 [details]
Sample kicad project that reproduces the problem
Includes the entire kicad project. The actual issue occurs when opening mainboard.kicad_pcb in pcbnew. The fastest way to cause a freeze is to zoom in and out repeatedly using the mouse wheel. After a few tens of zooms, I get freezes consistently.
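
The environment-variable approach suggested earlier in the thread can also be scoped to a single command, so the shell session and the desktop keep using the system libraries. The following is only a sketch: `with_local_mesa` is a hypothetical helper name, and the layout `<prefix>/lib` and `<prefix>/lib/dri` is assumed from the `/usr/local` paths quoted in this report.

```shell
# Hypothetical helper: run one command against a locally installed Mesa
# without exporting anything into the current shell.
# Assumption: Mesa was installed under <prefix>/lib with DRI drivers in
# <prefix>/lib/dri, matching the /usr/local paths quoted in this thread.
with_local_mesa() (
    # $1 = install prefix (e.g. /usr/local); remaining args = command to run
    prefix=$1
    shift
    # env sets the variables for the child process only; the function body
    # runs in a subshell, so nothing leaks back to the caller.
    env LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
        LIBGL_DRIVERS_PATH="$prefix/lib/dri" \
        "$@"
)

# Usage (not executed here):
#   with_local_mesa /usr/local pcbnew
#   with_local_mesa /usr/local glxinfo | grep "OpenGL version string"
```

If the second usage line reports the Mesa version that was just built, the locally compiled libraries are probably the ones being loaded. If it still reports the distribution's version, the prefix is likely wrong.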
I forgot to say in my previous comment that I found that zooming in and out is even more likely to cause the freeze.

I attempted again to build mesa 13.0.3, which I believe is the latest version. After installing the library with make install, I tried running pcbnew as instructed:

$ export LD_LIBRARY_PATH=/usr/local/lib
$ export LIBGL_DRIVERS_PATH=/usr/local/lib/dri
$ pcbnew

I observed no changes; the GPU still hangs.

Btw, this made my system unusable again. After logging off, the desktop manager would no longer start, just as before. I attempted to uninstall with "make uninstall", but that did not help. I had to reinstall the OS again to fix it. This is certainly an easily fixable problem, but I could not find out what the problem was.

I opened your file on my sklgt2, and was able to scroll/zoom for a *long* time with no gpu hangs. I'm using debian testing with linux 4.9, modesetting, mesa 13.

There seems to be something else going on with your system, because installing mesa to /usr/local/ cannot disable your system. Encountering a GPU hang can sometimes generate unrecoverable errors that require a reboot, but you should never need to reinstall.

It's troubling that this wouldn't work properly on Ubuntu with the oibaf ppa. It would be good if someone else who uses Ubuntu could reproduce this with your kicad project.

Humm, I did not use the ppa that you mentioned. It was someone else's and maybe it was broken. I'll try with that one and report.

Thanks for testing with my files. In the configuration that you mention, you refer to "modesetting". What does this mean?

There are a few ways to accelerate 2D graphics. For Intel hardware, you can use:
- SNA: xf86-video-intel, which accelerates the 2D api for Intel GPUs
- Glamor/modesetting: implements the 2D api in OpenGL

Different distributions choose different defaults. A GPU hang reported in Xorg can be caused either by SNA or by Mesa's implementation of the GL commands sent by Glamor.

Some Mesa GPU hangs have been recently fixed by:

180653c357d19ca88f7895f59874a58fac99cc53
Author: Topi Pohjolainen <topi.pohjolainen@intel.com>
i965/blorp: Make post draw flush more explicit

Some SNA GPU hangs have been recently fixed by:

4acd4a7d3d2f41227022fa7581cfb85a0b124eae
Author: Chris Wilson <chris@chris-wilson.co.uk>
sna/gen9: Emit a dummy primitive between VertexElements

Make sure you use versions that have those patches. For information on how to switch between sna and modesetting, see https://bugs.freedesktop.org/show_bug.cgi?id=98999

So I installed the drivers from the oibaf ppa, and since then I have not experienced any hangs! Thank you so much for the patience and helpfulness. I really appreciate the time that you all took to explain the concepts to me.
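
For anyone building from source, one way to check whether a checkout already contains the fixes named above is `git merge-base --is-ancestor`. This is a minimal sketch: `has_commit` is a hypothetical helper name and the repository paths in the usage comments are placeholders, not locations mentioned in this report. Note that the first commit lives in the Mesa repository and the second in xf86-video-intel.

```shell
# Hypothetical helper: succeed if commit $2 is contained in the history of
# ref $3 (default: HEAD) in the git checkout at $1.
# Exit status: 0 = contained, 1 = not contained, anything else = git error
# (for example, the commit is unknown to that repository).
has_commit() {
    git -C "$1" merge-base --is-ancestor "$2" "${3:-HEAD}" 2>/dev/null
}

# Usage (paths are placeholders; not executed here):
#   has_commit ~/src/mesa 180653c357d19ca88f7895f59874a58fac99cc53 \
#       && echo "Mesa checkout has the blorp flush fix"
#   has_commit ~/src/xf86-video-intel 4acd4a7d3d2f41227022fa7581cfb85a0b124eae \
#       && echo "SNA checkout has the gen9 dummy-primitive fix"
```

This only applies to git checkouts. For distribution or ppa packages, the package changelog or the upstream release notes are the place to look.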
Created attachment 129205 [details]
Content of /sys/class/drm/card0/error

When using KiCad's Pcbnew program, after a while the screen freezes for a few seconds, then the window manager restarts. The output of dmesg instructs to file a new bug, so here it is.

The output of dmesg shows the following:

[ 2359.835457] [drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [1085], reason: Hang on render ring, action: reset
[ 2359.835461] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 2359.835463] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 2359.835465] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 2359.835467] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 2359.835469] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 2359.835576] drm/i915: Resetting chip after gpu hang
[ 2359.835665] [drm] RC6 on
[ 2359.855202] [drm] GuC firmware load skipped
[ 2371.867120] drm/i915: Resetting chip after gpu hang
[ 2371.867214] [drm] RC6 on
[ 2371.883179] [drm] GuC firmware load skipped

I am using Ubuntu 16.10 with kernel 4.9.
This is the output of "uname -a":

Linux spectre 4.9.0-040900-generic #201612111631 SMP Sun Dec 11 21:33:00 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

I have attached the GPU crash dump from /sys/class/drm/card0/error, and here is the output of lspci:

00:00.0 Host bridge: Intel Corporation Skylake Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 520 (rev 07)
00:13.0 Non-VGA unclassified device: Intel Corporation Device 9d35 (rev 21)
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:1c.0 PCI bridge: Intel Corporation Device 9d10 (rev f1)
00:1c.1 PCI bridge: Intel Corporation Device 9d11 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-LP LPC Controller (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Multimedia audio controller: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
01:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5227 PCI Express Card Reader (rev 01)
02:00.0 Network controller: Intel Corporation Wireless 7265 (rev 61)
03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller (rev 01)

Please let me know what other information I can provide to help with this.
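
When filing a similar report, the identifying fields of the hang line can be pulled out of the kernel log with a small filter. This is a sketch only: `gpu_hang_fields` is a hypothetical helper name, and the pattern assumes the message has exactly the shape of the dmesg line quoted in this report; other kernel versions may word it differently.

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "<ecode> <process> <pid>" for every line shaped like
#   [drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [1085], reason: ...
# Lines that do not match the pattern are ignored.
gpu_hang_fields() {
    sed -n 's/.*GPU HANG: ecode \([^,]*\), in \([^ ]*\) \[\([0-9]*\)\].*/\1 \2 \3/p'
}

# Usage (not executed here; reading the error state usually needs root):
#   dmesg | gpu_hang_fields
#   cat /sys/class/drm/card0/error > gpu-error.txt   # file to attach to the bug
```

The ecode is what appears in the bug summary, so extracting it makes it easier to search for existing reports before opening a new one.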