Created attachment 142829 [details] dmesg
This can be reproduced with Blender 2.80 on Fedora 29, on both the stock kernel and a drm-tip kernel: open Blender and rotate the cube for a while. The whole desktop freezes for about 10 seconds and then unfreezes. Logs are from: 4.20.0-rc6+ x86_64
Created attachment 142830 [details] /sys/class/drm/card0/error
It is a Dell XPS 13 9360
Product is set as Mesa assuming it's a Mesa bug.
Reproduced it with Gnome on Xorg
Hello, we'll also need Mesa's version. And your CPU has UHD Graphics 620.
These logs are from mesa 18.2.6, but I could also reproduce the same bug on 18.3.1.
Created attachment 142878 [details] lscpu
Just realized that this large dmesg is hard to navigate. If booted without drm.debug, the relevant lines are:
[65454.798345] [drm] GPU HANG: ecode 9:0:0x84dffefc, in blender [31698], reason: hang on rcs0, action: reset
[65454.798349] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[65454.798351] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[65454.798352] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[65454.798354] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[65454.798356] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[65454.799424] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
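For anyone digging through a similarly large dmesg, here is a minimal sketch of pulling out just the GPU HANG summary lines. The regular expression is an assumption based only on the line format quoted in this report (the function name `find_gpu_hangs` is hypothetical), so other kernel versions may format the message differently.

```python
import re

# Assumed pattern, derived from the "GPU HANG" line quoted in this report.
HANG_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm\] GPU HANG: ecode (?P<ecode>\S+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def find_gpu_hangs(dmesg_text):
    """Return one dict per GPU HANG line found in dmesg_text."""
    hangs = []
    for match in HANG_RE.finditer(dmesg_text):
        info = match.groupdict()
        info["time"] = float(info["time"])  # seconds since boot
        info["pid"] = int(info["pid"])
        hangs.append(info)
    return hangs


# Sample input copied from the log lines in this report.
sample = (
    "[65454.798345] [drm] GPU HANG: ecode 9:0:0x84dffefc, in blender [31698], "
    "reason: hang on rcs0, action: reset\n"
    "[65454.798356] [drm] GPU crash dump saved to /sys/class/drm/card0/error\n"
)
hangs = find_gpu_hangs(sample)
for hang in hangs:
    print(hang["process"], hang["pid"], hang["ecode"], hang["reason"])
```

The same function can be fed the full text of the attached dmesg to list every hang and the process that triggered it.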
I thought about asking for an apitrace of Blender, but apitrace is crashing when replaying the trace at the moment...
I cannot reproduce the hang with:
i7-7500U - HD Graphics 620
Arch Linux
Blender 2.8
Mesa 18.2.4/18.3.1/master
Kernel 4.19.2
Maybe someone else will be able to reproduce it, or I'll fix apitrace. Also, I haven't looked closely at the crash dump yet.
Is a bisect on blender codebase worth trying? This does not occur with 2.79b.
Hi Vladimir, I don't think that would help right now. I also tried Blender 2.80 on a UHD 630 GPU, but using Ubuntu 18 on X. I couldn't reproduce the hang either, so it might be specific to Wayland; I will install it tomorrow and check. To be sure that I got the steps right:
1. Launch the app.
2. Rotate the cube with the "middle" mouse button. (For how long in your case?)
3. Hang occurs?
Yes, the steps are correct. The time before the hang occurs varies, but usually is under 10 seconds
Created attachment 142888 [details] how it looks like Recorded under gnome wayland with Peek.
As I already mentioned, this happens on both Xorg and Wayland, even if I run the windows version of blender under wine. The only difference is that on Xorg the mouse cursor still moves, but on Wayland it is frozen too.
Could you record an apitrace of Blender?
https://github.com/apitrace/apitrace/blob/master/docs/USAGE.markdown
> apitrace trace blender-2.8
And post the trace file.
I found that I wasn't able to replay the trace of Blender 2.8 because of a possibly unnecessary assert; without it, replay works. If you have some time, you can build apitrace yourself (https://github.com/apitrace/apitrace/blob/master/docs/INSTALL.markdown) and check whether replaying the trace hangs the GPU. You would need to comment out one line:
retrace/glws_glx.cpp:147
> //assert(!pbuffer);
Since the steps are simple, I doubt the trace will hang on other machines, but it is still worth checking. Also, you had Gnome in all cases, so it may be worth checking with another desktop environment.
Created attachment 142898 [details] apitrace Apitrace of blender. I have built apitrace and checked, and it indeed reproduces the hang. The hang also occurs if I run blender under Xwayland in weston, so I doubt it is related to the DE. I haven't tried on bare X though.
Thanks! The hang is reproducible with this trace:
> [ 4796.729606] [drm] GPU HANG: ecode 9:0:0x84dffefc, in glretrace [13521], reason: hang on rcs0, action: reset
I can't reproduce this on SKL with Debian testing:
Linux 4.18.0
mesa 18.1.9
I built mesa master, tested with a drm-tip kernel as well, and couldn't reproduce the hang with the trace file. I tried XWayland and Xorg. Usually, KBL and SKL have similar failure patterns. Danylo, does the hang reproduce every time you retrace the file on KBL?
> Usually, KBL and SKL has similar failure patterns. Danylo, does the hang reproduce every time you retrace the file on KBL?
Yes, it always hangs on Kaby Lake and Coffee Lake. I didn't test on other machines.
Bisected to commit:
a363bb2cd0e2a141f2c60be005009703bffcbe4e
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Tue Apr 10 01:18:25 2018 -0700
i965: Allocate VMA in userspace for full-PPGTT systems.
This patch enables soft-pinning of all buffers, allowing us to skip relocation processing entirely. All systems with full PPGTT and > 4GB of VMA should gain these benefits. This should be most Gen8+.
Unfortunately, this excludes a few systems:
- Cherryview (only has 32-bit addressing, despite 48-bit pointers)
- Broadwell with a 32-bit kernel
- Anybody running pre-4.5 kernel.
We may enable it for Cherryview in the future, but it would require some tweaks to the memory zone.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
With this commit and INTEL_DEBUG=reemit there is no hang.
Managed to get some understanding of what's going on here: since we switched to softpinning all buffers, the vertex buffers aren't restricted to the low 4GB region anymore. So we run into the same HW issue as before. In effect, softpinning VBOs nullifies the 32bit reloc flag in genX(emit_vertex_buffer_state) (genX_state_upload.c). I'm not quite sure how to fix this apart from disabling softpinning on all buffer objects, because buffers can be reused from one type to another (transform feedback output into vertices, for instance)...
(In reply to Lionel Landwerlin from comment #22)
> Managed to get some understanding about what's going on here :
>
> Since we switch to softpin all buffers, that means the vertex buffers aren't
> restricted to the low 4Gb region. So we run into the same HW issue as before.
> In effect softpinning VBOs nullifys the 32bit reloc flag in
> genX(emit_vertex_buffer_state) (genX_state_upload.c).
>
> I'm not quite sure how to fix this apart from disabling softpinning on all
> buffer objects, because buffers can be reused from one type another
> (transform feedback output into vertices for instance)...
We ought to be doing VF cache invalidations when VB[i] or IB transition between different 4GB segments. But, maybe we're not doing those properly. :(
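To illustrate the condition described here, the following is a rough sketch of detecting when a vertex buffer slot moves to a different 4GB segment, i.e. when the upper bits of its address change. This is not Mesa's code: the class `VertexBufferSegmentTracker` and its method are hypothetical, and treating the first binding of a slot as not needing an invalidation is an assumption made only to keep the example small.

```python
SEGMENT_SHIFT = 32  # 4GB segments: address bits 32 and above select the segment


class VertexBufferSegmentTracker:
    """Hypothetical helper, not part of Mesa: remembers the last 4GB segment
    seen for each vertex buffer slot."""

    def __init__(self):
        self._last_segment = {}

    def needs_vf_invalidate(self, index, address):
        """Return True if slot `index` moved to a different 4GB segment."""
        segment = address >> SEGMENT_SHIFT
        previous = self._last_segment.get(index)
        self._last_segment[index] = segment
        # Assumption: the first binding of a slot is treated as not needing a flush.
        return previous is not None and previous != segment


tracker = VertexBufferSegmentTracker()
first = tracker.needs_vf_invalidate(0, 0x0000_0000_1000_0000)    # first binding
same = tracker.needs_vf_invalidate(0, 0x0000_0000_F000_0000)     # still below 4GB
crossed = tracker.needs_vf_invalidate(0, 0x0000_0001_0000_0000)  # now above 4GB
print(first, same, crossed)
```

With relocations and the 32bit flag, every vertex buffer stayed in the first segment, so a check like this would never fire; with softpinning, addresses can land anywhere in the address space.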
With this MR, I get rid of the hang on my system when replaying the trace : https://gitlab.freedesktop.org/mesa/mesa/merge_requests/62
Should be fixed on master at the following commit:
commit 31e4c9ce400341df9b0136419b3b3c73b8c9eb7e
Author: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Date: Thu Jan 3 16:18:48 2019 +0000
i965: add CS stall on VF invalidation workaround
Thanks for reporting this!