Created attachment 135709 [details] /sys/class/drm/card0/error

From dmesg:

[ 156.971130] [drm] GPU HANG: ecode 2:0:0x4005ffc1, in Xorg [553], reason: Hang on rcs0, action: reset
[ 156.971140] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 156.971142] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 156.971143] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 156.971144] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 156.971145] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 156.971257] drm/i915: Resetting chip after gpu hang

Most often this bug happens when I start a game in wine, but sometimes it happens with no apparent trigger.
Created attachment 135710 [details] dmesg
Oh dear. The batch is missing half of its cachelines. It will have been written using pwrite, but the other possibility is a stray GPU or GTT write. If you have a good way of reproducing (say wine), then narrowing it down to a change in a component would be very useful, i.e. does the problem go away if you downgrade the kernel? If you have the patience, a git bisect would be a massive help.
>> i.e. does the problem go away if you downgrade the kernel

On 4.9.51-1 from Debian Stretch the problem still occurs, with a different description (I'm on Sid now, so I can't send you the crash dump). Also, changing AccelMethod to "UXA" in xorg.conf didn't fix the problem. When wine crashes it reports "intel_do_flush_locked failed: Input/output error".
Ok, that at least rules out the execbuf changes in 4.13 as being the root cause. What does the UXA error state look like? UXA and SNA are sufficiently different that if the error looks the same (every other cacheline being zero), that suggests a third party (mesa) is trashing memory.
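To make the "every other cacheline being zero" check concrete, here is a minimal Python sketch. It assumes the batch contents have already been extracted from the error state into raw bytes (that decoding step is not shown) and assumes a 64-byte cacheline; the function names are hypothetical and not part of any i915 tooling.

```python
from typing import List

CACHELINE = 64  # bytes; assumed cacheline size for this sketch


def zero_cachelines(batch: bytes, size: int = CACHELINE) -> List[int]:
    """Return the indices of cachelines that contain only zero bytes."""
    return [
        offset // size
        for offset in range(0, len(batch), size)
        if not any(batch[offset:offset + size])
    ]


def looks_alternating(batch: bytes, size: int = CACHELINE) -> bool:
    """True if the zeroed cachelines are exactly every other cacheline."""
    zeros = zero_cachelines(batch, size)
    if not zeros:
        return False
    total = (len(batch) + size - 1) // size
    evens = list(range(0, total, 2))
    odds = list(range(1, total, 2))
    return zeros == evens or zeros == odds
```

If both the SNA and the UXA batches come back as alternating, that would be consistent with the third-party corruption theory above; a different pattern would point elsewhere.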
Created attachment 135711 [details] error (random) #2 New random hang (first error in attachment was random too).
Created attachment 135712 [details] error (sna) Hang with wine, sna (?) (no xorg.conf in /etc/X11).
Created attachment 135713 [details] error (uxa) Hang with wine, uxa.
Created attachment 135714 [details] error (sna) 2 Hang with wine, sna (Option "AccelMethod" "sna" in xorg.conf).
FWIW, I think I'm seeing this same bug on a Macbook running Fedora25-x86_64:

[Wed Nov 29 07:03:06 2017] [drm] GPU HANG: ecode 8:0:0x84d77c1c, in Xorg [827], reason: No progress on rcs0, action: reset
[Wed Nov 29 07:03:06 2017] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[Wed Nov 29 07:03:06 2017] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[Wed Nov 29 07:03:06 2017] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[Wed Nov 29 07:03:06 2017] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[Wed Nov 29 07:03:06 2017] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[Wed Nov 29 07:03:06 2017] drm/i915: Resetting chip after gpu hang

I'm attaching /sys/class/drm/card0/error content too. I have no clue what caused this as I wasn't doing anything specific at the time (not running wine, or playing games, etc).
Created attachment 135848 [details] content of /sys/class/drm/card0/error
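Since two different hang signatures have now been quoted in this bug, a small Python sketch for pulling the fields out of a "GPU HANG" dmesg line may help when comparing reports. The regular expression is written against the two lines quoted here only; it is an assumption that other kernels format the line the same way, and the function name is hypothetical.

```python
import re

# Pattern based only on the two "GPU HANG" lines quoted in this bug.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_hang(line):
    """Return a dict of hang fields, or None if the line is not a hang report."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    return info
```

Comparing the "ecode" and "reason" fields side by side (here "Hang on rcs0" versus "No progress on rcs0") is a quick first hint as to whether two reports describe the same hang.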
Created attachment 135903 [details] Debian Stretch, wine, error
First of all, sorry about the spam. This is a mass update for our bugs. Sorry if you find this annoying, but with it we are trying to understand whether the bug is still valid or not. If the bug investigation is still in progress, please ignore this, and I apologize! If you think this is no longer valid, please comment on the bug that it can be closed. If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see if the issue is still valid there, and if you cannot see the issue there, please comment on the bug.
Closing due to inactivity, please re-open if the issue still exists.
Random hangs were fixed with 4.16; I think they were some kind of regression. But the bug with wine still existed last time I checked with 4.17. According to reports online, it may be a regression too. A similar problem was fixed in 3.9.4, but I tried that kernel and the trouble still exists. This week (or maybe next) I'm going to recheck the problem with 4.18 and a newer mesa and, if it still exists, report it to wine and ask for help with finding what exactly triggers the GPU hang.
Also, my motherboard is https://www.asrock.com/mb/Intel/P4i65GV/index.asp (I don't know why I didn't mention it earlier).
Please try to reproduce the issue with the latest stable kernel (4.19). If the problem still exists, set the kernel parameters drm.debug=0x1e log_buf_len=4M and reboot. Try to reproduce the issue and attach the dmesg log and /sys/class/drm/card0/error. This way we will see more information about the bug.
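For anyone wondering what drm.debug=0x1e actually turns on, here is a small Python sketch that decodes the bitmask. The bit names follow the drm.debug module parameter description as I recall it; treat the table as an assumption and check it against `modinfo drm` on your own kernel. The function name is hypothetical.

```python
# Bit-to-category table for drm.debug (assumed; verify with `modinfo drm`).
DRM_DEBUG_BITS = {
    0x01: "CORE",    # drm core code
    0x02: "DRIVER",  # drm controller (driver) code
    0x04: "KMS",     # modesetting code
    0x08: "PRIME",   # prime (buffer sharing) code
    0x10: "ATOMIC",  # atomic modesetting code
    0x20: "VBL",     # vblank code
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]
```

Under that table, 0x1e selects the DRIVER, KMS, PRIME and ATOMIC categories and leaves the noisier CORE and VBL messages off, while log_buf_len=4M enlarges the kernel log buffer so the extra output is not overwritten before dmesg is saved.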
I've checked with 4.19. The bug is still here. dmesg & error in attachments.
Created attachment 142217 [details] 4.19, dmesg
Created attachment 142218 [details] 4.19, error
I reported the bug to wine: https://bugs.winehq.org/show_bug.cgi?id=46065. There may be some extra information there.
You do appreciate that this isn't the same bug? We are now looking at a bug in the command stream as submitted by mesa; the command stream itself looks intact.
I'm not sure that I understand. The only way I can reproduce this bug is by using wine, so I hope that if you fix the GPU hang then wine will work. If not, it'll be another problem.
Created attachment 142892 [details] 4.20, dmesg
Created attachment 142893 [details] 4.20, error
The error logs attached to this bug indicate GPU hangs for different reasons. If this issue is seen again, please create an issue under the Mesa product, Drivers/DRI/i915.