Bug 103901 - [drm] GPU HANG: ecode 2:0:0x4005ffc1, in Xorg [553], reason: Hang on rcs0, action: reset
Status: CLOSED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-25 08:25 UTC by rtentser
Modified: 2019-02-06 14:02 UTC
5 users

See Also:
i915 platform: I865G
i915 features: GPU hang


Attachments
/sys/class/drm/card0/error (13.48 KB, text/plain), 2017-11-25 08:25 UTC, rtentser
dmesg (42.07 KB, text/plain), 2017-11-25 08:25 UTC, rtentser
error (random) #2 (16.77 KB, text/plain), 2017-11-25 17:42 UTC, rtentser
error (sna) (10.71 KB, text/plain), 2017-11-25 17:43 UTC, rtentser
error (uxa) (15.45 KB, text/plain), 2017-11-25 17:44 UTC, rtentser
error (sna) 2 (11.41 KB, text/plain), 2017-11-25 17:45 UTC, rtentser
content of /sys/class/drm/card0/error (27.22 KB, text/plain), 2017-12-01 00:01 UTC, Lonni J Friedman
Debian Stretch, wine, error (696.18 KB, text/plain), 2017-12-03 13:26 UTC, rtentser
4.19, dmesg (138.99 KB, text/plain), 2018-10-26 12:26 UTC, rtentser
4.19, error (8.46 KB, text/plain), 2018-10-26 12:31 UTC, rtentser
4.20, dmesg (157.33 KB, text/plain), 2018-12-25 16:36 UTC, rtentser
4.20, error (9.49 KB, text/plain), 2018-12-25 16:36 UTC, rtentser


Description rtentser 2017-11-25 08:25:09 UTC
Created attachment 135709 [details]
/sys/class/drm/card0/error

From dmesg:
[  156.971130] [drm] GPU HANG: ecode 2:0:0x4005ffc1, in Xorg [553], reason: Hang on rcs0, action: reset
[  156.971140] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  156.971142] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  156.971143] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  156.971144] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  156.971145] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  156.971257] drm/i915: Resetting chip after gpu hang

Most often this bug happens when I start a game in wine, but sometimes it just happens.
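
For anyone comparing several hang reports, the "GPU HANG" line can be split into fields with a small script. This is only a sketch based on the dmesg lines quoted in this report; the group names (gen, engine, ecode, and so on) are my own labels, and the first ecode field is assumed, not confirmed, to be the GPU generation.

```python
import re

# Pattern for the "[drm] GPU HANG" line as quoted in this report.
# Group names are illustrative labels, not official i915 terminology.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engine>[0-9a-fA-F]+):(?P<ecode>0x[0-9a-fA-F]+), "
    r"in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return a dict of fields from a GPU HANG line, or None if it does not match."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


line = ("[  156.971130] [drm] GPU HANG: ecode 2:0:0x4005ffc1, in Xorg [553], "
        "reason: Hang on rcs0, action: reset")
info = parse_gpu_hang(line)
print(info["ecode"], info["comm"], info["reason"])
```

Two reports with different ecode or reason values are a hint, though not proof, that they are different hangs.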
Comment 1 rtentser 2017-11-25 08:25:38 UTC
Created attachment 135710 [details]
dmesg
Comment 2 Chris Wilson 2017-11-25 09:28:50 UTC
Oh dear. The batch is missing half of its cachelines. It will have been written using pwrite, but the other possibility is a stray GPU or GTT write.

If you have a good way of reproducing it (say wine), then narrowing it down to a change in a single component would be very useful, i.e. does the problem go away if you downgrade the kernel? If you have the patience, a git bisect would be a massive help.
Comment 3 rtentser 2017-11-25 11:06:53 UTC
>> i.e. does the problem go away if you downgrade the kernel
On kernel 4.9.51-1 from Debian Stretch the problem still occurs, with a different description (I'm on Sid now, so I can't send you the crash dump).

Also, changing AccelMethod to "UXA" in xorg.conf didn't fix the problem.

When wine crashes, it reports "intel_do_flush_locked failed: Input/output error".
Comment 4 Chris Wilson 2017-11-25 11:12:45 UTC
Ok, that at least rules out the execbuf changes in 4.13 as being the root cause. What does the UXA error state look like? UXA and SNA are sufficiently different that if the error looks the same (every other cacheline being zero), that suggests a third party (mesa) is trashing memory.
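
The "every other cacheline being zero" check can be sketched roughly as follows. This assumes the batch contents have already been extracted from the error state into a bytes object (the extraction is not shown, and the error state format is not parsed here), and it assumes a 64-byte cacheline; the function names are made up for illustration.

```python
CACHELINE_BYTES = 64  # assumed cacheline size


def zero_cachelines(data, cacheline=CACHELINE_BYTES):
    """Return indices of the cachelines in `data` (bytes) that are entirely zero."""
    zero = bytes(cacheline)
    return [
        i // cacheline
        for i in range(0, len(data) - cacheline + 1, cacheline)
        if data[i:i + cacheline] == zero
    ]


def looks_alternating(indices):
    """True if the zeroed cachelines are all odd-numbered or all even-numbered."""
    return len(indices) > 1 and len({i % 2 for i in indices}) == 1


# Synthetic batch for illustration: four cachelines, every other one zeroed.
batch = (b"\x7a" * 64 + bytes(64)) * 2
holes = zero_cachelines(batch)
print(holes, looks_alternating(holes))
```

If both the SNA and UXA dumps show the same alternating pattern, that would fit the third-party memory corruption theory above; the script only detects the pattern and says nothing about its cause.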
Comment 5 rtentser 2017-11-25 17:42:22 UTC
Created attachment 135711 [details]
error (random) #2

New random hang (first error in attachment was random too).
Comment 6 rtentser 2017-11-25 17:43:51 UTC
Created attachment 135712 [details]
error (sna)

Hang with wine, sna (?) (no xorg.conf in /etc/X11).
Comment 7 rtentser 2017-11-25 17:44:43 UTC
Created attachment 135713 [details]
error (uxa)

Hang with wine, uxa.
Comment 8 rtentser 2017-11-25 17:45:50 UTC
Created attachment 135714 [details]
error (sna) 2

Hang with wine, sna (Option "AccelMethod" "sna" in xorg.conf).
Comment 9 Lonni J Friedman 2017-12-01 00:00:27 UTC
FWIW, I think I'm seeing this same bug on a Macbook running Fedora25-x86_64:

[Wed Nov 29 07:03:06 2017] [drm] GPU HANG: ecode 8:0:0x84d77c1c, in Xorg [827], reason: No progress on rcs0, action: reset
[Wed Nov 29 07:03:06 2017] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[Wed Nov 29 07:03:06 2017] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[Wed Nov 29 07:03:06 2017] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[Wed Nov 29 07:03:06 2017] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[Wed Nov 29 07:03:06 2017] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[Wed Nov 29 07:03:06 2017] drm/i915: Resetting chip after gpu hang


I'm attaching /sys/class/drm/card0/error content too.  I have no clue what caused this as I wasn't doing anything specific at the time (not running wine, or playing games, etc).
Comment 10 Lonni J Friedman 2017-12-01 00:01:12 UTC
Created attachment 135848 [details]
content of /sys/class/drm/card0/error
Comment 11 rtentser 2017-12-03 13:26:43 UTC
Created attachment 135903 [details]
Debian Stretch, wine, error
Comment 12 Jani Saarinen 2018-03-29 07:10:24 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you also do that to see whether the issue is still present there? If you cannot see the issue there, please comment on the bug.
Comment 13 Jani Saarinen 2018-04-25 10:20:23 UTC
Closing due to inactivity; please re-open if the issue still exists.
Comment 14 rtentser 2018-10-23 05:38:30 UTC
The random hangs were fixed in 4.16; I think they were some kind of regression.
But the bug with wine still existed the last time I checked, with 4.17. The internet says it could be a regression too. A similar problem was fixed in 3.9.4, but I tried that kernel and the problem still exists.

This week (or maybe next) I'm going to recheck the problem with 4.18 and a newer mesa and, if it still exists, report it to wine and ask for help finding what exactly triggers the GPU hang.
Comment 15 rtentser 2018-10-23 05:44:01 UTC
Also, my motherboard is https://www.asrock.com/mb/Intel/P4i65GV/index.asp. I don't know why I didn't mention it earlier.
Comment 16 Lakshmi 2018-10-23 08:41:45 UTC
Please try to reproduce the issue with the latest stable kernel (4.19).
If the problem still exists, set the kernel parameters drm.debug=0x1e log_buf_len=4M and reboot.
Then try to reproduce the issue and attach the dmesg log and /sys/class/drm/card0/error.

This way we get more information about the bug.
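
Before reproducing, it may be worth confirming that the two parameters actually made it onto the kernel command line. A minimal sketch, assuming the command line is available as a string; the helper names and the example command line are hypothetical.

```python
def parse_cmdline(text):
    """Split a kernel command line into a dict; bare flags map to None."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_debug_params(text, wanted=(("drm.debug", "0x1e"), ("log_buf_len", "4M"))):
    """Return the requested key=value pairs that are absent from the command line."""
    params = parse_cmdline(text)
    return [f"{k}={v}" for k, v in wanted if params.get(k) != v]


# Hypothetical command line, for illustration only.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet drm.debug=0x1e log_buf_len=4M"
print(missing_debug_params(example))
```

On a running Linux system the same check can be applied to the contents of /proc/cmdline, e.g. missing_debug_params(open("/proc/cmdline").read()); an empty list means both parameters are set as requested.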
Comment 17 rtentser 2018-10-26 12:26:11 UTC
I've checked with 4.19. The bug is still here. dmesg & error in attachments.
Comment 18 rtentser 2018-10-26 12:26:44 UTC
Created attachment 142217 [details]
4.19, dmesg
Comment 19 rtentser 2018-10-26 12:31:23 UTC
Created attachment 142218 [details]
4.19, error
Comment 20 rtentser 2018-10-31 12:57:46 UTC
I reported the bug to wine: https://bugs.winehq.org/show_bug.cgi?id=46065.
There may be some extra information there.
Comment 21 Chris Wilson 2018-10-31 14:03:56 UTC
You do appreciate that this isn't the same bug? We are now looking at a bug in the command stream as submitted by mesa; the command stream itself looks intact.
Comment 22 rtentser 2018-10-31 18:01:10 UTC
I'm not sure that I understand. The only way I can reproduce this bug is by using wine, so I hope that if you fix the GPU hang then wine will work. If not, it'll be another problem.
Comment 23 rtentser 2018-12-25 16:36:27 UTC
Created attachment 142892 [details]
4.20, dmesg
Comment 24 rtentser 2018-12-25 16:36:56 UTC
Created attachment 142893 [details]
4.20, error
Comment 25 Lakshmi 2019-02-06 14:02:02 UTC
The error logs attached to this bug indicate GPU hangs for different reasons. If this issue is seen again, please create an issue under the Mesa product, Drivers/DRI/i915.