Bug 110969 - i915 0000:00:02.0: GPU HANG: ecode 7:1:0xfffffffe, in Xwayland [10802], hang on rcs0
Status: RESOLVED DUPLICATE of bug 111014
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-22 13:11 UTC by apreiml
Modified: 2019-07-02 19:22 UTC

See Also:
i915 platform: IVB
i915 features: GPU hang


Attachments
Error log from /sys/class/drm/card0/error (29.89 KB, text/plain)
2019-06-22 13:11 UTC, apreiml

Description apreiml 2019-06-22 13:11:30 UTC
Created attachment 144612
Error log from /sys/class/drm/card0/error

I frequently get GPU hangs under high load and high memory usage. I'm using sway as the window manager and was running PhpStorm under Xwayland.
Comment 1 Chris Wilson 2019-06-22 15:51:02 UTC
rcs0 command stream:
  IDLE?: no
  START: 0x00001000
  HEAD:  0x30813250 [0x00013228]
  TAIL:  0x000134c8 [0x000132e0, 0x00013308]
  CTL:   0x0001f001
  MODE:  0x00004000
  HWS:   0x7fffe000
  ACTHD: 0x00000000 30813250
  IPEIR: 0x00000000
  IPEHR: 0x79120000
  INSTDONE: 0xffffbff8
  SC_INSTDONE: 0xfffffffe
  SAMPLER_INSTDONE[0][0]: 0xffffffff
  ROW_INSTDONE[0][0]: 0xffffffff
  batch: [0x00000000_7fff8000, 0x00000000_7fff9000]
  BBADDR: 0x00000000_7fff1740
  BB_STATE: 0x00000000
  INSTPS: 0x00000501
  INSTPM: 0x00000080
  FADDR: 0x00000000 00014250
  RC PSMI: 0x00000010
  FAULT_REG: 0x00000000
  SYNC_0: 0x00000000
  SYNC_1: 0x00000000
  GFX_MODE: 0x00002a00
  PP_DIR_BASE: 0x7f9f0000
...
0x0001422c:      0x00002220:    dword 1
0x00014230:      0xffffffff:    dword 2
0x00014234:      0x11000001: MI_LOAD_REGISTER_IMM
0x00014238:      0x00002228:    dword 1
0x0001423c:      0x7f9f0000:    dword 2
0x00014240:      0x04000000: MI_ARB_ON_OFF
0x00014244:      0x00000000: MI_NOOP
0x00014248:      0x0c000000: MI_SET_CONTEXT
0x0001424c:      0x7fff010c:    gtt offset = 0x7fff0000
0x00014250:      0x00000000: MI_NOOP
0x00014254:      0x04000001: MI_ARB_ON_OFF
0x00014258:      0x7a000002: PIPE_CONTROL
0x0001425c:      0x00100002:    no write, cs stall, stall at scoreboard, 
0x00014260:      0x00000000:    
0x00014264:      0x00000000:    

Suggests it died in the context restore.

If there is a link with memory pressure, we might not be keeping the context image intact over paging.
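
The context-restore reading above can be cross-checked with a little arithmetic on the dumped registers. The sketch below is an editorial illustration, not part of the original analysis. It assumes the legacy ringbuffer HEAD layout (bits 20:2 hold the offset into the ring, bits 31:21 the wrap count), the MI header layout (command type in bits 31:29, opcode in bits 28:23), and the MI_SET_CONTEXT flag bit names as recalled from the i915 definitions; treat all of those as assumptions to verify against the kernel headers. All register and dword values are copied from the dump in this comment.

```python
# Values copied from the rcs0 register dump in comment 1.
RING_START = 0x00001000
RING_HEAD = 0x30813250
FADDR = 0x00014250

# ASSUMPTION: legacy ringbuffer HEAD layout as recalled from i915:
# bits 20:2 = offset into the ring, bits 31:21 = wrap count.
HEAD_ADDR_MASK = 0x001FFFFC
HEAD_WRAP_SHIFT = 21

# Command headers from the ring excerpt:
# (address, header dword, name as printed in the dump, length in dwords).
# Lengths are read off the excerpt: the header plus the operand dwords under it.
RING_COMMANDS = [
    (0x00014234, 0x11000001, "MI_LOAD_REGISTER_IMM", 3),
    (0x00014240, 0x04000000, "MI_ARB_ON_OFF", 1),
    (0x00014244, 0x00000000, "MI_NOOP", 1),
    (0x00014248, 0x0C000000, "MI_SET_CONTEXT", 2),
    (0x00014250, 0x00000000, "MI_NOOP", 1),
    (0x00014254, 0x04000001, "MI_ARB_ON_OFF", 1),
    (0x00014258, 0x7A000002, "PIPE_CONTROL", 4),
]

# Operand dword of MI_SET_CONTEXT at 0x0001424c.
SET_CONTEXT_OPERAND = 0x7FFF010C

# ASSUMPTION: flag bit names as recalled from the i915 MI_SET_CONTEXT definitions.
SET_CONTEXT_FLAGS = {
    1 << 8: "MI_MM_SPACE_GTT",
    1 << 3: "MI_SAVE_EXT_STATE_EN",
    1 << 2: "MI_RESTORE_EXT_STATE_EN",
    1 << 1: "MI_FORCE_RESTORE",
    1 << 0: "MI_RESTORE_INHIBIT",
}


def mi_opcode(header):
    """Return the MI opcode (bits 28:23), or None if this is not an MI command."""
    if header >> 29 != 0:  # command type 0 is assumed to mean MI
        return None
    return (header >> 23) & 0x3F


def command_ending_at(address):
    """Name of the command whose last dword sits immediately before `address`."""
    for start, _header, name, length in RING_COMMANDS:
        if start + 4 * length == address:
            return name
    return None


head_offset = RING_HEAD & HEAD_ADDR_MASK
head_wraps = RING_HEAD >> HEAD_WRAP_SHIFT
head_address = RING_START + head_offset

context_image = SET_CONTEXT_OPERAND & ~0xFFF
context_flags = [name for bit, name in SET_CONTEXT_FLAGS.items()
                 if SET_CONTEXT_OPERAND & bit]

print(f"head offset    : {head_offset:#010x} (wrap count {head_wraps})")
print(f"head address   : {head_address:#010x}, FADDR {FADDR:#010x}")
print(f"last completed : {command_ending_at(head_address)}")
print(f"context image  : {context_image:#010x}, flags {context_flags}")
```

Under those assumptions, the head address works out to the same value as FADDR, which is the dword directly after the two-dword MI_SET_CONTEXT. That is consistent with the ring having stalled while the context switch was still in flight.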
Comment 2 Denis 2019-06-26 09:42:08 UTC
Hi Chris, is there any chance that other issues, for example https://bugs.freedesktop.org/show_bug.cgi?id=110860 and https://bugs.freedesktop.org/show_bug.cgi?id=110858, have the same root cause as this one?
I am basing this on the hardware information (all IVB) and the error codes.
Comment 3 Chris Wilson 2019-07-02 08:52:21 UTC
(In reply to Denis from comment #2)
> Hi Chris, is there any chance that other issues, for example
> https://bugs.freedesktop.org/show_bug.cgi?id=110860 and
> https://bugs.freedesktop.org/show_bug.cgi?id=110858, have the same root
> cause as this one?
> I am basing this on the hardware information (all IVB) and the error codes.

I am working under that assumption, yes.
Comment 4 Paul 2019-07-02 10:12:16 UTC
Hi Chris,
I've seen the same error code in Xorg, kwin_x11 and Chromium on my IVB (Intel® HD Graphics 2500) with Fedora 29 and kernel 5.1.7.
I reproduced the issue after reducing RAM to 4 GB and loading it to ~60% by playing videos in a few Chromium windows.
According to this ticket (https://bugs.freedesktop.org/show_bug.cgi?id=111014), the hangs were fixed in kernel version 5.2.0-rc6.
I've updated the kernel to that version, but now Fedora doesn't see the network card (it's not integrated). I'm trying to fix that.
I will give an update once it is resolved.
P.S.: if you have any idea why the network card doesn't work, please tell me :)
Comment 5 Chris Wilson 2019-07-02 10:18:18 UTC
The error codes are meaningless, I am afraid. While they were put in to help categorise bug reports, I think they are too misleading.

The hint from bug 111014 is that this is fixed in drm-tip, so if you are hitting it often that would be a good place to start.
Comment 6 Chris Wilson 2019-07-02 19:22:36 UTC
Please test with

commit c84c9029d782a3a0d2a7f0522ecb907314d43e2c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Apr 19 12:17:47 2019 +0100

    drm/i915/ringbuffer: EMIT_INVALIDATE *before* switch context

heading to v5.1 via stable in the next few weeks.

*** This bug has been marked as a duplicate of bug 111014 ***

