Bug 104324

Summary: hl2_linux: Resetting rcs0 after gpu hang
Product: DRI
Reporter: Horst Schirmeier <horst>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED DUPLICATE
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: alexandre.nunes, intel-gfx-bugs
Version: XOrg git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description Flags
/sys/class/drm/card0/error none

Description Horst Schirmeier 2017-12-18 17:39:19 UTC
drm/i915 crashes quite often here, today during Half-Life 2's intro:

[171527.464595] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
[172042.110000] [drm] GPU HANG: ecode 9:0:0x85dffffb, in hl2_linux [9283], reason: Hang on rcs0, action: reset
[172042.110001] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[172042.110002] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[172042.110003] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[172042.110003] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[172042.110004] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[172042.110011] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[172057.062564] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[172067.074512] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[172075.074481] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[172083.074456] i915 0000:00:02.0: Resetting rcs0 after gpu hang
[172086.086461] MatQueue0[9330]: segfault at 2e4 ip 00000000d7d958a5 sp 00000000c161a6e0 error 4 in client.so[d796f000+af5000]
Comment 1 Horst Schirmeier 2017-12-18 17:42:10 UTC
I currently can't attach /sys/class/drm/card0/error; bugs.freedesktop.org gives me an "Internal Server Error".
Comment 2 Horst Schirmeier 2017-12-18 17:44:47 UTC
Created attachment 136251
/sys/class/drm/card0/error

It seems bugs.freedesktop.org fails spectacularly if the file to be uploaded is not readable by the user uploading it.
Comment 3 Chris Wilson 2017-12-18 20:59:24 UTC
https://bugs.freedesktop.org/show_bug.cgi?id=102435#c25

This should be fixed by:

commit ee57b15ec764736e2d5360beaef9fb2045ed0f68
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Wed Nov 29 16:22:42 2017 -0800

    i965: Disable regular fast-clears (CCS_D) on gen9+
    
    This partially reverts commit 3e57e9494c2279580ad6a83ab8c065d01e7e634e
    which caused a bunch of GPU hangs on several Source titles.  To date, we
    have no clue why these hangs are actually happening.  This undoes the
    final effect of 3e57e9494c227 and gets us back to not hanging.  Tested
    with Team Fortress 2.
    
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102435
    Fixes: 3e57e9494c2279580ad6a83ab8c065d01e7e634e
    Cc: mesa-stable@lists.freedesktop.org

If not, please reopen.  Thanks for the reports and your patience!

*** This bug has been marked as a duplicate of bug 104325 ***
Comment 4 Horst Schirmeier 2017-12-19 07:36:59 UTC

*** This bug has been marked as a duplicate of bug 102435 ***
Comment 5 Horst Schirmeier 2017-12-19 12:12:16 UTC
I can confirm that Mesa 17.3.0.1 (1:17.3.0.1-1~a~padoka0 from https://launchpad.net/~paulo-miguel-dias/+archive/ubuntu/pkppa) seems to fix the issue on Ubuntu 17.10 (amd64, Intel i7-6700HQ w/ Intel HD Graphics 530; Lenovo ThinkPad T460p).
Comment 6 Chris Wilson 2017-12-26 16:04:41 UTC
*** Bug 104386 has been marked as a duplicate of this bug. ***
