Bug 103502

Summary: GPU HANG: ecode 3:0:0x7c9bf89c, in Xorg [781], reason: Hang on rcs0, action: reset
Product: DRI
Reporter: taz.007
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical
Priority: medium
CC: intel-gfx-bugs, sitsofe
Version: unspecified
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard: ReadyForDev
i915 platform: I915GM
i915 features: GPU hang
Attachments:
the content of /sys/class/drm/card0/error (flags: none)
dmesg (flags: none)

Description taz.007 2017-10-29 11:21:01 UTC
Created attachment 135151 [details]
the content of /sys/class/drm/card0/error

I have been getting this message regularly since moving to kernel 4.13 (I did not notice it with kernel 4.12).

After this message I see
"drm/i915: Resetting chip after gpu hang"
printed every 10 seconds.
The Xorg display is frozen.

Steps to reproduce:
Just browse the web (I'm using the Pale Moon browser); it usually happens when clicking on links or opening a new tab with said links.

Kernel version: 4.13.8-1-ARCH
Distribution: Arch Linux
Old laptop: Hardware name: Acer, inc. Aspire 1640Z    /Lugano3         , BIOS 3A24 10/30/06
VGA connector; however, there was no physical screen attached to it (I use the box via x11vnc).
Comment 1 taz.007 2017-10-29 11:22:01 UTC
Created attachment 135152 [details]
dmesg

Some other errors that appear while booting.
Comment 2 Chris Wilson 2017-10-30 10:09:54 UTC
Hmm, tiling alarm bells for i915gm:

Active (rcs0) [14]:
    00000000_0050b000    57344 3e 02 [ 36f00 00 00 00 00 ] 00 Y dirty uncache
...
0x00507034:      0x7d8e0001: 3DSTATE_BUFFER_INFO
0x00507038:      0x03600800:    color, tiling = Y, pitch=2048
0x0050703c:      0x0050b000:    address

This is cause for unease, as we tend to assume that this implicit fencing requires the "unfenced" alignment. Likely we reused an offset and failed to notice a change in alignment constraints. Something like:

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3d7190764f10..7f53b4860428 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -331,6 +331,10 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry,
        if (entry->alignment && !IS_ALIGNED(vma->node.start, entry->alignment))
                return true;
 
+       if (flags & __EXEC_OBJECT_NEEDS_MAP &&
+           !IS_ALIGNED(vma->node.start, vma->fence_alignment))
+               return true;
+
        if (flags & EXEC_OBJECT_PINNED &&
            vma->node.start != entry->offset)
                return true;
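
The alignment test that the patch adds can be tried outside the kernel with a minimal standalone sketch. It is not the driver code: gen3_fence_size(), is_aligned() and fence_misplaced() are hypothetical helpers written for illustration, and the sketch assumes that a gen3 fence covers a power-of-two region of at least 1 MiB and must be aligned to its own size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: on gen3 a fence region is a power of two, at least 1 MiB,
 * and has to be aligned to its own size. */
#define GEN3_MIN_FENCE_SIZE (1024u * 1024u)

/* Hypothetical helper: smallest power-of-two fence region covering `size`. */
static uint64_t gen3_fence_size(uint64_t size)
{
    uint64_t fence = GEN3_MIN_FENCE_SIZE;

    while (fence < size)
        fence <<= 1;
    return fence;
}

/* Same idea as the kernel's IS_ALIGNED(); only valid for power-of-two
 * alignments, which the assert enforces. */
static bool is_aligned(uint64_t offset, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset & (alignment - 1)) == 0;
}

/* Hypothetical stand-in for the check added to eb_vma_misplaced():
 * an object that needs fence-compatible placement is misplaced when its
 * GTT offset does not satisfy the fence alignment. */
static bool fence_misplaced(uint64_t gtt_offset, uint64_t obj_size,
                            bool needs_map)
{
    return needs_map &&
           !is_aligned(gtt_offset, gen3_fence_size(obj_size));
}
```

With the values from the error dump above (offset 0x0050b000, size 57344), fence_misplaced() returns true under these assumptions, because 0x0050b000 is not a multiple of 1 MiB; the same object at 0x00500000 would pass.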
Comment 3 Chris Wilson 2017-11-01 13:50:52 UTC
commit 1d033beb20d6d5885587a02a393b6598d766a382
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Oct 31 10:36:07 2017 +0000

    drm/i915: Check incoming alignment for unfenced buffers (on i915gm)
    
    In case the object has changed tiling between calls to execbuf, we need
    to check if the existing offset inside the GTT matches the new tiling
    constraint. We even need to do this for "unfenced" tiled objects, where
    the 3D commands use an implied fence and so the object still needs to
    match the physical fence restrictions on alignment (only required for
    gen2 and early gen3).
    
    In commit 2889caa92321 ("drm/i915: Eliminate lots of iterations over
    the execobjects array"), the idea was to remove the second guessing and
    only set the NEEDS_MAP flag when required. However, the entire check
    for an unusable offset for fencing was removed and not just the
    secondary check. I.e.
    
            /* avoid costly ping-pong once a batch bo ended up non-mappable */
            if (entry->flags & __EXEC_OBJECT_NEEDS_MAP &&
                !i915_vma_is_map_and_fenceable(vma))
                    return !only_mappable_for_reloc(entry->flags);
    
    was entirely removed as the ping-pong between execbuf passes was fixed,
    but its primary purpose in forcing unaligned unfenced access to be
    rebound was forgotten.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103502
    Fixes: 2889caa92321 ("drm/i915: Eliminate lots of iterations over the execobjects array")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171031103607.17836-1-chris@chris-wilson.co.uk
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Comment 4 Chris Wilson 2017-11-13 19:08:49 UTC
*** Bug 103723 has been marked as a duplicate of this bug. ***
Comment 5 Luka Paunovic 2017-12-02 11:48:40 UTC
(In reply to Chris Wilson from comment #4)
> *** Bug 103723 has been marked as a duplicate of this bug. ***

How do I implement this FIX on Ubuntu Artful Aardvark?
Comment 6 Luka Paunovic 2017-12-02 11:49:47 UTC
(In reply to Chris Wilson from comment #3)

This is the FIX I was asking about how to apply :D 
I clicked the wrong reply button.
Comment 7 Elizabeth 2018-01-04 18:41:03 UTC
Latest tip or stable should have it upstream already.
https://cgit.freedesktop.org/drm-tip
https://www.kernel.org
Comment 8 Elizabeth 2018-03-02 16:00:16 UTC
Closing issue. Thanks.
