Bug 105591 - [CI] igt@gem_mmap_gtt@forked-* - Failed assertion: page[j] == i + j
Summary: [CI] igt@gem_mmap_gtt@forked-* - Failed assertion: page[j] == i + j
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-19 08:36 UTC by Marta Löfstedt
Modified: 2019-01-15 09:25 UTC

See Also:
i915 platform: BSW/CHT
i915 features: GEM/Other


Description Marta Löfstedt 2018-03-19 08:36:42 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_2/fi-bsw-n3050/igt@gem_mmap_gtt@forked-big-copy-odd.html

(gem_mmap_gtt:1394) CRITICAL: Test assertion failure function test_huge_copy, file ../tests/gem_mmap_gtt.c:638:
(gem_mmap_gtt:1394) CRITICAL: Failed assertion: page[j] == i + j
(gem_mmap_gtt:1394) CRITICAL: Last errno: 25, Inappropriate ioctl for device
(gem_mmap_gtt:1394) CRITICAL: error: 0x7000b848 != 0x6cf8
Subtest forked-big-copy-odd failed.
Comment 1 Marta Löfstedt 2018-03-21 09:50:11 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_5/fi-bsw-n3050/igt@gem_mmap_gtt@forked-medium-copy-xy.html

(gem_mmap_gtt:1626) CRITICAL: Test assertion failure function test_huge_copy, file ../tests/gem_mmap_gtt.c:636:
(gem_mmap_gtt:1626) CRITICAL: Failed assertion: page[j] == ~(i + j)
(gem_mmap_gtt:1626) CRITICAL: Last errno: 25, Inappropriate ioctl for device
(gem_mmap_gtt:1626) CRITICAL: error: 0x7a86c0 != 0xffffddc4
Subtest forked-medium-copy-XY failed.
Comment 2 Marta Löfstedt 2018-03-23 09:11:19 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_6/fi-bsw-n3050/igt@gem_mmap_gtt@forked-basic-small-copy-xy.html	

(gem_mmap_gtt:1371) CRITICAL: Test assertion failure function test_huge_copy, file ../tests/gem_mmap_gtt.c:636:
(gem_mmap_gtt:1371) CRITICAL: Failed assertion: page[j] == ~(i + j)
(gem_mmap_gtt:1371) CRITICAL: Last errno: 25, Inappropriate ioctl for device
(gem_mmap_gtt:1371) CRITICAL: error: 0xe0ff00 != 0xffffee42
Subtest forked-basic-small-copy-XY failed.
Comment 3 Marta Löfstedt 2018-04-03 11:41:13 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_10/fi-bsw-n3050/igt@gem_mmap_gtt@forked-big-copy.html

(gem_mmap_gtt:1420) CRITICAL: Test assertion failure function test_huge_copy, file ../tests/gem_mmap_gtt.c:636:
(gem_mmap_gtt:1420) CRITICAL: Failed assertion: page[j] == ~(i + j)
(gem_mmap_gtt:1420) CRITICAL: Last errno: 25, Inappropriate ioctl for device
(gem_mmap_gtt:1420) CRITICAL: error: 0x7000b848 != 0xffffaee8
Subtest forked-big-copy failed.
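
For readers interpreting the logs above: the two failed assertions, page[j] == i + j and page[j] == ~(i + j), suggest that each 32-bit word of a page is expected to hold a value derived from its page index i and word index j, with one buffer using the complemented pattern. In the "error:" lines the left value appears to be what was read back and the right value what was expected. The following is a minimal sketch of that kind of pattern check, not the actual IGT source; fill_page, check_page and WORDS_PER_PAGE are hypothetical names, and a 4096-byte page is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4096-byte pages holding 32-bit words. */
#define WORDS_PER_PAGE (4096 / sizeof(uint32_t))

/* Fill page number i with the pattern i + j, or its complement. */
static void fill_page(uint32_t *page, uint32_t i, int inverted)
{
    for (uint32_t j = 0; j < WORDS_PER_PAGE; j++)
        page[j] = inverted ? ~(i + j) : i + j;
}

/*
 * Return the index of the first word that does not match the expected
 * pattern, or -1 if the whole page is intact.
 */
static long check_page(const uint32_t *page, uint32_t i, int inverted)
{
    for (uint32_t j = 0; j < WORDS_PER_PAGE; j++) {
        uint32_t expected = inverted ? ~(i + j) : i + j;

        if (page[j] != expected)
            return (long)j;
    }
    return -1;
}
```

With such a check, a value like 0x7000b848 where 0x6cf8 was expected is a word that matches neither pattern, which is consistent with stale or unrelated data being returned rather than an off-by-one in the test.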
Comment 4 Martin Peres 2018-09-03 08:41:40 UTC
Still happening:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_93/fi-bsw-kefka/igt@gem_mmap_gtt@forked-medium-copy-xy.html

(gem_mmap_gtt:1170) CRITICAL: Test assertion failure function test_huge_copy, file ../tests/gem_mmap_gtt.c:671:
(gem_mmap_gtt:1170) CRITICAL: Failed assertion: page[j] == i + j
(gem_mmap_gtt:1170) CRITICAL: Last errno: 25, Inappropriate ioctl for device
(gem_mmap_gtt:1170) CRITICAL: error: 0xf566cfcc != 0x3d02
Subtest forked-medium-copy-XY failed.
Comment 5 Chris Wilson 2019-01-09 14:09:46 UTC
I think we may need to undo

commit 4509276ee824bb967885c095c610767e42345c36
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Feb 20 12:47:18 2017 +0000

    drm/i915: Remove Braswell GGTT update w/a
    
    Testing with concurrent GGTT accesses no longer show the coherency
    problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates
    to GGTT with access through GGTT on Braswell"). My presumption is that
    the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915:
    Wait for writes through the GTT to land before reading back"), along
    with the use of WC updates to the global gTT in commit 8448661d65f6
    ("drm/i915: Convert clflushed pagetables over to WC maps". Given
    that the original symptoms can no longer be reproduced, time to remove
    the workaround.

and so restore 

commit 5bab6f60cb4d1417ad7c599166bcfec87529c1a2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Oct 23 18:43:32 2015 +0100

    drm/i915: Serialise updates to GGTT with access through GGTT on Braswell
    
    When accessing through the GTT from one CPU whilst concurrently updating
    the GGTT PTEs in another thread, the hardware likes to return random
    data. As we have strong serialisation prevent us from modifying the PTE
    of an active GTT mmapping, we have to conclude that it whilst modifying
    other PTE's that error occurs. (I have not looked for any pattern such
    as modifying PTE within the same page or cacheline as active PTE -
    though checking whether revoking neighbouring objects should be enough
    to test that theory.) The corruption also seems restricted to Braswell
    and disappears with maxcpus=0. This patch stops all access through the
    GTT by other CPUs when we update any PTE by stopping the machine around
    the GGTT update.
    
    Note that splitting up the 64 bit write into two 32 bit writes was
    tried and found to fail too.
Comment 6 Chris Wilson 2019-01-15 09:25:03 UTC
commit 8cd999181f8c744c87fb64e7b3600876ec3428b2 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jan 14 21:17:27 2019 +0000

    drm/i915: Prevent concurrent GGTT update and use on Braswell (again)
    
    On Braswell, under heavy stress, if we update the GGTT while
    simultaneously accessing another region inside the GTT, we are returned
    the wrong values. To prevent this we stop the machine to update the GGTT
    entries so that no memory traffic can occur at the same time.
    
    This was first spotted in
    
    commit 5bab6f60cb4d1417ad7c599166bcfec87529c1a2
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Fri Oct 23 18:43:32 2015 +0100
    
        drm/i915: Serialise updates to GGTT with access through GGTT on Braswell
    
    but removed again in forlorn hope with
    
    commit 4509276ee824bb967885c095c610767e42345c36
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Mon Feb 20 12:47:18 2017 +0000
    
        drm/i915: Remove Braswell GGTT update w/a
    
    However, gem_concurrent_blit is once again only stable with the patch
    applied and CI is detecting the odd failure in forked gem_mmap_gtt tests
    (which smell like the same issue). Fwiw, a wide variety of CPU memory
    barriers (around GGTT flushing, fence updates, PTE updates) and GPU
    flushes/invalidates (between requests, after PTE updates) were tried as
    part of the investigation to find an alternate cause, nothing comes
    close to serialised GGTT updates.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105591
    Testcase: igt/gem_concurrent_blit
    Testcase: igt/gem_mmap_gtt/*forked*
    References: 5bab6f60cb4d ("drm/i915: Serialise updates to GGTT with access through GGTT on Braswell")
    References: 4509276ee824 ("drm/i915: Remove Braswell GGTT update w/a")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190114211729.30352-1-chris@chris-wilson.co.uk

