https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_2/fi-bsw-n3050/igt@gem_mmap_gtt@forked-big-copy-odd.html
(gem_mmap_gtt:1394) CRITICAL: Test assertion failure function test_huge_copy, file ../tests/gem_mmap_gtt.c:638:
(gem_mmap_gtt:1394) CRITICAL: Failed assertion: page[j] == i + j
(gem_mmap_gtt:1394) CRITICAL: Last errno: 25, Inappropriate ioctl for device
(gem_mmap_gtt:1394) CRITICAL: error: 0x7000b848 != 0x6cf8
Subtest forked-big-copy-odd failed.

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_5/fi-bsw-n3050/igt@gem_mmap_gtt@forked-medium-copy-xy.html
(gem_mmap_gtt:1626) CRITICAL: Test assertion failure function test_huge_copy, file ../tests/gem_mmap_gtt.c:636:
(gem_mmap_gtt:1626) CRITICAL: Failed assertion: page[j] == ~(i + j)
(gem_mmap_gtt:1626) CRITICAL: Last errno: 25, Inappropriate ioctl for device
(gem_mmap_gtt:1626) CRITICAL: error: 0x7a86c0 != 0xffffddc4
Subtest forked-medium-copy-XY failed.

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_6/fi-bsw-n3050/igt@gem_mmap_gtt@forked-basic-small-copy-xy.html
(gem_mmap_gtt:1371) CRITICAL: Test assertion failure function test_huge_copy, file ../tests/gem_mmap_gtt.c:636:
(gem_mmap_gtt:1371) CRITICAL: Failed assertion: page[j] == ~(i + j)
(gem_mmap_gtt:1371) CRITICAL: Last errno: 25, Inappropriate ioctl for device
(gem_mmap_gtt:1371) CRITICAL: error: 0xe0ff00 != 0xffffee42
Subtest forked-basic-small-copy-XY failed.

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_10/fi-bsw-n3050/igt@gem_mmap_gtt@forked-big-copy.html
(gem_mmap_gtt:1420) CRITICAL: Test assertion failure function test_huge_copy, file ../tests/gem_mmap_gtt.c:636:
(gem_mmap_gtt:1420) CRITICAL: Failed assertion: page[j] == ~(i + j)
(gem_mmap_gtt:1420) CRITICAL: Last errno: 25, Inappropriate ioctl for device
(gem_mmap_gtt:1420) CRITICAL: error: 0x7000b848 != 0xffffaee8
Subtest forked-big-copy failed.

Still happening:
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_93/fi-bsw-kefka/igt@gem_mmap_gtt@forked-medium-copy-xy.html
(gem_mmap_gtt:1170) CRITICAL: Test assertion failure function test_huge_copy, file ../tests/gem_mmap_gtt.c:671:
(gem_mmap_gtt:1170) CRITICAL: Failed assertion: page[j] == i + j
(gem_mmap_gtt:1170) CRITICAL: Last errno: 25, Inappropriate ioctl for device
(gem_mmap_gtt:1170) CRITICAL: error: 0xf566cfcc != 0x3d02
Subtest forked-medium-copy-XY failed.
(ge
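For readers unfamiliar with the failing assertions, each one compares a 32-bit word read back through the mapping against a position-dependent pattern, either `i + j` or its complement `~(i + j)`. The following is a minimal userspace sketch of that kind of check. It is not the code from tests/gem_mmap_gtt.c: the helper names `fill_pattern` and `first_mismatch` and the 1024-words-per-page layout are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define WORDS_PER_PAGE 1024u /* assumed: one 4 KiB page of 32-bit words */

/* Hypothetical helper: write the pattern (i + j), or its complement, into a page. */
static void fill_pattern(uint32_t *page, uint32_t i, int inverted)
{
	for (uint32_t j = 0; j < WORDS_PER_PAGE; j++)
		page[j] = inverted ? ~(i + j) : i + j;
}

/*
 * Hypothetical helper: return the index of the first word that does not
 * match the expected pattern, or -1 if the whole page matches.
 */
static long first_mismatch(const uint32_t *page, uint32_t i, int inverted)
{
	for (uint32_t j = 0; j < WORDS_PER_PAGE; j++) {
		uint32_t expected = inverted ? ~(i + j) : i + j;

		if (page[j] != expected)
			return (long)j;
	}
	return -1;
}
```

In the logs above, the value read back (for example 0x7000b848) matches neither the pattern nor its complement, which points at data coming from the wrong place, not at a stale copy of the same page.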
I think we may need to undo

commit 4509276ee824bb967885c095c610767e42345c36
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Feb 20 12:47:18 2017 +0000

    drm/i915: Remove Braswell GGTT update w/a

    Testing with concurrent GGTT accesses no longer show the coherency problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates to GGTT with access through GGTT on Braswell"). My presumption is that the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915: Wait for writes through the GTT to land before reading back"), along with the use of WC updates to the global gTT in commit 8448661d65f6 ("drm/i915: Convert clflushed pagetables over to WC maps". Given that the original symptoms can no longer be reproduced, time to remove the workaround.

and so restore

commit 5bab6f60cb4d1417ad7c599166bcfec87529c1a2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Oct 23 18:43:32 2015 +0100

    drm/i915: Serialise updates to GGTT with access through GGTT on Braswell

    When accessing through the GTT from one CPU whilst concurrently updating the GGTT PTEs in another thread, the hardware likes to return random data. As we have strong serialisation prevent us from modifying the PTE of an active GTT mmapping, we have to conclude that it whilst modifying other PTE's that error occurs. (I have not looked for any pattern such as modifying PTE within the same page or cacheline as active PTE - though checking whether revoking neighbouring objects should be enough to test that theory.) The corruption also seems restricted to Braswell and disappears with maxcpus=0.

    This patch stops all access through the GTT by other CPUs when we update any PTE by stopping the machine around the GGTT update.

    Note that splitting up the 64 bit write into two 32 bit writes was tried and found to fail too.
commit 8cd999181f8c744c87fb64e7b3600876ec3428b2 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Jan 14 21:17:27 2019 +0000

    drm/i915: Prevent concurrent GGTT update and use on Braswell (again)

    On Braswell, under heavy stress, if we update the GGTT while simultaneously accessing another region inside the GTT, we are returned the wrong values. To prevent this we stop the machine to update the GGTT entries so that no memory traffic can occur at the same time.

    This was first spotted in

    commit 5bab6f60cb4d1417ad7c599166bcfec87529c1a2
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date: Fri Oct 23 18:43:32 2015 +0100

        drm/i915: Serialise updates to GGTT with access through GGTT on Braswell

    but removed again in forlorn hope with

    commit 4509276ee824bb967885c095c610767e42345c36
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date: Mon Feb 20 12:47:18 2017 +0000

        drm/i915: Remove Braswell GGTT update w/a

    However, gem_concurrent_blit is once again only stable with the patch applied and CI is detecting the odd failure in forked gem_mmap_gtt tests (which smell like the same issue).

    Fwiw, a wide variety of CPU memory barriers (around GGTT flushing, fence updates, PTE updates) and GPU flushes/invalidates (between requests, after PTE updates) were tried as part of the investigation to find an alternate cause, nothing comes close to serialised GGTT updates.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105591
    Testcase: igt/gem_concurrent_blit
    Testcase: igt/gem_mmap_gtt/*forked*
    References: 5bab6f60cb4d ("drm/i915: Serialise updates to GGTT with access through GGTT on Braswell")
    References: 4509276ee824 ("drm/i915: Remove Braswell GGTT update w/a")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190114211729.30352-1-chris@chris-wilson.co.uk
Last seen drmtip_194 (1 month / 829 runs ago). Closing this bug as fixed.
The CI Bug Log issue associated to this bug has been archived. New failures matching the above filters will not be associated to this bug anymore.