On CI_DRM_3051, the machine fi-gdg-551 hits the following assert when running igt@gem_mmap_gtt@basic-small-bo-tiledx: (gem_mmap_gtt:1816) CRITICAL: Test assertion failure function test_huge_bo, file gem_mmap_gtt.c:518: (gem_mmap_gtt:1816) CRITICAL: Failed assertion: memcmp(ptr , linear_pattern, PAGE_SIZE) == 0 Subtest basic-small-bo-tiledX failed. Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3051/fi-gdg-551/igt@gem_mmap_gtt@basic-small-bo-tiledx.html
Not a clue. Nothing rings alarm bells in the test, and it passes 100% on my 915gm. Big difference in the nature of fences between i915g and everything else in the farm though.
Now also for the tiledY subtest https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3087/fi-gdg-551/igt@gem_mmap_gtt@basic-small-bo-tiledy.html
Seen also on https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3150/fi-gdg-551/igt@gem_mmap_gtt@basic-small-bo.html
The links above are dead; here is a new one: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3644/fi-gdg-551/igt@gem_mmap_gtt@basic-small-bo-tiledx.html Also, we haven't hit the issue on basic-small-bo-tiledy or basic-small-bo for over 600 runs.
For quite some time it is only the tiledX that sporadically fails on GDG https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3828/fi-gdg-551/igt@gem_mmap_gtt@basic-small-bo-tiledx.html (gem_mmap_gtt:1863) CRITICAL: Test assertion failure function test_huge_bo, file gem_mmap_gtt.c:522: (gem_mmap_gtt:1863) CRITICAL: Failed assertion: memcmp(ptr , tiled_pattern, PAGE_SIZE) == 0 Subtest basic-small-bo-tiledX failed.
*** Bug 106014 has been marked as a duplicate of this bug. ***
*** Bug 106016 has been marked as a duplicate of this bug. ***
*** Bug 106082 has been marked as a duplicate of this bug. ***
Based on the name of the test, I assume this is the same issue. https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_22/fi-gdg-551/igt@gem_mmap_gtt@hang.html (gem_mmap_gtt:1138) CRITICAL: Test assertion failure function test_hang, file ../tests/gem_mmap_gtt.c:391: (gem_mmap_gtt:1138) CRITICAL: Failed assertion: gtt[0][x] == patterns[last_pattern] Subtest hang failed.
Pardon my intrusion. Running subtest basic-small-bo-tiledX with latest igt-gpu-tools (1.22+173+gf560ae5a-1) and drm-tip (4.17rc6+1560+g9d5095539d5f+755171-1) yields high failure rates on my GM45: out of 1000 iterations, only 360 are successful. Neither basic-small-bo-tiledY nor basic-small-bo has any failures with 100 iterations each. Platform: Dell Inspiron 1545 Eagle Lake / Core2Duo (Pentium(R) Dual-Core CPU T4200 @ 2.00GHz) / GMA4500 LVDS (VGA) I tested it out of curiosity about its relation to my other bug.
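A pass rate like the one quoted above can be measured by running the subtest in a loop and counting exit statuses. The following is a minimal sketch, assuming that the test binary exits with status 0 on success; the helper name `pass_count` and the binary path in the usage comment are hypothetical and not taken from this report.

```python
import subprocess
import sys


def pass_count(cmd, iterations):
    """Run cmd `iterations` times and return how many runs exited with status 0."""
    passes = 0
    for _ in range(iterations):
        # Discard the test output; only the exit status is counted here.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode == 0:
            passes += 1
    return passes


# Hypothetical usage (path to the IGT build is an assumption):
#   cmd = ["./gem_mmap_gtt", "--run-subtest", "basic-small-bo-tiledX"]
#   ok = pass_count(cmd, 1000)
#   print(f"{ok}/1000 passed")
```

Any non-zero status, including a skip, is counted as "not passed" in this sketch, so a skipping subtest would need to be told apart separately.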
(In reply to Adric Blake from comment #10) > Pardon my intrusion. > > Running subtest basic-small-bo-tiledX with latest igt-gpu-tools > (1.22+173+gf560ae5a-1) and drm-tip (4.17rc6+1560+g9d5095539d5f+755171-1) > yields high failure rates on my GM45: out of 1000 iterations, only 360 are > successful. > > Neither basic-small-bo-tiledY nor basic-small-bo have any failures with 100 > iterations each. > > Platform: > Dell Inspiron 1545 > Eagle Lake / Core2Duo (Pentium(R) Dual-Core CPU T4200 @ 2.00GHz) / GMA4500 > LVDS (VGA) > > I tested it out of curiosity to its relation to my other bug. Thanks for this valuable data! This may mean that this bug should be split; I'll wait for Chris' opinion :)
That basically depends on whether the failure pattern is universal (any gtt write/read), as on gdg, or specific to this test. At present, it is quite clear that there are a number of errata with gdg's CPU that we are not taking into account (that adding noclflush fixed quite a few issues by itself is telling). I don't think it is very likely that gm45 with a Core2 is going to have exactly the same issues as a Pentium4.
Would thoroughly testing with gem_mmap_gtt (or another test?) be enough to determine if your first point is the case? Or would it be better to just go ahead and make a new bug?
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_40/fi-gdg-551/igt@gen3_render_tiledx_blits.html (gen3_render_tiledx_blits:1160) CRITICAL: Test assertion failure function check_bo, file ../tests/gen3_render_tiledx_blits.c:318: (gen3_render_tiledx_blits:1160) CRITICAL: Failed assertion: v[i] == val (gen3_render_tiledx_blits:1160) CRITICAL: Expected 0x0077ffb0, found 0x0077ffa0 at offset 0x000ffec0 Test gen3_render_tiledx_blits failed.
(In reply to Adric Blake from comment #13) > Would thoroughly testing with gem_mmap_gtt (or another test?) be enough to > determine if your first point is the case? Or would it be better to just go > ahead and make a new bug? Sorry for the delay, but in case of doubt: go ahead and make a new bug. Please reference back to this bug, so we can keep the history :)
So I thought this was a cpu issue, turns out to be unknown swizzling instead: commit a0f2d23b7d3d4226a0a7637a9240bfa86f08c1d3 (HEAD, upstream/master) Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jun 8 17:29:46 2018 +0100 igt/gem_mmap_gtt: Checking tiling pattern requires known swizzling As the swizzling is baked into the tiling pattern, the swizzling has to be consistent across the entire GTT mmap for our tests to work. However, under L-shaped memory configurations on older architectures, the swizzling varied depending on which region the page found itself in -- invalidating our assumptions and ability to predict the tiling pattern. Reported-by: Adric Blake <promarbler14@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106848 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Some gdg oddities still remain. Hopefully this will throw some light on to them as well.
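To illustrate why the tiling pattern cannot be predicted when the swizzling is unknown: bit-6 swizzling XORs bit 6 of a byte offset with one or more higher address bits, so the same linear offset lands on a different 64-byte half depending on which bits take part. The sketch below is illustrative only; the function name `swizzle_bit6` and the bit sets used are assumptions, not the actual configuration of fi-gdg-551 or the GM45 in this report.

```python
def swizzle_bit6(offset, xor_bits):
    """Return `offset` with bit 6 XORed with the given higher address bits."""
    flip = 0
    for bit in xor_bits:
        flip ^= (offset >> bit) & 1
    return offset ^ (flip << 6)


# The same offset under three assumed swizzle modes.
offset = 0x200  # bit 9 set, bit 10 clear
for mode in ((), (9,), (9, 10)):
    print(mode, hex(swizzle_bit6(offset, mode)))
```

If the bits that take part differ from page to page, as the commit message describes for L-shaped memory configurations, a test comparing against one fixed pattern would be expected to fail only on some pages.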
(In reply to Chris Wilson from comment #16) > So I thought this was a cpu issue, turns out to be unknown swizzling instead: > > commit a0f2d23b7d3d4226a0a7637a9240bfa86f08c1d3 (HEAD, upstream/master) > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Fri Jun 8 17:29:46 2018 +0100 > > igt/gem_mmap_gtt: Checking tiling pattern requires known swizzling > > As the swizzling is baked into the tiling pattern, the swizzling has to > be consistent across the entire GTT mmap for our tests to work. However, > under L-shaped memory configurations on older architectures, the > swizzling varied depending on which region the page found itself in -- > invalidating our assumptions and ability to predict the tiling pattern. > > Reported-by: Adric Blake <promarbler14@gmail.com> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106848 > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > Some gdg oddities still remain. Hopefully this will throw some light on to > them as well. Still happening, unless the following errors should be in a separate bug: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_62/fi-gdg-551/igt@gem_tiled_partial_pwrite_pread@writes-after-reads.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_62/fi-gdg-551/igt@gem_tiled_pread_pwrite.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_62/fi-gdg-551/igt@gem_tiled_partial_pwrite_pread@writes.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_62/fi-gdg-551/igt@gem_set_tiling_vs_pwrite.html
intel_os-WARNING: Insufficient free memory; /proc/meminfo:
MemTotal: 965812 kB
MemFree: 520256 kB
MemAvailable: 510932 kB
Buffers: 184 kB
Cached: 97664 kB
SwapCached: 0 kB
Active: 188800 kB
Inactive: 161496 kB
Active(anon): 128000 kB
Inactive(anon): 125308 kB
Active(file): 60800 kB
Inactive(file): 36188 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 1030140 kB
SwapFree: 1030140 kB
Dirty: 132 kB
Writeback: 0 kB
AnonPages: 252476 kB
Mapped: 96804 kB
Shmem: 856 kB
Slab: 67848 kB
SReclaimable: 26888 kB
SUnreclaim: 40960 kB
KernelStack: 2512 kB
PageTables: 6080 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1513044 kB
Committed_AS: 538144 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
AnonHugePages: 75776 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 81356 kB
DirectMap2M: 950272 kB
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_69/fi-gdg-551/igt@gen3_render_tiledy_blits.html Test assertion failure function check_bo, file ../tests/gen3_render_tiledy_blits.c:318: (gen3_render_tiledy_blits:1420) CRITICAL: Failed assertion: v[i] == val (gen3_render_tiledy_blits:1420) CRITICAL: Expected 0x047ff138, found 0x047ff538 at offset 0x000fc4e0 Test gen3_render_tiledy_blits failed. Seems like another random bitflip!
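The "bitflip" reading can be checked by XORing the expected and found values from the two gen3_render_*_blits logs in this report. A small sketch follows; the helper name `flipped_bits` is hypothetical, and the values are copied from the assertion messages above.

```python
def flipped_bits(expected, found):
    """Return the positions of the bits that differ between the two values."""
    diff = expected ^ found
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# (test name, expected, found) as reported in the CI logs in this thread.
samples = [
    ("gen3_render_tiledx_blits", 0x0077FFB0, 0x0077FFA0),
    ("gen3_render_tiledy_blits", 0x047FF138, 0x047FF538),
]
for name, expected, found in samples:
    print(name, hex(expected ^ found), flipped_bits(expected, found))
```

In both cases exactly one bit differs (bit 4 in the tiledx log, bit 10 in the tiledy log), which is consistent with the single-bit description but does not by itself say where the difference comes from.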
commit 78071c2fa53db2f04b8eddc6e6118be4fbc5c2fe (HEAD, upstream/master) Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jun 15 13:44:33 2018 +0100 igt/gem_tiled_partial_pwrite_pread: Check for known swizzling As we want to compare a templated tiling pattern against the target_bo, we need to know that the swizzling is compatible. Or else the two tiling pattern may differ due to underlying page address that we cannot know, and so the test may sporadically fail. References: https://bugs.freedesktop.org/show_bug.cgi?id=102575 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Ready for more info on any remaining flip-flops.
Ok, the remaining bug here is pread-vs-swizzling; whereby we may or may not swizzle on either path leading to an inconsistency.
igt@gem_set_tiling_vs_pwrite started failing consistently on GDG starting with drmtip_79: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_79/fi-gdg-551/igt@gem_set_tiling_vs_pwrite.html (gem_set_tiling_vs_pwrite:1174) CRITICAL: Test assertion failure function __real_main49, file ../tests/gem_set_tiling_vs_pwrite.c:78: (gem_set_tiling_vs_pwrite:1174) CRITICAL: Failed assertion: data[i] == i (gem_set_tiling_vs_pwrite:1174) CRITICAL: error: 0x10 != 0 Test gem_set_tiling_vs_pwrite failed.
The proposal from Chris, pending rebase, resend and review, is to remove the half-hearted unswizzling attempts: https://patchwork.freedesktop.org/series/47043/
The patch above is in drm-tip, but CI Bug Log still reports occurrences of this bug (although I'm not sure whether it's really the same issue). Most recent example: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_187/fi-gdg-551/igt@gem_tiled_pread_pwrite.html What next?
Used to occur every 10-20 runs, now last seen 675 runs ago. I'll go out on a limb and call it resolved.
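As a rough sanity check on that conclusion, one can ask how likely 675 consecutive clean runs would be if the old failure rate still applied. This is a back-of-the-envelope sketch that assumes runs are independent with a fixed failure probability, which is an assumption and not something established in this thread; the function name `prob_clean_streak` is hypothetical.

```python
def prob_clean_streak(failure_rate, runs):
    """Probability of `runs` consecutive passes given a per-run failure rate."""
    return (1.0 - failure_rate) ** runs


# "Every 10-20 runs" taken as per-run failure rates of 1/10 and 1/20.
for period in (10, 20):
    p = prob_clean_streak(1.0 / period, 675)
    print(f"1 failure per {period} runs: P(675 clean runs) = {p:.3g}")
```

Under those assumptions both probabilities are vanishingly small, so a 675-run clean streak is very unlikely to be luck at the old failure rate.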
(In reply to Francesco Balestrieri from comment #25) > Used to occur every 10-20 runs, now last seen 675 runs ago. I'll go out on a > limb and call it resolved. Definitely looks fixed, indeed!
The CI Bug Log issue associated to this bug has been archived. New failures matching the above filters will not be associated to this bug anymore.