Bug 102575 - [BAT][GDG] igt@gem_mmap_gtt@basic-small-bo-tiledx - fail - Failed assertion: memcmp(ptr , tiled_pattern, PAGE_SIZE) == 0
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: high critical
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Duplicates: 106014 106016 106082
Depends on:
Blocks:
 
Reported: 2017-09-07 07:39 UTC by Martin Peres
Modified: 2019-03-06 18:09 UTC (History)

See Also:
i915 platform: I915G
i915 features: GEM/Other


Description Martin Peres 2017-09-07 07:39:31 UTC
On CI_DRM_3051, the machine fi-gdg-551 hits the following assert when running igt@gem_mmap_gtt@basic-small-bo-tiledx:
	
(gem_mmap_gtt:1816) CRITICAL: Test assertion failure function test_huge_bo, file gem_mmap_gtt.c:518:
(gem_mmap_gtt:1816) CRITICAL: Failed assertion: memcmp(ptr , linear_pattern, PAGE_SIZE) == 0
Subtest basic-small-bo-tiledX failed.

Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3051/fi-gdg-551/igt@gem_mmap_gtt@basic-small-bo-tiledx.html
Comment 1 Chris Wilson 2017-09-07 08:36:43 UTC
Not a clue. Nothing rings alarm bells in the test, and it passes 100% on my 915gm.

Big difference in the nature of fences between i915g and everything else in the farm though.
Comment 4 Marta Löfstedt 2018-01-18 14:03:05 UTC
The links are dead; here is a new one:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3644/fi-gdg-551/igt@gem_mmap_gtt@basic-small-bo-tiledx.html

Also, we haven't hit the issue for basic-small-bo-tiledy nor basic-small-bo, for over 600 runs.
Comment 5 Marta Löfstedt 2018-02-23 10:41:42 UTC
For quite some time now, only the tiledX subtest has been failing sporadically on GDG:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3828/fi-gdg-551/igt@gem_mmap_gtt@basic-small-bo-tiledx.html

(gem_mmap_gtt:1863) CRITICAL: Test assertion failure function test_huge_bo, file gem_mmap_gtt.c:522:
(gem_mmap_gtt:1863) CRITICAL: Failed assertion: memcmp(ptr , tiled_pattern, PAGE_SIZE) == 0
Subtest basic-small-bo-tiledX failed.
Comment 6 Chris Wilson 2018-04-12 18:06:22 UTC
*** Bug 106014 has been marked as a duplicate of this bug. ***
Comment 7 Chris Wilson 2018-04-12 18:31:32 UTC
*** Bug 106016 has been marked as a duplicate of this bug. ***
Comment 8 Chris Wilson 2018-04-17 11:35:19 UTC
*** Bug 106082 has been marked as a duplicate of this bug. ***
Comment 9 Martin Peres 2018-04-18 13:19:43 UTC
Based on the name of the test, I assume this is the same issue.

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_22/fi-gdg-551/igt@gem_mmap_gtt@hang.html

(gem_mmap_gtt:1138) CRITICAL: Test assertion failure function test_hang, file ../tests/gem_mmap_gtt.c:391:
(gem_mmap_gtt:1138) CRITICAL: Failed assertion: gtt[0][x] == patterns[last_pattern]
Subtest hang failed.
Comment 10 Adric Blake 2018-05-25 19:00:27 UTC
Pardon my intrusion.

Running subtest basic-small-bo-tiledX with latest igt-gpu-tools (1.22+173+gf560ae5a-1) and drm-tip (4.17rc6+1560+g9d5095539d5f+755171-1) yields high failure rates on my GM45: out of 1000 iterations, only 360 are successful.

Neither basic-small-bo-tiledY nor basic-small-bo have any failures with 100 iterations each.

Platform:
Dell Inspiron 1545
Eagle Lake / Core2Duo (Pentium(R) Dual-Core CPU T4200 @ 2.00GHz) / GMA4500
LVDS (VGA)

I tested it out of curiosity about its relation to my other bug.
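A repeat-run measurement like the one in this comment can be scripted. The following is a minimal sketch, assuming the IGT test binary is on `PATH` and is invoked with `--run-subtest`; the helper name `pass_rate` and the constant `IGT_CMD` are hypothetical, and a run is counted as a pass only when the process exits with status 0, so skipped runs count as non-passes.

```python
import subprocess

# Assumed invocation; adjust the binary path for your IGT build.
IGT_CMD = ["gem_mmap_gtt", "--run-subtest", "basic-small-bo-tiledX"]


def pass_rate(cmd, iterations):
    """Run cmd repeatedly and return (passes, iterations).

    A run counts as a pass when the process exits with status 0.
    """
    passes = 0
    for _ in range(iterations):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode == 0:
            passes += 1
    return passes, iterations
```

Calling `pass_rate(IGT_CMD, 1000)` on the affected machine would give a figure comparable to the 360 out of 1000 reported here; swapping the subtest name gives the tiledY and untiled baselines.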
Comment 11 Martin Peres 2018-05-25 19:40:58 UTC
(In reply to Adric Blake from comment #10)
> Pardon my intrusion.
> 
> Running subtest basic-small-bo-tiledX with latest igt-gpu-tools
> (1.22+173+gf560ae5a-1) and drm-tip (4.17rc6+1560+g9d5095539d5f+755171-1)
> yields high failure rates on my GM45: out of 1000 iterations, only 360 are
> successful.
> 
> Neither basic-small-bo-tiledY nor basic-small-bo have any failures with 100
> iterations each.
> 
> Platform:
> Dell Inspiron 1545
> Eagle Lake / Core2Duo (Pentium(R) Dual-Core CPU T4200 @ 2.00GHz) / GMA4500
> LVDS (VGA)
> 
> I tested it out of curiosity about its relation to my other bug.

Thanks for this valuable data! This may mean that this bug should be split; I'll wait for Chris' opinion :)
Comment 12 Chris Wilson 2018-05-25 19:49:45 UTC
That basically depends on whether the failure pattern is universal (any gtt write/read), as on gdg, or specific to this test. At present, it is quite clear that we have a number of errata with gdg's CPU that we are not taking into account (that adding noclflush fixed quite a few issues by itself is telling). I don't think it is very likely that gm45 with a Core2 is going to have exactly the same issues as a Pentium4.
Comment 13 Adric Blake 2018-05-26 00:14:36 UTC
Would thoroughly testing with gem_mmap_gtt (or another test?) be enough to determine if your first point is the case? Or would it be better to just go ahead and make a new bug?
Comment 14 Martin Peres 2018-05-28 15:18:54 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_40/fi-gdg-551/igt@gen3_render_tiledx_blits.html

(gen3_render_tiledx_blits:1160) CRITICAL: Test assertion failure function check_bo, file ../tests/gen3_render_tiledx_blits.c:318:
(gen3_render_tiledx_blits:1160) CRITICAL: Failed assertion: v[i] == val
(gen3_render_tiledx_blits:1160) CRITICAL: Expected 0x0077ffb0, found 0x0077ffa0 at offset 0x000ffec0
Test gen3_render_tiledx_blits failed.
Comment 15 Martin Peres 2018-06-05 06:59:19 UTC
(In reply to Adric Blake from comment #13)
> Would thoroughly testing with gem_mmap_gtt (or another test?) be enough to
> determine if your first point is the case? Or would it be better to just go
> ahead and make a new bug?

Sorry for the delay, but in case of doubt: go ahead and make a new bug.

Please reference back to this bug, so we can keep the history :)
Comment 16 Chris Wilson 2018-06-11 13:26:36 UTC
So I thought this was a cpu issue, turns out to be unknown swizzling instead:

commit a0f2d23b7d3d4226a0a7637a9240bfa86f08c1d3 (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 8 17:29:46 2018 +0100

    igt/gem_mmap_gtt: Checking tiling pattern requires known swizzling
    
    As the swizzling is baked into the tiling pattern, the swizzling has to
    be consistent across the entire GTT mmap for our tests to work. However,
    under L-shaped memory configurations on older architectures, the
    swizzling varied depending on which region the page found itself in --
    invalidating our assumptions and ability to predict the tiling pattern.
    
    Reported-by: Adric Blake <promarbler14@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106848
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Some gdg oddities still remain. Hopefully this will throw some light on to them as well.
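To illustrate why swizzling that varies by memory region defeats a fixed reference pattern, here is a toy model. It is a sketch under stated assumptions, not the driver's or the test's code: the helper `swizzle_bit6` is hypothetical, and the bit numbers (6, 9, 17) are chosen for illustration only; the source does not state which swizzle mode fi-gdg-551 uses.

```python
def swizzle_bit6(addr, xor_bits):
    """Return addr with bit 6 XORed with each address bit in xor_bits."""
    flip = 0
    for bit in xor_bits:
        flip ^= (addr >> bit) & 1
    return addr ^ (flip << 6)


# Two hypothetical physical pages that differ only in address bit 17,
# and the same offset (0x40) inside each page.
page_a = 0x00000000
page_b = 0x00020000  # bit 17 set
offset = 0x40

# Swizzle that depends only on an in-page bit (bit 9): both pages
# map the offset to the same in-page location.
same_a = swizzle_bit6(page_a + offset, (9,)) - page_a
same_b = swizzle_bit6(page_b + offset, (9,)) - page_b

# Swizzle that also depends on a page-address bit (bit 17): the
# in-page location now depends on which page backs the object.
a = swizzle_bit6(page_a + offset, (9, 17)) - page_a
b = swizzle_bit6(page_b + offset, (9, 17)) - page_b
```

In this model `same_a` and `same_b` agree, while `a` and `b` differ, so a test that precomputes one tiled pattern cannot predict the bytes it will read back unless it knows which page backs each part of the mapping.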
Comment 17 Martin Peres 2018-06-14 14:48:36 UTC
(In reply to Chris Wilson from comment #16)
> So I thought this was a cpu issue, turns out to be unknown swizzling instead:
> 
> commit a0f2d23b7d3d4226a0a7637a9240bfa86f08c1d3 (HEAD, upstream/master)
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri Jun 8 17:29:46 2018 +0100
> 
>     igt/gem_mmap_gtt: Checking tiling pattern requires known swizzling
>     
>     As the swizzling is baked into the tiling pattern, the swizzling has to
>     be consistent across the entire GTT mmap for our tests to work. However,
>     under L-shaped memory configurations on older architectures, the
>     swizzling varied depending on which region the page found itself in --
>     invalidating our assumptions and ability to predict the tiling pattern.
>     
>     Reported-by: Adric Blake <promarbler14@gmail.com>
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106848
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Some gdg oddities still remain. Hopefully this will throw some light on to
> them as well.

Still happening, unless the following errors should be in a separate bug:


https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_62/fi-gdg-551/igt@gem_tiled_partial_pwrite_pread@writes-after-reads.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_62/fi-gdg-551/igt@gem_tiled_pread_pwrite.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_62/fi-gdg-551/igt@gem_tiled_partial_pwrite_pread@writes.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_62/fi-gdg-551/igt@gem_set_tiling_vs_pwrite.html
Comment 18 Chris Wilson 2018-06-15 13:20:38 UTC
intel_os-WARNING: Insufficient free memory; /proc/meminfo:
MemTotal:         965812 kB
MemFree:          520256 kB
MemAvailable:     510932 kB
Buffers:             184 kB
Cached:            97664 kB
SwapCached:            0 kB
Active:           188800 kB
Inactive:         161496 kB
Active(anon):     128000 kB
Inactive(anon):   125308 kB
Active(file):      60800 kB
Inactive(file):    36188 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       1030140 kB
SwapFree:        1030140 kB
Dirty:               132 kB
Writeback:             0 kB
AnonPages:        252476 kB
Mapped:            96804 kB
Shmem:               856 kB
Slab:              67848 kB
SReclaimable:      26888 kB
SUnreclaim:        40960 kB
KernelStack:        2512 kB
PageTables:         6080 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1513044 kB
Committed_AS:     538144 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
AnonHugePages:     75776 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:       81356 kB
DirectMap2M:      950272 kB
Comment 19 Martin Peres 2018-06-23 12:50:08 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_69/fi-gdg-551/igt@gen3_render_tiledy_blits.html

Test assertion failure function check_bo, file ../tests/gen3_render_tiledy_blits.c:318:
(gen3_render_tiledy_blits:1420) CRITICAL: Failed assertion: v[i] == val
(gen3_render_tiledy_blits:1420) CRITICAL: Expected 0x047ff138, found 0x047ff538 at offset 0x000fc4e0
Test gen3_render_tiledy_blits failed.

Seems like another random bitflip!
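The "bitflip" reading can be checked by XORing the expected and found values from the two `check_bo` failures reported in this bug. A minimal sketch; the helper name `differing_bits` is hypothetical, and the values are copied from the logs in this report.

```python
def differing_bits(expected, found):
    """Return the bit positions at which two 32-bit values differ."""
    diff = expected ^ found
    return [bit for bit in range(32) if (diff >> bit) & 1]


# (expected, found) pairs copied from the check_bo failures in this report.
failures = {
    "gen3_render_tiledx_blits": (0x0077FFB0, 0x0077FFA0),
    "gen3_render_tiledy_blits": (0x047FF138, 0x047FF538),
}

for test, (expected, found) in failures.items():
    print(test, differing_bits(expected, found))
```

Each pair differs in exactly one bit: bit 4 for the tiledx failure and bit 10 for the tiledy failure.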
Comment 20 Chris Wilson 2018-07-06 10:30:35 UTC
commit 78071c2fa53db2f04b8eddc6e6118be4fbc5c2fe (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 15 13:44:33 2018 +0100

    igt/gem_tiled_partial_pwrite_pread: Check for known swizzling
    
    As we want to compare a templated tiling pattern against the target_bo,
    we need to know that the swizzling is compatible. Or else the two
    tiling pattern may differ due to underlying page address that we cannot
    know, and so the test may sporadically fail.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=102575
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Ready for more info on any remaining flip-flops.
Comment 21 Chris Wilson 2018-07-14 17:34:11 UTC
OK, the remaining bug here is pread-vs-swizzling: we may or may not swizzle on either path, which leads to an inconsistency.
Comment 22 Martin Peres 2018-07-18 14:51:32 UTC
igt@gem_set_tiling_vs_pwrite has been failing consistently on GDG since drmtip_79:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_79/fi-gdg-551/igt@gem_set_tiling_vs_pwrite.html

(gem_set_tiling_vs_pwrite:1174) CRITICAL: Test assertion failure function __real_main49, file ../tests/gem_set_tiling_vs_pwrite.c:78:
(gem_set_tiling_vs_pwrite:1174) CRITICAL: Failed assertion: data[i] == i
(gem_set_tiling_vs_pwrite:1174) CRITICAL: error: 0x10 != 0
Test gem_set_tiling_vs_pwrite failed.
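The pread-vs-swizzling inconsistency described in comment 21 can be illustrated with a toy model. This is a sketch under stated assumptions, not the driver's code: it assumes the test buffer holds 32-bit words (so word index 0x10 sits at byte offset 64), and that the read path applies a bit-6 address flip to every access while the write path applies none. All function names are hypothetical. Under those assumptions, reading word 0 returns the value written at word 0x10, which is consistent with the `0x10 != 0` in the log, although the log alone does not prove this is the cause.

```python
WORD_SIZE = 4  # bytes per 32-bit word (assumed element size)


def flip_bit6(byte_offset):
    """Toggle address bit 6, i.e. swap adjacent 64-byte halves."""
    return byte_offset ^ (1 << 6)


def write_linear(num_words):
    """Write path without swizzling: word i is stored at byte offset 4*i."""
    memory = {}
    for i in range(num_words):
        memory[i * WORD_SIZE] = i
    return memory


def read_swizzled(memory, num_words):
    """Read path that applies the bit-6 flip to every access."""
    return [memory[flip_bit6(i * WORD_SIZE)] for i in range(num_words)]


# 64 words cover byte offsets 0..255, so every flipped offset stays in range.
data = read_swizzled(write_linear(64), 64)
first_bad = next(i for i, value in enumerate(data) if value != i)
```

When both paths agree (both swizzle or neither does), the read returns `data[i] == i` for every word; the mismatch only appears when the two paths disagree.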
Comment 23 Tvrtko Ursulin 2018-11-02 14:53:35 UTC
Chris's proposal, pending rebase, resend, and review, is to remove the half-hearted unswizzling attempts: https://patchwork.freedesktop.org/series/47043/
Comment 24 Francesco Balestrieri 2019-01-09 07:27:05 UTC
The patch above is in drm-tip, but CI Bug Log still reports occurrences of this bug (although I'm not sure whether it's really the same issue). Most recent example:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_187/fi-gdg-551/igt@gem_tiled_pread_pwrite.html

What next?
Comment 25 Francesco Balestrieri 2019-02-06 07:33:07 UTC
Used to occur every 10-20 runs, now last seen 675 runs ago. I'll go out on a limb and call it resolved.
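A rough calculation shows why 675 clean runs is strong evidence. This is a sketch that assumes each run fails independently with a fixed probability, and reads "every 10-20 runs" as a per-run failure rate between 1/20 and 1/10; both are simplifying assumptions, and the helper name `prob_all_pass` is hypothetical.

```python
def prob_all_pass(failure_rate, runs):
    """Probability of `runs` consecutive passes if each run fails
    independently with probability `failure_rate`."""
    return (1.0 - failure_rate) ** runs


RUNS_SINCE_LAST_FAILURE = 675  # from the comment above

# "every 10-20 runs" read as a per-run failure rate between 1/20 and 1/10.
p_low = prob_all_pass(1 / 20, RUNS_SINCE_LAST_FAILURE)
p_high = prob_all_pass(1 / 10, RUNS_SINCE_LAST_FAILURE)
print(f"{p_low:.1e} {p_high:.1e}")
```

Even at the lower failure rate, the chance of seeing this many consecutive passes by luck is well below one in a trillion, so an unchanged failure rate is not a plausible explanation for the clean streak.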
Comment 26 Martin Peres 2019-03-06 18:09:34 UTC
(In reply to Francesco Balestrieri from comment #25)
> Used to occur every 10-20 runs, now last seen 675 runs ago. I'll go out on a
> limb and call it resolved.

Definitely looks fixed, indeed!
Comment 27 CI Bug Log 2019-03-06 18:09:42 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.

