Bug 102577 - [BAT][GDG] Failed assertion: gtt[1024*i] == ~i in igt@prime_vgem@basic-gtt
Summary: [BAT][GDG] Failed assertion: gtt[1024*i] == ~i in igt@prime_vgem@basic-gtt
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: high critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Duplicates: 102066
Depends on:
Blocks:
 
Reported: 2017-09-07 07:54 UTC by Martin Peres
Modified: 2017-10-03 16:47 UTC
CC List: 1 user

See Also:
i915 platform: I915GM
i915 features:


Description Martin Peres 2017-09-07 07:54:34 UTC
When running CI_DRM_3051, the machine fi-gdg-551 failed the following assert when running igt@prime_vgem@basic-gtt:

(prime_vgem:2359) CRITICAL: Test assertion failure function test_gtt, file prime_vgem.c:244:
(prime_vgem:2359) CRITICAL: Failed assertion: gtt[1024*i] == ~i
(prime_vgem:2359) CRITICAL: error: 135 != -136
Subtest basic-gtt failed.
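
For readers decoding the numbers in the log: the sketch below assumes the "error: A != B" line prints the value read back first and the expected value second. Under that assumption the expected value -136 is ~135, so the failing index is i = 135 and the value read back is the plain index, not its complement. The helper names are hypothetical and are not part of IGT.

```c
#include <assert.h>
#include <stdint.h>

/* Value the assertion expects at loop index i: the bitwise complement. */
static int32_t expected_pattern(int32_t i)
{
	return ~i;
}

/* Recover the loop index from an expected value (~ is its own inverse). */
static int32_t index_from_expected(int32_t expected)
{
	return ~expected;
}

/*
 * Returns 1 when the observed value equals the un-complemented index,
 * i.e. the reader saw an older value rather than the write of ~i.
 */
static int looks_stale(int32_t observed, int32_t expected)
{
	return observed == index_from_expected(expected);
}
```

Calling looks_stale(135, -136) with the two numbers from the log returns 1, which fits a write of ~i that was not yet visible through the GTT mapping.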

Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3051/fi-gdg-551/igt@prime_vgem@basic-gtt.html
Comment 1 Chris Wilson 2017-09-07 08:34:40 UTC
This just tells us that GTT/WC access on i915g isn't fully coherent: a write through the GTT isn't immediately visible to a read through WC. It is very similar to gem_mmap_gtt/coherency (which also fails), but the timing differs because this path uses WC rather than WB.

Not sure what we can do to prevent userspace falling into this trap. Ban PRIME on i915g? That seems overkill -- the same coherency issue is there on byt/bsw/bxt/glk, we just haven't hit it on this path.

Move the fine grained coherency check (i.e. concurrent access to vgem and i915 mmaps) to another test and accept the failure.
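
A minimal sketch of what such a fine-grained check looks like, for illustration only. It is not the IGT source: the function name, the use of plain pointers in place of the vgem and i915 mmaps, and the first half of each iteration are assumptions. Only the final comparison against ~i is taken from the failed assertion in the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Interleaved coherency check over two views of the same buffer.
 * Returns the first index at which a write through one view was not
 * visible through the other, or -1 if every access was coherent.
 */
static long first_incoherent_index(volatile uint32_t *view_a,
				   volatile uint32_t *view_b,
				   size_t count, size_t stride)
{
	for (size_t i = 0; i < count; i++) {
		/* Write through view A, then read it back through view B. */
		view_a[stride * i] = (uint32_t)i;
		__sync_synchronize();
		if (view_b[stride * i] != (uint32_t)i)
			return (long)i;

		/* Write the complement through view B, read through view A. */
		view_b[stride * i] = ~(uint32_t)i;
		__sync_synchronize();
		if (view_a[stride * i] != ~(uint32_t)i)
			return (long)i;
	}
	return -1;
}
```

With both pointers aliasing ordinary cached memory the function returns -1. The point of the check is that, on the hardware discussed here, two different mappings of one object can fail the same loop even with the memory barrier in place.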
Comment 2 Martin Peres 2017-09-07 08:38:32 UTC
(In reply to Chris Wilson from comment #1)
> Move the fine grained coherency check (i.e. concurrent access to vgem and
> i915 mmaps) to another test and accept the failure.

That sounds like the best possible solution to me!
Comment 3 Chris Wilson 2017-09-07 19:37:56 UTC
*** Bug 102066 has been marked as a duplicate of this bug. ***
Comment 4 Chris Wilson 2017-09-07 21:03:27 UTC
For a similar issue within the kernel (i.e. not this test but others):
commit c5ba5b24657e473b1c64b0a614b168a635a2c935
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Sep 7 19:45:20 2017 +0100

    drm/i915: Apply the GTT write flush for all !llc machines
    
    We also see the delayed GTT write issue on i915g/i915gm, so let's
    presume that it is a universal problem for all !llc machines, and that we
    just haven't yet noticed on g33, gen4 and gen5 machines.
Comment 5 Martin Peres 2017-09-08 09:32:08 UTC
(In reply to Chris Wilson from comment #4)
> For a similar issue within the kernel (i.e. not this test but others):
> commit c5ba5b24657e473b1c64b0a614b168a635a2c935
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu Sep 7 19:45:20 2017 +0100
> 
>     drm/i915: Apply the GTT write flush for all !llc machines
>     
>     We also see the delayed GTT write issue on i915g/i915gm, so let's
>     presume that it is a universal problem for all !llc machines, and that we
>     just haven't yet noticed on g33, gen4 and gen5 machines.

This does not seem to be sufficient...

https://intel-gfx-ci.01.org/cibuglog/index.html%3Faction_failures_history=-1&failures_test=igt@prime_vgem@basic-gtt&failures_machine=fi-gdg-551.html
Comment 6 Chris Wilson 2017-09-08 09:55:23 UTC
(In reply to Martin Peres from comment #5)
> (In reply to Chris Wilson from comment #4)
> > For a similar issue within the kernel (i.e. not this test but others):
> > commit c5ba5b24657e473b1c64b0a614b168a635a2c935
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Thu Sep 7 19:45:20 2017 +0100
> > 
> >     drm/i915: Apply the GTT write flush for all !llc machines
> >     
> >     We also see the delayed GTT write issue on i915g/i915gm, so let's
> >     presume that it is a universal problem for all !llc machines, and that we
> >     just haven't yet noticed on g33, gen4 and gen5 machines.
> 
> This does not seem to be sufficient...

It's for the same issue, but inside the kernel (it should show up in the pwrite tests and some reloc tests). Here userspace is able to expose the coherency issue using two mmaps that are expected to be coherent, and there is no kernel intervention in the test. There's an equivalent test in gem_mmap_gtt that shows how the same issue can be detected with pure i915 mmaps.

It is theoretically possible to prevent it, but we would have to trap every fault and do exclusive domain management. We would have to intercept all contending mmaps -- challenging.
Comment 7 Chris Wilson 2017-09-21 16:18:30 UTC
commit f86dc17cfc81f53f3bf216009ffda1ac05208bcc (upstream/master, origin/master,
 origin/HEAD)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Sep 7 15:24:23 2017 +0100

    igt/prime_vgem: Split out the fine-grain coherency check
Comment 8 Martin Peres 2017-09-21 18:32:10 UTC
Verified to be fixed! Thanks a lot Chris!

