When running CI_DRM_3051, the machine fi-gdg-551 failed the following assertion while running igt@prime_vgem@basic-gtt:

(prime_vgem:2359) CRITICAL: Test assertion failure function test_gtt, file prime_vgem.c:244:
(prime_vgem:2359) CRITICAL: Failed assertion: gtt[1024*i] == ~i
(prime_vgem:2359) CRITICAL: error: 135 != -136
Subtest basic-gtt failed.

Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3051/fi-gdg-551/igt@prime_vgem@basic-gtt.html
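For anyone decoding the numbers in that message: assuming the log prints "actual != expected" as signed 32-bit integers, -136 is ~i for i = 135, so the read returned the plain index (135) instead of its complement. A minimal sketch of that arithmetic follows; the helper names are made up for illustration and are not part of prime_vgem.c.

```c
#include <stdint.h>

/* Hypothetical helper: the value the assertion expects at index i,
 * i.e. the bitwise NOT of i viewed as a signed 32-bit integer.
 * The unsigned-to-signed conversion is implementation-defined before C23,
 * but yields the two's-complement value on common compilers. */
static int32_t expected_for_index(uint32_t i)
{
    return (int32_t)~i;
}

/* Hypothetical helper: recover the loop index from the expected value
 * printed in the log, since ~(~i) == i. */
static uint32_t index_from_expected(int32_t expected)
{
    return ~(uint32_t)expected;
}
```

Calling `index_from_expected(-136)` gives the loop index at which the check failed; the value actually read (135) matches that index, which would be consistent with a stale read rather than random corruption.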
This is just telling us that the GTT/WC on i915g isn't fully coherent: a write into the GTT isn't immediately visible to a read through WC. Very similar to gem_mmap_gtt/coherency (which also fails), but with different timing due to using WC rather than WB.

Not sure what we can do to prevent userspace falling into this trap. Ban PRIME on i915g? That seems overkill -- the same coherency issue is there on byt/bsw/bxt/glk, we just haven't hit it on this path.

Move the fine grained coherency check (i.e. concurrent access to vgem and i915 mmaps) to another test and accept the failure.
(In reply to Chris Wilson from comment #1)
> Move the fine grained coherency check (i.e. concurrent access to vgem and
> i915 mmaps) to another test and accept the failure.

That sounds like the best possible solution to me!
*** Bug 102066 has been marked as a duplicate of this bug. ***
For a similar issue within the kernel (i.e. not this test but others):

commit c5ba5b24657e473b1c64b0a614b168a635a2c935
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Sep 7 19:45:20 2017 +0100

    drm/i915: Apply the GTT write flush for all !llc machines

    We also see the delayed GTT write issue on i915g/i915gm, so let's
    presume that it is a universal problem for all !llc machines, and that we
    just haven't yet noticed on g33, gen4 and gen5 machines.
(In reply to Chris Wilson from comment #4)
> For a similar issue within the kernel (i.e. not this test but others):
> commit c5ba5b24657e473b1c64b0a614b168a635a2c935
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Thu Sep 7 19:45:20 2017 +0100
>
> drm/i915: Apply the GTT write flush for all !llc machines
>
> We also see the delayed GTT write issue on i915g/i915gm, so let's
> presume that it is a universal problem for all !llc machines, and that we
> just haven't yet noticed on g33, gen4 and gen5 machines.

This does not seem to be sufficient...

https://intel-gfx-ci.01.org/cibuglog/index.html%3Faction_failures_history=-1&failures_test=igt@prime_vgem@basic-gtt&failures_machine=fi-gdg-551.html
(In reply to Martin Peres from comment #5)
> (In reply to Chris Wilson from comment #4)
> > For a similar issue within the kernel (i.e. not this test but others):
> > commit c5ba5b24657e473b1c64b0a614b168a635a2c935
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date: Thu Sep 7 19:45:20 2017 +0100
> >
> > drm/i915: Apply the GTT write flush for all !llc machines
> >
> > We also see the delayed GTT write issue on i915g/i915gm, so let's
> > presume that it is a universal problem for all !llc machines, and that we
> > just haven't yet noticed on g33, gen4 and gen5 machines.
>
> This does not seem to be sufficient...

That commit is for the same issue but inside the kernel (it should show up in the pwrite tests and some reloc tests). Here userspace is able to expose the coherency issue using two mmaps that are expected to be coherent; there's no kernel intervention in the test. There's an equivalent test in gem_mmap_gtt to show how the same issue can be detected using pure i915 mmappings.

It is theoretically possible to prevent it: we would have to trap every fault and do exclusive domain management. We would have to intercept all contending mmaps -- challenging.
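To make "two mmaps that are expected to be coherent" concrete, here is a rough sketch of the interleaved write/read pattern, not the IGT test itself. It maps an ordinary temporary file twice instead of using the vgem and i915 GTT mappings, so on a normal machine it should always pass; the function names, the probe count, and the use of `tmpfile()` are all assumptions made for illustration. Only the stride of 1024 dwords and the `i` / `~i` values are taken from the failed assertion.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define STRIDE 1024u /* dwords between probes, as in gtt[1024*i] */
#define COUNT  256u  /* number of probes; arbitrary choice for this sketch */

/* Ping-pong between two views of the same memory with no kernel call in
 * between. Returns -1 if every read saw the latest write through the other
 * view, otherwise the index of the first stale read. */
static int check_fine_grained_coherency(volatile uint32_t *a,
                                        volatile uint32_t *b)
{
    for (uint32_t i = 0; i < COUNT; i++) {
        a[STRIDE * i] = i;          /* write through view A */
        __sync_synchronize();       /* full barrier (GCC/Clang builtin) */
        if (b[STRIDE * i] != i)     /* read back through view B */
            return (int)i;

        b[STRIDE * i] = ~i;         /* write through view B */
        __sync_synchronize();
        if (a[STRIDE * i] != ~i)    /* read back through view A */
            return (int)i;
    }
    return -1;
}

/* Stand-in for the two device mappings: map one temporary file twice.
 * Returns 0 on success, -1 on failure. */
static int map_twice(uint32_t **a, uint32_t **b)
{
    size_t size = (size_t)STRIDE * COUNT * sizeof(uint32_t);
    FILE *f = tmpfile();
    if (!f)
        return -1;

    int fd = fileno(f);
    if (ftruncate(fd, (off_t)size) != 0) {
        fclose(f);
        return -1;
    }

    void *pa = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *pb = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    fclose(f); /* the mappings keep the file contents alive */

    if (pa == MAP_FAILED || pb == MAP_FAILED)
        return -1;

    *a = pa;
    *b = pb;
    return 0;
}
```

With page-cache-backed mappings like these, `check_fine_grained_coherency()` is expected to return -1. The failure in this bug would correspond to the second read returning the old `i` when one view is a GTT mapping and the other is a WC mapping on a !llc machine, which this stand-in cannot reproduce.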
commit f86dc17cfc81f53f3bf216009ffda1ac05208bcc (upstream/master, origin/master, origin/HEAD)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Sep 7 15:24:23 2017 +0100

    igt/prime_vgem: Split out the fine-grain coherency check
Verified to be fixed! Thanks a lot Chris!