CI history suggests it is sporadic since CI_DRM_1062. Results for igt@gem_sync@basic-bsd Overview Result: dmesg-fail Back to summary Details Detail Value Returncode 99 Time 0:00:10.400008 Stdout IGT-Version: 1.13-gd627e30 (x86_64) (Linux: 4.5.0-rc5-gfxbench+ x86_64) Completed 147082 cycles: 67.990 us Stack trace: #0 [__igt_fail_assert+0x101] #1 [__real_main106+0x2f7] #2 [main+0x23] #3 [__libc_start_main+0xf0] #4 [_start+0x29] Subtest basic-bsd: FAIL (10.004s) Stderr (gem_sync:5862) CRITICAL: Test assertion failure function sync_ring, file gem_sync.c:103: (gem_sync:5862) CRITICAL: Failed assertion: intel_detect_and_clear_missed_interrupts(fd) == 0 (gem_sync:5862) CRITICAL: error: 4 != 0 Subtest basic-bsd failed. **** DEBUG **** (gem_sync:5862) ioctl-wrappers-DEBUG: Test requirement passed: gem_has_ring(fd, ring_id) (gem_sync:5862) INFO: Completed 147082 cycles: 67.990 us (gem_sync:5862) CRITICAL: Test assertion failure function sync_ring, file gem_sync.c:103: (gem_sync:5862) CRITICAL: Failed assertion: intel_detect_and_clear_missed_interrupts(fd) == 0 (gem_sync:5862) CRITICAL: error: 4 != 0 **** END **** Environment PIGLIT_SOURCE_DIR="/opt/igt/piglit" PIGLIT_PLATFORM="mixed_glx_egl" Command /opt/igt/tests/gem_sync --run-subtest basic-bsd dmesg [ 226.822964] [drm:i915_hangcheck_elapsed [i915]] *ERROR* Hangcheck timer elapsed... bsd ring idle
History suggests that we never fixed it, the odd bug report filed here kept being closed as inconclusive. It appears I have a ilk (x201s) machine that reasonably reliably reproduces it on -nightly but my breadcrumbs kernel seems clean. Promising.
*** Bug 78681 has been marked as a duplicate of this bug. ***
Created attachment 122009 [details] [review] Only start retire worker when idle The surprise winner.
Just the error message is being suppressed and it barely affects the timings.
static void gen5_seqno_barrier(struct intel_engine_cs *ring) { udelay(100); } incoherent MI_STORE* for the win.
75us is as low as I can go and not start seeing missed seqno writes. An uncached mmio read + clflush + mfences on this machine is <500ns so the only explanation is that the GPU adds a significant buffer to the MI_STORE command that we cannot flush. It's also not documented whether poking SyncFlush flushes the internal MI_STORE buffer (seems unlikely).
SyncFlush takes ~10-20us but does not wait for the MI_STORE to complete, i.e. the problem persists.
Sighted on HSW igt@gem_storedw_loop@basic-blt as well now: /archive/results/CI_IGT_test/Patchwork_1521/
(In reply to Tvrtko Ursulin from comment #8) > Sighted on HSW igt@gem_storedw_loop@basic-blt as well now: > > /archive/results/CI_IGT_test/Patchwork_1521/ Likely related (hard to tell without actual access, there are some potential false positives in that hangcheck path as well) but since the hw is quite different, it would be best to keep them separate. The Ironlake bug is definitely that we don't apply a seqno-barrier for the BSD engine (but we do for the render path hence the discrepancy). Whereas for HSW, it is possible that the existing barrier is not long enough for that machine, or we have one of the false positives etc.
Since CI_DRM_1064, test gem_storedw_loop@basic-blt is passing on HSW [1] On ILK, test gem_sync@basic-bsd still fails intermittently [2] in CI_DRM_1113 and CI_DRM_1125 On il5-i5-650 in Ro Jenkins, the gem_sync@basic-bsd test is passing for several builds, archive/results/CI_IGT_test/ilk1-i5-650.html [1] /archive/results/CI_IGT_test/igt@gem_storedw_loop@basic-blt.html [2] /archive/results/CI_IGT_test/igt@gem_sync@basic-bsd.html
assigning to Gabriel to investigate
The following could be related: http://benchsrv.fi.intel.com/archive/results/CI_IGT_test/Patchwork_1535/ilk-hp8440p/html/ilk-hp8440p@Patchwork_1535@1/igt@gem_sync@basic-all.html
Any tricks to reproduce this on a ILK? I tried on 2 different ILKs (i5-650), using something like: $for i in {1..100}; do sudo ./gem_sync --run-subtest basic-bsd; done Both showed a 100% success ratio. The same with basic-all.
It's a low level timing issue that will be dependent upon the GPU clock speeds, CPU clock speeds and the memory clocks -- i.e. different machines will have different error rates. What you can try is restricting the CPU to c0, disable rc6, or keep swapping machines until you find one that is easy to reproduce!
Another instance for the basic-each subtest: http://benchsrv.fi.intel.com//archive/results/CI_IGT_test/Patchwork_1967/ilk-hp8440p/html/ilk-hp8440p@Patchwork_1967@1/igt@gem_sync@basic-each.html IGT-Version: 1.14-gf650aa2 (x86_64) (Linux: 4.6.0-rc4-CI-Patchwork_1967+ x86_64) bsd completed 49152 cycles: 203.635 us render completed 142336 cycles: 70.493 us Stack trace: #0 [__igt_fail_assert+0x101] #1 [sync_ring+0x301] #2 [__real_main204+0xe8] #3 [main+0x23] #4 [__libc_start_main+0xf0] #5 [_start+0x29] Subtest basic-each: FAIL (10.075s) (gem_sync:5959) CRITICAL: Test assertion failure function sync_ring, file gem_sync.c:138: (gem_sync:5959) CRITICAL: Failed assertion: intel_detect_and_clear_missed_interrupts(fd) == 0 (gem_sync:5959) CRITICAL: error: 4 != 0 Subtest basic-each failed. **** DEBUG **** (gem_sync:5959) CRITICAL: Test assertion failure function sync_ring, file gem_sync.c:138: (gem_sync:5959) CRITICAL: Failed assertion: intel_detect_and_clear_missed_interrupts(fd) == 0 (gem_sync:5959) CRITICAL: error: 4 != 0 **** END ****
priority aligned for igt basic tests on gen7 to High+Critical
Should be "fixed" now... commit f8973c217f07903247d222ab92ad37e2529aff2e Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Jul 1 17:23:21 2016 +0100 drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk)
Closing as fixed -confirmed passing at least since CI_DRM_1649, more than 2 weeks ago (CI_IGT_test/fi-ilk-m540.html & CI_IGT_test/fi-ilk-650.html)
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.