This test seems to fail sporadically with a couple of different errors. The most common one is:

(gem_exec_flush:6041) ioctl-wrappers-CRITICAL: Test assertion failure function gem_execbuf, file ioctl_wrappers.c:589:
(gem_exec_flush:6041) ioctl-wrappers-CRITICAL: Failed assertion: __gem_execbuf(fd, execbuf) == 0
(gem_exec_flush:6041) ioctl-wrappers-CRITICAL: error: -22 != 0

But looking through the CI history, it appears there's also sometimes:

(gem_exec_flush:6131) CRITICAL: Test assertion failure function batch, file gem_exec_flush.c:456:
(gem_exec_flush:6131) CRITICAL: Failed assertion: map[i] == cycles + i
(gem_exec_flush:6131) CRITICAL: error: 0xabcdabcd != 0x3

CI history: /archive/results/CI_IGT_test/igt@gem_exec_flush@basic-batch-kernel-default-cmd.html
The only problem here is the sporadic failure - and that is mostly due to the overhead of the CI kernels hiding the issue. Since we are under severe time constraints for BAT, making the tests longer to improve detection rates is also problematic. Stuck between a rock and a hard place!
This test fails quite often on BYT:

/archive/results/CI_IGT_test/RO_CI_DRM_365/fi-byt-n2820/html/fi-byt-n2820@RO_CI_DRM_365@1/igt@gem_exec_flush@basic-batch-kernel-default-cmd.html
/archive/results/CI_IGT_test/RO_CI_DRM_365/ro-byt-n2820/html/ro-byt-n2820@RO_CI_DRM_365@1/igt@gem_exec_flush@basic-batch-kernel-default-cmd.html

Also attaching dmesg logs.
Created attachment 123671 [details] dmesg fi-byt-n2820
Created attachment 123672 [details] dmesg ro-byt-n2820
Another instance: http://gfxci.rb.intel.com/archive/results/CI_IGT_test/RO_Patchwork_993/ro-byt-n2820/html/ro-byt-n2820@RO_Patchwork_993@1/igt@gem_exec_flush@basic-batch-kernel-default-cmd.html
The second of the dmesg logs that Daniela posted contains the line:

[ 313.349534] [drm:i915_parse_cmds] CMD: Command length exceeds batch length: 0x7FDEE770 length=114 batchlen=4

I can't see anywhere in the i-g-t tests that submits such a batch; firstly, the length is not a multiple of 8, whereas we normally pad them to an even DWord, and secondly, that hex number doesn't appear to be a valid instruction. Is the parser perhaps picking up undefined data? That would explain why we see these failures only on BYT, and only intermittently.

.Dave.
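To make the two observations above concrete, here is a minimal sketch of the checks being described. This is not the i915 command parser code: the helper names `cmd_fits` and `is_multiple_of_8` are hypothetical, and it assumes both values are expressed in the same unit, which the log line itself does not confirm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does a command of the given length fit inside
 * what remains of the batch? Assumes both values use the same unit. */
static bool cmd_fits(uint32_t cmd_length, uint32_t batch_remaining)
{
    return cmd_length <= batch_remaining;
}

/* Hypothetical helper: is the length a multiple of 8, as a padded
 * batch would normally be? */
static bool is_multiple_of_8(uint32_t length)
{
    return (length % 8u) == 0;
}
```

With the values from the dmesg line, `cmd_fits(114, 4)` is false and `is_multiple_of_8(114)` is also false, which is consistent with the parser having decoded a length from data that was never a real command header.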
Yes. For the cmdparser there are two sources of incoherency: first, writes from the CPU cache to memory are not being ordered with mfence; clflush; mfence, and second, writes through the GTT are not immediately coherent. More details, ideas and patches are on the mailing list from last year and in other bugs that are even older.
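For readers unfamiliar with the mfence; clflush; mfence ordering mentioned above, a userspace sketch of that sequence might look like the following. It is only an illustration of the pattern, not the kernel's implementation: the function name `flush_range` and the 64-byte cache line size are assumptions, and the non-x86 fallback is just a full barrier so the sketch still compiles elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_mfence, _mm_clflush */
#endif

#define CACHELINE 64    /* assumed cache line size in bytes */

/* Hypothetical helper: flush [addr, addr + len) from the CPU cache,
 * with a fence before and after the flushes to order them. */
static void flush_range(const void *addr, size_t len)
{
#if defined(__SSE2__)
    /* Round down to the start of the first cache line. */
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
    uintptr_t end = (uintptr_t)addr + len;

    _mm_mfence();                       /* order earlier stores */
    for (; p < end; p += CACHELINE)
        _mm_clflush((const void *)p);   /* write back and evict the line */
    _mm_mfence();                       /* order the flushes themselves */
#else
    /* No clflush available: fall back to a full memory barrier only. */
    (void)addr;
    (void)len;
    __sync_synchronize();
#endif
}
```

A caller would fill a buffer and then call `flush_range(buf, sizeof(buf))` before handing it to a reader that bypasses the CPU cache. The flush does not change the buffer contents as seen by the CPU.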
commit 3b5724d702ef24ee41ca008a1fab1cf94f3d31b5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Aug 18 17:16:49 2016 +0100

drm/i915: Wait for writes through the GTT to land before reading back

If we quickly switch from writing through the GTT to a read of the physical page directly with the CPU (e.g. performing relocations through the GTT and then running the command parser), we can observe that the writes are not visible to the CPU. It is not a coherency problem, as extensive investigations with clflush have demonstrated, but a mere timing issue - we have to wait for the GTT to complete its write before we start our read from the CPU.