https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_221/fi-icl-u2/igt@gem_tiled_pread_pwrite.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_221/fi-icl-u2/igt@gem_pwrite_pread@uncached-pwrite-blt-gtt_mmap-performance.html
Received signal SIGQUIT.
Stack trace:
  #0 [fatal_sig_handler+0xd5]
  #1 [killpg+0x40]
  #2 [memcpy_from_wc_sse41+0x184]
  #3 [copy_wc_page+0x28]
  #4 [__real_main108+0x1c8]
  #5 [main+0x44]
  #6 [__libc_start_main+0xe7]
  #7 [_start+0x2a]
The CI Bug Log issue associated to this bug has been updated.
### New filters associated
* ICL: igt@gem_* - timeout - Received signal SIGQUIT
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_221/fi-icl-u2/igt@gem_tiled_pread_pwrite.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_221/fi-icl-u2/igt@gem_pwrite_pread@uncached-pwrite-blt-gtt_mmap-performance.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_224/fi-icl-u2/igt@gem_mmap_gtt@big-copy-xy.html
big-copy and gtt-mmap-performance are expected to be fairly slow, so it's not surprising that they may time out (and so we probably shouldn't conflate the bug reports). gem_tiled_pread_pwrite should only take about 10s. It's pretty much as if the CPU throttled itself. There's nothing here that would vary between runs.
Won't fix then?
These sporadic pauses shouldn't be happening, and I don't know why they are happening. I think they are external to i915, but I just can't be sure... The slow test cases that only exist to give perf metrics we can (and will) drop from CI (in exchange for dedicated perf metrics???) but there's a wider issue here that seems to be affecting icl at large.
A CI Bug Log filter associated to this bug has been updated:
{- ICL: igt@gem_* - timeout - Received signal SIGQUIT -}
{+ ICL: igt@gem_* - timeout - Received signal SIGQUIT +}
New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_229/fi-icl-u2/igt@gem_linear_blits@interruptible.html
A CI Bug Log filter associated to this bug has been updated:
{- ICL: igt@gem_* - timeout - Received signal SIGQUIT -}
{+ ICL: igt@gem_* - timeout - Received signal SIGQUIT +}
New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5668/fi-icl-y/igt@gem_mmap_gtt@basic-small-copy.html
A CI Bug Log filter associated to this bug has been updated:
{- ICL: igt@gem_* - timeout - Received signal SIGQUIT -}
{+ ICL: igt@gem_* - timeout - Received signal SIGQUIT +}
New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_232/fi-icl-u2/igt@gem_linear_blits@normal.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_232/fi-icl-u2/igt@gem_tiled_blits@interruptible.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_232/fi-icl-u3/igt@gem_linear_blits@normal.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_232/fi-icl-u3/igt@gem_tiled_blits@interruptible.html
A CI Bug Log filter associated to this bug has been updated:
{- ICL: igt@gem_* - timeout - Received signal SIGQUIT -}
{+ ICL: igt@ random tests - timeout - Received signal SIGQUIT +}
New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4862/fi-icl-u3/igt@kms_plane@pixel-format-pipe-c-planes-source-clamping.html
A CI Bug Log filter associated to this bug has been updated:
{- ICL: igt@ random tests - timeout - Received signal SIGQUIT -}
{+ ICL: Random tests - timeout - Received signal SIGQUIT +}
No new failures caught with the new filter
A CI Bug Log filter associated to this bug has been updated:
{- ICL: Random tests - timeout - Received signal SIGQUIT -}
{+ ICL: Random tests - timeout - Received signal SIGQUIT +}
New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_235/fi-icl-u3/igt@gem_pwrite@big-gtt-fbr.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_235/fi-icl-y/igt@sw_sync@sync_expired_merge.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_235/fi-icl-y/igt@gem_mmap_gtt@forked-big-copy-odd.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_234/fi-icl-u3/igt@gem_tiled_fence_blits@normal.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_234/fi-icl-u3/igt@kms_plane@pixel-format-pipe-b-planes-source-clamping.html
(In reply to CI Bug Log from comment #10)
> A CI Bug Log filter associated to this bug has been updated:
>
> {- ICL: Random tests - timeout - Received signal SIGQUIT -}
> {+ ICL: Random tests - timeout - Received signal SIGQUIT +}
>
> New failures caught by the filter:
>
> * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_235/fi-icl-u3/igt@gem_pwrite@big-gtt-fbr.html
> * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_235/fi-icl-y/igt@sw_sync@sync_expired_merge.html
> * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_235/fi-icl-y/igt@gem_mmap_gtt@forked-big-copy-odd.html
> * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_234/fi-icl-u3/igt@gem_tiled_fence_blits@normal.html
> * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_234/fi-icl-u3/igt@kms_plane@pixel-format-pipe-b-planes-source-clamping.html
These failures are timeouts.
A CI Bug Log filter associated to this bug has been updated:
{- ICL: Random tests - timeout - Received signal SIGQUIT -}
{+ ICL: all tests - timeout - Received signal SIGQUIT +}
New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5721/shard-iclb8/igt@gem_tiled_blits@normal.html
* https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4877/shard-iclb5/igt@gem_partial_pwrite_pread@write-display.html
* https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4877/shard-iclb7/igt@perf_pmu@semaphore-wait-idle-vecs0.html
* https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4877/shard-iclb7/igt@gem_mmap_gtt@big-copy.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5724/shard-iclb7/igt@perf_pmu@busy-accuracy-50-vecs0.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5725/shard-iclb5/igt@gem_pwrite@big-cpu-fbr.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5727/shard-iclb2/igt@gem_partial_pwrite_pread@write.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5727/shard-iclb7/igt@perf_pmu@semaphore-wait-idle-vecs0.html
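For readers unfamiliar with how such a filter selects failures, here is a minimal sketch of the widening seen in this thread (from igt@gem_* to all tests). The `Filter` fields and the `matches` helper are hypothetical and only model the three parts visible in the filter descriptions (platform/test scope, status, output text). They are not the real CI Bug Log schema or API.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Filter:
    """Hypothetical model of a CI Bug Log filter (not the real schema)."""
    machine: str  # glob on the machine name, e.g. "*icl*"
    test: str     # glob on the test name, e.g. "igt@gem_*"
    status: str   # exact result status, e.g. "timeout"
    output: str   # substring expected in the test output


def matches(f: Filter, machine: str, test: str, status: str, output: str) -> bool:
    """Return True when a result satisfies every part of the filter."""
    return (
        fnmatchcase(machine, f.machine)
        and fnmatchcase(test, f.test)
        and status == f.status
        and f.output in output
    )


# The narrow filter from the start of the thread and the widened one.
gem_only = Filter("*icl*", "igt@gem_*", "timeout", "Received signal SIGQUIT")
all_tests = Filter("*icl*", "igt@*", "timeout", "Received signal SIGQUIT")

# A kms_plane timeout is missed by the narrow filter but caught once widened.
kms = ("fi-icl-u3",
       "igt@kms_plane@pixel-format-pipe-c-planes-source-clamping",
       "timeout",
       "Received signal SIGQUIT.")
print(matches(gem_only, *kms), matches(all_tests, *kms))
```

Under this model, widening the test glob is what lets non-GEM tests such as kms_plane, sw_sync and perf_pmu land in the same bug.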
(In reply to Chris Wilson from comment #4)
> These sporadic pauses shouldn't be happening, and I don't know why they are
> happening. I think they are external to i915, but I just can't be sure...
>
> The slow test cases that only exist to give perf metrics we can (and will)
> drop from CI (in exchange for dedicated perf metrics???) but there's a wider
> issue here that seems to be affecting icl at large.
There was an IRQ storm caused by a BIOS issue. Now we only see the issue on icl-y. Jani, can you check if we disabled the faulty i2c controller on fi-icl-y, since we cannot update the BIOS?
Hi, no, we have not updated anything in the BIOS on icl-y. Should we?
We should update the BIOS here too.
We did not update the BIOS, but we confirmed with a Core team member that there was an IRQ storm on this machine as well, and a workaround (disabling some BIOS setting) is now in use to get rid of it. Hopefully ICL-Y now also works more reliably.
To follow up on that: on ICL-Y, I2C4 and I2C5 are now disabled.
And the issue has not been seen since. Of course it's been only three days, so it's too early to celebrate.
The SIGQUITs are worth writing off as
commit 3970564940ba0322bcefce7fd8fd35c2b85846bf
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue May 7 13:11:08 2019 +0100

    drm/i915: Stop spinning for DROP_IDLE (debugfs/i915_drop_caches)

    If the user is racing a call to debugfs/i915_drop_caches with ongoing
    submission from another thread/process, we may never end up idling the
    GPU and be uninterruptibly spinning in debugfs/i915_drop_caches trying
    to catch an idle moment.

    Just flush the work once, that should be enough to park the system
    under correct conditions. Outside of those we either have a driver bug
    or the user is racing themselves. Sadly, because the user may be
    provoking the unwanted situation we can't put a warn here to attract
    attention to a probable bug.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190507121108.18377-4-chris@chris-wilson.co.uk

Unless any remain...
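The difference the commit message describes (spinning until an idle moment versus flushing once) can be illustrated with a toy model. This is only a sketch, not the i915 code: `ToyGpu`, `drop_idle_spinning` and `drop_idle_flush_once` are hypothetical names, and the spin loop is capped at `max_iterations` so the example terminates, whereas the behaviour described in the commit message had no such cap.

```python
from typing import Optional


class ToyGpu:
    """Toy stand-in for a GPU work queue; not a model of real hardware."""

    def __init__(self, racing_submitter: bool) -> None:
        self.pending = 1
        self.racing_submitter = racing_submitter

    def flush(self) -> None:
        """Retire all pending work; a racing submitter immediately queues more."""
        self.pending = 0
        if self.racing_submitter:
            self.pending = 1

    def idle(self) -> bool:
        return self.pending == 0


def drop_idle_spinning(gpu: ToyGpu, max_iterations: int) -> Optional[int]:
    """Flush repeatedly until the GPU is seen idle.

    Returns the number of flushes needed, or None if the cap was reached
    (the toy equivalent of spinning forever).
    """
    for attempt in range(1, max_iterations + 1):
        gpu.flush()
        if gpu.idle():
            return attempt
    return None


def drop_idle_flush_once(gpu: ToyGpu) -> bool:
    """Flush exactly once and report whether the GPU ended up idle."""
    gpu.flush()
    return gpu.idle()


# Without a racing submitter both approaches finish after one flush.
print(drop_idle_spinning(ToyGpu(racing_submitter=False), 1000))
# With one, the spinning variant never observes idle; flush-once still returns.
print(drop_idle_spinning(ToyGpu(racing_submitter=True), 1000))
print(drop_idle_flush_once(ToyGpu(racing_submitter=True)))
```

In this model the flush-once variant always returns promptly, at the cost of possibly returning while work is still queued, which is the trade-off the commit message accepts.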
CI is still reporting this error, see e.g. https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6179/re-icl-u/igt@gem_mmap_gtt@forked-medium-copy-odd.html Should we reopen this or is it another issue?
(In reply to Francesco Balestrieri from comment #20)
> CI is still reporting this error, see e.g.
>
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6179/re-icl-u/igt@gem_mmap_gtt@forked-medium-copy-odd.html
>
> Should we reopen this or is it another issue?
That's a very very particular issue and not random at all. I thought we had it logged already.
OK. For completeness, there is also https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6179/re-icl-u/igt@gem_mmap_gtt@forked-big-copy-xy.html
And https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6179/re-icl-u/igt@gem_mmap_gtt@forked-big-copy.html
Still happening: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_367/fi-icl-u4/igt@perf_pmu@cpu-hotplug.html
(In reply to Martin Peres from comment #24)
> Still happening:
> https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_367/fi-icl-u4/igt@perf_pmu@cpu-hotplug.html
Never mind, this is another bug!
The CI Bug Log issue associated to this bug has been archived. New failures matching the above filters will not be associated to this bug anymore.