https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_224/fi-icl-u3/igt@gem_mmap_gtt@hang.html

Starting subtest: hang
Subtest hang failed.
**** DEBUG ****
(gem_mmap_gtt:2672) drmtest-DEBUG: Test requirement passed: is_i915_device(fd) && has_known_intel_chipset(fd)
(gem_mmap_gtt:2672) igt_debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(gem_mmap_gtt:2672) ioctl_wrappers-DEBUG: Test requirement passed: dir >= 0
(gem_mmap_gtt:2672) ioctl_wrappers-DEBUG: Test requirement passed: err == 0
(gem_mmap_gtt:2672) i915/gem_context-DEBUG: Test requirement passed: has_ban_period || has_bannable
(gem_mmap_gtt:2672) igt_gt-DEBUG: Test requirement passed: has_gpu_reset(fd)
(gem_mmap_gtt:2672) DEBUG: Test requirement passed: igt_sysfs_set_parameter(fd, "reset", "1")
(gem_mmap_gtt:2672) igt_debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(gem_mmap_gtt:2672) INFO: 1099 resets
(gem_mmap_gtt:2672) igt_core-INFO: Timed out waiting for children
**** END ****
Subtest hang: FAIL (7.417s)
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@gem_mmap_gtt@hang - fail - Timed out waiting for children
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_224/fi-icl-u3/igt@gem_mmap_gtt@hang.html
https://patchwork.freedesktop.org/patch/286884/
Oh you reported the icl bogosity and not the genuine bug. Forget about icl, pnv/blb is broken.
I am ignoring the icl as that is not interesting (just another clock drift)...

commit 43a8f684b6d1e16c6ecf918332f9b35686bf7edd (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Feb 21 10:29:19 2019 +0000

    drm/i915: Reorder struct_mutex-vs-reset_lock in i915_gem_fault()

    Annoyingly, struct_mutex was not entirely eliminated from the reset
    pathway; for reasons of its own, intel_display_resume() requires
    struct_mutex to prepare the planes it already captured. To avoid the
    immediate problem of a deadlock between the struct_mutex and the reset
    srcu, we have to acquire the reset_lock before struct_mutex in
    i915_gem_fault(). Now any wait underneath struct_mutex will result us in
    having to forcibly reset all inflight rendering, less than ideal, but
    better than a deadlock (and will do for the short term).

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190221102924.13442-1-chris@chris-wilson.co.uk

Was the bug that should have been reported!
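The ordering rule described in that commit message can be illustrated with a small lock-rank check. This is only a sketch under stated assumptions: the ranks, `struct task_locks`, `take_lock()` and the two `fault_path_*` helpers are hypothetical names made up for illustration, not i915 code, and the "old" order is what the commit message implies rather than something quoted in this thread.

```c
#include <stdbool.h>

/* Hypothetical lock ranks: a lower rank must always be taken first. */
enum lock_rank {
	RANK_RESET_LOCK = 1,   /* stand-in for the reset srcu / reset_lock */
	RANK_STRUCT_MUTEX = 2, /* stand-in for struct_mutex */
};

/* Tracks the highest rank currently held by one simulated task. */
struct task_locks {
	int highest_held;
};

/* Records the lock and returns true only if taking it respects the order. */
bool take_lock(struct task_locks *t, enum lock_rank rank)
{
	if ((int)rank <= t->highest_held)
		return false; /* would invert the agreed order */
	t->highest_held = rank;
	return true;
}

/* Order described by the commit: reset_lock first, then struct_mutex. */
bool fault_path_new_order(void)
{
	struct task_locks t = { 0 };

	return take_lock(&t, RANK_RESET_LOCK) &&
	       take_lock(&t, RANK_STRUCT_MUTEX);
}

/* Inverse order (assumed to be the pre-commit behaviour): rejected. */
bool fault_path_old_order(void)
{
	struct task_locks t = { 0 };

	return take_lock(&t, RANK_STRUCT_MUTEX) &&
	       take_lock(&t, RANK_RESET_LOCK);
}
```

If the reset path and the fault path both take the locks in the same rank order, neither can end up holding one lock while waiting for the other in the opposite order, which is the deadlock the commit is avoiding.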
(In reply to Chris Wilson from comment #4)
> I am ignoring the icl as that is not interesting (just another clock
> drift)...
>
> commit 43a8f684b6d1e16c6ecf918332f9b35686bf7edd (HEAD ->
> drm-intel-next-queued, drm-intel/drm-intel-next-queued)
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Thu Feb 21 10:29:19 2019 +0000
>
> drm/i915: Reorder struct_mutex-vs-reset_lock in i915_gem_fault()
>
> Annoyingly, struct_mutex was not entirely eliminated from the reset
> pathway; for reasons of its own, intel_display_resume() requires
> struct_mutex to prepare the planes it already captured. To avoid the
> immediate problem of a deadlock between the struct_mutex and the reset
> srcu, we have to acquire the reset_lock before struct_mutex in
> i915_gem_fault(). Now any wait underneath struct_mutex will result us in
> having to forcibly reset all inflight rendering, less than ideal, but
> better than a deadlock (and will do for the short term).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Link: https://patchwork.freedesktop.org/patch/msgid/20190221102924.13442-1-chris@chris-wilson.co.uk
>
> Was the bug that should have been reported!

Thanks for fixing this! However, since this bug was ICL-specific, I'm re-opening it. We know we still need to investigate this timer wonkiness...
Also visible in shards! Bumping the priority!
This commit:

commit 79ffac8599c4d8aa84d313920d3d86d7361c252b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Apr 24 21:07:17 2019 +0100

    drm/i915: Invert the GEM wakeref hierarchy

should make the issue disappear from the test results. We still don't know what causes the sudden slowdown of ICL (also seen elsewhere). Let's continue monitoring this before resolving, but in any case at least for this particular case it should be sporadic and transient enough to have basically no user impact.
Long time no see. Let's pretend we did manage to remove a delay with the new and improved reset flush.
This issue is happening very regularly https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_377/fi-icl-u2/igt@gem_mmap_gtt@hang.html
(In reply to Chris Wilson from comment #8)
> Long time no see. Let's pretend we did manage to remove a delay with the new
> and improved reset flush.

(In reply to Lakshmi from comment #9)
> This issue is happening very regularly
> https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_377/fi-icl-u2/igt@gem_mmap_gtt@hang.html

There is a similar failure on PNV as well:
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_374/fi-pnv-d510/igt@gem_mmap_gtt@hang.html

Are these two failures the same as each other, and are they different from the original bug?
Optimistically,

commit 3499c5eb17054e2abd88023fe962768140d24302 (upstream/master, origin/master, origin/HEAD)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 24 13:15:03 2019 +0100

    i915/gem_map_gtt: Escape from slow forked GTT access

    Beware the slithy t'oves.

    Forked GTT access on icl is notoriously slow, so rather than spend an
    eternity checking the whole object, check for a completion event after
    handling the pagefault. It's is the race of the pagefault vs reset that
    we care most about, and we expect the bug to result in the pagefault
    being blocked indefinitely, so checking afterwards does not reduce
    coverage.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
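The idea in that commit message, checking for completion after each page access instead of always walking the whole object, might look roughly like the loop below. It is a sketch only: `touch_pages_until_done()`, `PAGE_SIZE_BYTES` and the `done` flag are hypothetical names, a plain memory buffer stands in for the GTT mapping, and this is not the actual IGT change.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096 /* assumed page size for this sketch */

/*
 * Write one byte per page of the mapping and stop as soon as *done is
 * set. The flag is checked only after the access, so at least one page
 * (one pagefault in the real test) is always exercised.
 * Returns the number of pages touched.
 */
size_t touch_pages_until_done(volatile uint8_t *map, size_t size,
			      const volatile bool *done)
{
	size_t touched = 0;

	for (size_t offset = 0; offset < size; offset += PAGE_SIZE_BYTES) {
		map[offset] = (uint8_t)touched; /* stands in for the faulting access */
		touched++;
		if (*done) /* completion event seen: escape early */
			break;
	}

	return touched;
}
```

Because the check comes after the access, a child that is told to stop still performs one access per loop entry, so a pagefault that blocks indefinitely would still be caught by the parent's timeout.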
A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@gem_mmap_gtt@hang - fail - Timed out waiting for children -}
{+ ICL: igt@gem_mmap_gtt@hang - fail - Timed out waiting for children +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_374/fi-pnv-d510/igt@gem_mmap_gtt@hang.html
Last seen drmtip_377 (1 month, 4 weeks old), not seen in the last 30 runs, so closing and archiving this
The CI Bug Log issue associated to this bug has been archived. New failures matching the above filters will not be associated to this bug anymore.