Summary: | [CI][RESUME] igt@gem_vm_create@isolation - dmesg-warn - GEM_BUG_ON(!intel_context_is_pinned(ce)) | ||
---|---|---|---|
Product: | DRI | Reporter: | Martin Peres <martin.peres> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | not set | ||
Priority: | high | CC: | intel-gfx-bugs |
Version: | XOrg git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | TGL | i915 features: | GEM/Other |
Description
Martin Peres
2019-09-11 05:35:02 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* TGL: igt@gem_vm_create@isolation - dmesg-warn - GEM_BUG_ON(!intel_context_is_pinned(ce))
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_363/fi-tgl-u/igt@gem_vm_create@isolation.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_364/fi-tgl-u/igt@gem_vm_create@isolation.html
* TGL: igt@runner@aborted - fail - Previous test: gem_vm_create (isolation)
  - (No new failures associated)

Hmm, it looks like we dropped a pin on the rcs0->kernel_context. Bad, very bad.

Test is passing @ https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_371/fi-tgl-u/igt%40gem_vm_create%40isolation.html

(In reply to Sudeep Dutt from comment #3)
> Test is passing @
> https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_371/fi-tgl-u/igt%40gem_vm_create%40isolation.html

It's not a causal link, the bug is unrelated to the test. It just happened to show up as we cleaned up the tgl hangs. My presumption is that we hit an error path and tried to clean up a context twice (the most fragile link there is the active pin).

commit ae911b23d2f06c5d0a3e32768bedea857cadd269
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Sep 23 12:00:53 2019 +0100

drm/i915/execlists: Relax assertion for a pinned context image on reset

A gpu hang can occur at any time, given a sufficiently angry gpu. An example is when it forgets to perform a context-switch at the end of a request, leaving us with a hanging GPU on a completed request. Here, we may retire the request, only leaving its context alive via the active barrier. When we reset the GPU on a completed request, we do not modify its context image (just updating the ring state) and can safely defer the assertion that we have the image pinned and ready to modify.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111639
Fixes: dffa8feb3084 ("drm/i915/perf: Assert locking for i915_init_oa_perf_state()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-1-chris@chris-wilson.co.uk

(In reply to Chris Wilson from comment #5)
> commit ae911b23d2f06c5d0a3e32768bedea857cadd269
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Mon Sep 23 12:00:53 2019 +0100
>
> drm/i915/execlists: Relax assertion for a pinned context image on reset
>
> A gpu hang can occur at any time, given a sufficiently angry gpu. An example is when it forgets to perform a context-switch at the end of a request, leaving us with a hanging GPU on a completed request. Here, we may retire the request, only leaving its context alive via the active barrier. When we reset the GPU on a completed request, we do not modify its context image (just updating the ring state) and can safely defer the assertion that we have the image pinned and ready to modify.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111639
> Fixes: dffa8feb3084 ("drm/i915/perf: Assert locking for i915_init_oa_perf_state()")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-1-chris@chris-wilson.co.uk

Thanks, this issue was seen twice on fi-tgl-u, two runs apart. Now it has not been seen in 23 runs. Looks good!

The CI Bug Log issue associated to this bug has been archived. New failures matching the above filters will not be associated to this bug anymore.
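For readers who want the idea behind the fix in concrete form, here is a minimal toy model in C of what the commit message describes: the "is the context pinned?" check is deferred until the reset path is actually about to modify the context image. This is not the i915 code. Every name below (`toy_context`, `toy_request`, `toy_reset_strict`, `toy_reset_relaxed`, `RESET_BUG`) is hypothetical, and the real driver raises `GEM_BUG_ON` rather than returning a status code.

```c
#include <stdbool.h>

/* Toy stand-ins for illustration only; NOT the i915 structures. */
struct toy_context {
    int pin_count;        /* > 0 while the context image is pinned */
    unsigned ring_head;   /* ring state, updated on every reset */
    bool image_modified;  /* set when the reset rewrites the image */
};

struct toy_request {
    struct toy_context *ce;
    bool completed;       /* request finished before the hang was seen */
    unsigned ring_tail;
};

/* RESET_BUG models the point where the driver's assertion would fire. */
enum reset_result { RESET_OK, RESET_BUG };

static bool toy_context_is_pinned(const struct toy_context *ce)
{
    return ce->pin_count > 0;
}

/* Strict variant: asserts up front, even if the image is never touched. */
static enum reset_result toy_reset_strict(struct toy_request *rq)
{
    if (!toy_context_is_pinned(rq->ce))
        return RESET_BUG;

    rq->ce->ring_head = rq->ring_tail;
    if (!rq->completed)
        rq->ce->image_modified = true;
    return RESET_OK;
}

/* Relaxed variant: only asserts on the path that modifies the image. */
static enum reset_result toy_reset_relaxed(struct toy_request *rq)
{
    /* Updating ring state does not require the image to be pinned. */
    rq->ce->ring_head = rq->ring_tail;

    /* A completed request leaves the context image untouched. */
    if (rq->completed)
        return RESET_OK;

    /* From here on the image is rewritten, so it must be pinned. */
    if (!toy_context_is_pinned(rq->ce))
        return RESET_BUG;

    rq->ce->image_modified = true;
    return RESET_OK;
}
```

In this model, resetting a completed request whose context is no longer pinned trips the strict variant but passes the relaxed one, while an incomplete request on an unpinned context is still caught by both.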