https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6557/shard-iclb1/igt@perf_pmu@busy-hang-bcs0.html

Starting subtest: busy-hang-bcs0
(perf_pmu:1387) CRITICAL: Test assertion failure function single, file ../tests/perf_pmu.c:306:
(perf_pmu:1387) CRITICAL: Failed assertion: (double)(val) <= (1.0 + (tolerance)) * (double)(0) && (double)(val) >= (1.0 - (tolerance)) * (double)(0)
(perf_pmu:1387) CRITICAL: 'val' != '0' (8780.000000 not within +5.000000%/-5.000000% tolerance of 0.000000)
Subtest busy-hang-bcs0 failed.
The CI Bug Log issue associated to this bug has been updated.

### New filters associated
* ICL: igt@perf_pmu@busy-hang-bcs0 - fail - Failed assertion: (double)(val) <= (1.0 + (tolerance)) * (double)(0) && (double)(val) >= (1.0 - (tolerance)) * (double)(0)
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6557/shard-iclb1/igt@perf_pmu@busy-hang-bcs0.html
@Don, can you assess this bug and set an appropriate priority/severity?
The IGT 'tests/perf_pmu.c' busy-hang subtest attempts to hang the GPU and then issues an igt_force_gpu_reset(). In this case the assert fires because the engine's busy counter does not fall back within the expected tolerance of zero after the reset. I'll try to figure out the priority/severity next.
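For reference, here is a minimal, self-contained sketch of the kind of measurement the subtest makes: open the i915 PMU busy counter for BCS0, sample it across a window in which the engine is expected to be idle, and check the delta against a 0 reference with the same 5% tolerance. This is not the actual tests/perf_pmu.c code (the real test hangs the engine and calls igt_force_gpu_reset() between the samples); helper names like i915_pmu_type() and open_busy_counter() are illustrative only.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>
#include <drm/i915_drm.h>

static int i915_pmu_type(void)
{
	/* The i915 PMU publishes its dynamic type id in sysfs. */
	FILE *f = fopen("/sys/bus/event_source/devices/i915/type", "r");
	int type = -1;

	if (f) {
		if (fscanf(f, "%d", &type) != 1)
			type = -1;
		fclose(f);
	}
	return type;
}

static int open_busy_counter(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = i915_pmu_type();
	/* Busy time (ns) of the copy engine, instance 0 (BCS0). */
	attr.config = I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_COPY, 0);

	/* i915 is an uncore PMU: pid = -1, a single CPU, no group. */
	return syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
}

int main(void)
{
	const double tolerance = 0.05;	/* same 5% window as the test */
	uint64_t before, after;
	double val;
	int fd = open_busy_counter();

	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	read(fd, &before, sizeof(before));
	/*
	 * The real subtest hangs BCS0 and calls igt_force_gpu_reset()
	 * here; after the reset the engine must accrue no further busy
	 * time. We simply sleep while (hopefully) idle.
	 */
	sleep(1);
	read(fd, &after, sizeof(after));

	val = (double)(after - before);
	/* Mirrors the failing assertion: with a reference of 0, only val == 0 passes. */
	if (!(val <= (1.0 + tolerance) * 0.0 && val >= (1.0 - tolerance) * 0.0))
		fprintf(stderr, "'val' != '0' (%f not within tolerance of 0)\n", val);

	close(fd);
	return 0;
}
```

In the failing run above the delta came back as 8780 ns of busy time, so the engine was still being accounted as busy after the hang and reset.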
This seems to have failed only once, on CI_DRM_6557_full (4 days, 21 hours old) (http://gfx-ci.fi.intel.com/cibuglog-ng/runcfg/23470), and it started passing again from CI_DRM_6563_full through CI_DRM_6586_full (6 hours, 45 minutes old). As this appears to be a one-off, I think we can set it to low priority and monitor. http://gfx-ci.fi.intel.com/cibuglog-ng/results/all?query=test_name+%3D+%27igt%40perf_pmu%40busy-hang-bcs0%27+AND+machine_name+ICONTAINS+%27shard-iclb1%27
If my guess is correct, this is due to not idling correctly after the hang and so sampling the idle-barrier. So

commit c7302f204490f3eb4ef839bec228315bcd3ba43f (drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Aug 8 21:27:58 2019 +0100

    drm/i915: Defer final intel_wakeref_put to process context

    As we need to acquire a mutex to serialise the final intel_wakeref_put,
    we need to ensure that we are in process context at that time. However,
    we want to allow operation on the intel_wakeref from inside timer and
    other hardirq context, which means that need to defer that final put to
    a workqueue.

    Inside the final wakeref puts, we are safe to operate in any context, as
    we are simply marking up the HW and state tracking for the potential
    sleep. It's only the serialisation with the potential sleeping getting
    that requires careful wait avoidance. This allows us to retain the
    immediate processing as before (we only need to sleep over the same
    races as the current mutex_lock).

    v2: Add a selftest to ensure we exercise the code while lockdep watches.
    v3: That test was extremely loud and complained about many things!
    v4: Not a whale!

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111295
    References: https://bugs.freedesktop.org/show_bug.cgi?id=111245
    References: https://bugs.freedesktop.org/show_bug.cgi?id=111256
    Fixes: 18398904ca9e ("drm/i915: Only recover active engines")
    Fixes: 51fbd8de87dc ("drm/i915/pmu: Atomically acquire the gt_pm wakeref")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190808202758.10453-1-chris@chris-wilson.co.uk

should help.
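For anyone following along, here is a generic kernel-style sketch of the pattern that commit describes: the last put may happen from timer or hardirq context where the mutex cannot be taken, so the final teardown is deferred to a workqueue that runs in process context. The 'foo_wakeref' names are hypothetical, this is not the actual intel_wakeref implementation, and the snippet assumes the usual kernel module scaffolding rather than building standalone.

```c
#include <linux/atomic.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>

struct foo_wakeref {
	atomic_t count;
	struct mutex lock;		/* serialises the final put */
	struct work_struct work;	/* final put runs from here */
};

static void foo_wakeref_final_put(struct work_struct *work)
{
	struct foo_wakeref *ref = container_of(work, struct foo_wakeref, work);

	/* Safe to sleep: we are in a workqueue, not timer/hardirq context. */
	mutex_lock(&ref->lock);
	if (!atomic_read(&ref->count)) {
		/* ... park the hardware / drop the underlying wakeref ... */
	}
	mutex_unlock(&ref->lock);
}

static void foo_wakeref_init(struct foo_wakeref *ref)
{
	atomic_set(&ref->count, 0);
	mutex_init(&ref->lock);
	INIT_WORK(&ref->work, foo_wakeref_final_put);
}

static void foo_wakeref_get(struct foo_wakeref *ref)
{
	atomic_inc(&ref->count);
}

static void foo_wakeref_put(struct foo_wakeref *ref)
{
	/*
	 * Callable from any context: only the transition to zero needs
	 * the mutex, and that work is deferred instead of taken here.
	 */
	if (atomic_dec_and_test(&ref->count))
		schedule_work(&ref->work);
}
```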
Hey Chris, I think that commit, as well as your 'v3-drm-i915-pmu-Use-GT-parked-for-estimating-RC6-while-asleep.patch', will also fix https://bugs.freedesktop.org/show_bug.cgi?id=110877, as you indicated in our email. I finally started understanding the 'gt-parked' code, and 'perf_pmu --run-subtest rc6' is passing just fine with those changes. I'm still trying to understand this 'defer-final' change set. Thanks! don