Starting subtest: busy-hang-bcs0
(perf_pmu:1387) CRITICAL: Test assertion failure function single, file ../tests/perf_pmu.c:306:
(perf_pmu:1387) CRITICAL: Failed assertion: (double)(val) <= (1.0 + (tolerance)) * (double)(0) && (double)(val) >= (1.0 - (tolerance)) * (double)(0)
(perf_pmu:1387) CRITICAL: 'val' != '0' (8780.000000 not within +5.000000%/-5.000000% tolerance of 0.000000)
Subtest busy-hang-bcs0 failed.
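The failed condition is a relative-tolerance check. The minimal sketch below evaluates the same expression printed in the assertion message. `within_epsilon()` is a hypothetical helper, not IGT's actual macro. Because the band is relative to the reference, a reference of 0 collapses the +/-5% band to exactly [0, 0], so any nonzero busy sample (here 8780) fails.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper that evaluates the same condition printed in the
 * assertion message: val must lie within +/-tolerance of ref, relative
 * to ref. This is not IGT's actual macro.
 */
static bool within_epsilon(double val, double ref, double tolerance)
{
	assert(tolerance >= 0.0);
	return val <= (1.0 + tolerance) * ref &&
	       val >= (1.0 - tolerance) * ref;
}
```

For example, `within_epsilon(8780.0, 0.0, 0.05)` and even `within_epsilon(1.0, 0.0, 0.05)` are false, while `within_epsilon(0.0, 0.0, 0.05)` is true: with a zero reference the check effectively demands an exact 0.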
The CI Bug Log issue associated with this bug has been updated.
### New filters associated
* ICL: igt@perf_pmu@busy-hang-bcs0 - fail - Failed assertion: (double)(val) <= (1.0 + (tolerance)) * (double)(0) && (double)(val) >= (1.0 - (tolerance)) * (double)(0)
@Don, can you assess this bug and set the appropriate priority/severity?
The IGT 'tests/perf_pmu.c' busy-hang subtest attempts to hang the GPU and then issues an igt_force_gpu_reset(). In this case, the assertion is triggered because the GPU is not coming out of reset within the expected tolerance.
I'll try to figure out the priority/severity next.
This appears to have failed only once, on CI_DRM_6557_full (4 days, 21 hours old) (http://gfx-ci.fi.intel.com/cibuglog-ng/runcfg/23470), and has been passing from CI_DRM_6563_full through CI_DRM_6586_full (6 hours, 45 minutes old).
As this bug appears to be a one-off, I think we can set it to low priority and monitor it.
If my guess is correct, this is due to not idling correctly after the hang, and so sampling the idle-barrier. If so, it should be fixed by:
commit c7302f204490f3eb4ef839bec228315bcd3ba43f (drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <email@example.com>
Date: Thu Aug 8 21:27:58 2019 +0100
drm/i915: Defer final intel_wakeref_put to process context
As we need to acquire a mutex to serialise the final
intel_wakeref_put, we need to ensure that we are in process context at
that time. However, we want to allow operation on the intel_wakeref from
inside timer and other hardirq context, which means that we need to defer
that final put to a workqueue.
Inside the final wakeref puts, we are safe to operate in any context, as
we are simply marking up the HW and state tracking for the potential
sleep. It's only the serialisation with the potential sleeping getting
that requires careful wait avoidance. This allows us to retain the
immediate processing as before (we only need to sleep over the same
races as the current mutex_lock).
v2: Add a selftest to ensure we exercise the code while lockdep watches.
v3: That test was extremely loud and complained about many things!
v4: Not a whale!
Fixes: 18398904ca9e ("drm/i915: Only recover active engines")
Fixes: 51fbd8de87dc ("drm/i915/pmu: Atomically acquire the gt_pm wakeref")
Signed-off-by: Chris Wilson <firstname.lastname@example.org>
Cc: Tvrtko Ursulin <email@example.com>
Cc: Mika Kuoppala <firstname.lastname@example.org>
Reviewed-by: Mika Kuoppala <email@example.com>
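The "defer final put" idea in the commit message can be illustrated with a minimal userspace model. This is a sketch under stated assumptions, not the i915 code: a pthread mutex stands in for the kernel mutex, an `atomic_bool` flag stands in for a queued work item, and `struct wakeref`, `dec_unless_one()`, `wakeref_put()` and `wakeref_run_work()` are hypothetical names. Non-final puts only decrement the counter and are safe in any context; the final put needs the mutex, so when the caller cannot sleep it is handed off to a worker.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the "defer the final put" pattern.
 * Names and structure are illustrative, not the i915 implementation.
 */
struct wakeref {
	atomic_int count;
	pthread_mutex_t mutex;    /* serialises first get / final put; may sleep */
	atomic_bool work_pending; /* stands in for a queued work item */
	bool hw_awake;            /* protected by mutex */
};

static void wakeref_init(struct wakeref *wf)
{
	atomic_init(&wf->count, 0);
	pthread_mutex_init(&wf->mutex, NULL);
	atomic_init(&wf->work_pending, false);
	wf->hw_awake = false;
}

/* Decrement *v unless it is 1; returns true if it decremented. */
static bool dec_unless_one(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 1) {
		assert(old > 0); /* catch an unbalanced put */
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return true;
	}
	return false;
}

static void wakeref_get(struct wakeref *wf)
{
	pthread_mutex_lock(&wf->mutex);
	if (atomic_fetch_add(&wf->count, 1) == 0)
		wf->hw_awake = true; /* first reference: wake the HW */
	pthread_mutex_unlock(&wf->mutex);
}

/* Caller holds wf->mutex. */
static void put_last_locked(struct wakeref *wf)
{
	if (atomic_fetch_sub(&wf->count, 1) == 1)
		wf->hw_awake = false; /* last reference: park the HW */
}

static void wakeref_put(struct wakeref *wf, bool atomic_ctx)
{
	/* Fast path: not the final reference, safe in any context. */
	if (dec_unless_one(&wf->count))
		return;

	if (atomic_ctx) {
		/* Cannot sleep on the mutex here, so defer the final put. */
		atomic_store(&wf->work_pending, true);
		return;
	}

	pthread_mutex_lock(&wf->mutex);
	put_last_locked(wf);
	pthread_mutex_unlock(&wf->mutex);
}

/* Stand-in for the worker that runs the deferred put in process context. */
static void wakeref_run_work(struct wakeref *wf)
{
	if (!atomic_exchange(&wf->work_pending, false))
		return;

	pthread_mutex_lock(&wf->mutex);
	put_last_locked(wf);
	pthread_mutex_unlock(&wf->mutex);
}
```

In this model a final put from "atomic" context leaves the HW awake with the count at 1 until `wakeref_run_work()` runs; a `wakeref_get()` in the meantime simply bumps the count, so the deferred put no longer parks. Build with `-pthread`.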
I think that commit, together with your 'v3-drm-i915-pmu-Use-GT-parked-for-estimating-RC6-while-asleep.patch', will also fix https://bugs.freedesktop.org/show_bug.cgi?id=110877, as you indicated in our email. I have finally started to understand the 'gt-parked' code, and 'perf_pmu --run-subtest rc6' passes with those changes. I'm still trying to understand the 'defer-final' change set.
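The idea suggested by the RC6 patch title can be sketched as a small model. This rests on assumptions not confirmed here: while the GT is parked the RC6 counter is not read, and the GT is assumed to have been in RC6 for the whole parked interval. `struct rc6_estimator`, `rc6_on_park()`, `rc6_on_unpark()` and `rc6_read()` are hypothetical names, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of estimating RC6 residency while the GT is parked. */
struct rc6_estimator {
	uint64_t last_hw_rc6_ns; /* last RC6 residency read while awake */
	uint64_t park_time_ns;   /* timestamp at which the GT parked */
	bool parked;
};

static void rc6_on_park(struct rc6_estimator *e, uint64_t hw_rc6_ns,
			uint64_t now_ns)
{
	e->last_hw_rc6_ns = hw_rc6_ns; /* snapshot before going to sleep */
	e->park_time_ns = now_ns;
	e->parked = true;
}

static void rc6_on_unpark(struct rc6_estimator *e)
{
	e->parked = false;
}

static uint64_t rc6_read(const struct rc6_estimator *e, uint64_t hw_rc6_ns,
			 uint64_t now_ns)
{
	if (!e->parked)
		return hw_rc6_ns; /* awake: use the hardware counter */

	assert(now_ns >= e->park_time_ns);
	/* Asleep: assume RC6 for the whole time since parking. */
	return e->last_hw_rc6_ns + (now_ns - e->park_time_ns);
}
```

A real implementation would also need to keep the reported value monotonic when the hardware counter is read again after unparking; this sketch omits that.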