| Summary: | [CI][SHARDS] igt@* - incomplete - NMI backtrace for cpu \d skipped: idling at acpi_processor_ffh_cstate_enter | | |
|---|---|---|---|
| Product: | DRI | Reporter: | Martin Peres <martin.peres> |
| Component: | DRM/Intel | Assignee: | Andi <andi.shyti> |
| Status: | RESOLVED MOVED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Severity: | normal | | |
| Priority: | low | CC: | intel-gfx-bugs |
| Version: | XOrg git | | |
| Hardware: | Other | | |
| OS: | All | | |
| Whiteboard: | ReadyForDev | | |
| i915 platform: | G45, ICL | i915 features: | |
Description
Martin Peres
2019-04-05 06:41:23 UTC
This seems to be related to the known BIOS issue. Let's monitor that!

The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: all tests - incomplete - NMI backtrace for cpu \d skipped: idling at acpi_processor_ffh_cstate_enter
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4926/shard-iclb5/igt@kms_pipe_crc_basic@hang-read-crc-pipe-d.html

Let's monitor the failure rate now that the following commit has landed:

https://cgit.freedesktop.org/drm/drm-intel/commit/?h=topic/core-for-CI&id=b573fba52f339dc4fadef7282af4a9413fd6173d

ICL HACK: Disable ACPI idle driver (topic/core-for-CI)

There were few system hung observed while running i915_pm_rpm igt test.
FDO https://bugs.freedesktop.org/show_bug.cgi?id=108840
Root cause is believed to due to page fault in ACPI idle driver.
(FDO comment 18).
It has been suggested by Daniel Vetter to disable ACPI idle
driver for Core-for-CI, only for ICL.

This hacky patch is only for ICL processor and for Core-for-CI branch.

v2: Fixed compilation errors raised by lkp.
    commit message improvement.

Cc: martin.peres@intel.com
Cc: daniel.vetter@intel.com

Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1554791361-27684-1-git-send-email-anshuman.gupta@intel.com

Not seen after the change in topic/core-for-CI. Still monitoring.

(In reply to Martin Peres from comment #3)
> Let's monitor the failure rate now that the following commit has landed:
>
> https://cgit.freedesktop.org/drm/drm-intel/commit/?h=topic/core-for-CI&id=b573fba52f339dc4fadef7282af4a9413fd6173d
>
> ICL HACK: Disable ACPI idle driver (topic/core-for-CI)

Ugh. I think this means we've effectively disabled all C-states in CI. Last I looked intel_idle icl support wasn't upstream yet.

(In reply to Ville Syrjala from comment #5)
> Ugh. I think this means we've effectively disabled all C-states in CI. Last
> I looked intel_idle icl support wasn't upstream yet.

Yeah, no C-states visible on icl-u2. So low power watermarks and whatnot are not getting tested at all currently.

(In reply to Ville Syrjala from comment #6)
> Yeah, no C-states visible on icl-u2. So low power watermarks and whatnot not
> getting tested at all currently.

Changing the priority to highest because of the testing gap that will potentially lead to flickering screens.

Should we actually consider reverting "ICL HACK: Disable ACPI idle driver" (topic/core-for-CI)?

According to Martin, no. So we are now discussing with the Core team to look at fixing ACPI idle.

Patch has been taken away, waiting for devs to confirm next steps here.

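The "no C-states visible" observation above can be checked on a machine by reading the cpuidle entries that Linux exposes in sysfs. The following is only a minimal sketch, not something taken from this report: the helper names `list_cstates` and `current_driver` are hypothetical, and it assumes the standard sysfs layout (`/sys/devices/system/cpu/cpuN/cpuidle/stateM/name` and `/sys/devices/system/cpu/cpuidle/current_driver`).

```python
from pathlib import Path


def list_cstates(cpu_dir):
    """Return the idle-state names found under <cpu_dir>/cpuidle, ordered by state index.

    An empty list means no cpuidle states are exposed for that CPU.
    """
    cpuidle = Path(cpu_dir) / "cpuidle"
    if not cpuidle.is_dir():
        return []
    # Keep only directories named state<N> and sort them numerically,
    # so that state10 comes after state2.
    states = [
        p for p in cpuidle.iterdir()
        if p.name.startswith("state") and p.name[len("state"):].isdigit()
    ]
    states.sort(key=lambda p: int(p.name[len("state"):]))
    names = []
    for state in states:
        name_file = state / "name"
        if name_file.is_file():
            names.append(name_file.read_text().strip())
    return names


def current_driver(cpuidle_root="/sys/devices/system/cpu/cpuidle"):
    """Return the active cpuidle driver name, or None if the file is absent."""
    driver_file = Path(cpuidle_root) / "current_driver"
    if not driver_file.is_file():
        return None
    return driver_file.read_text().strip()
```

Calling `list_cstates("/sys/devices/system/cpu/cpu0")` on a host where neither the ACPI idle driver nor intel_idle is providing states would be expected to return an empty list or only a polling state, which would be consistent with the comment above; the exact result depends on the kernel configuration.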
This issue has also been seen only once. I propose to lower the priority on this.

This issue seems to be due to the kernel task kcompactd0 being blocked for more than 61 seconds. There is a circular lock dependency:

<4>[ 938.494641] ======================================================
<4>[ 938.494641] WARNING: possible circular locking dependency detected
<4>[ 938.494641] 5.1.0-rc3-CI-CI_DRM_5866+ #1 Tainted: G U
<4>[ 938.494642] ------------------------------------------------------
<4>[ 938.494642] kworker/3:11/1689 is trying to acquire lock:
<4>[ 938.494642] 000000008b3d0788 ((console_sem).lock){-.-.}, at: down_trylock+0xa/0x30
<4>[ 938.494643]
<4>[ 938.494644] but task is already holding lock:
<4>[ 938.494644] 000000002522a786 (&rq->lock){-.-.}, at: try_to_wake_up+0x1e2/0x5f0
<4>[ 938.494645]
<4>[ 938.494645] which lock already depends on the new lock.
<4>[ 938.494646]
<4>[ 938.494646]
<4>[ 938.494646] the existing dependency chain (in reverse order) is:

And this test, hang-read-crc-pipe, checks the CRC after doing a hang and GPU reset (not sure what that is). Can some GEM expert comment on whether the hang and GPU reset done by this test could cause kcompactd0 to be blocked for more than 61 seconds?

Update: So far, this issue has occurred twice:
CI_DRM_6145_full (1 week, 1 day old), current CI_DRM run is 6189.
IGT_4926_full (2 months old), current IGT run is 5037.

Francesco and team have not been able to reproduce the issue. Dropping priority.

And not seen for 1.5 weeks now either. Seen only 2 times altogether.

Same status: seen twice, and not for the last 3 weeks.

I did not manage to reproduce the issue.

This bug has a reproduction rate of once every (a bit less than) 2 months... shall we lower the priority?

Agreed, setting to medium and let's see when it happens again.

The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ELK: igt@perf_pmu@cpu-hotplug - dmesg-warn - INFO: task kworker/u8:20:1058 blocked for more than 61 seconds
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_376/fi-elk-e7500/igt@perf_pmu@cpu-hotplug.html

Last happened two months ago. Given the reproduction rate of once every couple of months, we can't say this is gone, but at least setting priority to low.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/256.
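
The two filters recorded in this report pair a test pattern and a result status with a dmesg message. As a rough illustration of the dmesg part only, the sketch below matches log lines against those two messages. It is an assumption-laden example, not the CI Bug Log implementation: the names `FILTERS`, `match_filters` and `hung_task_info` are hypothetical, and the hung-task pattern is generalized here (task name, PID and timeout captured as groups) from the literal text in the ELK filter.

```python
import re

# Patterns based on the two filter descriptions in this report.
# The hung-task pattern is a generalization made for this sketch.
FILTERS = {
    "nmi-idle-acpi": re.compile(
        r"NMI backtrace for cpu \d skipped: "
        r"idling at acpi_processor_ffh_cstate_enter"
    ),
    "hung-task": re.compile(
        r"INFO: task (?P<task>\S+):(?P<pid>\d+) "
        r"blocked for more than (?P<secs>\d+) seconds"
    ),
}


def match_filters(line):
    """Return the names of all filters whose pattern occurs in the line."""
    return [name for name, rx in FILTERS.items() if rx.search(line)]


def hung_task_info(line):
    """Return (task, pid, seconds) for a hung-task line, or None if it does not match."""
    m = FILTERS["hung-task"].search(line)
    if m is None:
        return None
    return m.group("task"), int(m.group("pid")), int(m.group("secs"))
```

For the message in the ELK filter, `hung_task_info("INFO: task kworker/u8:20:1058 blocked for more than 61 seconds")` yields the task name `kworker/u8:20`, PID 1058 and a 61 second timeout. Note that `\d` in the first pattern matches exactly one digit, so a line mentioning a two-digit CPU number would not match it as written.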