Bug 106679

Summary: [CI] igt@* - dmesg-warn - RPM wakelock ref not held during HW access
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: CFL, KBL
i915 features: firmware/guc

Description Martin Peres 2018-05-28 07:53:47 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_42/fi-cfl-guc/igt@pm_rpm@debugfs-read.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_42/fi-kbl-guc/igt@drv_suspend@debugfs-reader.html

[  100.888982] ------------[ cut here ]------------
[  100.889023] RPM wakelock ref not held during HW access
[  100.889202] WARNING: CPU: 0 PID: 0 at drivers/gpu/drm/i915/intel_drv.h:1964 fwtable_read32+0x1f1/0x280 [i915]
[  100.889220] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp snd_hda_intel snd_hda_codec snd_hwdep coretemp snd_hda_core crct10dif_pclmul snd_pcm crc32_pclmul ghash_clmulni_intel e1000e mei_me prime_numbers mei
[  100.889336] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U            4.17.0-rc5-g5e68b6c9b2d2-drmtip_42+ #1
[  100.889341] Hardware name: Micro-Star International Co., Ltd. MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.10 12/28/2017
[  100.889463] RIP: 0010:fwtable_read32+0x1f1/0x280 [i915]
[  100.889472] RSP: 0018:ffffa155e6203e58 EFLAGS: 00010086
[  100.889483] RAX: 0000000000000000 RBX: ffffa155d4440000 RCX: 0000000000010003
[  100.889489] RDX: 0000000080010003 RSI: ffffffffa50fb619 RDI: 00000000ffffffff
[  100.889495] RBP: 000000000000c1bc R08: 0000000000000000 R09: 0000000000000001
[  100.889501] R10: ffffa155e6203e10 R11: 0000000000000000 R12: ffffa155d4441480
[  100.889507] R13: 0000000000000001 R14: 0000000000000000 R15: ffffa155d235ce88
[  100.889514] FS:  0000000000000000(0000) GS:ffffa155e6200000(0000) knlGS:0000000000000000
[  100.889520] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  100.889526] CR2: 00007f4a5ff4af78 CR3: 0000000117210006 CR4: 00000000003606f0
[  100.889530] Call Trace:
[  100.889538]  <IRQ>
[  100.889669]  intel_guc_to_host_event_handler_mmio+0x38/0x90 [i915]
[  100.889871]  gen8_irq_handler+0x94/0xc0 [i915]
[  100.890211]  __handle_irq_event_percpu+0x42/0x370
[  100.890230]  handle_irq_event_percpu+0x2b/0x70
[  100.890245]  handle_irq_event+0x2f/0x50
[  100.890257]  handle_edge_irq+0xe7/0x190
[  100.890267]  handle_irq+0x67/0x160
[  100.890281]  do_IRQ+0x5e/0x120
[  100.890296]  common_interrupt+0xf/0xf
[  100.890305]  </IRQ>
[  100.890316] RIP: 0010:cpuidle_enter_state+0xac/0x360
[  100.890322] RSP: 0018:ffffffffa5203e70 EFLAGS: 00000212 ORIG_RAX: ffffffffffffffde
[  100.890333] RAX: ffffffffa52167c0 RBX: 00000000004e050e RCX: 0000000000000000
[  100.890339] RDX: 0000000000000046 RSI: ffffffffa50fb619 RDI: ffffffffa50a8827
[  100.890344] RBP: 0000000000000008 R08: 0000000000000001 R09: 0000000000000000
[  100.890349] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa52963d8
[  100.890357] R13: ffffcc983fa00a50 R14: 0000000000000000 R15: 000000177d234b83
[  100.890396]  do_idle+0x1f3/0x250
[  100.890412]  cpu_startup_entry+0x6a/0x70
[  100.890428]  start_kernel+0x4a2/0x4c2
[  100.890446]  secondary_startup_64+0xa5/0xb0
[  100.890472] Code: ff e8 66 ae bb e3 e9 d3 fe ff ff 80 3d a4 48 18 00 00 0f 85 4f fe ff ff 48 c7 c7 a8 a1 53 c0 c6 05 90 48 18 00 01 e8 9f 0f c4 e3 <0f> 0b e9 35 fe ff ff b9 01 00 00 00 ba 01 00 00 00 89 ee 48 89 
[  100.890894] irq event stamp: 2403588
[  100.890910] hardirqs last  enabled at (2403585): [<ffffffffa477c168>] cpuidle_enter_state+0xa8/0x360
[  100.890928] hardirqs last disabled at (2403586): [<ffffffffa4a00951>] interrupt_entry+0xc1/0xf0
[  100.890948] softirqs last  enabled at (2403588): [<ffffffffa408f668>] irq_enter+0x58/0x60
[  100.890961] softirqs last disabled at (2403587): [<ffffffffa408f64d>] irq_enter+0x3d/0x60
[  100.891080] WARNING: CPU: 0 PID: 0 at drivers/gpu/drm/i915/intel_drv.h:1964 fwtable_read32+0x1f1/0x280 [i915]
[  100.891087] ---[ end trace cca107914825f295 ]---

Many other tests also trigger this warning (48 instances).
Comment 1 Martin Peres 2018-05-28 07:57:11 UTC
Bumping the priority since it is a regression.
Comment 2 Martin Peres 2018-05-28 08:01:59 UTC
Also seen on the sharded runs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4229/shard-kbl6/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-move.html

(kms_frontbuffer_tracking:1648) intel_batchbuffer-CRITICAL: Test assertion failure function intel_batchbuffer_flush_on_ring, file ../lib/intel_batchbuffer.c:239:
(kms_frontbuffer_tracking:1648) intel_batchbuffer-CRITICAL: Failed assertion: (drm_intel_gem_bo_context_exec(batch->bo, ctx, used, ring)) == 0
(kms_frontbuffer_tracking:1648) intel_batchbuffer-CRITICAL: Last errno: 5, Input/output error
Subtest fbcpsr-1p-offscren-pri-indfb-draw-mmap-wc failed.
Comment 3 Daniel Vetter 2018-05-28 08:40:34 UTC
Per discussion with Joonas, guc regressions should also be deprioritized.
Comment 4 Martin Peres 2018-05-28 09:38:21 UTC
(In reply to Martin Peres from comment #2)
> Also seen on the sharded runs:
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4229/shard-kbl6/
> igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-move.html
> 
> (kms_frontbuffer_tracking:1648) intel_batchbuffer-CRITICAL: Test assertion
> failure function intel_batchbuffer_flush_on_ring, file
> ../lib/intel_batchbuffer.c:239:
> (kms_frontbuffer_tracking:1648) intel_batchbuffer-CRITICAL: Failed
> assertion: (drm_intel_gem_bo_context_exec(batch->bo, ctx, used, ring)) == 0
> (kms_frontbuffer_tracking:1648) intel_batchbuffer-CRITICAL: Last errno: 5,
> Input/output error
> Subtest fbcpsr-1p-offscren-pri-indfb-draw-mmap-wc failed.

This was meant for https://bugs.freedesktop.org/show_bug.cgi?id=106067
Comment 5 Chris Wilson 2018-07-28 16:24:10 UTC
commit e5cae659597811f8bacc4abc70135dffb48711d5
Author: Michał Winiarski <michal.winiarski@intel.com>
Date:   Sat Jul 14 18:37:03 2018 +0100

    drm/i915/guc: Disable rpm wakeref asserts in GuC irq handler
    
    We're seeing "RPM wakelock ref not held during HW access" warning
    otherwise. Since IRQs are synced for runtime suspend we can just disable
    the wakeref asserts.
    
    Reported-by: Marta Löfstedt <marta.lofstedt@intel.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105710
    Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180714173703.7894-1-chris@chris-wilson.co.uk
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
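
For readers unfamiliar with the pattern named in the commit subject, here is a rough user-space sketch of the idea, not the actual i915 code. It assumes a register read that warns when no runtime-PM wakeref is held, and an interrupt handler that brackets its read with a "disable asserts" / "enable asserts" pair. Every name in it (struct rpm_model, the model_* functions, MODEL_REG) is hypothetical and made up for illustration; the real driver's helpers and data structures differ.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical user-space model; none of these names are the i915 API. */
#define MODEL_REG 0x1000u /* arbitrary placeholder register offset */

struct rpm_model {
    int wakeref_count;    /* runtime-PM references currently held */
    int asserts_disabled; /* nesting depth of "asserts disabled" sections */
    int warnings;         /* number of warnings emitted so far */
};

/* Models the check that runs before every register access. */
static void model_assert_wakelock_held(struct rpm_model *rpm)
{
    if (rpm->asserts_disabled > 0)
        return; /* caller promised the access is safe */
    if (rpm->wakeref_count == 0) {
        rpm->warnings++;
        fprintf(stderr, "RPM wakelock ref not held during HW access\n");
    }
}

/* Models a 32-bit register read; returns a dummy value. */
static unsigned int model_read32(struct rpm_model *rpm, unsigned int reg)
{
    model_assert_wakelock_held(rpm);
    return reg;
}

static void model_disable_asserts(struct rpm_model *rpm)
{
    rpm->asserts_disabled++;
}

static void model_enable_asserts(struct rpm_model *rpm)
{
    assert(rpm->asserts_disabled > 0); /* calls must be balanced */
    rpm->asserts_disabled--;
}

/* Handler without the bracket: warns if no wakeref is held. */
static void model_irq_handler_unfixed(struct rpm_model *rpm)
{
    (void)model_read32(rpm, MODEL_REG);
}

/* Handler with the bracket: the check is skipped for the read. */
static void model_irq_handler_fixed(struct rpm_model *rpm)
{
    model_disable_asserts(rpm);
    (void)model_read32(rpm, MODEL_REG);
    model_enable_asserts(rpm);
}
```

Calling model_irq_handler_unfixed() on a zero-initialised struct rpm_model increments the warning counter, while model_irq_handler_fixed() leaves it unchanged. The bracket only silences the check; as the commit message says, it is justified by IRQs being synced for runtime suspend, which this sketch does not model.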
Comment 6 Francesco Balestrieri 2018-08-07 08:11:33 UTC
Last seen 3 weeks ago, closing.
