Bug 107588

Summary: [CI][BAT] igt@gem_exec_suspend@basic-S3 RPM wakelock ref not held during HW access RIP: 0010:gen5_read32
Product: DRI Reporter: Tomi Sarvela <tomi.p.sarvela>
Component: DRM/IntelAssignee: Chris Wilson <chris>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: highest CC: intel-gfx-bugs
Version: DRI git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard: ReadyForDev
i915 platform: ILK i915 features: power/runtime PM

Description Tomi Sarvela 2018-08-16 07:18:44 UTC
From CI_DRM_4670 onwards:

[  100.667598] ------------[ cut here ]------------
[  100.667631] RPM wakelock ref not held during HW access
[  100.667727] WARNING: CPU: 2 PID: 316 at drivers/gpu/drm/i915/intel_drv.h:1990 gen5_read32+0x1a5/0x1e0 [i915]
[  100.667729] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_generic coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_ips snd_hda_intel i915 snd_hda_codec snd_hwdep snd_hda_core snd_pcm lpc_ich mei_me mei e1000e prime_numbers
[  100.667761] CPU: 2 PID: 316 Comm: ips-monitor Tainted: G     U            4.18.0-CI-CI_DRM_4676+ #1
[  100.667763] Hardware name: Hewlett-Packard HP EliteBook 8440p/172A, BIOS 68CCU Ver. F.24 09/13/2013
[  100.667805] RIP: 0010:gen5_read32+0x1a5/0x1e0 [i915]
[  100.667806] Code: bc ee e0 0f 0b e9 98 fe ff ff 80 3d 8e 14 19 00 00 0f 85 99 fe ff ff 48 c7 c7 80 6e 29 a0 c6 05 7a 14 19 00 01 e8 1b bc ee e0 <0f> 0b e9 7f fe ff ff e8 6f dc f6 e0 85 c0 75 92 48 c7 c2 b0 6e 29 
[  100.667888] RSP: 0000:ffffc9000036be40 EFLAGS: 00010086
[  100.667891] RAX: 0000000000000000 RBX: ffff880124a80000 RCX: 0000000000000002
[  100.667893] RDX: 0000000080000002 RSI: ffffffff82128df3 RDI: 00000000ffffffff
[  100.667895] RBP: 00000000000112e4 R08: 0000000000000000 R09: 0000000000000001
[  100.667896] R10: ffffc9000036be20 R11: 0000000000000000 R12: 0000000000000001
[  100.667898] R13: ffff880124064260 R14: 0000000000000032 R15: ffff88011fe3cc88
[  100.667900] FS:  0000000000000000(0000) GS:ffff880137c80000(0000) knlGS:0000000000000000
[  100.667902] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  100.667904] CR2: 0000000000000000 CR3: 0000000002210000 CR4: 00000000000006e0
[  100.667905] Call Trace:
[  100.667941]  __i915_chipset_val+0x54/0x1a0 [i915]
[  100.667975]  i915_read_mch_val+0x24/0x50 [i915]
[  100.667980]  ? ips_monitor+0x3da/0x5f0 [intel_ips]
[  100.667983]  ips_monitor+0x4f5/0x5f0 [intel_ips]
[  100.667990]  ? ips_irq_handler+0x170/0x170 [intel_ips]
[  100.667995]  kthread+0x119/0x130
[  100.667998]  ? kthread_flush_work_fn+0x10/0x10
[  100.668005]  ret_from_fork+0x3a/0x50
[  100.668015] irq event stamp: 2572
[  100.668018] hardirqs last  enabled at (2571): [<ffffffff81941064>] _raw_spin_unlock_irq+0x24/0x50
[  100.668021] hardirqs last disabled at (2572): [<ffffffff81940eea>] _raw_spin_lock_irq+0xa/0x40
[  100.668024] softirqs last  enabled at (826): [<ffffffff81c0034f>] __do_softirq+0x34f/0x505
[  100.668028] softirqs last disabled at (821): [<ffffffff8108c8a5>] irq_exit+0xa5/0xc0
[  100.668069] WARNING: CPU: 2 PID: 316 at drivers/gpu/drm/i915/intel_drv.h:1990 gen5_read32+0x1a5/0x1e0 [i915]
[  100.668071] ---[ end trace 4f9109bc75ca0ff0 ]---

https://intel-gfx-ci.01.org/tree/drm-tip/fi-ilk-m540.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4676/fi-ilk-m540/igt@gem_exec_suspend@basic-s3.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4670/git-log-oneline.log
Comment 1 Chris Wilson 2018-08-16 07:38:40 UTC
Revealed by

commit 08ea70a417baa1c0e5faa580070cd2ffd04e6285
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Aug 12 23:36:31 2018 +0100

    drm/i915: Disable runtime-pm using lowlevel functions if !HAS_RC6

it's a very old bug (since we introduced runtime-pm tracking many, many years ago) that I've sent fixes for. Will send them again when we have the current wakeref bugs under wraps.
Comment 2 Martin Peres 2018-09-03 13:15:14 UTC
Also seen on igt@debugfs_test@read_all_entries_display_off.

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_99/fi-ilk-650/igt@debugfs_test@read_all_entries_display_off.html

[  227.282219] ------------[ cut here ]------------
[  227.282283] RPM wakelock ref not held during HW access
[  227.282375] WARNING: CPU: 0 PID: 1509 at drivers/gpu/drm/i915/intel_drv.h:2009 gen5_read32+0x16b/0x1a0 [i915]
[  227.282378] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel lpc_ich snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me e1000e mei prime_numbers
[  227.282453] CPU: 0 PID: 1509 Comm: debugfs_test Tainted: G     U  W         4.19.0-rc1-gb4059d5fdbfd-drmtip_99+ #1
[  227.282458] Hardware name: Hewlett-Packard HP Compaq 8100 Elite SFF PC/304Ah, BIOS 786H1 v01.13 07/14/2011
[  227.282522] RIP: 0010:gen5_read32+0x16b/0x1a0 [i915]
[  227.282527] Code: 3e cd f0 0f 0b e9 d2 fe ff ff 80 3d f8 1d 19 00 00 0f 85 d3 fe ff ff 48 c7 c7 58 20 4b c0 c6 05 e4 1d 19 00 01 e8 65 3e cd f0 <0f> 0b e9 b9 fe ff ff e8 79 57 d5 f0 85 c0 75 92 48 c7 c2 88 20 4b
[  227.282531] RSP: 0018:ffffb25480d67d38 EFLAGS: 00010282
[  227.282538] RAX: 0000000000000000 RBX: ffff8de1f7e00000 RCX: 0000000000000001
[  227.282542] RDX: 0000000080000001 RSI: ffffffffb2080d20 RDI: 00000000ffffffff
[  227.282546] RBP: 0000000000011020 R08: 00000000b653baf4 R09: 0000000000000000
[  227.282549] R10: ffffb25480d67d80 R11: 0000000000000000 R12: 0000000000000001
[  227.282553] R13: ffff8de1f7e00068 R14: 0000000000000001 R15: ffff8de20e137488
[  227.282558] FS:  00007f6a919b6980(0000) GS:ffff8de213a00000(0000) knlGS:0000000000000000
[  227.282562] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  227.282566] CR2: 000055d4953490e8 CR3: 00000001a1df4004 CR4: 00000000000206f0
[  227.282569] Call Trace:
[  227.282633]  i915_mch_val+0x1b/0x60 [i915]
[  227.282684]  i915_emon_status+0x45/0xd0 [i915]
[  227.282697]  seq_read+0xdb/0x3c0
[  227.282707]  ? lockdep_hardirqs_off+0x94/0xd0
[  227.282714]  ? __slab_free+0x253/0x4f0
[  227.282726]  full_proxy_read+0x51/0x80
[  227.282735]  __vfs_read+0x31/0x170
[  227.282748]  ? rcu_lockdep_current_cpu_online+0x8f/0xd0
[  227.282755]  ? do_sys_open+0x13b/0x240
[  227.282760]  ? rcu_read_lock_sched_held+0x6f/0x80
[  227.282767]  vfs_read+0x9e/0x140
[  227.282775]  ksys_read+0x50/0xc0
[  227.282787]  do_syscall_64+0x55/0x190
[  227.282795]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  227.282800] RIP: 0033:0x7f6a9134b34e
[  227.282805] Code: 00 00 00 00 48 8b 15 71 8c 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 0f 1f 40 00 8b 05 ba d0 20 00 85 c0 75 16 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5a f3 c3 0f 1f 84 00 00 00 00 00 41 54 55 49
[  227.282808] RSP: 002b:00007ffd37113318 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  227.282816] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6a9134b34e
[  227.282820] RDX: 0000000000000200 RSI: 00007ffd37113360 RDI: 0000000000000006
[  227.282825] RBP: 00007ffd37113570 R08: 0000000000000000 R09: 0000000000000020
[  227.282828] R10: 0000000000000000 R11: 0000000000000246 R12: 000055d49510a2c0
[  227.282832] R13: 00007ffd37113a60 R14: 0000000000000000 R15: 0000000000000000
[  227.282852] irq event stamp: 54538
[  227.282859] hardirqs last  enabled at (54537): [<ffffffffb10fbb46>] console_unlock+0x1b6/0x5e0
[  227.282864] hardirqs last disabled at (54538): [<ffffffffb1001910>] trace_hardirqs_off_thunk+0x1a/0x1c
[  227.282870] softirqs last  enabled at (54368): [<ffffffffb1c0031d>] __do_softirq+0x31d/0x483
[  227.282875] softirqs last disabled at (54349): [<ffffffffb1090099>] irq_exit+0xa9/0xc0
[  227.282936] WARNING: CPU: 0 PID: 1509 at drivers/gpu/drm/i915/intel_drv.h:2009 gen5_read32+0x16b/0x1a0 [i915]
[  227.282940] ---[ end trace 5e2fef4de3145efe ]---
Comment 3 Martin Peres 2018-11-15 13:00:27 UTC
Increasing the priority of the bug as it blocks ILK testing on BAT as IGT_4715:

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4715/fi-ilk-m540/igt@runner@aborted.html
	
Aborting.
Previous test: gem_exec_suspend (basic-s3)
Next test: gem_exec_suspend (basic-s4-devices)

Kernel tainted (0x240 -- 200)
Comment 4 Chris Wilson 2018-12-07 10:04:25 UTC
I'm sure this patch is linked to elsewhere in bugs.fdo...

https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=prescheduler&id=13ab88965bb597e1b228cc50fad76da09a5b8da0
Comment 5 CI Bug Log 2019-01-14 16:40:22 UTC
A CI Bug Log filter associated to this bug has been updated:

{- ILK: random tests - dmesg-warn - RPM wakelock ref not held during HW access -}
{+ ILK: suspend tests + random ones - dmesg-warn - RPM wakelock ref not held during HW access +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4768/fi-ilk-m540/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
Comment 6 Chris Wilson 2019-01-14 18:28:14 UTC
commit 4a8ab5ea0cde753b03bfefe4c98a8c4c61f46550
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jan 14 14:21:29 2019 +0000

    drm/i915: Mark up Ironlake ips with rpm wakerefs
    
    Currently Ironlake operates under the assumption that rpm awake (and its
    error checking is disabled). As such, we have missed a few places where we
    access registers without taking the rpm wakeref and thus trigger
    warnings. intel_ips being one culprit.
    
    As this involved adding a potentially sleeping rpm_get, we have to
    rearrange the spinlocks slightly and so switch to acquiring a device-ref
    under the spinlock rather than hold the spinlock for the whole
    operation. To be consistent, we make the change in pattern common to the
    intel_ips interface even though this adds a few more atomic operations
    than necessary in a few cases.
    
    v2: Sagar noted the mb around setting mch_dev were overkill as we only
    need ordering there, and that i915_emon_status was still using
    struct_mutex for no reason, but lacked rpm.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Jani Nikula <jani.nikula@intel.com>
    Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-21-chris@chris-wilson.co.uk
Comment 7 Martin Peres 2019-01-16 09:11:37 UTC
(In reply to Chris Wilson from comment #6)
> commit 4a8ab5ea0cde753b03bfefe4c98a8c4c61f46550
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Jan 14 14:21:29 2019 +0000
> 
>     drm/i915: Mark up Ironlake ips with rpm wakerefs
>     
>     Currently Ironlake operates under the assumption that rpm awake (and its
>     error checking is disabled). As such, we have missed a few places where
> we
>     access registers without taking the rpm wakeref and thus trigger
>     warnings. intel_ips being one culprit.
>     
>     As this involved adding a potentially sleeping rpm_get, we have to
>     rearrange the spinlocks slightly and so switch to acquiring a device-ref
>     under the spinlock rather than hold the spinlock for the whole
>     operation. To be consistent, we make the change in pattern common to the
>     intel_ips interface even though this adds a few more atomic operations
>     than necessary in a few cases.
>     
>     v2: Sagar noted the mb around setting mch_dev were overkill as we only
>     need ordering there, and that i915_emon_status was still using
>     struct_mutex for no reason, but lacked rpm.
>     
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Cc: Jani Nikula <jani.nikula@intel.com>
>     Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
>     Link:
> https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-21-
> chris@chris-wilson.co.uk

Thanks, it did the trick!

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.