Bug 106948

Summary: [CI] igt@* - dmesg-warn/fail - WARN_ON(dev_priv->uncore.funcs.mmio_readl(dev_priv, (((const i915_reg_t){ .reg = (0x6f900) })), true) & (1<<31))
Product: DRI Reporter: Martin Peres <martin.peres>
Component: DRM/Intel Assignee: Dhinakaran Pandiyan <dhinakaran.pandiyan>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: high CC: intel-gfx-bugs, tomi.p.sarvela
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: CFL, CNL, KBL, SKL i915 features: display/PSR

Description Martin Peres 2018-06-18 07:24:55 UTC
Starting with drmtip_64, the following WARNs happen in many CI tests:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-cnl-psr/igt@kms_vblank@pipe-a-wait-idle.html

[  308.385231] ------------[ cut here ]------------
[  308.385280] WARN_ON(dev_priv->uncore.funcs.mmio_readl(dev_priv, (((const i915_reg_t){ .reg = (0x6f900) })), true) & (1<<31))
[  308.385334] WARNING: CPU: 0 PID: 36 at drivers/gpu/drm/i915/intel_psr.c:580 intel_psr_activate+0xd3/0x100 [i915]
[  308.385337] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec e1000e snd_hwdep snd_hda_core snd_pcm mei_me mei prime_numbers [last unloaded: i915]
[  308.385369] CPU: 0 PID: 36 Comm: kworker/0:1 Tainted: G     U  W         4.17.0-rc7-g02d8db1a894b-drmtip_64+ #1
[  308.385370] Hardware name: Intel Corporation CannonLake Client Platform/CannonLake Y LPDDR4 RVP, BIOS CNLSFWR1.R00.X114.B11.1712190231 12/19/2017
[  308.385400] Workqueue: events intel_psr_work [i915]
[  308.385429] RIP: 0010:intel_psr_activate+0xd3/0x100 [i915]
[  308.385431] RSP: 0018:ffffa731001cfe20 EFLAGS: 00010282
[  308.385434] RAX: 0000000000000000 RBX: ffff8da3da820000 RCX: 0000000000000001
[  308.385435] RDX: 0000000080000001 RSI: ffffffff930fc071 RDI: 00000000ffffffff
[  308.385437] RBP: ffff8da3d8c1a260 R08: 00000000be9495f1 R09: 0000000000000000
[  308.385438] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8da3da820000
[  308.385440] R13: 000000000006f940 R14: 00000000f0000000 R15: 0000000000000001
[  308.385442] FS:  0000000000000000(0000) GS:ffff8da3f1000000(0000) knlGS:0000000000000000
[  308.385443] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  308.385445] CR2: 00007fb03d81f2f8 CR3: 0000000120210002 CR4: 0000000000760ef0
[  308.385446] PKRU: 55555554
[  308.385448] Call Trace:
[  308.385477]  intel_psr_work+0xcf/0xe0 [i915]
[  308.385482]  process_one_work+0x229/0x6a0
[  308.385488]  worker_thread+0x1f9/0x380
[  308.385492]  ? process_one_work+0x6a0/0x6a0
[  308.385495]  kthread+0x119/0x130
[  308.385498]  ? kthread_flush_work_fn+0x10/0x10
[  308.385502]  ret_from_fork+0x3a/0x50
[  308.385510] Code: a4 48 c7 c6 f8 1e 91 c0 48 c7 c7 3c b5 8e c0 e8 b4 fc 82 d1 0f 0b eb 8d 48 c7 c6 d0 1d 91 c0 48 c7 c7 3c b5 8e c0 e8 9d fc 82 d1 <0f> 0b e9 5c ff ff ff 48 c7 c6 3f b5 8e c0 48 c7 c7 3c b5 8e c0 
[  308.385589] irq event stamp: 1449644
[  308.385592] hardirqs last  enabled at (1449643): [<ffffffff920fcb96>] console_unlock+0x426/0x640
[  308.385595] hardirqs last disabled at (1449644): [<ffffffff92a0111c>] error_entry+0x7c/0x100
[  308.385597] softirqs last  enabled at (1448906): [<ffffffff92c0032b>] __do_softirq+0x32b/0x4e1
[  308.385600] softirqs last disabled at (1448899): [<ffffffff92090104>] irq_exit+0xa4/0xb0
[  308.385626] WARNING: CPU: 0 PID: 36 at drivers/gpu/drm/i915/intel_psr.c:580 intel_psr_activate+0xd3/0x100 [i915]
[  308.385628] ---[ end trace 88df950472f3aa36 ]---
[  308.385630] ------------[ cut here ]------------
[  308.385631] WARN_ON(dev_priv->psr.active)
[  308.385668] WARNING: CPU: 0 PID: 36 at drivers/gpu/drm/i915/intel_psr.c:583 intel_psr_activate+0xed/0x100 [i915]
[  308.385670] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec e1000e snd_hwdep snd_hda_core snd_pcm mei_me mei prime_numbers [last unloaded: i915]
[  308.385699] CPU: 0 PID: 36 Comm: kworker/0:1 Tainted: G     U  W         4.17.0-rc7-g02d8db1a894b-drmtip_64+ #1
[  308.385700] Hardware name: Intel Corporation CannonLake Client Platform/CannonLake Y LPDDR4 RVP, BIOS CNLSFWR1.R00.X114.B11.1712190231 12/19/2017
[  308.385726] Workqueue: events intel_psr_work [i915]
[  308.385753] RIP: 0010:intel_psr_activate+0xed/0x100 [i915]
[  308.385755] RSP: 0018:ffffa731001cfe20 EFLAGS: 00010282
[  308.385757] RAX: 0000000000000000 RBX: ffff8da3da820000 RCX: 0000000000000001
[  308.385759] RDX: 0000000080000001 RSI: ffffffff930fc071 RDI: 00000000ffffffff
[  308.385760] RBP: ffff8da3d8c1a260 R08: 00000000be9495f1 R09: 0000000000000000
[  308.385762] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8da3da820000
[  308.385763] R13: 000000000006f940 R14: 00000000f0000000 R15: 0000000000000001
[  308.385765] FS:  0000000000000000(0000) GS:ffff8da3f1000000(0000) knlGS:0000000000000000
[  308.385767] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  308.385769] CR2: 00007fb03d81f2f8 CR3: 0000000120210002 CR4: 0000000000760ef0
[  308.385770] PKRU: 55555554
[  308.385771] Call Trace:
[  308.385798]  intel_psr_work+0xcf/0xe0 [i915]
[  308.385802]  process_one_work+0x229/0x6a0
[  308.385807]  worker_thread+0x1f9/0x380
[  308.385811]  ? process_one_work+0x6a0/0x6a0
[  308.385813]  kthread+0x119/0x130
[  308.385816]  ? kthread_flush_work_fn+0x10/0x10
[  308.385820]  ret_from_fork+0x3a/0x50
[  308.385828] Code: c6 d0 1d 91 c0 48 c7 c7 3c b5 8e c0 e8 9d fc 82 d1 0f 0b e9 5c ff ff ff 48 c7 c6 3f b5 8e c0 48 c7 c7 3c b5 8e c0 e8 83 fc 82 d1 <0f> 0b e9 4f ff ff ff 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 53 
[  308.385907] irq event stamp: 1449658
[  308.385909] hardirqs last  enabled at (1449657): [<ffffffff920fcb96>] console_unlock+0x426/0x640
[  308.385911] hardirqs last disabled at (1449658): [<ffffffff92a0111c>] error_entry+0x7c/0x100
[  308.385914] softirqs last  enabled at (1448906): [<ffffffff92c0032b>] __do_softirq+0x32b/0x4e1
[  308.385916] softirqs last disabled at (1448899): [<ffffffff92090104>] irq_exit+0xa4/0xb0
[  308.385941] WARNING: CPU: 0 PID: 36 at drivers/gpu/drm/i915/intel_psr.c:583 intel_psr_activate+0xed/0x100 [i915]
[  308.385943] ---[ end trace 88df950472f3aa37 ]---
Comment 1 Martin Peres 2018-06-18 07:25:51 UTC
Since this is a serious regression, bumping the priority to highest and assigning to James.
Comment 2 Martin Peres 2018-06-18 07:50:06 UTC
Another failure mode:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-cfl-s3/igt@syncobj_wait@invalid-multi-wait-unsubmitted.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-kbl-7560u/igt@kms_frontbuffer_tracking@psr-1p-offscren-pri-shrfb-draw-pwrite.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-kbl-r/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-pwrite.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-skl-6600u/igt@kms_frontbuffer_tracking@psr-rgb565-draw-pwrite.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_64/fi-skl-6700hq/igt@kms_frontbuffer_tracking@psr-1p-pri-indfb-multidraw.html

[  137.395037] ------------[ cut here ]------------
[  137.395062] WARN_ON(dev_priv->uncore.funcs.mmio_readl(dev_priv, (((const i915_reg_t){ .reg = (dev_priv->psr_mmio_base + 0) })), true) & (1<<31))
[  137.395111] WARNING: CPU: 1 PID: 202 at drivers/gpu/drm/i915/intel_psr.c:582 intel_psr_activate+0x90/0x100 [i915]
[  137.395113] Modules linked in: vgem i915 cdc_ether usbnet r8152 x86_pkg_temp_thermal intel_powerclamp mii coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mei_me e1000e mei prime_numbers
[  137.395136] CPU: 1 PID: 202 Comm: kworker/1:3 Tainted: G     U            4.17.0-rc7-g02d8db1a894b-drmtip_64+ #1
[  137.395137] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X118.B19.1802080131 02/08/2018
[  137.395166] Workqueue: events intel_psr_work [i915]
[  137.395195] RIP: 0010:intel_psr_activate+0x90/0x100 [i915]
[  137.395196] RSP: 0018:ffffa362806c7e20 EFLAGS: 00010282
[  137.395199] RAX: 0000000000000000 RBX: ffff9da700d30000 RCX: 0000000000000001
[  137.395201] RDX: 0000000080000001 RSI: ffffffffb00fc071 RDI: 00000000ffffffff
[  137.395202] RBP: ffff9da700f9c3b0 R08: 000000003961f289 R09: 0000000000000000
[  137.395203] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9da700d30000
[  137.395205] R13: 000000000006f840 R14: 00000000e0000000 R15: 0000000000000000
[  137.395206] FS:  0000000000000000(0000) GS:ffff9da70d240000(0000) knlGS:0000000000000000
[  137.395208] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  137.395210] CR2: 00007f765d3fe2f8 CR3: 0000000242210004 CR4: 00000000003606e0
[  137.395211] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  137.395212] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  137.395214] Call Trace:
[  137.395242]  intel_psr_work+0xcf/0xe0 [i915]
[  137.395247]  process_one_work+0x229/0x6a0
[  137.395252]  worker_thread+0x1f9/0x380
[  137.395256]  ? process_one_work+0x6a0/0x6a0
[  137.395258]  kthread+0x119/0x130
[  137.395260]  ? kthread_flush_work_fn+0x10/0x10
[  137.395265]  ret_from_fork+0x3a/0x50
[  137.395272] Code: c3 8b b3 20 76 00 00 ba 01 00 00 00 48 89 df e8 e7 c7 63 ef 85 c0 79 b9 48 c7 c6 70 fe 67 c0 48 c7 c7 3c 95 65 c0 e8 e0 1c ac ee <0f> 0b eb a2 48 8d bb c0 a3 00 00 be ff ff ff ff e8 0b 45 b2 ee 
[  137.395346] irq event stamp: 118292
[  137.395348] hardirqs last  enabled at (118291): [<ffffffffaf0fcb96>] console_unlock+0x426/0x640
[  137.395350] hardirqs last disabled at (118292): [<ffffffffafa0111c>] error_entry+0x7c/0x100
[  137.395352] softirqs last  enabled at (118074): [<ffffffffaf0a7a79>] process_one_work+0x229/0x6a0
[  137.395355] softirqs last disabled at (118070): [<ffffffffaf812a3c>] neigh_periodic_work+0x2c/0x300
[  137.395382] WARNING: CPU: 1 PID: 202 at drivers/gpu/drm/i915/intel_psr.c:582 intel_psr_activate+0x90/0x100 [i915]
[  137.395383] ---[ end trace 7fa5d0f0c2a66a48 ]---
[  137.395385] ------------[ cut here ]------------
[  137.395386] WARN_ON(dev_priv->psr.active)
[  137.395420] WARNING: CPU: 1 PID: 202 at drivers/gpu/drm/i915/intel_psr.c:583 intel_psr_activate+0xed/0x100 [i915]
[  137.395421] Modules linked in: vgem i915 cdc_ether usbnet r8152 x86_pkg_temp_thermal intel_powerclamp mii coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mei_me e1000e mei prime_numbers
[  137.395442] CPU: 1 PID: 202 Comm: kworker/1:3 Tainted: G     U  W         4.17.0-rc7-g02d8db1a894b-drmtip_64+ #1
[  137.395444] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X118.B19.1802080131 02/08/2018
[  137.395470] Workqueue: events intel_psr_work [i915]
[  137.395497] RIP: 0010:intel_psr_activate+0xed/0x100 [i915]
[  137.395498] RSP: 0018:ffffa362806c7e20 EFLAGS: 00010282
[  137.395501] RAX: 0000000000000000 RBX: ffff9da700d30000 RCX: 0000000000000001
[  137.395502] RDX: 0000000080000001 RSI: ffffffffb00fc071 RDI: 00000000ffffffff
[  137.395504] RBP: ffff9da700f9c3b0 R08: 000000003961f289 R09: 0000000000000000
[  137.395505] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9da700d30000
[  137.395507] R13: 000000000006f840 R14: 00000000e0000000 R15: 0000000000000000
[  137.395508] FS:  0000000000000000(0000) GS:ffff9da70d240000(0000) knlGS:0000000000000000
[  137.395510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  137.395511] CR2: 00007f765d3fe2f8 CR3: 0000000242210004 CR4: 00000000003606e0
[  137.395513] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  137.395514] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  137.395515] Call Trace:
[  137.395542]  intel_psr_work+0xcf/0xe0 [i915]
[  137.395546]  process_one_work+0x229/0x6a0
[  137.395551]  worker_thread+0x1f9/0x380
[  137.395554]  ? process_one_work+0x6a0/0x6a0
[  137.395557]  kthread+0x119/0x130
[  137.395559]  ? kthread_flush_work_fn+0x10/0x10
[  137.395563]  ret_from_fork+0x3a/0x50
[  137.395569] Code: c6 d0 fd 67 c0 48 c7 c7 3c 95 65 c0 e8 9d 1c ac ee 0f 0b e9 5c ff ff ff 48 c7 c6 3f 95 65 c0 48 c7 c7 3c 95 65 c0 e8 83 1c ac ee <0f> 0b e9 4f ff ff ff 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 53 
[  137.395643] irq event stamp: 118306
[  137.395646] hardirqs last  enabled at (118305): [<ffffffffaf0fcb96>] console_unlock+0x426/0x640
[  137.395647] hardirqs last disabled at (118306): [<ffffffffafa0111c>] error_entry+0x7c/0x100
[  137.395649] softirqs last  enabled at (118074): [<ffffffffaf0a7a79>] process_one_work+0x229/0x6a0
[  137.395651] softirqs last disabled at (118070): [<ffffffffaf812a3c>] neigh_periodic_work+0x2c/0x300
[  137.395677] WARNING: CPU: 1 PID: 202 at drivers/gpu/drm/i915/intel_psr.c:583 intel_psr_activate+0xed/0x100 [i915]
[  137.395679] ---[ end trace 7fa5d0f0c2a66a49 ]---
Comment 3 Dhinakaran Pandiyan 2018-06-18 17:43:37 UTC
Most likely culprit
5422b37c907e drm/i915/psr: Kill delays when activating psr back.
Comment 4 Dhinakaran Pandiyan 2018-06-19 02:55:53 UTC
@Martin

Can you trigger a one-off run to check if [1] fixes the issue? We'll need the shards test list on fast feedback machines to test this.

[1] https://patchwork.freedesktop.org/patch/230188/
Comment 5 Martin Peres 2018-06-19 14:31:52 UTC
(In reply to Dhinakaran Pandiyan from comment #4)
> @Martin
> 
> Can you trigger a one-off run to check if [1] fixes the issue? We'll need
> the shards test list on fast feedback machines to test this.
> 
> [1] https://patchwork.freedesktop.org/patch/230188/

@Tomi, can you help here?
Comment 6 Martin Peres 2018-06-20 08:04:00 UTC
Another failure mode:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_67/fi-cfl-s3/igt@gem_userptr_blits@process-exit-gtt.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_67/fi-cnl-psr/igt@gem_ctx_isolation@bcs0-none.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_67/fi-kbl-7560u/igt@kms_frontbuffer_tracking@fbcpsr-1p-offscren-pri-shrfb-draw-pwrite.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_67/fi-kbl-r/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-pwrite.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_67/fi-skl-6600u/igt@perf_pmu@other-init-3.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_67/fi-skl-6700hq/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-pwrite.html

[  250.692574] WARN_ON(dev_priv->uncore.funcs.mmio_readl(dev_priv, (((const i915_reg_t){ .reg = (dev_priv->psr_mmio_base + 0) })), true) & (1 << 31))
[  250.692641] WARNING: CPU: 6 PID: 131 at drivers/gpu/drm/i915/intel_psr.c:582 intel_psr_activate+0x90/0x100 [i915]
[  250.692643] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel crct10dif_pclmul snd_hda_codec crc32_pclmul snd_hwdep ghash_clmulni_intel snd_hda_core snd_pcm r8169 mii mei_me mei prime_numbers
[  250.692680] CPU: 6 PID: 131 Comm: kworker/6:1 Tainted: G     U  W         4.18.0-rc1-g93475d62c730-drmtip_67+ #1
[  250.692682] Hardware name: TOSHIBA SATELLITE P50-C/06F4                            , BIOS 1.40 03/29/2016
[  250.692722] Workqueue: events intel_psr_work [i915]
[  250.692762] RIP: 0010:intel_psr_activate+0x90/0x100 [i915]
[  250.692764] Code: 8b b3 20 76 00 00 ba 01 00 00 00 48 89 df e8 d7 61 92 c5 85 c0 79 b9 48 c7 c6 d0 73 39 c0 48 c7 c7 aa 09 37 c0 e8 80 ba da c4 <0f> 0b eb a2 48 8d bb c0 a3 00 00 be ff ff ff ff e8 8b e2 e0 c4 85 
[  250.692861] RSP: 0018:ffffa81a004d7e20 EFLAGS: 00010282
[  250.692864] RAX: 0000000000000000 RBX: ffff99b9e6ef0000 RCX: 0000000000000001
[  250.692866] RDX: 0000000080000001 RSI: ffffffff86086d8e RDI: 00000000ffffffff
[  250.692868] RBP: ffff99b9dfff8110 R08: 00000000e95578c9 R09: 0000000000000000
[  250.692870] R10: 0000000000000000 R11: 0000000000000000 R12: ffff99b9e6ef0000
[  250.692872] R13: 000000000006f840 R14: 00000000e0000000 R15: 0000000000000000
[  250.692875] FS:  0000000000000000(0000) GS:ffff99ba01d80000(0000) knlGS:0000000000000000
[  250.692877] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  250.692879] CR2: 0000559accf348e8 CR3: 0000000137210006 CR4: 00000000003606e0
[  250.692880] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  250.692882] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  250.692884] Call Trace:
[  250.692923]  ? intel_psr_work+0xcf/0xe0 [i915]
[  250.692929]  ? process_one_work+0x248/0x6c0
[  250.692936]  ? worker_thread+0x1fb/0x380
[  250.692941]  ? process_one_work+0x6c0/0x6c0
[  250.692944]  ? kthread+0x119/0x130
[  250.692947]  ? kthread_flush_work_fn+0x10/0x10
[  250.692953]  ? ret_from_fork+0x3a/0x50
[  250.692962] irq event stamp: 484476
[  250.692965] hardirqs last  enabled at (484475): [<ffffffff850fc74c>] console_unlock+0x3fc/0x600
[  250.692969] hardirqs last disabled at (484476): [<ffffffff85a0111c>] error_entry+0x7c/0x100
[  250.692972] softirqs last  enabled at (479628): [<ffffffff85c0034f>] __do_softirq+0x34f/0x505
[  250.692975] softirqs last disabled at (479607): [<ffffffff850904f9>] irq_exit+0xa9/0xc0
[  250.693012] WARNING: CPU: 6 PID: 131 at drivers/gpu/drm/i915/intel_psr.c:582 intel_psr_activate+0x90/0x100 [i915]
[  250.693014] ---[ end trace bc520f915a69fbb8 ]---
Comment 7 Martin Peres 2018-06-20 08:10:02 UTC
We also have ~80 failures like this on fi-cnl-psr:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_67/fi-cnl-psr/igt@gem_ctx_isolation@bcs0-none.html

[   98.026022] ------------[ cut here ]------------
[   98.026024] WARN_ON(dev_priv->uncore.funcs.mmio_readl(dev_priv, (((const i915_reg_t){ .reg = (0x6f900) })), true) & (1 << 31))
[   98.026095] WARNING: CPU: 3 PID: 49 at drivers/gpu/drm/i915/intel_psr.c:580 intel_psr_activate+0xd3/0x100 [i915]
[   98.026097] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_hda_codec e1000e snd_hwdep snd_hda_core snd_pcm mei_me mei prime_numbers
[   98.026124] CPU: 3 PID: 49 Comm: kworker/3:1 Tainted: G     U  W         4.18.0-rc1-g93475d62c730-drmtip_67+ #1
[   98.026125] Hardware name: Intel Corporation CannonLake Client Platform/CannonLake Y LPDDR4 RVP, BIOS CNLSFWR1.R00.X114.B11.1712190231 12/19/2017
[   98.026165] Workqueue: events intel_psr_work [i915]
[   98.026206] RIP: 0010:intel_psr_activate+0xd3/0x100 [i915]
[   98.026207] Code: 48 c7 c6 58 54 69 c0 48 c7 c7 aa e9 66 c0 e8 54 da aa ec 0f 0b eb 8d 48 c7 c6 30 53 69 c0 48 c7 c7 aa e9 66 c0 e8 3d da aa ec <0f> 0b e9 5c ff ff ff 48 c7 c6 ad e9 66 c0 48 c7 c7 aa e9 66 c0 e8 
[   98.026273] RSP: 0018:ffff9bdf80237e20 EFLAGS: 00010282
[   98.026276] RAX: 0000000000000000 RBX: ffff98bfcdb80000 RCX: 0000000000000001
[   98.026278] RDX: 0000000080000001 RSI: ffffffffae086d8e RDI: 00000000ffffffff
[   98.026279] RBP: ffff98bfcec6a260 R08: 00000000fac6fc42 R09: 0000000000000000
[   98.026281] R10: 0000000000000000 R11: 0000000000000000 R12: ffff98bfcdb80000
[   98.026282] R13: 000000000006f940 R14: 00000000f0000000 R15: 0000000000000000
[   98.026284] FS:  0000000000000000(0000) GS:ffff98bff1180000(0000) knlGS:0000000000000000
[   98.026285] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   98.026287] CR2: 00007ffc372b6fb8 CR3: 0000000263210004 CR4: 0000000000760ee0
[   98.026288] PKRU: 55555554
[   98.026289] Call Trace:
[   98.026330]  ? intel_psr_work+0xcf/0xe0 [i915]
[   98.026336]  ? process_one_work+0x248/0x6c0
[   98.026344]  ? worker_thread+0x1fb/0x380
[   98.026348]  ? process_one_work+0x6c0/0x6c0
[   98.026350]  ? kthread+0x119/0x130
[   98.026353]  ? kthread_flush_work_fn+0x10/0x10
[   98.026358]  ? ret_from_fork+0x3a/0x50
[   98.026368] irq event stamp: 150722
[   98.026371] hardirqs last  enabled at (150721): [<ffffffffad0fdc6c>] vprintk_emit+0x4bc/0x4d0
[   98.026374] hardirqs last disabled at (150722): [<ffffffffada0111c>] error_entry+0x7c/0x100
[   98.026376] softirqs last  enabled at (150400): [<ffffffffad0a7ea8>] process_one_work+0x248/0x6c0
[   98.026379] softirqs last disabled at (150396): [<ffffffffad81b15c>] neigh_periodic_work+0x2c/0x300
[   98.026417] WARNING: CPU: 3 PID: 49 at drivers/gpu/drm/i915/intel_psr.c:580 intel_psr_activate+0xd3/0x100 [i915]
[   98.026419] ---[ end trace 586dc221fee64efc ]---
Comment 8 Dhinakaran Pandiyan 2018-06-20 17:09:08 UTC
With the patch?
Comment 9 Martin Peres 2018-06-20 17:12:25 UTC
(In reply to Dhinakaran Pandiyan from comment #8)
> With the patch?

Sorry, no, this is with a vanilla drmtip. We need to wait for Tomi to make the custom run :s
Comment 10 Martin Peres 2018-06-20 17:22:54 UTC
(In reply to Dhinakaran Pandiyan from comment #8)
> With the patch?

Actually, why don't you just push it? I mean, we have a limitation in our CI system, and testing will not come until Monday at least.

Worst case scenario, PSR is still broken after. Best case scenario, it works after. To me, merging this trivial patch makes sense.
Comment 11 Dhinakaran Pandiyan 2018-06-20 17:39:24 UTC
(In reply to Martin Peres from comment #10)
> (In reply to Dhinakaran Pandiyan from comment #8)
> > With the patch?
> 
> Actually, why don't you just push it? I mean, we have a limitation in our CI
> system, and testing will not come until Monday at least.
> 
> Worst case scenario, PSR is still broken after. Best case scenario, it works
> after. To me, merging this trivial patch makes sense.

Yeah, I was going to push it but preferred to have a confirmation if it was easy to get.
Comment 12 Dhinakaran Pandiyan 2018-06-20 20:38:27 UTC
(In reply to Dhinakaran Pandiyan from comment #11)
> (In reply to Martin Peres from comment #10)
> > (In reply to Dhinakaran Pandiyan from comment #8)
> > > With the patch?
> > 
> > Actually, why don't you just push it? I mean, we have a limitation in our CI
> > system, and testing will not come until Monday at least.
> > 
> > Worst case scenario, PSR is still broken after. Best case scenario, it works
> > after. To me, merging this trivial patch makes sense.
> 
> Yeah, I was going to push it but preferred to have a confirmation if it was
> easy to get.

Done.
98fa2aecb509 drm/i915/psr: Fix warning in intel_psr_activate()
Comment 13 Martin Peres 2018-06-20 23:04:37 UTC
(In reply to Dhinakaran Pandiyan from comment #12)
> (In reply to Dhinakaran Pandiyan from comment #11)
> > (In reply to Martin Peres from comment #10)
> > > (In reply to Dhinakaran Pandiyan from comment #8)
> > > > With the patch?
> > > 
> > > Actually, why don't you just push it? I mean, we have a limitation in our CI
> > > system, and testing will not come until Monday at least.
> > > 
> > > Worst case scenario, PSR is still broken after. Best case scenario, it works
> > > after. To me, merging this trivial patch makes sense.
> > 
> > Yeah, I was going to push it but preferred to have a confirmation if it was
> > easy to get.
> 
> Done.
> 98fa2aecb509 drm/i915/psr: Fix warning in intel_psr_activate()

Wonderful, thanks DK for the fast turnaround!

I'm resolving the bug, and will close it when I can verify that this indeed fixed it :)
Comment 14 Martin Peres 2018-06-22 22:34:49 UTC
(In reply to Martin Peres from comment #13)
> (In reply to Dhinakaran Pandiyan from comment #12)
> > (In reply to Dhinakaran Pandiyan from comment #11)
> > > (In reply to Martin Peres from comment #10)
> > > > (In reply to Dhinakaran Pandiyan from comment #8)
> > > > > With the patch?
> > > > 
> > > > Actually, why don't you just push it? I mean, we have a limitation in our CI
> > > > system, and testing will not come until Monday at least.
> > > > 
> > > > Worst case scenario, PSR is still broken after. Best case scenario, it works
> > > > after. To me, merging this trivial patch makes sense.
> > > 
> > > Yeah, I was going to push it but preferred to have a confirmation if it was
> > > easy to get.
> > 
> > Done.
> > 98fa2aecb509 drm/i915/psr: Fix warning in intel_psr_activate()
> 
> Wonderful, thanks DK for the fast turnaround!
> 
> I'm resolving the bug, and will close it when I can verify that this indeed
> fixed it :)

Not everything has been fixed, unfortunately :s Re-opening!
Comment 15 Dhinakaran Pandiyan 2018-06-23 04:49:12 UTC
Looks like another possible race, submitted a tentative solution - 
https://lists.freedesktop.org/archives/intel-gfx/2018-June/168957.html
Comment 16 Martin Peres 2018-06-23 12:53:11 UTC
(In reply to Dhinakaran Pandiyan from comment #15)
> Looks like another possible race, submitted a tentative solution - 
> https://lists.freedesktop.org/archives/intel-gfx/2018-June/168957.html

Thanks, let's see if it helps :)
Comment 17 Chris Wilson 2018-07-14 17:51:41 UTC
commit c12e0643a05d978657877630d4da1ace06ea3720
Author: Dhinakaran Pandiyan <dhinakaran.pandiyan@gmail.com>
Date:   Sun Jun 24 22:47:40 2018 -0700

    drm/i915/psr: Fix race in intel_psr_work()
    
    Commit 5422b37c907e ("drm/i915/psr: Kill delays when activating psr
    back.") switched from delayed work to the plain variant and while doing so
    removed the check for work_busy() before scheduling a PSR activation.
    This appears to cause consecutive executions of psr_activate() in this
    scenario - after a worker picks up the PSR work item for execution and
    before the work function can acquire the PSR mutex, a psr_flush() can
    get hold of the mutex and schedule another PSR work. Without a psr_exit()
    between the two psr_activate() calls, warning messages get printed.
    Further, since we drop the mutex in the midst of psr_work() to wait for
    PSR to idle, another work item can also get scheduled. Fix this by
    returning if PSR was already active.
    
    Fixes: 5422b37c907e ("drm/i915/psr: Kill delays when activating psr back.")
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106948
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: José Roberto de Souza <jose.souza@intel.com>
    Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180625054741.3919-1-dhinakaran.pandiyan@intel.com
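
For readers following the commit message above, here is a minimal userspace sketch in C of the failure it describes: two activations in a row with no exit in between trip both checks seen in the logs, and returning early when PSR is already active avoids them. This is not the driver code. struct psr_model, model_warn_on(), model_psr_activate(), model_psr_exit(), model_psr_work() and the with_fix flag are hypothetical names made up for illustration, and the workqueue and mutex interleaving is reduced to two back-to-back calls. Only the bit 31 test and the psr.active test are taken from the WARN_ON lines in this report.

```c
#include <stdbool.h>
#include <stdio.h>

/* Bit 31 is the bit tested by the first WARN_ON in the logs. */
#define PSR_ENABLE_BIT (1u << 31)

/* Hypothetical stand-in for the driver's PSR state; not the real i915 struct. */
struct psr_model {
	unsigned int ctl_reg;	/* models the PSR control register */
	bool active;		/* models dev_priv->psr.active */
	int warnings;		/* counts WARN_ON hits instead of dumping a trace */
};

/* Records and prints a warning when cond is true, loosely like WARN_ON(). */
static void model_warn_on(struct psr_model *psr, bool cond, const char *what)
{
	if (cond) {
		psr->warnings++;
		fprintf(stderr, "WARN_ON(%s)\n", what);
	}
}

/* Runs the two checks seen in the logs, then marks PSR as enabled. */
static void model_psr_activate(struct psr_model *psr)
{
	model_warn_on(psr, psr->ctl_reg & PSR_ENABLE_BIT, "ctl_reg & (1 << 31)");
	model_warn_on(psr, psr->active, "psr.active");
	psr->ctl_reg |= PSR_ENABLE_BIT;
	psr->active = true;
}

/* Models the exit that should separate two activations. */
static void model_psr_exit(struct psr_model *psr)
{
	psr->ctl_reg &= ~PSR_ENABLE_BIT;
	psr->active = false;
}

/*
 * Models the work function. with_fix selects the guard described in the
 * commit message: return if PSR is already active.
 */
static void model_psr_work(struct psr_model *psr, bool with_fix)
{
	if (with_fix && psr->active)
		return;
	model_psr_activate(psr);
}
```

Calling model_psr_work() twice on a zero-initialised struct psr_model with with_fix set to false leaves warnings at 2, one per check, which matches the pair of back-to-back traces in the description. With with_fix set to true the second call returns early and warnings stays at 0. Calling model_psr_exit() between two activations also keeps the count at 0, which is the sequence the checks expect.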
Comment 18 James Ausmus 2018-07-18 01:36:03 UTC
Martin - can you verify if this is ready to be closed after the latest fix?
Comment 19 Martin Peres 2018-07-18 12:09:09 UTC
(In reply to James Ausmus from comment #18)
> Martin - can you verify if this is ready to be closed after the latest fix?

Yes, it was indeed fixed. It started passing with drmtip_73.

Closing!
