Bug 101517

Summary: [BAT][BYT,BSW] WARN_ON(dev_priv->uncore.funcs.mmio_readl(dev_priv, (((const i915_reg_t){ .reg = (0x180000 + 0x650C) })), true) & (1 << 27))
Product: DRI Reporter: Martin Peres <martin.peres>
Component: DRM/IntelAssignee: krisman
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical    
Priority: highest CC: intel-gfx-bugs
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: BYT i915 features: display/Other

Description Martin Peres 2017-06-20 13:30:00 UTC
Starting from CI_DRM_2743, executing the tests igt@kms_pipe_crc_basic@suspend-read-crc-pipe-[ac] on our baytrails and braswells may lead to the following WARNING:

[  579.071526] [drm:chv_set_cdclk [i915]] *ERROR* timed out waiting for CDclk change
[  579.071942] WARN_ON(dev_priv->uncore.funcs.mmio_readl(dev_priv, (((const i915_reg_t){ .reg = (0x180000 + 0x650C) })), true) & (1 << 27))
[  579.072038] ------------[ cut here ]------------
[  579.072211] WARNING: CPU: 1 PID: 3159 at drivers/gpu/drm/i915/intel_cdclk.c:485 vlv_program_pfi_credits+0xb4/0xc0 [i915]
[  579.072224] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul i915 ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep r8169 snd_hda_core mii lpc_ich snd_pcm i2c_hid prime_numbers i2c_designware_platform i2c_designware_core pinctrl_cherryview
[  579.072518] CPU: 1 PID: 3159 Comm: kworker/u4:25 Tainted: G     U          4.12.0-rc6-CI-CI_DRM_2744+ #1
[  579.072531] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0058.2016.1102.1842 11/02/2016
[  579.072551] Workqueue: events_unbound async_run_entry_fn
[  579.072577] task: ffff880176ad2600 task.stack: ffffc900007b8000
[  579.072745] RIP: 0010:vlv_program_pfi_credits+0xb4/0xc0 [i915]
[  579.072842] RSP: 0000:ffffc900007bba20 EFLAGS: 00010282
[  579.072869] RAX: 000000000000007c RBX: ffff88016ae60000 RCX: 0000000000000006
[  579.072882] RDX: 0000000000000006 RSI: ffffffff81cba051 RDI: ffffffff81c99767
[  579.072895] RBP: ffffc900007bba30 R08: 0000000000000000 R09: 0000000000000001
[  579.072909] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000048004000
[  579.072922] R13: ffff88016ae60000 R14: 00000001000441da R15: ffff88016ae66d98
[  579.072937] FS:  0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
[  579.072951] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  579.072964] CR2: 0000000000000000 CR3: 0000000003e0f000 CR4: 00000000001006e0
[  579.072977] Call Trace:
[  579.073157]  chv_set_cdclk+0x13c/0x170 [i915]
[  579.073337]  intel_set_cdclk+0x5d/0x90 [i915]
[  579.073511]  intel_atomic_commit_tail+0xa9f/0xf70 [i915]
[  579.073716]  intel_atomic_commit+0x3fb/0x500 [i915]
[  579.073832]  ? drm_atomic_check_only+0x420/0x560
[  579.073939]  ? intel_runtime_pm_put+0x51/0xa0 [i915]
[  579.073957]  drm_atomic_commit+0x46/0x50
[  579.073974]  drm_atomic_helper_commit_duplicated_state+0xbf/0xd0
[  579.074088]  __intel_display_resume+0x81/0xc0 [i915]
[  579.074205]  intel_display_resume+0xca/0xf0 [i915]
[  579.074323]  i915_pm_restore+0xef/0x190 [i915]
[  579.074424]  i915_pm_resume+0x9/0x10 [i915]
[  579.074438]  pci_pm_resume+0x5f/0x90
[  579.074455]  dpm_run_callback+0x6f/0x330
[  579.074465]  ? pci_pm_thaw+0x90/0x90
[  579.074483]  device_resume+0xac/0x1e0
[  579.074500]  ? dpm_watchdog_set+0x60/0x60
[  579.074523]  async_resume+0x18/0x40
[  579.074535]  async_run_entry_fn+0x34/0x160
[  579.074553]  process_one_work+0x1fe/0x670
[  579.074577]  worker_thread+0x49/0x3b0
[  579.074601]  kthread+0x10f/0x150
[  579.074612]  ? process_one_work+0x670/0x670
[  579.074623]  ? kthread_create_on_node+0x40/0x40
[  579.074642]  ret_from_fork+0x27/0x40
[  579.074724] Code: 52 00 00 ba 00 40 00 40 41 bc 00 40 00 98 73 97 41 bc 00 00 00 40 eb 88 48 c7 c6 b0 17 22 a0 48 c7 c7 2c 90 20 a0 e8 a5 aa 03 e1 <0f> ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 31 d2 48 89 e5 53 48 83 
[  579.075235] ---[ end trace fc1be0628a46c71d ]---

Full logs:
 - https://intel-gfx-ci.01.org/CI/CI_DRM_2744/fi-byt-n2820/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
 - https://intel-gfx-ci.01.org/CI/CI_DRM_2744/fi-byt-j1900/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
 - https://intel-gfx-ci.01.org/CI/CI_DRM_2744/fi-bsw-n3050/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c.html
Comment 1 Elizabeth 2017-06-20 16:14:04 UTC
Adding tag into "Whiteboard" field - ReadyForDev
*Status is correct
*Platform is included
*Feature is included
*Priority and Severity correctly set
*Logs included
Comment 2 krisman 2017-06-20 19:58:14 UTC
(In reply to Martin Peres from comment #0)
> Starting from CI_DRM_2743, executing the tests
> igt@kms_pipe_crc_basic@suspend-read-crc-pipe-[ac] on our baytrails and
> braswells may lead to the following WARNING:
> 
> [  579.071526] [drm:chv_set_cdclk [i915]] *ERROR* timed out waiting for
> CDclk change
> [  579.071942] WARN_ON(dev_priv->uncore.funcs.mmio_readl(dev_priv, (((const

I
Comment 3 krisman 2017-06-20 20:03:56 UTC
(In reply to Martin Peres from comment #0)
> Starting from CI_DRM_2743, executing the tests
> igt@kms_pipe_crc_basic@suspend-read-crc-pipe-[ac] on our baytrails and
> braswells may lead to the following WARNING:
> 
> [  579.071526] [drm:chv_set_cdclk [i915]] *ERROR* timed out waiting for
> CDclk change
> [  579.071942] WARN_ON(dev_priv->uncore.funcs.mmio_readl(dev_priv, (((const

In the code, this WARN_ON carries the following comment:

  /*
   * FIXME is this guaranteed to clear
   * immediately or should we poll for it?
  */
  WARN_ON(I915_READ(GCI_CONTROL) & PFI_CREDIT_RESEND);

Maybe we should consider polling? :-)

Anyway, that WARN_ON can be a side effect of the *ERROR* right before it.  I'm Looking into that.  Also, I don't see much change in the commit log between CI_DRM_2742 and CI_DRM_2743 that would explain it, but I will take a deeper look.
Comment 4 Ville Syrjala 2017-06-20 20:15:26 UTC
(In reply to krisman from comment #3)
> (In reply to Martin Peres from comment #0)
> > Starting from CI_DRM_2743, executing the tests
> > igt@kms_pipe_crc_basic@suspend-read-crc-pipe-[ac] on our baytrails and
> > braswells may lead to the following WARNING:
> > 
> > [  579.071526] [drm:chv_set_cdclk [i915]] *ERROR* timed out waiting for
> > CDclk change
> > [  579.071942] WARN_ON(dev_priv->uncore.funcs.mmio_readl(dev_priv, (((const
> 
> In the code, this WARN_ON carries the following comment:
> 
>   /*
>    * FIXME is this guaranteed to clear
>    * immediately or should we poll for it?
>   */
>   WARN_ON(I915_READ(GCI_CONTROL) & PFI_CREDIT_RESEND);
> 
> Maybe we should consider polling? :-)
> 
> Anyway, that WARN_ON can be a side effect of the *ERROR* right before it. 
> I'm Looking into that.  Also, I don't see much change in the commit log
> between CI_DRM_2742 and CI_DRM_2743 that would explain it, but I will take a
> deeper look.

The simple solution would be to just revert commit 63ff30442519 ("drm/i915: Nuke the VLV/CHV PFI programming power domain workaround")

I failed to consider that the firmware likes to reset the cdclk to some non-minimum frequency during suspend, and then we apparently perform a modeset without actually modesetting anything and thus only end up reprogramming the cdclk without the power well being enabled.
Comment 5 Martin Peres 2017-06-21 07:55:02 UTC
(In reply to Ville Syrjala from comment #4)
> (In reply to krisman from comment #3)
> > (In reply to Martin Peres from comment #0)
> > > Starting from CI_DRM_2743, executing the tests
> > > igt@kms_pipe_crc_basic@suspend-read-crc-pipe-[ac] on our baytrails and
> > > braswells may lead to the following WARNING:
> > > 
> > > [  579.071526] [drm:chv_set_cdclk [i915]] *ERROR* timed out waiting for
> > > CDclk change
> > > [  579.071942] WARN_ON(dev_priv->uncore.funcs.mmio_readl(dev_priv, (((const
> > 
> > In the code, this WARN_ON carries the following comment:
> > 
> >   /*
> >    * FIXME is this guaranteed to clear
> >    * immediately or should we poll for it?
> >   */
> >   WARN_ON(I915_READ(GCI_CONTROL) & PFI_CREDIT_RESEND);
> > 
> > Maybe we should consider polling? :-)
> > 
> > Anyway, that WARN_ON can be a side effect of the *ERROR* right before it. 
> > I'm Looking into that.  Also, I don't see much change in the commit log
> > between CI_DRM_2742 and CI_DRM_2743 that would explain it, but I will take a
> > deeper look.
> 
> The simple solution would be to just revert commit 63ff30442519 ("drm/i915:
> Nuke the VLV/CHV PFI programming power domain workaround")
> 
> I failed to consider that the firmware likes to reset the cdclk to some
> non-minimum frequency during suspend, and then we apparently perform a
> modeset without actually modesetting anything and thus only end up
> reprogramming the cdclk without the power well being enabled.

Let's do it then! Revert fast, rework carefully.
Comment 6 krisman 2017-06-28 03:50:27 UTC
*** Bug 101516 has been marked as a duplicate of this bug. ***
Comment 7 krisman 2017-06-28 18:42:48 UTC
(In reply to Ville Syrjala from comment #4)

> The simple solution would be to just revert commit 63ff30442519 ("drm/i915:
> Nuke the VLV/CHV PFI programming power domain workaround")
> 
> I failed to consider that the firmware likes to reset the cdclk to some
> non-minimum frequency during suspend, and then we apparently perform a
> modeset without actually modesetting anything and thus only end up
> reprogramming the cdclk without the power well being enabled.

trybot confirmed it addresses the issue:

https://intel-gfx-ci.01.org/CI/Trybot_973/
Comment 8 krisman 2017-06-28 21:09:59 UTC
(In reply to Martin Peres from comment #5)

> > The simple solution would be to just revert commit 63ff30442519 ("drm/i915:
> > Nuke the VLV/CHV PFI programming power domain workaround")
> > 
> > I failed to consider that the firmware likes to reset the cdclk to some
> > non-minimum frequency during suspend, and then we apparently perform a
> > modeset without actually modesetting anything and thus only end up
> > reprogramming the cdclk without the power well being enabled.
> 
> Let's do it then! Revert fast, rework carefully.

Submitted to the ML: https://lists.freedesktop.org/archives/intel-gfx/2017-June/131672.html
Comment 9 krisman 2017-06-29 15:43:59 UTC
(In reply to krisman from comment #8)
> (In reply to Martin Peres from comment #5)
> 
> > > The simple solution would be to just revert commit 63ff30442519 ("drm/i915:
> > > Nuke the VLV/CHV PFI programming power domain workaround")
> > > 
> > > I failed to consider that the firmware likes to reset the cdclk to some
> > > non-minimum frequency during suspend, and then we apparently perform a
> > > modeset without actually modesetting anything and thus only end up
> > > reprogramming the cdclk without the power well being enabled.
> > 
> > Let's do it then! Revert fast, rework carefully.
> 
> Submitted to the ML:
> https://lists.freedesktop.org/archives/intel-gfx/2017-June/131672.html

Applied and verified by CI_DRM_2783.
Comment 10 Martin Peres 2017-06-30 07:34:51 UTC
Verified and closed.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.