Bug 101705

Summary: [BAT][BYT] WARN intel_uncore.c:792 __unclaimed_reg_debug (reg 0x1f0034)
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Maarten Lankhorst <bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical
Priority: high
CC: intel-gfx-bugs, jwrdegoede
Version: DRI git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: BYT
i915 features: display/Other

Description Martin Peres 2017-07-06 10:57:47 UTC
Our two Bay Trail machines started randomly reporting the following warning when running igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b, starting from CI_DRM_2783:

[  446.057678] Unclaimed read from register 0x1f0034
[  446.057755] ------------[ cut here ]------------
[  446.057816] WARNING: CPU: 0 PID: 2936 at drivers/gpu/drm/i915/intel_uncore.c:792 __unclaimed_reg_debug+0x3e/0x50 [i915]
[  446.057820] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hwdep ghash_clmulni_intel snd_hda_core snd_pcm i915 r8169 mii lpc_ich i2c_hid i2c_designware_platform i2c_designware_core prime_numbers
[  446.057915] CPU: 0 PID: 2936 Comm: kworker/u8:14 Tainted: G     U          4.12.0-CI-CI_DRM_2804+ #1
[  446.057919] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015
[  446.057928] Workqueue: events_unbound async_run_entry_fn
[  446.057936] task: ffff88010d2e8040 task.stack: ffffc90000430000
[  446.057991] RIP: 0010:__unclaimed_reg_debug+0x3e/0x50 [i915]
[  446.057996] RSP: 0018:ffffc90000433978 EFLAGS: 00010096
[  446.058004] RAX: 0000000000000025 RBX: 0000000000000000 RCX: 0000000000000002
[  446.058009] RDX: 0000000000000000 RSI: ffffffff81cba24f RDI: ffffffff81c99957
[  446.058013] RBP: ffffc90000433990 R08: 0000000000000000 R09: 0000000000000001
[  446.058017] R10: ffffc90000433908 R11: fc8c7a4200000000 R12: 00000000001f0034
[  446.058021] R13: 0000000000000001 R14: 00000000ffffffff R15: ffff8801321c0b98
[  446.058026] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[  446.058031] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  446.058035] CR2: 00007f0b0d26f028 CR3: 0000000135678000 CR4: 00000000001006f0
[  446.058039] Call Trace:
[  446.058099]  fwtable_read32+0x278/0x2c0 [i915]
[  446.058153]  vlv_program_watermarks+0x41e/0x770 [i915]
[  446.058213]  vlv_optimize_watermarks+0xa1/0xc0 [i915]
[  446.058273]  intel_atomic_commit_tail+0x339/0xf70 [i915]
[  446.058343]  intel_atomic_commit+0x3fb/0x500 [i915]
[  446.058353]  ? drm_atomic_check_only+0x370/0x560
[  446.058405]  ? intel_runtime_pm_put+0x51/0xa0 [i915]
[  446.058416]  drm_atomic_commit+0x46/0x50
[  446.058425]  drm_atomic_helper_commit_duplicated_state+0xbf/0xd0
[  446.058482]  __intel_display_resume+0x81/0xc0 [i915]
[  446.058541]  intel_display_resume+0xca/0xf0 [i915]
[  446.058601]  i915_pm_restore+0xef/0x190 [i915]
[  446.058652]  i915_pm_resume+0x9/0x10 [i915]
[  446.058659]  pci_pm_resume+0x5f/0x90
[  446.058668]  dpm_run_callback+0x6f/0x330
[  446.058673]  ? pci_pm_thaw+0x90/0x90
[  446.058682]  device_resume+0xac/0x1e0
[  446.058691]  ? dpm_watchdog_set+0x60/0x60
[  446.058703]  async_resume+0x18/0x40
[  446.058709]  async_run_entry_fn+0x34/0x160
[  446.058719]  process_one_work+0x1fe/0x670
[  446.058732]  worker_thread+0x49/0x3b0
[  446.058744]  kthread+0x10f/0x150
[  446.058750]  ? process_one_work+0x670/0x670
[  446.058756]  ? kthread_create_on_node+0x40/0x40
[  446.058765]  ret_from_fork+0x27/0x40
[  446.058781] Code: de ff ff 38 d8 76 2d 45 84 ed 48 c7 c0 9d 89 18 a0 48 c7 c6 a7 89 18 a0 48 0f 45 f0 44 89 e2 48 c7 c7 b0 89 18 a0 e8 7b a6 0c e1 <0f> ff 83 2d 45 32 12 00 01 5b 41 5c 41 5d 5d c3 66 90 55 48 89 
[  446.059026] ---[ end trace cc38744b313e72cf ]---

This may be linked to this bug: https://bugs.freedesktop.org/show_bug.cgi?id=101516
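
For anyone comparing their own logs against the trace above, here is a minimal sketch of how the "Unclaimed read from register" lines could be pulled out of a dmesg dump. It is only an illustration: the function name find_unclaimed_accesses and the sample text are made up for this example, and the "write to" variant of the message is an assumption, since only the "read from" form appears in this report.

```python
import re

# Matches the line printed just before the WARN splat, e.g.
# "[  446.057678] Unclaimed read from register 0x1f0034".
# The "write to" alternative is an assumption; this report only shows reads.
UNCLAIMED_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Unclaimed (?P<op>read|write) "
    r"(?:from|to) register (?P<reg>0x[0-9a-fA-F]+)"
)


def find_unclaimed_accesses(log_text):
    """Return (timestamp, operation, register) tuples found in a dmesg dump."""
    hits = []
    for line in log_text.splitlines():
        match = UNCLAIMED_RE.search(line)
        if match:
            hits.append(
                (
                    float(match.group("ts")),
                    match.group("op"),
                    int(match.group("reg"), 16),
                )
            )
    return hits


# Sample input: the first two lines are copied from the trace above,
# the third uses the timestamp format of a second, later resume.
sample = """\
[  446.057678] Unclaimed read from register 0x1f0034
[  446.057755] ------------[ cut here ]------------
[   56.964250] Unclaimed read from register 0x1f0034
"""

hits = find_unclaimed_accesses(sample)
for ts, op, reg in hits:
    print(f"{ts:.6f} {op} {reg:#x}")
```

Feeding it the output of `dmesg` after a suspend/resume cycle should list each offending register once per warning, which makes it easier to tell whether a run hit the same register (0x1f0034) as this bug.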
Comment 1 Hans de Goede 2017-08-05 11:08:38 UTC
Just adding a me too comment here. FWIW here is my almost identical backtrace:

[   56.955016] PM: early resume of devices complete after 1068.631 msecs
[   56.956396] pcieport 0000:00:1c.0: System wakeup disabled by ACPI
[   56.957697] rtc_cmos 00:04: System wakeup disabled by ACPI
[   56.958261] Suspended for 0.888 seconds
[   56.964250] Unclaimed read from register 0x1f0034
[   56.964347] ------------[ cut here ]------------
[   56.964428] WARNING: CPU: 2 PID: 1799 at drivers/gpu/drm/i915/intel_uncore.c:801 __unclaimed_reg_debug+0x4e/0x60 [i915]
...
[   56.964590] Workqueue: events_unbound async_run_entry_fn
[   56.964594] task: ffff89ad698b0000 task.stack: ffff9ba401454000
[   56.964662] RIP: 0010:__unclaimed_reg_debug+0x4e/0x60 [i915]
...
[   56.964688] Call Trace:
[   56.964758]  fwtable_read32+0x17a/0x1d0 [i915]
[   56.964818]  vlv_program_watermarks+0x3c4/0x610 [i915]
[   56.964883]  ? intel_hdmi_get_hw_state+0x27/0xd0 [i915]
[   56.964941]  vlv_optimize_watermarks+0x95/0xb0 [i915]
[   56.965008]  intel_atomic_commit_tail+0x2e2/0x1030 [i915]
[   56.965016]  ? tracing_record_cmdline+0x32/0x120
[   56.965022]  ? __schedule+0x23e/0x860
[   56.965090]  intel_atomic_commit+0x399/0x4b0 [i915]
[   56.965125]  ? drm_atomic_check_only+0x37f/0x540 [drm]
[   56.965155]  drm_atomic_commit+0x4b/0x50 [drm]
[   56.965174]  drm_atomic_helper_commit_duplicated_state+0xc2/0xd0 [drm_kms_helper]
[   56.965245]  __intel_display_resume+0x85/0xc0 [i915]
[   56.965312]  intel_display_resume+0xf7/0x120 [i915]
[   56.965370]  i915_drm_resume+0xe1/0x180 [i915]
[   56.965427]  i915_pm_resume+0x1e/0x30 [i915]
[   56.965434]  pci_pm_resume+0x65/0xa0
[   56.965440]  dpm_run_callback+0x57/0x140
[   56.965444]  ? pci_pm_thaw+0x90/0x90
[   56.965447]  device_resume+0xe1/0x200
[   56.965450]  async_resume+0x1d/0x50
[   56.965455]  async_run_entry_fn+0x39/0x170
[   56.965460]  process_one_work+0x193/0x3c0
...
[   57.062736] PM: resume of devices complete after 107.709 msecs
[   57.063468] PM: resume devices took 0.108 seconds
[   57.063478] PM: Finishing wakeup.

Regards,

Hans
Comment 2 Ville Syrjala 2017-10-13 15:11:19 UTC
I've not thought through the issue in detail, but I think one potential way to fix this would be to add the MODESET domain to the display power well. That should also mean that we could again eliminate 886015a0ad43 ("drm/i915: reintroduce VLV/CHV PFI programming power domain workaround").
Comment 3 krisman 2017-10-16 04:10:45 UTC
(In reply to Ville Syrjala from comment #2)
> I've not thought through the issue in detail, but I think one potential way
> to fix this would to add the MODESET domain to the display power well. That
> should also mean that we could again eliminate 886015a0ad43 ("drm/i915:
> reintroduce VLV/CHV PFI programming power domain workaround").

Hi Ville, I have been playing with this for a while, mostly blindly, since I don't have documentation on power domains. In fact, I had tried this suggestion before, but I gave it another try and it didn't make a difference:

https://lists.freedesktop.org/archives/intel-gfx-trybot/2017-October/023994.html
Comment 4 Maarten Lankhorst 2017-10-30 10:46:05 UTC
The first patch in https://patchwork.freedesktop.org/series/32739/ seems to fix it.
Comment 5 Elizabeth 2017-11-01 17:24:26 UTC
Quick note, we're still hitting this issue on BYT with IGT-Version: 1.20-g7aac0e8 (x86_64) (Linux: 4.14.0-rc7-drm-intel-qa-ww44-commit-ec9f758+ x86_64).
Comment 6 Maarten Lankhorst 2017-11-17 14:41:48 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b.html

Seems fixed in CI_DRM_3320, most likely commit:

commit 1a1f12872edcd5e425b668a35fb23548cfa918ef
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 7 14:03:38 2017 +0000

    drm/i915: Prevent unbounded wm results in g4x_compute_wm()
