102543 – [BAT][HSW] igt@tools_test@tools_test - Unclaimed read from register 0x[4c]400c

Summary: [BAT][HSW] igt@tools_test@tools_test - Unclaimed read from register 0x[4c]400c

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	IGT
Version:	XOrg git
Hardware:	Other All

Importance:	highest critical
Assignee:	Default DRI bug account
QA Contact:

URL:
Whiteboard:	ReadyForDev
Keywords:

Depends on:
Blocks:

Reported:	2017-09-05 09:27 UTC by Martin Peres
Modified:	2017-09-08 11:32 UTC
CC List:	1 user

See Also:
i915 platform:	HSW
i915 features:

Description Martin Peres 2017-09-05 09:27:03 UTC

Starting from CI_DRM_3033, the machine shard-hsw produced the following warning when running the test igt@tools_test@tools_test:

[  227.980159] Unclaimed read from register 0xc400c
[  227.980198] ------------[ cut here ]------------
[  227.980233] WARNING: CPU: 3 PID: 0 at drivers/gpu/drm/i915/intel_uncore.c:792 __unclaimed_reg_debug+0x47/0x60 [i915]
[  227.980235] Modules linked in: vgem snd_hda_codec_hdmi i915 snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul snd_hda_intel crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core r8169 snd_pcm mii mei_me mei prime_numbers lpc_ich
[  227.980271] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G     U          4.13.0-CI-CI_DRM_3037+ #1
[  227.980273] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
[  227.980275] task: ffff88040d562800 task.stack: ffffc9000009c000
[  227.980300] RIP: 0010:__unclaimed_reg_debug+0x47/0x60 [i915]
[  227.980302] RSP: 0018:ffff88041fac3e28 EFLAGS: 00010092
[  227.980306] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000010003
[  227.980308] RDX: 0000000080010003 RSI: ffffffff81d021bf RDI: 00000000ffffffff
[  227.980309] RBP: ffff88041fac3e40 R08: 0000000000000000 R09: 0000000000000001
[  227.980311] R10: ffff88041fac3db8 R11: 00000000fa823003 R12: 00000000000c400c
[  227.980313] R13: 0000000000000001 R14: 00000000ffffffff R15: ffff8803fa380ba8
[  227.980315] FS:  0000000000000000(0000) GS:ffff88041fac0000(0000) knlGS:0000000000000000
[  227.980316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  227.980318] CR2: 00007f5255477788 CR3: 0000000003e0f000 CR4: 00000000001406e0
[  227.980320] Call Trace:
[  227.980322]  <IRQ>
[  227.980347]  gen6_read32+0x22d/0x2b0 [i915]
[  227.980367]  ironlake_irq_handler+0xac/0xa80 [i915]
[  227.980372]  __handle_irq_event_percpu+0x49/0x350
[  227.980376]  handle_irq_event_percpu+0x23/0x60
[  227.980379]  handle_irq_event+0x39/0x60
[  227.980382]  handle_edge_irq+0xf4/0x1c0
[  227.980385]  handle_irq+0x1a/0x30
[  227.980389]  do_IRQ+0x68/0x130
[  227.980392]  common_interrupt+0x90/0x90
[  227.980395] RIP: 0010:cpuidle_enter_state+0x136/0x370
[  227.980397] RSP: 0018:ffffc9000009fe80 EFLAGS: 00000212 ORIG_RAX: ffffffffffffff8d
[  227.980400] RAX: ffff88040d562800 RBX: 0000000000020319 RCX: 0000000000000001
[  227.980402] RDX: 0000000000000000 RSI: ffffffff81cf77c4 RDI: ffffffff81cae67e
[  227.980403] RBP: ffffc9000009feb8 R08: 0000000000000331 R09: 0000000000000018
[  227.980405] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[  227.980407] R13: 0000000000000005 R14: ffffe8ffffcc0530 R15: 0000003514aaa3ef
[  227.980408]  </IRQ>
[  227.980416]  cpuidle_enter+0x17/0x20
[  227.980419]  call_cpuidle+0x23/0x40
[  227.980422]  do_idle+0x192/0x1e0
[  227.980426]  cpu_startup_entry+0x1d/0x20
[  227.980430]  start_secondary+0x102/0x120
[  227.980434]  secondary_startup_64+0x9f/0x9f
[  227.980439] Code: 01 75 31 84 db 75 2d 45 84 ed 48 c7 c0 da 7b 44 a0 48 c7 c6 d0 7b 44 a0 48 0f 44 f0 44 89 e2 48 c7 c7 e3 7b 44 a0 e8 6a 4d d6 e0 <0f> ff 83 2d a4 ef 0f 00 01 5b 41 5c 41 5d 5d c3 66 0f 1f 84 00 
[  227.980527] ---[ end trace f0a9fad54483215d ]---

This happens 100% of the time, and it seems to have been introduced by one of these commits: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3033/commits_short.log

Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3033/shard-hsw3/igt@tools_test@tools_test.html

Comment 1 Chris Wilson 2017-09-05 09:50:31 UTC

That looks entirely like a test bug: intel_reg_read reads unknown registers, and the resulting error is then caught by the kernel on its next mmio access. Since the kernel is not aware of a third party messing around with registers, it takes the blame upon itself.
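
To illustrate the misattribution described above, here is a toy model in Python. It assumes the hardware exposes a single sticky "unclaimed access" flag that does not record who triggered it; the class, function names and the offset 0x12345 are hypothetical and are not taken from i915 or IGT.

```python
class ToyHardware:
    """Toy device with one sticky 'unclaimed access' flag (an assumption)."""

    def __init__(self, known_registers):
        self.known = set(known_registers)
        self.unclaimed_flag = False  # latched; does not record who set it

    def read(self, offset):
        # Reading a register nobody claims latches the error flag.
        if offset not in self.known:
            self.unclaimed_flag = True
        return 0


def kernel_read(hw, offset, log):
    """Driver-side read that checks the sticky flag after its own access."""
    value = hw.read(offset)
    if hw.unclaimed_flag:
        hw.unclaimed_flag = False  # clear so the next check starts clean
        # The driver only knows its own offset, so that is what it reports.
        log.append(f"Unclaimed read from register {offset:#x}")
    return value


hw = ToyHardware(known_registers={0xC400C})
log = []

kernel_read(hw, 0xC400C, log)  # clean access: nothing is logged
hw.read(0x12345)               # a userspace tool bypasses the driver
kernel_read(hw, 0xC400C, log)  # the driver blames its own, valid register
print(log)
```

In this sketch the warning names 0xc400c even though the offending access was to a different, unknown register, which would be consistent with the trace in the description.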

Comment 2 Martin Peres 2017-09-07 08:53:12 UTC

The test has been passing recently. Has anything changed?

Comment 3 Chris Wilson 2017-09-07 08:58:42 UTC

From our pov, no. The highlighted commit was just the merge to 4.13. The failure does depend upon i915 doing something during the test, though, while the test itself bypasses i915 for direct hw access.

What we could do is disable automatic mmio checking while user forcewake is held, then clear any errors upon release.

Comment 4 Chris Wilson 2017-09-07 09:45:38 UTC

E.g. https://patchwork.freedesktop.org/series/29935/

Comment 5 Chris Wilson 2017-09-07 17:02:24 UTC

commit d7a133d886b45651e36e7065998b1413d379ac1f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Sep 7 14:44:41 2017 +0100

    drm/i915: Disable mmio debugging during user access
    
    If the user bypasses i915 and accesses mmio directly, that easily
    confuses our automatic mmio debugging (any error we then detect is
    likely to be as a result of the user). Since we expect userspace to open
    debugfs/i915_forcewake_user if i915.ko is loaded and they want mmio
    access, that makes the opportune time to disable our debugging for
    duration of the bypass.
    
    v2: Move the fiddling of uncore internals to uncore.c

The issue in intel_reg_dump is still there, just not triggering a kernel warning.
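
The approach in the commit message can be sketched as a toy model in Python. This is not the kernel code: it only assumes that debugging is saved and disabled when the first user takes the bypass, and that pending errors are cleared and the setting restored when the last user releases it. All names (ToyUncore, user_get, user_put, would_warn) are hypothetical.

```python
class ToyUncore:
    """Toy model of suspending mmio debugging during a user bypass."""

    def __init__(self, mmio_debug=1):
        self.mmio_debug = mmio_debug   # non-zero means checking is enabled
        self.user_count = 0            # number of active user bypasses
        self.saved_mmio_debug = None
        self.unclaimed_flag = False    # sticky error flag (an assumption)

    def user_get(self):
        # First user: remember the current setting, then disable checking.
        if self.user_count == 0:
            self.saved_mmio_debug = self.mmio_debug
            self.mmio_debug = 0
        self.user_count += 1

    def user_put(self):
        if self.user_count == 0:
            raise RuntimeError("user_put without matching user_get")
        self.user_count -= 1
        # Last user: drop errors caused during the bypass, restore checking.
        if self.user_count == 0:
            self.unclaimed_flag = False
            self.mmio_debug = self.saved_mmio_debug

    def would_warn(self):
        """Return True if the driver would report an unclaimed access now."""
        if self.mmio_debug and self.unclaimed_flag:
            self.unclaimed_flag = False
            return True
        return False


uncore = ToyUncore(mmio_debug=1)

uncore.user_get()                 # tool opens the bypass
uncore.unclaimed_flag = True      # tool reads an unknown register
warned_during = uncore.would_warn()
uncore.user_put()                 # tool closes the bypass
warned_after = uncore.would_warn()
print(warned_during, warned_after, uncore.mmio_debug)
```

The reference count means that nested or concurrent users keep checking disabled until the last one releases it. As noted above, the stray access itself still happens in this model; only the warning is suppressed.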

Comment 6 Martin Peres 2017-09-08 10:45:27 UTC

Thanks Chris! I will close this bug.

What about https://bugs.freedesktop.org/show_bug.cgi?id=102249 though? Could the patch you sent fix this too?

Comment 7 Chris Wilson 2017-09-08 11:32:04 UTC

(In reply to Martin Peres from comment #6)
> Thanks Chris! I will close this bug.
> 
> What about https://bugs.freedesktop.org/show_bug.cgi?id=102249 though? Could
> the patch you sent fix this too?

It's not intended to have any effect on any test except tools_test; but drv_suspend/forcewake, pm_rpm/debugfs-forcewake-user, gem_exec_latency, gem_exec_parse, gem_workarounds and pm_lsps also poke around with registers under forcewake, and any of them may end up polluting the mmio debugger. #102249 looks clear of outside influence.
