Bug 111674 - [CI][BAT][ICL-GUC] igt@i915_selftest@live_hangcheck - dmesg-warn - GuC status: 0x8002f076, MIA core expected to be in reset
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: low major
Assignee: Daniele Ceraolo Spurio
QA Contact: Intel GFX Bugs mailing list
Reported: 2019-09-12 12:05 UTC by Lakshmi
Modified: 2019-09-25 18:49 UTC

i915 platform: ICL
i915 features: firmware/guc


Description Lakshmi 2019-09-12 12:05:13 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6875/fi-icl-guc/igt@i915_selftest@live_hangcheck.html

GuC status: 0x8002f076, MIA core expected to be in reset
<4> [571.870282] WARNING: CPU: 6 PID: 7504 at drivers/gpu/drm/i915/gt/uc/intel_uc.c:36 __uc_sanitize+0xb0/0x1a0 [i915]
<4> [571.870283] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul ax88179_178a crc32_pclmul usbnet mii snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core e1000e mei_me ghash_clmulni_intel ptp pps_core snd_pcm mei prime_numbers [last unloaded: i915]
<4> [571.870298] CPU: 6 PID: 7504 Comm: i915_selftest Tainted: G     U            5.3.0-rc8-CI-CI_DRM_6875+ #1
<4> [571.870300] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.3175.A00.1904261428 04/26/2019
<4> [571.870351] RIP: 0010:__uc_sanitize+0xb0/0x1a0 [i915]
<4> [571.870354] Code: 7b f0 ba 01 00 00 00 be 00 c0 00 00 48 8b 87 b0 00 00 00 e8 82 3a 4f e1 a8 01 75 ca 89 c6 48 c7 c7 40 a0 81 a0 e8 90 2e 9a e0 <0f> 0b eb b8 48 c7 c1 90 9f 81 a0 ba a8 00 00 00 48 c7 c6 90 72 7e
<4> [571.870357] RSP: 0018:ffffc90000567978 EFLAGS: 00010286
<4> [571.870360] RAX: 0000000000000000 RBX: ffff88880962bea8 RCX: 0000000000000001
<4> [571.870362] RDX: 0000000080000001 RSI: ffff8887dee14918 RDI: 00000000ffffffff
<4> [571.870364] RBP: 0000000000000000 R08: ffff8887dee14918 R09: 0000000000000000
<4> [571.870367] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000057
<4> [571.870369] R13: ffff88880962be90 R14: ffff88880962be90 R15: 0000000000000004
<4> [571.870372] FS:  00007f9d0cfe0240(0000) GS:ffff88888ff00000(0000) knlGS:0000000000000000
<4> [571.870374] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [571.870376] CR2: 0000565051f5d5a0 CR3: 00000007f7448002 CR4: 0000000000760ee0
<4> [571.870377] PKRU: 55555554
<4> [571.870409] Call Trace:
<4> [571.870454]  reset_prepare+0x71/0x90 [i915]
<4> [571.870495]  intel_gt_reset+0x86/0x3d0 [i915]
<4> [571.870533]  igt_reset_nop+0x103/0x1f0 [i915]
<4> [571.870586]  __i915_subtests+0xb8/0x210 [i915]
<4> [571.870634]  ? __i915_live_teardown+0x70/0x70 [i915]
<4> [571.870743]  ? __intel_gt_live_setup+0x10/0x10 [i915]
<4> [571.870793]  intel_hangcheck_live_selftests+0xa5/0x100 [i915]
<4> [571.870845]  __run_selftests+0x112/0x170 [i915]
<4> [571.870886]  i915_live_selftests+0x2c/0x60 [i915]
<4> [571.870922]  i915_pci_probe+0x93/0x1b0 [i915]
<4> [571.870926]  ? _raw_spin_unlock_irqrestore+0x39/0x60
<4> [571.870932]  pci_device_probe+0x9e/0x120
<4> [571.870936]  really_probe+0xea/0x3d0
<4> [571.870939]  driver_probe_device+0x10b/0x120
<4> [571.870942]  device_driver_attach+0x4a/0x50
<4> [571.870945]  __driver_attach+0x97/0x130
<4> [571.870947]  ? device_driver_attach+0x50/0x50
<4> [571.870949]  bus_for_each_dev+0x74/0xc0
<4> [571.870953]  bus_add_driver+0x13f/0x210
<4> [571.870955]  ? 0xffffffffa0956000
<4> [571.870957]  driver_register+0x56/0xe0
<4> [571.870959]  ? 0xffffffffa0956000
<4> [571.870962]  do_one_initcall+0x58/0x300
<4> [571.870965]  ? do_init_module+0x1d/0x1f6
<4> [571.870969]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [571.870972]  ? kmem_cache_alloc_trace+0x2d1/0x300
<4> [571.870976]  do_init_module+0x56/0x1f6
<4> [571.870979]  load_module+0x25bd/0x2a40
<4> [571.870992]  ? __se_sys_finit_module+0xd3/0xf0
<4> [571.870994]  __se_sys_finit_module+0xd3/0xf0
<4> [571.871003]  do_syscall_64+0x55/0x1c0
<4> [571.871005]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [571.871007] RIP: 0033:0x7f9d0c698839
<4> [571.871010] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
<4> [571.871011] RSP: 002b:00007ffe29105dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4> [571.871013] RAX: ffffffffffffffda RBX: 000055f2411f12b0 RCX: 00007f9d0c698839
<4> [571.871015] RDX: 0000000000000000 RSI: 000055f2411eb3d0 RDI: 0000000000000006
<4> [571.871016] RBP: 000055f2411eb3d0 R08: 0000000000000004 R09: 000055f24041dc1b
<4> [571.871017] R10: 00007ffe29106020 R11: 0000000000000246 R12: 0000000000000000
<4> [571.871019] R13: 000055f2411e3000 R14: 0000000000000020 R15: 0000000000000048
<4> [571.871026] irq event stamp: 344384
<4> [571.871029] hardirqs last  enabled at (344383): [<ffffffff8112b707>] console_unlock+0x3f7/0x5a0
<4> [571.871031] hardirqs last disabled at (344384): [<ffffffff81001a8a>] trace_hardirqs_off_thunk+0x1a/0x20
<4> [571.871073] softirqs last  enabled at (344330): [<ffffffffa06fc046>] i915_request_add+0xa6/0x330 [i915]
<4> [571.871114] softirqs last disabled at (344288): [<ffffffffa06fc046>] i915_request_add+0xa6/0x330 [i915]
<4> [571.871154] WARNING: CPU: 6 PID: 7504 at drivers/gpu/drm/i915/gt/uc/intel_uc.c:36 __uc_sanitize+0xb0/0x1a0 [i915]
<4> [571.871156] ---[ end trace 11fff873b33acd37 ]---
Comment 1 CI Bug Log 2019-09-12 12:06:22 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* fi-icl-guc: igt@i915_selftest@live_hangcheck - dmesg-warn - GuC status: 0x8002f076, MIA core expected to be in reset
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6875/fi-icl-guc/igt@i915_selftest@live_hangcheck.html
Comment 2 Lakshmi 2019-09-25 17:24:54 UTC
Daniele, can you please add your assessment and set the severity here?
Comment 3 Daniele Ceraolo Spurio 2019-09-25 18:49:18 UTC
It looks like something went wrong in the HW.
The test does lots of full GT resets. i915 resets the GuC before doing a full GT reset, and in one case the GuC reset seems to have gone wrong. The HW reported a successful reset (confirmed by the lack of error logs for it), but the GUC_STATUS register, which we check right after the reset, was not correctly reset and contains a status that makes no sense: one part indicates that the GuC is executing the bootrom, while another part indicates that GuC loading has completed and that the GuC is in halt state. Neither of these can be true immediately after the reset has completed.
We only reset and reload the GuC after a resume or a GT reset, and in both of those cases we reset the GuC at least twice (once in GT reset/sanitize and once immediately before load), so we can tolerate one of the resets not correctly resetting all the registers. If all the resets fail, we could end up failing to load the GuC, but this is very unlikely.
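
For readers who want to see why the reported value 0x8002f076 looks contradictory, here is a minimal sketch that splits it into fields. The bit layout is an assumption modelled on the GS_* definitions in i915's intel_guc_reg.h as I recall them, and the helper name decode_guc_status is hypothetical; check the shifts and masks against the kernel tree under test before relying on them.

```python
# Assumed GUC_STATUS layout (modelled on the GS_* macros in i915's
# intel_guc_reg.h; verify against the kernel tree being debugged).
GS_MIA_IN_RESET = 1 << 0                          # bit 0: MIA core held in reset
GS_BOOTROM_SHIFT, GS_BOOTROM_MASK = 1, 0x7F       # bits 1-7: bootrom status
GS_UKERNEL_SHIFT, GS_UKERNEL_MASK = 8, 0xFF       # bits 8-15: firmware status
GS_MIA_SHIFT, GS_MIA_MASK = 16, 0x07              # bits 16-18: MIA core state
GS_AUTH_STATUS_SHIFT, GS_AUTH_STATUS_MASK = 30, 0x03  # bits 30-31: auth status


def decode_guc_status(status):
    """Split a raw 32-bit GUC_STATUS value into its assumed fields."""
    return {
        "mia_in_reset": bool(status & GS_MIA_IN_RESET),
        "bootrom": (status >> GS_BOOTROM_SHIFT) & GS_BOOTROM_MASK,
        "ukernel": (status >> GS_UKERNEL_SHIFT) & GS_UKERNEL_MASK,
        "mia": (status >> GS_MIA_SHIFT) & GS_MIA_MASK,
        "auth": (status >> GS_AUTH_STATUS_SHIFT) & GS_AUTH_STATUS_MASK,
    }


if __name__ == "__main__":
    # Value taken from the warning in this bug report.
    for name, value in decode_guc_status(0x8002F076).items():
        shown = value if isinstance(value, bool) else hex(value)
        print(f"{name:>12}: {shown}")
```

Under this assumed layout, bit 0 is clear, which is the condition the warning complains about, while the bootrom, firmware and MIA fields are all non-zero at the same time. That is consistent with the description above of a register that was not actually cleared by the reset.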

