Bug 110622

Summary: [CI][BAT][guc] igt@i915_selftest@live_guc - dmesg-fail - BUG: unable to handle kernel paging request
Product: DRI
Component: DRM/Intel
Status: RESOLVED WONTFIX
Severity: normal
Priority: highest
Version: XOrg git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: BXT
i915 features: firmware/guc
Reporter: Martin Peres <martin.peres>
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: intel-gfx-bugs

Description Martin Peres 2019-05-06 13:05:45 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6039/fi-apl-guc/igt@i915_selftest@live_guc.html

<6> [526.753530] i915: Running intel_guc_live_selftest/igt_guc_clients
<6> [526.753551] Max number of doorbells: 256
<7> [526.755145] [drm:guc_client_alloc [i915]] client 0 (high prio=no) reserved doorbell: 0
<7> [526.755244] [drm:guc_client_alloc [i915]] reserved cacheline 0x80, next 0xc0, linesize 64
<7> [526.755340] [drm:guc_client_alloc [i915]] new priority 2 client 00000000e01b484e for engine(s) 0x47: stage_id 0
<7> [526.755434] [drm:guc_client_alloc [i915]] doorbell id 0, cacheline offset 0x80
<7> [526.756389] [drm:guc_client_alloc [i915]] client 1 (high prio=yes) reserved doorbell: 128
<7> [526.756490] [drm:guc_client_alloc [i915]] reserved cacheline 0xc0, next 0x100, linesize 64
<7> [526.756585] [drm:guc_client_alloc [i915]] new priority 0 client 000000001df82430 for engine(s) 0x47: stage_id 1
<7> [526.756680] [drm:guc_client_alloc [i915]] doorbell id 128, cacheline offset 0xc0
<1> [526.757010] BUG: unable to handle kernel paging request at ffffc90000b19000
<1> [526.757016] #PF error: [WRITE]
<6> [526.757019] PGD 276887067 P4D 276887067 PUD 276944067 PMD 274fdb067 PTE 0
<4> [526.757026] Oops: 0002 [#1] PREEMPT SMP NOPTI
<4> [526.757031] CPU: 0 PID: 4400 Comm: i915_selftest Tainted: G     U            5.1.0-rc7-CI-CI_DRM_6039+ #1
<4> [526.757035] Hardware name: Intel corporation NUC6CAYS/NUC6CAYB, BIOS AYAPLCEL.86A.0056.2018.0926.1100 09/26/2018
<4> [526.757154] RIP: 0010:__guc_client_enable+0x99/0xae0 [i915]
<4> [526.757159] Code: 72 20 89 0a 8b 4d 24 89 4a 2c 8b 5d 28 48 8b 55 18 4c 8b 65 10 48 69 db 12 01 00 00 48 03 9a 88 02 00 00 48 8d 7b 08 48 89 d9 <48> c7 03 00 00 00 00 48 c7 83 0a 01 00 00 00 00 00 00 48 83 e7 f8
<4> [526.757162] RSP: 0018:ffffc900002b7a20 EFLAGS: 00010286
<4> [526.757166] RAX: 0000000000000000 RBX: ffffc90000b19000 RCX: ffffc90000b19000
<4> [526.757169] RDX: ffff888262de11d8 RSI: 0000000100002000 RDI: ffffc90000b19008
<4> [526.757172] RBP: ffff888269d3d8c8 R08: 00000000b6e9d5d9 R09: 0000000000000000
<4> [526.757175] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888267881a98
<4> [526.757178] R13: ffff888262de14c0 R14: 00000000f7e00009 R15: 0000000000000000
<4> [526.757181] FS:  00007fba838aa980(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
<4> [526.757184] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [526.757187] CR2: ffffc90000b19000 CR3: 0000000246964000 CR4: 00000000003406f0
<4> [526.757190] Call Trace:
<4> [526.757289]  guc_clients_enable.isra.12+0x14/0x50 [i915]
<4> [526.757384]  igt_guc_clients+0x1b2/0x360 [i915]
<4> [526.757480]  __i915_subtests+0x1a4/0x1e0 [i915]
<4> [526.757578]  __run_selftests+0x112/0x170 [i915]
<4> [526.757672]  i915_live_selftests+0x2c/0x60 [i915]
<4> [526.757752]  i915_pci_probe+0x50/0xa0 [i915]
<4> [526.757762]  pci_device_probe+0xa1/0x120
<4> [526.757770]  really_probe+0xf3/0x3e0
<4> [526.757775]  driver_probe_device+0x10a/0x120
<4> [526.757780]  device_driver_attach+0x4b/0x50
<4> [526.757785]  __driver_attach+0x97/0x130
<4> [526.757790]  ? device_driver_attach+0x50/0x50
<4> [526.757794]  bus_for_each_dev+0x74/0xc0
<4> [526.757800]  bus_add_driver+0x13f/0x210
<4> [526.757805]  ? 0xffffffffa00fb000
<4> [526.757809]  driver_register+0x56/0xe0
<4> [526.757813]  ? 0xffffffffa00fb000
<4> [526.757818]  do_one_initcall+0x58/0x2e0
<4> [526.757823]  ? do_init_module+0x1d/0x1ea
<4> [526.757828]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [526.757833]  ? kmem_cache_alloc_trace+0x261/0x290
<4> [526.757838]  do_init_module+0x56/0x1ea
<4> [526.757843]  load_module+0x2701/0x29e0
<4> [526.757858]  ? __se_sys_finit_module+0xd3/0xf0
<4> [526.757862]  __se_sys_finit_module+0xd3/0xf0
<4> [526.757872]  do_syscall_64+0x55/0x190
<4> [526.757877]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [526.757881] RIP: 0033:0x7fba8315e839
<4> [526.757885] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
<4> [526.757888] RSP: 002b:00007fff842ac098 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4> [526.757892] RAX: ffffffffffffffda RBX: 0000561ec9dcb7c0 RCX: 00007fba8315e839
<4> [526.757895] RDX: 0000000000000000 RSI: 0000561ec9dc40e0 RDI: 0000000000000006
<4> [526.757898] RBP: 0000561ec9dc40e0 R08: 0000000000000004 R09: 0000000000000000
<4> [526.757901] R10: 00007fff842ac210 R11: 0000000000000246 R12: 0000000000000000
<4> [526.757904] R13: 0000561ec9dc5590 R14: 0000000000000020 R15: 0000000000000042
<4> [526.757913] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi mei_hdcp snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal coretemp btusb btrtl btbcm btintel crct10dif_pclmul crc32_pclmul bluetooth ghash_clmulni_intel ecdh_generic lpc_ich r8169 realtek snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me pinctrl_broxton pinctrl_intel mei prime_numbers [last unloaded: i915]
<0> [526.757942] Dumping ftrace buffer:
<0> [526.757949]    (ftrace buffer empty)
<4> [526.757952] CR2: ffffc90000b19000
<4> [526.757957] ---[ end trace b46cb251f1ea0eb8 ]---
<4> [527.138241] RIP: 0010:__guc_client_enable+0x99/0xae0 [i915]
<4> [527.138251] Code: 72 20 89 0a 8b 4d 24 89 4a 2c 8b 5d 28 48 8b 55 18 4c 8b 65 10 48 69 db 12 01 00 00 48 03 9a 88 02 00 00 48 8d 7b 08 48 89 d9 <48> c7 03 00 00 00 00 48 c7 83 0a 01 00 00 00 00 00 00 48 83 e7 f8
<4> [527.138255] RSP: 0018:ffffc900002b7a20 EFLAGS: 00010286
<4> [527.138259] RAX: 0000000000000000 RBX: ffffc90000b19000 RCX: ffffc90000b19000
<4> [527.138262] RDX: ffff888262de11d8 RSI: 0000000100002000 RDI: ffffc90000b19008
<4> [527.138265] RBP: ffff888269d3d8c8 R08: 00000000b6e9d5d9 R09: 0000000000000000
<4> [527.138268] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888267881a98
<4> [527.138271] R13: ffff888262de14c0 R14: 00000000f7e00009 R15: 0000000000000000
<4> [527.138275] FS:  00007fba838aa980(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
<4> [527.138278] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [527.138281] CR2: ffffc90000b19000 CR3: 0000000246964000 CR4: 00000000003406f0
<3> [527.138286] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
<3> [527.138290] in_atomic(): 0, irqs_disabled(): 1, pid: 4400, name: i915_selftest
<4> [527.138293] INFO: lockdep is turned off.
<4> [527.138296] irq event stamp: 296700
<4> [527.138304] hardirqs last  enabled at (296699): [<ffffffff81124f27>] console_unlock+0x3f7/0x5a0
<4> [527.138309] hardirqs last disabled at (296700): [<ffffffff810019b0>] trace_hardirqs_off_thunk+0x1a/0x1c
<4> [527.138314] softirqs last  enabled at (296690): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
<4> [527.138319] softirqs last disabled at (296677): [<ffffffff810b5519>] irq_exit+0xa9/0xc0
<4> [527.138325] CPU: 0 PID: 4400 Comm: i915_selftest Tainted: G     UD           5.1.0-rc7-CI-CI_DRM_6039+ #1
<4> [527.138328] Hardware name: Intel corporation NUC6CAYS/NUC6CAYB, BIOS AYAPLCEL.86A.0056.2018.0926.1100 09/26/2018
<4> [527.138331] Call Trace:
<4> [527.138339]  dump_stack+0x67/0x9b
<4> [527.138346]  ___might_sleep+0x167/0x250
<4> [527.138353]  exit_signals+0x2b/0x2d0
<4> [527.138359]  do_exit+0xa3/0xd90
<4> [527.138370]  rewind_stack_do_exit+0x17/0x20
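
For triage, the fault lines above can be decoded with a small helper. This is a minimal sketch, not part of the CI tooling: the function name `decode_pf_error` and the dictionary keys are made up for illustration, and only the five low bits of the x86 page-fault error code are modelled. The input values are copied from the `Oops: 0002`, `CR2` and `RBX` lines of the log.

```python
# Low bits of the x86 page-fault error code, as printed after "Oops:".
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: fault in kernel mode, 1: fault in user mode
PF_RSVD = 1 << 3   # 1: reserved bit set in a page-table entry
PF_INSTR = 1 << 4  # 1: fault on instruction fetch


def decode_pf_error(code: int) -> dict:
    """Split a page-fault error code into named flags (hypothetical helper)."""
    return {
        "present": bool(code & PF_PROT),
        "write": bool(code & PF_WRITE),
        "user": bool(code & PF_USER),
        "reserved": bool(code & PF_RSVD),
        "instr_fetch": bool(code & PF_INSTR),
    }


# Values copied from the log above.
oops_code = 0x0002
cr2 = 0xFFFFC90000B19000
rbx = 0xFFFFC90000B19000

flags = decode_pf_error(oops_code)
print(flags)
print("fault address equals RBX:", cr2 == rbx)
print("fault address is page-aligned:", cr2 & 0xFFF == 0)
```

Read this way, the code describes a kernel-mode write to a page that is not present, which agrees with the `#PF error: [WRITE]` and `PTE 0` lines. The faulting address matches RBX, the destination register of the store at the marked position in the `Code:` line.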

This may have been introduced by any of these changes:
941848427de7 drm-tip: 2019y-05m-03d-12h-56m-38s UTC integration manifest
d9291d803f21 RFC: console: hack up console_trylock more
e42ab48f69c8 RFC: soft/hardlookup: taint kernel
74aacc03f0f5 RFC: hung_task: taint kernel
99b69db57544 ICL HACK: Disable ACPI idle driver
6ce8a59aab9f ICL HACK: drm/i915/opregion: ICL should have opregion 2.1+ and relative rvda
d1021c47c27e ICL HACK: perf/x86: Bump INTEL_PMC_MAX_FIXED for Icelake
f56c8566c63d ICL HACK: usb/icl: Work around ACPI boottime crash
3d6f178fbb3f net/phy: Debugging for https://bugzilla.kernel.org/show_bug.cgi?id=202321
6dfa4d53ba03 perf/core: Avoid removing shared pmu_context on unregister
533fca65a614 libata: Downgrade unsupported feature warnings to notifications
ef47d1402920 x86: Downgrade clock throttling thermal event critical error
f63a131478a2 kernel/panic: Show the stacktrace after additional notifier messages
a9f840bdd2fd net/sch_generic: Shut up noise
ac6a199df540 kernel/panic: Repeat the line and caller information at the end of the OOPS
95848055f96d lockdep: Swap storage for pin_count and references
6c0b65289b18 lockdep: Up MAX_LOCKDEP_CHAINS
59e9086f7c26 mm/vmalloc: Replace opencoded 4-level page walkers
80c65ed698fa ftrace: Allow configuring global trace buffer size (for dump-on-oops)
e53f0aced0dd drm-tip: 2019y-05m-03d-12h-10m-12s UTC integration manifest
5172ea7b473b ICL HACK: Disable ACPI idle driver
461b891bacbd ICL HACK: drm/i915/opregion: ICL should have opregion 2.1+ and relative rvda
dcd2e2a90647 ICL HACK: perf/x86: Bump INTEL_PMC_MAX_FIXED for Icelake
b8bff9481ab1 ICL HACK: usb/icl: Work around ACPI boottime crash
17d0077a8c46 net/phy: Debugging for https://bugzilla.kernel.org/show_bug.cgi?id=202321
1e2f3e2c7e5a sysfs: Disable lockdep for driver bind/unbind files
361394453180 perf/core: Avoid removing shared pmu_context on unregister
4d2b6a105a1e libata: Downgrade unsupported feature warnings to notifications
b7ea6ff98250 x86: Downgrade clock throttling thermal event critical error
0fe529c84b31 kernel/panic: Show the stacktrace after additional notifier messages
bf3ed3da66e0 net/sch_generic: Shut up noise
0bb52651f7cf kernel/panic: Repeat the line and caller information at the end of the OOPS
6d0d4dfd5583 lockdep: Swap storage for pin_count and references
d54c2eb66b51 lockdep: Up MAX_LOCKDEP_CHAINS
3ca83f85b4f6 mm/vmalloc: Replace opencoded 4-level page walkers
45e78c46442f ftrace: Allow configuring global trace buffer size (for dump-on-oops)
Comment 1 CI Bug Log 2019-05-06 13:06:29 UTC
The CI Bug Log issue associated with this bug has been updated.

### New filters associated

* fi-apl-guc: igt@i915_selftest@live_guc - dmesg-fail - BUG: unable to handle kernel paging request
  (No new failures associated)

* fi-apl-guc: igt@runner@aborted - fail - Previous test: i915_selftest (live_guc)
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12952/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12957/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6039/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12959/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12960/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4253/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6040/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6041/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12961/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4254/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12962/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12963/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12964/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6042/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12965/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4255/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6044/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4263/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2942/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6047/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12968/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6049/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4265/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12969/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_2943/fi-apl-guc/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/TrybotIGT_15/fi-apl-guc/igt@runner@aborted.html
Comment 2 Chris Wilson 2019-05-06 17:44:42 UTC
The simple answer is for live_guc to skip if the GPU is wedged. That may work out neatly if GUC_SUBMISSION is marked as disabled upon wedging.
Comment 3 Chris Wilson 2019-05-06 17:47:05 UTC
(In reply to Martin Peres from comment #0)
> This may have been introduced by any of these changes:
> 941848427de7 drm-tip: 2019y-05m-03d-12h-56m-38s UTC integration manifest
...

None of the above, it was a latent bug.
Comment 4 Chris Wilson 2019-05-28 11:17:19 UTC
commit a2904ade3dc28cf1a1b7deded41f4369f75e664c
Author: Michal Wajdeczko <michal.wajdeczko@intel.com>
Date:   Mon May 27 18:35:58 2019 +0000

    drm/i915/guc: Don't allow GuC submission
    
    Due to the upcoming changes to the GuC ABI interface, we must
    disable GuC submission mode until final ABI will be available
    on all GuC firmwares.
    
    To avoid regressions on systems configured to run with no longer
    supported configuration "enable_guc=3" or "enable_guc=1" clear
    GuC submission bit.
    
    v2: force switch to non-GuC submission mode
    v3: use GEM_BUG_ON (Joonas)
    
    Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
    Cc: John Spotswood <john.a.spotswood@intel.com>
    Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
    Cc: Tony Ye <tony.ye@intel.com>
    Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
    Cc: Jeff Mcgee <jeff.mcgee@intel.com>
    Cc: Antonio Argenziano <antonio.argenziano@intel.com>
    Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
    Cc: Martin Peres <martin.peres@linux.intel.com>
    Acked-by: Martin Peres <martin.peres@linux.intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190527183613.17076-3-michal.wajdeczko@intel.com
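
The effect the commit message describes for "enable_guc=3" and "enable_guc=1" can be illustrated with a small bitmask model. This is a sketch under assumptions, not the driver code: it assumes bit 0 of `enable_guc` selects GuC submission and bit 1 selects HuC firmware loading, the function name `clear_guc_submission` is invented for the example, and the "auto" value (-1) is not modelled.

```python
# Assumed meaning of the enable_guc bitmask (illustrative constants).
ENABLE_GUC_SUBMISSION = 1 << 0  # bit 0: GuC submission
ENABLE_GUC_LOAD_HUC = 1 << 1    # bit 1: HuC firmware load


def clear_guc_submission(enable_guc: int) -> int:
    """Return the value with the submission bit cleared (hypothetical helper).

    Only non-negative bitmask values are handled; -1 ("auto") is out of scope.
    """
    if enable_guc < 0:
        raise ValueError("auto (-1) is not modelled in this sketch")
    return enable_guc & ~ENABLE_GUC_SUBMISSION


for requested in (0, 1, 2, 3):
    print(requested, "->", clear_guc_submission(requested))
```

Under these assumptions, a request for 3 falls back to 2 (HuC loading only) and a request for 1 falls back to 0, while 0 and 2 pass through unchanged.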
