Bug 111196

Summary: [CI][SHARDS] igt@i915_selftest@mock_vma - dmesg-warn - watchdog: BUG: soft lockup - CPU#\d+ stuck
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: high
CC: intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: SKL
i915 features: GEM/Other

Description Martin Peres 2019-07-23 07:33:22 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6533/shard-skl3/igt@i915_selftest@mock_vma.html

<6> [2489.965844] i915: Running i915_vma_mock_selftests/igt_vma_partial
<0> [2515.074365] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [i915_selftest:5616]
<4> [2515.074511] Modules linked in: i915(+) i2c_dev vgem snd_hda_codec_hdmi x86_pkg_temp_thermal coretemp snd_hda_codec crct10dif_pclmul crc32_pclmul snd_hwdep snd_hda_core ghash_clmulni_intel btusb btrtl cdc_ether btbcm btintel usbnet snd_pcm bluetooth r8152 mii ecdh_generic ecc i2c_hid pinctrl_sunrisepoint pinctrl_intel prime_numbers [last unloaded: i915]
<4> [2515.074623] irq event stamp: 3675472
<4> [2515.074648] hardirqs last  enabled at (3675471): [<ffffffff81236297>] __slab_free+0x347/0x460
<4> [2515.074667] hardirqs last disabled at (3675472): [<ffffffff8100199a>] trace_hardirqs_off_thunk+0x1a/0x20
<4> [2515.074683] softirqs last  enabled at (3675386): [<ffffffff81c002eb>] __do_softirq+0x2eb/0x465
<4> [2515.074698] softirqs last disabled at (3675379): [<ffffffff810b62be>] irq_exit+0xae/0xc0
<4> [2515.074715] CPU: 3 PID: 5616 Comm: i915_selftest Tainted: G     U            5.3.0-rc1-CI-CI_DRM_6533+ #1
<4> [2515.074726] Hardware name: Google Caroline/Caroline, BIOS MrChromebox 08/27/2018
<4> [2515.074745] RIP: 0010:lock_release+0x182/0x290
<4> [2515.074761] Code: 0e 65 48 8b 3c 25 80 5e 01 00 e8 69 bb ff ff 65 48 8b 04 25 80 5e 01 00 c7 80 44 08 00 00 00 00 00 00 41 55 9d 48 8b 44 24 10 <65> 48 33 04 25 28 00 00 00 0f 85 f2 00 00 00 48 83 c4 18 5b 5d 41
<4> [2515.074772] RSP: 0018:ffffc900001a3a08 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
<4> [2515.074787] RAX: 840ab7c03d287600 RBX: ffff888179ac4fc0 RCX: 000000009db5cb04
<4> [2515.074797] RDX: ffff888179ac5808 RSI: 00000000ed339702 RDI: 00000000ffffffff
<4> [2515.074807] RBP: ffffffff8224a820 R08: ffff888179ac5808 R09: 00000000fffffffe
<4> [2515.074817] R10: 00000000a37d8d85 R11: 00000000fed17f93 R12: ffffffffa02b2be9
<4> [2515.074827] R13: 0000000000000246 R14: 0000000000000002 R15: 0000000000000000
<4> [2515.074841] FS:  00007f47cf09de40(0000) GS:ffff88817ab80000(0000) knlGS:0000000000000000
<4> [2515.074852] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [2515.074862] CR2: 000055fd663091f0 CR3: 0000000110c58003 CR4: 00000000003606e0
<4> [2515.074871] Call Trace:
<4> [2515.075262]  i915_gem_object_get_sg+0xf8/0x480 [i915]
<4> [2515.075661]  ? __i915_vma_do_pin+0x5a1/0xd20 [i915]
<4> [2515.076022]  i915_gem_object_get_dma_address+0x1e/0x50 [i915]
<4> [2515.076392]  igt_vma_partial+0x1ce/0x7a0 [i915]
<4> [2515.076832]  __i915_subtests+0xb8/0x220 [i915]
<4> [2515.077203]  ? i915_live_selftests+0x60/0x60 [i915]
<4> [2515.077569]  ? __i915_nop_setup+0x10/0x10 [i915]
<4> [2515.077946]  i915_vma_mock_selftests+0x82/0xf0 [i915]
<4> [2515.078311]  __run_selftests+0x139/0x180 [i915]
<4> [2515.078329]  ? 0xffffffffa0520000
<4> [2515.078714]  i915_mock_selftests+0x27/0x50 [i915]
<4> [2515.078982]  i915_init+0x12/0x73 [i915]
<4> [2515.078999]  ? 0xffffffffa0520000
<4> [2515.079014]  do_one_initcall+0x58/0x2d0
<4> [2515.079032]  ? do_init_module+0x1d/0x1ee
<4> [2515.079054]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [2515.079071]  ? kmem_cache_alloc_trace+0x29b/0x2c0
<4> [2515.079100]  do_init_module+0x56/0x1ee
<4> [2515.079122]  load_module+0x25bd/0x2a30
<4> [2515.079209]  ? __se_sys_finit_module+0xd3/0xf0
<4> [2515.079223]  __se_sys_finit_module+0xd3/0xf0
<4> [2515.079280]  do_syscall_64+0x55/0x1c0
<4> [2515.079302]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [2515.079316] RIP: 0033:0x7f47ce756839
<4> [2515.079331] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
<4> [2515.079343] RSP: 002b:00007fff0c82fd48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4> [2515.079357] RAX: ffffffffffffffda RBX: 000055eafea1fd90 RCX: 00007f47ce756839
<4> [2515.079367] RDX: 0000000000000000 RSI: 000055eafea2a4c0 RDI: 0000000000000006
<4> [2515.079377] RBP: 000055eafea2a4c0 R08: 0000000000000004 R09: 0000000000000000
<4> [2515.079387] R10: 00007fff0c82ff90 R11: 0000000000000246 R12: 0000000000000000
<4> [2515.079397] R13: 000055eafea1c000 R14: 0000000000000020 R15: 0000000000000042
<6> [2518.941271] [IGT] i915_selftest: exiting, ret=0
Comment 2 Chris Wilson 2019-07-23 09:41:48 UTC
It escapes eventually, so I think it just got much slower.
Comment 3 Chris Wilson 2019-07-23 12:31:47 UTC
commit d8bf0e7627e6e887bd0ed3707216e0e69ec95710 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jul 23 10:58:00 2019 +0100

    drm/i915/selftests: Let igt_vma_partial et al breathe
    
    Give the scheduler a chance to breathe by calling cond_resched() as some
    of the loops may take some time on slower machines, and so catch the
    attention of the watchdogs.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111196
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Matthew Auld <matthew.william.auld@gmail.com>
    Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190723095800.2820-1-chris@chris-wilson.co.uk
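
For readers unfamiliar with the pattern named in the commit message, here is a minimal userspace sketch of it, not the code from the patch itself. In the kernel, a long-running loop calls cond_resched() at a safe point so the scheduler can run other tasks and the soft lockup watchdog is not triggered. The sketch assumes sched_yield() as a rough userspace stand-in for cond_resched(); walk_grid() and its parameters are hypothetical names invented for illustration.

```c
/*
 * Userspace analogy only: sched_yield() stands in for the kernel's
 * cond_resched(). walk_grid() is a hypothetical helper, not i915 code.
 */
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/*
 * Sum the indices of a rows x cols grid, giving up the CPU once per
 * outer iteration. *yields counts how many times we offered to yield.
 */
static unsigned long walk_grid(size_t rows, size_t cols, size_t *yields)
{
	unsigned long sum = 0;

	assert(yields != NULL);

	for (size_t r = 0; r < rows; r++) {
		for (size_t c = 0; c < cols; c++)
			sum += r * cols + c;

		/* Let other runnable tasks in between outer iterations. */
		sched_yield();
		(*yields)++;
	}

	return sum;
}
```

Yielding once per outer iteration rather than in the innermost loop keeps the overhead small while still bounding how long the loop can hold the CPU without a break.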
Comment 4 Lakshmi 2019-09-26 19:56:45 UTC
This issue was 100% reproducible up to CI_DRM_6539_full (2 months ago), with no new occurrences since. Closing and archiving the bug.
Comment 5 CI Bug Log 2019-09-26 19:57:01 UTC
The CI Bug Log issue associated with this bug has been archived.

New failures matching the above filters will no longer be associated with this bug.
