https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_2/fi-skl-gvtdvm/igt@kms_frontbuffer_tracking@fbcdrrs-1p-primscrn-cur-indfb-draw-mmap-cpu.html running: igt/kms_frontbuffer_tracking/fbcdrrs-1p-primscrn-cur-indfb-draw-mmap-cpu [00/97] | owatch: TIMEOUT! owatch: timeout for /dev/watchdog0 set to 10 (requested 10) FATAL: command execution failed java.io.EOFException This is all in dmesg: <5>[ 26.332741] owatch: Using watchdog device /dev/watchdog0 <5>[ 26.336059] owatch: Watchdog /dev/watchdog0 is a software watchdog <5>[ 26.344558] owatch: timeout for /dev/watchdog0 set to 410 (requested 410) <6>[ 31.746834] perf: interrupt took too long (4097 > 4026), lowering kernel.perf_event_max_sample_rate to 48000 <7>[ 38.036000] [IGT] kms_frontbuffer_tracking: executing <6>[ 41.810611] perf: interrupt took too long (5141 > 5121), lowering kernel.perf_event_max_sample_rate to 38000 <6>[ 57.951869] perf: interrupt took too long (6455 > 6426), lowering kernel.perf_event_max_sample_rate to 30000 <6>[ 84.774600] perf: interrupt took too long (8074 > 8068), lowering kernel.perf_event_max_sample_rate to 24000 <6>[ 128.839162] perf: interrupt took too long (10096 > 10092), lowering kernel.perf_event_max_sample_rate to 19000
This is the second Softdog incomplete on the first test on fi-skl-gvtdvm, see bug 105599
Here is another frontbuffer_tracking Softdog that is also on the first test in the shard: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_4/fi-skl-gvtdvm/igt@kms_frontbuffer_tracking@psrdrrs-1p-primscrn-cur-indfb-move.html running: igt/kms_frontbuffer_tracking/psrdrrs-1p-primscrn-cur-indfb-move [00/97] | owatch: TIMEOUT! This is all in dmesg: <5>[ 22.259150] owatch: Using watchdog device /dev/watchdog0 <5>[ 22.260727] owatch: Watchdog /dev/watchdog0 is a software watchdog <5>[ 22.262484] owatch: timeout for /dev/watchdog0 set to 410 (requested 410) <6>[ 22.274177] perf: interrupt took too long (2567 > 2500), lowering kernel.perf_event_max_sample_rate to 77000 <6>[ 27.457786] perf: interrupt took too long (3292 > 3208), lowering kernel.perf_event_max_sample_rate to 60000 <7>[ 28.407794] [IGT] kms_frontbuffer_tracking: executing <6>[ 195.917343] perf: interrupt took too long (4658 > 4115), lowering kernel.perf_event_max_sample_rate to 42000
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_4/fi-skl-gvtdvm/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-pri-indfb-draw-mmap-wc.html running: igt/kms_frontbuffer_tracking/psr-2p-scndscrn-pri-indfb-draw-mmap-wc [00/97] | owatch: TIMEOUT! owatch: timeout for /dev/watchdog0 set to 10 (requested 10) dmesg: <5>[ 23.179763] owatch: Using watchdog device /dev/watchdog0 <5>[ 23.181583] owatch: Watchdog /dev/watchdog0 is a software watchdog <5>[ 23.183544] owatch: timeout for /dev/watchdog0 set to 410 (requested 410) <6>[ 23.635390] perf: interrupt took too long (2526 > 2500), lowering kernel.perf_event_max_sample_rate to 79000 <6>[ 28.698246] perf: interrupt took too long (3259 > 3157), lowering kernel.perf_event_max_sample_rate to 61000 <7>[ 29.104037] [IGT] kms_frontbuffer_tracking: executing
*** Bug 105599 has been marked as a duplicate of this bug. ***
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_3/fi-skl-gvtdvm/igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-indfb-pgflip-blt.html running: igt/kms_frontbuffer_tracking/drrs-2p-scndscrn-indfb-pgflip-blt [00/97] | owatch: TIMEOUT! owatch: timeout for /dev/watchdog0 set to 10 (requested 10) <5>[ 23.878119] owatch: Using watchdog device /dev/watchdog0 <5>[ 23.879865] owatch: Watchdog /dev/watchdog0 is a software watchdog <5>[ 23.881902] owatch: timeout for /dev/watchdog0 set to 410 (requested 410) <6>[ 25.013459] perf: interrupt took too long (2544 > 2500), lowering kernel.perf_event_max_sample_rate to 78000 <7>[ 29.755628] [IGT] kms_frontbuffer_tracking: executing <6>[ 37.658039] perf: interrupt took too long (3295 > 3180), lowering kernel.perf_event_max_sample_rate to 60000
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_4/fi-bdw-gvtdvm/igt@kms_properties@plane-properties-atomic.html run.log: running: igt/kms_properties/plane-properties-atomic [00/97] | owatch: TIMEOUT! owatch: timeout for /dev/watchdog0 set to 10 (requested 10) Full dmesg: <5>[ 24.071216] owatch: Using watchdog device /dev/watchdog0 <5>[ 24.073518] owatch: Watchdog /dev/watchdog0 is a software watchdog <5>[ 24.076762] owatch: timeout for /dev/watchdog0 set to 410 (requested 410) <6>[ 25.127388] perf: interrupt took too long (3211 > 3197), lowering kernel.perf_event_max_sample_rate to 62000 <6>[ 30.692676] perf: interrupt took too long (4124 > 4013), lowering kernel.perf_event_max_sample_rate to 48000 <7>[ 31.118853] [IGT] kms_properties: executing boot log: <4>[ 10.874799] ------------[ cut here ]------------ <3>[ 10.875470] ODEBUG: free active (active state 0) object type: hrtimer hint: hrtimer_wakeup+0x0/0x20 <4>[ 10.876334] WARNING: CPU: 0 PID: 196 at lib/debugobjects.c:291 debug_print_object+0x67/0x80 <4>[ 10.877186] Modules linked in: i915(+) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel prime_numbers e1000(+) i2c_piix4 <4>[ 10.878107] CPU: 0 PID: 196 Comm: systemd-udevd Not tainted 4.16.0-rc5-g639c78d2805b-drmtip_4+ #1 <4>[ 10.878977] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014 <4>[ 10.879886] RIP: 0010:debug_print_object+0x67/0x80 <4>[ 10.880345] RSP: 0018:ffffa5c0c027fcd8 EFLAGS: 00010092 <4>[ 10.880895] RAX: 0000000000000057 RBX: ffffa24c27a8dbf0 RCX: 0000000000000002 <4>[ 10.881399] RDX: 0000000080000002 RSI: ffffffffa8076546 RDI: 00000000ffffffff <4>[ 10.881948] RBP: ffffffffa8249cc0 R08: 0000000000000000 R09: 0000000000000000 <4>[ 10.882452] R10: ffffa5c0c027fc70 R11: ffffffffa70efb8b R12: ffffffffa80747d4 <4>[ 10.883022] R13: ffffffffa97b9560 R14: 0000000000000001 R15: ffffffffa97b9568 <4>[ 10.883528] FS: 00007f5f6abb8480(0000) GS:ffffa24c3fc00000(0000) knlGS:0000000000000000 <4>[ 10.884392] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 10.884912] CR2: 00005572006db178 CR3: 00000000780c0006 CR4: 00000000003606f0 <4>[ 10.885417] Call Trace: <4>[ 10.885895] debug_check_no_obj_freed+0x1ef/0x230 <4>[ 10.886364] __vunmap+0x5b/0x110 <4>[ 10.886797] load_module+0x2347/0x2dc0 <4>[ 10.887261] ? show_coresize+0x20/0x20 <4>[ 10.887701] ? vfs_read+0x126/0x150 <4>[ 10.888179] ? SyS_finit_module+0xa5/0xe0 <4>[ 10.888631] SyS_finit_module+0xa5/0xe0 <4>[ 10.889117] do_syscall_64+0x65/0x1a0 <4>[ 10.889557] entry_SYSCALL_64_after_hwframe+0x42/0xb7 <4>[ 10.890065] RIP: 0033:0x7f5f6a4bc839 <4>[ 10.890501] RSP: 002b:00007fff477482b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 <4>[ 10.891346] RAX: ffffffffffffffda RBX: 00005640e00d8730 RCX: 00007f5f6a4bc839 <4>[ 10.891851] RDX: 0000000000000000 RSI: 00007f5f6a19b0e5 RDI: 000000000000000f <4>[ 10.892402] RBP: 00007f5f6a19b0e5 R08: 0000000000000000 R09: 00007fff477483d0 <4>[ 10.892907] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000000000 <4>[ 10.893452] R13: 00005640e00cdd30 R14: 0000000000020000 R15: 00005640e00d8730 <4>[ 10.893959] Code: c1 8b 43 10 83 c2 01 8b 4b 14 89 15 2c b5 28 02 4c 8b 45 00 4c 89 e6 48 c7 c7 30 83 09 a8 48 8b 14 c5 e0 3b e4 a7 e8 09 f3 c1 ff <0f> 0b 5b 83 05 2b 48 ee 00 01 5d 41 5c c3 83 05 20 48 ee 00 01 <4>[ 10.895420] WARNING: CPU: 0 PID: 196 at lib/debugobjects.c:291 debug_print_object+0x67/0x80 <4>[ 10.896303] ---[ end trace 77cae12efc453097 ]--- <4>[ 10.896773] ====================================================== <4>[ 10.896773] WARNING: possible circular locking dependency detected <4>[ 10.896773] 4.16.0-rc5-g639c78d2805b-drmtip_4+ #1 Not tainted <4>[ 10.896774] ------------------------------------------------------ <4>[ 10.896774] systemd-udevd/196 is trying to acquire lock: <4>[ 10.896774] ((console_sem).lock){..-.}, at: [<00000000673d59fd>] down_trylock+0xa/0x30 <4>[ 10.896776] but task is already holding lock: <4>[ 10.896776] (&obj_hash[i].lock){-.-.}, at: [<000000002f519ca4>] debug_check_no_obj_freed+0xa4/0x230 <4>[ 10.896777] which lock already depends on the new lock. <4>[ 10.896778] the existing dependency chain (in reverse order) is: <4>[ 10.896779] -> #3 (&obj_hash[i].lock){-.-.}: <4>[ 10.896780] __debug_object_init+0x5f/0x430 <4>[ 10.896780] hrtimer_init+0x1b/0x150 <4>[ 10.896780] init_dl_task_timer+0x17/0x30 <4>[ 10.896781] __sched_fork.isra.16+0xa7/0x100 <4>[ 10.896781] init_idle+0x53/0x270 <4>[ 10.896781] sched_init+0x42f/0x4a5 <4>[ 10.896782] start_kernel+0x20e/0x3f0 <4>[ 10.896782] secondary_startup_64+0xa5/0xb0 <4>[ 10.896782] -> #2 (&rq->lock){-.-.}: <4>[ 10.896783] task_fork_fair+0x36/0x120 <4>[ 10.896784] sched_fork+0x12f/0x2a0 <4>[ 10.896784] copy_process.part.7+0x5e4/0x1d90 <4>[ 10.896784] _do_fork+0xc0/0x6b0 <4>[ 10.896785] kernel_thread+0x20/0x30 <4>[ 10.896785] rest_init+0x1d/0x220 <4>[ 10.896785] start_kernel+0x3e8/0x3f0 <4>[ 10.896786] secondary_startup_64+0xa5/0xb0 <4>[ 10.896786] -> #1 (&p->pi_lock){-.-.}: <4>[ 10.896787] try_to_wake_up+0x27/0x610 <4>[ 10.896787] up+0x3b/0x50 <4>[ 10.896788] __up_console_sem+0x2e/0x50 <4>[ 10.896788] console_unlock+0x31f/0x650 <4>[ 10.896788] con_write+0x43/0x60 <4>[ 10.896789] n_tty_write+0x1a7/0x450 <4>[ 10.896789] tty_write+0x1b6/0x2e0 <4>[ 10.896789] do_iter_write+0x140/0x180 <4>[ 10.896790] vfs_writev+0x84/0xf0 <4>[ 10.896790] do_writev+0x4c/0xd0 <4>[ 10.896790] do_syscall_64+0x65/0x1a0 <4>[ 10.896790] entry_SYSCALL_64_after_hwframe+0x42/0xb7 <4>[ 10.896791] -> #0 ((console_sem).lock){..-.}: <4>[ 10.896792] _raw_spin_lock_irqsave+0x33/0x50 <4>[ 10.896792] down_trylock+0xa/0x30 <4>[ 10.896793] __down_trylock_console_sem+0x20/0x80 <4>[ 10.896793] console_trylock+0xe/0x60 <4>[ 10.896793] vprintk_emit+0x24f/0x4f0 <4>[ 10.896793] printk+0x3e/0x46 <4>[ 10.896794] __warn_printk+0x37/0x70 <4>[ 10.896794] debug_print_object+0x67/0x80 <4>[ 10.896794] debug_check_no_obj_freed+0x1ef/0x230 <4>[ 10.896795] __vunmap+0x5b/0x110 <4>[ 10.896795] load_module+0x2347/0x2dc0 <4>[ 10.896795] SyS_finit_module+0xa5/0xe0 <4>[ 10.896795] do_syscall_64+0x65/0x1a0 <4>[ 10.896796] entry_SYSCALL_64_after_hwframe+0x42/0xb7 <4>[ 10.896796] other info that might help us debug this: <4>[ 10.896797] Chain exists of: <4>[ 10.896797] (console_sem).lock --> &rq->lock --> &obj_hash[i].lock <4>[ 10.896799] Possible unsafe locking scenario: <4>[ 10.896799] CPU0 CPU1 <4>[ 10.896800] ---- ---- <4>[ 10.896800] lock(&obj_hash[i].lock); <4>[ 10.896801] lock(&rq->lock); <4>[ 10.896801] lock(&obj_hash[i].lock); <4>[ 10.896802] lock((console_sem).lock); <4>[ 10.896803] *** DEADLOCK *** <4>[ 10.896803] 1 lock held by systemd-udevd/196: <4>[ 10.896804] #0: (&obj_hash[i].lock){-.-.}, at: [<000000002f519ca4>] debug_check_no_obj_freed+0xa4/0x230 <4>[ 10.896805] stack backtrace: <4>[ 10.896805] CPU: 0 PID: 196 Comm: systemd-udevd Not tainted 4.16.0-rc5-g639c78d2805b-drmtip_4+ #1 <4>[ 10.896806] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014 <4>[ 10.896806] Call Trace: <4>[ 10.896807] dump_stack+0x5f/0x86 <4>[ 10.896807] print_circular_bug.isra.18+0x1d0/0x2c0 <4>[ 10.896807] __lock_acquire+0x14ae/0x1b60 <4>[ 10.896807] ? lock_acquire+0xaf/0x200 <4>[ 10.896808] lock_acquire+0xaf/0x200 <4>[ 10.896808] ? down_trylock+0xa/0x30 <4>[ 10.896808] ? vprintk_emit+0x24f/0x4f0 <4>[ 10.896809] _raw_spin_lock_irqsave+0x33/0x50 <4>[ 10.896809] ? down_trylock+0xa/0x30 <4>[ 10.896809] down_trylock+0xa/0x30 <4>[ 10.896809] __down_trylock_console_sem+0x20/0x80 <4>[ 10.896810] console_trylock+0xe/0x60 <4>[ 10.896810] vprintk_emit+0x24f/0x4f0 <4>[ 10.896810] printk+0x3e/0x46 <4>[ 10.896811] ? clock_was_set_work+0x20/0x20 <4>[ 10.896811] __warn_printk+0x37/0x70 <4>[ 10.896811] ? debug_check_no_obj_freed+0xa4/0x230 <4>[ 10.896811] ? clock_was_set_work+0x20/0x20 <4>[ 10.896812] debug_print_object+0x67/0x80 <4>[ 10.896812] debug_check_no_obj_freed+0x1ef/0x230 <4>[ 10.896812] __vunmap+0x5b/0x110 <4>[ 10.896813] load_module+0x2347/0x2dc0 <4>[ 10.896813] ? show_coresize+0x20/0x20 <4>[ 10.896813] ? vfs_read+0x126/0x150 <4>[ 10.896813] ? SyS_finit_module+0xa5/0xe0 <4>[ 10.896814] SyS_finit_module+0xa5/0xe0 <4>[ 10.896814] do_syscall_64+0x65/0x1a0 <4>[ 10.896814] entry_SYSCALL_64_after_hwframe+0x42/0xb7 <4>[ 10.896815] RIP: 0033:0x7f5f6a4bc839 <4>[ 10.896815] RSP: 002b:00007fff477482b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 <4>[ 10.896816] RAX: ffffffffffffffda RBX: 00005640e00d8730 RCX: 00007f5f6a4bc839 <4>[ 10.896816] RDX: 0000000000000000 RSI: 00007f5f6a19b0e5 RDI: 000000000000000f <4>[ 10.896816] RBP: 00007f5f6a19b0e5 R08: 0000000000000000 R09: 00007fff477483d0 <4>[ 10.896817] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000000000 <4>[ 10.896817] R13: 00005640e00cdd30 R14: 0000000000020000 R15: 00005640e00d8730 <1>[ 12.520047] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 <1>[ 12.520325] IP: hrtimer_active+0x4/0x40 <6>[ 12.520485] PGD 0 P4D 0 <4>[ 12.520613] Oops: 0000 [#1] PREEMPT SMP PTI <4>[ 12.520772] Modules linked in: i915(+) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel prime_numbers e1000(+) i2c_piix4 <4>[ 12.521105] CPU: 0 PID: 196 Comm: systemd-udevd Tainted: G W 4.16.0-rc5-g639c78d2805b-drmtip_4+ #1 <4>[ 12.521391] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014 <4>[ 12.521699] RIP: 0010:hrtimer_active+0x4/0x40 <4>[ 12.521860] RSP: 0018:ffffa5c0c027fca8 EFLAGS: 00010282 <4>[ 12.522038] RAX: 0000000000000000 RBX: ffffa5c0c0343a18 RCX: 0000000000000001 <4>[ 12.522276] RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffffa5c0c0343a18 <4>[ 12.522490] RBP: 0000000000000000 R08: ffffa5c0c0343a18 R09: 0000000000000000 <4>[ 12.522705] R10: 0000000000000000 R11: ffffffffa70efb8b R12: ffffa5c0c548f000 <4>[ 12.522920] R13: ffffffffa97b9560 R14: ffffffffa8249cc0 R15: ffffffffa97b9568 <4>[ 12.523166] FS: 00007f5f6abb8480(0000) GS:ffffa24c3fc00000(0000) knlGS:0000000000000000 <4>[ 12.523412] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 12.523601] CR2: 0000000000000010 CR3: 00000000780c0005 CR4: 00000000003606f0 <4>[ 12.524112] Call Trace: <4>[ 12.524563] hrtimer_try_to_cancel+0x14/0x1d0 <4>[ 12.525016] hrtimer_fixup_free+0x15/0x40 <4>[ 12.525505] debug_object_fixup+0xf/0x30 <4>[ 12.525948] debug_check_no_obj_freed+0x221/0x230 <4>[ 12.526451] __vunmap+0x5b/0x110 <4>[ 12.526893] load_module+0x2347/0x2dc0 <4>[ 12.527399] ? show_coresize+0x20/0x20 <4>[ 12.527844] ? vfs_read+0x126/0x150 <4>[ 12.528285] ? SyS_finit_module+0xa5/0xe0 <4>[ 12.528760] SyS_finit_module+0xa5/0xe0 <4>[ 12.529204] do_syscall_64+0x65/0x1a0 <4>[ 12.529685] entry_SYSCALL_64_after_hwframe+0x42/0xb7 <4>[ 12.530151] RIP: 0033:0x7f5f6a4bc839 <4>[ 12.530628] RSP: 002b:00007fff477482b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 <4>[ 12.531433] RAX: ffffffffffffffda RBX: 00005640e00d8730 RCX: 00007f5f6a4bc839 <4>[ 12.531976] RDX: 0000000000000000 RSI: 00007f5f6a19b0e5 RDI: 000000000000000f <4>[ 12.532487] RBP: 00007f5f6a19b0e5 R08: 0000000000000000 R09: 00007fff477483d0 <4>[ 12.533031] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000000000 <4>[ 12.533550] R13: 00005640e00cdd30 R14: 0000000000020000 R15: 00005640e00d8730 <4>[ 12.534085] Code: ff ff ff 48 8b 57 70 eb bf 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 48 c7 47 28 10 b5 10 a7 48 89 77 40 c3 0f 1f 00 48 8b 47 30 <8b> 50 10 f6 c2 01 75 1d 80 7f 38 00 75 23 48 3b 78 38 74 1d 39 <1>[ 12.535484] RIP: hrtimer_active+0x4/0x40 RSP: ffffa5c0c027fca8 <4>[ 12.536008] CR2: 0000000000000010 <4>[ 12.536447] ---[ end trace 77cae12efc453098 ]--- <3>[ 12.536945] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34 <3>[ 12.537821] in_atomic(): 0, irqs_disabled(): 1, pid: 196, name: systemd-udevd <4>[ 12.538327] INFO: lockdep is turned off. <4>[ 12.538771] irq event stamp: 365170 <4>[ 12.539247] hardirqs last enabled at (365169): [<0000000025e2fc5e>] _raw_spin_unlock_irqrestore+0x4c/0x60 <4>[ 12.540135] hardirqs last disabled at (365170): [<0000000037f8c343>] _raw_spin_lock_irqsave+0xd/0x50 <4>[ 12.541021] softirqs last enabled at (360940): [<0000000020eb4dde>] __do_softirq+0x3a1/0x4aa <4>[ 12.541847] softirqs last disabled at (360933): [<000000004c971658>] irq_exit+0xa4/0xb0 <4>[ 12.542683] CPU: 0 PID: 196 Comm: systemd-udevd Tainted: G D W 4.16.0-rc5-g639c78d2805b-drmtip_4+ #1 <4>[ 12.543580] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014 <4>[ 12.544511] Call Trace: <4>[ 12.544928] dump_stack+0x5f/0x86 <4>[ 12.545384] ___might_sleep+0x1d9/0x240 <4>[ 12.545826] exit_signals+0x1b/0x2a0 <4>[ 12.546304] do_exit+0x93/0xcb0 <4>[ 12.546734] rewind_stack_do_exit+0x17/0x20 <6>[ 13.042212] PCI Interrupt Link [LNKC] enabled at IRQ 10
I created a small debug patch which hopefully reproduces the lockdep splat every time: https://patchwork.freedesktop.org/patch/211427/ I have no idea how to fix the lockdep splat, too deep in core code. Also no idea whether the lockdep splat indicates a real issue that might hang the test or not.
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_3/fi-skl-gvtdvm/igt@gem_exec_reuse@contexts.html
(In reply to Daniel Vetter from comment #7) > I created a small debug patch which hopefully reproduces the lockdep splat > every time: > > https://patchwork.freedesktop.org/patch/211427/ > > I have no idea how to fix the lockdep splat, too deep in core code. Also no > idea whether the lockdep splat indicates a real issue that might hang the > test or not. Daniel, I only put the test with the lockdep that incompletes on this bug.
First test lockdep in boot.log: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_5/fi-skl-gvtdvm/igt@perf_pmu@init-sema-vcs1.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_6/fi-bdw-gvtdvm/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-shrfb-draw-blt.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_7/fi-skl-gvtdvm/igt@gem_mmap_gtt@basic-wc.html boot log has lockdep
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_7/fi-bdw-gvtdvm/igt@kms_frontbuffer_tracking@psr-1p-offscren-pri-indfb-draw-blt.html softdog first test, bootlog has pi_lock lockdep.
test patch in topic/core-for-CI, hopefully sheds some light on where this stuff is from: commit 0162204f7aa1db746262eb2b68d294c25e5be36c (HEAD -> topic/core-for-CI, drm-intel/topic/core-for-CI) Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Tue Mar 20 17:02:58 2018 +0100 RFC: debugobjects: capture stack traces at _init() time
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_8/fi-skl-gvtdvm/igt@gem_exec_schedule@reorder-wide-render.html boot log has the lockdep: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_8/fi-skl-gvtdvm/boot3.log owatch on first test: running: igt/gem_exec_schedule/reorder-wide-render [00/99] | owatch: TIMEOUT!
It also happened here but links are dead on public servers I pulled data from internal to check https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_8/fi-skl-gvtdvm/igt@kms_vblank@pipe-d-query-forked.html
New hit, now with the ODEBUG backtraced patch: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_8/fi-skl-gvtdvm/igt@gem_exec_schedule@reorder-wide-render.html Relevant bits of dmesg (without the lockdep splat, which is secondary): <4>[ 10.850889] WARNING: CPU: 0 PID: 194 at lib/debugobjects.c:301 debug_print_object+0xac/0xe0 <4>[ 10.851698] ---[ end trace 229b22432d4ca3d7 ]--- <3>[ 10.852134] ODEBUG: ODEBUG: debug object originally initialized at: <4>[ 10.852640] futex_wait+0x69/0x220 <4>[ 10.853059] do_futex+0x482/0xb00 <4>[ 10.853475] SyS_futex+0x118/0x150 <4>[ 10.853935] do_syscall_64+0x6b/0x1b0 <4>[ 10.854358] entry_SYSCALL_64_after_hwframe+0x42/0xb7 Blowing up at <4>[ 10.854893] debug_print_object+0xac/0xe0 <4>[ 10.854893] ? __schedule+0x2a6/0xbb0 <4>[ 10.854894] ? lock_acquire+0xa6/0x210 <4>[ 10.854894] ? debug_check_no_obj_freed+0xac/0x220 <4>[ 10.854894] debug_check_no_obj_freed+0x1e6/0x220 <4>[ 10.854895] __vunmap+0xab/0x100 <4>[ 10.854895] load_module+0x2388/0x2ac0 <4>[ 10.854895] ? SyS_finit_module+0xb8/0xd0 <4>[ 10.854896] SyS_finit_module+0xb8/0xd0 <4>[ 10.854896] do_syscall_64+0x6b/0x1b0 <4>[ 10.854896] entry_SYSCALL_64_after_hwframe+0x42/0xb7 Followed shortly by an Oops: <1>[ 13.299232] BUG: unable to handle kernel NULL pointer dereference at 0000000000000006 <1>[ 13.299238] IP: do_raw_spin_lock+0x1/0xb0 <6>[ 13.299239] PGD 0 P4D 0 <4>[ 13.299241] Oops: 0000 [#1] PREEMPT SMP PTI <4>[ 13.299243] Modules linked in: i915(+) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel e1000(+) i2c_piix4 prime_numbers <4>[ 13.299251] CPU: 0 PID: 194 Comm: systemd-udevd Tainted: G W 4.16.0-rc7-g0539b52e05cd-drmtip_8+ #1 <4>[ 13.299264] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014 <4>[ 13.299266] RIP: 0010:do_raw_spin_lock+0x1/0xb0 <4>[ 13.299267] RSP: 0018:ffffbbf980257c58 EFLAGS: 00010086 <4>[ 13.299268] RAX: ffffa3833ac6d040 RBX: 0000000000000002 RCX: 0000000000000000 <4>[ 13.299269] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002 <4>[ 13.299270] RBP: 0000000000000287 R08: 0000000000000001 R09: ffffffff9010c09f <4>[ 13.299271] R10: ffffbbf980257c00 R11: 0000000000000000 R12: ffffbbf980257ca8 <4>[ 13.299271] R13: 0000000000000003 R14: ffffffff9273da68 R15: ffffffff91249cc0 <4>[ 13.299275] FS: 00007fbecbb66480(0000) GS:ffffa3833fc00000(0000) knlGS:0000000000000000 <4>[ 13.299276] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 13.299276] CR2: 0000000000000006 CR3: 0000000077bc8002 CR4: 00000000003606f0 <4>[ 13.299277] Call Trace: <4>[ 13.299281] _raw_spin_lock_irqsave+0x3b/0x50 <4>[ 13.299283] ? lock_hrtimer_base.isra.2+0x1f/0x40 <4>[ 13.299285] lock_hrtimer_base.isra.2+0x1f/0x40 <4>[ 13.299286] hrtimer_try_to_cancel+0x36/0x1d0 <4>[ 13.299288] hrtimer_fixup_free+0x15/0x40 <4>[ 13.299290] debug_object_fixup+0x12/0x30 <4>[ 13.299292] debug_check_no_obj_freed+0x212/0x220 <4>[ 13.299295] __vunmap+0xab/0x100 <4>[ 13.299298] load_module+0x2388/0x2ac0 <4>[ 13.299303] ? SyS_finit_module+0xb8/0xd0 <4>[ 13.299305] SyS_finit_module+0xb8/0xd0 <4>[ 13.299309] do_syscall_64+0x6b/0x1b0 <4>[ 13.299312] entry_SYSCALL_64_after_hwframe+0x42/0xb7 I hope I pasted the right bits togethere, the ODEBUG, OOps and lockdep splat are all a bit interleaved.
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_12/fi-skl-gvtdvm/igt@gem_exec_store@pages-bsd2.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_12/fi-bdw-gvtdvm/igt@kms_universal_plane@cursor-fb-leak-pipe-e.html
I couldn't reproduce this issue on my SKL machine which doesn't have eDP screen. I compiled kernel with: CONFIG_CC_STACKPROTECTOR_AUTO=y CONFIG_DEBUG_STACK_USAGE=y CONFIG_DEBUG_STACKOVERFLOW=y CONFIG_SCHED_STACK_END_CHECK=y Please paste your qemu command which is used to boot a vm ? From ci's dmesg, two graphic cards exist after vm boot up, one is emulated qxl or circus card, the other is IGD card, and the emulated qxl or circus card is primary graphic card. So the igt case may run against qxl or circus card, not IGD card. You could use the following qemu command which make sure vm has IGD card only. qemu-system-x86_64 -enable-kvm -smp 2 -m 6G -cpu host -nographic -vga none -hda "$IMAGE" -device vfio-pci,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x2 -usb -usbdevice tablet -net nic,macaddr=00:16:3e:60:0a:50,addr=0x3 -net tap,script=/etc/qemu-ifup -vnc :1
This qemu command line is used to boot fi-skl-gvtdvm: qemu-system-x86_64 -enable-kvm -vga cirrus -m 2048 -hda /home/testrunner/ubuntu-16.04.img -hdb /home/testrunner/linux_hd.img -device vfio-pci,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x6 -usb -usbdevice tablet -net nic,macaddr=b8:ae:ed:7f:76:38 -net tap,script=/etc/qemu-ifup -serial stdio -cpu host -vnc :1 > guest.log 2>&1
Some new ones: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_16/fi-skl-gvtdvm/igt@perf_pmu@busy-idle-vecs0.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_14/fi-bdw-gvtdvm/igt@prime_vgem@fence-read-hang.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_15/fi-bdw-gvtdvm/igt@kms_frontbuffer_tracking@psr-2p-primscrn-pri-shrfb-draw-pwrite.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_17/fi-skl-gvtdvm/igt@prime_self_import@basic-with_one_bo_two_files.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_17/fi-bdw-gvtdvm/igt@gem_exec_capture@capture-bsd2.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_18/fi-skl-gvtdvm/igt@gem_mmap_gtt@forked-basic-small-copy-xy.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_19/fi-bdw-gvtdvm/igt@kms_psr_sink_crc@psr_basic.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_21/fi-bdw-gvtdvm/igt@gem_wait@await-bsd.html https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_19/fi-skl-gvtdvm/igt@kms_plane_multiple@atomic-pipe-c-tiling-x.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_22/fi-skl-gvtdvm/igt@syncobj_wait@multi-wait-all-for-submit-unsubmitted-submitted.html
Pinged gvtdvm folks, waiting for info.
Moved to gvt team.
Could you try to modify Qemu command: qemu-system-x86_64 -enable-kvm -nographic -vga none -m 2048 -hda /home/testrunner/ubuntu-16.04.img -hdb /home/testrunner/linux_hd.img -device vfio-pci,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x2 -usb -usbdevice tablet -net nic,macaddr=b8:ae:ed:7f:76:38,addr=0x3 -net tap,script=/etc/qemu-ifup -serial stdio -cpu host -vnc :1 > guest.log 2>&1
+ Tomi
qemu-system-x86_64 -enable-kvm -nographic -vga none -m 2048 -hda /home/testrunner/ubuntu-16.04.img -hdb /home/testrunner/linux_hd.img -device vfio-pci,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x2 -usb -usbdevice tablet -net nic,macaddr=b8:ae:ed:7f:76:38,addr=0x3 -net tap,script=/etc/qemu-ifup -cpu host -vnc :1 > guest.log 2>&1 (In reply to XiongZhang from comment #29) > Could you try to modify Qemu command: > > qemu-system-x86_64 -enable-kvm -nographic -vga none -m 2048 -hda > /home/testrunner/ubuntu-16.04.img -hdb /home/testrunner/linux_hd.img -device > vfio-pci,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x2 -usb -usbdevice tablet > -net nic,macaddr=b8:ae:ed:7f:76:38,addr=0x3 -net tap,script=/etc/qemu-ifup > -serial stdio -cpu host -vnc :1 > guest.log 2>&1
Tomi, any changes done here on our CI?
The following changes were made to fi-skl-gvtd and fi-bdw-gvtd qemu commandline: OLD NEW -vga cirrus -nographic -vga none -device vfio-pci,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x6 -device vfio-pci,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x2 -net nic,macaddr=b8:ae:ed:7f:76:38 -net nic,macaddr=b8:ae:ed:7f:76:38,addr=0x3 -serial stdio (removed) The changes will be active from CI_DRM_4270 onwards.
This change caused at least two instances of BDW-GVTDVM to spam lines: *ERROR* Fault errors on pipe A: 0x00000100
Examples after changing the command line: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9168/issues.html https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9169/issues.html
BDW-GVTDVM and SKL-GVTDVM qemu command line invocation changed as it was, and incomplete issues seemed to disappear.
(In reply to Tomi Sarvela from comment #36) > BDW-GVTDVM and SKL-GVTDVM qemu command line invocation changed as it was, > and incomplete issues seemed to disappear. Still hapenning 100% of the time when running igt@gem_exec_reuse@contexts: - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_87/fi-bdw-gvtdvm/igt@gem_exec_reuse@contexts.html - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_85/fi-skl-gvtdvm/igt@gem_exec_reuse@contexts.html Can you please check it out?
On a positive note, the "*ERROR* Fault errors on pipe A: 0x00000100" has not been hit in over a month.
(In reply to Martin Peres from comment #37) > (In reply to Tomi Sarvela from comment #36) > > BDW-GVTDVM and SKL-GVTDVM qemu command line invocation changed as it was, > > and incomplete issues seemed to disappear. > > Still hapenning 100% of the time when running igt@gem_exec_reuse@contexts: > > - > https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_87/fi-bdw-gvtdvm/ > igt@gem_exec_reuse@contexts.html > - > https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_85/fi-skl-gvtdvm/ > igt@gem_exec_reuse@contexts.html > > Can you please check it out? That's a regression common to all machines, <3>[ 126.722643] active_instance:969 GEM_BUG_ON(list_empty(&vma->last_active.link)) <4>[ 126.722783] ------------[ cut here ]------------ <2>[ 126.722785] kernel BUG at drivers/gpu/drm/i915/i915_vma.c:969! <4>[ 126.722806] invalid opcode: 0000 [#1] PREEMPT SMP PTI <4>[ 126.722813] CPU: 0 PID: 1536 Comm: gem_exec_reuse Tainted: G U 4.18.0-rc4-g1a9b9e1bf951-drmtip_87+ #1 <4>[ 126.722834] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014 <4>[ 126.722895] RIP: 0010:i915_vma_move_to_active+0x6f0/0x700 [i915] <4>[ 126.722902] Code: 6f 8d cd c6 48 8b 35 37 c0 1a 00 49 c7 c0 98 6b 53 c0 b9 c9 03 00 00 48 c7 c2 70 11 50 c0 48 c7 c7 4b 54 42 c0 e8 10 1e d4 c6 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 <4>[ 126.722961] RSP: 0018:ffffa867c079fa90 EFLAGS: 00010286 <4>[ 126.722968] RAX: 000000000000000e RBX: ffff91a42f82d008 RCX: 0000000000000000 <4>[ 126.722975] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff91a43d023a78 <4>[ 126.722982] RBP: ffff91a3cfa91fc0 R08: 0000000000033c72 R09: ffff91a43d02a000 <4>[ 126.722989] R10: 0000000000000000 R11: ffff91a43d023a78 R12: ffff91a430d70680 <4>[ 126.722996] R13: ffff91a43934c040 R14: ffff91a43934c158 R15: 000000000000004f <4>[ 126.723008] FS: 00007f77b23c4980(0000) GS:ffff91a43fc00000(0000) knlGS:0000000000000000 <4>[ 126.723016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 126.723023] CR2: 00007fab6f1ef470 CR3: 00000000759d6003 CR4: 00000000003606f0 <4>[ 126.723030] Call Trace: <4>[ 126.723068] i915_gem_do_execbuffer+0x1239/0x18a0 [i915] <4>[ 126.723080] ? trace_hardirqs_on_thunk+0x1a/0x1c <4>[ 126.723089] ? retint_kernel+0x2d/0x2d <4>[ 126.723123] ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915] <4>[ 126.723160] ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915] <4>[ 126.723194] i915_gem_execbuffer2_ioctl+0x21b/0x3f0 [i915] <4>[ 126.723229] ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915] <4>[ 126.723237] drm_ioctl_kernel+0x7c/0xf0 <4>[ 126.723244] drm_ioctl+0x2e6/0x3a0 <4>[ 126.723276] ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915] <4>[ 126.723285] ? _raw_spin_unlock_irq+0x24/0x50 <4>[ 126.723292] ? trace_hardirqs_on_caller+0xe0/0x1b0 <4>[ 126.723300] do_vfs_ioctl+0xa0/0x6d0 <4>[ 126.723307] ? __schedule+0x36c/0xbf0 <4>[ 126.723313] ksys_ioctl+0x35/0x60 <4>[ 126.723319] __x64_sys_ioctl+0x11/0x20 <4>[ 126.723325] do_syscall_64+0x55/0x190 <4>[ 126.723331] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 126.723337] RIP: 0033:0x7f77b1c8d5d7 <4>[ 126.723341] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48 <4>[ 126.723401] RSP: 002b:00007ffcdaaf8b08 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 <4>[ 126.723410] RAX: ffffffffffffffda RBX: 00007ffcdaaf9c40 RCX: 00007f77b1c8d5d7 <4>[ 126.723417] RDX: 00007ffcdaaf8bd0 RSI: 0000000040406469 RDI: 0000000000000003 <4>[ 126.723424] RBP: 00007ffcdaaf8bd0 R08: 000000000000000d R09: 00007ffcdab3f080 <4>[ 126.723431] R10: 00000000ffffffd2 R11: 0000000000000246 R12: 0000000040406469 <4>[ 126.723438] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000 and needs to be filed separately.
(In reply to Chris Wilson from comment #39) > (In reply to Martin Peres from comment #37) > > (In reply to Tomi Sarvela from comment #36) > > > BDW-GVTDVM and SKL-GVTDVM qemu command line invocation changed as it was, > > > and incomplete issues seemed to disappear. > > > > Still hapenning 100% of the time when running igt@gem_exec_reuse@contexts: > > > > - > > https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_87/fi-bdw-gvtdvm/ > > igt@gem_exec_reuse@contexts.html > > - > > https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_85/fi-skl-gvtdvm/ > > igt@gem_exec_reuse@contexts.html > > > > Can you please check it out? > > That's a regression common to all machines, commit 46b1063f91e5680eff2a6b9fe432f6aae3bef54e (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued) Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jul 19 08:22:06 2018 +0100 drm/i915: Handle recursive shrinker for vma->last_active allocation If we call into the shrinker for direct relcaim inside kmalloc, it will retire the requests. If we retire the vma->last_active while processing a new i915_vma_move_to_active() we can upset the delicate bookkeeping required for the cache. After the possible invocation of the shrinker, we need to double check the vma->last_active is still valid. Fixes: 8b293eb53a7d ("drm/i915: Track the last-active inside the i915_vma") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105600#c39 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180719072206.16015-1-chris@chris-wilson.co.uk
From the last 5 months, the reproduction rate of this failure is once in 5.25 drmtip runs. Last seen on drmtip 233. According to 10x rule, Closing this bug as WORKSFORME.
The CI Bug Log issue associated to this bug has been archived. New failures matching the above filters will not be associated to this bug anymore.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.