Bug 100258

Summary: [BAT][SKL] Serial console hung in kms_flip when running IGT
Product: DRI Reporter: Tomi Sarvela <tomi.p.sarvela>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED INVALID QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: intel-gfx-bugs
Version: DRI git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: SKL i915 features:

Description Tomi Sarvela 2017-03-17 16:04:33 UTC
Testhost is i7-6700K on Gigabyte Z170 MB. Serial is real COM1 header. Lockup happened while running IGT testlist in loop (started from serial console, if that matters).

Kernel is built from DRM-Tip today. BIOS newest. Kconfig at
https://intel-gfx-ci.01.org/CI/CI_DRM_2352/kernel.config.bz2


[22018.856538] [drm:intel_atomic_commit_tail [i915]] [CRTC:46:pipe C]
[22018.856551] [drm:intel_power_well_disable [i915]] disabling DC off
[22018.856566] [drm:skl_enable_dc6 [i915]] Enabling DC6
[22018.856577] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 02
[22018.856997] [drm:intel_power_well_disable [i915]] disabling always-on
[22018.857038] [drm:intel_runtime_suspend [i915]] Suspending device
[22018.857680] [IGT] kms_flip: exiting, ret=0
[22018.859602] [drm:intel_runtime_suspend [i915]] Device suspended
[22018.884658] [drm:intel_runtime_resume [i915]] Resuming device
[22018.888911] [drm:intel_runtime_resume [i915]] Device resumed
[22018.888993] [drm:intel_runtime_suspend [i915]] Suspending device
[22018.891615] [drm:intel_runtime_suspend [i915]] Device suspended
[22055.545628] INFO: task kms_flip:25920 blocked for more than 30 seconds.
[22055.545629]       Tainted: G     U  W       4.11.0-rc2-CI-CI_DRM_2357+ #1
[22055.545629] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[22055.545630] kms_flip        D    0 25920  20731 0x00000000
[22055.545631] Call Trace:
[22055.545635]  __schedule+0x3fd/0xbc0
[22055.545636]  ? __mutex_unlock_slowpath+0x3e/0x2e0
[22055.545637]  schedule+0x3b/0x90
[22055.545639]  schedule_timeout+0x23b/0x490
[22055.545641]  ? __save_stack_trace+0x7e/0xd0
[22055.545643]  __down+0x80/0xc0
[22055.545644]  ? _raw_spin_lock_irqsave+0x49/0x60
[22055.545645]  down+0x3c/0x50
[22055.545647]  console_lock+0x27/0x70
[22055.545648]  ? capable+0x17/0x20
[22055.545650]  vt_ioctl+0x51a/0x1440
[22055.545651]  ? _raw_spin_unlock_irqrestore+0x3d/0x60
[22055.545652]  ? __slab_free+0x39f/0x4f0
[22055.545654]  tty_ioctl+0x384/0xe80
[22055.545655]  ? _raw_spin_unlock_irqrestore+0x3d/0x60
[22055.545656]  ? debug_check_no_obj_freed+0x117/0x220
[22055.545658]  do_vfs_ioctl+0x90/0x6e0
[22055.545659]  ? putname+0x4f/0x60
[22055.545661]  SyS_ioctl+0x3c/0x70
[22055.545662]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[22055.545663] RIP: 0033:0x7f6c17851357
[22055.545663] RSP: 002b:00007fffa5d50198 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[22055.545664] RAX: ffffffffffffffda RBX: 0000000001566ea0 RCX: 00007f6c17851357
[22055.545664] RDX: 0000000000000000 RSI: 0000000000004b3a RDI: 0000000000000006
[22055.545665] RBP: 0000000000000000 R08: 00007fffa5d50160 R09: 0000000000000000
[22055.545665] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
[22055.545665] R13: 00007fffa5d50ed0 R14: 0000000000000000 R15: 0000000000000000
[22055.545667] INFO: lockdep is turned off.
[22055.545667] NMI backtrace for cpu 5
[22055.545669] CPU: 5 PID: 66 Comm: khungtaskd Tainted: G     U  W       4.11.0-rc2-CI-CI_DRM_2357+ 
#1
[22055.545669] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F21 01/06/2
017
[22055.545669] Call Trace:
[22055.545671]  dump_stack+0x67/0x92
[22055.545672]  ? irq_force_complete_move+0xf0/0xf0
[22055.545673]  nmi_cpu_backtrace+0x92/0xa0
[22055.545674]  ? irq_force_complete_move+0xf0/0xf0
[22055.545675]  nmi_trigger_cpumask_backtrace+0xc0/0x100
[22055.545677]  arch_trigger_cpumask_backtrace+0x14/0x20
[22055.545678]  watchdog+0x433/0x600
[22055.545678]  ? watchdog+0x9c/0x600
[22055.545680]  kthread+0x107/0x140
[22055.545680]  ? reset_hung_task_detector+0x10/0x10
[22055.545681]  ? kthread_create_on_node+0x40/0x40
[22055.545682]  ret_from_fork+0x2e/0x40
[22055.545684] Sending NMI from CPU 5 to CPUs 0-4,6-7:
[22055.545690] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffff818760f0
[22055.545691] NMI backtrace for cpu 1 skipped: idling at pc 0xffffffff818760f0
[22055.545721] NMI backtrace for cpu 7 skipped: idling at pc 0xffffffff818760f0
[22055.545722] NMI backtrace for cpu 3 skipped: idling at pc 0xffffffff818760f0
[22055.545723] NMI backtrace for cpu 2 skipped: idling at pc 0xffffffff818760f0
[22055.545723] NMI backtrace for cpu 6 skipped: idling at pc 0xffffffff818760f0
[22055.546688] NMI backtrace for cpu 4
[22055.546688] CPU: 4 PID: 25851 Comm: kworker/4:1 Tainted: G     U  W       4.11.0-rc2-CI-CI_DRM_23
57+ #1
[22055.546689] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F21 01/06/2
017
[22055.546689] Workqueue: usb_hub_wq hub_event
[22055.546689] task: ffff88041d3c0040 task.stack: ffffc900042b4000
[22055.546689] RIP: 0010:io_serial_in+0x13/0x20
[22055.546690] RSP: 0018:ffffc900042b79a0 EFLAGS: 00000002
[22055.546690] RAX: 0000000080000000 RBX: ffffffff83680100 RCX: 0000000000000000
[22055.546690] RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff83680100
[22055.546691] RBP: ffffc900042b79a0 R08: 0000000000000001 R09: 0000000000000001
[22055.546691] R10: 0000000000000000 R11: ffff88041d3c0040 R12: 0000000000002704
[22055.546691] R13: 0000000000000020 R14: ffffffff82d04fbb R15: ffffffff83680100
[22055.546691] FS:  0000000000000000(0000) GS:ffff88043ed00000(0000) knlGS:0000000000000000
[22055.546691] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[22055.546692] CR2: 00007f6c142323c0 CR3: 0000000003e0f000 CR4: 00000000003406e0
[22055.546692] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[22055.546692] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[22055.546692] Call Trace:
[22055.546693]  wait_for_xmitr+0x37/0xa0
[22055.546693]  serial8250_console_putchar+0x17/0x30
[22055.546693]  ? wait_for_xmitr+0xa0/0xa0
[22055.546693]  uart_console_write+0x2b/0x60
[22055.546693]  serial8250_console_write+0xbc/0x270
[22055.546693]  ? vt_console_print+0x22d/0x3b0
[22055.546694]  univ8250_console_write+0x1c/0x20
[22055.546694]  console_unlock+0x40e/0x530
[22055.546694]  ? vprintk_emit+0x27a/0x380
[22055.546694]  vprintk_emit+0x283/0x380
[22055.546694]  ? snprintf+0x39/0x40
[22055.546695]  dev_vprintk_emit+0xab/0x220
[22055.546695]  ? usb_cache_string+0x7e/0xa0
[22055.546695]  ? trace_hardirqs_on+0xd/0x10
[22055.546695]  dev_printk_emit+0x39/0x40
[22055.546695]  __dev_printk+0x37/0x80
[22055.546696]  _dev_info+0x57/0x60
[22055.546696]  usb_new_device+0x122/0x4a0
[22055.546696]  hub_event+0xb18/0x1500
[22055.546696]  process_one_work+0x1f4/0x6d0
[22055.546696]  ? process_one_work+0x16e/0x6d0
[22055.546696]  worker_thread+0x49/0x4a0
[22055.546697]  kthread+0x107/0x140
[22055.546697]  ? process_one_work+0x6d0/0x6d0
[22055.546697]  ? kthread_create_on_node+0x40/0x40
[22055.546697]  ret_from_fork+0x2e/0x40
[22055.546697] Code: 48 89 e5 d3 e6 48 63 f6 48 03 77 40 8b 06 5d c3 0f 1f 84 00 00 00 00 00 0f b6 8
f d1 00 00 00 8b 57 38 55 48 89 e5 d3 e6 01 f2 ec <0f> b6 c0 5d c3 0f 1f 84 00 00 00 00 00 0f b6 8f 
d1 00 00 00 89 
[22055.546706] Kernel panic - not syncing: hung_task: blocked tasks
[22055.546707] CPU: 5 PID: 66 Comm: khungtaskd Tainted: G     U  W       4.11.0-rc2-CI-CI_DRM_2357+ 
#1
[22055.546707] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F21 01/06/2
017
[22055.546707] Call Trace:
[22055.546708]  dump_stack+0x67/0x92
[22055.546709]  panic+0xcf/0x205
[22055.546711]  ? irq_force_complete_move+0xf0/0xf0
[22055.546712]  watchdog+0x43f/0x600
[22055.546712]  ? watchdog+0x9c/0x600
[22055.546713]  kthread+0x107/0x140
[22055.546714]  ? reset_hung_task_detector+0x10/0x10
[22055.546715]  ? kthread_create_on_node+0x40/0x40
[22055.546716]  ret_from_fork+0x2e/0x40
[22055.555570] Kernel Offset: disabled
[22070.033123] ---[ end Kernel panic - not syncing: hung_task: blocked tasks
Comment 1 Chris Wilson 2017-03-17 16:20:44 UTC
Each character has a 1s timeout in wait_for_xmitr. If the string was log enough and the serial console bust, it can easily appear to be hung. As wait_for_xmitr keeps poking the nmi watchdog. the slow loop will not be caught. If that is correct, this appears to be a hw issue.

An alternative hypothesis is that the usb device struct sent garbage to be printed and it really is stuck printing out all of memory, 1 character at a time. Kasan might help.

I don't think the vtcon switch of modes is significant.

What would be useful is the dmesg before to see if we can identify what usb event triggered.
Comment 2 Elizabeth 2017-06-23 23:00:54 UTC
(In reply to Chris Wilson from comment #1)
> Each character has a 1s timeout in wait_for_xmitr. If the string was log
> enough and the serial console bust, it can easily appear to be hung. As
> wait_for_xmitr keeps poking the nmi watchdog. the slow loop will not be
> caught. If that is correct, this appears to be a hw issue.
> 
> An alternative hypothesis is that the usb device struct sent garbage to be
> printed and it really is stuck printing out all of memory, 1 character at a
> time. Kasan might help.
> 
> I don't think the vtcon switch of modes is significant.
> 
> What would be useful is the dmesg before to see if we can identify what usb
> event triggered.

Hello,
Is this bug still valid? If so, could you attach the dmesg mentioned before or any other relevant information? Thank you.
Comment 3 Elizabeth 2017-08-16 14:55:46 UTC
Hello,
I asked Tomi about this case. The problem couldn't be reproduced again, if this is pointed up again it will be reopened. Thank you.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.