Bug 103540 - [CI][HSW only] igt@* - Incomplete - timeout/system hang
Summary: [CI][HSW only] igt@* - Incomplete - timeout/system hang
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: Other All
: high normal
Assignee: Francesco Balestrieri
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks: 105984
  Show dependency treegraph
 
Reported: 2017-11-02 06:23 UTC by Marta Löfstedt
Modified: 2018-09-06 11:38 UTC (History)
1 user (show)

See Also:
i915 platform: HSW
i915 features: display/Other, GPU hang


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Marta Löfstedt 2017-11-02 06:23:57 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3307/shard-hsw5/igt@gem_softpin@noreloc-S3.html

First dmesg:
<5>[  410.029726] owatch: Using watchdog device /dev/watchdog0
<5>[  410.029792] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[  410.030293] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)

Last dmesg:
<7>[  630.912561] [IGT] gem_softpin: starting subtest noreloc-S3
<6>[  631.934079] PM: suspend entry (deep)
<6>[  631.934082] PM: Syncing filesystems ... done.

run.log
[32/73] skip: 11, pass: 20, fail: 1 |
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3307@shard-hsw5 : FAILURE
CI_IGT_test runtime 625 seconds
Comment 1 Marta Löfstedt 2017-11-08 07:47:34 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6978/shard-hsw4/igt@kms_cursor_legacy@all-pipes-torture-bo.html

First dmesg:
<5>[  514.464286] owatch: Using watchdog device /dev/watchdog0
<5>[  514.464338] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[  514.464680] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)

Last dmesg:
<7>[  732.865088] [IGT] gem_exec_parallel: exiting, ret=0
<7>[  732.954598] [IGT] kms_cursor_legacy: executing

run.log:
[10/72] skip: 3, pass: 7 -                  
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test Patchwork_6978@shard-hsw4 : FAILURE
CI_IGT_test runtime 287 seconds

Looks like system hang.
Comment 2 Marta Löfstedt 2017-11-21 07:08:20 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3993/shard-hsw1/igt@kms_flip@vblank-vs-dpms-suspend.html

dmesg:
<5>[ 4868.543800] owatch: Using watchdog device /dev/watchdog0
<5>[ 4868.543854] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[ 4868.544357] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[ 4897.089326] [drm:intel_power_well_disable [i915]] disabling always-on
<6>[ 4897.241855] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
<6>[ 4897.282825] ata2.00: configured for UDMA/133

run.log:
running: igt/kms_flip/vblank-vs-suspend

[18/73] skip: 11, pass: 7 -            
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3364/shard-hsw1/12 : FAILURE
CI_IGT_test runtime 276 seconds
Rebooting shard-hsw1
Comment 3 Marta Löfstedt 2017-12-01 08:02:30 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3419/shard-hsw8/igt@kms_vblank@wait-forked.html

this caught my eye in dmesg:
<7>[  513.201488] [IGT] gem_exec_suspend: executing
<4>[  513.213950] Setting dangerous option reset - tainting kernel
<7>[  513.214808] [IGT] gem_exec_suspend: starting subtest basic-S3
...
<7>[  513.798733] [drm:i915_check_and_clear_faults [i915]] Unexpected fault
                  	Addr: 0x00000000
                  	Address space: PPGTT
                  	Source ID: 24
                  	Type: 2
...
<7>[  513.845966] [drm:sandybridge_pcode_write [i915]] warning: pcode (write of 0x00000011 to mbox 11) mailbox access failed for hsw_write_dcomp [i915]: -6
...
However this incomplete is way later:
<4>[  736.408137] Setting dangerous option reset - tainting kernel
<7>[  736.409030] [IGT] gem_exec_async: starting subtest concurrent-writes-bsd
<7>[  736.411044] [IGT] gem_exec_async: exiting, ret=0

run.log has no indication on timeout.
Comment 4 Marta Löfstedt 2017-12-04 07:18:12 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3429/shard-hsw5/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes.html

dmesg:
<5>[   17.625788] owatch: Using watchdog device /dev/watchdog0
<5>[   17.625904] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   17.626792] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  116.022862] [drm:gmbus_xfer [i915]] GMBUS [i915 gmbus dpd] NAK for addr: 0040 w(1)
<7>[  116.022867] [drm:drm_dp_dual_mode_detect] DP dual mode HDMI ID:  (err -6)
<7>[  116.022873] [drm:drm_helper_hpd_irq_event] [CONNECTOR:72:HDMI-A-3] status updated from disconnected to disconnected
Followed by "stray"

run.log:
running: igt/kms_plane/plane-panning-bottom-right-suspend-pipe-a-planes

[52/75] skip: 24, pass: 28 |                                           
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3429/shard-hsw5/0 : FAILURE
CI_IGT_test runtime 224 seconds
Rebooting shard-hsw5
Comment 5 Marta Löfstedt 2017-12-04 07:22:38 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3445/shard-hsw3/igt@kms_flip@vblank-vs-dpms-suspend-interruptible.html

Note none of the pstore appear to match dmesg timestamps

dmesg:
<7>[  127.131153] [drm:verify_connector_state.isra.77 [i915]] [CONNECTOR:58:VGA-1]
<7>[  127.131219] [drm:intel_atomic_commit_tail [i915]] [CRTC:37:pipe A]
<7>[  127.131289] [drm:verify_single_dpll_state.isra.78 [i915]] SPLL
<6>[  127.182789] PM: suspend entry (deep)

run.log:
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3445/shard-hsw3/2 : FAILURE
CI_IGT_test runtime 547 seconds
Rebooting shard-hsw3
Comment 7 Marta Löfstedt 2017-12-05 10:18:24 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4037/shard-hsw4/igt@kms_cursor_crc@cursor-64x64-suspend.html

last dmesg:
<7>[ 6527.477151] [drm:intel_crt_detect [i915]] CRT detected via hotplug
<6>[ 6527.689894] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
<6>[ 6527.713458] ata3.00: configured for UDMA/133

run.log:
[23/75] skip: 12, pass: 11 -
running: igt/kms_cursor_crc/cursor-64x64-suspend

[23/75] skip: 12, pass: 11 \                    
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3452/shard-hsw4/11 : FAILURE
CI_IGT_test runtime 256 seconds
Rebooting shard-hsw4
Comment 8 Marta Löfstedt 2017-12-05 13:51:00 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3457/shard-hsw8/igt@gem_exec_suspend@basic-s3.html

last dmesg:
<7>[  809.762818] [IGT] gem_exec_suspend: executing
<4>[  809.777894] Setting dangerous option reset - tainting kernel
<7>[  809.778786] [IGT] gem_exec_suspend: starting subtest basic-S3

run.log:
pass: igt/gem_exec_suspend/basic-s3

[17/75] skip: 8, pass: 9 |
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3457/shard-hsw8/29 : FAILURE
CI_IGT_test runtime 183 seconds
Rebooting shard-hsw8
Comment 9 Marta Löfstedt 2017-12-07 07:47:40 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3467/shard-hsw8/igt@gem_eio@hibernate.html

last dmesg:
<7>[   79.416629] [IGT] gem_eio: executing
<4>[   79.439221] Setting dangerous option reset - tainting kernel
<6>[   79.441597] [drm] GPU HANG: ecode 7:0:0x87f3fffe, reason: Manually setting wedged to 18446744073709551615, action: reset
<6>[   79.441732] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6>[   79.441736] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6>[   79.441739] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6>[   79.441743] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6>[   79.441747] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<7>[   79.442619] [drm:i915_reset_device [i915]] resetting chip
<5>[   79.442736] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[   79.443960] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 0
<7>[   79.446291] [IGT] gem_eio: starting subtest hibernate
<4>[   79.446354] Setting dangerous option reset - tainting kernel
<7>[   79.446936] [drm:i915_reset_device [i915]] resetting chip
<5>[   79.447648] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[   79.448025] [drm:i915_reset [i915]] GPU reset disabled
<6>[   79.467560] PM: hibernation entry

run.log:
pass: igt/gem_eio/hibernate 

[24/75] skip: 12, pass: 12 \
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3467/shard-hsw8/19 : FAILURE
CI_IGT_test runtime 224 seconds
Rebooting shard-hsw8
Comment 10 Marta Löfstedt 2017-12-07 14:24:16 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4041/shard-hsw1/igt@gem_exec_create@basic.html

this is all in dmesg:
<5>[   17.780372] owatch: Using watchdog device /dev/watchdog0
<5>[   17.780484] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   17.781154] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
<6>[   20.535043] Console: switching to colour dummy device 80x25
<7>[   20.535136] [IGT] gem_bad_reloc: executing
<7>[   20.572327] [IGT] gem_bad_reloc: starting subtest negative-reloc-bltcopy
<7>[   20.675313] [IGT] gem_bad_reloc: exiting, ret=0
<7>[   20.796516] [IGT] gem_exec_create: executing
<4>[   20.831229] Setting dangerous option reset - tainting kernel
<7>[   20.831920] [IGT] gem_exec_create: starting subtest basic

run.log doesn't show result on any tests:
Completed CI_IGT_test CI_DRM_3470/shard-hsw1/32 : FAILURE
CI_IGT_test runtime 15 seconds
Rebooting shard-hsw1

network issue folloed by Jenkins reboot?
Comment 11 Marta Löfstedt 2017-12-08 06:49:18 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4042/shard-hsw5/igt@kms_cursor_crc@cursor-128x128-offscreen.html

last dmesg:
<7>[  304.471679] [drm:haswell_crtc_enable [i915]] iCLKIP clock: found settings for 65000KHz refresh rate: auxdiv=0, divsel=27, phasedir=0, phaseinc=22
<7>[  304.521857] [drm:verify_connector_state.isra.77 [i915]] [CONNECTOR:58:VGA-1]
<7>[  304.521882] [drm:intel_atomic_commit_tail [i915]] [CRTC:37:pipe A]
<7>[  304.521938] [drm:verify_single_dpll_state.isra.78 [i915]] SPLL
<7>[  304.521962] [drm:intel_atomic_commit_tail [i915]] [CRTC:57:pipe C]
Followed by "stray"

run.log:
[53/75] skip: 22, pass: 31 |
running: igt/gem_tiled_swapping/non-threaded

[53/75] skip: 22, pass: 31 /                
FATAL: command execution failed
java.io.EOFException
...

Finished: FAILURE
Completed CI_IGT_test CI_DRM_3473/shard-hsw5/11 : FAILURE
CI_IGT_test runtime 430 seconds
Rebooting shard-hsw5

mismatch between run.log and results on the affected subtest indicates Jenkins reboot.
Comment 12 Marta Löfstedt 2017-12-08 07:01:54 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3476/shard-hsw4/igt@kms_flip@flip-vs-panning.html

dmesg:
<7>[  266.501168] [IGT] drv_suspend: starting subtest fence-restore-untiled-hibernate
<6>[  266.564484] PM: hibernation entry
<6>[  266.564781] PM: Syncing filesystems ... 
<6>[  266.564877] PM: done.
<6>[  266.564879] Freezing user space processes ... 
<3>[  286.570138] Freezing of tasks failed after 20.005 seconds (5 tasks refusing to freeze, wq_busy=0):
<6>[  286.570253] java            D    0  1051   1015 0x00000004
<4>[  286.570287] Call Trace:
<4>[  286.570314]  ? __schedule+0x3c3/0xaf0
<4>[  286.570331]  ? wait_on_page_bit_killable+0xff/0x160
<4>[  286.570351]  schedule+0x37/0x90
<4>[  286.570366]  io_schedule+0xd/0x30
<4>[  286.570377]  wait_on_page_bit_killable+0x10b/0x160
<4>[  286.570397]  ? add_to_page_cache_lru+0xc0/0xc0
<4>[  286.570417]  __lock_page_or_retry+0x9c/0xe0
<4>[  286.570432]  do_swap_page+0x57f/0x8f0
<4>[  286.570452]  ? __lock_acquire+0x42c/0x15a0
<4>[  286.570471]  __handle_mm_fault+0x83b/0xe40
<4>[  286.570509]  handle_mm_fault+0x14f/0x2f0
<4>[  286.570525]  __do_page_fault+0x2d1/0x560
<4>[  286.570552]  page_fault+0x22/0x30
<4>[  286.570562] RIP: 0033:0x7f58dd0231d9
<4>[  286.570570] RSP: 002b:00007f58f5d2e798 EFLAGS: 00010246
<4>[  286.570584] RAX: 0000000000000000 RBX: 00007f58dc800a00 RCX: 00000007731182a8
<4>[  286.570592] RDX: 0000000090500003 RSI: 0000000000000005 RDI: 00007f58ec00a000
<4>[  286.570599] RBP: 00007f58f5d2e808 R08: 0000000000000000 R09: 0000000000000000
<4>[  286.570607] R10: 00007f58f4f0d8c0 R11: 0000000000000206 R12: 0000000000000000
<4>[  286.570614] R13: 00007f58f5d2e7a0 R14: 00007f58f5d2e818 R15: 00007f58ec00a000
<6>[  286.570652] java            D    0  1070   1015 0x00000004

run.log doesn't match results:
running: igt/gem_tiled_swapping/non-threaded

[53/75] skip: 22, pass: 31 /                
FATAL: command execution failed

indicating another suspected Jenkins reboot.
Comment 13 Marta Löfstedt 2017-12-11 08:41:16 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4052/shard-hsw7/igt@gem_exec_flush@basic-wb-pro-default.html

run.log doesn't match results:
running: igt/gem_tiled_swapping/non-threaded

[30/75] skip: 11, pass: 19 -                
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3487/shard-hsw7/29 : FAILURE
CI_IGT_test runtime 211 seconds
Rebooting shard-hsw7

this has similar pattern 
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4050/shard-hsw3/igt@kms_flip@absolute-wf_vblank.html

last test in run.log is:
running: igt/gem_tiled_swapping/non-threaded

[13/76] skip: 4, pass: 9 /                  
FATAL: command execution failed
Comment 14 Marta Löfstedt 2017-12-12 07:28:24 UTC
These two also has igt@gem_tiled_swapping@non-threaded as last test in run.log, however the incomplete is linked to tests running after.

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4057/shard-hsw6/igt@gem_exec_flush@basic-uc-set-default.html

<7>[   24.081678] [IGT] gem_tiled_swapping: starting subtest non-threaded
<5>[   24.300979] random: crng init done
<6>[   51.508286] Purging GPU memory, 50176 pages freed, 935 pages still pinned.
<6>[   55.038871] Purging GPU memory, 50432 pages freed, 935 pages still pinned.
<7>[   55.768070] [IGT] gem_tiled_swapping: exiting, ret=0
<7>[   56.662909] [IGT] gem_exec_flush: executing

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3499/shard-hsw6/igt@perf_pmu@render-node-busy-vcs0.html
<7>[   14.284709] [IGT] gem_tiled_swapping: executing
<7>[   15.804253] [IGT] gem_tiled_swapping: starting subtest non-threaded
<5>[   15.877672] random: crng init done
<6>[   37.606221] Purging GPU memory, 41984 pages freed, 935 pages still pinned.
<6>[   41.130918] Purging GPU memory, 42496 pages freed, 935 pages still pinned.
<3>[   41.130930] 0 and 256 pages still available in the bound and unbound GPU page lists.
<6>[   43.888349] Purging GPU memory, 42752 pages freed, 935 pages still pinned.
<6>[   47.333043] Purging GPU memory, 42752 pages freed, 935 pages still pinned.
<6>[   50.727754] Purging GPU memory, 42496 pages freed, 935 pages still pinned.
<7>[   51.330125] [IGT] gem_tiled_swapping: exiting, ret=0
<7>[   52.321979] [IGT] perf_pmu: executing

Maybe this should be moved to a separate bug.
Comment 15 Marta Löfstedt 2017-12-12 13:19:42 UTC
Note I moved the igt@gem_tiled_swapping@non-threaded related once to bug 104218
Comment 16 Marta Löfstedt 2017-12-12 13:20:49 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4058/shard-hsw7/igt@kms_flip@vblank-vs-modeset-suspend-interruptible.html

last dmesg:
<7>[   33.026621] [drm:intel_atomic_commit_tail [i915]] [CRTC:37:pipe A]
<6>[   33.187176] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
<6>[   33.206205] ata3.00: configured for UDMA/133
Folloed by "stray"

run.log:
running: igt/kms_flip/vblank-vs-modeset-suspend-interruptible

[05/76] pass: 5 /                                            
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3499/shard-hsw7/12 : FAILURE
CI_IGT_test runtime 183 seconds
Rebooting shard-hsw7
Comment 18 Marta Löfstedt 2018-01-02 08:21:36 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3546/shard-hsw3/igt@gem_eio@in-flight-external.html

<7>[  121.567706] [drm:i915_reset_device [i915]] resetting chip
<5>[  121.567799] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[  121.568746] [drm:sandybridge_pcode_read [i915]] warning: pcode (read from mbox 5) mailbox access failed for intel_enable_gt_powersave [i915]: -6
<7>[  121.568983] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 0
<14>[  121.570147] [IGT] gem_eio: starting subtest in-flight-external
<4>[  121.571822] Setting dangerous option reset - tainting kernel
<3>[  123.921863] INFO: task kswapd0:80 blocked for more than 60 seconds.
<3>[  123.921887]       Tainted: G     U           4.15.0-rc4-CI-CI_DRM_3546+ #1
<3>[  123.921905] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[  123.921926] kswapd0         D    0    80      2 0x80000000
<4>[  123.921931] Call Trace:
<4>[  123.921940]  ? __schedule+0x3c3/0xaf0
<4>[  123.921947]  schedule+0x37/0x90
<4>[  123.921951]  io_schedule+0xd/0x30
<4>[  123.921955]  __lock_page+0x107/0x130
<4>[  123.921960]  ? add_to_page_cache_lru+0xc0/0xc0
<4>[  123.921967]  deferred_split_scan+0x25a/0x2b0
<4>[  123.921975]  shrink_slab.part.17+0x201/0x5d0
<4>[  123.921986]  shrink_node+0x2fd/0x310
<4>[  123.921993]  kswapd+0x31c/0x910
<4>[  123.922004]  kthread+0xfb/0x130
<4>[  123.922007]  ? mem_cgroup_shrink_node+0x300/0x300
<4>[  123.922009]  ? _kthread_create_on_node+0x30/0x30
<4>[  123.922015]  ret_from_fork+0x24/0x30
<4>[  123.922023] 
<4>[  123.922023] Showing all locks held in the system:
<4>[  123.922028] 2 locks held by khungtaskd/67:
<4>[  123.922032]  #0:  (rcu_read_lock){....}, at: [<00000000af589633>] watchdog+0x9b/0x5e0
<4>[  123.922043]  #1:  (tasklist_lock){.+.+}, at: [<000000004e9aafe9>] debug_show_all_locks+0x37/0x190
<4>[  123.922056] 1 lock held by kswapd0/80:
<4>[  123.922057]  #0:  (shrinker_rwsem){++++}, at: [<000000002678c2e8>] shrink_slab.part.17+0x46/0x5d0
<4>[  123.922075] 1 lock held by in:imklog/571:
<4>[  123.922076]  #0:  (&f->f_pos_lock){+.+.}, at: [<00000000044eb2e1>] __fdget_pos+0x3a/0x50
<4>[  123.922097] 1 lock held by dmesg/1154:
<4>[  123.922099]  #0:  (&user->lock){+.+.}, at: [<00000000b572ad75>] devkmsg_read+0x35/0x2f0
<4>[  123.922111] 
<4>[  123.922113] =============================================
<4>[  123.922113] 
<4>[  123.922116] NMI backtrace for cpu 0
<4>[  123.922118] CPU: 0 PID: 67 Comm: khungtaskd Tainted: G     U           4.15.0-rc4-CI-CI_DRM_3546+ #1
<4>[  123.922120] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4>[  123.922122] Call Trace:
<4>[  123.922127]  dump_stack+0x5f/0x86
<4>[  123.922132]  nmi_cpu_backtrace+0xb4/0xc0
<4>[  123.922137]  ? lapic_can_unplug_cpu+0x90/0x90
<4>[  123.922140]  nmi_trigger_cpumask_backtrace+0xb8/0xf0
<4>[  123.922144]  watchdog+0x43e/0x5e0
<4>[  123.922149]  kthread+0xfb/0x130
<4>[  123.922151]  ? reset_hung_task_detector+0x10/0x10
<4>[  123.922154]  ? _kthread_create_on_node+0x30/0x30
<4>[  123.922157]  ret_from_fork+0x24/0x30
<6>[  123.922166] Sending NMI from CPU 0 to CPUs 1-7:
<4>[  123.922175] NMI backtrace for cpu 4 skipped: idling at intel_idle+0x6f/0x120
<4>[  123.922189] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x6f/0x120
<4>[  123.922192] NMI backtrace for cpu 3 skipped: idling at intel_idle+0x6f/0x120
<4>[  123.922195] NMI backtrace for cpu 1 skipped: idling at intel_idle+0x6f/0x120
<4>[  123.922199] NMI backtrace for cpu 2 skipped: idling at intel_idle+0x6f/0x120
<4>[  123.922212] NMI backtrace for cpu 6 skipped: idling at intel_idle+0x6f/0x120
<4>[  123.922218] NMI backtrace for cpu 5 skipped: idling at intel_idle+0x6f/0x120
<0>[  123.923181] Kernel panic - not syncing: hung_task: blocked tasks
<4>[  123.923221] CPU: 0 PID: 67 Comm: khungtaskd Tainted: G     U           4.15.0-rc4-CI-CI_DRM_3546+ #1
<4>[  123.923274] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4>[  123.923317] Call Trace:
<4>[  123.923340]  dump_stack+0x5f/0x86
<4>[  123.923367]  panic+0xcf/0x20d
<4>[  123.923401]  watchdog+0x44a/0x5e0
<4>[  123.923431]  kthread+0xfb/0x130
<4>[  123.923455]  ? reset_hung_task_detector+0x10/0x10
<4>[  123.923485]  ? _kthread_create_on_node+0x30/0x30
<4>[  123.923519]  ret_from_fork+0x24/0x30
<0>[  123.924041] Dumping ftrace buffer:
<0>[  123.924117]    (ftrace buffer empty)
<0>[  123.924131] Kernel Offset: disabled
Comment 19 Marta Löfstedt 2018-01-02 08:22:32 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3552/shard-hsw4/igt@kms_flip@flip-vs-fences.html

<7>[  179.489599] [drm:verify_single_dpll_state.isra.78 [i915]] SPLL
<3>[  185.394157] INFO: task kswapd0:80 blocked for more than 60 seconds.
<3>[  185.394169]       Tainted: G     U           4.15.0-rc4-CI-CI_DRM_3552+ #1
<3>[  185.394171] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[  185.394174] kswapd0         D    0    80      2 0x80000000
<4>[  185.394179] Call Trace:
<4>[  185.394188]  ? __schedule+0x3c3/0xaf0
<4>[  185.394196]  schedule+0x37/0x90
<4>[  185.394200]  io_schedule+0xd/0x30
<4>[  185.394204]  __lock_page+0x107/0x130
<4>[  185.394208]  ? add_to_page_cache_lru+0xc0/0xc0
<4>[  185.394215]  deferred_split_scan+0x25a/0x2b0
<4>[  185.394222]  shrink_slab.part.17+0x201/0x5d0
<4>[  185.394234]  shrink_node+0x2fd/0x310
<4>[  185.394242]  kswapd+0x31c/0x910
<4>[  185.394254]  kthread+0xfb/0x130
<4>[  185.394256]  ? mem_cgroup_shrink_node+0x300/0x300
<4>[  185.394258]  ? _kthread_create_on_node+0x30/0x30
<4>[  185.394263]  ret_from_fork+0x24/0x30
<4>[  185.394273] 
<4>[  185.394273] Showing all locks held in the system:
<4>[  185.394284] 2 locks held by khungtaskd/67:
<4>[  185.394289]  #0:  (rcu_read_lock){....}, at: [<00000000731ae36a>] watchdog+0x9b/0x5e0
<4>[  185.394300]  #1:  (tasklist_lock){.+.+}, at: [<00000000a8a05d2a>] debug_show_all_locks+0x37/0x190
<4>[  185.394312] 1 lock held by kswapd0/80:
<4>[  185.394314]  #0:  (shrinker_rwsem){++++}, at: [<00000000a5d771df>] shrink_slab.part.17+0x46/0x5d0
<4>[  185.394338] 3 locks held by kworker/3:2/223:
<4>[  185.394339]  #0:  ((wq_completion)"events"){+.+.}, at: [<000000007bac85a9>] process_one_work+0x191/0x640
<4>[  185.394349]  #1:  ((work_completion)(&i915->mm.free_work)){+.+.}, at: [<000000007bac85a9>] process_one_work+0x191/0x640
<4>[  185.394358]  #2:  (&dev->struct_mutex){+.+.}, at: [<00000000a6527aa3>] __i915_gem_free_objects+0x7c/0x540 [i915]
<4>[  185.394404] 1 lock held by in:imklog/572:
<4>[  185.394405]  #0:  (&f->f_pos_lock){+.+.}, at: [<00000000a64b4995>] __fdget_pos+0x3a/0x50
<4>[  185.394425] 1 lock held by dmesg/1146:
<4>[  185.394427]  #0:  (&user->lock){+.+.}, at: [<0000000098ba0f15>] devkmsg_read+0x35/0x2f0
<4>[  185.394438] 
<4>[  185.394440] =============================================
<4>[  185.394440] 
<4>[  185.394442] NMI backtrace for cpu 1
<4>[  185.394445] CPU: 1 PID: 67 Comm: khungtaskd Tainted: G     U           4.15.0-rc4-CI-CI_DRM_3552+ #1
<4>[  185.394447] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4>[  185.394448] Call Trace:
<4>[  185.394454]  dump_stack+0x5f/0x86
<4>[  185.394457]  nmi_cpu_backtrace+0xb4/0xc0
<4>[  185.394462]  ? lapic_can_unplug_cpu+0x90/0x90
<4>[  185.394465]  nmi_trigger_cpumask_backtrace+0xb8/0xf0
<4>[  185.394469]  watchdog+0x43e/0x5e0
<4>[  185.394474]  kthread+0xfb/0x130
<4>[  185.394477]  ? reset_hung_task_detector+0x10/0x10
<4>[  185.394479]  ? _kthread_create_on_node+0x30/0x30
<4>[  185.394483]  ret_from_fork+0x24/0x30
<6>[  185.394493] Sending NMI from CPU 1 to CPUs 0,2-7:
<4>[  185.394499] NMI backtrace for cpu 0
<4>[  185.394510] CPU: 0 PID: 983 Comm: java Tainted: G     U           4.15.0-rc4-CI-CI_DRM_3552+ #1
<4>[  185.394511] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4>[  185.394513] RIP: 0010:check_flags.part.25+0x8f/0x1b0
<4>[  185.394513] RSP: 0018:ffffc900007afe58 EFLAGS: 00000046
<4>[  185.394514] RAX: 0000000080000002 RBX: ffff88041fa1ab40 RCX: 0000000000000001
<4>[  185.394515] RDX: ffffffff818b64b3 RSI: 0000000000000001 RDI: 0000000000000086
<4>[  185.394515] RBP: ffff88041fa1ab58 R08: 0000000000000000 R09: 0000000000000001
<4>[  185.394516] R10: 0000000000000000 R11: ffffffff810b553f R12: ffffffff818b64b3
<4>[  185.394516] R13: 0000000000000086 R14: ffffc900007afed0 R15: ffff880403ada7c0
<4>[  185.394517] FS:  00007f3b11ce4700(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
<4>[  185.394518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  185.394518] CR2: 00007f4c6bee1000 CR3: 0000000403b16002 CR4: 00000000001606f0
<4>[  185.394519] Call Trace:
<4>[  185.394520]  lock_release+0x200/0x300
<4>[  185.394522]  _raw_spin_unlock_irq+0x17/0x50
<4>[  185.394524]  __schedule+0x8b3/0xaf0
<4>[  185.394526]  schedule+0x37/0x90
<4>[  185.394527]  sys_sched_yield+0x8c/0xa0
<4>[  185.394529]  entry_SYSCALL_64_fastpath+0x1c/0x89
<4>[  185.394529] RIP: 0033:0x7f3b383d50a7
<4>[  185.394530] RSP: 002b:00007f3b11ce3c78 EFLAGS: 00000297 ORIG_RAX: 0000000000000018
<4>[  185.394531] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3b383d50a7
<4>[  185.394531] RDX: 00007f3b37c56170 RSI: 00007f3b2162da80 RDI: 00000000066c25c8
<4>[  185.394532] RBP: 00007f3b11ce3d00 R08: 0000000000000000 R09: 00007f3b37c95800
<4>[  185.394532] R10: 0000000000000001 R11: 0000000000000297 R12: 00007f3b30007560
<4>[  185.394533] R13: 00000000066c1292 R14: 0000000000000001 R15: 00000000066c1292
<4>[  185.394534] Code: 85 db 75 ae e8 e3 3c 37 00 85 c0 74 10 44 8b 15 c0 73 1d 02 45 85 d2 0f 84 f7 00 00 00 48 c7 c7 70 6f c7 81 e8 b8 2f 01 00 eb 87 <8b> 35 83 72 17 02 85 f6 75 a0 65 48 8b 04 25 80 c5 00 00 8b 88 
<4>[  185.394549] NMI backtrace for cpu 4
<4>[  185.394550] CPU: 4 PID: 63 Comm: kworker/4:1 Tainted: G     U           4.15.0-rc4-CI-CI_DRM_3552+ #1
<4>[  185.394551] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4>[  185.394553] Workqueue: events delayed_fput
<4>[  185.394555] RIP: 0010:check_preemption_disabled+0x2/0xe0
<4>[  185.394556] RSP: 0018:ffffc9000025fb00 EFLAGS: 00000006
<4>[  185.394557] RAX: ffffffffffffffc0 RBX: ffffffffffffffff RCX: 0000000000000001
<4>[  185.394557] RDX: ffffffffffffffff RSI: ffffffff81d12ee1 RDI: ffffffff81cb850e
<4>[  185.394558] RBP: 000000000001f9bf R08: ffff88040c8d0940 R09: 0000000044e5fc34
<4>[  185.394558] R10: ffffc9000025fb30 R11: 0000000000000002 R12: 000000000000000e
<4>[  185.394559] R13: 000000000001f9b0 R14: ffffffffffffffff R15: ffffffff81ef5240
<4>[  185.394560] FS:  0000000000000000(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000
<4>[  185.394560] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  185.394561] CR2: 00007fdb0dd54390 CR3: 0000000003e0f005 CR4: 00000000001606e0
<4>[  185.394561] Call Trace:
<4>[  185.394564]  __mod_node_page_state+0x64/0xa0
<4>[  185.394566]  unaccount_page_cache_page+0xa4/0x260
<4>[  185.394567]  __delete_from_page_cache+0x3f/0x240
<4>[  185.394569]  delete_from_page_cache+0x40/0x70
<4>[  185.394570]  truncate_inode_page+0x1d/0x30
<4>[  185.394572]  shmem_undo_range+0x46d/0x990
<4>[  185.394575]  shmem_truncate_range+0x11/0x30
<4>[  185.394576]  shmem_evict_inode+0xb8/0x1a0
<4>[  185.394578]  evict+0xb7/0x1b0
<4>[  185.394580]  __dentry_kill+0xb2/0x170
<4>[  185.394581]  __fput+0x13b/0x1e0
<4>[  185.394583]  delayed_fput+0x17/0x30
<4>[  185.394585]  process_one_work+0x215/0x640
<4>[  185.394587]  worker_thread+0x48/0x3a0
<4>[  185.394588]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
<4>[  185.394589]  kthread+0xfb/0x130
<4>[  185.394591]  ? process_one_work+0x640/0x640
<4>[  185.394591]  ? _kthread_create_on_node+0x30/0x30
<4>[  185.394593]  ret_from_fork+0x24/0x30
<4>[  185.394594] Code: 00 00 00 41 c7 44 24 08 00 00 00 00 e9 19 ff ff ff 48 89 e8 4c 8b 34 24 48 83 c8 01 e9 09 ff ff ff 90 90 90 90 90 90 90 90 41 55 <41> 54 65 8b 05 25 06 bb 7e 55 53 65 8b 1d 3c e2 ba 7e a9 ff ff 
<4>[  185.394611] NMI backtrace for cpu 3
<4>[  185.394612] CPU: 3 PID: 3921 Comm: kms_flip Tainted: G     U           4.15.0-rc4-CI-CI_DRM_3552+ #1
<4>[  185.394612] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4>[  185.394614] RIP: 0010:lock_acquire+0x27/0x200
<4>[  185.394615] RSP: 0018:ffffc9000090f810 EFLAGS: 00000246
<4>[  185.394616] RAX: ffff8803fc8d4f40 RBX: 0000000000000001 RCX: 0000000000000002
<4>[  185.394616] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81e443c0
<4>[  185.394617] RBP: ffffc9000090f860 R08: 0000000000000000 R09: 0000000000000000
<4>[  185.394617] R10: 0000000000000000 R11: 0000000000000001 R12: ffffc9000090f928
<4>[  185.394618] R13: ffff88040a162740 R14: 0000000000000000 R15: 0000000000000000
<4>[  185.394618] FS:  00007fc1054b5a40(0000) GS:ffff88041fac0000(0000) knlGS:0000000000000000
<4>[  185.394619] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  185.394619] CR2: 00007f366c00a350 CR3: 000000040956a004 CR4: 00000000001606e0
<4>[  185.394620] Call Trace:
<4>[  185.394623]  get_mem_cgroup_from_mm+0x33/0x2e0
<4>[  185.394624]  ? mem_cgroup_from_task+0x90/0x90
<4>[  185.394625]  mem_cgroup_try_charge+0x76/0x490
<4>[  185.394628]  shmem_getpage_gfp.isra.9+0x1b7/0xc80
<4>[  185.394630]  shmem_read_mapping_page_gfp+0x2e/0x50
<4>[  185.394661]  i915_gem_object_get_pages_gtt+0x13b/0x660 [i915]
<4>[  185.394683]  ____i915_gem_object_get_pages+0x17/0x70 [i915]
<4>[  185.394703]  __i915_gem_object_get_pages+0x59/0x80 [i915]
<4>[  185.394724]  __i915_vma_do_pin+0x6ef/0x980 [i915]
<4>[  185.394743]  eb_lookup_vmas+0x87f/0xe80 [i915]
<4>[  185.394746]  ? __pm_runtime_resume+0x4f/0x80
<4>[  185.394764]  i915_gem_do_execbuffer+0x57c/0x1690 [i915]
<4>[  185.394781]  ? i915_gem_execbuffer2+0x90/0x3a0 [i915]
<4>[  185.394784]  ? lock_acquire+0xaf/0x200
<4>[  185.394786]  ? __might_fault+0x39/0x90
<4>[  185.394803]  ? i915_gem_execbuffer+0x2c0/0x2c0 [i915]
<4>[  185.394820]  i915_gem_execbuffer2+0x1da/0x3a0 [i915]
<4>[  185.394837]  ? i915_gem_execbuffer+0x2c0/0x2c0 [i915]
<4>[  185.394839]  drm_ioctl_kernel+0x60/0xa0
<4>[  185.394840]  drm_ioctl+0x290/0x330
<4>[  185.394856]  ? i915_gem_execbuffer+0x2c0/0x2c0 [i915]
<4>[  185.394858]  ? _raw_spin_unlock_irq+0x2f/0x50
<4>[  185.394859]  ? finish_task_switch+0xa5/0x210
<4>[  185.394859]  ? finish_task_switch+0x6a/0x210
<4>[  185.394861]  do_vfs_ioctl+0x8a/0x670
<4>[  185.394862]  ? entry_SYSCALL_64_fastpath+0x5/0x89
<4>[  185.394863]  ? trace_hardirqs_on_caller+0xde/0x1c0
<4>[  185.394865]  SyS_ioctl+0x36/0x70
<4>[  185.394866]  entry_SYSCALL_64_fastpath+0x1c/0x89
<4>[  185.394867] RIP: 0033:0x7fc1036b4587
<4>[  185.394867] RSP: 002b:00007ffdd38a4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4>[  185.394868] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fc1036b4587
<4>[  185.394869] RDX: 00007ffdd38a44d0 RSI: 0000000040406469 RDI: 0000000000000003
<4>[  185.394869] RBP: 00007ffdd38a4380 R08: 0000000000000000 R09: 0000000000000007
<4>[  185.394870] R10: 00007fc103977b58 R11: 0000000000000246 R12: 00000000c01864b0
<4>[  185.394870] R13: 0000000000000003 R14: 0000000000000001 R15: 00007ffdd38a4670
<4>[  185.394871] Code: 0f 1f 40 00 65 48 8b 04 25 80 c5 00 00 44 8b 90 ac 08 00 00 45 85 d2 0f 85 bd 00 00 00 41 57 41 56 4d 89 cf 41 55 41 54 45 89 c6 <55> 53 41 89 cd 41 89 d4 89 f5 48 83 ec 10 48 89 3c 24 9c 8f 44 
<4>[  185.394886] NMI backtrace for cpu 5
<4>[  185.394888] CPU: 5 PID: 149 Comm: kworker/u16:2 Tainted: G     U           4.15.0-rc4-CI-CI_DRM_3552+ #1
<4>[  185.394888] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4>[  185.394913] Workqueue: events_unbound intel_atomic_commit_work [i915]
<4>[  185.394914] RIP: 0010:mutex_spin_on_owner+0xa5/0x170
<4>[  185.394915] RSP: 0018:ffffc90000c33c90 EFLAGS: 00000246
<4>[  185.394916] RAX: ffff8803fc8d4f41 RBX: ffff8803f89a0070 RCX: 0000000000000001
<4>[  185.394916] RDX: ffff88040abe4f40 RSI: 0000000000000001 RDI: ffffffff81cb84fd
<4>[  185.394917] RBP: ffffc90000c33cb0 R08: ffff88040abe5868 R09: 00000000c2ec3375
<4>[  185.394917] R10: ffffc90000c33c20 R11: ffffffff810d1510 R12: 0000000000000000
<4>[  185.394918] R13: 0000000000000000 R14: ffff8803fc8d4f40 R15: 0000000000000001
<4>[  185.394918] FS:  0000000000000000(0000) GS:ffff88041fb40000(0000) knlGS:0000000000000000
<4>[  185.394919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  185.394919] CR2: 00007fa1ad770e20 CR3: 0000000003e0f003 CR4: 00000000001606e0
<4>[  185.394920] Call Trace:
<4>[  185.394922]  __mutex_lock+0x691/0x9b0
<4>[  185.394945]  ? intel_cleanup_plane_fb+0x2a/0x50 [i915]
<4>[  185.394946]  ? complete_all+0x13/0x40
<4>[  185.394948]  ? mark_held_locks+0x64/0x90
<4>[  185.394948]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
<4>[  185.394949]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
<4>[  185.394970]  ? intel_cleanup_plane_fb+0x2a/0x50 [i915]
<4>[  185.394989]  intel_cleanup_plane_fb+0x2a/0x50 [i915]
<4>[  185.394992]  drm_atomic_helper_cleanup_planes+0x4a/0x60
<4>[  185.395012]  intel_atomic_commit_tail+0x739/0xce0 [i915]
<4>[  185.395014]  process_one_work+0x215/0x640
<4>[  185.395016]  worker_thread+0x48/0x3a0
<4>[  185.395017]  kthread+0xfb/0x130
<4>[  185.395018]  ? process_one_work+0x640/0x640
<4>[  185.395019]  ? _kthread_create_on_node+0x30/0x30
<4>[  185.395020]  ret_from_fork+0x24/0x30
<4>[  185.395021] Code: 42 4d 85 e4 74 20 41 8b 44 24 10 85 c0 74 0c 48 8b 83 80 00 00 00 48 85 c0 75 28 4d 85 ed 74 63 4c 3b 6b 48 75 1d f3 90 48 8b 03 <48> 83 e0 f8 49 39 c6 75 58 41 8b 4e 60 85 c9 74 07 48 8b 02 a8 
<4>[  185.395037] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x6f/0x120
<4>[  185.395040] NMI backtrace for cpu 2 skipped: idling at intel_idle+0x6f/0x120
<4>[  185.395042] NMI backtrace for cpu 6 skipped: idling at intel_idle+0x6f/0x120
<0>[  185.395501] Kernel panic - not syncing: hung_task: blocked tasks
<4>[  185.395506] CPU: 1 PID: 67 Comm: khungtaskd Tainted: G     U           4.15.0-rc4-CI-CI_DRM_3552+ #1
<4>[  185.395508] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4>[  185.395510] Call Trace:
<4>[  185.395516]  dump_stack+0x5f/0x86
<4>[  185.395520]  panic+0xcf/0x20d
<4>[  185.395529]  watchdog+0x44a/0x5e0
<4>[  185.395535]  kthread+0xfb/0x130
<4>[  185.395538]  ? reset_hung_task_detector+0x10/0x10
<4>[  185.395541]  ? _kthread_create_on_node+0x30/0x30
<4>[  185.395546]  ret_from_fork+0x24/0x30
<0>[  185.395882] Dumping ftrace buffer:
<0>[  185.395932]    (ftrace buffer empty)
<0>[  185.395934] Kernel Offset: disabled
Comment 20 Marta Löfstedt 2018-01-02 08:23:09 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3557/shard-hsw2/igt@kms_frontbuffer_tracking@fbcpsr-modesetfrombusy.html

<14>[  123.880728] [IGT] kms_frontbuffer_tracking: starting subtest fbcpsr-modesetfrombusy
<14>[  123.880948] [IGT] kms_frontbuffer_tracking: exiting, ret=77
<3>[  123.922867] INFO: task kswapd0:80 blocked for more than 60 seconds.
<3>[  123.922902]       Tainted: G     U           4.15.0-rc4-CI-CI_DRM_3557+ #1
<3>[  123.922912] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[  123.922924] kswapd0         D    0    80      2 0x80000000
<4>[  123.922927] Call Trace:
<4>[  123.922933]  ? __schedule+0x3c3/0xaf0
<4>[  123.922937]  schedule+0x37/0x90
<4>[  123.922939]  io_schedule+0xd/0x30
<4>[  123.922942]  __lock_page+0x107/0x130
<4>[  123.922945]  ? add_to_page_cache_lru+0xc0/0xc0
<4>[  123.922948]  deferred_split_scan+0x25a/0x2b0
<4>[  123.922953]  shrink_slab.part.17+0x201/0x5d0
<4>[  123.922959]  shrink_node+0x2fd/0x310
<4>[  123.922963]  kswapd+0x31c/0x910
<4>[  123.922970]  kthread+0xfb/0x130
<4>[  123.922971]  ? mem_cgroup_shrink_node+0x300/0x300
<4>[  123.922973]  ? _kthread_create_on_node+0x30/0x30
<4>[  123.922975]  ret_from_fork+0x24/0x30
<4>[  123.922980] 
<4>[  123.922980] Showing all locks held in the system:
<4>[  123.922987] 2 locks held by khungtaskd/67:
<4>[  123.922990]  #0:  (rcu_read_lock){....}, at: [<00000000062ec395>] watchdog+0x9b/0x5e0
<4>[  123.922996]  #1:  (tasklist_lock){.+.+}, at: [<000000003320a277>] debug_show_all_locks+0x37/0x190
<4>[  123.923004] 1 lock held by kswapd0/80:
<4>[  123.923005]  #0:  (shrinker_rwsem){++++}, at: [<00000000f7094996>] shrink_slab.part.17+0x46/0x5d0
<4>[  123.923021] 1 lock held by in:imklog/645:
<4>[  123.923022]  #0:  (&f->f_pos_lock){+.+.}, at: [<000000005f7af38c>] __fdget_pos+0x3a/0x50
<4>[  123.923044] 1 lock held by dmesg/1081:
<4>[  123.923044]  #0:  (&user->lock){+.+.}, at: [<000000009066b1f6>] devkmsg_read+0x35/0x2f0
<4>[  123.923051] 
<4>[  123.923052] =============================================
<4>[  123.923052] 
<4>[  123.923053] NMI backtrace for cpu 0
<4>[  123.923055] CPU: 0 PID: 67 Comm: khungtaskd Tainted: G     U           4.15.0-rc4-CI-CI_DRM_3557+ #1
<4>[  123.923056] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4>[  123.923057] Call Trace:
<4>[  123.923059]  dump_stack+0x5f/0x86
<4>[  123.923061]  nmi_cpu_backtrace+0xb4/0xc0
<4>[  123.923065]  ? lapic_can_unplug_cpu+0x90/0x90
<4>[  123.923066]  nmi_trigger_cpumask_backtrace+0xb8/0xf0
<4>[  123.923069]  watchdog+0x43e/0x5e0
<4>[  123.923072]  kthread+0xfb/0x130
<4>[  123.923073]  ? reset_hung_task_detector+0x10/0x10
<4>[  123.923074]  ? _kthread_create_on_node+0x30/0x30
<4>[  123.923077]  ret_from_fork+0x24/0x30
<6>[  123.923082] Sending NMI from CPU 0 to CPUs 1-7:
<4>[  123.923087] NMI backtrace for cpu 6
<4>[  123.923098] CPU: 6 PID: 960 Comm: java Tainted: G     U           4.15.0-rc4-CI-CI_DRM_3557+ #1
<4>[  123.923098] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4>[  123.923100] RIP: 0010:lock_acquire+0xc4/0x200
<4>[  123.923101] RSP: 0018:ffffc90000a17d40 EFLAGS: 00000086
<4>[  123.923102] RAX: ffff8804016e4f40 RBX: 0000000000000086 RCX: 99d2db9e00000000
<4>[  123.923102] RDX: 0000000000000001 RSI: 00000000ef4af4e7 RDI: 0000000000000000
<4>[  123.923103] RBP: 0000000000000000 R08: ffff8804016e5818 R09: 00000000e0c4a53a
<4>[  123.923103] R10: ffffc90000a17d40 R11: ffffffff810d0e42 R12: 0000000000000000
<4>[  123.923104] R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
<4>[  123.923104] FS:  00007f338b5f5700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
<4>[  123.923105] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  123.923105] CR2: 00007f33a95dc000 CR3: 00000003fccfb006 CR4: 00000000001606e0
<4>[  123.923106] Call Trace:
<4>[  123.923108]  ? cpuacct_charge+0x32/0x1f0
<4>[  123.923109]  cpuacct_charge+0x5b/0x1f0
<4>[  123.923110]  ? cpuacct_charge+0x32/0x1f0
<4>[  123.923112]  update_curr+0x17b/0x460
<4>[  123.923113]  pick_next_task_fair+0xa9/0x8a0
<4>[  123.923116]  __schedule+0x123/0xaf0
<4>[  123.923117]  schedule+0x37/0x90
<4>[  123.923118]  sys_sched_yield+0x8c/0xa0
<4>[  123.923120]  entry_SYSCALL_64_fastpath+0x1c/0x89
<4>[  123.923121] RIP: 0033:0x7f33a9d1f0a7
<4>[  123.923121] RSP: 002b:00007f338b5f4c78 EFLAGS: 00000293 ORIG_RAX: 0000000000000018
<4>[  123.923122] RAX: ffffffffffffffda RBX: 00007f3320014800 RCX: 00007f33a9d1f0a7
<4>[  123.923122] RDX: 00007f33a95a0170 RSI: 00007f3391005300 RDI: 0000000003885b5a
<4>[  123.923123] RBP: 00007f338b5f4d00 R08: 0000000000000000 R09: 00007f33a95df800
<4>[  123.923123] R10: 0000000000000001 R11: 0000000000000293 R12: 00007f332000f2d0
<4>[  123.923124] R13: 0000000003883ac3 R14: 0000000000000002 R15: 0000000003883ac2
<4>[  123.923125] Code: 49 89 d9 89 ee 49 c1 e9 09 49 83 f1 01 41 83 e1 01 e8 41 dd ff ff 65 48 8b 04 25 80 c5 00 00 c7 80 ac 08 00 00 00 00 00 00 53 9d <48> 83 c4 20 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f f3 c3 65 
<4>[  123.923141] NMI backtrace for cpu 2 skipped: idling at intel_idle+0x6f/0x120
<4>[  123.923143] NMI backtrace for cpu 5
<4>[  123.923144] CPU: 5 PID: 200 Comm: jbd2/sda2-8 Tainted: G     U           4.15.0-rc4-CI-CI_DRM_3557+ #1
<4>[  123.923144] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4>[  123.923147] RIP: 0010:jbd2_journal_commit_transaction+0x12e6/0x1f40
<4>[  123.923148] RSP: 0018:ffffc90000333d00 EFLAGS: 00000282
<4>[  123.923149] RAX: 0000000000000000 RBX: ffff8803e5bd9bd0 RCX: 0000000000000002
<4>[  123.923149] RDX: ffff8803e5bd9b88 RSI: ffffc90000333cd0 RDI: ffffffff81e07c38
<4>[  123.923150] RBP: ffffc90000333e90 R08: 0000000000000001 R09: 0000000000000001
<4>[  123.923150] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>[  123.923150] R13: ffff8803e5bd9b88 R14: ffff88040a980008 R15: ffff880400400b40
<4>[  123.923151] FS:  0000000000000000(0000) GS:ffff88041fb40000(0000) knlGS:0000000000000000
<4>[  123.923152] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  123.923152] CR2: 00007f4da7afe080 CR3: 0000000003e0f004 CR4: 00000000001606e0
<4>[  123.923153] Call Trace:
<4>[  123.923157]  ? kjournald2+0xb4/0x270
<4>[  123.923158]  kjournald2+0xb4/0x270
<4>[  123.923160]  ? wait_woken+0x90/0x90
<4>[  123.923161]  kthread+0xfb/0x130
<4>[  123.923163]  ? commit_timeout+0x10/0x10
<4>[  123.923163]  ? _kthread_create_on_node+0x30/0x30
<4>[  123.923165]  ret_from_fork+0x24/0x30
<4>[  123.923166] Code: 81 e8 4f 5b dc ff 48 8b 43 b8 4c 8d 6b b8 a8 04 0f 84 6f ff ff ff 4c 89 ef e8 27 98 f5 ff e9 62 ff ff ff 4c 89 ef e8 1a 98 f5 ff <e9> 93 fe ff ff 0f 0b 0f 0b 89 c6 4c 89 f7 e8 e7 66 00 00 e9 20 
<4>[  123.923182] NMI backtrace for cpu 1 skipped: idling at intel_idle+0x6f/0x120
<4>[  123.923184] NMI backtrace for cpu 4 skipped: idling at intel_idle+0x6f/0x120
<4>[  123.923187] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x6f/0x120
<4>[  123.923188] NMI backtrace for cpu 3 skipped: idling at intel_idle+0x6f/0x120
<0>[  123.924088] Kernel panic - not syncing: hung_task: blocked tasks
<4>[  123.924099] CPU: 0 PID: 67 Comm: khungtaskd Tainted: G     U           4.15.0-rc4-CI-CI_DRM_3557+ #1
<4>[  123.924112] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4>[  123.924123] Call Trace:
<4>[  123.924129]  dump_stack+0x5f/0x86
<4>[  123.924136]  panic+0xcf/0x20d
<4>[  123.924145]  watchdog+0x44a/0x5e0
<4>[  123.924153]  kthread+0xfb/0x130
<4>[  123.924159]  ? reset_hung_task_detector+0x10/0x10
<4>[  123.924166]  ? _kthread_create_on_node+0x30/0x30
<4>[  123.924175]  ret_from_fork+0x24/0x30
<0>[  123.924426] Dumping ftrace buffer:
<0>[  123.924485]    (ftrace buffer empty)
<0>[  123.924492] Kernel Offset: disabled
Comment 21 Marta Löfstedt 2018-01-17 13:48:22 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3640/shard-hsw6/igt@gem_fence_thrash@bo-write-verify-x.html

last test in run.log doesn't match results:
running: igt/kms_flip/vblank-vs-suspend

dmesg:
<7>[  282.749388] [IGT] kms_flip: starting subtest vblank-vs-suspend
...
<7>[  304.420422] [IGT] gem_fence_thrash: starting subtest bo-write-verify-x

So, it looks like Jenkins did it.
Comment 22 Marta Löfstedt 2018-01-22 08:50:41 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3664/shard-hsw5/igt@kms_flip@2x-vblank-vs-dpms-suspend-interruptible.html

last test in run.log doesn't match results:
running: igt/kms_cursor_crc/cursor-128x128-suspend

[52/78] skip: 18, pass: 34 |                      
FATAL: command execution failed

from dmesg:
<7>[  289.559132] [IGT] kms_cursor_crc: starting subtest cursor-128x128-suspend
...
<6>[  290.429087] r8169 0000:02:00.0 enp2s0: link down
...
<7>[  306.022276] [drm:intel_atomic_commit_tail [i915]] [CRTC:57:pipe C]
<7>[  306.022295] [drm:verify_single_dpll_state.isra.76 [i915]] WRPLL 1
Comment 23 Marta Löfstedt 2018-01-22 08:51:51 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4159/shard-hsw6/igt@gem_pwrite_pread@snooped-pwrite-blt-cpu_mmap-performance.html

last dmesg:
<7>[  291.743959] [IGT] gem_pwrite_pread: executing
<7>[  291.769455] [IGT] gem_pwrite_pread: starting subtest snooped-pwrite-blt-cpu_mmap-performance
<6>[  294.456840] r8169 0000:02:00.0 enp2s0: link up
Comment 24 Marta Löfstedt 2018-01-23 13:35:25 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4169/shard-hsw7/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes.html

run.log:
[04/77] skip: 2, pass: 2 \
running: igt/kms_plane/plane-panning-bottom-right-suspend-pipe-b-planes
...
Completed CI_IGT_test CI_DRM_3671/shard-hsw7/26 : FAILURE
CI_IGT_test runtime 544 seconds
Rebooting shard-hsw7

from dmesg:
<7>[   11.762538] [IGT] kms_plane: starting subtest plane-panning-bottom-right-suspend-pipe-B-planes
...
<6>[   14.606643] PM: suspend exit
<7>[   14.630453] [drm:drm_mode_addfb2] [FB:114]
<7>[   14.647374] [drm:drm_mode_setcrtc] [CRTC:47:pipe B]
<7>[   14.647397] [drm:drm_mode_setcrtc] [CONNECTOR:61:HDMI-A-1]
<6>[   14.741652] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
<6>[   14.759500] ata3.00: configured for UDMA/133

Note last dmesg at 14 seconds, however run.log is at 544 seconds. My guess is that the machine never came up after suspend.
Comment 25 Marta Löfstedt 2018-01-25 12:49:00 UTC
This is a Meta bug to capture all unexplained incompletes on HSW-shards
Comment 26 Marta Löfstedt 2018-02-06 12:48:08 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4224/shard-hsw6/igt@prime_busy@wait-hang-vebox.html

run.log doesn't match results.

last dmesg
<4>[  864.261735] Setting dangerous option reset - tainting kernel
<4>[  864.263116] Setting dangerous option reset - tainting kernel
<7>[  864.263153] [IGT] prime_busy: starting subtest wait-hang-vebox
Comment 28 Marta Löfstedt 2018-02-20 08:42:52 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4268/shard-hsw7/igt@kms_vblank@pipe-a-ts-continuation-dpms-suspend.html

run.log:
running: igt/kms_vblank/pipe-a-ts-continuation-dpms-suspend

[29/97] skip: 11, pass: 18 /                               
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3800/shard-hsw7/8 : FAILURE
CI_IGT_test runtime 316 seconds
Rebooting shard-hsw7

from dmesg:
<7>[ 4400.744449] [IGT] kms_vblank: starting subtest pipe-A-ts-continuation-dpms-suspend
...
Last dmesg:
<7>[ 4400.845869] [drm:sandybridge_pcode_write_timeout [i915]] warning: pcode (write of 0x00000011 to mbox 11) mailbox access failed for hsw_write_dcomp [i915]: -6
<7>[ 4400.845922] [drm:hsw_write_dcomp [i915]] Failed to write to D_COMP
Comment 29 Marta Löfstedt 2018-03-01 07:10:12 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3853/shard-hsw8/igt@pm_rpm@system-suspend-modeset.html

run.log:
pass: igt/pm_rpm/system-suspend-modeset

[42/97] skip: 24, pass: 18 /
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3853/shard-hsw8/34 : FAILURE
CI_IGT_test runtime 367 seconds
Rebooting shard-hsw8

last dmesg.
<7>[ 1171.242000] [drm:sandybridge_pcode_write_timeout [i915]] warning: pcode (write of 0x00000011 to mbox 11) mailbox access failed for hsw_write_dcomp [i915]: -6
<7>[ 1171.242038] [drm:hsw_write_dcomp [i915]] Failed to write to D_COMP
<7>[ 1171.242158] [drm:drm_mode_setcrtc] [CRTC:57:pipe C]
<7>[ 1171.244745] [drm:intel_runtime_suspend [i915]] Device suspended
Comment 30 Marta Löfstedt 2018-03-01 09:46:28 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3854/shard-hsw6/igt@gem_partial_pwrite_pread@write-snoop.html

from dmesg:
<7>[  597.155364] [IGT] gem_exec_capture: starting subtest capture-bsd
<6>[  597.161484] [drm] GPU HANG: ecode 7:0:0x87fbfffe, reason: Manually setting wedged to 18446744073709551615, action: reset
<7>[  597.161631] [drm:i915_reset_device [i915]] resetting chip
<5>[  597.161726] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[  597.162242] [drm:i915_gem_reset_engine [i915]] resetting vcs0 to restart from tail of request 0x1
<7>[  597.162488] [drm:sandybridge_pcode_read [i915]] warning: pcode (read from mbox 5) mailbox access failed for intel_enable_gt_powersave [i915]: -6
<7>[  597.162642] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 0
<7>[  597.165299] [drm:error_state_write [i915]] Resetting error state
...
<7>[  612.449304] [IGT] gem_partial_pwrite_pread: executing
<7>[  612.477141] [IGT] gem_partial_pwrite_pread: starting subtest write-snoop
Comment 31 Marta Löfstedt 2018-03-05 06:45:18 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4321/shard-hsw2/igt@kms_cursor_crc@cursor-64x21-sliding.html

run.log:
running: igt/kms_cursor_crc/cursor-64x21-sliding

[87/98] skip: 48, pass: 39 \                    
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3867/shard-hsw2/15 : FAILURE
CI_IGT_test runtime 589 seconds
Rebooting shard-hsw2

Last dmesg:
<7>[  739.939280] [IGT] kms_cursor_crc: starting subtest cursor-64x21-sliding
...
<7>[  749.084830] [drm:drm_dp_dual_mode_detect] DP dual mode HDMI ID:  (err -6)
<7>[  749.084835] [drm:drm_helper_hpd_irq_event] [CONNECTOR:81:HDMI-A-3] status updated from disconnected to disconnected
Comment 32 Marta Löfstedt 2018-03-13 07:19:44 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4349/shard-hsw8/igt@kms_cursor_crc@cursor-128x128-random.html

run.log:
running: igt/kms_cursor_crc/cursor-128x128-random

[10/98] skip: 3, pass: 7 -                       
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3911/shard-hsw8/27 : FAILURE
CI_IGT_test runtime 434 seconds
Rebooting shard-hsw8

Last dmesg:
<7>[ 7329.602401] [IGT] kms_cursor_crc: starting subtest cursor-128x128-random
...
<7>[ 7339.007192] [drm:verify_connector_state.isra.78 [i915]] [CONNECTOR:67:HDMI-A-1]
<7>[ 7339.007254] [drm:intel_atomic_commit_tail [i915]] [CRTC:39:pipe A]
<7>[ 7339.007303] [drm:verify_single_dpll_state.isra.79 [i915]] WRPLL 1
Comment 33 Marta Löfstedt 2018-03-13 07:22:50 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4347/shard-hsw7/igt@kms_flip@2x-flip-vs-modeset-vs-hang.html

run.log:
running: igt/kms_flip/2x-flip-vs-modeset-vs-hang

[77/98] skip: 36, pass: 41 /                    
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3911/shard-hsw7/7 : FAILURE
CI_IGT_test runtime 492 seconds
Rebooting shard-hsw7

From dmesg:
<7>[  307.586378] [IGT] kms_flip: starting subtest 2x-flip-vs-modeset-vs-hang
...
<7>[  311.601708] missed_breadcrumb rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x5a/0x80 [i915]
...
<5>[  315.632005] i915 0000:00:02.0: Resetting chip after gpu hang
...
<7>[  319.597757] missed_breadcrumb rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x5a/0x80 [i915]
...
<5>[  323.628692] i915 0000:00:02.0: Resetting chip after gpu hang
...
<7>[  327.593776] missed_breadcrumb rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x5a/0x80 [i915]
...
<5>[  331.623944] i915 0000:00:02.0: Resetting chip after gpu hang
...
<7>[  331.815963] [drm:verify_single_dpll_state.isra.79 [i915]] LCPLL 2700
<7>[  331.816029] [drm:intel_atomic_commit_tail [i915]] [CRTC:39:pipe A]
Comment 34 Marta Löfstedt 2018-03-21 09:55:10 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_5/fi-hsw-4770/igt@gem_softpin@noreloc-s3.html

run.log:
running: igt/gem_softpin/noreloc-s3

[25/98] skip: 11, pass: 14 /       
FATAL: command execution failed
...
Completed CI_IGT_test drmtip_5/fi-hsw-4770/1 : FAILURE
CI_IGT_test runtime 344 seconds
Rebooting fi-hsw-4770

dmesg:
<7>[   78.247762] [IGT] gem_softpin: starting subtest noreloc-S3
...
<6>[   83.642390] ata4: SATA link down (SStatus 0 SControl 300)
<6>[   83.642579] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
<6>[   83.664477] ata2.00: configured for UDMA/133
Followed by "stray"
Comment 35 Marta Löfstedt 2018-03-26 06:54:45 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3972/shard-hsw7/igt@gem_ctx_isolation@rcs0-s3.html

run.log:
pass: igt/gem_ctx_isolation/rcs0-s3

[43/99] skip: 19, pass: 24 -
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3972/shard-hsw7/1 : FAILURE
CI_IGT_test runtime 227 seconds
Rebooting shard-hsw7

Last demsg:
<7>[   90.034857] [IGT] gem_ctx_isolation: executing
<4>[   90.051403] Setting dangerous option reset - tainting kernel
<7>[   90.054028] [IGT] gem_ctx_isolation: starting subtest rcs0-S3
<6>[   90.114976] PM: suspend entry (deep)
Comment 38 Marta Löfstedt 2018-04-09 07:38:23 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4414/shard-hsw7/igt@kms_cursor_crc@cursor-128x42-offscreen.html

run.log:
running: igt/kms_cursor_crc/cursor-128x42-offscreen

[23/75] skip: 5, pass: 18 \                        
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_4031/shard-hsw7/19 : FAILURE
CI_IGT_test runtime 281 seconds
Rebooting shard-hsw7

from dmesg:
<7>[ 2223.692476] [IGT] kms_cursor_crc: starting subtest cursor-128x42-offscreen
...
<7>[ 2234.204120] [drm:verify_connector_state.isra.77 [i915]] [CONNECTOR:67:HDMI-A-1]
<7>[ 2234.204148] [drm:intel_atomic_commit_tail [i915]] [CRTC:39:pipe A]
<7>[ 2234.204183] [drm:verify_single_dpll_state.isra.78 [i915]] WRPLL 1


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.