Bug 103665 - [CI][KBL only] igt@* - incomplete - timeout/system hang?
Summary: [CI][KBL only] igt@* - incomplete - timeout/system hang?
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Francesco Balestrieri
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks: 105984
Reported: 2017-11-10 09:15 UTC by Marta Löfstedt
Modified: 2018-12-05 09:15 UTC

See Also:
i915 platform: KBL
i915 features: display/Other, GPU hang


Description Marta Löfstedt 2017-11-10 09:15:40 UTC
These are on two different KBL machines; since they happened consecutively, I am filing a separate bug.

There is no obvious *ERROR* in dmesg.

last dmesgs are:
<7>[  271.651321] [IGT] gem_workarounds: starting subtest suspend-resume

<7>[  269.786046] [IGT] gem_workarounds: starting subtest suspend-resume

run.logs:
Completed CI_IGT_test CI_DRM_3326@shard-kbl4 : FAILURE
CI_IGT_test runtime 310 seconds

Completed CI_IGT_test CI_DRM_3327@shard-kbl3 : FAILURE
CI_IGT_test runtime 308 seconds

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3326/shard-kbl4/igt@gem_workarounds@suspend-resume.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3327/shard-kbl3/igt@gem_workarounds@suspend-resume.html
Comment 1 Marta Löfstedt 2017-11-10 09:39:46 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3328/shard-kbl3/igt@pm_rpm@system-suspend-execbuf.html

No obvious *ERROR* in dmesg.

last dmesg:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3328/shard-kbl3/dmesg20.log

run.log:
Completed CI_IGT_test CI_DRM_3328@shard-kbl3 : FAILURE
CI_IGT_test runtime 147 seconds
Comment 2 Marta Löfstedt 2017-11-22 07:24:09 UTC
Since we expect bug 103163 and bug 103170 to be closed once we get the DMC FW update, I am using this bug for KBL incompletes that look like system hangs and don't match the pattern in the above bugs.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3370/shard-kbl7/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes.html

dmesg:
<5>[   19.696372] owatch: Using watchdog device /dev/watchdog0
<5>[   19.696509] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   19.697337] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  209.002607] [drm:verify_connector_state.isra.75 [i915]] [CONNECTOR:58:DP-1]
<7>[  209.002634] [drm:intel_atomic_commit_tail [i915]] [CRTC:36:pipe A]
<7>[  209.002677] [drm:verify_single_dpll_state.isra.76 [i915]] DPLL 1
Followed by "stray"

run.log:
running: igt/kms_plane/plane-panning-bottom-right-suspend-pipe-a-planes

[41/72] skip: 11, pass: 29, fail: 1 /                                  
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3370/shard-kbl7/27 : FAILURE
CI_IGT_test runtime 248 seconds
Rebooting shard-kbl7
Comment 3 Marta Löfstedt 2017-11-24 07:28:34 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4010/shard-kbl5/igt@gem_userptr_blits@stress-mm-invalidate-close-overlap.html

This looks odd, very short dmesg:

<5>[   24.937078] owatch: Using watchdog device /dev/watchdog0
<5>[   24.937178] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   24.938027] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[   30.460793] [IGT] gem_pread: exiting, ret=0
<7>[   30.528405] [IGT] gem_userptr_blits: executing
<7>[   30.551011] [IGT] gem_userptr_blits: starting subtest stress-mm-invalidate-close-overlap

run.log gives no indication that any test was run:
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3378/shard-kbl5/15 : FAILURE
CI_IGT_test runtime 17 seconds
Rebooting shard-kbl5
Comment 4 Marta Löfstedt 2017-12-04 08:21:14 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3441/shard-kbl2/igt@gem_eio@in-flight.html

dmesg:
<5>[   12.528976] owatch: Using watchdog device /dev/watchdog0
<5>[   12.529228] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   12.529835] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  271.331786] [IGT] gem_eio: starting subtest in-flight
<4>[  271.331898] Setting dangerous option reset - tainting kernel
<7>[  285.883209] [drm:i915_reset_device [i915]] resetting chip
<5>[  285.883610] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[  285.886116] [drm:i915_reset [i915]] GPU reset disabled
<4>[  285.906400] Setting dangerous option reset - tainting kernel
<7>[  285.907035] [drm:i915_reset_device [i915]] resetting chip
<5>[  285.907260] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[  285.908730] [drm:gen8_init_common_ring [i915]] Execlists enabled for rcs0
<7>[  285.908940] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 15
<7>[  285.909189] [drm:gen8_init_common_ring [i915]] Execlists enabled for bcs0
<7>[  285.909436] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs0
<7>[  285.909676] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs1
<7>[  285.909989] [drm:gen8_init_common_ring [i915]] Execlists enabled for vecs0
<4>[  285.938110] Setting dangerous option reset - tainting kernel
<7>[  289.851921] [drm:missed_breadcrumb [i915]] rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x5a/0x80 [i915], irq posted? yes, current seqno=46fe3, last=47024
<7>[  303.875949] [drm:i915_reset_device [i915]] resetting chip
<5>[  303.876091] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[  303.876391] [drm:i915_reset [i915]] GPU reset disabled
<4>[  303.889041] Setting dangerous option reset - tainting kernel
<7>[  303.889222] [drm:i915_reset_device [i915]] resetting chip
<5>[  303.889289] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[  303.889604] [drm:gen8_init_common_ring [i915]] Execlists enabled for rcs0
<7>[  303.889637] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 15
<7>[  303.889686] [drm:gen8_init_common_ring [i915]] Execlists enabled for bcs0
<7>[  303.890526] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs0
<7>[  303.890579] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs1
<7>[  303.890629] [drm:gen8_init_common_ring [i915]] Execlists enabled for vecs0
<4>[  303.910846] Setting dangerous option reset - tainting kernel

run.log:
running: igt/gem_tiled_swapping/non-threaded

[67/74] skip: 20, pass: 46, fail: 1 \       
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3441/shard-kbl2/33 : FAILURE
CI_IGT_test runtime 467 seconds
Rebooting shard-kbl2

Something weird is going on, since the reported test doesn't match run.log.
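The mismatch check described above can be sketched roughly as follows. This is only an illustration, not part of the CI tooling: it assumes the results URL ends in `igt@<binary>@<subtest>.html` and that run.log names each test on a `running: igt/<binary>/<subtest>` line, as in the excerpts in this bug. The function names are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def name_from_results_url(url: str) -> str:
    """Turn '.../igt@gem_eio@in-flight.html' into 'igt/gem_eio/in-flight'."""
    leaf = urlparse(url).path.rsplit("/", 1)[-1]
    if leaf.endswith(".html"):
        leaf = leaf[: -len(".html")]
    return leaf.replace("@", "/")


def last_running_test(run_log: str) -> Optional[str]:
    """Return the test named on the last 'running: ...' line, or None."""
    matches = re.findall(r"^running:\s*(\S+)", run_log, flags=re.MULTILINE)
    return matches[-1] if matches else None


def reported_matches_run_log(url: str, run_log: str) -> bool:
    """True if the reported test is the last one run.log saw starting."""
    return name_from_results_url(url) == last_running_test(run_log)
```

For this comment, the URL gives `igt/gem_eio/in-flight` while run.log ends on `igt/gem_tiled_swapping/non-threaded`, so the check would report a mismatch.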
Comment 5 Marta Löfstedt 2017-12-05 13:57:32 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3457/shard-kbl6/igt@kms_flip@dpms-off-confusion.html

last dmesg:
<7>[   27.876388] [drm:verify_connector_state.isra.77 [i915]] [CONNECTOR:59:DP-1]
<7>[   27.876415] [drm:intel_atomic_commit_tail [i915]] [CRTC:37:pipe A]
<7>[   27.876456] [drm:verify_single_dpll_state.isra.78 [i915]] DPLL 1

didn't even finish one test!

run.log:
Completed CI_IGT_test CI_DRM_3457/shard-kbl6/25 : FAILURE
CI_IGT_test runtime 16 seconds
Rebooting shard-kbl6
Comment 6 Marta Löfstedt 2017-12-07 09:21:14 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3459/shard-kbl1/igt@kms_cursor_crc@cursor-512x512-suspend.html

this is all in dmesg:
<5>[   23.516229] owatch: Using watchdog device /dev/watchdog0
<5>[   23.516426] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   23.517403] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
<6>[   26.995972] Console: switching to colour dummy device 80x25
<7>[   26.996171] [IGT] gem_exec_parallel: executing
<4>[   27.016640] Setting dangerous option reset - tainting kernel
<7>[   27.017642] [IGT] gem_exec_parallel: starting subtest bsd1-contexts
<7>[   28.598401] [IGT] gem_exec_parallel: exiting, ret=0
<7>[   28.667035] [IGT] gem_reg_read: executing
<7>[   28.684346] [IGT] gem_reg_read: starting subtest timestamp-moving
<7>[   29.684757] [IGT] gem_reg_read: exiting, ret=0

run.log has no results from any tests; this could be network issues or a Jenkins reboot:

Completed CI_IGT_test CI_DRM_3459/shard-kbl1/26 : FAILURE
CI_IGT_test runtime 16 seconds
Rebooting shard-kbl1
Comment 7 Marta Löfstedt 2017-12-11 08:24:53 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3486/shard-kbl1/igt@kms_flip@vblank-vs-modeset-suspend.html

last dmesg:
<7>[  140.048434] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 02
<7>[  140.049052] [drm:intel_power_well_disable [i915]] disabling always-on
<7>[  140.049233] [drm:intel_runtime_suspend [i915]] Suspending device
<7>[  140.069930] [drm:intel_runtime_suspend [i915]] Device suspended

run.log:
running: igt/kms_flip/vblank-vs-modeset-suspend

[26/76] skip: 12, pass: 13, fail: 1 -          
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3486/shard-kbl1/0 : FAILURE
CI_IGT_test runtime 180 seconds
Rebooting shard-kbl1
Comment 8 Marta Löfstedt 2017-12-11 08:25:44 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3480/shard-kbl2/igt@gem_exec_schedule@reorder-wide-bsd2.html

last dmesg:
<7>[  115.412973] [IGT] gem_exec_schedule: starting subtest reorder-wide-bsd2
<7>[  115.413338] [drm:vgem_gem_dumb_create [vgem]] Created object of size 1
<7>[  115.418134] [drm:vgem_gem_dumb_create [vgem]] Created object of size 1

run.log:
running: igt/gem_exec_schedule/reorder-wide-bsd2

[23/75] skip: 8, pass: 15 \                     
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3480/shard-kbl2/29 : FAILURE
CI_IGT_test runtime 107 seconds
Rebooting shard-kbl2
Comment 9 Marta Löfstedt 2018-01-02 08:37:36 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4078/shard-kbl1/igt@gem_exec_parallel@bsd1-contexts.html

<3>[  306.689187] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
Comment 10 Marta Löfstedt 2018-01-02 08:40:38 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3583/shard-kbl3/igt@drv_suspend@forcewake.html

run.log:
running: igt/drv_suspend/forcewake

[60/76] skip: 17, pass: 43 |      
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3583/shard-kbl3/34 : FAILURE
CI_IGT_test runtime 221 seconds
Rebooting shard-kbl3

dmesg:
<7>[  178.371523] [IGT] syncobj_wait: starting subtest multi-wait-all-for-submit-submitted-signaled
<7>[  178.476302] [IGT] syncobj_wait: exiting, ret=0
<7>[  178.565855] [IGT] drv_suspend: executing
<7>[  178.573494] [IGT] drv_suspend: starting subtest forcewake
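Most of the reports in this bug identify the hung test from the last `[IGT]` marker in dmesg. A minimal sketch of that triage step is below. It assumes the three marker forms seen in these logs (`executing`, `starting subtest <name>`, `exiting, ret=<n>`); the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "[IGT] drv_suspend: starting subtest forcewake"
# and "[IGT] syncobj_wait: exiting, ret=0".
IGT_MARKER = re.compile(
    r"\[IGT\] (?P<binary>[^:\s]+): "
    r"(?:starting subtest (?P<subtest>\S+)|(?P<other>exiting|executing))"
)


def in_flight_subtest(dmesg: str) -> Optional[Tuple[str, str]]:
    """Return (binary, subtest) for a subtest that started but whose
    binary never logged 'exiting', or None if nothing was in flight."""
    current = None
    for line in dmesg.splitlines():
        m = IGT_MARKER.search(line)
        if not m:
            continue
        if m.group("subtest"):
            current = (m.group("binary"), m.group("subtest"))
        else:
            # 'exiting' or a fresh 'executing' means no subtest is pending.
            current = None
    return current
```

On the dmesg excerpt above this would return `("drv_suspend", "forcewake")`, which agrees with the last `running:` line in run.log.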
Comment 11 Marta Löfstedt 2018-01-10 14:06:02 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4127/shard-kbl5/igt@drv_suspend@debugfs-reader.html

run.log doesn't match results

last test in run.log:
igt/kms_flip/vblank-vs-suspend-interruptible

In dmesg there are multiple:
<3>[  120.763022] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:

However starting at:
<7>[  122.314000] [IGT] drv_suspend: executing
<7>[  122.324981] [IGT] drv_suspend: starting subtest debugfs-reader-hibernate


<7>[  122.337568] [drm:intel_power_well_enable [i915]] enabling DDI A/E IO power well
<7>[  122.337609] [drm:intel_power_well_enable [i915]] enabling DDI C IO power well
<7>[  122.337624] [drm:intel_power_well_enable [i915]] enabling DDI D IO power well
<7>[  122.337640] [drm:intel_power_well_disable [i915]] disabling DDI D IO power well
<7>[  122.337655] [drm:intel_power_well_disable [i915]] disabling DDI C IO power well
<7>[  122.337670] [drm:intel_power_well_disable [i915]] disabling DDI A/E IO power well

There is a lot of spam from intel_power_well_enable/intel_power_well_disable, then some genuine logs, some e1000e Hardware Unit Hangs, and more intel_power_well_enable/intel_power_well_disable spam.
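To put a number on the power-well noise, one could count debug messages per `[drm:<function> ...]` tag. The sketch below assumes the dmesg line format shown in this comment; the helper name is hypothetical.

```python
import re
from collections import Counter

# Captures the function name in tags such as
# "[drm:intel_power_well_enable [i915]]" or
# "[drm:verify_connector_state.isra.75 [i915]]".
DRM_FUNC = re.compile(r"\[drm:([\w.]+)")


def count_drm_messages(dmesg: str) -> Counter:
    """Count dmesg lines per drm debug function name."""
    counts = Counter()
    for line in dmesg.splitlines():
        m = DRM_FUNC.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Calling `count_drm_messages(dmesg).most_common(5)` on a full log would list the noisiest functions first.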
Comment 12 Marta Löfstedt 2018-01-19 08:41:36 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4156/shard-kbl1/igt@gem_mmap_gtt@basic-small-bo-tiledy.html

Last test in run.log doesn't match results:

running: igt/gem_eio/hibernate       

[47/78] skip: 18, pass: 28, fail: 1 \
FATAL: command execution failed
java.io.EOFException

From dmesg:
<7>[  121.402312] [IGT] gem_eio: starting subtest hibernate
...
<7>[  169.797217] [drm:drm_helper_hpd_irq_event] [CONNECTOR:72:HDMI-A-1] status updated from disconnected to disconnected

However, there is a pstore full of ftrace, so it was not Jenkins killing the run because of network issues.
Comment 13 Marta Löfstedt 2018-01-22 08:42:45 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3665/shard-kbl5/igt@pm_rpm@system-suspend-modeset.html

run.log:
running: igt/pm_rpm/system-suspend-modeset

[73/78] skip: 25, pass: 47, fail: 1 /     
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3665/shard-kbl5/19 : FAILURE
CI_IGT_test runtime 451 seconds
Rebooting shard-kbl5

last dmesg:
<7>[  403.765756] [drm:drm_mode_setcrtc] [CRTC:47:pipe B]
<7>[  403.766071] [drm:drm_mode_setcrtc] [CRTC:57:pipe C]
<7>[  403.771884] [drm:intel_runtime_suspend [i915]] Device suspended
Comment 14 Marta Löfstedt 2018-01-22 08:43:49 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3658/shard-kbl1/igt@drv_suspend@fence-restore-untiled.html

run.log:
running: igt/drv_suspend/fence-restore-untiled

[55/78] skip: 17, pass: 37, fail: 1 \         
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3658/shard-kbl1/7 : FAILURE
CI_IGT_test runtime 533 seconds
Rebooting shard-kbl1
Comment 15 Marta Löfstedt 2018-01-25 12:48:21 UTC
This is a Meta bug to capture all unexplained incompletes on SNB
Comment 16 Marta Löfstedt 2018-01-25 15:14:15 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4181/shard-kbl1/igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend.html

From dmesg:
<7>[   66.236954] [IGT] kms_vblank: starting subtest pipe-B-ts-continuation-dpms-suspend
...
<7>[   66.412476] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 02
<7>[   66.412991] [drm:intel_power_well_disable [i915]] disabling always-on
Comment 17 Marta Löfstedt 2018-01-25 15:21:57 UTC
(In reply to Marta Löfstedt from comment #15)
> This is a Meta bug to capture all unexplained incompletes on SNB

No it is for KBL
Comment 18 Marta Löfstedt 2018-02-06 07:59:13 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4221/shard-kbl3/igt@kms_vblank@pipe-a-ts-continuation-dpms-suspend.html

from dmesg:
<7>[  130.874005] [IGT] kms_vblank: starting subtest pipe-A-ts-continuation-dpms-suspend
...
<7>[  130.969737] [drm:intel_power_well_disable [i915]] disabling always-on
<7>[  130.970500] [drm:intel_runtime_suspend [i915]] Suspending device
<7>[  130.979037] [drm:intel_runtime_suspend [i915]] Device suspended
Comment 20 Marta Löfstedt 2018-03-01 07:11:56 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3853/shard-kbl4/igt@gem_workarounds@suspend-resume-fd.html

run.log:
running: igt/gem_workarounds/suspend-resume-fd

[71/98] skip: 22, pass: 48, dmesg-warn: 1 \   
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3853/shard-kbl4/10 : FAILURE
CI_IGT_test runtime 248 seconds
Rebooting shard-kbl4

Last dmesg:
<7>[  213.133863] [IGT] gem_workarounds: executing
<7>[  213.136136] [IGT] gem_workarounds: starting subtest suspend-resume-fd
Comment 21 Marta Löfstedt 2018-03-06 06:41:58 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3876/fi-kbl-7560u/igt@gem_exec_suspend@basic-s3.html

run.log:
running: igt/gem_exec_suspend/basic-s3

[107/288] skip: 3, pass: 104 \        
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3876/fi-kbl-7560u/0 : FAILURE
CI_IGT_test runtime 217 seconds
Rebooting fi-kbl-7560u

Last dmesg:
<7>[  186.218027] [IGT] gem_exec_suspend: executing
<4>[  186.222484] Setting dangerous option reset - tainting kernel
<7>[  186.223715] [IGT] gem_exec_suspend: starting subtest basic-S3
<7>[  186.224426] [drm:intel_power_well_enable [i915]] enabling DC off
<7>[  186.224454] [drm:gen9_set_dc_state [i915]] Setting DC state from 02 to 00
Comment 22 Marta Löfstedt 2018-03-16 08:12:18 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_1/fi-kbl-7567u/igt@drv_suspend@forcewake.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_1/fi-skl-guc/igt@drv_suspend@forcewake.html

Last dmesg:
<7>[  106.875290] [IGT] drv_suspend: executing
<7>[  106.909787] [IGT] drv_suspend: starting subtest forcewake
<6>[  107.051760] PM: suspend entry (deep)
Comment 23 Marta Löfstedt 2018-03-19 07:45:44 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_2/fi-kbl-7567u/igt@pm_rpm@system-suspend.html

run.log:
running: igt/pm_rpm/system-suspend   

[56/97] skip: 26, pass: 29, fail: 1 |
FATAL: command execution failed
...
Completed CI_IGT_test drmtip_2/fi-kbl-7567u/26 : FAILURE
CI_IGT_test runtime 543 seconds
Rebooting fi-kbl-7567u

Last dmesg:
<7>[   78.359105] [drm:drm_mode_setcrtc] [CRTC:69:pipe C]
<7>[   78.359245] [drm:intel_runtime_suspend [i915]] Suspending device
<7>[   78.361217] [drm:intel_runtime_suspend [i915]] Device suspended
Comment 24 Marta Löfstedt 2018-04-09 13:03:04 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_16/fi-kbl-7567u/igt@kms_cursor_legacy@pipe-b-single-move.html

There is a partial backtrace in pstore, indicating some filesystem issue, but unfortunately this has been overwritten by ftrace as usual.

<4>[  100.756196]  __ext4_new_inode+0x3cf/0x16a0
<4>[  100.756206]  ext4_tmpfile+0x50/0x160
<4>[  100.756212]  vfs_tmpfile+0x67/0xe0
<4>[  100.756215]  path_openat+0x78e/0xb10
<4>[  100.756222]  do_filp_open+0x96/0x110
<4>[  100.756231]  ? __alloc_fd+0xe0/0x1e0
<4>[  100.756293]  ? _raw_spin_unlock+0x29/0x40
<4>[  100.756297]  ? do_sys_open+0x1b8/0x240
<4>[  100.756299]  do_sys_open+0x1b8/0x240
<4>[  100.756303]  do_syscall_64+0x6b/0x1b0
<4>[  100.756306]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
<4>[  100.756307] RIP: 0033:0x7f3e33ccdc8e
<4>[  100.756308] RSP: 002b:00007ffc03aec7b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
<4>[  100.756310] RAX: ffffffffffffffda RBX: 0000564754458ec0 RCX: 00007f3e33ccdc8e
<4>[  100.756311] RDX: 0000000000410082 RSI: 00007f3e33d71fd1 RDI: 00000000ffffff9c
<4>[  100.756312] RBP: 00007ffc03aef180 R08: 00007ffc03aed7a0 R09: 00007f3e333152b0
<4>[  100.756313] R10: 0000000000000180 R11: 0000000000000246 R12: 00007ffc03af0e77
<4>[  100.756314] R13: 000000000000006e R14: 00005647544558c8 R15: 0000000000000000
<0>[  100.756395] Dumping ftrace buffer:
Comment 26 Martin Peres 2018-05-03 15:40:59 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_30/fi-kbl-7567u/igt@gem_eio@hibernate.html

<0>[  169.203000] ---------------------------------
<4>[  169.203001] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp btusb btrtl btbcm crct10dif_pclmul snd_hda_codec btintel crc32_pclmul snd_hwdep snd_hda_core ghash_clmulni_intel bluetooth snd_pcm ecdh_generic e1000e mei_me mei prime_numbers [last unloaded: i915]
<4>[  169.203023] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G     U            4.17.0-rc3-g844dd95837ab-drmtip_30+ #1
<4>[  169.203025] Hardware name:  /NUC7i7BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4>[  169.203053] RIP: 0010:execlists_submission_tasklet+0x573/0x1570 [i915]
<4>[  169.203055] RSP: 0018:ffff8c11bed03ea8 EFLAGS: 00010286
<4>[  169.203056] RAX: 0000000000000012 RBX: 0000000000000018 RCX: 0000000000000000
<4>[  169.203058] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff8c11b5a074e8
<4>[  169.203059] RBP: ffff8c11bed03f18 R08: 00000000001418cd R09: ffff8c11b5b9e000
<4>[  169.203060] R10: 0000000000000000 R11: ffff8c11b5a074e8 R12: ffff8c11af60505c
<4>[  169.203061] R13: 0000000000000003 R14: ffff8c11af605040 R15: ffff8c1199872158
<4>[  169.203062] FS:  0000000000000000(0000) GS:ffff8c11bed00000(0000) knlGS:0000000000000000
<4>[  169.203063] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  169.203064] CR2: 00007f837ff1ad30 CR3: 0000000204210004 CR4: 00000000003606e0
<4>[  169.203065] Call Trace:
<4>[  169.203067]  <IRQ>
<4>[  169.203071]  ? lock_acquire+0xa6/0x210
<4>[  169.203075]  tasklet_action_common.isra.5+0x47/0xb0
<4>[  169.203078]  __do_softirq+0xc1/0x4e1
<4>[  169.203080]  ? _raw_spin_unlock+0x29/0x40
<4>[  169.203083]  irq_exit+0xa4/0xb0
<4>[  169.203085]  do_IRQ+0x9a/0x120
<4>[  169.203087]  common_interrupt+0xf/0xf
<4>[  169.203089]  </IRQ>
<4>[  169.203091] RIP: 0010:cpuidle_enter_state+0xac/0x360
<4>[  169.203092] RSP: 0018:ffff92fdc00bfe90 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffde
<4>[  169.203094] RAX: ffff8c11b5a8a800 RBX: 0000000000001983 RCX: 0000000000000000
<4>[  169.203095] RDX: 0000000000000046 RSI: ffffffff840fa519 RDI: ffffffff840a77bf
<4>[  169.203096] RBP: 0000000000000008 R08: 0000000000000001 R09: 0000000000000000
<4>[  169.203097] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff842963d8
<4>[  169.203098] R13: ffffb2fdbfd00230 R14: 0000000000000000 R15: 00000027643459e4
<4>[  169.203105]  do_idle+0x1f3/0x250
<4>[  169.203107]  cpu_startup_entry+0x6a/0x70
<4>[  169.203111]  start_secondary+0x198/0x1e0
<4>[  169.203114]  secondary_startup_64+0xa5/0xb0
<4>[  169.203117] Code: e8 8c 82 a5 c2 48 8b 35 c4 1c 19 00 49 c7 c0 f0 21 7a c0 b9 2d 04 00 00 48 c7 c2 50 bc 76 c0 48 c7 c7 2e 50 6a c0 e8 9d ec ab c2 <0f> 0b 4c 8d a3 68 15 00 00 4c 89 e7 e8 7c 24 2a c3 83 83 40 15 
<1>[  169.203188] RIP: execlists_submission_tasklet+0x573/0x1570 [i915] RSP: ffff8c11bed03ea8
<4>[  169.203201] ---[ end trace e365e338da1d127b ]---
Comment 31 Martin Peres 2018-07-18 12:27:38 UTC
On KBLg, we got an incomplete which did not fit the size of the pstore.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4454/fi-kbl-8809g/igt@drv_module_reload@basic-reload.html

<0>[  395.846449] ---------------------------------
<4>[  395.846452] CR2: 0000000000000000
<4>[  396.050258] RIP: 0010:__x86_indirect_thunk_rax+0x10/0x20
<4>[  396.050263] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 e8 07 00 00 00 f3 90 0f ae e8 eb f9 48 89 04 24 <c3> 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 e8 07 00 00 00 f3 
<4>[  396.050316] RSP: 0018:ffffc900010c3d20 EFLAGS: 00010202
<4>[  396.050320] RAX: 6b6b6b6b6b6b6b6b RBX: ffff880270118008 RCX: 0000000000000001
<4>[  396.050325] RDX: 0000000000000000 RSI: 00000000aafad925 RDI: ffff880270118008
<4>[  396.050330] RBP: ffff8802591db3f8 R08: 00000000d381941d R09: 0000000000000001
<4>[  396.050335] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880270118560
<4>[  396.050339] R13: 0000000000000000 R14: ffffffffa0194a30 R15: 0000000000000008
<4>[  396.050344] FS:  00007fd3a0fe84c0(0000) GS:ffff88027ecc0000(0000) knlGS:0000000000000000
<4>[  396.050350] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  396.050354] CR2: 0000000000000000 CR3: 000000026b5d0005 CR4: 00000000003606e0
<0>[  396.050359] Kernel panic - not syncing: Fatal exception
<0>[  396.050433] Dumping ftrace buffer:
<0>[  396.050436]    (ftrace buffer empty)
<0>[  396.050440] Kernel Offset: disabled
Comment 59 Francesco Balestrieri 2018-11-23 10:56:55 UTC
Setting to medium. The only thing we can do here is hope that some of the other bug fixes will help.
Comment 60 Lakshmi 2018-11-29 09:31:53 UTC
A few more failures in BAT with no proper logs:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5206/fi-kbl-7560u/igt@gem_ctx_create@basic-files.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5213/fi-kbl-7560u/igt@gem_basic@create-fd-close.html

With no proper logs, we couldn't change the priority even though it's a BAT failure.
Comment 61 Chris Wilson 2018-12-05 09:15:53 UTC
Optimistically,


commit 4a15c75c42460252a63d30f03b4766a52945fb47
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Mon Dec 3 13:33:41 2018 +0000

    drm/i915: Introduce per-engine workarounds
    
    We stopped re-applying the GT workarounds after engine reset since commit
    59b449d5c82a ("drm/i915: Split out functions for different kinds of
    workarounds").
    
    Issue with this is that some of the GT workarounds live in the MMIO space
    which gets lost during engine resets. So far the registers in 0x2xxx and
    0xbxxx address range have been identified to be affected.
    
    This losing of applied workarounds has obvious negative effects and can
    even lead to hard system hangs (see the linked Bugzilla).
    
    Rather than just restoring this re-application, because we have also
    observed that it is not safe to just re-write all GT workarounds after
    engine resets (GPU might be live and weird hardware states can happen),
    we introduce a new class of per-engine workarounds and move only the
    affected GT workarounds over.
    
    Using the framework introduced in the previous patch, we therefore after
    engine reset, re-apply only the workarounds living in the affected MMIO
    address ranges.
    
    v2:
     * Move Wa_1406609255:icl to engine workarounds as well.
     * Rename API. (Chris Wilson)
     * Drop redundant IS_KABYLAKE. (Chris Wilson)
     * Re-order engine wa/ init so latest platforms are first. (Rodrigo Vivi)
    
    Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=107945
    Fixes: 59b449d5c82a ("drm/i915: Split out functions for different kinds of workarounds")
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Jani Nikula <jani.nikula@linux.intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Cc: intel-gfx@lists.freedesktop.org
    Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: https://patchwork.freedesktop.org/patch/msgid/20181203133341.10258-1-tvrtko.ursulin@linux.intel.com
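The idea in the commit message (after an engine reset, re-apply only the workarounds whose registers sit in the affected MMIO ranges) can be illustrated with a toy model. This is not i915 code and does not reflect how the driver is structured: the `Workaround` type, the names, offsets and values, and the reading of "0x2xxx" and "0xbxxx" as 0x2000-0x2fff and 0xb000-0xbfff are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumption: "0x2xxx" and "0xbxxx" read as these inclusive offset ranges.
AFFECTED_RANGES = [(0x2000, 0x2FFF), (0xB000, 0xBFFF)]


@dataclass(frozen=True)
class Workaround:
    name: str    # hypothetical label, not a real workaround name
    offset: int  # hypothetical MMIO offset
    value: int   # hypothetical register value


def lost_on_engine_reset(wa: Workaround) -> bool:
    """True if the register falls in a range that engine reset clears."""
    return any(lo <= wa.offset <= hi for lo, hi in AFFECTED_RANGES)


def split_workarounds(
    was: List[Workaround],
) -> Tuple[List[Workaround], List[Workaround]]:
    """Split into (gt_only, per_engine) lists, preserving order."""
    per_engine = [wa for wa in was if lost_on_engine_reset(wa)]
    gt_only = [wa for wa in was if not lost_on_engine_reset(wa)]
    return gt_only, per_engine


def reapply_after_engine_reset(
    per_engine: List[Workaround], write_reg: Callable[[int, int], None]
) -> None:
    """Rewrite only the per-engine workarounds; GT-only ones are left alone."""
    for wa in per_engine:
        write_reg(wa.offset, wa.value)
```

The point of the split is the one the commit message makes: rewriting every GT workaround after an engine reset was observed to be unsafe, so only the subset that is actually lost gets re-applied.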

