On CI_DRM_3126 new IGT test: igt@drv_selftest@live_hangcheck triggers softdog:

<7>[ 313.752576] [drm:intelfb_create [i915]] no BIOS fb, allocating a new one
<3>[ 314.725474] Failed to start request b
<0>[ 348.422049] watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [swapper/4:0]
<4>[ 348.422074] Modules linked in: i915(+) snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul snd_hda_codec_realtek crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core r8169 snd_pcm mei_me mii lpc_ich mei prime_numbers [last unloaded: i915]
<4>[ 348.422146] irq event stamp: 15454047
<4>[ 348.422152] hardirqs last enabled at (15454046): [<ffffffff819107bd>] restore_regs_and_iret+0x0/0x1d
<4>[ 348.422156] hardirqs last disabled at (15454047): [<ffffffff819117e5>] apic_timer_interrupt+0x95/0xa0
<4>[ 348.422161] softirqs last enabled at (12679772): [<ffffffff81085251>] _local_bh_enable+0x21/0x40
<4>[ 348.422165] softirqs last disabled at (12679773): [<ffffffff81085645>] irq_exit+0xb5/0xd0
<4>[ 348.422169] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G U 4.14.0-rc1-CI-CI_DRM_3126+ #1
<4>[ 348.422173] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4>[ 348.422176] task: ffff88040d5a8040 task.stack: ffffc900000ac000
<4>[ 348.422180] RIP: 0010:__do_softirq+0xa3/0x4e2
<4>[ 348.422183] RSP: 0018:ffff88041fb03f58 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff10
<4>[ 348.422190] RAX: 00000000ffffffff RBX: ffff88040d5a8040 RCX: 0000000000000000
<4>[ 348.422194] RDX: 0000000000000000 RSI: ffffffff81d0ddbc RDI: ffffffff81cc1bee
<4>[ 348.422197] RBP: ffff88041fb03fb8 R08: 0000000000000000 R09: 0000000000000000
<4>[ 348.422200] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>[ 348.422203] R13: ffff88040d5a8040 R14: 0000000000000000 R15: 0000000000000000
<4>[ 348.422207] FS: 0000000000000000(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000
<4>[ 348.422211] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 348.422214] CR2: 00007fc893427000 CR3: 0000000402329002 CR4: 00000000001606e0
<4>[ 348.422217] Call Trace:
<4>[ 348.422221] <IRQ>
<4>[ 348.422228] irq_exit+0xb5/0xd0
<4>[ 348.422232] smp_apic_timer_interrupt+0x9e/0x2e0
<4>[ 348.422236] apic_timer_interrupt+0x9a/0xa0
<4>[ 348.422239] </IRQ>
<4>[ 348.422244] RIP: 0010:tick_nohz_idle_exit+0x114/0x180
<4>[ 348.422248] RSP: 0018:ffffc900000afed0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff10
<4>[ 348.422254] RAX: ffff88040d5a8040 RBX: ffff88040d5a8040 RCX: 0000000000000001
<4>[ 348.422258] RDX: 0000000000000000 RSI: ffffffff81d0ddbc RDI: ffffffff81cc1bee
<4>[ 348.422261] RBP: ffffc900000afed8 R08: 0000000000000000 R09: 0000000000000001
<4>[ 348.422264] R10: 0000000000000000 R11: 0000000000000000 R12: 0000004b38d4ed68
<4>[ 348.422268] R13: ffff88040d5a8040 R14: 0000000000000000 R15: 0000000000000000
<4>[ 348.422277] do_idle+0x13d/0x1e0
<4>[ 348.422282] cpu_startup_entry+0x1d/0x20
<4>[ 348.422286] start_secondary+0x11c/0x140
<4>[ 348.422291] secondary_startup_64+0xa5/0xa5
<4>[ 348.422299] Code: 00 00 e8 11 ac 7c ff c7 45 c8 0a 00 00 00 48 89 5d a8 48 c7 c0 40 86 01 00 65 c7 00 00 00 00 00 e8 23 76 7c ff fb b8 ff ff ff ff <48> c7 45 c0 00 51 e0 81 0f bc 45 d4 83 c0 01 89 45 d0 75 6a e9
<0>[ 348.422498] Kernel panic - not syncing: softlockup: hung tasks
<4>[ 348.422517] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G U L 4.14.0-rc1-CI-CI_DRM_3126+ #1
<4>[ 348.422542] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
<4>[ 348.422564] Call Trace:
<4>[ 348.422574] <IRQ>
<4>[ 348.422585] dump_stack+0x68/0x9f
<4>[ 348.422599] panic+0xd4/0x21d
<4>[ 348.422614] watchdog_timer_fn+0x289/0x290
<4>[ 348.422631] __hrtimer_run_queues+0xed/0x4d0
<4>[ 348.422646] ? __touch_watchdog+0x30/0x30
<4>[ 348.422662] hrtimer_interrupt+0xc1/0x220
<4>[ 348.422679] smp_apic_timer_interrupt+0x7d/0x2e0
<4>[ 348.422695] apic_timer_interrupt+0x9a/0xa0
<4>[ 348.422710] RIP: 0010:__do_softirq+0xa3/0x4e2
<4>[ 348.422724] RSP: 0018:ffff88041fb03f58 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff10
<4>[ 348.422750] RAX: 00000000ffffffff RBX: ffff88040d5a8040 RCX: 0000000000000000
<4>[ 348.422771] RDX: 0000000000000000 RSI: ffffffff81d0ddbc RDI: ffffffff81cc1bee
<4>[ 348.422793] RBP: ffff88041fb03fb8 R08: 0000000000000000 R09: 0000000000000000
<4>[ 348.422814] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>[ 348.422835] R13: ffff88040d5a8040 R14: 0000000000000000 R15: 0000000000000000
<4>[ 348.422861] ? __do_softirq+0x9d/0x4e2
<4>[ 348.422878] irq_exit+0xb5/0xd0
<4>[ 348.422890] smp_apic_timer_interrupt+0x9e/0x2e0
<4>[ 348.422906] apic_timer_interrupt+0x9a/0xa0
<4>[ 348.422920] </IRQ>
<4>[ 348.422931] RIP: 0010:tick_nohz_idle_exit+0x114/0x180
<4>[ 348.422947] RSP: 0018:ffffc900000afed0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff10
<4>[ 348.422973] RAX: ffff88040d5a8040 RBX: ffff88040d5a8040 RCX: 0000000000000001
<4>[ 348.422994] RDX: 0000000000000000 RSI: ffffffff81d0ddbc RDI: ffffffff81cc1bee
<4>[ 348.423015] RBP: ffffc900000afed8 R08: 0000000000000000 R09: 0000000000000001
<4>[ 348.423036] R10: 0000000000000000 R11: 0000000000000000 R12: 0000004b38d4ed68
<4>[ 348.423058] R13: ffff88040d5a8040 R14: 0000000000000000 R15: 0000000000000000
<4>[ 348.423084] do_idle+0x13d/0x1e0
<4>[ 348.423099] cpu_startup_entry+0x1d/0x20
<4>[ 348.423113] start_secondary+0x11c/0x140
<4>[ 348.423128] secondary_startup_64+0xa5/0xa5
<0>[ 348.423382] Kernel Offset: disabled

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3126/shard-hsw6/igt@drv_selftest@live_hangcheck.html
NOTE: the referred pstore file is identical to the one in BUG 102974.
*** Bug 102974 has been marked as a duplicate of this bug. ***
<marta_> Adrinael, you mentioned something about being wrong testlist for shards on CI_DRM_3026, could you elaborate. I have already filed bugs for this run...
<Adrinael> CI_DRM_3126
<Adrinael> It was running everything ever on accident
<Adrinael> ivyl, ^ right?
<marta_> but it was only 3 new drv_selftests and 3 new gem tests, for sure we have more than that blacklisted
<ivyl> yep, due to elaborated nature of deployment method, and streamlining it to use just "make install" an inevitable error occurred on the human-Jenkins boundary.
<ivyl> marta_: it run with ALL ALL, but it got cancelled pretty quickly
<ivyl> and then rerun properly
<ivyl> what you see is the merge of both
<Adrinael> tools_test@* got "broken" by make install -deployment btw
* Weine (~dweineha@134.134.139.76) has joined
<ivyl> as jenkins haven't cleaned staging area for results
<Adrinael> marta_, if you file a bug on igt@tools_test@tools_test, make it an IGT bug
<ivyl> so sorry about confusion, it wasn't intended and I hoped the rerun will fix it
<ivyl> but as you can see we have the few leftovers
<marta_> OK, I will archive if needed when I have results from the next run.
* Ahuj (Thunderbir@nat/intel/x-ngcmqbhsvahecvri) has joined
<ivyl> results from -27 already came in and they look clean, we also should have results for -28 in half an hour or so
Fix here: https://patchwork.freedesktop.org/series/30419/
*** Bug 102970 has been marked as a duplicate of this bug. ***
commit 87dc03ad268f285065cdd2e2ac75701a1f04d0b8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Sep 15 14:09:29 2017 +0100

    drm/i915/selftests: Try to recover from a wedged GPU during reset tests

    If we see the seqno stop progressing, we abandon the test for fear that the GPU died following the reset. However, during test teardown we still wait for the GPU to idle before continuing, but we have already confirmed that the GPU is dead. Furthermore, since we are inside a reset test, we have disabled the hangchecker, and so there is no safety net and we wait indefinitely. Detect the stuck GPU and declare it wedged as a state of emergency so we can escape.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Jari Tahvanainen <jari.tahvanainen@intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20170915130929.18892-1-chris@chris-wilson.co.uk
    Tested-by: Jari Tahvanainen <jari.tahvanainen@intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Closing. According to CI results, this test hasn't failed on HSW for a while now.