Summary: | [BAT] [GLK-DSI only] igt@gem_* - Failed assertion: !"GPU hung" - and its aftermath | ||
---|---|---|---|
Product: | DRI | Reporter: | Marta Löfstedt <marta.lofstedt> |
Component: | DRM/Intel | Assignee: | Kimmo Nikkanen <knikkane> |
Status: | CLOSED WORKSFORME | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | critical | ||
Priority: | high | CC: | intel-gfx-bugs |
Version: | DRI git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | GLK | i915 features: | GEM/Other |
Description
Marta Löfstedt
2017-10-30 11:12:55 UTC
Here is a Patchwork example. Although the patch was about enabling runtime_pm, this pattern keeps repeating itself.

Starting at:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6996/fi-glk-dsi/igt@gem_exec_nop@basic-series.html
fail:
(gem_exec_nop:1852) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_nop:1852) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-series failed.

dmesg:
[ 226.781384] Setting dangerous option reset - tainting kernel
[ 243.856241] i915 0000:00:02.0: Resetting chip after gpu hang
[ 245.063521] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[ 245.063716] [drm:i915_reset [i915]] *ERROR* Failed to reset chip: -5

then:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6996/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-a.html
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6996/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-b.html
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6996/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-c.html
fail:
(kms_pipe_crc_basic:2326) igt-gt-CRITICAL: Test assertion failure function igt_force_gpu_reset, file igt_gt.c:406:
(kms_pipe_crc_basic:2326) igt-gt-CRITICAL: Failed assertion: !wedged
(kms_pipe_crc_basic:2326) igt-gt-CRITICAL: Last errno: 9, Bad file descriptor
Subtest hang-read-crc-pipe-A failed.
and finally:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6996/fi-glk-dsi/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
dmesg-warn:
[ 375.382594] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[ 375.382740] WARN_ON(reset && reset != -19)
[ 375.382776] ------------[ cut here ]------------
[ 375.382812] WARNING: CPU: 2 PID: 2383 at drivers/gpu/drm/i915/i915_gem.c:4724 i915_gem_sanitize+0x52/0x80 [i915]
[ 375.382814] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
[ 375.382892] CPU: 2 PID: 2383 Comm: kworker/u8:7 Tainted: G U 4.14.0-rc8-CI-Patchwork_6996+ #1
[ 375.382894] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[ 375.382899] Workqueue: events_unbound async_run_entry_fn
[ 375.382903] task: ffff88016e980040 task.stack: ffffc90002164000
[ 375.382937] RIP: 0010:i915_gem_sanitize+0x52/0x80 [i915]
[ 375.382939] RSP: 0018:ffffc90002167c58 EFLAGS: 00010292
[ 375.382943] RAX: 000000000000001e RBX: ffff880166e00000 RCX: 0000000000000006
[ 375.382945] RDX: 0000000000000006 RSI: ffffffff81d0ed64 RDI: ffffffff81cc294e
[ 375.382947] RBP: ffffc90002167c68 R08: 0000000000000000 R09: 0000000000000001
[ 375.382949] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880166e00070
[ 375.382951] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81cea9cf
[ 375.382954] FS: 0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
[ 375.382957] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 375.382959] CR2: 000055b4ccc93068 CR3: 0000000003e0f000 CR4: 00000000003406e0
[ 375.382961] Call Trace:
[ 375.382997] i915_gem_suspend+0x111/0x170 [i915]
[ 375.383027] i915_drm_suspend+0x6d/0x170 [i915]
[ 375.383057] i915_pm_suspend+0x28/0x40 [i915]
[ 375.383063] pci_pm_suspend+0x78/0x140
[ 375.383068] dpm_run_callback+0x6f/0x310
[ 375.383072] ? pci_pm_freeze+0xf0/0xf0
[ 375.383077] __device_suspend+0x102/0x380
[ 375.383081] ? dpm_watchdog_set+0x70/0x70
[ 375.383087] async_suspend+0x1f/0xa0
[ 375.383091] async_run_entry_fn+0x38/0x160
[ 375.383096] process_one_work+0x221/0x650
[ 375.383103] worker_thread+0x4e/0x3c0
[ 375.383108] kthread+0x114/0x150
[ 375.383111] ? process_one_work+0x650/0x650
[ 375.383114] ? kthread_create_on_node+0x40/0x40
[ 375.383119] ret_from_fork+0x27/0x40
[ 375.383127] Code: 5d c3 be ff ff ff ff 48 89 df e8 2a e3 02 00 85 c0 74 ea 83 f8 ed 74 e5 48 c7 c6 58 aa 26 a0 48 c7 c7 a4 7a 25 a0 e8 4f df f7 e0 <0f> ff eb ce 4c 8d 67 70 31 f6 4c 89 e7 e8 fc a0 79 e1 48 89 df
[ 375.383247] ---[ end trace 7978d132da79d715 ]---

On CI_DRM_3321 the issue started here:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3321/fi-glk-dsi/igt@gem_exec_flush@basic-batch-kernel-default-uc.html
(gem_exec_flush:1727) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_flush:1727) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-batch-kernel-default-uc failed.

Then dmesg is filled with:
[ 144.151041] [drm:gen8_irq_handler [i915]] *ERROR* Fault errors on pipe A: 0x00000080
This causes a lot of the following tests to either be skipped or end in dmesg-warn. But then, starting from:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3321/fi-glk-dsi/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b.html
all the following tests pass.

However, the machine did not boot up for CI_DRM_3322 and CI_DRM_3323. When rebooting manually in the lab the display was full of garbage.
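The dmesg excerpts above share a small set of recurring signatures (hang detected, reset request timeout, failed chip reset, pipe fault errors). As a rough triage aid, a sketch like the following could count them in a saved dmesg file. The signature names and the `classify`/`summarize` helpers are hypothetical and not part of IGT or the CI tooling; the patterns are copied from the log lines quoted in this report.

```python
import re

# Illustrative signature table (an assumption, not part of any CI tooling).
# The patterns are taken from dmesg lines quoted in this report.
SIGNATURES = {
    "hang_detected": re.compile(r"Resetting \w+ after gpu hang"),
    "reset_timeout": re.compile(r"\*ERROR\* \w+: reset request timeout"),
    "reset_failed": re.compile(r"\*ERROR\* Failed to reset chip: (-?\d+)"),
    "pipe_fault": re.compile(r"\*ERROR\* Fault errors on pipe [A-Z]: 0x[0-9a-fA-F]+"),
}


def classify(line):
    """Return the names of all signatures that match one dmesg line."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(line)]


def summarize(dmesg_text):
    """Count how often each signature occurs in a block of dmesg text."""
    counts = {name: 0 for name in SIGNATURES}
    for line in dmesg_text.splitlines():
        for name in classify(line):
            counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = """\
[ 243.856241] i915 0000:00:02.0: Resetting chip after gpu hang
[ 245.063521] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[ 245.063716] [drm:i915_reset [i915]] *ERROR* Failed to reset chip: -5
[ 144.151041] [drm:gen8_irq_handler [i915]] *ERROR* Fault errors on pipe A: 0x00000080
"""
    print(summarize(sample))
```

A run whose summary shows `reset_failed` after `hang_detected` would match the pattern described here, where the GPU stays wedged and later tests skip or fail.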
Here is another example:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7093/fi-glk-dsi/igt@gem_exec_flush@basic-wb-prw-default.html
(gem_exec_flush:1775) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_flush:1775) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-wb-prw-default failed.
All tests are skipped until:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7093/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-a.html
(kms_pipe_crc_basic:2273) igt-gt-CRITICAL: Test assertion failure function igt_force_gpu_reset, file igt_gt.c:406:
(kms_pipe_crc_basic:2273) igt-gt-CRITICAL: Failed assertion: !wedged
(kms_pipe_crc_basic:2273) igt-gt-CRITICAL: Last errno: 9, Bad file descriptor
Subtest hang-read-crc-pipe-A failed.
Then the following tests fail, and the run goes incomplete on:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7093/fi-glk-dsi/igt@pm_rpm@basic-rte.html

https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_491/fi-glk-dsi/igt@gem_ctx_switch@basic-default.html
(gem_ctx_switch:1560) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_ctx_switch:1560) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-default failed.
After that a lot of tests are skipped, some fail, some more are skipped, and this time there is actually no incomplete.

Here is another:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3976/fi-glk-dsi/igt@gem_exec_flush@basic-wb-pro-default.html

Raising priority since it is BAT.

https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_490/fi-glk-dsi/igt@gem_exec_flush@basic-uc-set-default.html
fail:
(gem_exec_flush:1775) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_flush:1775) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-uc-set-default failed.
Then a bunch of skips, then fail on:
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_490/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-a.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_490/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-b.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_490/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-c.html
due to:
(kms_pipe_crc_basic:2287) igt-gt-CRITICAL: Failed assertion: !wedged
then
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_490/fi-glk-dsi/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
[ 343.853966] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[ 343.854132] WARN_ON(reset && reset != -19)
[ 343.854169] ------------[ cut here ]------------
[ 343.854205] WARNING: CPU: 2 PID: 2335 at drivers/gpu/drm/i915/i915_gem.c:4724 i915_gem_sanitize+0x52/0x80 [i915]
[ 343.854207] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal i915 intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
[ 343.854263] CPU: 2 PID: 2335 Comm: kworker/u8:4 Tainted: G U 4.14.0-rc8-CI-CI_DRM_3331+ #1
[ 343.854265] Hardware name: Intel Corp.
Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 [ 343.854273] Workqueue: events_unbound async_run_entry_fn [ 343.854277] task: ffff88017a41cec0 task.stack: ffffc90002218000 [ 343.854309] RIP: 0010:i915_gem_sanitize+0x52/0x80 [i915] [ 343.854312] RSP: 0018:ffffc9000221bc58 EFLAGS: 00010292 [ 343.854316] RAX: 000000000000001e RBX: ffff880167100000 RCX: 0000000000000006 [ 343.854318] RDX: 0000000000000006 RSI: ffffffff81d11314 RDI: ffffffff81cc3dee [ 343.854320] RBP: ffffc9000221bc68 R08: 0000000000000000 R09: 0000000000000001 [ 343.854322] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880167100070 [ 343.854324] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81cecf7f [ 343.854327] FS: 0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000 [ 343.854329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 343.854331] CR2: 000055b19f0a4068 CR3: 0000000003e0f000 CR4: 00000000003406e0 [ 343.854334] Call Trace: [ 343.854368] i915_gem_suspend+0x111/0x170 [i915] [ 343.854398] i915_drm_suspend+0x6d/0x170 [i915] [ 343.854429] i915_pm_suspend+0x28/0x40 [i915] [ 343.854434] pci_pm_suspend+0x78/0x140 [ 343.854440] dpm_run_callback+0x6f/0x310 [ 343.854444] ? pci_pm_freeze+0xf0/0xf0 [ 343.854449] __device_suspend+0x102/0x380 [ 343.854453] ? dpm_watchdog_set+0x70/0x70 [ 343.854460] async_suspend+0x1f/0xa0 [ 343.854463] async_run_entry_fn+0x38/0x160 [ 343.854469] process_one_work+0x221/0x650 [ 343.854475] worker_thread+0x4e/0x3c0 [ 343.854481] kthread+0x114/0x150 [ 343.854484] ? process_one_work+0x650/0x650 [ 343.854487] ? 
kthread_create_on_node+0x40/0x40
[ 343.854493] ret_from_fork+0x27/0x40
[ 343.854501] Code: 5d c3 be ff ff ff ff 48 89 df e8 ca e5 02 00 85 c0 74 ea 83 f8 ed 74 e5 48 c7 c6 a8 fa 24 a0 48 c7 c7 a4 ca 23 a0 e8 6f 90 f9 e0 <0f> ff eb ce 4c 8d 67 70 31 f6 4c 89 e7 e8 9c a1 7b e1 48 89 df
[ 343.854622] ---[ end trace 92cf58358a865d76 ]---

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3366/fi-glk-dsi/igt@gem_sync@basic-store-each.html
(gem_sync:3613) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_sync:3613) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-store-each failed.
The above doesn't cause any aftereffects, but that may be because the tests that come after it are not sensitive to a wedged GPU.

https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7217/fi-glk-dsi/igt@gem_exec_flush@basic-batch-kernel-default-wb.html
(gem_exec_flush:1784) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_flush:1784) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-batch-kernel-default-wb failed.
[ 152.381045] Setting dangerous option reset - tainting kernel
[ 164.786879] i915 0000:00:02.0: Resetting chip after gpu hang
[ 165.992480] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[ 165.992686] [drm:i915_reset [i915]] *ERROR* Failed to reset chip: -5
Then a lot of skips and an incomplete on:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7217/fi-glk-dsi/igt@gem_sync@basic-store-each.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3371/fi-glk-dsi/igt@gem_sync@basic-store-all.html
(gem_sync:3559) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_sync:3559) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-store-all failed.
No other tests affected.
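The result links in this report all follow the same layout, so grouping failures by run or by test can be automated. The sketch below assumes the URL shape seen here (`.../tree/drm-tip/<run>/<machine>/igt@<test>[@<subtest>].html`); `parse_result_url` is a hypothetical helper name, and the layout is inferred from the links above rather than from any documented CI interface.

```python
from urllib.parse import urlparse


def parse_result_url(url):
    """Split a CI result URL into run, machine, test, and subtest.

    Assumes the layout seen in this report:
    .../tree/drm-tip/<run>/<machine>/igt@<test>[@<subtest>].html
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 3 or not parts[-1].startswith("igt@"):
        raise ValueError(f"unexpected result URL layout: {url}")
    run, machine, page = parts[-3], parts[-2], parts[-1]
    if page.endswith(".html"):
        page = page[: -len(".html")]
    fields = page.split("@")
    return {
        "run": run,
        "machine": machine,
        "test": fields[1],
        # Some tests have no subtest component in the page name.
        "subtest": fields[2] if len(fields) > 2 else None,
    }


if __name__ == "__main__":
    print(parse_result_url(
        "https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3371/"
        "fi-glk-dsi/igt@gem_sync@basic-store-all.html"
    ))
```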
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4005/fi-glk-dsi/igt@gem_sync@basic-all.html (gem_sync:3498) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484: (gem_sync:3498) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-all failed. https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3377/fi-glk-dsi/igt@gem_sync@basic-many-each.html (gem_sync:3475) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484: (gem_sync:3475) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-many-each failed. Note this dmesg has more information: [ 314.719246] Setting dangerous option reset - tainting kernel [ 316.762341] general protection fault: 0000 [#1] PREEMPT SMP [ 316.762366] Dumping ftrace buffer: [ 316.762379] --------------------------------- [ 316.762487] CPU:3 [LOST 255731 EVENTS] gem_sync-3495 3..s1 316076943us : execlists_submission_tasklet: bcs0 in[0]: ctx=2.2, seqno=238ed ... [ 316.795368] gem_sync-3526 1..s. 316804524us : execlists_submission_tasklet: bcs0 out[0]: ctx=2.2, seqno=239ff [ 316.795396] --------------------------------- [ 316.795411] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mei mii prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel [ 316.795535] CPU: 3 PID: 3483 Comm: gem_sync Tainted: G U 4.14.0-CI-CI_DRM_3377+ #1 [ 316.795559] Hardware name: Intel Corp. 
Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 [ 316.795588] task: ffff8801719a2780 task.stack: ffffc90000df4000 [ 316.795612] RIP: 0010:blk_flush_plug_list+0x54/0x270 [ 316.795628] RSP: 0018:ffffc90000df7b38 EFLAGS: 00010292 [ 316.795645] RAX: 0010000000000020 RBX: ffffc90000df7b50 RCX: 0000000000000000 [ 316.795665] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 0010000000000000 [ 316.795685] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000001 [ 316.795704] R10: 0000000000000000 R11: 0000000000000000 R12: dead000000000200 [ 316.795724] R13: dead000000000100 R14: 0010000000000000 R15: ffff8801681eec40 [ 316.795745] FS: 00007fd4aa2dc700(0000) GS:ffff88017fd80000(0000) knlGS:0000000000000000 [ 316.795768] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 316.795785] CR2: 00007fd4af1b6010 CR3: 0000000171ee0000 CR4: 00000000003406e0 [ 316.795804] Call Trace: [ 316.795822] io_schedule_prepare+0x3c/0x40 [ 316.795839] io_schedule_timeout+0xf/0x40 [ 316.795911] i915_wait_request+0x33c/0x830 [i915] [ 316.795932] ? wake_up_q+0x70/0x70 [ 316.795945] ? wake_up_q+0x70/0x70 [ 316.796017] i915_gem_object_wait_fence+0xc8/0xe0 [i915] [ 316.796090] i915_gem_object_wait+0x282/0x3d0 [i915] [ 316.796164] i915_gem_wait_ioctl+0x10f/0x280 [i915] [ 316.796235] ? i915_gem_unset_wedged+0x180/0x180 [i915] [ 316.796254] drm_ioctl_kernel+0x65/0xb0 [ 316.796269] drm_ioctl+0x295/0x340 [ 316.796337] ? i915_gem_unset_wedged+0x180/0x180 [i915] [ 316.796355] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 316.796374] ? lock_acquire+0xaf/0x200 [ 316.796389] ? __fget+0xe4/0x1f0 [ 316.796405] do_vfs_ioctl+0x8f/0x670 [ 316.796420] ? 
__fget+0x101/0x1f0 [ 316.796435] SyS_ioctl+0x3b/0x70 [ 316.796450] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 316.796465] RIP: 0033:0x7fd4ad5fc587 [ 316.796477] RSP: 002b:00007fd4aa2dbce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 316.796501] RAX: ffffffffffffffda RBX: ffffc90000df7ff0 RCX: 00007fd4ad5fc587 [ 316.796520] RDX: 00007fd4aa2dbd20 RSI: 00000000c010646c RDI: 0000000000000003 [ 316.796540] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000000c [ 316.796560] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000046 [ 316.796579] R13: 00007ffff3b389ff R14: 00007fd4aa2dc9c0 R15: 00007fd4aa2dc700 [ 316.796604] Code: de 48 83 ec 28 48 8d 44 24 08 48 8d 5c 24 18 48 89 44 24 08 48 89 44 24 10 48 8d 47 20 48 89 5c 24 18 48 89 5c 24 20 48 89 04 24 <49> 8b 46 20 48 39 04 24 74 6d 49 8b 46 20 48 8b 34 24 48 39 c6 [ 316.796770] RIP: blk_flush_plug_list+0x54/0x270 RSP: ffffc90000df7b38 [ 316.860599] ---[ end trace 8af6cac31fbe4619 ]--- [ 321.627562] i915 0000:00:02.0: Resetting chip after gpu hang This is probably the aftermath of https://bugs.freedesktop.org/show_bug.cgi?id=103514#c15 https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3377/fi-glk-dsi/igt@pm_backlight@basic-brightness.html (pm_backlight:4059) igt-kms-CRITICAL: Test assertion failure function do_display_commit, file igt_kms.c:2895: (pm_backlight:4059) igt-kms-CRITICAL: Failed assertion: ret == 0 (pm_backlight:4059) igt-kms-CRITICAL: Last errno: 13, Permission denied (pm_backlight:4059) igt-kms-CRITICAL: error: -13 != 0 Test pm_backlight failed. https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3384/fi-glk-dsi/igt@gem_exec_flush@basic-wb-ro-before-default.html (gem_exec_flush:1790) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484: (gem_exec_flush:1790) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-wb-ro-before-default failed. then the usual skipping and failing. 
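Several numeric error codes appear in the logs above: `-5` from the failed chip reset, `-19` in the `WARN_ON(reset && reset != -19)` check, and errno 9 and 13 in the IGT output. A minimal sketch for translating them into symbolic names with Python's standard `errno` module follows. `describe_errno` is a hypothetical helper; the message text returned by `os.strerror` depends on the platform's C library.

```python
import errno
import os


def describe_errno(code):
    """Return 'NAME (message)' for a positive or negative errno value.

    Kernel code returns errors as negative numbers (for example -5),
    while userspace reports the positive errno, so the sign is dropped.
    """
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name} ({os.strerror(number)})"


if __name__ == "__main__":
    # Values taken from the logs quoted in this report.
    for value in (-5, -19, 9, 13):
        print(value, describe_errno(value))
```

Read this way, the `WARN_ON` fires for any reset failure other than `ENODEV`, and the `-5` in "Failed to reset chip" is `EIO`.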
CI_DRM_3388 https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3388/fi-glk-dsi/igt@gem_ctx_switch@basic-default-heavy.html (gem_ctx_switch:1525) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484: (gem_ctx_switch:1525) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-default-heavy failed. https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3388/fi-glk-dsi/igt@gem_sync@basic-store-each.html (gem_sync:3587) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484: (gem_sync:3587) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-store-each failed. dmesg: [ 339.327621] [drm:fw_domains_get_with_fallback [i915]] *ERROR* blitter: timed out waiting for forcewake ack request. https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3388/fi-glk-dsi/igt@gem_tiled_fence_blits@basic.html [ 353.875381] BUG: stack guard page was hit at ffffc90000923fb8 (stack is ffffc90000924000..ffffc90000927fff) then incomplete: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3388/fi-glk-dsi/igt@gem_tiled_pread_basic.html run.log: running: igt/gem_tiled_pread_basic [154/289] skip: 14, pass: 137, fail: 1, dmesg-fail: 2 - owatch: TIMEOUT! owatch: timeout for /dev/watchdog0 set to 10 (requested 10) https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3390/fi-glk-dsi/igt@gem_ctx_switch@basic-default-heavy.html typical start: (gem_ctx_switch:1590) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484: (gem_ctx_switch:1590) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-default-heavy failed. 
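The run.log status line quoted above (`[154/289] skip: 14, pass: 137, fail: 1, dmesg-fail: 2`) can be parsed to see how far a run got before it went incomplete. The sketch below assumes the line format exactly as it appears in this report; `parse_progress` is a hypothetical helper, not a piglit function.

```python
import re

# "[done/total]" progress marker, then "name: count" pairs after it.
PROGRESS = re.compile(r"\[(\d+)/(\d+)\]")
COUNT = re.compile(r"([a-z-]+): (\d+)")


def parse_progress(line):
    """Return (done, total, counts) for a run.log status line, or None."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    done, total = int(match.group(1)), int(match.group(2))
    # Only look at the text after the progress marker for status counts.
    counts = {name: int(n) for name, n in COUNT.findall(line[match.end():])}
    return done, total, counts


if __name__ == "__main__":
    line = ("running: igt/gem_tiled_pread_basic [154/289] "
            "skip: 14, pass: 137, fail: 1, dmesg-fail: 2 -")
    done, total, counts = parse_progress(line)
    print(done, total, counts, sum(counts.values()) == done)
```

In the quoted line the per-status counts add up to the number of completed tests, which gives a cheap consistency check on a truncated log.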
Then this looks like an igt/piglit error:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3390/fi-glk-dsi/igt@gem_exec_parallel@basic.html
Exception <class 'UnicodeDecodeError'>'utf-8' codec can't decode byte 0xac in position 109087: invalid start byte
Traceback
File "/opt/igt/piglit/framework/test/base.py", line 205, in execute
self.run()
File "/opt/igt/piglit/framework/test/base.py", line 271, in run
self._run_command()
File "/opt/igt/piglit/framework/test/base.py", line 338, in _run_command
out, err = proc.communicate(timeout=self.timeout)
File "/usr/lib/python3.5/subprocess.py", line 801, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "/usr/lib/python3.5/subprocess.py", line 1488, in _communicate
self.stderr.encoding)
File "/usr/lib/python3.5/subprocess.py", line 705, in _translate_newlines
data = data.decode(encoding)
then incomplete on:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3390/fi-glk-dsi/igt@kms_addfb_basic@bad-pitch-63.html
run.log:
running: igt/kms_addfb_basic/bad-pitch-63
[172/289] skip: 69, pass: 101, fail: 2 |
Build timed out (after 17 minutes). Marking the build as aborted.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/fi-glk-dsi/igt@gem_exec_flush@basic-wb-rw-before-default.html
(gem_exec_flush:1769) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_flush:1769) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-wb-rw-before-default failed.
Reference on https://patchwork.freedesktop.org/series/34623/

This thing keeps hitting more subtests:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4025/fi-glk-dsi/igt@gem_exec_basic@basic-bsd.html
from dmesg:
<4>[ 125.592642] general protection fault: 0000 [#1] PREEMPT SMP
<0>[ 125.592662] Dumping ftrace buffer:
<0>[ 125.592672] ---------------------------------
<0>[ 125.592763] CPU:3 [LOST 234 EVENTS]
gem_clos-1534 3..s1 68186424us : execlists_submission_tasklet: bcs0 in[0]: ctx=4.1, seqno=11f
...
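The `UnicodeDecodeError` in the piglit traceback above comes from strictly decoding captured test output as UTF-8 when it contains the byte `0xac`. The sketch below reproduces that failure on a made-up byte string and shows one tolerant alternative, decoding with `errors="replace"`. `decode_output` is a hypothetical helper and is not how piglit handles this; it only illustrates the failure mode.

```python
def decode_output(raw, encoding="utf-8"):
    """Decode captured process output without raising on invalid bytes.

    Bytes that are not valid in the given encoding are replaced with
    U+FFFD instead of raising UnicodeDecodeError.
    """
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    # Made-up sample: 0xac cannot start a UTF-8 sequence.
    raw = b"log line \xac rest of log\n"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("strict decode failed:", exc)
    print(decode_output(raw), end="")
```

On Python 3.6 and later, `subprocess.Popen` also accepts `encoding` and `errors` arguments, so a caller can request the same tolerant decoding directly. The traceback shows Python 3.5, where those arguments are not available.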
then there is a softdog: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4025/fi-glk-dsi/igt@gem_exec_basic@basic-bsd1.html https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3429/fi-glk-dsi/igt@kms_chamelium@hdmi-crc-fast.html Something is clearly wrong, glk-dsi is not connected to chamelium. dmesg: <7>[ 342.680647] [drm:gen9_enable_dc5 [i915]] Enabling DC5 <7>[ 342.680721] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 01 <7>[ 342.840269] [IGT] kms_chamelium: executing <7>[ 342.881613] [IGT] kms_chamelium: exiting, ret=77 <7>[ 343.030927] [IGT] kms_chamelium: executing <7>[ 343.069355] [IGT] kms_chamelium: exiting, ret=77 <7>[ 343.234065] [IGT] kms_chamelium: executing <7>[ 343.274821] [IGT] kms_chamelium: exiting, ret=77 <7>[ 343.451939] [IGT] kms_chamelium: executing <7>[ 343.489785] [IGT] kms_chamelium: exiting, ret=77 <7>[ 343.647317] [IGT] kms_chamelium: executing <7>[ 343.689507] [IGT] kms_chamelium: exiting, ret=77 <4>[ 343.823626] general protection fault: 0000 [#1] PREEMPT SMP [ 343.823643] Dumping ftrace buffer: [ 343.823651] --------------------------------- [ 343.823727] CPU:3 [LOST 258392 EVENTS] kms_addf-3908 3..s1 338706705us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=25314 ... <0>[ 343.844386] <idle>-0 1..s1 343766801us : execlists_submission_tasklet: vecs0 cs-irq head=5 [5], tail=5 [5] <0>[ 343.844401] --------------------------------- <4>[ 343.844408] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 snd_pcm mii mei_me mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel <4>[ 343.844476] CPU: 3 PID: 3947 Comm: kms_chamelium Tainted: G U 4.15.0-rc1-CI-CI_DRM_3429+ #1 <4>[ 343.844490] Hardware name: Intel Corp. 
Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 <4>[ 343.844505] task: ffff8801732c8040 task.stack: ffffc90002654000 <4>[ 343.844518] RIP: 0010:do_dentry_open.isra.1+0xf6/0x300 <4>[ 343.844526] RSP: 0018:ffffc90002657d40 EFLAGS: 00010202 <4>[ 343.844535] RAX: 31ffffff30a25d80 RBX: ffff88017737f140 RCX: 0000000000000001 <4>[ 343.844546] RDX: 00000000b6001000 RSI: 0000000000000001 RDI: ffff88016823ed40 <4>[ 343.844576] RBP: ffff88016823e9a0 R08: ffff8801732c8908 R09: 000000004ebbe168 <4>[ 343.844586] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 <4>[ 343.844597] R13: ffff88017737f150 R14: ffffc90002657e38 R15: 0000000000000000 <4>[ 343.844626] FS: 0000000000000000(0000) GS:ffff88017fd80000(0000) knlGS:0000000000000000 <4>[ 343.844638] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 343.844646] CR2: 00007fcd9d3811f0 CR3: 0000000174984000 CR4: 0000000000340ee0 <4>[ 343.844656] Call Trace: <4>[ 343.844665] path_openat+0x281/0x9d0 <4>[ 343.844674] do_filp_open+0x85/0xf0 <4>[ 343.844685] ? __alloc_fd+0xe9/0x200 <4>[ 343.844696] ? 
do_sys_open+0x12b/0x1f0 <4>[ 343.844702] do_sys_open+0x12b/0x1f0 <4>[ 343.844712] entry_SYSCALL_64_fastpath+0x1c/0x89 <4>[ 343.844720] RIP: 0033:0x7fcda07bb7c7 <4>[ 343.844726] RSP: 002b:00007fffc0672af8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002 <4>[ 343.844738] RAX: ffffffffffffffda RBX: 00007fcda097ea38 RCX: 00007fcda07bb7c7 <4>[ 343.844748] RDX: 00007fcda09a29f0 RSI: 0000000000080000 RDI: 00007fcda097ef10 <4>[ 343.844758] RBP: 00007fffc0672b70 R08: 0000000000000000 R09: 00007fffc0672bcf <4>[ 343.844768] R10: 00007fffc0672be0 R11: 0000000000000246 R12: 000000006ffffdff <4>[ 343.844779] R13: 00007fffc0672c58 R14: 000000037ffff1a0 R15: 0000000000000802 <4>[ 343.844793] Code: 00 00 00 01 00 0f b7 45 00 66 25 00 f0 66 2d 00 40 66 a9 00 b0 0f 84 cf 00 00 00 48 8b 85 08 02 00 00 48 85 c0 0f 84 ad 00 00 00 <48> 8b 38 e8 c2 11 f2 ff 84 c0 0f 84 9d 00 00 00 48 8b 85 08 02 <1>[ 343.844877] RIP: do_dentry_open.isra.1+0xf6/0x300 RSP: ffffc90002657d40 <4>[ 343.844939] ---[ end trace a0a331bdf01f2df4 ]--- https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3439/fi-glk-dsi/igt@gem_exec_basic@gtt-default.html [ 115.068452] BUG: unable to handle kernel paging request at 00000000818bc06f [ 115.068476] IP: do_error_trap+0x14/0xa0 [ 115.068491] Oops: 0002 [#1] PREEMPT SMP [ 115.068501] Dumping ftrace buffer: [ 115.068508] --------------------------------- [ 115.068584] CPU:0 [LOST 8035 EVENTS] gem_ctx_-1513 0..s1 68581260us : execlists_submission_tasklet: rcs0 in[0]: ctx=51.1, seqno=490d ... 
<0>[ 115.088691] <idle>-0 1..s1 115050529us : execlists_submission_tasklet: vecs0 out[0]: ctx=3.1, seqno=823 <0>[ 115.088705] --------------------------------- <4>[ 115.088712] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii mei_me prime_numbers mei i2c_hid pinctrl_geminilake pinctrl_intel <4>[ 115.088778] CPU: 0 PID: 1565 Comm: python3 Tainted: G U 4.15.0-rc1-CI-CI_DRM_3439+ #1 <4>[ 115.088791] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 <4>[ 115.088805] task: ffff8801736e8040 task.stack: ffffc90000398000 <4>[ 115.088816] RIP: 0010:do_error_trap+0x14/0xa0 <4>[ 115.088823] RSP: 0018:ffffc9000039b9c8 EFLAGS: 00010246 <4>[ 115.088832] RAX: 00000000818bc077 RBX: 0000000000000001 RCX: 0000000000000006 <4>[ 115.088842] RDX: ffffffff81c62a96 RSI: 0000000000000000 RDI: ffffc9000039b9f8 <4>[ 115.088852] RBP: ffffffffffffffff R08: 0000000000000004 R09: ffffffffffffffff <4>[ 115.088862] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 <4>[ 115.088872] R13: ffffffff81c62a96 R14: 0000000000000004 R15: ffff880171f3c008 <4>[ 115.088883] FS: 0000000000000000(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000 <4>[ 115.088894] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 115.088903] CR2: 00000000818bc06f CR3: 0000000172278000 CR4: 0000000000340ef0 <4>[ 115.088913] Call Trace: <4>[ 115.088922] invalid_op+0x18/0x20 <4>[ 115.088930] RIP: 0010:do_general_protection+0x9/0x1d0 <4>[ 115.088937] RSP: 0018:ffffc9000039baa0 EFLAGS: 00010006 <4>[ 115.088946] RAX: 00000000818bc077 RBX: 0000000000000001 RCX: ffffffff818bc077 <4>[ 115.088956] RDX: ff118801784043c0 RSI: 0000000000000000 RDI: ffffc9000039bac9 <4>[ 115.088966] RBP: ffffffffffffffff R08: 0000000000000000 R09: ffffffffffffffff <4>[ 
115.088976] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 <4>[ 115.088986] R13: ffffc9000039bc90 R14: 00007f2fe855b000 R15: ffff880171f3c008 <4>[ 115.088999] ? native_iret+0x7/0x7 <4>[ 115.089009] general_protection+0x22/0x30 <4>[ 115.089042] RIP: 0010:unmap_page_range+0x46/0x8e0 <4>[ 115.089050] RSP: 0018:ffffc9000039bb78 EFLAGS: 00010206 <4>[ 115.089059] RAX: 00000000000007f0 RBX: ffff88016fca7af0 RCX: 00007f2fe8d5b000 <4>[ 115.089069] RDX: ff118801784043c0 RSI: ffff88016fca7af0 RDI: ffffc9000039bc90 <4>[ 115.089080] RBP: ffffffffffffffff R08: 0000000000000000 R09: 00007f2fe8d5b000 <4>[ 115.089108] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 <4>[ 115.089118] R13: ffffc9000039bc90 R14: 00007f2fe855b000 R15: ffff880171f3c077 <4>[ 115.089133] ? unmap_page_range+0x724/0x8e0 <4>[ 115.089146] unmap_vmas+0x47/0x90 <4>[ 115.089154] exit_mmap+0xa0/0x170 <4>[ 115.089165] mmput+0x5c/0x120 <4>[ 115.089173] flush_old_exec+0x644/0x850 <4>[ 115.089183] load_elf_binary+0x3b1/0x16b3 <4>[ 115.089193] ? __lock_acquire+0x42c/0x15a0 <4>[ 115.089202] ? 
search_binary_handler+0x72/0x1e0 <4>[ 115.089212] search_binary_handler+0x7f/0x1e0 <4>[ 115.089221] do_execveat_common.isra.12+0x658/0x950 <4>[ 115.089232] SyS_execve+0x27/0x30 <4>[ 115.089240] do_syscall_64+0x59/0x1a0 <4>[ 115.089247] entry_SYSCALL64_slow_path+0x25/0x25 <4>[ 115.089255] RIP: 0033:0x7f2ff59cd767 <4>[ 115.089261] RSP: 002b:00007f2fea55b518 EFLAGS: 00000206 ORIG_RAX: 000000000000003b <4>[ 115.089273] RAX: ffffffffffffffda RBX: 00000000000000a8 RCX: 00007f2ff59cd767 <4>[ 115.089283] RDX: 00007f2fe4003200 RSI: 00007f2fe4007580 RDI: 00007f2fe4007540 <4>[ 115.089293] RBP: 00007f2fe4004250 R08: 0000000000000002 R09: 0000000000000000 <4>[ 115.089303] R10: 0000000000000008 R11: 0000000000000206 R12: 00007f2fea596348 <4>[ 115.089313] R13: 0000000000000000 R14: 00007f2fe4003200 R15: 0000000000000000 <4>[ 115.089327] Code: 00 00 ba 02 00 00 00 eb cd 48 8b 85 80 00 00 00 ba 01 00 00 00 eb bf 41 56 41 55 45 89 c6 41 54 55 49 89 f4 53 49 89 d5 48 fc fb <48> 89 2c e8 b4 91 88 00 85 c0 dd 09 80 3d 1d 6a ee 00 75 74 2a <1>[ 115.089411] RIP: do_error_trap+0x14/0xa0 RSP: ffffc9000039b9c8 <4>[ 115.089420] CR2: 00000000818bc06f <4>[ 115.089427] ---[ end trace 25fb74e124e0a858 ]--- <3>[ 115.229290] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34 <3>[ 115.229309] in_atomic(): 0, irqs_disabled(): 1, pid: 1565, name: python3 <4>[ 115.229319] INFO: lockdep is turned off. <4>[ 115.229326] irq event stamp: 1412 <4>[ 115.229339] hardirqs last enabled at (1411): [<ffffffff811793a8>] free_unref_page+0x48/0x60 <4>[ 115.229354] hardirqs last disabled at (1412): [<ffffffff818bc8f6>] error_entry+0x66/0xc0 <4>[ 115.229384] softirqs last enabled at (1336): [<ffffffff818bf29a>] __do_softirq+0x3aa/0x4de <4>[ 115.229399] softirqs last disabled at (1313): [<ffffffff810804ea>] irq_exit+0xaa/0xc0 <4>[ 115.229412] CPU: 0 PID: 1565 Comm: python3 Tainted: G UD 4.15.0-rc1-CI-CI_DRM_3439+ #1 <4>[ 115.229425] Hardware name: Intel Corp. 
Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 <4>[ 115.229440] Call Trace: <4>[ 115.229452] dump_stack+0x5f/0x86 <4>[ 115.229461] ___might_sleep+0x1d9/0x240 <4>[ 115.229471] exit_signals+0x1b/0x2a0 <4>[ 115.229480] do_exit+0x93/0xcc0 <4>[ 115.229490] ? SyS_execve+0x27/0x30 <4>[ 115.229498] rewind_stack_do_exit+0x17/0x20 <7>[ 115.791287] [IGT] gem_exec_basic: executing https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3446/fi-glk-dsi/igt@gem_busy@basic-hang-default.html [ 40.373746] Setting dangerous option reset - tainting kernel [ 40.381284] Setting dangerous option reset - tainting kernel [ 48.444040] java: Corrupted page table at address 7fabb819e100 [ 48.444091] Bad pagetable: 000d [#1] PREEMPT SMP [ 48.444108] Dumping ftrace buffer: [ 48.444119] --------------------------------- [ 48.444221] ksoftirq-29 3..s. 38752448us : execlists_submission_tasklet: rcs0 in[0]: ctx=2.1, seqno=52 ... 48.476354] <idle>-0 1..s1 40476071us : execlists_submission_tasklet: rcs0 csb[2d]: status=0x00000001:0x00000000 [ 48.476380] --------------------------------- [ 48.476392] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul i915 ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 mii snd_pcm mei_me mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel [ 48.476498] CPU: 1 PID: 1085 Comm: java Tainted: G U 4.15.0-rc2-CI-CI_DRM_3446+ #1 [ 48.476519] Hardware name: Intel Corp. 
Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 [ 48.476545] task: 000000007a9b860d task.stack: 00000000695182e3 [ 48.476561] RIP: 0033:0x7fabb8064bd1 [ 48.476572] RSP: 002b:00007fab91084cd8 EFLAGS: 00010206 [ 48.476587] RAX: 0000000000000000 RBX: 00007fab746c1940 RCX: 0000000000000000 [ 48.476604] RDX: 0000000000000010 RSI: 00007fab91084d3f RDI: 000007fab746c993 [ 48.476621] RBP: 00007fab74000020 R08: 00007fabb819e100 R09: 0000000000000001 [ 48.476638] R10: 0000000000000003 R11: 00007fab91084d30 R12: 0000000000007ff0 [ 48.476655] R13: 00007fab746c9930 R14: 0000000000000000 R15: 0000000000003be0 [ 48.476673] FS: 00007fab91085700(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000 [ 48.476692] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 48.476707] CR2: 00007fabb819e100 CR3: 000000016ec92000 CR4: 0000000000340ee0 [ 48.476726] RIP: 0x7fabb8064bd1 RSP: 00007fab91084cd8 [ 48.476741] ---[ end trace 8747c98619fd2a37 ]--- [ 48.585697] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34 [ 48.585729] in_atomic(): 0, irqs_disabled(): 1, pid: 1085, name: java [ 48.585749] INFO: lockdep is turned off. [ 48.585763] irq event stamp: 28302 [ 48.585783] hardirqs last enabled at (28301): [<00000000430c0964>] swapgs_restore_regs_and_return_to_usermode+0x0/0x20 [ 48.585816] hardirqs last disabled at (28302): [<00000000d609682f>] error_entry+0x60/0xc0 [ 48.585842] softirqs last enabled at (28300): [<0000000040735435>] __do_softirq+0x3aa/0x4de [ 48.585870] softirqs last disabled at (28277): [<000000007bcb8038>] irq_exit+0xaa/0xc0 [ 48.585897] CPU: 1 PID: 1085 Comm: java Tainted: G UD 4.15.0-rc2-CI-CI_DRM_3446+ #1 [ 48.585923] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 [ 48.585953] Call Trace: [ 48.585973] dump_stack+0x5f/0x86 [ 48.585989] ___might_sleep+0x1d9/0x240 [ 48.586009] exit_signals+0x1b/0x2a0 [ 48.586026] ? 
oops_end+0x61/0x80 [ 48.586040] do_exit+0x93/0xcc0 [ 48.586054] ? __do_page_fault+0x3e6/0x560 [ 48.586077] rewind_stack_do_exit+0x17/0x20 [ 57.768110] i915 0000:00:02.0: Resetting rcs0 after gpu hang It starts here: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4032/fi-glk-dsi/igt@gem_exec_flush@basic-wb-rw-before-default.html then the issue on this subtest hasn't been seen before: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4032/fi-glk-dsi/igt@prime_busy@basic-wait-before-default.html [ 385.598295] ------------[ cut here ]------------ [ 385.598302] list_del corruption. next->prev should be 0000000071b3538c, but was 000000000ce0e81a [ 385.598328] WARNING: CPU: 1 PID: 2600 at lib/list_debug.c:56 __list_del_entry_valid+0x8a/0x90 [ 385.598331] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp i915 coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii mei_me mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel [ 385.598398] CPU: 1 PID: 2600 Comm: python3 Tainted: G U W 4.15.0-rc2-CI-CI_DRM_3448+ #1 [ 385.598401] Hardware name: Intel Corp. 
Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 [ 385.598403] task: 000000009cfb4ec2 task.stack: 000000008c8f3191 [ 385.598407] RIP: 0010:__list_del_entry_valid+0x8a/0x90 [ 385.598409] RSP: 0018:ffffc9000033bad0 EFLAGS: 00010082 [ 385.598414] RAX: 0000000000000054 RBX: ffff880173bf56e8 RCX: 0000000000000002 [ 385.598417] RDX: 0000000080000002 RSI: ffffffff81ca8e7d RDI: 00000000ffffffff [ 385.598419] RBP: ffffffff81f3d818 R08: 0000000000000000 R09: 0000000000000001 [ 385.598421] R10: 0000000000000000 R11: 0000000000000000 R12: ffffea000530ae60 [ 385.598424] R13: 0000000000000001 R14: ffffea00058bdfc0 R15: 0000000000000014 [ 385.598426] FS: 0000000000000000(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000 [ 385.598429] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 385.598431] CR2: 00007fa4d2b809e0 CR3: 000000016e65e000 CR4: 0000000000340ee0 [ 385.598434] Call Trace: [ 385.598439] release_pages+0x11f/0x370 [ 385.598452] tlb_flush_mmu_free+0x2c/0x50 [ 385.598458] unmap_page_range+0x794/0x8e0 [ 385.598473] unmap_vmas+0x47/0x90 [ 385.598480] exit_mmap+0xa0/0x170 [ 385.598493] mmput+0x5c/0x120 [ 385.598498] flush_old_exec+0x644/0x850 [ 385.598506] load_elf_binary+0x3b1/0x16b3 [ 385.598513] ? __lock_acquire+0x42c/0x15a0 [ 385.598521] ? 
search_binary_handler+0x72/0x1e0 [ 385.598529] search_binary_handler+0x7f/0x1e0 [ 385.598535] do_execveat_common.isra.12+0x658/0x950 [ 385.598544] SyS_execve+0x27/0x30 [ 385.598549] do_syscall_64+0x59/0x1a0 [ 385.598555] entry_SYSCALL64_slow_path+0x25/0x25 [ 385.598559] RIP: 0033:0x7fa4d2b4b767 [ 385.598561] RSP: 002b:00007fa4c76b8138 EFLAGS: 00000246 ORIG_RAX: 000000000000003b [ 385.598566] RAX: ffffffffffffffda RBX: 00000000000000a8 RCX: 00007fa4d2b4b767 [ 385.598568] RDX: 00007ffc0ced82f8 RSI: 00007fa4c000aa70 RDI: 00007fa4c0012b40 [ 385.598571] RBP: 00007fa4c0013040 R08: 0000000000000002 R09: 0000000000000005 [ 385.598573] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fa4c76dcf08 [ 385.598575] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000005 [ 385.598587] Code: ef c1 ff 0f ff 31 c0 c3 48 89 fe 48 c7 c7 28 6b cb 81 e8 1a ef c1 ff 0f ff 31 c0 c3 48 89 fe 48 c7 c7 68 6b cb 81 e8 06 ef c1 ff <0f> ff 31 c0 c3 90 41 57 41 56 41 55 41 54 55 53 48 83 ec 20 9c [ 385.598722] ---[ end trace a55595b2ba3d54dd ]--- Starts with: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3488/fi-glk-dsi/igt@gem_exec_flush@basic-wb-ro-before-default.html (gem_exec_flush:1756) CRITICAL: Test assertion failure function run, file gem_exec_flush.c:323: (gem_exec_flush:1756) CRITICAL: Failed assertion: map[i] == i (gem_exec_flush:1756) CRITICAL: error: 0x8e7c0142 != 0x142 Subtest basic-wb-ro-before-default failed. then: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3488/fi-glk-dsi/igt@gem_exec_flush@basic-wb-ro-default.html (gem_exec_flush:1761) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482: (gem_exec_flush:1761) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-wb-ro-default failed. 
then a lot of unexpected skips, followed by:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3488/fi-glk-dsi/igt@gem_mmap@basic-small-bo.html
[ 221.187721] BUG: unable to handle kernel paging request at 000000007276fe22
[ 221.187743] IP: __lock_acquire+0xb0/0x15a0
[ 221.187753] Oops: 0002 [#1] PREEMPT SMP
[ 221.187761] Dumping ftrace buffer:
[ 221.187766] ---------------------------------
[ 221.187849] CPU:3 [LOST 86041 EVENTS] gem_exec-1765 3..s1 200397128us : execlists_submission_tasklet: rcs0 in[0]: ctx=2.2, seqno=80ab7
followed by a Softdog:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3488/fi-glk-dsi/igt@gem_mmap_gtt@basic.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3498/fi-glk-dsi/igt@gem_exec_store@basic-vebox.html
The result says:
Received signal SIGSEGV.
Stack trace:
#0 [fatal_sig_handler+0x12f]
#1 [killpg+0x40]
#2 [_IO_file_fopen+0x42]
This is yet another weird thing on this machine.

One random memory corruption issue fixed:

commit 7d622351c94172a42bfe9b13bdb0fdc2be90ed3b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Dec 13 09:48:02 2017 +0000

    drm/i915/fence: Use rcu to defer freeing of irq_work

    It is illegal to perform an immediate free of the struct irq_work from
    inside the irq_work callback (as irq_work_run_list modifies work->flags
    after execution of the work->func()). As we use the irq_work to
    coordinate the freeing of the callback from two different softirq
    paths, we need to defer the kfree from inside our irq_work callback,
    for which we can use kfree_rcu.

    Fixes: 81c0ed21aa91 ("drm/i915/fence: Avoid del_timer_sync() from inside a timer")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171213094802.28243-1-chris@chris-wilson.co.uk

Hopefully this explains a lot of weirdness.
It starts with:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/igt@gem_exec_flush@basic-uc-set-default.html
(gem_exec_flush:1725) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
<7>[ 170.959810] [IGT] gem_exec_flush: starting subtest basic-uc-set-default
...
<7>[ 177.768550] missed_breadcrumb rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x5a/0x80 [i915]
...
<6>[ 180.857713] [drm] GPU HANG: ecode 9:0:0x8fdafffa, in gem_exec_flush [1725], reason: Hang on rcs0, action: reset
<7>[ 180.858578] [drm:i915_reset_device [i915]] resetting chip
<5>[ 180.858755] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[ 181.361556] [drm:intel_gpu_reset [i915]] rcs0: timed out on STOP_RING
<3>[ 182.065059] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
then:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/igt@gem_exec_reloc@basic-cpu-read.html
incomplete
pstore looks legit, but most of it is already in dmesg.
from dmesg:
<7>[ 187.664410] [IGT] gem_exec_reloc: exiting, ret=77
<4>[ 187.770081] WARNING: can't dereference iret registers at 00000000b08c140c for ip page_fault+0x7/0x30
<0>[ 187.770083] BUG: stack guard page was hit at 00000000dd82e48c (stack is 00000000ca52e808..0000000085d6198d)
<4>[ 187.770178] kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
<0>[ 187.770191] Dumping ftrace buffer:
<0>[ 187.770199] ---------------------------------
<0>[ 187.770281] CPU:3 [LOST 63587 EVENTS] gem_exec-1729 3..s1 173324421us : execlists_submission_tasklet: rcs0 in[0]: ctx=2.1, seqno=5d0bc
...
<4>[ 187.801374] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul i915 ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel <4>[ 187.801505] CPU: 2 PID: 1377 Comm: python3 Tainted: G U W 4.15.0-rc3-CI-CI_DRM_3511+ #1 <4>[ 187.801531] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 <4>[ 187.801564] RIP: 0010:page_fault+0x7/0x30 <4>[ 187.801578] RSP: 0018:ffffc90001d83fa8 EFLAGS: 00010083 <4>[ 187.801597] RAX: 0000000080000000 RBX: 0000000000000000 RCX: 0000000000000000 <4>[ 187.801618] RDX: 0000000080000610 RSI: 0000000000000000 RDI: ffffc90001d840f8 <4>[ 187.801639] RBP: 0000000080000610 R08: 0000000000000001 R09: 0101010101010101 <4>[ 187.801659] R10: ffffc90001d87a90 R11: 0000000000000000 R12: ffffc90001d840f8 <4>[ 187.801680] R13: ffff8801733c51c0 R14: 0000000000000001 R15: ffff8801733c51c0 <4>[ 187.801702] FS: 00007fac1674e700(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000 <4>[ 187.801726] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 187.801744] CR2: ffffc90001d83f98 CR3: 000000016f353000 CR4: 0000000000340ee0 <4>[ 187.801764] Call Trace: <4>[ 187.801782] ? no_context+0x3dc/0x430 <4>[ 187.801800] ? __do_page_fault+0x196/0x560 ... <1>[ 187.804948] RIP: page_fault+0x7/0x30 RSP: ffffc90001d83fa8 <4>[ 187.804968] ---[ end trace 7832dee94e24beea ]--- <3>[ 188.000284] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34 <3>[ 188.000315] in_atomic(): 1, irqs_disabled(): 1, pid: 1377, name: python3 <4>[ 188.000334] INFO: lockdep is turned off. 
<4>[ 188.000347] irq event stamp: 1180122 <4>[ 188.000367] hardirqs last enabled at (1180121): [<00000000e846d9d1>] get_page_from_freelist+0x24c/0x14c0 <4>[ 188.000395] hardirqs last disabled at (1180122): [<00000000804f94d3>] __slab_alloc.isra.24.constprop.29+0x19/0x70 <4>[ 188.000425] softirqs last enabled at (1179892): [<000000002b075771>] __do_softirq+0x3aa/0x4de <4>[ 188.000451] softirqs last disabled at (1179885): [<00000000a976b967>] irq_exit+0xaa/0xc0 <3>[ 188.000473] Preemption disabled at: <4>[ 188.000478] [<000000005fa92adc>] ist_enter+0x1c/0xa0 <4>[ 188.000507] CPU: 2 PID: 1377 Comm: python3 Tainted: G UD W 4.15.0-rc3-CI-CI_DRM_3511+ #1 <4>[ 188.000531] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 <4>[ 188.000560] Call Trace: <4>[ 188.000578] dump_stack+0x5f/0x86 <4>[ 188.000593] ___might_sleep+0x1d9/0x240 then continue in: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/pstore0-1513197126_Oops_2.log and it is actually a Softdog: <3>[ 188.000473] Preemption disabled at: <4>[ 188.000478] [<000000005fa92adc>] ist_enter+0x1c/0xa0 <4>[ 188.000507] CPU: 2 PID: 1377 Comm: python3 Tainted: G UD W 4.15.0-rc3-CI-CI_DRM_3511+ #1 <4>[ 188.000531] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 <4>[ 188.000560] Call Trace: <4>[ 188.000578] dump_stack+0x5f/0x86 <4>[ 188.000593] ___might_sleep+0x1d9/0x240 <4>[ 188.000610] exit_signals+0x1b/0x2a0 <4>[ 188.000624] do_exit+0x93/0xcc0 <4>[ 188.000638] ? trace_hardirqs_off_caller+0x75/0xd0 <4>[ 188.000654] ? do_syscall_64+0x19/0x1a0 <4>[ 188.000671] rewind_stack_do_exit+0x17/0x20 <6>[ 188.000769] note: python3[1377] exited with preempt_count 1 <12>[ 277.879085] owatch: TIMEOUT! 
<12>[ 277.879252] owatch: timeout for /dev/watchdog0 set to 10 (requested 10)

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3521/fi-glk-dsi/igt@gem_exec_flush@basic-wb-ro-before-default.html
(gem_exec_flush:1763) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3521/fi-glk-dsi/igt@gem_wait@basic-await-all.html
dmesg:
[ 245.788590] general protection fault: 0000 [#1] PREEMPT SMP
[ 245.788604] Dumping ftrace buffer:
[ 245.788610] ---------------------------------
[ 245.788704] CPU:0 [LOST 73316 EVENTS] gem_exec-1768 0..s1 202889461us : execlists_submission_tasklet: rcs0 in[0]: ctx=4.2, seqno=80f1a
...
then a bunch of backtraces that are repeated in the pstore from:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3521/fi-glk-dsi/igt@gem_workarounds@basic-read.html
incomplete

On CI_DRM_3606:
First test with dmesg-warn:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3606/fi-glk-dsi/igt@gem_exec_flush@basic-batch-kernel-default-uc.html
then a lot of the following igt@gem_exec_* tests are hit with dmesg-warn.
It started here:
<4>[ 120.453872] WARNING: CPU: 3 PID: 0 at kernel/sched/core.c:3459 schedule_idle+0x2c/0x30
<4>[ 120.453878] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
<4>[ 120.453969] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G U 4.15.0-rc7-CI-CI_DRM_3606+ #1
<4>[ 120.453973] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[ 120.453979] RIP: 0010:schedule_idle+0x2c/0x30
<4>[ 120.453983] RSP: 0018:ffffc900000d3ee8 EFLAGS: 00010286
<4>[ 120.453991] RAX: ee000000fe000000 RBX: 0000000000000003 RCX: 0000000000000001
<4>[ 120.453995] RDX: 0000000000000000 RSI: ffffffff820ab24f RDI: ffffffff820b8d9d
<4>[ 120.453998] RBP: ffff88017a942740 R08: 0000000000000000 R09: 0000000000000001
<4>[ 120.454002] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8233caf0
<4>[ 120.454006] R13: ffff88017a942740 R14: ffff88017fdab550 R15: ffffffff82293980
<4>[ 120.454010] FS: 0000000000000000(0000) GS:ffff88017fd80000(0000) knlGS:0000000000000000
<4>[ 120.454014] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 120.454018] CR2: 00007f8fab777000 CR3: 000000016f6fc000 CR4: 0000000000340ee0
<4>[ 120.454022] Call Trace:
<4>[ 120.454031] do_idle+0x14b/0x1d0
<4>[ 120.454042] cpu_startup_entry+0x14/0x20
<4>[ 120.454049] start_secondary+0x129/0x160
<4>[ 120.454057] secondary_startup_64+0xa5/0xb0
<4>[ 120.454076] Code: 48 8b 04 25 80 4e 01 00 53 48 8b 40 08 48 85 c0 75 19 65 48 8b 1c 25 80 4e 01 00 31 ff e8 cd f0 ff ff 48 8b 03 a8 08 75 f2 5b c3 <0f> ff eb e3 bf 01 00 00 00 e8 e6 e8 7e ff e8 a1 fb ff ff bf 01
Then there is ~170 MB of various WARNs from: kernel/locking/lockdep.c

*** Bug 103615 has been marked as a duplicate of this bug. ***

Last seen: CI_DRM_3783: 2018-02-16