CI_DRM_3293 fi-glk-dsi igt@gem_exec_nop@basic-parallel failed:

(gem_exec_nop:1922) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_nop:1922) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-parallel failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3293/fi-glk-dsi/igt@gem_exec_nop@basic-parallel.html

NOTE: after this there are a bunch of spuriously skipped tests, and then failures on:

(kms_pipe_crc_basic:2409) igt-gt-CRITICAL: Test assertion failure function igt_force_gpu_reset, file igt_gt.c:406:
(kms_pipe_crc_basic:2409) igt-gt-CRITICAL: Failed assertion: !wedged
(kms_pipe_crc_basic:2409) igt-gt-CRITICAL: Last errno: 9, Bad file descriptor
Subtest hang-read-crc-pipe-B failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3293/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-a.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3293/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-b.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3293/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-c.html

and then there is a dmesg-warn on:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3293/fi-glk-dsi/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html

[ 395.321772] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[ 395.321921] WARN_ON(reset && reset != -19)
[ 395.321959] ------------[ cut here ]------------
[ 395.321995] WARNING: CPU: 2 PID: 2462 at drivers/gpu/drm/i915/i915_gem.c:4725 i915_gem_sanitize+0x52/0x80 [i915]
[ 395.321997] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul i915 ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 mii mei_me snd_pcm prime_numbers mei i2c_hid pinctrl_geminilake pinctrl_intel
[ 395.322070] CPU: 2 PID: 2462 Comm: kworker/u8:7 Tainted: G U 4.14.0-rc6-CI-CI_DRM_3293+ #1
[ 395.322073] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[ 395.322080] Workqueue: events_unbound async_run_entry_fn
[ 395.322084] task: ffff88017807cec0 task.stack: ffffc90002194000
[ 395.322117] RIP: 0010:i915_gem_sanitize+0x52/0x80 [i915]
[ 395.322120] RSP: 0018:ffffc90002197c50 EFLAGS: 00010282
[ 395.322124] RAX: 000000000000001e RBX: ffff880168480000 RCX: 0000000000000006
[ 395.322126] RDX: 0000000000000006 RSI: ffffffff81d0e984 RDI: ffffffff81cc2576
[ 395.322130] RBP: ffffc90002197c60 R08: 0000000000000000 R09: 0000000000000001
[ 395.322132] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880168480070
[ 395.322134] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81cea5ef
[ 395.322136] FS: 0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
[ 395.322139] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 395.322141] CR2: 0000560453047068 CR3: 0000000174e85000 CR4: 00000000003406e0
[ 395.322143] Call Trace:
[ 395.322178] i915_gem_suspend+0x111/0x170 [i915]
[ 395.322208] i915_drm_suspend+0x6d/0x170 [i915]
[ 395.322238] i915_pm_suspend+0x28/0x40 [i915]
[ 395.322246] pci_pm_suspend+0x78/0x140
[ 395.322251] dpm_run_callback+0x6f/0x310
[ 395.322255] ? pci_pm_freeze+0xf0/0xf0
[ 395.322260] __device_suspend+0x102/0x380
[ 395.322264] ? dpm_watchdog_set+0x70/0x70
[ 395.322270] async_suspend+0x1f/0xa0
[ 395.322274] async_run_entry_fn+0x38/0x160
[ 395.322279] process_one_work+0x221/0x650
[ 395.322286] worker_thread+0x4e/0x3b0
[ 395.322292] kthread+0x114/0x150
[ 395.322294] ? process_one_work+0x650/0x650
[ 395.322297] ? kthread_create_on_node+0x40/0x40
[ 395.322303] ret_from_fork+0x27/0x40
[ 395.322311] Code: 5d c3 be ff ff ff ff 48 89 df e8 da dc 02 00 85 c0 74 ea 83 f8 ed 74 e5 48 c7 c6 c8 3a 23 a0 48 c7 c7 dc 0a 22 a0 e8 0f 51 fb e0 <0f> ff eb ce 4c 8d 67 70 31 f6 4c 89 e7 e8 8c 00 7d e1 48 89 df
[ 395.322431] ---[ end trace 82f68684f9edfc24 ]---

Why are we seeing this inconsistent behavior when the GPU is wedged?
Also, see bug 102848. On the shards it would be very time consuming to find these patterns.
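Since finding these patterns by hand is time consuming, a small script could flag them in collected results. The following is only a sketch under stated assumptions: the function name classify_run and the input format (an ordered list of test name and combined igt/dmesg text) are hypothetical and are not part of any existing CI tooling. The regular expressions are copied from the log excerpts in this bug.

```python
import re

# Signatures copied from the logs in this report. Grouping them into three
# stages (hang, failed reset, wedged) is an assumption made for illustration.
HANG = re.compile(r'Failed assertion: !"GPU hung"')
RESET_FAILED = re.compile(r"reset request timeout|Failed to reset chip: -\d+")
WEDGED = re.compile(r"Failed assertion: !wedged")


def classify_run(results):
    """Find the first test showing each stage of the pattern.

    results: ordered list of (test_name, text) tuples, where text is the
    combined igt output and dmesg for that test (hypothetical input format).
    Returns a dict mapping stage name to test name, or None if not seen.
    """
    found = {"hang": None, "reset_failed": None, "wedged": None}
    for name, text in results:
        if found["hang"] is None and HANG.search(text):
            found["hang"] = name
        # Follow-up stages only count once a hang has been seen.
        if found["hang"] is not None:
            if found["reset_failed"] is None and RESET_FAILED.search(text):
                found["reset_failed"] = name
            if found["wedged"] is None and WEDGED.search(text):
                found["wedged"] = name
    return found
```

A run where all three entries are set matches the sequence described in this report; a run where only "hang" is set is a hang with no visible after-effects.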
Here is a Patchwork example. Although the patch was about enabling runtime_pm, the same pattern repeats itself, starting at:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6996/fi-glk-dsi/igt@gem_exec_nop@basic-series.html

fail:
(gem_exec_nop:1852) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_nop:1852) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-series failed.

dmesg:
[ 226.781384] Setting dangerous option reset - tainting kernel
[ 243.856241] i915 0000:00:02.0: Resetting chip after gpu hang
[ 245.063521] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[ 245.063716] [drm:i915_reset [i915]] *ERROR* Failed to reset chip: -5

then:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6996/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-a.html
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6996/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-b.html
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6996/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-c.html

fail:
(kms_pipe_crc_basic:2326) igt-gt-CRITICAL: Test assertion failure function igt_force_gpu_reset, file igt_gt.c:406:
(kms_pipe_crc_basic:2326) igt-gt-CRITICAL: Failed assertion: !wedged
(kms_pipe_crc_basic:2326) igt-gt-CRITICAL: Last errno: 9, Bad file descriptor
Subtest hang-read-crc-pipe-A failed.

and finally:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6996/fi-glk-dsi/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html

dmesg-warn:
[ 375.382594] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[ 375.382740] WARN_ON(reset && reset != -19)
[ 375.382776] ------------[ cut here ]------------
[ 375.382812] WARNING: CPU: 2 PID: 2383 at drivers/gpu/drm/i915/i915_gem.c:4724 i915_gem_sanitize+0x52/0x80 [i915]
[ 375.382814] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
[ 375.382892] CPU: 2 PID: 2383 Comm: kworker/u8:7 Tainted: G U 4.14.0-rc8-CI-Patchwork_6996+ #1
[ 375.382894] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[ 375.382899] Workqueue: events_unbound async_run_entry_fn
[ 375.382903] task: ffff88016e980040 task.stack: ffffc90002164000
[ 375.382937] RIP: 0010:i915_gem_sanitize+0x52/0x80 [i915]
[ 375.382939] RSP: 0018:ffffc90002167c58 EFLAGS: 00010292
[ 375.382943] RAX: 000000000000001e RBX: ffff880166e00000 RCX: 0000000000000006
[ 375.382945] RDX: 0000000000000006 RSI: ffffffff81d0ed64 RDI: ffffffff81cc294e
[ 375.382947] RBP: ffffc90002167c68 R08: 0000000000000000 R09: 0000000000000001
[ 375.382949] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880166e00070
[ 375.382951] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81cea9cf
[ 375.382954] FS: 0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
[ 375.382957] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 375.382959] CR2: 000055b4ccc93068 CR3: 0000000003e0f000 CR4: 00000000003406e0
[ 375.382961] Call Trace:
[ 375.382997] i915_gem_suspend+0x111/0x170 [i915]
[ 375.383027] i915_drm_suspend+0x6d/0x170 [i915]
[ 375.383057] i915_pm_suspend+0x28/0x40 [i915]
[ 375.383063] pci_pm_suspend+0x78/0x140
[ 375.383068] dpm_run_callback+0x6f/0x310
[ 375.383072] ? pci_pm_freeze+0xf0/0xf0
[ 375.383077] __device_suspend+0x102/0x380
[ 375.383081] ? dpm_watchdog_set+0x70/0x70
[ 375.383087] async_suspend+0x1f/0xa0
[ 375.383091] async_run_entry_fn+0x38/0x160
[ 375.383096] process_one_work+0x221/0x650
[ 375.383103] worker_thread+0x4e/0x3c0
[ 375.383108] kthread+0x114/0x150
[ 375.383111] ? process_one_work+0x650/0x650
[ 375.383114] ? kthread_create_on_node+0x40/0x40
[ 375.383119] ret_from_fork+0x27/0x40
[ 375.383127] Code: 5d c3 be ff ff ff ff 48 89 df e8 2a e3 02 00 85 c0 74 ea 83 f8 ed 74 e5 48 c7 c6 58 aa 26 a0 48 c7 c7 a4 7a 25 a0 e8 4f df f7 e0 <0f> ff eb ce 4c 8d 67 70 31 f6 4c 89 e7 e8 fc a0 79 e1 48 89 df
[ 375.383247] ---[ end trace 7978d132da79d715 ]---
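The numeric codes in the logs above (-19 in the WARN_ON, -5 from i915_reset, errno 9 from igt) are standard errno values. As a quick reference, a minimal Python sketch can translate them; the helper name describe is hypothetical. The last lines decode the `ed` byte that follows `83 f8` in the Code: dump as a signed 8-bit value; reading that as the -19 comparison from the WARN_ON is my interpretation of the bytes, not something the log states.

```python
import errno
import os


def describe(code):
    """Return (symbolic name, message) for a positive or negative errno value."""
    number = abs(code)  # the kernel reports errors as negative errno values
    return errno.errorcode.get(number, "UNKNOWN"), os.strerror(number)


for code in (-19, -5, 9):
    name, message = describe(code)
    print(code, name, message)

# 0xed interpreted as a signed 8-bit immediate (assumption: it is the operand
# of the compare against the value excluded by the WARN_ON).
imm8 = int.from_bytes(bytes([0xED]), "little", signed=True)
print(imm8, describe(imm8)[0])
```

Under this reading, the WARN_ON fires for any reset failure other than ENODEV, which fits the preceding "reset request timeout" line.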
On CI_DRM_3321 the issue started here:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3321/fi-glk-dsi/igt@gem_exec_flush@basic-batch-kernel-default-uc.html

(gem_exec_flush:1727) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_flush:1727) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-batch-kernel-default-uc failed.

Then dmesg is filled with:
[ 144.151041] [drm:gen8_irq_handler [i915]] *ERROR* Fault errors on pipe A: 0x00000080

This causes many of the following tests to be either skipped or marked dmesg-warn. But then, starting from:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3321/fi-glk-dsi/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b.html
all the following tests pass.

However, the machine did not boot up for CI_DRM_3322 and CI_DRM_3323. When it was rebooted manually in the lab, the display was full of garbage.
Here is another example:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7093/fi-glk-dsi/igt@gem_exec_flush@basic-wb-prw-default.html

(gem_exec_flush:1775) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_flush:1775) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-wb-prw-default failed.

All tests are skipped until:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7093/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-a.html

(kms_pipe_crc_basic:2273) igt-gt-CRITICAL: Test assertion failure function igt_force_gpu_reset, file igt_gt.c:406:
(kms_pipe_crc_basic:2273) igt-gt-CRITICAL: Failed assertion: !wedged
(kms_pipe_crc_basic:2273) igt-gt-CRITICAL: Last errno: 9, Bad file descriptor
Subtest hang-read-crc-pipe-A failed.

The following tests then fail, and the run goes incomplete on:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7093/fi-glk-dsi/igt@pm_rpm@basic-rte.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_491/fi-glk-dsi/igt@gem_ctx_switch@basic-default.html

(gem_ctx_switch:1560) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_ctx_switch:1560) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-default failed.

After that, a lot of tests are skipped, some fail, some more are skipped, and this time there is actually no incomplete.
Here is another: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3976/fi-glk-dsi/igt@gem_exec_flush@basic-wb-pro-default.html
Raising priority since this is BAT.
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_490/fi-glk-dsi/igt@gem_exec_flush@basic-uc-set-default.html

fail:
(gem_exec_flush:1775) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_flush:1775) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-uc-set-default failed.

then a bunch of skips, then fail on:
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_490/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-a.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_490/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-b.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_490/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-c.html

due to:
(kms_pipe_crc_basic:2287) igt-gt-CRITICAL: Failed assertion: !wedged

then:
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_490/fi-glk-dsi/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html

[ 343.853966] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[ 343.854132] WARN_ON(reset && reset != -19)
[ 343.854169] ------------[ cut here ]------------
[ 343.854205] WARNING: CPU: 2 PID: 2335 at drivers/gpu/drm/i915/i915_gem.c:4724 i915_gem_sanitize+0x52/0x80 [i915]
[ 343.854207] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal i915 intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
[ 343.854263] CPU: 2 PID: 2335 Comm: kworker/u8:4 Tainted: G U 4.14.0-rc8-CI-CI_DRM_3331+ #1
[ 343.854265] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[ 343.854273] Workqueue: events_unbound async_run_entry_fn
[ 343.854277] task: ffff88017a41cec0 task.stack: ffffc90002218000
[ 343.854309] RIP: 0010:i915_gem_sanitize+0x52/0x80 [i915]
[ 343.854312] RSP: 0018:ffffc9000221bc58 EFLAGS: 00010292
[ 343.854316] RAX: 000000000000001e RBX: ffff880167100000 RCX: 0000000000000006
[ 343.854318] RDX: 0000000000000006 RSI: ffffffff81d11314 RDI: ffffffff81cc3dee
[ 343.854320] RBP: ffffc9000221bc68 R08: 0000000000000000 R09: 0000000000000001
[ 343.854322] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880167100070
[ 343.854324] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81cecf7f
[ 343.854327] FS: 0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
[ 343.854329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 343.854331] CR2: 000055b19f0a4068 CR3: 0000000003e0f000 CR4: 00000000003406e0
[ 343.854334] Call Trace:
[ 343.854368] i915_gem_suspend+0x111/0x170 [i915]
[ 343.854398] i915_drm_suspend+0x6d/0x170 [i915]
[ 343.854429] i915_pm_suspend+0x28/0x40 [i915]
[ 343.854434] pci_pm_suspend+0x78/0x140
[ 343.854440] dpm_run_callback+0x6f/0x310
[ 343.854444] ? pci_pm_freeze+0xf0/0xf0
[ 343.854449] __device_suspend+0x102/0x380
[ 343.854453] ? dpm_watchdog_set+0x70/0x70
[ 343.854460] async_suspend+0x1f/0xa0
[ 343.854463] async_run_entry_fn+0x38/0x160
[ 343.854469] process_one_work+0x221/0x650
[ 343.854475] worker_thread+0x4e/0x3c0
[ 343.854481] kthread+0x114/0x150
[ 343.854484] ? process_one_work+0x650/0x650
[ 343.854487] ? kthread_create_on_node+0x40/0x40
[ 343.854493] ret_from_fork+0x27/0x40
[ 343.854501] Code: 5d c3 be ff ff ff ff 48 89 df e8 ca e5 02 00 85 c0 74 ea 83 f8 ed 74 e5 48 c7 c6 a8 fa 24 a0 48 c7 c7 a4 ca 23 a0 e8 6f 90 f9 e0 <0f> ff eb ce 4c 8d 67 70 31 f6 4c 89 e7 e8 9c a1 7b e1 48 89 df
[ 343.854622] ---[ end trace 92cf58358a865d76 ]---
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7201/fi-glk-dsi/igt@gem_exec_flush@basic-uc-pro-default.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3366/fi-glk-dsi/igt@gem_sync@basic-store-each.html

(gem_sync:3613) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_sync:3613) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-store-each failed.

The above doesn't cause any after-effects, but that may be because the tests that run after it are not sensitive to a wedged GPU.
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7208/fi-glk-dsi/igt@gem_ctx_switch@basic-default-heavy.html
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7217/fi-glk-dsi/igt@gem_exec_flush@basic-batch-kernel-default-wb.html

(gem_exec_flush:1784) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_flush:1784) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-batch-kernel-default-wb failed.

[ 152.381045] Setting dangerous option reset - tainting kernel
[ 164.786879] i915 0000:00:02.0: Resetting chip after gpu hang
[ 165.992480] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[ 165.992686] [drm:i915_reset [i915]] *ERROR* Failed to reset chip: -5

then a lot of skips and incomplete on:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7217/fi-glk-dsi/igt@gem_sync@basic-store-each.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3371/fi-glk-dsi/igt@gem_sync@basic-store-all.html (gem_sync:3559) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484: (gem_sync:3559) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-store-all failed. no other tests affected.
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4005/fi-glk-dsi/igt@gem_sync@basic-all.html (gem_sync:3498) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484: (gem_sync:3498) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-all failed.
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3375/fi-glk-dsi/igt@gem_cs_tlb@basic-default.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3377/fi-glk-dsi/igt@gem_sync@basic-many-each.html (gem_sync:3475) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484: (gem_sync:3475) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-many-each failed. Note this dmesg has more information: [ 314.719246] Setting dangerous option reset - tainting kernel [ 316.762341] general protection fault: 0000 [#1] PREEMPT SMP [ 316.762366] Dumping ftrace buffer: [ 316.762379] --------------------------------- [ 316.762487] CPU:3 [LOST 255731 EVENTS] gem_sync-3495 3..s1 316076943us : execlists_submission_tasklet: bcs0 in[0]: ctx=2.2, seqno=238ed ... [ 316.795368] gem_sync-3526 1..s. 316804524us : execlists_submission_tasklet: bcs0 out[0]: ctx=2.2, seqno=239ff [ 316.795396] --------------------------------- [ 316.795411] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mei mii prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel [ 316.795535] CPU: 3 PID: 3483 Comm: gem_sync Tainted: G U 4.14.0-CI-CI_DRM_3377+ #1 [ 316.795559] Hardware name: Intel Corp. 
Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 [ 316.795588] task: ffff8801719a2780 task.stack: ffffc90000df4000 [ 316.795612] RIP: 0010:blk_flush_plug_list+0x54/0x270 [ 316.795628] RSP: 0018:ffffc90000df7b38 EFLAGS: 00010292 [ 316.795645] RAX: 0010000000000020 RBX: ffffc90000df7b50 RCX: 0000000000000000 [ 316.795665] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 0010000000000000 [ 316.795685] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000001 [ 316.795704] R10: 0000000000000000 R11: 0000000000000000 R12: dead000000000200 [ 316.795724] R13: dead000000000100 R14: 0010000000000000 R15: ffff8801681eec40 [ 316.795745] FS: 00007fd4aa2dc700(0000) GS:ffff88017fd80000(0000) knlGS:0000000000000000 [ 316.795768] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 316.795785] CR2: 00007fd4af1b6010 CR3: 0000000171ee0000 CR4: 00000000003406e0 [ 316.795804] Call Trace: [ 316.795822] io_schedule_prepare+0x3c/0x40 [ 316.795839] io_schedule_timeout+0xf/0x40 [ 316.795911] i915_wait_request+0x33c/0x830 [i915] [ 316.795932] ? wake_up_q+0x70/0x70 [ 316.795945] ? wake_up_q+0x70/0x70 [ 316.796017] i915_gem_object_wait_fence+0xc8/0xe0 [i915] [ 316.796090] i915_gem_object_wait+0x282/0x3d0 [i915] [ 316.796164] i915_gem_wait_ioctl+0x10f/0x280 [i915] [ 316.796235] ? i915_gem_unset_wedged+0x180/0x180 [i915] [ 316.796254] drm_ioctl_kernel+0x65/0xb0 [ 316.796269] drm_ioctl+0x295/0x340 [ 316.796337] ? i915_gem_unset_wedged+0x180/0x180 [i915] [ 316.796355] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 316.796374] ? lock_acquire+0xaf/0x200 [ 316.796389] ? __fget+0xe4/0x1f0 [ 316.796405] do_vfs_ioctl+0x8f/0x670 [ 316.796420] ? 
__fget+0x101/0x1f0 [ 316.796435] SyS_ioctl+0x3b/0x70 [ 316.796450] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 316.796465] RIP: 0033:0x7fd4ad5fc587 [ 316.796477] RSP: 002b:00007fd4aa2dbce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 316.796501] RAX: ffffffffffffffda RBX: ffffc90000df7ff0 RCX: 00007fd4ad5fc587 [ 316.796520] RDX: 00007fd4aa2dbd20 RSI: 00000000c010646c RDI: 0000000000000003 [ 316.796540] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000000c [ 316.796560] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000046 [ 316.796579] R13: 00007ffff3b389ff R14: 00007fd4aa2dc9c0 R15: 00007fd4aa2dc700 [ 316.796604] Code: de 48 83 ec 28 48 8d 44 24 08 48 8d 5c 24 18 48 89 44 24 08 48 89 44 24 10 48 8d 47 20 48 89 5c 24 18 48 89 5c 24 20 48 89 04 24 <49> 8b 46 20 48 39 04 24 74 6d 49 8b 46 20 48 8b 34 24 48 39 c6 [ 316.796770] RIP: blk_flush_plug_list+0x54/0x270 RSP: ffffc90000df7b38 [ 316.860599] ---[ end trace 8af6cac31fbe4619 ]--- [ 321.627562] i915 0000:00:02.0: Resetting chip after gpu hang
This is probably the aftermath of https://bugs.freedesktop.org/show_bug.cgi?id=103514#c15

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3377/fi-glk-dsi/igt@pm_backlight@basic-brightness.html

(pm_backlight:4059) igt-kms-CRITICAL: Test assertion failure function do_display_commit, file igt_kms.c:2895:
(pm_backlight:4059) igt-kms-CRITICAL: Failed assertion: ret == 0
(pm_backlight:4059) igt-kms-CRITICAL: Last errno: 13, Permission denied
(pm_backlight:4059) igt-kms-CRITICAL: error: -13 != 0
Test pm_backlight failed.
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_531/fi-glk-dsi/igt@gem_exec_fence@await-hang-default.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3381/fi-glk-dsi/igt@gem_exec_flush@basic-uc-ro-default.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3384/fi-glk-dsi/igt@gem_exec_flush@basic-wb-ro-before-default.html (gem_exec_flush:1790) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484: (gem_exec_flush:1790) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-wb-ro-before-default failed. then the usual skipping and failing.
CI_DRM_3388
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3388/fi-glk-dsi/igt@gem_ctx_switch@basic-default-heavy.html

(gem_ctx_switch:1525) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_ctx_switch:1525) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-default-heavy failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3388/fi-glk-dsi/igt@gem_sync@basic-store-each.html

(gem_sync:3587) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_sync:3587) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-store-each failed.

dmesg:
[ 339.327621] [drm:fw_domains_get_with_fallback [i915]] *ERROR* blitter: timed out waiting for forcewake ack request.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3388/fi-glk-dsi/igt@gem_tiled_fence_blits@basic.html
[ 353.875381] BUG: stack guard page was hit at ffffc90000923fb8 (stack is ffffc90000924000..ffffc90000927fff)

then incomplete:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3388/fi-glk-dsi/igt@gem_tiled_pread_basic.html

run.log:
running: igt/gem_tiled_pread_basic
[154/289] skip: 14, pass: 137, fail: 1, dmesg-fail: 2 - owatch: TIMEOUT!
owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3390/fi-glk-dsi/igt@gem_ctx_switch@basic-default-heavy.html

typical start:
(gem_ctx_switch:1590) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_ctx_switch:1590) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-default-heavy failed.

Then this looks like an igt/piglit error:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3390/fi-glk-dsi/igt@gem_exec_parallel@basic.html

Exception
<class 'UnicodeDecodeError'>'utf-8' codec can't decode byte 0xac in position 109087: invalid start byte
Traceback
  File "/opt/igt/piglit/framework/test/base.py", line 205, in execute
    self.run()
  File "/opt/igt/piglit/framework/test/base.py", line 271, in run
    self._run_command()
  File "/opt/igt/piglit/framework/test/base.py", line 338, in _run_command
    out, err = proc.communicate(timeout=self.timeout)
  File "/usr/lib/python3.5/subprocess.py", line 801, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.5/subprocess.py", line 1488, in _communicate
    self.stderr.encoding)
  File "/usr/lib/python3.5/subprocess.py", line 705, in _translate_newlines
    data = data.decode(encoding)

then incomplete on:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3390/fi-glk-dsi/igt@kms_addfb_basic@bad-pitch-63.html

run.log:
running: igt/kms_addfb_basic/bad-pitch-63
[172/289] skip: 69, pass: 101, fail: 2 |
Build timed out (after 17 minutes). Marking the build as aborted.
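The UnicodeDecodeError in the traceback above comes from a strict UTF-8 decode of the test's output, which apparently contained a stray 0xac byte. The snippet below only illustrates that failure mode and one common way to tolerate it; it is a sketch, not piglit's code, and the helper name decode_output is hypothetical. Whether piglit should replace or preserve such bytes is a separate decision.

```python
def decode_output(raw, encoding="utf-8"):
    """Decode captured test output without raising on invalid bytes.

    Invalid sequences become U+FFFD, so one garbage byte in a large log
    does not turn an otherwise usable result into a harness exception.
    """
    return raw.decode(encoding, errors="replace")


raw = b"igt output \xac more output"  # 0xac cannot start a UTF-8 sequence

try:
    raw.decode("utf-8")  # strict decoding, as in the traceback above
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

print(decode_output(raw))
```

With errors="replace" the harness would still record the output, and the replacement character marks where the corrupted byte was.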
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/fi-glk-dsi/igt@gem_exec_flush@basic-wb-rw-before-default.html (gem_exec_flush:1769) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484: (gem_exec_flush:1769) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-wb-rw-before-default failed.
Referenced in https://patchwork.freedesktop.org/series/34623/
This thing keeps hitting more subtests:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4025/fi-glk-dsi/igt@gem_exec_basic@basic-bsd.html

from dmesg:
<4>[ 125.592642] general protection fault: 0000 [#1] PREEMPT SMP
<0>[ 125.592662] Dumping ftrace buffer:
<0>[ 125.592672] ---------------------------------
<0>[ 125.592763] CPU:3 [LOST 234 EVENTS] gem_clos-1534 3..s1 68186424us : execlists_submission_tasklet: bcs0 in[0]: ctx=4.1, seqno=11f
...

then there is a softdog:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4025/fi-glk-dsi/igt@gem_exec_basic@basic-bsd1.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3429/fi-glk-dsi/igt@kms_chamelium@hdmi-crc-fast.html Something is clearly wrong, glk-dsi is not connected to chamelium. dmesg: <7>[ 342.680647] [drm:gen9_enable_dc5 [i915]] Enabling DC5 <7>[ 342.680721] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 01 <7>[ 342.840269] [IGT] kms_chamelium: executing <7>[ 342.881613] [IGT] kms_chamelium: exiting, ret=77 <7>[ 343.030927] [IGT] kms_chamelium: executing <7>[ 343.069355] [IGT] kms_chamelium: exiting, ret=77 <7>[ 343.234065] [IGT] kms_chamelium: executing <7>[ 343.274821] [IGT] kms_chamelium: exiting, ret=77 <7>[ 343.451939] [IGT] kms_chamelium: executing <7>[ 343.489785] [IGT] kms_chamelium: exiting, ret=77 <7>[ 343.647317] [IGT] kms_chamelium: executing <7>[ 343.689507] [IGT] kms_chamelium: exiting, ret=77 <4>[ 343.823626] general protection fault: 0000 [#1] PREEMPT SMP [ 343.823643] Dumping ftrace buffer: [ 343.823651] --------------------------------- [ 343.823727] CPU:3 [LOST 258392 EVENTS] kms_addf-3908 3..s1 338706705us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=25314 ... <0>[ 343.844386] <idle>-0 1..s1 343766801us : execlists_submission_tasklet: vecs0 cs-irq head=5 [5], tail=5 [5] <0>[ 343.844401] --------------------------------- <4>[ 343.844408] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 snd_pcm mii mei_me mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel <4>[ 343.844476] CPU: 3 PID: 3947 Comm: kms_chamelium Tainted: G U 4.15.0-rc1-CI-CI_DRM_3429+ #1 <4>[ 343.844490] Hardware name: Intel Corp. 
Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 <4>[ 343.844505] task: ffff8801732c8040 task.stack: ffffc90002654000 <4>[ 343.844518] RIP: 0010:do_dentry_open.isra.1+0xf6/0x300 <4>[ 343.844526] RSP: 0018:ffffc90002657d40 EFLAGS: 00010202 <4>[ 343.844535] RAX: 31ffffff30a25d80 RBX: ffff88017737f140 RCX: 0000000000000001 <4>[ 343.844546] RDX: 00000000b6001000 RSI: 0000000000000001 RDI: ffff88016823ed40 <4>[ 343.844576] RBP: ffff88016823e9a0 R08: ffff8801732c8908 R09: 000000004ebbe168 <4>[ 343.844586] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 <4>[ 343.844597] R13: ffff88017737f150 R14: ffffc90002657e38 R15: 0000000000000000 <4>[ 343.844626] FS: 0000000000000000(0000) GS:ffff88017fd80000(0000) knlGS:0000000000000000 <4>[ 343.844638] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 343.844646] CR2: 00007fcd9d3811f0 CR3: 0000000174984000 CR4: 0000000000340ee0 <4>[ 343.844656] Call Trace: <4>[ 343.844665] path_openat+0x281/0x9d0 <4>[ 343.844674] do_filp_open+0x85/0xf0 <4>[ 343.844685] ? __alloc_fd+0xe9/0x200 <4>[ 343.844696] ? 
do_sys_open+0x12b/0x1f0 <4>[ 343.844702] do_sys_open+0x12b/0x1f0 <4>[ 343.844712] entry_SYSCALL_64_fastpath+0x1c/0x89 <4>[ 343.844720] RIP: 0033:0x7fcda07bb7c7 <4>[ 343.844726] RSP: 002b:00007fffc0672af8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002 <4>[ 343.844738] RAX: ffffffffffffffda RBX: 00007fcda097ea38 RCX: 00007fcda07bb7c7 <4>[ 343.844748] RDX: 00007fcda09a29f0 RSI: 0000000000080000 RDI: 00007fcda097ef10 <4>[ 343.844758] RBP: 00007fffc0672b70 R08: 0000000000000000 R09: 00007fffc0672bcf <4>[ 343.844768] R10: 00007fffc0672be0 R11: 0000000000000246 R12: 000000006ffffdff <4>[ 343.844779] R13: 00007fffc0672c58 R14: 000000037ffff1a0 R15: 0000000000000802 <4>[ 343.844793] Code: 00 00 00 01 00 0f b7 45 00 66 25 00 f0 66 2d 00 40 66 a9 00 b0 0f 84 cf 00 00 00 48 8b 85 08 02 00 00 48 85 c0 0f 84 ad 00 00 00 <48> 8b 38 e8 c2 11 f2 ff 84 c0 0f 84 9d 00 00 00 48 8b 85 08 02 <1>[ 343.844877] RIP: do_dentry_open.isra.1+0xf6/0x300 RSP: ffffc90002657d40 <4>[ 343.844939] ---[ end trace a0a331bdf01f2df4 ]---
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3439/fi-glk-dsi/igt@gem_exec_basic@gtt-default.html [ 115.068452] BUG: unable to handle kernel paging request at 00000000818bc06f [ 115.068476] IP: do_error_trap+0x14/0xa0 [ 115.068491] Oops: 0002 [#1] PREEMPT SMP [ 115.068501] Dumping ftrace buffer: [ 115.068508] --------------------------------- [ 115.068584] CPU:0 [LOST 8035 EVENTS] gem_ctx_-1513 0..s1 68581260us : execlists_submission_tasklet: rcs0 in[0]: ctx=51.1, seqno=490d ... <0>[ 115.088691] <idle>-0 1..s1 115050529us : execlists_submission_tasklet: vecs0 out[0]: ctx=3.1, seqno=823 <0>[ 115.088705] --------------------------------- <4>[ 115.088712] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii mei_me prime_numbers mei i2c_hid pinctrl_geminilake pinctrl_intel <4>[ 115.088778] CPU: 0 PID: 1565 Comm: python3 Tainted: G U 4.15.0-rc1-CI-CI_DRM_3439+ #1 <4>[ 115.088791] Hardware name: Intel Corp. 
Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[ 115.088805] task: ffff8801736e8040 task.stack: ffffc90000398000
<4>[ 115.088816] RIP: 0010:do_error_trap+0x14/0xa0
<4>[ 115.088823] RSP: 0018:ffffc9000039b9c8 EFLAGS: 00010246
<4>[ 115.088832] RAX: 00000000818bc077 RBX: 0000000000000001 RCX: 0000000000000006
<4>[ 115.088842] RDX: ffffffff81c62a96 RSI: 0000000000000000 RDI: ffffc9000039b9f8
<4>[ 115.088852] RBP: ffffffffffffffff R08: 0000000000000004 R09: ffffffffffffffff
<4>[ 115.088862] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
<4>[ 115.088872] R13: ffffffff81c62a96 R14: 0000000000000004 R15: ffff880171f3c008
<4>[ 115.088883] FS: 0000000000000000(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000
<4>[ 115.088894] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 115.088903] CR2: 00000000818bc06f CR3: 0000000172278000 CR4: 0000000000340ef0
<4>[ 115.088913] Call Trace:
<4>[ 115.088922] invalid_op+0x18/0x20
<4>[ 115.088930] RIP: 0010:do_general_protection+0x9/0x1d0
<4>[ 115.088937] RSP: 0018:ffffc9000039baa0 EFLAGS: 00010006
<4>[ 115.088946] RAX: 00000000818bc077 RBX: 0000000000000001 RCX: ffffffff818bc077
<4>[ 115.088956] RDX: ff118801784043c0 RSI: 0000000000000000 RDI: ffffc9000039bac9
<4>[ 115.088966] RBP: ffffffffffffffff R08: 0000000000000000 R09: ffffffffffffffff
<4>[ 115.088976] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
<4>[ 115.088986] R13: ffffc9000039bc90 R14: 00007f2fe855b000 R15: ffff880171f3c008
<4>[ 115.088999] ? native_iret+0x7/0x7
<4>[ 115.089009] general_protection+0x22/0x30
<4>[ 115.089042] RIP: 0010:unmap_page_range+0x46/0x8e0
<4>[ 115.089050] RSP: 0018:ffffc9000039bb78 EFLAGS: 00010206
<4>[ 115.089059] RAX: 00000000000007f0 RBX: ffff88016fca7af0 RCX: 00007f2fe8d5b000
<4>[ 115.089069] RDX: ff118801784043c0 RSI: ffff88016fca7af0 RDI: ffffc9000039bc90
<4>[ 115.089080] RBP: ffffffffffffffff R08: 0000000000000000 R09: 00007f2fe8d5b000
<4>[ 115.089108] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
<4>[ 115.089118] R13: ffffc9000039bc90 R14: 00007f2fe855b000 R15: ffff880171f3c077
<4>[ 115.089133] ? unmap_page_range+0x724/0x8e0
<4>[ 115.089146] unmap_vmas+0x47/0x90
<4>[ 115.089154] exit_mmap+0xa0/0x170
<4>[ 115.089165] mmput+0x5c/0x120
<4>[ 115.089173] flush_old_exec+0x644/0x850
<4>[ 115.089183] load_elf_binary+0x3b1/0x16b3
<4>[ 115.089193] ? __lock_acquire+0x42c/0x15a0
<4>[ 115.089202] ? search_binary_handler+0x72/0x1e0
<4>[ 115.089212] search_binary_handler+0x7f/0x1e0
<4>[ 115.089221] do_execveat_common.isra.12+0x658/0x950
<4>[ 115.089232] SyS_execve+0x27/0x30
<4>[ 115.089240] do_syscall_64+0x59/0x1a0
<4>[ 115.089247] entry_SYSCALL64_slow_path+0x25/0x25
<4>[ 115.089255] RIP: 0033:0x7f2ff59cd767
<4>[ 115.089261] RSP: 002b:00007f2fea55b518 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
<4>[ 115.089273] RAX: ffffffffffffffda RBX: 00000000000000a8 RCX: 00007f2ff59cd767
<4>[ 115.089283] RDX: 00007f2fe4003200 RSI: 00007f2fe4007580 RDI: 00007f2fe4007540
<4>[ 115.089293] RBP: 00007f2fe4004250 R08: 0000000000000002 R09: 0000000000000000
<4>[ 115.089303] R10: 0000000000000008 R11: 0000000000000206 R12: 00007f2fea596348
<4>[ 115.089313] R13: 0000000000000000 R14: 00007f2fe4003200 R15: 0000000000000000
<4>[ 115.089327] Code: 00 00 ba 02 00 00 00 eb cd 48 8b 85 80 00 00 00 ba 01 00 00 00 eb bf 41 56 41 55 45 89 c6 41 54 55 49 89 f4 53 49 89 d5 48 fc fb <48> 89 2c e8 b4 91 88 00 85 c0 dd 09 80 3d 1d 6a ee 00 75 74 2a
<1>[ 115.089411] RIP: do_error_trap+0x14/0xa0 RSP: ffffc9000039b9c8
<4>[ 115.089420] CR2: 00000000818bc06f
<4>[ 115.089427] ---[ end trace 25fb74e124e0a858 ]---
<3>[ 115.229290] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
<3>[ 115.229309] in_atomic(): 0, irqs_disabled(): 1, pid: 1565, name: python3
<4>[ 115.229319] INFO: lockdep is turned off.
<4>[ 115.229326] irq event stamp: 1412
<4>[ 115.229339] hardirqs last enabled at (1411): [<ffffffff811793a8>] free_unref_page+0x48/0x60
<4>[ 115.229354] hardirqs last disabled at (1412): [<ffffffff818bc8f6>] error_entry+0x66/0xc0
<4>[ 115.229384] softirqs last enabled at (1336): [<ffffffff818bf29a>] __do_softirq+0x3aa/0x4de
<4>[ 115.229399] softirqs last disabled at (1313): [<ffffffff810804ea>] irq_exit+0xaa/0xc0
<4>[ 115.229412] CPU: 0 PID: 1565 Comm: python3 Tainted: G UD 4.15.0-rc1-CI-CI_DRM_3439+ #1
<4>[ 115.229425] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[ 115.229440] Call Trace:
<4>[ 115.229452] dump_stack+0x5f/0x86
<4>[ 115.229461] ___might_sleep+0x1d9/0x240
<4>[ 115.229471] exit_signals+0x1b/0x2a0
<4>[ 115.229480] do_exit+0x93/0xcc0
<4>[ 115.229490] ? SyS_execve+0x27/0x30
<4>[ 115.229498] rewind_stack_do_exit+0x17/0x20
<7>[ 115.791287] [IGT] gem_exec_basic: executing
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3446/fi-glk-dsi/igt@gem_busy@basic-hang-default.html

[ 40.373746] Setting dangerous option reset - tainting kernel
[ 40.381284] Setting dangerous option reset - tainting kernel
[ 48.444040] java: Corrupted page table at address 7fabb819e100
[ 48.444091] Bad pagetable: 000d [#1] PREEMPT SMP
[ 48.444108] Dumping ftrace buffer:
[ 48.444119] ---------------------------------
[ 48.444221] ksoftirq-29 3..s. 38752448us : execlists_submission_tasklet: rcs0 in[0]: ctx=2.1, seqno=52
...
[ 48.476354] <idle>-0 1..s1 40476071us : execlists_submission_tasklet: rcs0 csb[2d]: status=0x00000001:0x00000000
[ 48.476380] ---------------------------------
[ 48.476392] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul i915 ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 mii snd_pcm mei_me mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
[ 48.476498] CPU: 1 PID: 1085 Comm: java Tainted: G U 4.15.0-rc2-CI-CI_DRM_3446+ #1
[ 48.476519] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[ 48.476545] task: 000000007a9b860d task.stack: 00000000695182e3
[ 48.476561] RIP: 0033:0x7fabb8064bd1
[ 48.476572] RSP: 002b:00007fab91084cd8 EFLAGS: 00010206
[ 48.476587] RAX: 0000000000000000 RBX: 00007fab746c1940 RCX: 0000000000000000
[ 48.476604] RDX: 0000000000000010 RSI: 00007fab91084d3f RDI: 000007fab746c993
[ 48.476621] RBP: 00007fab74000020 R08: 00007fabb819e100 R09: 0000000000000001
[ 48.476638] R10: 0000000000000003 R11: 00007fab91084d30 R12: 0000000000007ff0
[ 48.476655] R13: 00007fab746c9930 R14: 0000000000000000 R15: 0000000000003be0
[ 48.476673] FS: 00007fab91085700(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000
[ 48.476692] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 48.476707] CR2: 00007fabb819e100 CR3: 000000016ec92000 CR4: 0000000000340ee0
[ 48.476726] RIP: 0x7fabb8064bd1 RSP: 00007fab91084cd8
[ 48.476741] ---[ end trace 8747c98619fd2a37 ]---
[ 48.585697] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
[ 48.585729] in_atomic(): 0, irqs_disabled(): 1, pid: 1085, name: java
[ 48.585749] INFO: lockdep is turned off.
[ 48.585763] irq event stamp: 28302
[ 48.585783] hardirqs last enabled at (28301): [<00000000430c0964>] swapgs_restore_regs_and_return_to_usermode+0x0/0x20
[ 48.585816] hardirqs last disabled at (28302): [<00000000d609682f>] error_entry+0x60/0xc0
[ 48.585842] softirqs last enabled at (28300): [<0000000040735435>] __do_softirq+0x3aa/0x4de
[ 48.585870] softirqs last disabled at (28277): [<000000007bcb8038>] irq_exit+0xaa/0xc0
[ 48.585897] CPU: 1 PID: 1085 Comm: java Tainted: G UD 4.15.0-rc2-CI-CI_DRM_3446+ #1
[ 48.585923] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[ 48.585953] Call Trace:
[ 48.585973] dump_stack+0x5f/0x86
[ 48.585989] ___might_sleep+0x1d9/0x240
[ 48.586009] exit_signals+0x1b/0x2a0
[ 48.586026] ? oops_end+0x61/0x80
[ 48.586040] do_exit+0x93/0xcc0
[ 48.586054] ? __do_page_fault+0x3e6/0x560
[ 48.586077] rewind_stack_do_exit+0x17/0x20
[ 57.768110] i915 0000:00:02.0: Resetting rcs0 after gpu hang
It starts here:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4032/fi-glk-dsi/igt@gem_exec_flush@basic-wb-rw-before-default.html

then the issue on this subtest hasn't been seen before:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4032/fi-glk-dsi/igt@prime_busy@basic-wait-before-default.html

[ 385.598295] ------------[ cut here ]------------
[ 385.598302] list_del corruption. next->prev should be 0000000071b3538c, but was 000000000ce0e81a
[ 385.598328] WARNING: CPU: 1 PID: 2600 at lib/list_debug.c:56 __list_del_entry_valid+0x8a/0x90
[ 385.598331] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp i915 coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii mei_me mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
[ 385.598398] CPU: 1 PID: 2600 Comm: python3 Tainted: G U W 4.15.0-rc2-CI-CI_DRM_3448+ #1
[ 385.598401] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[ 385.598403] task: 000000009cfb4ec2 task.stack: 000000008c8f3191
[ 385.598407] RIP: 0010:__list_del_entry_valid+0x8a/0x90
[ 385.598409] RSP: 0018:ffffc9000033bad0 EFLAGS: 00010082
[ 385.598414] RAX: 0000000000000054 RBX: ffff880173bf56e8 RCX: 0000000000000002
[ 385.598417] RDX: 0000000080000002 RSI: ffffffff81ca8e7d RDI: 00000000ffffffff
[ 385.598419] RBP: ffffffff81f3d818 R08: 0000000000000000 R09: 0000000000000001
[ 385.598421] R10: 0000000000000000 R11: 0000000000000000 R12: ffffea000530ae60
[ 385.598424] R13: 0000000000000001 R14: ffffea00058bdfc0 R15: 0000000000000014
[ 385.598426] FS: 0000000000000000(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000
[ 385.598429] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 385.598431] CR2: 00007fa4d2b809e0 CR3: 000000016e65e000 CR4: 0000000000340ee0
[ 385.598434] Call Trace:
[ 385.598439] release_pages+0x11f/0x370
[ 385.598452] tlb_flush_mmu_free+0x2c/0x50
[ 385.598458] unmap_page_range+0x794/0x8e0
[ 385.598473] unmap_vmas+0x47/0x90
[ 385.598480] exit_mmap+0xa0/0x170
[ 385.598493] mmput+0x5c/0x120
[ 385.598498] flush_old_exec+0x644/0x850
[ 385.598506] load_elf_binary+0x3b1/0x16b3
[ 385.598513] ? __lock_acquire+0x42c/0x15a0
[ 385.598521] ? search_binary_handler+0x72/0x1e0
[ 385.598529] search_binary_handler+0x7f/0x1e0
[ 385.598535] do_execveat_common.isra.12+0x658/0x950
[ 385.598544] SyS_execve+0x27/0x30
[ 385.598549] do_syscall_64+0x59/0x1a0
[ 385.598555] entry_SYSCALL64_slow_path+0x25/0x25
[ 385.598559] RIP: 0033:0x7fa4d2b4b767
[ 385.598561] RSP: 002b:00007fa4c76b8138 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[ 385.598566] RAX: ffffffffffffffda RBX: 00000000000000a8 RCX: 00007fa4d2b4b767
[ 385.598568] RDX: 00007ffc0ced82f8 RSI: 00007fa4c000aa70 RDI: 00007fa4c0012b40
[ 385.598571] RBP: 00007fa4c0013040 R08: 0000000000000002 R09: 0000000000000005
[ 385.598573] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fa4c76dcf08
[ 385.598575] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000005
[ 385.598587] Code: ef c1 ff 0f ff 31 c0 c3 48 89 fe 48 c7 c7 28 6b cb 81 e8 1a ef c1 ff 0f ff 31 c0 c3 48 89 fe 48 c7 c7 68 6b cb 81 e8 06 ef c1 ff <0f> ff 31 c0 c3 90 41 57 41 56 41 55 41 54 55 53 48 83 ec 20 9c
[ 385.598722] ---[ end trace a55595b2ba3d54dd ]---
Reference https://patchwork.freedesktop.org/series/34623/
Starts with:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3488/fi-glk-dsi/igt@gem_exec_flush@basic-wb-ro-before-default.html

(gem_exec_flush:1756) CRITICAL: Test assertion failure function run, file gem_exec_flush.c:323:
(gem_exec_flush:1756) CRITICAL: Failed assertion: map[i] == i
(gem_exec_flush:1756) CRITICAL: error: 0x8e7c0142 != 0x142
Subtest basic-wb-ro-before-default failed.

then:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3488/fi-glk-dsi/igt@gem_exec_flush@basic-wb-ro-default.html

(gem_exec_flush:1761) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482:
(gem_exec_flush:1761) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-wb-ro-default failed.

then a lot of unexpected skips followed by:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3488/fi-glk-dsi/igt@gem_mmap@basic-small-bo.html

[ 221.187721] BUG: unable to handle kernel paging request at 000000007276fe22
[ 221.187743] IP: __lock_acquire+0xb0/0x15a0
[ 221.187753] Oops: 0002 [#1] PREEMPT SMP
[ 221.187761] Dumping ftrace buffer:
[ 221.187766] ---------------------------------
[ 221.187849] CPU:3 [LOST 86041 EVENTS] gem_exec-1765 3..s1 200397128us : execlists_submission_tasklet: rcs0 in[0]: ctx=2.2, seqno=80ab7

followed by Softdog:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3488/fi-glk-dsi/igt@gem_mmap_gtt@basic.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3497/fi-glk-dsi/igt@gem_exec_flush@basic-uc-rw-default.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3498/fi-glk-dsi/igt@gem_exec_store@basic-vebox.html

The result says:
Received signal SIGSEGV.
Stack trace:
#0 [fatal_sig_handler+0x12f]
#1 [killpg+0x40]
#2 [_IO_file_fopen+0x42]

This is yet another weird thing on this machine.
One random memcorruption issue fixed:

commit 7d622351c94172a42bfe9b13bdb0fdc2be90ed3b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Dec 13 09:48:02 2017 +0000

    drm/i915/fence: Use rcu to defer freeing of irq_work

    It is illegal to perform an immediate free of the struct irq_work from
    inside the irq_work callback (as irq_work_run_list modifies work->flags
    after execution of the work->func()). As we use the irq_work to
    coordinate the freeing of the callback from two different softirq
    paths, we need to defer the kfree from inside our irq_work callback,
    for which we can use kfree_rcu.

    Fixes: 81c0ed21aa91 ("drm/i915/fence: Avoid del_timer_sync() from inside a timer")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171213094802.28243-1-chris@chris-wilson.co.uk

Hopefully this explains a lot of weirdness.
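For readers unfamiliar with the pattern, here is a minimal userspace sketch of the ordering problem the commit message describes. It is not the i915 code: `struct work`, `run_work`, `defer_free`, `flush_deferred` and `demo` are hypothetical stand-ins, and the deferred list only approximates what `kfree_rcu` achieves in the kernel (releasing the memory after the runner can no longer touch it).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct irq_work. */
struct work {
    unsigned int flags;              /* bit 0 means "pending" in this sketch */
    void (*func)(struct work *);
};

/* Hypothetical deferred-free queue, standing in for kfree_rcu(). */
#define MAX_DEFERRED 8
static struct work *deferred[MAX_DEFERRED];
static int n_deferred;

static void defer_free(struct work *w)
{
    assert(n_deferred < MAX_DEFERRED);
    deferred[n_deferred++] = w;
}

/* Called only once no runner can still be touching the queued objects. */
static int flush_deferred(void)
{
    int freed = n_deferred;

    for (int i = 0; i < n_deferred; i++)
        free(deferred[i]);
    n_deferred = 0;
    return freed;
}

/*
 * Callback. Calling free(w) here would turn the runner's later write to
 * w->flags into a use-after-free, so the release is deferred instead.
 */
static void callback(struct work *w)
{
    defer_free(w);
}

/* Mimics the ordering from the commit message: flags change after func(). */
static void run_work(struct work *w)
{
    w->func(w);
    w->flags &= ~1u;
}

/* Returns the number of objects released by the deferred flush. */
int demo(void)
{
    struct work *w = malloc(sizeof(*w));

    if (!w)
        return -1;
    w->flags = 1u;
    w->func = callback;
    run_work(w);
    assert(w->flags == 0u);          /* still valid: free was deferred */
    return flush_deferred();
}
```

Calling `demo()` runs one work item and then flushes the deferred list. Replacing `defer_free(w)` with `free(w)` in `callback` reproduces the kind of write-after-free that could plausibly surface as the scattered corruption reported above.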
It starts with:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/igt@gem_exec_flush@basic-uc-set-default.html

(gem_exec_flush:1725) igt-aux-CRITICAL: Failed assertion: !"GPU hung"

<7>[ 170.959810] [IGT] gem_exec_flush: starting subtest basic-uc-set-default
...
<7>[ 177.768550] missed_breadcrumb rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x5a/0x80 [i915]
...
<6>[ 180.857713] [drm] GPU HANG: ecode 9:0:0x8fdafffa, in gem_exec_flush [1725], reason: Hang on rcs0, action: reset
<7>[ 180.858578] [drm:i915_reset_device [i915]] resetting chip
<5>[ 180.858755] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[ 181.361556] [drm:intel_gpu_reset [i915]] rcs0: timed out on STOP_RING
<3>[ 182.065059] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout

then:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/igt@gem_exec_reloc@basic-cpu-read.html incomplete

pstore looks legit, but most of it is already in dmesg.

from dmesg:
<7>[ 187.664410] [IGT] gem_exec_reloc: exiting, ret=77
<4>[ 187.770081] WARNING: can't dereference iret registers at 00000000b08c140c for ip page_fault+0x7/0x30
<0>[ 187.770083] BUG: stack guard page was hit at 00000000dd82e48c (stack is 00000000ca52e808..0000000085d6198d)
<4>[ 187.770178] kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
<0>[ 187.770191] Dumping ftrace buffer:
<0>[ 187.770199] ---------------------------------
<0>[ 187.770281] CPU:3 [LOST 63587 EVENTS] gem_exec-1729 3..s1 173324421us : execlists_submission_tasklet: rcs0 in[0]: ctx=2.1, seqno=5d0bc
...
<4>[ 187.801374] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul i915 ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
<4>[ 187.801505] CPU: 2 PID: 1377 Comm: python3 Tainted: G U W 4.15.0-rc3-CI-CI_DRM_3511+ #1
<4>[ 187.801531] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[ 187.801564] RIP: 0010:page_fault+0x7/0x30
<4>[ 187.801578] RSP: 0018:ffffc90001d83fa8 EFLAGS: 00010083
<4>[ 187.801597] RAX: 0000000080000000 RBX: 0000000000000000 RCX: 0000000000000000
<4>[ 187.801618] RDX: 0000000080000610 RSI: 0000000000000000 RDI: ffffc90001d840f8
<4>[ 187.801639] RBP: 0000000080000610 R08: 0000000000000001 R09: 0101010101010101
<4>[ 187.801659] R10: ffffc90001d87a90 R11: 0000000000000000 R12: ffffc90001d840f8
<4>[ 187.801680] R13: ffff8801733c51c0 R14: 0000000000000001 R15: ffff8801733c51c0
<4>[ 187.801702] FS: 00007fac1674e700(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
<4>[ 187.801726] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 187.801744] CR2: ffffc90001d83f98 CR3: 000000016f353000 CR4: 0000000000340ee0
<4>[ 187.801764] Call Trace:
<4>[ 187.801782] ? no_context+0x3dc/0x430
<4>[ 187.801800] ? __do_page_fault+0x196/0x560
...
<1>[ 187.804948] RIP: page_fault+0x7/0x30 RSP: ffffc90001d83fa8
<4>[ 187.804968] ---[ end trace 7832dee94e24beea ]---
<3>[ 188.000284] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
<3>[ 188.000315] in_atomic(): 1, irqs_disabled(): 1, pid: 1377, name: python3
<4>[ 188.000334] INFO: lockdep is turned off.
<4>[ 188.000347] irq event stamp: 1180122
<4>[ 188.000367] hardirqs last enabled at (1180121): [<00000000e846d9d1>] get_page_from_freelist+0x24c/0x14c0
<4>[ 188.000395] hardirqs last disabled at (1180122): [<00000000804f94d3>] __slab_alloc.isra.24.constprop.29+0x19/0x70
<4>[ 188.000425] softirqs last enabled at (1179892): [<000000002b075771>] __do_softirq+0x3aa/0x4de
<4>[ 188.000451] softirqs last disabled at (1179885): [<00000000a976b967>] irq_exit+0xaa/0xc0
<3>[ 188.000473] Preemption disabled at:
<4>[ 188.000478] [<000000005fa92adc>] ist_enter+0x1c/0xa0
<4>[ 188.000507] CPU: 2 PID: 1377 Comm: python3 Tainted: G UD W 4.15.0-rc3-CI-CI_DRM_3511+ #1
<4>[ 188.000531] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[ 188.000560] Call Trace:
<4>[ 188.000578] dump_stack+0x5f/0x86
<4>[ 188.000593] ___might_sleep+0x1d9/0x240

then continue in:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/pstore0-1513197126_Oops_2.log

and it is actually a Softdog:
<3>[ 188.000473] Preemption disabled at:
<4>[ 188.000478] [<000000005fa92adc>] ist_enter+0x1c/0xa0
<4>[ 188.000507] CPU: 2 PID: 1377 Comm: python3 Tainted: G UD W 4.15.0-rc3-CI-CI_DRM_3511+ #1
<4>[ 188.000531] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[ 188.000560] Call Trace:
<4>[ 188.000578] dump_stack+0x5f/0x86
<4>[ 188.000593] ___might_sleep+0x1d9/0x240
<4>[ 188.000610] exit_signals+0x1b/0x2a0
<4>[ 188.000624] do_exit+0x93/0xcc0
<4>[ 188.000638] ? trace_hardirqs_off_caller+0x75/0xd0
<4>[ 188.000654] ? do_syscall_64+0x19/0x1a0
<4>[ 188.000671] rewind_stack_do_exit+0x17/0x20
<6>[ 188.000769] note: python3[1377] exited with preempt_count 1
<12>[ 277.879085] owatch: TIMEOUT!
<12>[ 277.879252] owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3521/fi-glk-dsi/igt@gem_exec_flush@basic-wb-ro-before-default.html

(gem_exec_flush:1763) igt-aux-CRITICAL: Failed assertion: !"GPU hung"

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3521/fi-glk-dsi/igt@gem_wait@basic-await-all.html

dmesg:
[ 245.788590] general protection fault: 0000 [#1] PREEMPT SMP
[ 245.788604] Dumping ftrace buffer:
[ 245.788610] ---------------------------------
[ 245.788704] CPU:0 [LOST 73316 EVENTS] gem_exec-1768 0..s1 202889461us : execlists_submission_tasklet: rcs0 in[0]: ctx=4.2, seqno=80f1a
...

then a bunch of backtraces that are repeated in the pstore from:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3521/fi-glk-dsi/igt@gem_workarounds@basic-read.html incomplete
On CI_DRM_3606: First test with dmesg-warn:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3606/fi-glk-dsi/igt@gem_exec_flush@basic-batch-kernel-default-uc.html

Then a lot of the following igt@gem_exec_* tests are hit with dmesg-warn. It started here:

<4>[ 120.453872] WARNING: CPU: 3 PID: 0 at kernel/sched/core.c:3459 schedule_idle+0x2c/0x30
<4>[ 120.453878] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
<4>[ 120.453969] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G U 4.15.0-rc7-CI-CI_DRM_3606+ #1
<4>[ 120.453973] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[ 120.453979] RIP: 0010:schedule_idle+0x2c/0x30
<4>[ 120.453983] RSP: 0018:ffffc900000d3ee8 EFLAGS: 00010286
<4>[ 120.453991] RAX: ee000000fe000000 RBX: 0000000000000003 RCX: 0000000000000001
<4>[ 120.453995] RDX: 0000000000000000 RSI: ffffffff820ab24f RDI: ffffffff820b8d9d
<4>[ 120.453998] RBP: ffff88017a942740 R08: 0000000000000000 R09: 0000000000000001
<4>[ 120.454002] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8233caf0
<4>[ 120.454006] R13: ffff88017a942740 R14: ffff88017fdab550 R15: ffffffff82293980
<4>[ 120.454010] FS: 0000000000000000(0000) GS:ffff88017fd80000(0000) knlGS:0000000000000000
<4>[ 120.454014] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 120.454018] CR2: 00007f8fab777000 CR3: 000000016f6fc000 CR4: 0000000000340ee0
<4>[ 120.454022] Call Trace:
<4>[ 120.454031] do_idle+0x14b/0x1d0
<4>[ 120.454042] cpu_startup_entry+0x14/0x20
<4>[ 120.454049] start_secondary+0x129/0x160
<4>[ 120.454057] secondary_startup_64+0xa5/0xb0
<4>[ 120.454076] Code: 48 8b 04 25 80 4e 01 00 53 48 8b 40 08 48 85 c0 75 19 65 48 8b 1c 25 80 4e 01 00 31 ff e8 cd f0 ff ff 48 8b 03 a8 08 75 f2 5b c3 <0f> ff eb e3 bf 01 00 00 00 e8 e6 e8 7e ff e8 a1 fb ff ff bf 01

Then there is ~170Mb of various WARNs from: kernel/locking/lockdep.c
*** Bug 103615 has been marked as a duplicate of this bug. ***
Last seen: CI_DRM_3783: 2018-02-16