Bug 103514 - [BAT] [GLK-DSI only] igt@gem_* - Failed assertion: !"GPU hung" - and its aftermath
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high critical
Assignee: Kimmo Nikkanen
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Duplicates: 103615
Depends on:
Blocks:
 
Reported: 2017-10-30 11:12 UTC by Marta Löfstedt
Modified: 2018-03-16 09:39 UTC

See Also:
i915 platform: GLK
i915 features: GEM/Other


Attachments

Description Marta Löfstedt 2017-10-30 11:12:55 UTC
CI_DRM_3293 fi-glk-dsi igt@gem_exec_nop@basic-parallel failed:

(gem_exec_nop:1922) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_nop:1922) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-parallel failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3293/fi-glk-dsi/igt@gem_exec_nop@basic-parallel.html

NOTE: after this there are a bunch of spuriously skipped tests, and then a
failure on:
(kms_pipe_crc_basic:2409) igt-gt-CRITICAL: Test assertion failure function igt_force_gpu_reset, file igt_gt.c:406:
(kms_pipe_crc_basic:2409) igt-gt-CRITICAL: Failed assertion: !wedged
(kms_pipe_crc_basic:2409) igt-gt-CRITICAL: Last errno: 9, Bad file descriptor
Subtest hang-read-crc-pipe-B failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3293/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-a.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3293/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-b.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3293/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-c.html

and then there is a dmesg-warn on:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3293/fi-glk-dsi/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html

[  395.321772] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[  395.321921] WARN_ON(reset && reset != -19)
[  395.321959] ------------[ cut here ]------------
[  395.321995] WARNING: CPU: 2 PID: 2462 at drivers/gpu/drm/i915/i915_gem.c:4725 i915_gem_sanitize+0x52/0x80 [i915]
[  395.321997] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul i915 ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 mii mei_me snd_pcm prime_numbers mei i2c_hid pinctrl_geminilake pinctrl_intel
[  395.322070] CPU: 2 PID: 2462 Comm: kworker/u8:7 Tainted: G     U          4.14.0-rc6-CI-CI_DRM_3293+ #1
[  395.322073] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[  395.322080] Workqueue: events_unbound async_run_entry_fn
[  395.322084] task: ffff88017807cec0 task.stack: ffffc90002194000
[  395.322117] RIP: 0010:i915_gem_sanitize+0x52/0x80 [i915]
[  395.322120] RSP: 0018:ffffc90002197c50 EFLAGS: 00010282
[  395.322124] RAX: 000000000000001e RBX: ffff880168480000 RCX: 0000000000000006
[  395.322126] RDX: 0000000000000006 RSI: ffffffff81d0e984 RDI: ffffffff81cc2576
[  395.322130] RBP: ffffc90002197c60 R08: 0000000000000000 R09: 0000000000000001
[  395.322132] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880168480070
[  395.322134] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81cea5ef
[  395.322136] FS:  0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
[  395.322139] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  395.322141] CR2: 0000560453047068 CR3: 0000000174e85000 CR4: 00000000003406e0
[  395.322143] Call Trace:
[  395.322178]  i915_gem_suspend+0x111/0x170 [i915]
[  395.322208]  i915_drm_suspend+0x6d/0x170 [i915]
[  395.322238]  i915_pm_suspend+0x28/0x40 [i915]
[  395.322246]  pci_pm_suspend+0x78/0x140
[  395.322251]  dpm_run_callback+0x6f/0x310
[  395.322255]  ? pci_pm_freeze+0xf0/0xf0
[  395.322260]  __device_suspend+0x102/0x380
[  395.322264]  ? dpm_watchdog_set+0x70/0x70
[  395.322270]  async_suspend+0x1f/0xa0
[  395.322274]  async_run_entry_fn+0x38/0x160
[  395.322279]  process_one_work+0x221/0x650
[  395.322286]  worker_thread+0x4e/0x3b0
[  395.322292]  kthread+0x114/0x150
[  395.322294]  ? process_one_work+0x650/0x650
[  395.322297]  ? kthread_create_on_node+0x40/0x40
[  395.322303]  ret_from_fork+0x27/0x40
[  395.322311] Code: 5d c3 be ff ff ff ff 48 89 df e8 da dc 02 00 85 c0 74 ea 83 f8 ed 74 e5 48 c7 c6 c8 3a 23 a0 48 c7 c7 dc 0a 22 a0 e8 0f 51 fb e0 <0f> ff eb ce 4c 8d 67 70 31 f6 4c 89 e7 e8 8c 00 7d e1 48 89 df 
[  395.322431] ---[ end trace 82f68684f9edfc24 ]---

Why do we get this inconsistent behavior when the GPU is wedged? Also, see bug 102848. On the shards it would be very time-consuming to find these patterns.
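
One way to look for this pattern automatically would be to scan a run's ordered results for a "GPU hung" failure followed later by `!wedged` failures. The sketch below only illustrates that idea: the `(test, status, output)` record layout, the function name `find_wedged_aftermath`, and the sample run are assumptions made for the example, not part of the CI tooling.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical record layout: (test name, status, captured output).
Result = Tuple[str, str, str]


def find_wedged_aftermath(results: Iterable[Result]) -> Optional[dict]:
    """Return the first 'GPU hung' failure and what followed it, or None."""
    hung_test = None
    skipped: List[str] = []
    wedged_failures: List[str] = []
    for test, status, output in results:
        if hung_test is None:
            # Look for the assertion text printed by the failing gem_* test.
            if status == "fail" and 'Failed assertion: !"GPU hung"' in output:
                hung_test = test
            continue
        # Everything after the hang is part of the potential aftermath.
        if status == "skip":
            skipped.append(test)
        elif status == "fail" and "Failed assertion: !wedged" in output:
            wedged_failures.append(test)
    if hung_test is None:
        return None
    return {"hung": hung_test, "skipped": skipped, "wedged": wedged_failures}


# Made-up sample run, shaped like the sequence described in this report.
run = [
    ("igt@gem_exec_nop@basic-parallel", "fail", 'Failed assertion: !"GPU hung"'),
    ("igt@gem_sync@basic-store-each", "skip", ""),
    ("igt@kms_pipe_crc_basic@hang-read-crc-pipe-a", "fail", "Failed assertion: !wedged"),
]
report = find_wedged_aftermath(run)
print(report)
```

A report with a non-empty `wedged` list would be a candidate for this bug; a hang with no aftermath at all would need a separate look.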
Comment 1 Marta Löfstedt 2017-11-08 08:46:42 UTC
Here is a Patchwork example. Although the patch was about enabling runtime_pm, the same pattern repeats itself,
starting at:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6996/fi-glk-dsi/igt@gem_exec_nop@basic-series.html
fail:	
(gem_exec_nop:1852) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_nop:1852) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-series failed.
dmesg:	
[  226.781384] Setting dangerous option reset - tainting kernel
[  243.856241] i915 0000:00:02.0: Resetting chip after gpu hang
[  245.063521] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[  245.063716] [drm:i915_reset [i915]] *ERROR* Failed to reset chip: -5

then:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6996/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-a.html
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6996/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-b.html
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6996/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-c.html
fail:
(kms_pipe_crc_basic:2326) igt-gt-CRITICAL: Test assertion failure function igt_force_gpu_reset, file igt_gt.c:406:
(kms_pipe_crc_basic:2326) igt-gt-CRITICAL: Failed assertion: !wedged
(kms_pipe_crc_basic:2326) igt-gt-CRITICAL: Last errno: 9, Bad file descriptor
Subtest hang-read-crc-pipe-A failed.

and finally:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6996/fi-glk-dsi/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
dmesg-warn:
[  375.382594] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[  375.382740] WARN_ON(reset && reset != -19)
[  375.382776] ------------[ cut here ]------------
[  375.382812] WARNING: CPU: 2 PID: 2383 at drivers/gpu/drm/i915/i915_gem.c:4724 i915_gem_sanitize+0x52/0x80 [i915]
[  375.382814] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
[  375.382892] CPU: 2 PID: 2383 Comm: kworker/u8:7 Tainted: G     U          4.14.0-rc8-CI-Patchwork_6996+ #1
[  375.382894] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[  375.382899] Workqueue: events_unbound async_run_entry_fn
[  375.382903] task: ffff88016e980040 task.stack: ffffc90002164000
[  375.382937] RIP: 0010:i915_gem_sanitize+0x52/0x80 [i915]
[  375.382939] RSP: 0018:ffffc90002167c58 EFLAGS: 00010292
[  375.382943] RAX: 000000000000001e RBX: ffff880166e00000 RCX: 0000000000000006
[  375.382945] RDX: 0000000000000006 RSI: ffffffff81d0ed64 RDI: ffffffff81cc294e
[  375.382947] RBP: ffffc90002167c68 R08: 0000000000000000 R09: 0000000000000001
[  375.382949] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880166e00070
[  375.382951] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81cea9cf
[  375.382954] FS:  0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
[  375.382957] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  375.382959] CR2: 000055b4ccc93068 CR3: 0000000003e0f000 CR4: 00000000003406e0
[  375.382961] Call Trace:
[  375.382997]  i915_gem_suspend+0x111/0x170 [i915]
[  375.383027]  i915_drm_suspend+0x6d/0x170 [i915]
[  375.383057]  i915_pm_suspend+0x28/0x40 [i915]
[  375.383063]  pci_pm_suspend+0x78/0x140
[  375.383068]  dpm_run_callback+0x6f/0x310
[  375.383072]  ? pci_pm_freeze+0xf0/0xf0
[  375.383077]  __device_suspend+0x102/0x380
[  375.383081]  ? dpm_watchdog_set+0x70/0x70
[  375.383087]  async_suspend+0x1f/0xa0
[  375.383091]  async_run_entry_fn+0x38/0x160
[  375.383096]  process_one_work+0x221/0x650
[  375.383103]  worker_thread+0x4e/0x3c0
[  375.383108]  kthread+0x114/0x150
[  375.383111]  ? process_one_work+0x650/0x650
[  375.383114]  ? kthread_create_on_node+0x40/0x40
[  375.383119]  ret_from_fork+0x27/0x40
[  375.383127] Code: 5d c3 be ff ff ff ff 48 89 df e8 2a e3 02 00 85 c0 74 ea 83 f8 ed 74 e5 48 c7 c6 58 aa 26 a0 48 c7 c7 a4 7a 25 a0 e8 4f df f7 e0 <0f> ff eb ce 4c 8d 67 70 31 f6 4c 89 e7 e8 fc a0 79 e1 48 89 df 
[  375.383247] ---[ end trace 7978d132da79d715 ]---
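
For readers less familiar with the numeric error codes in these logs, the following sketch maps them to their symbolic names. It assumes Linux errno numbering, and the dictionary keys are just labels copied from the log lines above.

```python
import errno
import os

# Values copied from the logs above; the kernel prints them negated.
codes = {
    "WARN_ON(reset && reset != -19)": 19,
    "Failed to reset chip: -5": 5,
    "Last errno: 9": 9,
}

decoded = {}
for where, value in codes.items():
    # errno.errorcode maps the number to its symbolic name on this platform.
    decoded[where] = errno.errorcode[value]
    print(f"{where}: {decoded[where]} ({os.strerror(value)})")
```

Read this way, the WARN_ON in i915_gem_sanitize fires because the reset returned an error other than -ENODEV, and the failed chip reset reported -EIO.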
Comment 2 Marta Löfstedt 2017-11-09 07:05:24 UTC
On CI_DRM_3321 the issue started here:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3321/fi-glk-dsi/igt@gem_exec_flush@basic-batch-kernel-default-uc.html


(gem_exec_flush:1727) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_flush:1727) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-batch-kernel-default-uc failed.

Then dmesg is filled with:
[  144.151041] [drm:gen8_irq_handler [i915]] *ERROR* Fault errors on pipe A: 0x00000080

This causes many of the following tests to either be skipped or end in dmesg-warn.

But then, starting from:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3321/fi-glk-dsi/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b.html
all the following tests pass.
However, the machine did not boot up for CI_DRM_3322 and CI_DRM_3323.
When rebooting manually in the lab the display was full of garbage.
Comment 3 Marta Löfstedt 2017-11-14 09:00:35 UTC
Here is another example:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7093/fi-glk-dsi/igt@gem_exec_flush@basic-wb-prw-default.html

(gem_exec_flush:1775) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_flush:1775) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-wb-prw-default failed.

all tests are skipped until:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7093/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-a.html
(kms_pipe_crc_basic:2273) igt-gt-CRITICAL: Test assertion failure function igt_force_gpu_reset, file igt_gt.c:406:
(kms_pipe_crc_basic:2273) igt-gt-CRITICAL: Failed assertion: !wedged
(kms_pipe_crc_basic:2273) igt-gt-CRITICAL: Last errno: 9, Bad file descriptor
Subtest hang-read-crc-pipe-A failed.
then the following tests fail, ending with an incomplete on:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7093/fi-glk-dsi/igt@pm_rpm@basic-rte.html
Comment 4 Marta Löfstedt 2017-11-14 13:05:59 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_491/fi-glk-dsi/igt@gem_ctx_switch@basic-default.html

(gem_ctx_switch:1560) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_ctx_switch:1560) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-default failed.

After that, a lot of tests are skipped, some fail, some more are skipped, and this time there is actually no incomplete.
Comment 6 Elizabeth 2017-11-14 19:57:14 UTC
Raising priority since it is BAT.
Comment 7 Marta Löfstedt 2017-11-20 10:57:14 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_490/fi-glk-dsi/igt@gem_exec_flush@basic-uc-set-default.html
fail:
(gem_exec_flush:1775) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_flush:1775) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-uc-set-default failed.

then a bunch of skips

then fail on:
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_490/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-a.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_490/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-b.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_490/fi-glk-dsi/igt@kms_pipe_crc_basic@hang-read-crc-pipe-c.html

due to:
(kms_pipe_crc_basic:2287) igt-gt-CRITICAL: Failed assertion: !wedged

then
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_490/fi-glk-dsi/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
[  343.853966] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[  343.854132] WARN_ON(reset && reset != -19)
[  343.854169] ------------[ cut here ]------------
[  343.854205] WARNING: CPU: 2 PID: 2335 at drivers/gpu/drm/i915/i915_gem.c:4724 i915_gem_sanitize+0x52/0x80 [i915]
[  343.854207] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal i915 intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
[  343.854263] CPU: 2 PID: 2335 Comm: kworker/u8:4 Tainted: G     U          4.14.0-rc8-CI-CI_DRM_3331+ #1
[  343.854265] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[  343.854273] Workqueue: events_unbound async_run_entry_fn
[  343.854277] task: ffff88017a41cec0 task.stack: ffffc90002218000
[  343.854309] RIP: 0010:i915_gem_sanitize+0x52/0x80 [i915]
[  343.854312] RSP: 0018:ffffc9000221bc58 EFLAGS: 00010292
[  343.854316] RAX: 000000000000001e RBX: ffff880167100000 RCX: 0000000000000006
[  343.854318] RDX: 0000000000000006 RSI: ffffffff81d11314 RDI: ffffffff81cc3dee
[  343.854320] RBP: ffffc9000221bc68 R08: 0000000000000000 R09: 0000000000000001
[  343.854322] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880167100070
[  343.854324] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81cecf7f
[  343.854327] FS:  0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
[  343.854329] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  343.854331] CR2: 000055b19f0a4068 CR3: 0000000003e0f000 CR4: 00000000003406e0
[  343.854334] Call Trace:
[  343.854368]  i915_gem_suspend+0x111/0x170 [i915]
[  343.854398]  i915_drm_suspend+0x6d/0x170 [i915]
[  343.854429]  i915_pm_suspend+0x28/0x40 [i915]
[  343.854434]  pci_pm_suspend+0x78/0x140
[  343.854440]  dpm_run_callback+0x6f/0x310
[  343.854444]  ? pci_pm_freeze+0xf0/0xf0
[  343.854449]  __device_suspend+0x102/0x380
[  343.854453]  ? dpm_watchdog_set+0x70/0x70
[  343.854460]  async_suspend+0x1f/0xa0
[  343.854463]  async_run_entry_fn+0x38/0x160
[  343.854469]  process_one_work+0x221/0x650
[  343.854475]  worker_thread+0x4e/0x3c0
[  343.854481]  kthread+0x114/0x150
[  343.854484]  ? process_one_work+0x650/0x650
[  343.854487]  ? kthread_create_on_node+0x40/0x40
[  343.854493]  ret_from_fork+0x27/0x40
[  343.854501] Code: 5d c3 be ff ff ff ff 48 89 df e8 ca e5 02 00 85 c0 74 ea 83 f8 ed 74 e5 48 c7 c6 a8 fa 24 a0 48 c7 c7 a4 ca 23 a0 e8 6f 90 f9 e0 <0f> ff eb ce 4c 8d 67 70 31 f6 4c 89 e7 e8 9c a1 7b e1 48 89 df 
[  343.854622] ---[ end trace 92cf58358a865d76 ]---
Comment 9 Marta Löfstedt 2017-11-21 06:44:35 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3366/fi-glk-dsi/igt@gem_sync@basic-store-each.html

(gem_sync:3613) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_sync:3613) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-store-each failed.

The above doesn't cause any aftereffects, but that may be because the tests that come after it are not sensitive to a wedged GPU.
Comment 11 Marta Löfstedt 2017-11-21 12:56:51 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7217/fi-glk-dsi/igt@gem_exec_flush@basic-batch-kernel-default-wb.html
	
(gem_exec_flush:1784) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_flush:1784) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-batch-kernel-default-wb failed.

[  152.381045] Setting dangerous option reset - tainting kernel
[  164.786879] i915 0000:00:02.0: Resetting chip after gpu hang
[  165.992480] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[  165.992686] [drm:i915_reset [i915]] *ERROR* Failed to reset chip: -5


then a lot of skips and incomplete on:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7217/fi-glk-dsi/igt@gem_sync@basic-store-each.html
Comment 12 Marta Löfstedt 2017-11-22 11:21:09 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3371/fi-glk-dsi/igt@gem_sync@basic-store-all.html

(gem_sync:3559) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_sync:3559) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-store-all failed.

no other tests affected.
Comment 13 Marta Löfstedt 2017-11-23 06:39:55 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4005/fi-glk-dsi/igt@gem_sync@basic-all.html

(gem_sync:3498) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_sync:3498) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-all failed.
Comment 15 Marta Löfstedt 2017-11-23 14:25:17 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3377/fi-glk-dsi/igt@gem_sync@basic-many-each.html

(gem_sync:3475) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_sync:3475) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-many-each failed.

Note that this dmesg has more information:

[  314.719246] Setting dangerous option reset - tainting kernel
[  316.762341] general protection fault: 0000 [#1] PREEMPT SMP
[  316.762366] Dumping ftrace buffer:
[  316.762379] ---------------------------------
[  316.762487] CPU:3 [LOST 255731 EVENTS]
               gem_sync-3495    3..s1 316076943us : execlists_submission_tasklet: bcs0 in[0]:  ctx=2.2, seqno=238ed
...
[  316.795368] gem_sync-3526    1..s. 316804524us : execlists_submission_tasklet: bcs0 out[0]: ctx=2.2, seqno=239ff
[  316.795396] ---------------------------------
[  316.795411] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mei mii prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
[  316.795535] CPU: 3 PID: 3483 Comm: gem_sync Tainted: G     U          4.14.0-CI-CI_DRM_3377+ #1
[  316.795559] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[  316.795588] task: ffff8801719a2780 task.stack: ffffc90000df4000
[  316.795612] RIP: 0010:blk_flush_plug_list+0x54/0x270
[  316.795628] RSP: 0018:ffffc90000df7b38 EFLAGS: 00010292
[  316.795645] RAX: 0010000000000020 RBX: ffffc90000df7b50 RCX: 0000000000000000
[  316.795665] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 0010000000000000
[  316.795685] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000001
[  316.795704] R10: 0000000000000000 R11: 0000000000000000 R12: dead000000000200
[  316.795724] R13: dead000000000100 R14: 0010000000000000 R15: ffff8801681eec40
[  316.795745] FS:  00007fd4aa2dc700(0000) GS:ffff88017fd80000(0000) knlGS:0000000000000000
[  316.795768] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  316.795785] CR2: 00007fd4af1b6010 CR3: 0000000171ee0000 CR4: 00000000003406e0
[  316.795804] Call Trace:
[  316.795822]  io_schedule_prepare+0x3c/0x40
[  316.795839]  io_schedule_timeout+0xf/0x40
[  316.795911]  i915_wait_request+0x33c/0x830 [i915]
[  316.795932]  ? wake_up_q+0x70/0x70
[  316.795945]  ? wake_up_q+0x70/0x70
[  316.796017]  i915_gem_object_wait_fence+0xc8/0xe0 [i915]
[  316.796090]  i915_gem_object_wait+0x282/0x3d0 [i915]
[  316.796164]  i915_gem_wait_ioctl+0x10f/0x280 [i915]
[  316.796235]  ? i915_gem_unset_wedged+0x180/0x180 [i915]
[  316.796254]  drm_ioctl_kernel+0x65/0xb0
[  316.796269]  drm_ioctl+0x295/0x340
[  316.796337]  ? i915_gem_unset_wedged+0x180/0x180 [i915]
[  316.796355]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  316.796374]  ? lock_acquire+0xaf/0x200
[  316.796389]  ? __fget+0xe4/0x1f0
[  316.796405]  do_vfs_ioctl+0x8f/0x670
[  316.796420]  ? __fget+0x101/0x1f0
[  316.796435]  SyS_ioctl+0x3b/0x70
[  316.796450]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  316.796465] RIP: 0033:0x7fd4ad5fc587
[  316.796477] RSP: 002b:00007fd4aa2dbce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  316.796501] RAX: ffffffffffffffda RBX: ffffc90000df7ff0 RCX: 00007fd4ad5fc587
[  316.796520] RDX: 00007fd4aa2dbd20 RSI: 00000000c010646c RDI: 0000000000000003
[  316.796540] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000000c
[  316.796560] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000046
[  316.796579] R13: 00007ffff3b389ff R14: 00007fd4aa2dc9c0 R15: 00007fd4aa2dc700
[  316.796604] Code: de 48 83 ec 28 48 8d 44 24 08 48 8d 5c 24 18 48 89 44 24 08 48 89 44 24 10 48 8d 47 20 48 89 5c 24 18 48 89 5c 24 20 48 89 04 24 <49> 8b 46 20 48 39 04 24 74 6d 49 8b 46 20 48 8b 34 24 48 39 c6 
[  316.796770] RIP: blk_flush_plug_list+0x54/0x270 RSP: ffffc90000df7b38
[  316.860599] ---[ end trace 8af6cac31fbe4619 ]---
[  321.627562] i915 0000:00:02.0: Resetting chip after gpu hang
Comment 16 Marta Löfstedt 2017-11-23 14:27:50 UTC
This is probably the aftermath of https://bugs.freedesktop.org/show_bug.cgi?id=103514#c15 

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3377/fi-glk-dsi/igt@pm_backlight@basic-brightness.html

(pm_backlight:4059) igt-kms-CRITICAL: Test assertion failure function do_display_commit, file igt_kms.c:2895:
(pm_backlight:4059) igt-kms-CRITICAL: Failed assertion: ret == 0
(pm_backlight:4059) igt-kms-CRITICAL: Last errno: 13, Permission denied
(pm_backlight:4059) igt-kms-CRITICAL: error: -13 != 0
Test pm_backlight failed.
Comment 19 Marta Löfstedt 2017-11-27 07:51:43 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3384/fi-glk-dsi/igt@gem_exec_flush@basic-wb-ro-before-default.html

(gem_exec_flush:1790) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_flush:1790) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-wb-ro-before-default failed.

then the usual skipping and failing.
Comment 20 Marta Löfstedt 2017-11-27 07:56:25 UTC
CI_DRM_3388

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3388/fi-glk-dsi/igt@gem_ctx_switch@basic-default-heavy.html
(gem_ctx_switch:1525) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_ctx_switch:1525) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-default-heavy failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3388/fi-glk-dsi/igt@gem_sync@basic-store-each.html
(gem_sync:3587) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_sync:3587) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-store-each failed.
dmesg:
[  339.327621] [drm:fw_domains_get_with_fallback [i915]] *ERROR* blitter: timed out waiting for forcewake ack request.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3388/fi-glk-dsi/igt@gem_tiled_fence_blits@basic.html
[  353.875381] BUG: stack guard page was hit at ffffc90000923fb8 (stack is ffffc90000924000..ffffc90000927fff)

then incomplete:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3388/fi-glk-dsi/igt@gem_tiled_pread_basic.html
run.log:
running: igt/gem_tiled_pread_basic                     

[154/289] skip: 14, pass: 137, fail: 1, dmesg-fail: 2 -
owatch: TIMEOUT!
owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
Comment 21 Marta Löfstedt 2017-11-27 08:01:57 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3390/fi-glk-dsi/igt@gem_ctx_switch@basic-default-heavy.html

typical start:
(gem_ctx_switch:1590) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_ctx_switch:1590) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-default-heavy failed.

Then this looks like an igt/piglit error:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3390/fi-glk-dsi/igt@gem_exec_parallel@basic.html

Exception:
<class 'UnicodeDecodeError'> 'utf-8' codec can't decode byte 0xac in position 109087: invalid start byte

Traceback:

  File "/opt/igt/piglit/framework/test/base.py", line 205, in execute
    self.run()
  File "/opt/igt/piglit/framework/test/base.py", line 271, in run
    self._run_command()
  File "/opt/igt/piglit/framework/test/base.py", line 338, in _run_command
    out, err = proc.communicate(timeout=self.timeout)
  File "/usr/lib/python3.5/subprocess.py", line 801, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.5/subprocess.py", line 1488, in _communicate
    self.stderr.encoding)
  File "/usr/lib/python3.5/subprocess.py", line 705, in _translate_newlines
    data = data.decode(encoding)
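
The traceback shows the captured test output being decoded strictly as UTF-8. As a rough illustration of one possible mitigation (an assumption for discussion, not what piglit actually does), the sketch below captures raw bytes from a child process and decodes them leniently. The child command is a stand-in that emits the same invalid byte, 0xac.

```python
import subprocess
import sys

# A child process that writes a byte (0xac) that is not valid UTF-8 to stderr,
# standing in for a test whose output got corrupted.
child = [sys.executable, "-c",
         "import sys; sys.stderr.buffer.write(b'ok \\xac end\\n')"]

# Capture raw bytes (no text mode), so communicate() never decodes for us.
proc = subprocess.Popen(child, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate(timeout=30)

# Strict decoding reproduces the exception type from the traceback.
try:
    err.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Lenient decoding substitutes U+FFFD for the bad byte instead of raising.
text = err.decode("utf-8", errors="replace")
print(strict_failed, repr(text))
```

With lenient decoding the runner could still record a result for the test, and the replacement character would remain as a visible hint that the output was corrupted.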

then incomplete on:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3390/fi-glk-dsi/igt@kms_addfb_basic@bad-pitch-63.html

run.log:
running: igt/kms_addfb_basic/bad-pitch-63

[172/289] skip: 69, pass: 101, fail: 2 | 
Build timed out (after 17 minutes). Marking the build as aborted.
Comment 22 Marta Löfstedt 2017-11-28 15:30:11 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/fi-glk-dsi/igt@gem_exec_flush@basic-wb-rw-before-default.html

(gem_exec_flush:1769) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:484:
(gem_exec_flush:1769) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-wb-rw-before-default failed.
Comment 23 Jani Saarinen 2017-11-29 15:45:51 UTC
Reference on https://patchwork.freedesktop.org/series/34623/
Comment 24 Marta Löfstedt 2017-11-30 07:34:09 UTC
This thing keeps hitting more subtests:

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4025/fi-glk-dsi/igt@gem_exec_basic@basic-bsd.html
from dmesg:
<4>[  125.592642] general protection fault: 0000 [#1] PREEMPT SMP
<0>[  125.592662] Dumping ftrace buffer:
<0>[  125.592672] ---------------------------------
<0>[  125.592763] CPU:3 [LOST 234 EVENTS]
                  gem_clos-1534    3..s1 68186424us : execlists_submission_tasklet: bcs0 in[0]:  ctx=4.1, seqno=11f
...


then there is a softdog:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4025/fi-glk-dsi/igt@gem_exec_basic@basic-bsd1.html
Comment 25 Marta Löfstedt 2017-12-04 07:00:38 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3429/fi-glk-dsi/igt@kms_chamelium@hdmi-crc-fast.html

Something is clearly wrong: glk-dsi is not connected to a Chamelium.

dmesg:
<7>[  342.680647] [drm:gen9_enable_dc5 [i915]] Enabling DC5
<7>[  342.680721] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 01
<7>[  342.840269] [IGT] kms_chamelium: executing
<7>[  342.881613] [IGT] kms_chamelium: exiting, ret=77
<7>[  343.030927] [IGT] kms_chamelium: executing
<7>[  343.069355] [IGT] kms_chamelium: exiting, ret=77
<7>[  343.234065] [IGT] kms_chamelium: executing
<7>[  343.274821] [IGT] kms_chamelium: exiting, ret=77
<7>[  343.451939] [IGT] kms_chamelium: executing
<7>[  343.489785] [IGT] kms_chamelium: exiting, ret=77
<7>[  343.647317] [IGT] kms_chamelium: executing
<7>[  343.689507] [IGT] kms_chamelium: exiting, ret=77
<4>[  343.823626] general protection fault: 0000 [#1] PREEMPT SMP
[  343.823643] Dumping ftrace buffer:
[  343.823651] ---------------------------------
[  343.823727] CPU:3 [LOST 258392 EVENTS]
               kms_addf-3908    3..s1 338706705us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=25314
...
<0>[  343.844386]   <idle>-0       1..s1 343766801us : execlists_submission_tasklet: vecs0 cs-irq head=5 [5], tail=5 [5]
<0>[  343.844401] ---------------------------------
<4>[  343.844408] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 snd_pcm mii mei_me mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
<4>[  343.844476] CPU: 3 PID: 3947 Comm: kms_chamelium Tainted: G     U           4.15.0-rc1-CI-CI_DRM_3429+ #1
<4>[  343.844490] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[  343.844505] task: ffff8801732c8040 task.stack: ffffc90002654000
<4>[  343.844518] RIP: 0010:do_dentry_open.isra.1+0xf6/0x300
<4>[  343.844526] RSP: 0018:ffffc90002657d40 EFLAGS: 00010202
<4>[  343.844535] RAX: 31ffffff30a25d80 RBX: ffff88017737f140 RCX: 0000000000000001
<4>[  343.844546] RDX: 00000000b6001000 RSI: 0000000000000001 RDI: ffff88016823ed40
<4>[  343.844576] RBP: ffff88016823e9a0 R08: ffff8801732c8908 R09: 000000004ebbe168
<4>[  343.844586] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>[  343.844597] R13: ffff88017737f150 R14: ffffc90002657e38 R15: 0000000000000000
<4>[  343.844626] FS:  0000000000000000(0000) GS:ffff88017fd80000(0000) knlGS:0000000000000000
<4>[  343.844638] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  343.844646] CR2: 00007fcd9d3811f0 CR3: 0000000174984000 CR4: 0000000000340ee0
<4>[  343.844656] Call Trace:
<4>[  343.844665]  path_openat+0x281/0x9d0
<4>[  343.844674]  do_filp_open+0x85/0xf0
<4>[  343.844685]  ? __alloc_fd+0xe9/0x200
<4>[  343.844696]  ? do_sys_open+0x12b/0x1f0
<4>[  343.844702]  do_sys_open+0x12b/0x1f0
<4>[  343.844712]  entry_SYSCALL_64_fastpath+0x1c/0x89
<4>[  343.844720] RIP: 0033:0x7fcda07bb7c7
<4>[  343.844726] RSP: 002b:00007fffc0672af8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
<4>[  343.844738] RAX: ffffffffffffffda RBX: 00007fcda097ea38 RCX: 00007fcda07bb7c7
<4>[  343.844748] RDX: 00007fcda09a29f0 RSI: 0000000000080000 RDI: 00007fcda097ef10
<4>[  343.844758] RBP: 00007fffc0672b70 R08: 0000000000000000 R09: 00007fffc0672bcf
<4>[  343.844768] R10: 00007fffc0672be0 R11: 0000000000000246 R12: 000000006ffffdff
<4>[  343.844779] R13: 00007fffc0672c58 R14: 000000037ffff1a0 R15: 0000000000000802
<4>[  343.844793] Code: 00 00 00 01 00 0f b7 45 00 66 25 00 f0 66 2d 00 40 66 a9 00 b0 0f 84 cf 00 00 00 48 8b 85 08 02 00 00 48 85 c0 0f 84 ad 00 00 00 <48> 8b 38 e8 c2 11 f2 ff 84 c0 0f 84 9d 00 00 00 48 8b 85 08 02 
<1>[  343.844877] RIP: do_dentry_open.isra.1+0xf6/0x300 RSP: ffffc90002657d40
<4>[  343.844939] ---[ end trace a0a331bdf01f2df4 ]---
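
The `ret=77` lines in the dmesg above can be pulled out mechanically. The sketch below assumes that 77 is the exit status IGT uses for a skipped test; the constant name mirrors that assumption, and the regular expression is only written against the two line formats seen in this report.

```python
import re

# Exit status assumed to mean "skip" in IGT; treat this value as an assumption.
IGT_EXIT_SKIP = 77

# Matches an optional "<level>" prefix, the timestamp, and the IGT exit marker.
EXIT_RE = re.compile(
    r"^(?:<\d+>)?\[\s*(?P<ts>\d+\.\d+)\]\s+\[IGT\]\s+(?P<test>\S+): "
    r"exiting, ret=(?P<ret>-?\d+)"
)


def igt_exits(lines):
    """Yield (timestamp, test, ret) for each IGT exit line."""
    for line in lines:
        m = EXIT_RE.match(line)
        if m:
            yield float(m.group("ts")), m.group("test"), int(m.group("ret"))


# Three lines copied from the dmesg above.
sample = [
    "<7>[  342.840269] [IGT] kms_chamelium: executing",
    "<7>[  342.881613] [IGT] kms_chamelium: exiting, ret=77",
    "<4>[  343.823626] general protection fault: 0000 [#1] PREEMPT SMP",
]
exits = list(igt_exits(sample))
all_skipped = all(ret == IGT_EXIT_SKIP for _, _, ret in exits)
print(exits, all_skipped)
```

Under that assumption, the earlier kms_chamelium subtests skipped as expected on a machine without a Chamelium, and the general protection fault hit a later invocation.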
Comment 26 Marta Löfstedt 2017-12-04 07:03:23 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3439/fi-glk-dsi/igt@gem_exec_basic@gtt-default.html

	

[  115.068452] BUG: unable to handle kernel paging request at 00000000818bc06f
[  115.068476] IP: do_error_trap+0x14/0xa0
[  115.068491] Oops: 0002 [#1] PREEMPT SMP
[  115.068501] Dumping ftrace buffer:
[  115.068508] ---------------------------------
[  115.068584] CPU:0 [LOST 8035 EVENTS]
               gem_ctx_-1513    0..s1 68581260us : execlists_submission_tasklet: rcs0 in[0]:  ctx=51.1, seqno=490d
...
<0>[  115.088691]   <idle>-0       1..s1 115050529us : execlists_submission_tasklet: vecs0 out[0]: ctx=3.1, seqno=823
<0>[  115.088705] ---------------------------------
<4>[  115.088712] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii mei_me prime_numbers mei i2c_hid pinctrl_geminilake pinctrl_intel
<4>[  115.088778] CPU: 0 PID: 1565 Comm: python3 Tainted: G     U           4.15.0-rc1-CI-CI_DRM_3439+ #1
<4>[  115.088791] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[  115.088805] task: ffff8801736e8040 task.stack: ffffc90000398000
<4>[  115.088816] RIP: 0010:do_error_trap+0x14/0xa0
<4>[  115.088823] RSP: 0018:ffffc9000039b9c8 EFLAGS: 00010246
<4>[  115.088832] RAX: 00000000818bc077 RBX: 0000000000000001 RCX: 0000000000000006
<4>[  115.088842] RDX: ffffffff81c62a96 RSI: 0000000000000000 RDI: ffffc9000039b9f8
<4>[  115.088852] RBP: ffffffffffffffff R08: 0000000000000004 R09: ffffffffffffffff
<4>[  115.088862] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
<4>[  115.088872] R13: ffffffff81c62a96 R14: 0000000000000004 R15: ffff880171f3c008
<4>[  115.088883] FS:  0000000000000000(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000
<4>[  115.088894] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  115.088903] CR2: 00000000818bc06f CR3: 0000000172278000 CR4: 0000000000340ef0
<4>[  115.088913] Call Trace:
<4>[  115.088922]  invalid_op+0x18/0x20
<4>[  115.088930] RIP: 0010:do_general_protection+0x9/0x1d0
<4>[  115.088937] RSP: 0018:ffffc9000039baa0 EFLAGS: 00010006
<4>[  115.088946] RAX: 00000000818bc077 RBX: 0000000000000001 RCX: ffffffff818bc077
<4>[  115.088956] RDX: ff118801784043c0 RSI: 0000000000000000 RDI: ffffc9000039bac9
<4>[  115.088966] RBP: ffffffffffffffff R08: 0000000000000000 R09: ffffffffffffffff
<4>[  115.088976] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
<4>[  115.088986] R13: ffffc9000039bc90 R14: 00007f2fe855b000 R15: ffff880171f3c008
<4>[  115.088999]  ? native_iret+0x7/0x7
<4>[  115.089009]  general_protection+0x22/0x30
<4>[  115.089042] RIP: 0010:unmap_page_range+0x46/0x8e0
<4>[  115.089050] RSP: 0018:ffffc9000039bb78 EFLAGS: 00010206
<4>[  115.089059] RAX: 00000000000007f0 RBX: ffff88016fca7af0 RCX: 00007f2fe8d5b000
<4>[  115.089069] RDX: ff118801784043c0 RSI: ffff88016fca7af0 RDI: ffffc9000039bc90
<4>[  115.089080] RBP: ffffffffffffffff R08: 0000000000000000 R09: 00007f2fe8d5b000
<4>[  115.089108] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
<4>[  115.089118] R13: ffffc9000039bc90 R14: 00007f2fe855b000 R15: ffff880171f3c077
<4>[  115.089133]  ? unmap_page_range+0x724/0x8e0
<4>[  115.089146]  unmap_vmas+0x47/0x90
<4>[  115.089154]  exit_mmap+0xa0/0x170
<4>[  115.089165]  mmput+0x5c/0x120
<4>[  115.089173]  flush_old_exec+0x644/0x850
<4>[  115.089183]  load_elf_binary+0x3b1/0x16b3
<4>[  115.089193]  ? __lock_acquire+0x42c/0x15a0
<4>[  115.089202]  ? search_binary_handler+0x72/0x1e0
<4>[  115.089212]  search_binary_handler+0x7f/0x1e0
<4>[  115.089221]  do_execveat_common.isra.12+0x658/0x950
<4>[  115.089232]  SyS_execve+0x27/0x30
<4>[  115.089240]  do_syscall_64+0x59/0x1a0
<4>[  115.089247]  entry_SYSCALL64_slow_path+0x25/0x25
<4>[  115.089255] RIP: 0033:0x7f2ff59cd767
<4>[  115.089261] RSP: 002b:00007f2fea55b518 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
<4>[  115.089273] RAX: ffffffffffffffda RBX: 00000000000000a8 RCX: 00007f2ff59cd767
<4>[  115.089283] RDX: 00007f2fe4003200 RSI: 00007f2fe4007580 RDI: 00007f2fe4007540
<4>[  115.089293] RBP: 00007f2fe4004250 R08: 0000000000000002 R09: 0000000000000000
<4>[  115.089303] R10: 0000000000000008 R11: 0000000000000206 R12: 00007f2fea596348
<4>[  115.089313] R13: 0000000000000000 R14: 00007f2fe4003200 R15: 0000000000000000
<4>[  115.089327] Code: 00 00 ba 02 00 00 00 eb cd 48 8b 85 80 00 00 00 ba 01 00 00 00 eb bf 41 56 41 55 45 89 c6 41 54 55 49 89 f4 53 49 89 d5 48 fc fb <48> 89 2c e8 b4 91 88 00 85 c0 dd 09 80 3d 1d 6a ee 00 75 74 2a 
<1>[  115.089411] RIP: do_error_trap+0x14/0xa0 RSP: ffffc9000039b9c8
<4>[  115.089420] CR2: 00000000818bc06f
<4>[  115.089427] ---[ end trace 25fb74e124e0a858 ]---
<3>[  115.229290] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
<3>[  115.229309] in_atomic(): 0, irqs_disabled(): 1, pid: 1565, name: python3
<4>[  115.229319] INFO: lockdep is turned off.
<4>[  115.229326] irq event stamp: 1412
<4>[  115.229339] hardirqs last  enabled at (1411): [<ffffffff811793a8>] free_unref_page+0x48/0x60
<4>[  115.229354] hardirqs last disabled at (1412): [<ffffffff818bc8f6>] error_entry+0x66/0xc0
<4>[  115.229384] softirqs last  enabled at (1336): [<ffffffff818bf29a>] __do_softirq+0x3aa/0x4de
<4>[  115.229399] softirqs last disabled at (1313): [<ffffffff810804ea>] irq_exit+0xaa/0xc0
<4>[  115.229412] CPU: 0 PID: 1565 Comm: python3 Tainted: G     UD          4.15.0-rc1-CI-CI_DRM_3439+ #1
<4>[  115.229425] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[  115.229440] Call Trace:
<4>[  115.229452]  dump_stack+0x5f/0x86
<4>[  115.229461]  ___might_sleep+0x1d9/0x240
<4>[  115.229471]  exit_signals+0x1b/0x2a0
<4>[  115.229480]  do_exit+0x93/0xcc0
<4>[  115.229490]  ? SyS_execve+0x27/0x30
<4>[  115.229498]  rewind_stack_do_exit+0x17/0x20
<7>[  115.791287] [IGT] gem_exec_basic: executing
Comment 27 Marta Löfstedt 2017-12-04 10:02:31 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3446/fi-glk-dsi/igt@gem_busy@basic-hang-default.html

	

[   40.373746] Setting dangerous option reset - tainting kernel
[   40.381284] Setting dangerous option reset - tainting kernel
[   48.444040] java: Corrupted page table at address 7fabb819e100
[   48.444091] Bad pagetable: 000d [#1] PREEMPT SMP
[   48.444108] Dumping ftrace buffer:
[   48.444119] ---------------------------------
[   48.444221] ksoftirq-29      3..s. 38752448us : execlists_submission_tasklet: rcs0 in[0]:  ctx=2.1, seqno=52
...
[   48.476354]   <idle>-0       1..s1 40476071us : execlists_submission_tasklet: rcs0 csb[2d]: status=0x00000001:0x00000000
[   48.476380] ---------------------------------
[   48.476392] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul i915 ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 mii snd_pcm mei_me mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
[   48.476498] CPU: 1 PID: 1085 Comm: java Tainted: G     U           4.15.0-rc2-CI-CI_DRM_3446+ #1
[   48.476519] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[   48.476545] task: 000000007a9b860d task.stack: 00000000695182e3
[   48.476561] RIP: 0033:0x7fabb8064bd1
[   48.476572] RSP: 002b:00007fab91084cd8 EFLAGS: 00010206
[   48.476587] RAX: 0000000000000000 RBX: 00007fab746c1940 RCX: 0000000000000000
[   48.476604] RDX: 0000000000000010 RSI: 00007fab91084d3f RDI: 000007fab746c993
[   48.476621] RBP: 00007fab74000020 R08: 00007fabb819e100 R09: 0000000000000001
[   48.476638] R10: 0000000000000003 R11: 00007fab91084d30 R12: 0000000000007ff0
[   48.476655] R13: 00007fab746c9930 R14: 0000000000000000 R15: 0000000000003be0
[   48.476673] FS:  00007fab91085700(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000
[   48.476692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   48.476707] CR2: 00007fabb819e100 CR3: 000000016ec92000 CR4: 0000000000340ee0
[   48.476726] RIP: 0x7fabb8064bd1 RSP: 00007fab91084cd8
[   48.476741] ---[ end trace 8747c98619fd2a37 ]---
[   48.585697] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
[   48.585729] in_atomic(): 0, irqs_disabled(): 1, pid: 1085, name: java
[   48.585749] INFO: lockdep is turned off.
[   48.585763] irq event stamp: 28302
[   48.585783] hardirqs last  enabled at (28301): [<00000000430c0964>] swapgs_restore_regs_and_return_to_usermode+0x0/0x20
[   48.585816] hardirqs last disabled at (28302): [<00000000d609682f>] error_entry+0x60/0xc0
[   48.585842] softirqs last  enabled at (28300): [<0000000040735435>] __do_softirq+0x3aa/0x4de
[   48.585870] softirqs last disabled at (28277): [<000000007bcb8038>] irq_exit+0xaa/0xc0
[   48.585897] CPU: 1 PID: 1085 Comm: java Tainted: G     UD          4.15.0-rc2-CI-CI_DRM_3446+ #1
[   48.585923] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[   48.585953] Call Trace:
[   48.585973]  dump_stack+0x5f/0x86
[   48.585989]  ___might_sleep+0x1d9/0x240
[   48.586009]  exit_signals+0x1b/0x2a0
[   48.586026]  ? oops_end+0x61/0x80
[   48.586040]  do_exit+0x93/0xcc0
[   48.586054]  ? __do_page_fault+0x3e6/0x560
[   48.586077]  rewind_stack_do_exit+0x17/0x20
[   57.768110] i915 0000:00:02.0: Resetting rcs0 after gpu hang
Comment 28 Marta Löfstedt 2017-12-05 06:49:28 UTC
It starts here:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4032/fi-glk-dsi/igt@gem_exec_flush@basic-wb-rw-before-default.html

Then comes an issue that has not been seen on this subtest before:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4032/fi-glk-dsi/igt@prime_busy@basic-wait-before-default.html

[  385.598295] ------------[ cut here ]------------
[  385.598302] list_del corruption. next->prev should be 0000000071b3538c, but was 000000000ce0e81a
[  385.598328] WARNING: CPU: 1 PID: 2600 at lib/list_debug.c:56 __list_del_entry_valid+0x8a/0x90
[  385.598331] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp i915 coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii mei_me mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
[  385.598398] CPU: 1 PID: 2600 Comm: python3 Tainted: G     U  W        4.15.0-rc2-CI-CI_DRM_3448+ #1
[  385.598401] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[  385.598403] task: 000000009cfb4ec2 task.stack: 000000008c8f3191
[  385.598407] RIP: 0010:__list_del_entry_valid+0x8a/0x90
[  385.598409] RSP: 0018:ffffc9000033bad0 EFLAGS: 00010082
[  385.598414] RAX: 0000000000000054 RBX: ffff880173bf56e8 RCX: 0000000000000002
[  385.598417] RDX: 0000000080000002 RSI: ffffffff81ca8e7d RDI: 00000000ffffffff
[  385.598419] RBP: ffffffff81f3d818 R08: 0000000000000000 R09: 0000000000000001
[  385.598421] R10: 0000000000000000 R11: 0000000000000000 R12: ffffea000530ae60
[  385.598424] R13: 0000000000000001 R14: ffffea00058bdfc0 R15: 0000000000000014
[  385.598426] FS:  0000000000000000(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000
[  385.598429] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  385.598431] CR2: 00007fa4d2b809e0 CR3: 000000016e65e000 CR4: 0000000000340ee0
[  385.598434] Call Trace:
[  385.598439]  release_pages+0x11f/0x370
[  385.598452]  tlb_flush_mmu_free+0x2c/0x50
[  385.598458]  unmap_page_range+0x794/0x8e0
[  385.598473]  unmap_vmas+0x47/0x90
[  385.598480]  exit_mmap+0xa0/0x170
[  385.598493]  mmput+0x5c/0x120
[  385.598498]  flush_old_exec+0x644/0x850
[  385.598506]  load_elf_binary+0x3b1/0x16b3
[  385.598513]  ? __lock_acquire+0x42c/0x15a0
[  385.598521]  ? search_binary_handler+0x72/0x1e0
[  385.598529]  search_binary_handler+0x7f/0x1e0
[  385.598535]  do_execveat_common.isra.12+0x658/0x950
[  385.598544]  SyS_execve+0x27/0x30
[  385.598549]  do_syscall_64+0x59/0x1a0
[  385.598555]  entry_SYSCALL64_slow_path+0x25/0x25
[  385.598559] RIP: 0033:0x7fa4d2b4b767
[  385.598561] RSP: 002b:00007fa4c76b8138 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[  385.598566] RAX: ffffffffffffffda RBX: 00000000000000a8 RCX: 00007fa4d2b4b767
[  385.598568] RDX: 00007ffc0ced82f8 RSI: 00007fa4c000aa70 RDI: 00007fa4c0012b40
[  385.598571] RBP: 00007fa4c0013040 R08: 0000000000000002 R09: 0000000000000005
[  385.598573] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fa4c76dcf08
[  385.598575] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000005
[  385.598587] Code: ef c1 ff 0f ff 31 c0 c3 48 89 fe 48 c7 c7 28 6b cb 81 e8 1a ef c1 ff 0f ff 31 c0 c3 48 89 fe 48 c7 c7 68 6b cb 81 e8 06 ef c1 ff <0f> ff 31 c0 c3 90 41 57 41 56 41 55 41 54 55 53 48 83 ec 20 9c 
[  385.598722] ---[ end trace a55595b2ba3d54dd ]---
Comment 29 Jani Saarinen 2017-12-05 17:50:34 UTC
Reference https://patchwork.freedesktop.org/series/34623/
Comment 30 Marta Löfstedt 2017-12-11 07:31:41 UTC
Starts with:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3488/fi-glk-dsi/igt@gem_exec_flush@basic-wb-ro-before-default.html
	
(gem_exec_flush:1756) CRITICAL: Test assertion failure function run, file gem_exec_flush.c:323:
(gem_exec_flush:1756) CRITICAL: Failed assertion: map[i] == i
(gem_exec_flush:1756) CRITICAL: error: 0x8e7c0142 != 0x142
Subtest basic-wb-ro-before-default failed.
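
The two values printed by the failed assertion can be compared bit by bit to see where the readback differs from the expected value. This is only a sketch of the arithmetic on the numbers quoted above; the variable names are illustrative, and it makes no claim about what caused the corruption.

```python
# Values copied from the assertion message: "error: 0x8e7c0142 != 0x142".
observed = 0x8E7C0142
expected = 0x142

# XOR leaves a 1 in every bit position where the two values disagree.
diff = observed ^ expected

# Check whether the low 16 bits of the readback still hold the expected value.
low16_matches = (observed & 0xFFFF) == (expected & 0xFFFF)

print(f"xor  = {diff:#010x}")
print(f"low 16 bits match: {low16_matches}")
```

In this instance the differing bits all sit in the upper half of the 32-bit word, while the lower half matches the expected index.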

then:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3488/fi-glk-dsi/igt@gem_exec_flush@basic-wb-ro-default.html
	
(gem_exec_flush:1761) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482:
(gem_exec_flush:1761) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-wb-ro-default failed.

then a lot of unexpected skips followed by:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3488/fi-glk-dsi/igt@gem_mmap@basic-small-bo.html
	

[  221.187721] BUG: unable to handle kernel paging request at 000000007276fe22
[  221.187743] IP: __lock_acquire+0xb0/0x15a0
[  221.187753] Oops: 0002 [#1] PREEMPT SMP
[  221.187761] Dumping ftrace buffer:
[  221.187766] ---------------------------------
[  221.187849] CPU:3 [LOST 86041 EVENTS]
               gem_exec-1765    3..s1 200397128us : execlists_submission_tasklet: rcs0 in[0]:  ctx=2.2, seqno=80ab7

followed by Softdog:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3488/fi-glk-dsi/igt@gem_mmap_gtt@basic.html
Comment 32 Marta Löfstedt 2017-12-12 07:10:04 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3498/fi-glk-dsi/igt@gem_exec_store@basic-vebox.html

The result says:
Received signal SIGSEGV.
Stack trace: 
 #0 [fatal_sig_handler+0x12f]
 #1 [killpg+0x40]
 #2 [_IO_file_fopen+0x42]

This is yet another weird thing on this machine.
Comment 33 Chris Wilson 2017-12-13 12:53:01 UTC
One random memory corruption issue has been fixed:

commit 7d622351c94172a42bfe9b13bdb0fdc2be90ed3b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Dec 13 09:48:02 2017 +0000

    drm/i915/fence: Use rcu to defer freeing of irq_work
    
    It is illegal to perform an immediate free of the struct irq_work from
    inside the irq_work callback (as irq_work_run_list modifies work->flags
    after execution of the work->func()). As we use the irq_work to
    coordinate the freeing of the callback from two different softirq paths,
    we need to defer the kfree from inside our irq_work callback, for which
    we can use kfree_rcu.
    
    Fixes: 81c0ed21aa91 ("drm/i915/fence: Avoid del_timer_sync() from inside a timer")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171213094802.28243-1-chris@chris-wilson.co.uk

Hopefully this explains a lot of weirdness.
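
The ordering problem described in the commit message can be illustrated with a toy model. This is plain Python, not kernel code: `Work`, `run_list`, `free` and the two callbacks are hypothetical stand-ins, and the `deferred` list only imitates "free later than the runner's last access", which is the role kfree_rcu plays in the actual fix.

```python
class Work:
    """Stand-in for a work item; `freed` marks memory that was released."""
    def __init__(self, func):
        self.func = func
        self.flags = 1        # pretend 1 means "busy"
        self.freed = False

def free(work):
    work.freed = True

def run_list(works, deferred):
    """Run each callback, then write work.flags, as the commit message describes."""
    errors = []
    for work in works:
        work.func(work, deferred)
        if work.freed:
            # The runner is about to write to memory the callback released.
            errors.append("use-after-free: flags written after free")
        work.flags = 0
    # Deferred frees run only once the runner no longer touches any item.
    for work in deferred:
        free(work)
    deferred.clear()
    return errors

def immediate_free_cb(work, deferred):
    free(work)                # the pattern the commit calls illegal

def deferred_free_cb(work, deferred):
    deferred.append(work)     # stand-in for deferring the free

bad = run_list([Work(immediate_free_cb)], [])
good = run_list([Work(deferred_free_cb)], [])
print("immediate free:", bad)
print("deferred free: ", good)
```

The model only captures the order of accesses; it does not model RCU grace periods or the two softirq paths mentioned in the commit.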
Comment 34 Marta Löfstedt 2017-12-14 07:27:52 UTC
It starts with:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/igt@gem_exec_flush@basic-uc-set-default.html
(gem_exec_flush:1725) igt-aux-CRITICAL: Failed assertion: !"GPU hung"

<7>[  170.959810] [IGT] gem_exec_flush: starting subtest basic-uc-set-default
...
<7>[  177.768550] missed_breadcrumb rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x5a/0x80 [i915]
...
<6>[  180.857713] [drm] GPU HANG: ecode 9:0:0x8fdafffa, in gem_exec_flush [1725], reason: Hang on rcs0, action: reset
<7>[  180.858578] [drm:i915_reset_device [i915]] resetting chip
<5>[  180.858755] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[  181.361556] [drm:intel_gpu_reset [i915]] rcs0: timed out on STOP_RING
<3>[  182.065059] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
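
To read the sequence above as a timeline, the timestamps can be turned into offsets from the start of the subtest. A minimal sketch, assuming the `<level>[ seconds] message` layout seen in these excerpts; the `parse` helper and `LINE_RE` pattern are illustrative and not part of any CI tooling.

```python
import re

# Lines copied from the dmesg excerpt above (elided "..." lines left out).
LOG = """\
<7>[  170.959810] [IGT] gem_exec_flush: starting subtest basic-uc-set-default
<7>[  177.768550] missed_breadcrumb rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x5a/0x80 [i915]
<6>[  180.857713] [drm] GPU HANG: ecode 9:0:0x8fdafffa, in gem_exec_flush [1725], reason: Hang on rcs0, action: reset
<7>[  180.858578] [drm:i915_reset_device [i915]] resetting chip
<5>[  180.858755] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[  181.361556] [drm:intel_gpu_reset [i915]] rcs0: timed out on STOP_RING
<3>[  182.065059] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
"""

# <level>[ seconds.micros] message
LINE_RE = re.compile(r"^<(\d+)>\[\s*(\d+\.\d+)\]\s+(.*)$")

def parse(log):
    """Return (level, timestamp, message) tuples for every line that matches."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            events.append((int(m.group(1)), float(m.group(2)), m.group(3)))
    return events

events = parse(LOG)
start = events[0][1]
for level, ts, msg in events:
    # Offset from the start of the subtest, then a shortened message.
    print(f"+{ts - start:7.3f}s  level={level}  {msg[:60]}")
```

The number in angle brackets is the kernel log level, so the final line (level 3) is the only one logged as an error; the others are debug, info or notice messages.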

then:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/igt@gem_exec_reloc@basic-cpu-read.html
The result is incomplete. The pstore looks legitimate, but most of it is already in dmesg.

from dmesg:
<7>[  187.664410] [IGT] gem_exec_reloc: exiting, ret=77
<4>[  187.770081] WARNING: can't dereference iret registers at 00000000b08c140c for ip page_fault+0x7/0x30
<0>[  187.770083] BUG: stack guard page was hit at 00000000dd82e48c (stack is 00000000ca52e808..0000000085d6198d)
<4>[  187.770178] kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
<0>[  187.770191] Dumping ftrace buffer:
<0>[  187.770199] ---------------------------------
<0>[  187.770281] CPU:3 [LOST 63587 EVENTS]
                  gem_exec-1729    3..s1 173324421us : execlists_submission_tasklet: rcs0 in[0]:  ctx=2.1, seqno=5d0bc
...
<4>[  187.801374] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul i915 ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
<4>[  187.801505] CPU: 2 PID: 1377 Comm: python3 Tainted: G     U  W        4.15.0-rc3-CI-CI_DRM_3511+ #1
<4>[  187.801531] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[  187.801564] RIP: 0010:page_fault+0x7/0x30
<4>[  187.801578] RSP: 0018:ffffc90001d83fa8 EFLAGS: 00010083
<4>[  187.801597] RAX: 0000000080000000 RBX: 0000000000000000 RCX: 0000000000000000
<4>[  187.801618] RDX: 0000000080000610 RSI: 0000000000000000 RDI: ffffc90001d840f8
<4>[  187.801639] RBP: 0000000080000610 R08: 0000000000000001 R09: 0101010101010101
<4>[  187.801659] R10: ffffc90001d87a90 R11: 0000000000000000 R12: ffffc90001d840f8
<4>[  187.801680] R13: ffff8801733c51c0 R14: 0000000000000001 R15: ffff8801733c51c0
<4>[  187.801702] FS:  00007fac1674e700(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
<4>[  187.801726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  187.801744] CR2: ffffc90001d83f98 CR3: 000000016f353000 CR4: 0000000000340ee0
<4>[  187.801764] Call Trace:
<4>[  187.801782]  ? no_context+0x3dc/0x430
<4>[  187.801800]  ? __do_page_fault+0x196/0x560
...
<1>[  187.804948] RIP: page_fault+0x7/0x30 RSP: ffffc90001d83fa8
<4>[  187.804968] ---[ end trace 7832dee94e24beea ]---
<3>[  188.000284] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
<3>[  188.000315] in_atomic(): 1, irqs_disabled(): 1, pid: 1377, name: python3
<4>[  188.000334] INFO: lockdep is turned off.
<4>[  188.000347] irq event stamp: 1180122
<4>[  188.000367] hardirqs last  enabled at (1180121): [<00000000e846d9d1>] get_page_from_freelist+0x24c/0x14c0
<4>[  188.000395] hardirqs last disabled at (1180122): [<00000000804f94d3>] __slab_alloc.isra.24.constprop.29+0x19/0x70
<4>[  188.000425] softirqs last  enabled at (1179892): [<000000002b075771>] __do_softirq+0x3aa/0x4de
<4>[  188.000451] softirqs last disabled at (1179885): [<00000000a976b967>] irq_exit+0xaa/0xc0
<3>[  188.000473] Preemption disabled at:
<4>[  188.000478] [<000000005fa92adc>] ist_enter+0x1c/0xa0
<4>[  188.000507] CPU: 2 PID: 1377 Comm: python3 Tainted: G     UD W        4.15.0-rc3-CI-CI_DRM_3511+ #1
<4>[  188.000531] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[  188.000560] Call Trace:
<4>[  188.000578]  dump_stack+0x5f/0x86
<4>[  188.000593]  ___might_sleep+0x1d9/0x240
The log then continues in:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/pstore0-1513197126_Oops_2.log and it is actually a Softdog:

<3>[  188.000473] Preemption disabled at:
<4>[  188.000478] [<000000005fa92adc>] ist_enter+0x1c/0xa0
<4>[  188.000507] CPU: 2 PID: 1377 Comm: python3 Tainted: G     UD W        4.15.0-rc3-CI-CI_DRM_3511+ #1
<4>[  188.000531] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[  188.000560] Call Trace:
<4>[  188.000578]  dump_stack+0x5f/0x86
<4>[  188.000593]  ___might_sleep+0x1d9/0x240
<4>[  188.000610]  exit_signals+0x1b/0x2a0
<4>[  188.000624]  do_exit+0x93/0xcc0
<4>[  188.000638]  ? trace_hardirqs_off_caller+0x75/0xd0
<4>[  188.000654]  ? do_syscall_64+0x19/0x1a0
<4>[  188.000671]  rewind_stack_do_exit+0x17/0x20
<6>[  188.000769] note: python3[1377] exited with preempt_count 1
<12>[  277.879085] owatch: TIMEOUT!
<12>[  277.879252] owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
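
The "Tainted:" field changes between the two traces in this comment, and decoding it shows what the kernel recorded in between. A small sketch follows; the table is deliberately partial and covers only the letters that appear in this report, and the helper names are illustrative.

```python
# Partial table: only the taint letters seen in this report are listed.
TAINT_MEANINGS = {
    "G": "no proprietary module loaded",
    "U": "taint requested by user space (TAINT_USER)",
    "D": "kernel has died: an oops or BUG occurred (TAINT_DIE)",
    "W": "a kernel warning was issued (TAINT_WARN)",
}

def decode_taint(flags):
    """Return (letter, meaning) pairs for every non-blank taint character."""
    return [(ch, TAINT_MEANINGS.get(ch, "not covered by this sketch"))
            for ch in flags if not ch.isspace()]

def new_flags(before, after):
    """Letters present in `after` but not in `before`, in order of appearance."""
    return [ch for ch in after if not ch.isspace() and ch not in before]

first_trace = "G     U  W"    # taint field of the page_fault trace
second_trace = "G     UD W"   # taint field of the later might_sleep trace

for letter, meaning in decode_taint(second_trace):
    print(letter, "-", meaning)
print("added between the two traces:", new_flags(first_trace, second_trace))
```

The added letter is consistent with the order of events in the log: the second trace is printed after the first oops has already ended.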
Comment 35 Marta Löfstedt 2017-12-15 07:04:33 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3521/fi-glk-dsi/igt@gem_exec_flush@basic-wb-ro-before-default.html
(gem_exec_flush:1763) igt-aux-CRITICAL: Failed assertion: !"GPU hung"

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3521/fi-glk-dsi/igt@gem_wait@basic-await-all.html
dmesg:	
[  245.788590] general protection fault: 0000 [#1] PREEMPT SMP
[  245.788604] Dumping ftrace buffer:
[  245.788610] ---------------------------------
[  245.788704] CPU:0 [LOST 73316 EVENTS]
               gem_exec-1768    0..s1 202889461us : execlists_submission_tasklet: rcs0 in[0]:  ctx=4.2, seqno=80f1a
...
then a bunch of backtraces that are repeated in the pstore from:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3521/fi-glk-dsi/igt@gem_workarounds@basic-read.html
incomplete
Comment 36 Marta Löfstedt 2018-01-08 13:52:10 UTC
On CI_DRM_3606:
First test with dmesg-warn:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3606/fi-glk-dsi/igt@gem_exec_flush@basic-batch-kernel-default-uc.html
After that, many of the following igt@gem_exec_* tests are hit with dmesg-warn.

It started here:

<4>[  120.453872] WARNING: CPU: 3 PID: 0 at kernel/sched/core.c:3459 schedule_idle+0x2c/0x30
<4>[  120.453878] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
<4>[  120.453969] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G     U           4.15.0-rc7-CI-CI_DRM_3606+ #1
<4>[  120.453973] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[  120.453979] RIP: 0010:schedule_idle+0x2c/0x30
<4>[  120.453983] RSP: 0018:ffffc900000d3ee8 EFLAGS: 00010286
<4>[  120.453991] RAX: ee000000fe000000 RBX: 0000000000000003 RCX: 0000000000000001
<4>[  120.453995] RDX: 0000000000000000 RSI: ffffffff820ab24f RDI: ffffffff820b8d9d
<4>[  120.453998] RBP: ffff88017a942740 R08: 0000000000000000 R09: 0000000000000001
<4>[  120.454002] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8233caf0
<4>[  120.454006] R13: ffff88017a942740 R14: ffff88017fdab550 R15: ffffffff82293980
<4>[  120.454010] FS:  0000000000000000(0000) GS:ffff88017fd80000(0000) knlGS:0000000000000000
<4>[  120.454014] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  120.454018] CR2: 00007f8fab777000 CR3: 000000016f6fc000 CR4: 0000000000340ee0
<4>[  120.454022] Call Trace:
<4>[  120.454031]  do_idle+0x14b/0x1d0
<4>[  120.454042]  cpu_startup_entry+0x14/0x20
<4>[  120.454049]  start_secondary+0x129/0x160
<4>[  120.454057]  secondary_startup_64+0xa5/0xb0
<4>[  120.454076] Code: 48 8b 04 25 80 4e 01 00 53 48 8b 40 08 48 85 c0 75 19 65 48 8b 1c 25 80 4e 01 00 31 ff e8 cd f0 ff ff 48 8b 03 a8 08 75 f2 5b c3 <0f> ff eb e3 bf 01 00 00 00 e8 e6 e8 7e ff e8 a1 fb ff ff bf 01 

Then there is ~170 MB of various WARNs from kernel/locking/lockdep.c
Comment 37 Marta Löfstedt 2018-02-08 14:19:36 UTC
*** Bug 103615 has been marked as a duplicate of this bug. ***
Comment 38 Marta Löfstedt 2018-03-16 09:37:49 UTC
Last seen: CI_DRM_3783: 2018-02-16

