On CI_DRM_3011, the machine fi-glk-2a hits the following issue when running igt@gem_exec_suspend@basic-s3:

[ 320.240058] WARN_ON(wait_for_engine(engine, 50))
[ 320.240117] ------------[ cut here ]------------
[ 320.240163] WARNING: CPU: 0 PID: 3144 at drivers/gpu/drm/i915/i915_gem.c:3385 i915_gem_wait_for_idle+0x19d/0x200 [i915]
[ 320.240166] Modules linked in: snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
[ 320.240225] CPU: 0 PID: 3144 Comm: kworker/u8:15 Tainted: G U 4.13.0-rc6-CI-CI_DRM_3011+ #1
[ 320.240229] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0045.B51.1704281422 04/28/2017
[ 320.240236] Workqueue: events_unbound async_run_entry_fn
[ 320.240241] task: ffff880174ff2780 task.stack: ffffc90000780000
[ 320.240282] RIP: 0010:i915_gem_wait_for_idle+0x19d/0x200 [i915]
[ 320.240285] RSP: 0018:ffffc90000783c40 EFLAGS: 00010286
[ 320.240290] RAX: 0000000000000024 RBX: fffffffffffffffe RCX: 0000000000000006
[ 320.240293] RDX: 0000000000000006 RSI: ffffffff81cf74e4 RDI: ffffffff81cae38e
[ 320.240296] RBP: ffffc90000783c70 R08: 0000000000000000 R09: 0000000000000001
[ 320.240299] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880175e90008
[ 320.240301] R13: ffff880168ad0000 R14: 0000000100004f0e R15: ffff880168ad4350
[ 320.240305] FS: 0000000000000000(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000
[ 320.240308] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 320.240310] CR2: 00007f5324877218 CR3: 0000000003e0f000 CR4: 00000000003406f0
[ 320.240313] Call Trace:
[ 320.240358] i915_gem_suspend+0x45/0x140 [i915]
[ 320.240395] i915_pm_suspend+0x86/0x1a0 [i915]
[ 320.240403] pci_pm_suspend+0x78/0x140
[ 320.240411] dpm_run_callback+0x6f/0x310
[ 320.240415] ? pci_pm_resume+0xa0/0xa0
[ 320.240421] __device_suspend+0x102/0x380
[ 320.240427] ? dpm_watchdog_set+0x70/0x70
[ 320.240435] async_suspend+0x1f/0xa0
[ 320.240440] async_run_entry_fn+0x38/0x160
[ 320.240446] process_one_work+0x224/0x650
[ 320.240454] worker_thread+0x4e/0x3b0
[ 320.240462] kthread+0x114/0x150
[ 320.240465] ? process_one_work+0x650/0x650
[ 320.240469] ? kthread_create_on_node+0x40/0x40
[ 320.240475] ret_from_fork+0x27/0x40
[ 320.240486] Code: d0 0f 85 2e ff ff ff 48 83 c4 08 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 c7 c6 f0 9d 22 a0 48 c7 c7 68 7d 21 a0 e8 e4 f3 fb e0 <0f> ff 31 d2 4c 89 ee 48 c7 c7 90 da 12 a0 e8 50 92 00 e1 48 83
[ 320.240641] ---[ end trace 4fed512b7c104387 ]---

This issue then prevented the machine from suspending to RAM.

Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3011/fi-glk-2a/igt@gem_exec_suspend@basic-s3.html
See https://patchwork.freedesktop.org/series/29387/
This should fix the suspend failure

commit cad9946c2a4375386062131858881cfd30fc1b8f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sat Aug 26 12:09:33 2017 +0100

    drm/i915: Always sanity check engine state upon idling

    When we do a locked idle we know that afterwards all requests have been completed and the engines have been cleared of tasks. For whatever reason, this doesn't always happen and we may go into a suspend with ELSP still full, and this causes an issue upon resume as we get very, very confused. If the engines refuse to idle, mark the device as wedged. In the process we get rid of the maybe unused open-coded version of wait_for_engines reported by Nick Desaulniers and Matthias Kaehlcke.

    v2: Suppress the -EIO before suspend, but keep it for seqno wrap.

but leaves the underlying issue unresolved. FAIL -> WARN.
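For anyone reading the commit message above without the driver source at hand, here is a minimal user-space sketch of the idea it describes: after idling, check each engine with a bounded wait, and if one refuses to idle, mark the device as wedged and report -EIO. This is not the i915 code. The struct layouts, the names `wait_for_engine_idle` and `wait_for_idle`, and the poll-count model of an engine are all assumptions made for illustration, and the v2 behaviour (suppressing -EIO before suspend) is not modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the i915 structures. */
struct engine {
    const char *name;
    int pending;   /* polls still needed before the engine reports idle */
};

struct device {
    struct engine engines[2];
    bool wedged;   /* set once an engine refuses to idle */
};

/* Poll one engine at most max_polls times; true if it reached idle. */
static bool wait_for_engine_idle(struct engine *e, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (e->pending == 0)
            return true;
        e->pending--;   /* simulate the engine making progress */
    }
    return e->pending == 0;
}

/*
 * Sanity check every engine after idling. If any engine is still busy
 * when its bounded wait expires, mark the device wedged and return -EIO.
 */
static int wait_for_idle(struct device *dev, int max_polls)
{
    size_t n = sizeof(dev->engines) / sizeof(dev->engines[0]);

    for (size_t i = 0; i < n; i++) {
        if (!wait_for_engine_idle(&dev->engines[i], max_polls)) {
            dev->wedged = true;
            return -EIO;
        }
    }
    return 0;
}
```

In this model, a device whose engines drain within the poll budget returns 0 and stays unwedged, while one stuck engine is enough to wedge the whole device. That corresponds to the WARN_ON(wait_for_engine(engine, 50)) seen in the log, where the bounded wait on an engine expired during suspend.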
Moving priority to high, as the failure is sporadic.
This issue was filed against a machine that is no longer in BAT. The issue has never been reproduced on the current GLK machine in BAT.