Created attachment 135859 [details] Test debug information and dmesg errors

On SKL, igt@gem_exec_suspend@basic-s3 fails intermittently during the daily fast-feedback run. Looking at the history of errors in dmesg, I believe the test is being affected by previous tests. It looks a lot like the error in bug 103049. I can't reproduce it by running the test manually.

Assertion:
(gem_exec_suspend:7270) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482:
(gem_exec_suspend:7270) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-S3 failed.

Output:
IGT-Version: 1.20-g476c4b4 (x86_64) (Linux: 4.15.0-rc1-drm-intel-qa-ww48-commit-807db75+ x86_64)
[cmd] rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using /dev/rtc0 at Fri Dec 1 09:03:00 2017
Stack trace:
#0 [__igt_fail_assert+0x101]
#1 [sig_abort+0x3a]
#2 [killpg+0x40]
#3 [ioctl+0x5]
#4 [drmIoctl+0x28]
#5 [gem_wait+0x87]
#6 [gem_sync+0x11]
#7 [gem_quiescent_gpu+0xd2]
#8 [run_test+0x43b]
#9 [__real_main243+0x2a5]
#10 [main+0x23]
#11 [__libc_start_main+0xf1]
#12 [_start+0x29]
#13 [<unknown>+0x29]
Subtest basic-S3: FAIL (13.363s)
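For triaging runs like the one above, a small script can pull the failed subtests and the asserted expression out of captured IGT output. This is only a sketch: the function name `summarize` and the `SAMPLE` text are made up for illustration, and the patterns are inferred from the log lines quoted in this report, not from any documented IGT output format.

```python
import re

# Patterns inferred from the log lines quoted in this report (an assumption,
# not a documented IGT format).
SUBTEST_RE = re.compile(r"^Subtest (?P<name>\S+): (?P<result>[A-Z]+) \((?P<secs>[\d.]+)s\)")
ASSERT_RE = re.compile(r"Failed assertion: (?P<expr>.+)$")


def summarize(output):
    """Return failed subtests and failed assertion expressions found in IGT output."""
    failed = []
    assertions = []
    for line in output.splitlines():
        line = line.strip()
        m = SUBTEST_RE.match(line)
        if m and m.group("result") == "FAIL":
            failed.append((m.group("name"), float(m.group("secs"))))
            continue
        m = ASSERT_RE.search(line)
        if m:
            assertions.append(m.group("expr"))
    return {"failed_subtests": failed, "assertions": assertions}


# Hypothetical sample assembled from the lines quoted in this report.
SAMPLE = """\
(gem_exec_suspend:7270) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482:
(gem_exec_suspend:7270) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-S3 failed.
Subtest basic-S3: FAIL (13.363s)
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Run against a saved results file, this gives a quick per-run list of which subtests hit the `!"GPU hung"` assertion.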
Attach the full dmesg please.
Created attachment 135956 [details] dmesg_log-basic-suspend-s3 Haven't been able to reproduce yet. Attaching dmesg from the system with a round of this test.
Adding KBL, I'll try to get logs for this platform later. Both S3 and S4 failed. Stdout IGT-Version: 1.20-g4112f30 (x86_64) (Linux: 4.15.0-rc3-drm-intel-qa-ww50-commit-8fa442b+ x86_64) Stack trace: #0 [__igt_fail_assert+0x101] #1 [sig_abort+0x3a] #2 [killpg+0x40] #3 [ioctl+0x5] #4 [drmIoctl+0x28] #5 [gem_wait+0x87] #6 [gem_sync+0x11] #7 [gem_quiescent_gpu+0xd2] #8 [run_test+0x43b] #9 [__real_main243+0x13c] #10 [main+0x23] #11 [__libc_start_main+0xf1] #12 [_start+0x29] #13 [<unknown>+0x29] Subtest basic-S4-devices: FAIL (17.234s) Stderr (gem_exec_suspend:28815) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482: (gem_exec_suspend:28815) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-S4-devices failed. **** DEBUG **** (gem_exec_suspend:28815) DEBUG: Test requirement passed: gem_has_ring(fd, 0) (gem_exec_suspend:28815) DEBUG: Test requirement passed: gem_can_store_dword(fd, 0) (gem_exec_suspend:28815) DEBUG: Test requirement passed: nengine (gem_exec_suspend:28815) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:28815) DEBUG: Test requirement passed: gem_can_store_dword(fd, engine) (gem_exec_suspend:28815) DEBUG: Test requirement passed: nengine (gem_exec_suspend:28815) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0' (gem_exec_suspend:28815) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:28815) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:28815) DEBUG: Verifying result (gem_exec_suspend:28815) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0' (gem_exec_suspend:28815) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:28815) DEBUG: Test requirement passed: gem_can_store_dword(fd, engine) (gem_exec_suspend:28815) DEBUG: Test requirement passed: nengine (gem_exec_suspend:28815) igt-debugfs-DEBUG: Opening debugfs directory 
'/sys/kernel/debug/dri/0' (gem_exec_suspend:28815) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:28815) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:28815) DEBUG: Verifying result (gem_exec_suspend:28815) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0' (gem_exec_suspend:28815) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:28815) DEBUG: Test requirement passed: gem_can_store_dword(fd, engine) (gem_exec_suspend:28815) DEBUG: Test requirement passed: nengine (gem_exec_suspend:28815) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0' (gem_exec_suspend:28815) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:28815) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:28815) DEBUG: Verifying result (gem_exec_suspend:28815) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0' (gem_exec_suspend:28815) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:28815) DEBUG: Test requirement passed: gem_can_store_dword(fd, engine) (gem_exec_suspend:28815) DEBUG: Test requirement passed: nengine (gem_exec_suspend:28815) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0' (gem_exec_suspend:28815) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:28815) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:28815) DEBUG: Verifying result (gem_exec_suspend:28815) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0' (gem_exec_suspend:28815) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:28815) DEBUG: Test requirement passed: gem_can_store_dword(fd, engine) (gem_exec_suspend:28815) DEBUG: Test requirement passed: nengine 
(gem_exec_suspend:28815) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0' (gem_exec_suspend:28815) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:28815) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:28815) DEBUG: Verifying result (gem_exec_suspend:28815) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0' (gem_exec_suspend:28815) DEBUG: Test requirement passed: gem_has_ring(fd, engine) (gem_exec_suspend:28815) DEBUG: Test requirement passed: gem_can_store_dword(fd, engine) (gem_exec_suspend:28815) DEBUG: Test requirement passed: nengine (gem_exec_suspend:28815) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0' (gem_exec_suspend:28815) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:28815) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:28815) DEBUG: Verifying result (gem_exec_suspend:28815) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0' (gem_exec_suspend:28815) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0' (gem_exec_suspend:28815) ioctl-wrappers-DEBUG: Test requirement passed: __gem_set_caching(fd, handle, caching) == 0 (gem_exec_suspend:28815) DEBUG: Test requirement passed: __gem_execbuf(fd, &execbuf) == 0 (gem_exec_suspend:28815) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation() (gem_exec_suspend:28815) igt-aux-DEBUG: Test requirement passed: (power_dir = open("/sys/power", O_RDONLY)) >= 0 (gem_exec_suspend:28815) igt-aux-DEBUG: Test requirement passed: get_supported_suspend_states(power_dir) & (1 << state) (gem_exec_suspend:28815) igt-aux-DEBUG: Test requirement passed: test == SUSPEND_TEST_NONE || faccessat(power_dir, "pm_test", R_OK | W_OK, 0) == 0 (gem_exec_suspend:28815) DEBUG: Verifying result (gem_exec_suspend:28815) igt-aux-CRITICAL: Test 
assertion failure function sig_abort, file igt_aux.c:482: (gem_exec_suspend:28815) igt-aux-CRITICAL: Failed assertion: !"GPU hung" (gem_exec_suspend:28815) igt-core-INFO: Stack trace: (gem_exec_suspend:28815) igt-core-INFO: #0 [__igt_fail_assert+0x101] (gem_exec_suspend:28815) igt-core-INFO: #1 [sig_abort+0x3a] (gem_exec_suspend:28815) igt-core-INFO: #2 [killpg+0x40] (gem_exec_suspend:28815) igt-core-INFO: #3 [ioctl+0x5] (gem_exec_suspend:28815) igt-core-INFO: #4 [drmIoctl+0x28] (gem_exec_suspend:28815) igt-core-INFO: #5 [gem_wait+0x87] (gem_exec_suspend:28815) igt-core-INFO: #6 [gem_sync+0x11] (gem_exec_suspend:28815) igt-core-INFO: #7 [gem_quiescent_gpu+0xd2] (gem_exec_suspend:28815) igt-core-INFO: #8 [run_test+0x43b] (gem_exec_suspend:28815) igt-core-INFO: #9 [__real_main243+0x13c] (gem_exec_suspend:28815) igt-core-INFO: #10 [main+0x23] (gem_exec_suspend:28815) igt-core-INFO: #11 [__libc_start_main+0xf1] (gem_exec_suspend:28815) igt-core-INFO: #12 [_start+0x29] (gem_exec_suspend:28815) igt-core-INFO: #13 [<unknown>+0x29] **** END ****
On SKL, igt@gem_exec_suspend@basic-s3 failed with the !"GPU hung" assertion on today's commit 8fa442b.
On KBL:
igt@gem_exec_suspend@basic-s3
igt@gem_exec_suspend@basic-s4-devices

Kernel log from this cycle: https://bugs.freedesktop.org/attachment.cgi?id=136328

dmesg after PM hibernate:
[ 83.805593] [drm] GPU HANG: ecode 9:0:0xfffffffe, in gem_exec_suspen [16810], reason: Hang on rcs0, action: reset

IGT-Version: 1.20-gd3bcc7d (x86_64) (Linux: 4.15.0-rc4-drm-intel-qa-ww51-commit-b480e79+ x86_64)
Stack trace:
#0 [__igt_fail_assert+0x101]
#1 [sig_abort+0x3a]
#2 [killpg+0x40]
#3 [ioctl+0x5]
#4 [drmIoctl+0x28]
#5 [__gem_execbuf+0x15]
#6 [gem_execbuf+0x9]
#7 [run_test+0x2d1]
#8 [test_all+0x69]
#9 [run_test+0x44e]
#10 [__real_main243+0x13c]
#11 [main+0x23]
#12 [__libc_start_main+0xf1]
#13 [_start+0x29]
#14 [<unknown>+0x29]
Subtest basic-S4-devices: FAIL (15.837s)

Stderr
(gem_exec_suspend:12272) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482:
(gem_exec_suspend:12272) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-S4-devices failed.
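To compare hangs across CI cycles, the GPU HANG line from dmesg can be split into its fields. A minimal sketch follows; the regular expression and the name `parse_gpu_hang` are assumptions based only on the single dmesg line quoted in this comment, so other kernel versions may print the line differently.

```python
import re

# Pattern derived from the one dmesg line quoted in this comment (assumption).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<comm>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG dmesg line as a dict, or None if no match."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


LINE = ("[ 83.805593] [drm] GPU HANG: ecode 9:0:0xfffffffe, in gem_exec_suspen "
        "[16810], reason: Hang on rcs0, action: reset")

if __name__ == "__main__":
    print(parse_gpu_hang(LINE))
```

The `reason` field is the part worth grouping on when checking whether every occurrence hangs on the same engine.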
So still using unsafe module options. What happens if you don't force enable the guc?
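One way to confirm which GuC/HuC options a test machine is actually running with is to read the i915 module parameters from sysfs. This is a sketch under assumptions: it relies on the standard `/sys/module/<module>/parameters` layout, the helper name `guc_params` is made up, and the exact parameter names differ between kernel versions, so it simply reports every parameter whose name contains "guc" or "huc".

```python
from pathlib import Path


def guc_params(param_dir="/sys/module/i915/parameters"):
    """Return {name: value} for i915 parameters whose name mentions guc or huc.

    Returns an empty dict when the directory does not exist, for example when
    the i915 module is not loaded on this machine.
    """
    base = Path(param_dir)
    if not base.is_dir():
        return {}
    found = {}
    for entry in sorted(base.iterdir()):
        if "guc" in entry.name or "huc" in entry.name:
            try:
                found[entry.name] = entry.read_text().strip()
            except OSError:
                # Some parameters may not be readable by the current user.
                found[entry.name] = None
    return found


if __name__ == "__main__":
    for name, value in guc_params().items():
        print(f"{name} = {value}")
```

Capturing this output alongside each CI result would show whether a failing run had the firmware force-enabled.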
Created attachment 136437 [details] kern_log_no-guc_kbl No failures without GuC/HuC. So, is GuC the culprit?
Still getting it with the firmware loaded. What should the next step be? KBL, igt@gem_exec_suspend@basic-s3, !"GPU hung", commit-cb4a985+ KBL, igt@gem_exec_suspend@basic-s4-devices, !"GPU hung", commit-cb4a985+
And SKL, igt@gem_exec_suspend@basic-s3, !"GPU hung", commit-eb3dae3+
Today's: KBL, igt@gem_exec_suspend@basic-s3, !"GPU hung", commit-914d61a KBL, igt@gem_exec_suspend@basic-s4-devices, !"GPU hung", commit-914d61a SKL, igt@gem_exec_suspend@basic-s3, !"GPU hung", commit-914d61a
KBL, igt@gem_exec_suspend@basic-s3,commit-d373aa7, commit-11030d7 KBL, igt@gem_exec_suspend@basic-s4-devices, commit-d373aa7, commit-11030d7
SKL, igt@gem_exec_suspend@basic-s3,commit-d373aa7
KBL, igt@gem_exec_suspend@basic-s3, commit-a0ca279+ KBL, igt@gem_exec_suspend@basic-s4-devices, commit-a0ca279+ SKL, igt@gem_exec_suspend@basic-s3, commit-a0ca279+ (gem_exec_suspend:19253) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482: (gem_exec_suspend:19253) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
KBL, igt@gem_exec_suspend@basic-s3, commit-8748fd9+ KBL, igt@gem_exec_suspend@basic-s4-devices, commit-8748fd9+
SKL, igt@gem_exec_suspend@basic-s3, commit-59275f1+
KBL, igt@gem_exec_suspend@basic-s3, commit-e1b21c1+ KBL, igt@gem_exec_suspend@basic-s4-devices, commit-e1b21c1+
Today's FF: KBL, igt@gem_exec_suspend@basic-s4-devices, commit-6c10ba2+ (gem_exec_suspend:21109) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482: (gem_exec_suspend:21109) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-S4-devices failed.
KBL, igt@gem_exec_suspend@basic-s3, commit-ec41124+ KBL, igt@gem_exec_suspend@basic-s4-devices, commit-ec41124+
The following test is failing on KBL: igt@gem_exec_suspend@basic-s4

kernel version : 4.16.0-rc1-drm-intel-qa-ww8-commit-67f1480+
libdrm : 2.4.90
intel-gpu-tools (tag) : intel-gpu-tools-1.21-112-gdd61508a
intel-gpu-tools (commit) : dd61508a

Firmware
======================================
dmc fw loaded : yes
dmc version : 1.4
guc fw loaded : fetch SUCCESS, load SUCCESS
guc version wanted : wanted 9.39, found 9.39
guc version found : wanted 9.39, found 9.39
---------------------------------------

(gem_exec_suspend:8502) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:482:
(gem_exec_suspend:8502) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Today's commit also failed with !"GPU hung": KBL, igt@gem_exec_suspend@basic-s3, commit-01a067a+ KBL, igt@gem_exec_suspend@basic-s4-devices, commit-01a067a+
Still failing with commit-9a02ae1: igt@gem_exec_suspend@basic-s3 igt@gem_exec_suspend@basic-s4-devices (gem_exec_suspend:18959) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:481: (gem_exec_suspend:18959) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-S3 failed.
Keeping track, still failed: KBL, igt@gem_exec_suspend@basic-s3, commit-b2e10fd+ KBL, igt@gem_exec_suspend@basic-s4-devices, commit-b2e10fd+
With IGT-Version: 1.21-gc2af514, kernel 4.16.0-rc3 commit-995edb2+ igt@gem_exec_suspend@basic-S4-devices (gem_exec_suspend:19609) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:481: (gem_exec_suspend:19609) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-S4-devices failed.
Last seen with test igt@gem_exec_suspend@basic-s3 was with kernel 4.16.0-rc4-commit-6c6e100, a whole week without failure now. Meanwhile test igt@gem_exec_suspend@basic-s4-devices still keeps failing with same assertion: IGT-Version: 1.22-g1bb3995, kernel 4.16.0-rc5-commit-307515c (gem_exec_suspend:19825) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:481: (gem_exec_suspend:19825) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-S4-devices failed.
Also removing SKL, since the issue hasn't appeared on that platform in more than a month.
With today's FF: igt@gem_exec_suspend@basic-s4-devices IGT-Version: 1.22-gb09e979 (x86_64) (Linux: 4.16.0-rc6-drm-intel-qa-ww12-commit-141def2+ x86_64) (gem_exec_suspend:19843) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:481: (gem_exec_suspend:19843) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-S4-devices failed.
And it is back: igt@gem_exec_suspend@basic-s3

Out
IGT-Version: 1.22-g94e8862 (x86_64) (Linux: 4.16.0-rc6-drm-intel-qa-ww12-commit-9d737ce+ x86_64)
[cmd] rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Mar 1 08:36:46 2018
Stack trace:
#0 [__igt_fail_assert+0x101]
#1 [sig_abort+0x3a]
#2 [killpg+0x40]
#3 [ioctl+0x5]
#4 [drmIoctl+0x28]
#5 [__gem_execbuf+0x15]
#6 [gem_execbuf+0x9]
#7 [run_test+0x2d1]
#8 [test_all+0x65]
#9 [run_test+0x44e]
#10 [__real_main231+0x2a5]
#11 [main+0x23]
#12 [__libc_start_main+0xf1]
#13 [_start+0x29]
#14 [<unknown>+0x29]
Subtest basic-S3: FAIL (9.847s)

Err
(gem_exec_suspend:19489) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:481:
(gem_exec_suspend:19489) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-S3 failed.

Same assertion on igt@gem_exec_suspend@basic-s4-devices.
With commit-0110d63: igt@gem_exec_suspend@basic-s3 igt@gem_exec_suspend@basic-s4-devices (gem_exec_suspend:20414) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:481: (gem_exec_suspend:20414) igt-aux-CRITICAL: Failed assertion: !"GPU hung" Subtest basic-S4-devices failed.
First of all, sorry about the spam. This is a mass update for our bugs. Sorry if you find this annoying, but we are trying to understand whether this bug is still valid. If the investigation is still in progress, please ignore this, and I apologize! If you think the bug is no longer valid, please comment on the bug so that it can be closed. If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still present there? If you cannot see the issue there, please comment on the bug.
Both tests still fail: igt@gem_exec_suspend@basic-s3 igt@gem_exec_suspend@basic-s4-devices

Results for igt@gem_exec_suspend@basic-s4-devices
Result: fail

Out
IGT-Version: 1.21-ge3a0ed9 (x86_64) (Linux: 4.16.0-rc7-drm-intel-qa-ww14-commit-c46052c+ x86_64)
Stack trace:
#0 [__igt_fail_assert+0x101]
#1 [sig_abort+0x3a]
#2 [killpg+0x40]
#3 [ioctl+0x7]
#4 [drmIoctl+0x28]
#5 [__gem_execbuf+0x15]
#6 [gem_execbuf+0x9]
#7 [run_test+0x2d1]
#8 [test_all+0x65]
#9 [run_test+0x44e]
#10 [__real_main231+0x13c]
#11 [main+0x23]
#12 [__libc_start_main+0xf1]
#13 [_start+0x29]
#14 [<unknown>+0x29]
Subtest basic-S4-devices: FAIL (14.753s)

Err
(gem_exec_suspend:20507) igt-aux-CRITICAL: Test assertion failure function sig_abort, file igt_aux.c:481:
(gem_exec_suspend:20507) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
Subtest basic-S4-devices failed.
OK, thanks.
Joonas, do you know the status here?
ping
As we don't have these systems, we cannot reproduce this issue. Closing this bug.
We have seen this in BAT at least once since enabling GuC on an SKL system. I'm sure CIBugLog will tell us when it happens again.