Created attachment 128014 [details]
Dmesg

Test setup:
- KBL-U QL9J (haven't seen this yet on other platforms)
- Fairly up-to-date Ubuntu 16.04 with DRI3 & Unity desktop
- Latest kernel and rest of 3D stack within a few weeks
  kernel git://anongit.freedesktop.org/drm-intel at 04145fe15cf8c81c221e62fc9d65d93053f9bd1a 2016-11-15_14-49-57

Test-case:
- Boot
- Run Unigine, GLBenchmark 2.7, GfxBench 4.0, SynMark 7.0 benchmarks several times

Expected outcome:
- Everything works fine

Actual outcome:
- After SynMark CSDof (spilling compute shader test), the rest of the tests fail with:
  intel_do_flush_locked failed: Input/output error
- After the 3D tests have been stopped and a few minutes have passed, device idle power usage is still very high (3x normal)

Logs show that when the device is idling afterwards:
- Package & cores are in lower power states as expected
- GPU frequency is still at max (allowed by TDP), with 0% in RC6*
- compiz is 100% in (GPU?) IOWAIT
- Unlike in normal situations, powertop shows:
-------------------
Usage;Wakeups/s;GPU ops/s;Disk IO/s;GFX Wakeups/s;Category;Description
77,9 ms/s;;;;;kWork;i915_hangcheck_elapsed
-------------------

The same issue happens with last night's versions of X server, Intel DDX and Mesa (which should fix one issue with spilling), and with versions of them that are a few weeks older.

Dmesg attached.

Earlier GEN bug 92774 seems to have had a similar issue.
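The "0% in RC6" observation above can be checked by hand with a small helper. This is only a sketch, not the tooling used for the report: it assumes the i915 driver exposes an RC6 residency counter in milliseconds at /sys/class/drm/card0/power/rc6_residency_ms (the card index and path may differ per kernel and machine), and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: estimate GPU RC6 residency over a sampling interval.
# Assumption: i915 exposes the counter (in ms) at this sysfs path;
# override RC6_FILE if the card index or path differs on your system.
RC6_FILE="${RC6_FILE:-/sys/class/drm/card0/power/rc6_residency_ms}"

# rc6_percent BEFORE_MS AFTER_MS INTERVAL_MS
# Prints the integer percentage of the interval spent in RC6.
rc6_percent() {
    echo $(( ($2 - $1) * 100 / $3 ))
}

# sample_rc6 SECONDS
# Reads the counter twice, SECONDS apart, and prints the residency percentage.
sample_rc6() {
    before=$(cat "$RC6_FILE") || return 1
    sleep "$1"
    after=$(cat "$RC6_FILE") || return 1
    rc6_percent "$before" "$after" $(( $1 * 1000 ))
}
```

On an idle desktop, `sample_rc6 5` would be expected to print a value well above 0. A steady 0 after the benchmarks have stopped would match the symptom described in this report.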
They are all secondary effects of the GPU not resetting.
That device has succeeded in running the full test set to the end only twice before this, in early September and on the 27th of October (the latter had the same X, Intel DDX and Mesa as the version which has these extra symptoms). In both of these cases, there was a hang with CSDof and the GPU reset failed. However, there were no repeated hang resets, and the GPU was completely idle after the tests had finished.
Eero, can you also attach the error dump?
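For reference, one way to capture the error dump is to copy the driver's error state file before rebooting. This is a minimal sketch, assuming the i915 error state is readable at /sys/class/drm/card0/error (usually as root; the card index may differ). The helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: save the last GPU error state to a file for attaching to a bug.
# Assumption: i915 exposes the error state at this sysfs path;
# override ERROR_FILE if the card index differs on your system.
ERROR_FILE="${ERROR_FILE:-/sys/class/drm/card0/error}"

# save_error_state OUT
# Copies the error state to OUT; returns non-zero if it cannot be read.
save_error_state() {
    if [ ! -r "$ERROR_FILE" ]; then
        echo "cannot read $ERROR_FILE (try as root?)" >&2
        return 1
    fi
    cat "$ERROR_FILE" > "$1"
}
```

Usage would be something like `save_error_state gpu-error-state.txt`, run right after the hang, since the state is not expected to survive a reboot.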
Eero, can you have a try with Chris' patch: https://patchwork.freedesktop.org/series/15471/ ?
Created attachment 128054 [details] Last error state from build where there were no repeated resets
Created attachment 128055 [details] Last error state from build with the repeated resets
Created attachment 128056 [details]
Last error state from build with the repeated resets, using newer mesa git

This one uses:
- kernel: 04145fe15cf8c81c221e62fc9d65d93053f9bd1a
- mesa: 341fc0073a3c05fd43e9c7a33613bcb881f25f33
(In reply to yann from comment #4)
> Eero, can you have a try with Chris' patch:
> https://patchwork.freedesktop.org/series/15471/ ?

Didn't help. It still gets recurring hangs after the test-case stops.

Valtteri came up with a test-case that triggers the issue within a few minutes:
------ hang.sh --------
#!/bin/sh
# Start SynMark OglBatch0, kill it after 2 seconds, repeat $1 times.
for i in $(seq "$1"); do
	./synmark2 OglBatch0 &
	sleep 2
	killall synmark2
done
-----------------------
$ ./hang.sh 100
-----------------------

(Mika's now looking into the issue.)
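A variant of the reproducer above that terminates only the process it started, instead of every process named synmark2, could look like the following. This is a sketch and not the script used in this report. HANG_DELAY and the function names are hypothetical. It also assumes the benchmark does not spawn child processes that outlive it, which killall would have caught.

```shell
#!/bin/sh
# Sketch: start a command, terminate it after a delay, and repeat.
# HANG_DELAY is a hypothetical knob; the default matches hang.sh (2 seconds).
HANG_DELAY="${HANG_DELAY:-2}"

# start_and_kill CMD [ARGS...]
# Runs CMD in the background, waits HANG_DELAY seconds, then sends SIGTERM
# to that one process and reaps it.
start_and_kill() {
    "$@" &
    pid=$!
    sleep "$HANG_DELAY"
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null
    return 0
}

# repeat_hang COUNT CMD [ARGS...]
# Repeats the start/kill cycle COUNT times and prints the completed count.
repeat_hang() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        start_and_kill "$@"
        i=$((i + 1))
    done
    echo "$i"
}
```

The equivalent of `./hang.sh 100` would then be `repeat_hang 100 ./synmark2 OglBatch0`.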
This does not appear on other gt3 boxes? I suggest we close this and reopen if it does. Eero?
(In reply to Mika Kuoppala from comment #9)
> This does not appear on other gt3 boxes? I suggest we close this and reopen
> if it does. Eero?

If you are referring to reset request timeouts, or to higher power usage due to the GPU reset failing completely, I haven't seen those on any HW in the last couple of weeks. But we no longer have the KBL-U QL9J machine in regular testing.

(There have been system hangs in the same CarChase offscreen tests on SKL GT2 & BXT, but I guess that's a different issue.)
Something similar may have been happening on SKL GT2 since yesterday. After the GFXBench CarChase tests (which often hang the GPU), all tests fail. However, I don't have logs, as Jenkins times out the test run and reboots into another test run.
I haven't seen reset request timeout errors this year, so I think this can be closed. BXT J4205 had higher power consumption after all tests had been run (and CarChase offscreen had hung as before) on 3 days around May 7th, but there were no reset timeouts, so it's a different issue. I didn't see anything similar on other devices in the last 2 months (or when using newer Mesa, which no longer triggers the hangs so frequently).
Haven't seen this in a long time, so marking it as fixed.