Bug 87184 - [BYT/BSW] igt/gem_concurrent_blit, kms_flip and gem_reset_stats sporadically cause *ERROR* Timed out: waiting for Render to ack
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: All Linux (All)
Importance: high critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-12-10 07:00 UTC by lu hua
Modified: 2017-08-15 06:55 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg(kms_flip) (124.86 KB, text/plain)
2014-12-26 05:38 UTC, lu hua

Description lu hua 2014-12-10 07:00:39 UTC
==System Environment==
--------------------------
Regression: not sure

Non-working platforms: BSW

==kernel==
--------------------------
drm-intel-nightly/34d267c2ba9c0845432baf959a2c4deed87f3ee4

==Bug detailed description==
-----------------------------
When run under automation, the tests sporadically cause *ERROR* Timed out: waiting for Render to ack. I am unable to reproduce it manually.
It shows up in different subcases when multiple cycles are run.

log:
@test: Intel_gpu_tools/igt_gem_concurrent_blit_gpu-bcs-early-read-forked
info: @@@Returncode: 0

test case start at: Sat Jan  6 23:19:38 2001
test case end at:   Sat Jan  6 23:19:51 2001

Errors:


Dmesg:
<3>[  895.532394] [drm:__vlv_force_wake_get [i915]] *ERROR* Timed out: waiting for Render to ack.


Output:
             command   pid dev master a   uid      magic
Test Environment check: Succeeded.
[1/1] dmesg-warn: 1 |
[1/1] dmesg-warn: 1 /


Thank you for running Piglit!
Results have been written to /GFX/Test/Piglit/piglit/t
{
    "results_version": 2,
    "uname": "Linux x-bsw01 3.18.0_drm-intel-nightly_34d267_20141209+ #2416 SMP Tue Dec 9 11:25:02 CST 2014 x86_64 x86_64 x86_64 GNU/Linux\n",
    "time_elapsed": 7.836741924285889,
    "tests": {
        "igt/gem_concurrent_blit/gpu-bcs-early-read-forked": {
            "dmesg": "[  895.532394] [drm:__vlv_force_wake_get [i915]] *ERROR* Timed out: waiting for Render to ack.",
            "returncode": 0,
            "err": "",
            "environment": "PIGLIT_SOURCE_DIR=\"/GFX/Test/Piglit/piglit\" PIGLIT_PLATFORM=\"mixed_glx_egl\"",
            "command": "/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests/gem_concurrent_blit --run-subtest gpu-bcs-early-read-forked",
            "result": "dmesg-warn",
            "time": 7.61755907535553,
            "out": "IGT-Version: 1.8-gf333981 (x86_64) (Linux: 3.18.0_drm-intel-nightly_34d267_20141209+ x86_64)\nusing 2x512 buffers, each 1MiB\nSubtest gpu-bcs-early-read-forked: SUCCESS (7.422s)\n"
        }
    },
    "name": "t",
    "lspci": "00:00.0 Host bridge: Intel Corporation Device 2280 (rev 15)\n00:02.0 VGA compatible controller: Intel Corporation Device 22b0 (rev 15)\n00:03.0 Multimedia controller: Intel Corporation Device 22b8 (rev 15)\n00:0b.0 Signal processing controller: Intel Corporation Device 22dc (rev 15)\n00:13.0 SATA controller: Intel Corporation Device 22a3 (rev 15)\n00:14.0 USB controller: Intel Corporation Device 22b5 (rev 15)\n00:1a.0 Encryption controller: Intel Corporation Device 2298 (rev 15)\n00:1b.0 Audio device: Intel Corporation Device 2284 (rev 15)\n00:1c.0 PCI bridge: Intel Corporation Device 22c8 (rev 15)\n00:1c.1 PCI bridge: Intel Corporation Device 22ca (rev 15)\n00:1f.0 ISA bridge: Intel Corporation Device 229c (rev 15)\n00:1f.3 SMBus: Intel Corporation Device 2292 (rev 15)\n02:00.0 Network controller: Intel Corporation Wireless 7265 (rev 2b)\n",
    "options": {
        "profile": [
            "tests/igt.py"
        ],
        "dmesg": false,
        "execute": true,
        "log_level": "quiet",
        "concurrent": "some",
        "valgrind": false,
        "sync": false,
        "filter": [
            "igt/gem_concurrent_blit/gpu-bcs-early-read-forked$"
        ],
        "platform": "mixed_glx_egl",
        "exclude_tests": [],
        "env": {
            "PIGLIT_SOURCE_DIR": "/GFX/Test/Piglit/piglit",
            "PIGLIT_PLATFORM": "mixed_glx_egl"
        },
        "exclude_filter": []
    }
}
returncode: 0
result: dmesg-warn
summary: Intel_gpu_tools/igt_gem_concurrent_blit_gpu-bcs-early-read-forked    DMESG_WARN    

Reproduce steps:
-------------------------
1. Run all igt cases.
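
For triage, the Piglit results file shown above can be scanned for tests that passed but were downgraded to dmesg-warn by this message. The following is a minimal sketch, assuming only the JSON layout visible in the log above (a top-level "tests" map with "result" and "dmesg" fields). The function name find_dmesg_warnings and the trimmed sample are illustrative, not part of Piglit.

```python
import json

# Message text taken from the dmesg line quoted in this report.
FORCEWAKE_TIMEOUT = "Timed out: waiting for Render to ack"


def find_dmesg_warnings(results, needle=FORCEWAKE_TIMEOUT):
    """Return sorted names of tests marked dmesg-warn whose dmesg contains needle.

    Hypothetical helper: it relies only on the "tests" -> name ->
    {"result", "dmesg"} layout visible in the results dump above.
    """
    hits = []
    for name, data in results.get("tests", {}).items():
        if data.get("result") == "dmesg-warn" and needle in data.get("dmesg", ""):
            hits.append(name)
    return sorted(hits)


# Trimmed copy of the results shown above; only the fields used here are kept.
sample = json.loads("""
{
  "results_version": 2,
  "tests": {
    "igt/gem_concurrent_blit/gpu-bcs-early-read-forked": {
      "dmesg": "[  895.532394] [drm:__vlv_force_wake_get [i915]] *ERROR* Timed out: waiting for Render to ack.",
      "returncode": 0,
      "result": "dmesg-warn"
    }
  }
}
""")

for test_name in find_dmesg_warnings(sample):
    print(test_name)
```

Matching on both the result and the message keeps unrelated dmesg warnings out of the list, which matters when many subcases are affected by different issues.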
Comment 1 lu hua 2014-12-22 06:53:08 UTC
BYT also has this error. Under automated testing, drv_hangman and gem_reset_stats hit it as well. When I run them manually for more than 5 cycles, I am unable to reproduce it.

@test: Intel_gpu_tools/igt_drv_hangman_error-state-capture-bsd
returncode: 0
info: @@@Returncode: 0

test case start at: Sat Dec 20 00:13:01 2014
test case end at:   Sat Dec 20 00:13:21 2014

Errors:


Dmesg:
<6>[   96.817503] [drm] stuck on bsd ring
<6>[   96.824051] [drm] GPU HANG: ecode 7:1:0xfffffffe, in drv_hangman [5367], reason: Ring hung, action: reset
<6>[   96.824060] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6>[   96.824063] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6>[   96.824065] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6>[   96.824068] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6>[   96.824070] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<3>[   96.824152] [drm:__vlv_force_wake_get [i915]] *ERROR* Timed out: waiting for Render to ack.
<6>[   96.826195] [drm] Simulated gpu hang, resetting stop_rings
<5>[   96.826200] drm/i915: Resetting chip after gpu hang

@test: Intel_gpu_tools/igt_gem_reset_stats_ban-bsd
returncode: 0
info: @@@Returncode: 0

test case start at: Sat Dec 20 05:10:18 2014
test case end at:   Sat Dec 20 05:10:35 2014

Errors:


Dmesg:
<6>[  754.357515] [drm] stuck on bsd ring
<6>[  754.364622] [drm] GPU HANG: ecode 7:1:0x277fffff, in gem_reset_stats [17204], reason: Ring hung, action: reset
<6>[  754.364638] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6>[  754.364641] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6>[  754.364643] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6>[  754.364646] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6>[  754.364648] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<3>[  754.364730] [drm:__vlv_force_wake_get [i915]] *ERROR* Timed out: waiting for Render to ack.
<6>[  754.367345] [drm] Simulated gpu hang, resetting stop_rings
<5>[  754.367350] drm/i915: Resetting chip after gpu hang
<6>[  760.362407] [drm] stuck on bsd ring
<6>[  760.368876] [drm] GPU HANG: ecode 7:1:0x277fffff, in gem_reset_stats [17204], reason: Ring hung, action: reset
<6>[  760.371375] [drm] Simulated gpu hang, resetting stop_rings
<5>[  760.371380] drm/i915: Resetting chip after gpu hang
Comment 2 lu hua 2014-12-26 05:38:12 UTC
Created attachment 111355
dmesg(kms_flip)

Many kms_flip subcases also have this issue.
Running ./kms_flip --run-subtest flip-vs-panning-vs-hang hits the error in about 1 of 2 runs.
root@x-bsw01:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./kms_flip --run-subtest flip-vs-panning-vs-hang
IGT-Version: 1.9-geb799b2 (x86_64) (Linux: 3.18.0-rc7_drm-intel-next-queued_140fd3_20141226+ x86_64)
Using monotonic timestamps
Beginning flip-vs-panning-vs-hang on crtc 8, connector 29
  1920x1080 60 1920 1966 1996 2080 1080 1082 1086 1112 0xa 0x48 138780
....
flip-vs-panning-vs-hang on crtc 8, connector 29: PASSED

Beginning flip-vs-panning-vs-hang on crtc 13, connector 29
  1920x1080 60 1920 1966 1996 2080 1080 1082 1086 1112 0xa 0x48 138780
....
flip-vs-panning-vs-hang on crtc 13, connector 29: PASSED

Subtest flip-vs-panning-vs-hang: SUCCESS (49.886s)
root@x-bsw01:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# dmesg -r | egrep "<[1-4]>" |grep drm
<3>[  250.789840] [drm:__vlv_force_wake_get [i915]] *ERROR* Timed out: waiting for Render to ack.
<3>[  268.790830] [drm:__vlv_force_wake_get [i915]] *ERROR* Timed out: waiting for Render to ack.
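
The dmesg -r | egrep "<[1-4]>" | grep drm filter used above can also be applied to saved logs. The following is a rough sketch, assuming raw dmesg lines of the form "<level>[ timestamp] message" as they appear in this report. The helper name drm_errors is hypothetical, and the sample lines are copied from comment 1.

```python
import re

# Raw dmesg line as quoted in this report: "<level>[ timestamp] message".
RAW_LINE = re.compile(r"^<(\d+)>\[\s*(\d+\.\d+)\]\s(.*)$")


def drm_errors(raw_dmesg, max_level=4):
    """Return (level, timestamp, message) for drm lines at levels 1..max_level.

    Hypothetical helper that approximates the shell filter
    dmesg -r | egrep "<[1-4]>" | grep drm on already captured text.
    """
    hits = []
    for line in raw_dmesg.splitlines():
        match = RAW_LINE.match(line)
        if not match:
            continue  # skip blank or non-raw lines
        level = int(match.group(1))
        stamp = float(match.group(2))
        message = match.group(3)
        if 1 <= level <= max_level and "drm" in message:
            hits.append((level, stamp, message))
    return hits


# Sample lines copied from the gem_reset_stats log in comment 1.
sample_log = """
<6>[  754.364648] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<3>[  754.364730] [drm:__vlv_force_wake_get [i915]] *ERROR* Timed out: waiting for Render to ack.
<5>[  754.367350] drm/i915: Resetting chip after gpu hang
"""

for level, stamp, message in drm_errors(sample_log):
    print(level, stamp, message)
```

Only the level 3 force wake line survives the filter here; the level 5 and level 6 lines are informational and are dropped, which mirrors what the shell pipeline shows.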
Comment 3 lu hua 2015-01-15 07:20:39 UTC
Tested on the latest -nightly kernel; it still has this issue.

root@x-bsw08:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./kms_flip --run-subtest flip-vs-modeset-vs-hang
IGT-Version: 1.9-g5fb26d1 (x86_64) (Linux: 3.19.0-rc3_drm-intel-nightly_0056b6_20150109+ x86_64)
Using monotonic timestamps
Beginning flip-vs-modeset-vs-hang on crtc 19, connector 40
  1920x1080 60 1920 1966 1996 2080 1080 1082 1086 1112 0xa 0x48 138780
...
flip-vs-modeset-vs-hang on crtc 19, connector 40: PASSED

Beginning flip-vs-modeset-vs-hang on crtc 24, connector 40
  1920x1080 60 1920 1966 1996 2080 1080 1082 1086 1112 0xa 0x48 138780
...
flip-vs-modeset-vs-hang on crtc 24, connector 40: PASSED

Subtest flip-vs-modeset-vs-hang: SUCCESS (36.625s)
root@x-bsw08:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# dmesg -r|egrep "<[1-4]>"|grep drm
<3>[  512.803434] [drm:__vlv_force_wake_get [i915]] *ERROR* Timed out: waiting for Render to ack.
Comment 4 lu hua 2015-01-15 07:30:26 UTC
Testing gem_reloc_vs_gpu on BYT also shows this error (the test failure itself is tracked in bug 88358). Affected subcases:
gem_reloc_vs_gpu/forked-faulting-reloc-hang
igt/gem_reloc_vs_gpu/forked-faulting-reloc-thrash-inactive-hang
igt/gem_reloc_vs_gpu/forked-faulting-reloc-thrashing-hang
igt/gem_reloc_vs_gpu/forked-hang
igt/gem_reloc_vs_gpu/forked-interruptible-faulting-reloc-thrashing-hang
igt/gem_reloc_vs_gpu/forked-thrash-inactive-hang
igt/gem_reloc_vs_gpu/forked-thrashing-hang

root@x-bytm02:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./gem_reloc_vs_gpu --run-subtest forked-faulting-reloc-hang
IGT-Version: 1.9-g5fb26d1 (x86_64) (Linux: 3.19.0-rc4_drm-intel-nightly_95cce4_20150115+ x86_64)
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Failed assertion: test == 0xdeadbeef
Failed assertion: test == 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Failed assertion: test == 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Failed assertion: test == 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Failed assertion: test == 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Failed assertion: test == 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Failed assertion: test == 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Failed assertion: test == 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Failed assertion: test == 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Failed assertion: test == 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Failed assertion: test == 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Failed assertion: test == 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Failed assertion: test == 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Failed assertion: test == 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Failed assertion: test == 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
Test assertion failure function do_test, file gem_reloc_vs_gpu.c:240:
Failed assertion: test == 0xdeadbeef
mismatch in buffer 0: 0x00000000 instead of 0xdeadbeef
child 12 failed with exit status 99
Subtest forked-faulting-reloc-hang: FAIL (104.062s)
root@x-bytm02:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# dmesg -r|egrep "<[1-4]>"|grep drm
<3>[ 7186.608528] [drm:__vlv_force_wake_get [i915]] *ERROR* Timed out: waiting for Render to ack.
<3>[ 7198.618182] [drm:__vlv_force_wake_get [i915]] *ERROR* Timed out: waiting for Render to ack.
<3>[ 7204.622584] [drm:__vlv_force_wake_get [i915]] *ERROR* Timed out: waiting for Render to ack.
<3>[ 7210.619451] [drm:__vlv_force_wake_get [i915]] *ERROR* Timed out: waiting for Render to ack.
<3>[ 7246.660833] [drm:__vlv_force_wake_get [i915]] *ERROR* Timed out: waiting for Render to ack.
Comment 5 Rodrigo Vivi 2015-01-21 02:27:49 UTC
Does it also affect BDW?
What happens with ppgtt disabled?
Why is it critical? What is customer impact here?
Comment 6 lu hua 2015-01-21 08:46:19 UTC
(In reply to Rodrigo Vivi from comment #5)
> Does it also affect BDW?
I ran ./kms_flip --run-subtest flip-vs-panning-vs-hang for 5 cycles on BDW; it doesn't have this error.

> What happens with ppgtt disabled?
I will give it a try.

> Why is it critical? What is customer impact here?
More than 200 cases have this error on BYT/BSW: gem_reset_stats has 53 subcases, kms_flip has 80+ subcases, and gem_concurrent_blit has 108 subcases.
The results are also unstable, which interferes with result checking and prts bisect.
We want to focus on real regressions and failures in new cases, so we have disabled these cases. It would be valuable if these unstable issues could be fixed in time. Do you agree?
Comment 7 lu hua 2015-01-22 07:33:40 UTC
I tested on the latest -nightly kernel. Running ./kms_flip --run-subtest flip-vs-panning-vs-hang and ./gem_reloc_vs_gpu --run-subtest forked-faulting-reloc-hang, I am unable to reproduce the error. Because it fails sporadically, I will double-check. If it is fixed, I will close this bug.
Comment 8 lu hua 2015-01-29 02:34:17 UTC
I ran these cases on BYT and BSW, and the error no longer appears. Closing.
kms_flip reports "*ERROR* The master control interrupt lied (PM)!" on BSW; that is tracked in bug 87347.
Comment 9 lu hua 2015-01-29 02:34:42 UTC
Verified. Fixed.
Comment 10 Jari Tahvanainen 2017-08-15 06:55:48 UTC
Closing old verified+fixed.

