System Environment:
--------------------------
Arch: x86_64
Platform: GM45
Kernel: drm-intel-next-queued cb8b2a30b32cde5ac9053d399d084c487598976a

Bug detailed description:
-------------------------
This happens on GM45 with the drm-intel-next-queued kernel; it works well on the drm-intel-fixes kernel. Many igt cases fail after running ZZ_hangman. It is caused by an igt commit. Bisect shows:

1cb4f90946289457c3b92773f2ce96b0b03e4a22 is the first bad commit
commit 1cb4f90946289457c3b92773f2ce96b0b03e4a22
Author:     Imre Deak <imre.deak@intel.com>
AuthorDate: Tue May 28 17:35:32 2013 +0300
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Tue May 28 18:32:32 2013 +0200

    tests/lib: make sure the GPU is idle at test start and exit

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64270

    v2:
    - Make sure also that the GPU is idle at start and error exit of any
      test using drm_open_any(). (Daniel)
    v3:
    - actually call gem_quiescent_gpu() at exit

    Signed-off-by: Imre Deak <imre.deak@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

output:
rings stopped
gem_set_domain:467 failed, ret=-1, errno=5
./ZZ_hangman: line 30: 4247 Aborted (core dumped) $SOURCE_DIR/gem_exec_big
gpu hang correctly dectected

dmesg:
[ 120.374100] [drm:i915_ring_stop_set], Stopping rings 0x0000000f
[ 120.376368] [drm:i915_driver_open],
[ 120.376383] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 120.376389] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 120.376392] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 120.376394] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 120.376400] [drm:i915_driver_open],
[ 126.708148] [drm:i915_hangcheck_elapsed] *ERROR* render ring: stuck on addr 0xbac8
[ 126.708224] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[ 126.711675] [drm:i915_error_work_func], resetting chip
[ 126.711720] [drm] Simulated gpu hang, resetting stop_rings
[ 126.711765] [drm:i915_gem_context_init], Disabling HW Contexts; old hardware
[ 126.711768] [drm:gm45_get_vblank_counter], trying to get vblank count for disabled pipe B
[ 126.711825] [drm:i9xx_update_plane], Writing base 00046000 00000000 0 0 5120
[ 132.704157] [drm:i915_hangcheck_elapsed] *ERROR* bsd ring: stuck on addr 0x28
[ 132.704310] [drm:i915_error_work_func], resetting chip
[ 132.704370] [drm:i915_gem_context_init], Disabling HW Contexts; old hardware
[ 132.704373] [drm:gm45_get_vblank_counter], trying to get vblank count for disabled pipe B
[ 132.704417] [drm:i9xx_update_plane], Writing base 00046000 00000000 0 0 5120
[ 133.198449] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 133.198455] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 133.198458] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 133.198460] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 133.208290] [drm:i915_driver_open],
[ 133.208298] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 133.208302] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 133.208304] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 133.208306] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 133.208311] [drm:i915_driver_open],
[ 135.704156] [drm:i915_hangcheck_elapsed] *ERROR* bsd ring: stuck on addr 0x28
[ 135.704837] [drm:i915_error_work_func], resetting chip
[ 135.704878] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 135.704921] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 135.704958] [drm:i9xx_update_plane], Writing base 00046000 00000000 0 0 5120
[ 145.704169] [drm:i915_gem_wait_for_error] *ERROR* Timed out waiting for the gpu reset to complete
[ 146.090217] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 146.090226] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 146.090229] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 146.090231] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 146.486347] [drm:i915_error_state_write], Resetting error state

Reproduce steps:
----------------
1. ./ZZ_hangman
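A dmesg capture like the one in this report can be triaged quickly by counting the hang-related messages. The following is only a minimal sketch: the function name `summarize` is hypothetical, and the match strings are copied from the log excerpts in this thread, so other kernel versions may word these messages differently.

```python
import re

# Match strings are taken from the dmesg excerpts in this report; they are
# not a stable kernel interface and may differ between kernel versions.
HANGCHECK_RE = re.compile(r"i915_hangcheck_\w+\] \*ERROR\*")
SIMULATED_MARK = "Simulated gpu hang"
WEDGED_MARK = "declaring wedged"


def summarize(dmesg_text):
    """Count hangcheck errors, simulated hangs and wedged declarations."""
    return {
        "hangs": len(HANGCHECK_RE.findall(dmesg_text)),
        "simulated": dmesg_text.count(SIMULATED_MARK),
        "wedged": dmesg_text.count(WEDGED_MARK),
    }
```

In a healthy ZZ_hangman run one would expect a single simulated hang and no wedged declaration; additional hangcheck errors after the simulated one point at a ring that did not recover from the reset.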
The fix is in the drm-intel-fixes queue:

commit 7abb690a0e095717420ba78dcab4309abbbec78a
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri May 24 21:29:32 2013 +0200

    drm/i915: Fix spurious -EIO/SIGBUS on wedged gpus
We also need

commit 2e7c8ee7a6bf3440478120f14cbf597d416f88b2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue May 28 10:38:44 2013 +0100

    drm/i915: Avoid promoting a simulated hang to 'wedged'

from dinq for this case here.
It still happens on the latest drm-intel-next-queued kernel (commit 22e407d749a418b4bb4cc93ef76e0429a9f83c82).
Can you please attach a new dmesg from latest -nightly?
Tested the latest -nightly branch (commit 4f9e7cfb09aa3e2fc3b3bba635c6d0c558ce1b70, Merge: 284e9e5 91f8f10).

Run the 1st cycle:
output:
rings stopped
gpu hang correctly dectected

Run the 2nd cycle:
output:
rings stopped
gem_quiescent_gpu:146 failed, ret=-1, errno=5
./ZZ_hangman: line 30: 4491 Aborted (core dumped) $SOURCE_DIR/gem_exec_big
gpu hang not dectected

dmesg:
[ 51.656092] [drm:i915_ring_stop_set], Stopping rings 0x0000000f
[ 51.682678] [drm:i915_driver_open],
[ 51.682695] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 51.682702] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 51.682705] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 51.682707] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 51.682713] [drm:i915_driver_open],
[ 57.708078] [drm:i915_hangcheck_elapsed] *ERROR* render ring: stuck on addr 0x0
[ 57.708153] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[ 57.711553] [drm:i915_error_work_func], resetting chip
[ 57.711602] [drm] Simulated gpu hang, resetting stop_rings
[ 57.711640] [drm:i915_gem_context_init], Disabling HW Contexts; old hardware
[ 57.711644] [drm:gm45_get_vblank_counter], trying to get vblank count for disabled pipe B
[ 57.711710] [drm:i9xx_update_plane], Writing base 00046000 00000000 0 0 5120
[ 63.704115] [drm:i915_hangcheck_elapsed] *ERROR* bsd ring: stuck on addr 0x28
[ 63.704288] [drm:i915_error_work_func], resetting chip
[ 63.704677] [drm:i915_gem_context_init], Disabling HW Contexts; old hardware
[ 63.704680] [drm:gm45_get_vblank_counter], trying to get vblank count for disabled pipe B
[ 63.704728] [drm:i9xx_update_plane], Writing base 00046000 00000000 0 0 5120
[ 64.235617] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 64.235625] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 64.235627] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 64.235629] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 64.245361] [drm:i915_driver_open],
[ 64.245369] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 64.245372] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 64.245375] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 64.245377] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 64.245382] [drm:i915_driver_open],
[ 66.712078] [drm:i915_hangcheck_elapsed] *ERROR* bsd ring: stuck on addr 0x28
[ 66.712231] [drm:i915_error_work_func], resetting chip
[ 66.712299] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 66.712343] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 66.712377] [drm:i9xx_update_plane], Writing base 00046000 00000000 0 0 5120
[ 66.712839] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 66.712844] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 66.712847] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 66.712849] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 67.088554] [drm:i915_error_state_write], Resetting error state
So we do need the other half of my patch then. :-p
Created attachment 80719
Ignore EIO during set-to-domain
Note that this bug should now be impossible to reproduce on dinq.
Created attachment 80754
i915_error_state

ZZ_hangman works well on the latest -dinq kernel.
Running ZZ_hangman and then the following cases causes a GPU hang:
igt/debugfs_emon_crash
igt/drm_vma_limiter
igt/gem_cpu_concurrent_blit/overwrite-source
igt/gem_gtt_concurrent_blit/early-read
igt/gem_mmap

dmesg:
[ 60.419186] [drm:i915_driver_open],
[ 60.419202] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 60.419209] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 60.419212] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 60.419214] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 60.419220] [drm:i915_driver_open],
[ 60.419260] [drm:i915_getparam], Unknown parameter 22
[ 60.419288] [drm:i915_getparam], Unknown parameter 22
[ 61.969754] [drm:i915_driver_open],
[ 61.969766] [drm:i915_driver_open],
[ 61.969792] [drm:i915_getparam], Unknown parameter 22
[ 62.225992] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 62.226021] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 62.226025] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 62.226027] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 83.838268] [drm:i915_driver_open],
[ 83.838283] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 83.838288] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 83.838291] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 83.838293] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 83.838299] [drm:i915_driver_open],
[ 83.838336] [drm:i915_getparam], Unknown parameter 22
[ 83.838362] [drm:i915_getparam], Unknown parameter 22
[ 85.384475] [drm:i915_driver_open],
[ 85.384488] [drm:i915_driver_open],
[ 85.384514] [drm:i915_getparam], Unknown parameter 22
[ 85.638461] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 85.638472] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 85.638475] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 85.638477] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 89.911226] [drm:i915_ring_stop_set], Stopping rings 0x0000000f
[ 89.914325] [drm:i915_driver_open],
[ 89.914341] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 89.914347] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 89.914350] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 89.914352] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 89.914358] [drm:i915_driver_open],
[ 97.708011] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 97.708058] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[ 97.711483] [drm:i915_error_work_func], resetting chip
[ 97.711532] [drm] Simulated gpu hang, resetting stop_rings
[ 97.711574] [drm:gm45_get_vblank_counter], trying to get vblank count for disabled pipe B
[ 97.711637] [drm:i9xx_update_plane], Writing base 00046000 00000000 0 0 5120
[ 97.712079] [drm:i915_getparam], Unknown parameter 22
[ 105.704130] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 105.704373] [drm:i915_error_work_func], resetting chip
[ 105.704440] [drm:gm45_get_vblank_counter], trying to get vblank count for disabled pipe B
[ 105.704514] [drm:i9xx_update_plane], Writing base 00046000 00000000 0 0 5120
[ 106.231569] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 106.231576] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 106.231579] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 106.231581] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 106.241424] [drm:i915_driver_open],
[ 106.241432] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 106.241436] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 106.241438] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 106.241440] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 106.241446] [drm:i915_driver_open],
[ 106.241471] [drm:i915_getparam], Unknown parameter 22
[ 114.704133] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 114.704391] [drm:i915_error_work_func], resetting chip
[ 114.704467] [drm:gm45_get_vblank_counter], trying to get vblank count for disabled pipe B
[ 114.704549] [drm:i9xx_update_plane], Writing base 00046000 00000000 0 0 5120
[ 114.704878] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 114.704882] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 114.704885] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 114.704887] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 114.752250] [drm:i915_error_state_write], Resetting error state
[ 128.187576] [drm:i915_driver_open],
[ 128.187591] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 128.187597] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 128.187600] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 128.187602] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 128.187608] [drm:i915_driver_open],
[ 128.187651] [drm:i915_getparam], Unknown parameter 22
[ 135.704042] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 135.704090] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[ 135.705435] [drm:i915_error_work_func], resetting chip
[ 135.705519] [drm:gm45_get_vblank_counter], trying to get vblank count for disabled pipe B
[ 135.705604] [drm:i9xx_update_plane], Writing base 00046000 00000000 0 0 5120
[ 135.705940] [drm:i915_getparam], Unknown parameter 22
[ 137.282160] [drm:i915_driver_open],
[ 137.282174] [drm:i915_driver_open],
[ 137.282203] [drm:i915_getparam], Unknown parameter 22
[ 139.712142] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 139.712354] [drm:i915_error_work_func], resetting chip
[ 139.717439] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 139.717484] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 139.717518] [drm:i9xx_update_plane], Writing base 00046000 00000000 0 0 5120
[ 139.971331] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 139.971343] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 139.971346] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 139.971348] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
The bsd ring fails after being reset; the breadcrumb write fails to materialise.
This is wreaking havoc with running igt on my gm45 ... I guess I should take a look at fixing gpu reset on it.
Created attachment 81830
fix media reset on gm45

Can you please test the attached patch? Seems to work better on some light testing here at least ...
(In reply to comment #12)
> Created attachment 81830
> fix media reset on gm45
>
> Can you please test the attached patch? Seems to work better on some light
> testing here at least ...

Tested with this patch.
output:
gem_quiescent_gpu:155 failed, ret=-1, errno=5
./ZZ_hangman: line 30: 3291 Aborted (core dumped) $SOURCE_DIR/gem_exec_big
gpu hang correctly dectected

dmesg:
[ 79.660881] [drm:i915_ring_stop_set], Stopping rings 0x0000000f
[ 79.688312] [drm:i915_driver_open],
[ 79.688328] [drm:intel_crtc_cursor_set], cursor off
[ 79.688331] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 79.688337] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 79.688340] [drm:intel_crtc_cursor_set], cursor off
[ 79.688342] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 79.688344] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 79.688350] [drm:i915_driver_open],
[ 83.703210] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 83.703276] [drm] capturing error event; look for more information in /sys/class/drm/card0/error
[ 83.704560] [drm:i915_error_work_func], resetting chip
[ 83.704603] [drm] Simulated gpu hang, resetting stop_rings
[ 83.704640] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 83.704680] [drm:i9xx_update_plane], Writing base 00046000 00000000 0 0 5120
[ 83.777870] [drm:intel_crtc_cursor_set], cursor off
[ 83.777875] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 83.777882] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 83.777885] [drm:intel_crtc_cursor_set], cursor off
[ 83.777886] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 83.777888] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 84.143386] [drm:i915_error_state_write], Resetting error state
Yeah, that patch is broken, I've failed to properly test it. Back to the drawing board.
Created attachment 81860
run full gem hw init after gpu resets

Hopefully I haven't botched the testing on my side again, but this seems to actually work. Please test, thanks.
(In reply to comment #15)
> Created attachment 81860
> run full gem hw init after gpu resets
>
> Hopefully I haven't botched the testing on my side again, but this seems to
> actually work. Please test, thanks.

Fixed by this patch.
output:
rings stopped
gpu hang correctly dectected

dmesg:
[ 195.724135] [drm:i915_ring_stop_set], Stopping rings 0x0000000f
[ 195.726451] [drm:i915_driver_open],
[ 195.726467] [drm:intel_crtc_cursor_set], cursor off
[ 195.726470] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 195.726476] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 195.726479] [drm:intel_crtc_cursor_set], cursor off
[ 195.726480] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 195.726483] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 195.726489] [drm:i915_driver_open],
[ 199.707175] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 199.707240] [drm] capturing error event; look for more information in /sys/class/drm/card0/error
[ 199.710689] [drm:i915_error_work_func], resetting chip
[ 199.710735] [drm] Simulated gpu hang, resetting stop_rings
[ 199.710780] [drm:init_status_page], render ring hws offset: 0x00477000
[ 199.710960] [drm:init_status_page], bsd ring hws offset: 0x0049a000
[ 199.711131] [drm:i915_gem_context_init], Disabling HW Contexts; old hardware
[ 199.711135] [drm:gm45_get_vblank_counter], trying to get vblank count for disabled pipe B
[ 199.711195] [drm:i9xx_update_plane], Writing base 00046000 00000000 0 0 5120
[ 200.290435] [drm:intel_crtc_cursor_set], cursor off
[ 200.290440] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 200.290447] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 200.290450] [drm:intel_crtc_cursor_set], cursor off
[ 200.290451] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 200.290454] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 200.300637] [drm:i915_driver_open],
[ 200.300646] [drm:intel_crtc_cursor_set], cursor off
[ 200.300647] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 200.300651] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 200.300653] [drm:intel_crtc_cursor_set], cursor off
[ 200.300655] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 200.300657] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 200.300663] [drm:i915_driver_open],
[ 200.300700] [drm:intel_crtc_cursor_set], cursor off
[ 200.300702] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 200.300705] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 200.300708] [drm:intel_crtc_cursor_set], cursor off
[ 200.300709] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 200.300711] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 200.695181] [drm:i915_error_state_write], Resetting error state
Ok, the previous patch had some pretty massive issues, so new patch to test:

https://patchwork.kernel.org/patch/2816111/

It seems to work here, but please confirm that this one is still good.
(In reply to comment #17)
> Ok, the previous patch had some pretty massive issues, so new patch to test:
>
> https://patchwork.kernel.org/patch/2816111/
>
> It seems to work here, but please confirm that this one is still good.

Works well with this patch.
output:
rings stopped
gpu hang correctly dectected

dmesg:
[ 199.855498] [drm:i915_ring_stop_set], Stopping rings 0x0000000f
[ 199.857770] [drm:i915_driver_open],
[ 199.857786] [drm:intel_crtc_cursor_set], cursor off
[ 199.857789] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 199.857794] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 199.857797] [drm:intel_crtc_cursor_set], cursor off
[ 199.857799] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 199.857801] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 199.857807] [drm:i915_driver_open],
[ 203.707172] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 203.707240] [drm] capturing error event; look for more information in /sys/class/drm/card0/error
[ 203.710729] [drm:i915_error_work_func], resetting chip
[ 203.710774] [drm] Simulated gpu hang, resetting stop_rings
[ 203.710819] [drm:i915_gem_context_init], Disabling HW Contexts; old hardware
[ 203.710822] [drm:gm45_get_vblank_counter], trying to get vblank count for disabled pipe B
[ 203.710879] [drm:i9xx_update_plane], Writing base 00046000 00000000 0 0 5120
[ 204.245423] [drm:intel_crtc_cursor_set], cursor off
[ 204.245428] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 204.245434] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 204.245437] [drm:intel_crtc_cursor_set], cursor off
[ 204.245438] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 204.245441] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 204.255272] [drm:i915_driver_open],
[ 204.255280] [drm:intel_crtc_cursor_set], cursor off
[ 204.255282] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 204.255285] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 204.255287] [drm:intel_crtc_cursor_set], cursor off
[ 204.255289] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 204.255291] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 204.255297] [drm:i915_driver_open],
[ 204.255332] [drm:intel_crtc_cursor_set], cursor off
[ 204.255334] [drm:intel_crtc_set_config], [CRTC:3] [FB:37] #connectors=1 (x y) (0 0)
[ 204.255337] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 204.255339] [drm:intel_crtc_cursor_set], cursor off
[ 204.255341] [drm:intel_crtc_set_config], [CRTC:4] [NOFB]
[ 204.255343] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:3]
[ 204.646094] [drm:i915_error_state_write], Resetting error state
Patch merged to -fixes:

commit 035dc1e0f9008b48630e02bf0eaa7cc547416d1d
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jul 3 12:56:54 2013 +0200

    drm/i915: reinit status page registers after gpu reset
Tested on the latest -nightly branch.
output:
./ZZ_hangman
checking /sys/kernel/debug/dri/0/i915_error_state
rings stopped
gpu hang correctly detected
checking /sys/class/drm/card0/error
rings stopped
gpu hang correctly detected

dmesg has
<6>[ 1438.703273] [drm] capturing error event; look for more information in /sys/class/drm/card0/error

# cat /sys/class/drm/card0/error
no error state collected

Is this expected?
(In reply to comment #20)
> Test on latest -nightly branch.
> output:
> ./ZZ_hangman
> checking /sys/kernel/debug/dri/0/i915_error_state
> rings stopped
> gpu hang correctly detected
> checking /sys/class/drm/card0/error
> rings stopped
> gpu hang correctly detected
>
> dmesg has <6>[ 1438.703273] [drm] capturing error event; look for more
> information in /sys/class/drm/card0/error
>
> # cat /sys/class/drm/card0/error
> no error state collected
>
> Is it expected?

Yes, ZZ_hangman automatically clears the error_state at the end of the test so that if any other test causes a real gpu hang it gets captured.
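The read-then-clear behaviour described above can be sketched as follows. This is an illustration, not the ZZ_hangman script itself: the helper names are hypothetical, the path in the usage note and the "no error state collected" marker are the ones quoted in this thread, and the sketch assumes that any write to the error file resets the captured state (the "i915_error_state_write ... Resetting error state" lines in the dmesg logs suggest this, but it is not verified here).

```python
from pathlib import Path

# Marker string reported in this thread when no hang has been captured.
EMPTY_MARKER = "no error state collected"


def has_error_state(text):
    """Return True if the file content looks like a captured GPU hang."""
    return bool(text.strip()) and EMPTY_MARKER not in text


def read_and_clear(path):
    """Read the error state, then clear it so a later hang is captured.

    Assumption: writing anything to the file resets the stored state.
    Returns the captured text, or None if nothing was recorded.
    """
    error_file = Path(path)
    text = error_file.read_text()
    error_file.write_text("\n")
    return text if has_error_state(text) else None
```

On the test machine this would be called as `read_and_clear("/sys/class/drm/card0/error")`, which normally requires root. Reading before clearing keeps the captured state available for attaching to a bug report.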
Verified Fixed.
Closing old verified.