Bug 100261

Summary: [BAT] [IVB] kms_flip blt-wf_vblank-vs-dpms GPU hang
Product: DRI
Reporter: Dorota Czaplejewicz <dorota.czaplejewicz>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: christophe.prigent, conselvan2, dorota.czaplejewicz, intel-gfx-bugs
Version: unspecified
Hardware: Other
OS: All
Whiteboard:
i915 platform: HSW, IVB
i915 features:
Attachments:
  Description                                      Flags
  Complete compressed dmesg since test beginning   none
  GPU error state                                  none

Description Dorota Czaplejewicz 2017-03-17 18:09:11 UTC
Created attachment 130296
Complete compressed dmesg since test beginning

On IVB-3770, the kms_flip subtest blt-wf_vblank-vs-dpms hangs the system: the machine no longer responds to pings.

Kernel: drm-tip: 2017y-03m-17d-15h-12m-48s
No screen connected.

Run log:
$ sudo ./kms_flip --r blt-wf_vblank-vs-dpms
IGT-Version: 1.18-gd08d263 (x86_64) (Linux: 4.11.0-rc2-CI-CI_DRM_2362_9942f19+ x86_64)
Using monotonic timestamps
Beginning blt-wf_vblank-vs-dpms on pipe A, connector VGA-1
  1024x768 60 1024 1048 1184 1344 768 771 777 806 0xa 0x40 65000
.

Dmesg:
[  102.388446] [drm:intel_dump_pipe_config [i915]] requested mode:
[  102.388452] [drm:drm_mode_debug_printmodeline] Modeline 0:"1024x768" 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[  102.388481] [drm:intel_dump_pipe_config [i915]] adjusted mode:
[  102.388488] [drm:drm_mode_debug_printmodeline] Modeline 0:"1024x768" 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[  102.388517] [drm:intel_dump_pipe_config [i915]] crtc timings: 65000 1024 1048 1184 1344 768 771 777 806, type: 0x40 flags: 0xa
[  102.388543] [drm:intel_dump_pipe_config [i915]] port clock: 65000, pipe src size: 1024x768, pixel rate 65000
[  102.388572] [drm:intel_dump_pipe_config [i915]] pch pfit: pos: 0x00000000, size: 0x00000000, disabled
[  102.388601] [drm:intel_dump_pipe_config [i915]] ips: 0, double wide: 0
[  102.388629] [drm:ibx_dump_hw_state [i915]] dpll_hw_state: dpll: 0xc4100010, dpll_md: 0x0, fp0: 0x10c09, fp1: 0x10c09
[  102.388655] [drm:intel_dump_pipe_config [i915]] planes on this crtc
[  102.388684] [drm:intel_dump_pipe_config [i915]] [PLANE:26:primary A] FB:71, fb = 1024x768 format = XR24 little-endian (0x34325258)
[  102.388710] [drm:intel_dump_pipe_config [i915]] [PLANE:28:sprite A] disabled, scaler_id = 0
[  102.388762] [drm:intel_dump_pipe_config [i915]] [PLANE:30:cursor A] disabled, scaler_id = 0
[  102.388792] [drm:intel_plane_atomic_calc_changes [i915]] [CRTC:32:pipe A] has [PLANE:26:primary A] with fb 71
[  102.388821] [drm:intel_plane_atomic_calc_changes [i915]] [PLANE:26:primary A] visible 0 -> 1, off 0, on 1, ms 1
[  102.388866] [drm:intel_find_shared_dpll [i915]] [CRTC:32:pipe A] allocated PCH DPLL A
[  102.388893] [drm:intel_reference_shared_dpll [i915]] using PCH DPLL A for pipe A
[  102.388940] [drm:drm_atomic_commit] commiting ffff8802226864f8
[  104.825671] [drm:missed_breadcrumb [i915]] blitter ring missed breadcrumb at intel_breadcrumbs_hangcheck+0x5c/0x80 [i915], irq posted? no
[  108.930913] [drm] GPU HANG: ecode 7:1:0xe77ffef3, in kms_flip [1096], reason: Hang on blitter ring, action: reset
[  108.930941] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  108.930943] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  108.930944] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  108.930945] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  108.930946] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  108.931235] [drm:i915_reset_and_wakeup [i915]] resetting chip
[  113.277565] asynchronous wait on fence i915:[global]:a timed out
Comment 1 Dorota Czaplejewicz 2017-03-17 18:25:47 UTC
Same issue on HSW-4770R
Comment 2 Jari Tahvanainen 2017-03-20 08:25:54 UTC
Dorota, if this is a GPU hang, please provide the error state. See
https://01.org/linuxgraphics/documentation/how-get-gpu-error-state
Comment 3 Dorota Czaplejewicz 2017-03-20 10:05:06 UTC
I haven't done that yet because the system freezes shortly after the hang. I will report back if I succeed in fetching the state.
Comment 4 Ander Conselvan de Oliveira 2017-03-22 13:24:46 UTC
*** Bug 100300 has been marked as a duplicate of this bug. ***
Comment 5 Ander Conselvan de Oliveira 2017-03-22 13:27:05 UTC
Created attachment 130378
GPU error state
Comment 6 Ander Conselvan de Oliveira 2017-03-22 13:29:38 UTC
If I revert the following commit, the test passes:

commit 96ec8cb3b5ec1fc2927d6cff8e09930082301d7e
Author: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Date:   Sat Oct 29 01:01:05 2016 +0300

    igt/kms_flip: Use new igt_spin_batch
Comment 7 Ander Conselvan de Oliveira 2017-03-24 11:37:08 UTC
Reverting the following patch also makes the test pass:

commit 39d2eda23342e5a473563cf1c954246eb07529c7
Author: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Date:   Thu Dec 1 12:50:19 2016 +0200

    igt/kms_flip: Fix set_dpms called with an idle bo


My understanding is that the call to set_dpms() made just after igt_spin_batch_set_timeout() blocks in intel_atomic_commit(), in

    if (!nonblock) {
        i915_sw_fence_wait(&intel_state->commit_ready);
Comment 8 Ander Conselvan de Oliveira 2017-03-24 11:39:10 UTC
(Hit save too soon by accident)

(In reply to Ander Conselvan de Oliveira from comment #7)
> Reverting the following patch also makes the test pass:
> 
> commit 39d2eda23342e5a473563cf1c954246eb07529c7
> Author: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
> Date:   Thu Dec 1 12:50:19 2016 +0200
> 
>     igt/kms_flip: Fix set_dpms called with an idle bo
> 
> 
> My understanding is that the call to set_dpms() made just after
> igt_spin_batch_set_timeout() blocks in intel_atomic_commit(), in
> 
>     if (!nonblock) {
>         i915_sw_fence_wait(&intel_state->commit_ready);

The commit is waiting for the dummy write from the spin batch to complete. However, because the test is blocked in that wait, the timeout signal handler never fires, so the batch keeps spinning and the write never completes.
Comment 9 Ander Conselvan de Oliveira 2017-03-27 08:48:17 UTC
https://patchwork.freedesktop.org/series/21912/
Comment 10 Ander Conselvan de Oliveira 2017-03-27 11:39:50 UTC
commit eb6ed462f256dd983108f1c86ddd5d3a6190624b
Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Date:   Mon Mar 27 14:08:28 2017 +0300

    lib/dummyload: Handle timeout in a new thread instead of signal handler

in i-g-t.
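Comment 8 describes a cycle: the commit waits for the spin batch, and the spin batch only stops when the timeout fires, which a signal handler on the blocked test thread cannot do. The following is a minimal user-space sketch in C (POSIX threads and C11 atomics) of the idea behind the fix: end the spin from a dedicated timer thread so that it does not depend on the blocked thread. It is not the lib/dummyload code from commit eb6ed462. The names `spin_model`, `spin_model_set_timeout()`, `blocking_commit()` and `run_flip_vs_dpms_model()` are hypothetical, the "spin batch" is modelled as an atomic flag, and the blocking commit is modelled as a polling loop rather than a kernel fence wait, so the signal-handler failure mode itself is not reproduced.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of a spin batch: "spinning" stays true until
 * something ends the batch. The real igt object runs on the GPU;
 * here the batch is only a flag. */
struct spin_model {
	atomic_bool spinning;
	long timeout_ms;
	pthread_t timer;
};

static void sleep_ms(long ms)
{
	struct timespec ts = { .tv_sec = ms / 1000,
			       .tv_nsec = (ms % 1000) * 1000000L };

	/* Resume the sleep if a signal interrupts it. */
	while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
		;
}

/* Runs on its own thread, so it ends the spin even while the thread
 * that armed the timeout is stuck in blocking_commit(). */
static void *spin_timeout_thread(void *arg)
{
	struct spin_model *spin = arg;

	sleep_ms(spin->timeout_ms);
	atomic_store(&spin->spinning, false);
	return NULL;
}

/* Hypothetical counterpart of igt_spin_batch_set_timeout(): start a
 * timer thread instead of arming a signal-based timer.
 * Returns 0 on success, an error number otherwise. */
static int spin_model_set_timeout(struct spin_model *spin, long ms)
{
	spin->timeout_ms = ms;
	return pthread_create(&spin->timer, NULL, spin_timeout_thread, spin);
}

/* Stand-in for the blocking atomic commit: like the wait on
 * commit_ready, it cannot return until the spin has ended. */
static void blocking_commit(struct spin_model *spin)
{
	while (atomic_load(&spin->spinning))
		sleep_ms(1);
}

/* Returns true if the blocking call completed, i.e. no deadlock. */
static bool run_flip_vs_dpms_model(long timeout_ms)
{
	struct spin_model spin;

	atomic_init(&spin.spinning, true);
	if (spin_model_set_timeout(&spin, timeout_ms) != 0)
		return false;

	blocking_commit(&spin); /* would never return if nothing ended the spin */
	pthread_join(spin.timer, NULL);
	return !atomic_load(&spin.spinning);
}
```

Build with `-pthread`. `run_flip_vs_dpms_model(50)` returns true once the timer thread has ended the spin after roughly 50 ms. If nothing on another thread ended the spin, `blocking_commit()` would never return, which is the shape of the hang described in comment 8.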
