Bug 69119

Summary: [hsw] WAIT_FOR_EVENT hangs (full screen vsync on pipe 1 with BCS)
Product: xorg Reporter: Dan Doel <dan.doel>
Component: Driver/intelAssignee: Chris Wilson <chris>
Status: RESOLVED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: highest CC: intel-gfx-bugs, m.manico
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
/sys/kernel/debug/dri/0/i915_error_state
none
Xorg.0.log
none
dmesg none

Description Dan Doel 2013-09-08 22:05:06 UTC
Created attachment 85463 [details]
/sys/kernel/debug/dri/0/i915_error_state

Greetings,

I recently got a second display, and have been experiencing reliable GPU lockups when starting full screen games on one of the displays. The steps to reproduce go as follows:

  1) Run gnome-shell 3.8.4 as your desktop interface (this is sufficient, at least)
  2) Have a multi-monitor setup
  3) Start a full screen game that only appears on one monitor

The result is several lockups that are eventually freed, and errors in dmesg and such. I believe that OpenGL acceleration of the desktop is a necessary prerequisite, as this behavior doesn't occur in, say, fluxbox. But, I've observed the behavior with multiple games, some of which don't (I think) use OpenGL. At the least, I've seen it with Awesomenauts (which may be using OpenGL) and Icewind Dale running in wine (which I would suspect isn't, but I may be wrong).

This is definitely related to multi-monitors, as well, because if I disable one monitor in the gnome system settings, the problem goes away. Also, if the game does full screen by switching the monitors to mirror one another, the problem will not occur (probably because that is equivalent to a single screen setup). I observed this with Battle for Wesnoth.

I had thought this was related to Bug 54226, but was informed that this was unrelated, and should post a new bug.

I'll attach my Xorg.0.log, dmesg and /sys/kernel/debug/dri/0/i915_error_state from after a lockup.

Relevant hardware/software is as follows:

  CPU/GPU: i7 4770/HD4600
  Motherboard: ASrock z87 extreme6
  
  arch linux, kernel 3.10.10
  gnome-shell 3.8.4
  mesa 9.2.0
  xserver 1.14.2
  intel drivers 2.21.15

Let me know if you need any other information, and I'll attempt to provide it.
Comment 1 Dan Doel 2013-09-08 22:05:55 UTC
Created attachment 85464 [details]
Xorg.0.log
Comment 2 Dan Doel 2013-09-08 22:06:20 UTC
Created attachment 85465 [details]
dmesg
Comment 3 Chris Wilson 2013-09-08 22:15:29 UTC
Ok, we are waiting on a scanline outside of the CRTC bounds. Very naughty.
Comment 4 Chris Wilson 2013-09-08 22:25:40 UTC
Actually on a second read, it may just not like the vsync to full-height. I'll prep a patch tomorrow.
Comment 5 Chris Wilson 2013-09-09 08:42:20 UTC
Actually this gets more bizarre, the BCS can't do wait-on-vblank. So, I'll just have to try with a reduced range.
Comment 6 Chris Wilson 2013-09-09 08:44:54 UTC
Does this do anything:

diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c
index be02a78..f94799f 100644
--- a/src/sna/sna_display.c
+++ b/src/sna/sna_display.c
@@ -3392,6 +3392,12 @@ static bool sna_emit_wait_for_scanline_hsw(struct sna *sna,
        b = kgem_get_batch(&sna->kgem);
        sna->kgem.nbatch += 5;
 
+       /* Reduce the range slightly to accommodate some latency
+        * in detection: https://bugs.freedesktop.org/show_bug.cgi?id=69119
+        */
+       y1 = (y1 + 7) & ~7;
+       y2 = (y2 - 7) & ~7;
+
        /* The documentation says that the LOAD_SCAN_LINES command
         * always comes in pairs. Don't ask me why. */
        switch (pipe) {
Comment 7 Chris Wilson 2013-09-09 12:58:02 UTC
I'm scratching my head here for lack of ideas. I could just about force this to use the RCS and a wait-for-vblank-event, but really, really don't want to. So hopefully reducing the scan-line window and thereby creating a larger window for the event to fire will do the trick.

The one other possibility is that IVB requires that both y1 and y2 refer to the line before (and not just y2). This is not mentioned in the bspec for hsw.

Just to be safe, I added and extra check:


diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c
index be02a78..6a5241a 100644
--- a/src/sna/sna_display.c
+++ b/src/sna/sna_display.c
@@ -3392,6 +3392,14 @@ static bool sna_emit_wait_for_scanline_hsw(struct sna *sna,
        b = kgem_get_batch(&sna->kgem);
        sna->kgem.nbatch += 5;
 
+       /* Reduce the range slightly to accommodate some latency
+        * in detection: https://bugs.freedesktop.org/show_bug.cgi?id=69119
+        */
+       y1 = (y1 + 7) & ~7;
+       y2 = (y2 - 7) & ~7;
+       if (y2 <= y1)
+               return false;
+
        /* The documentation says that the LOAD_SCAN_LINES command
         * always comes in pairs. Don't ask me why. */
        switch (pipe) {
Comment 8 Chris Wilson 2013-09-09 13:08:02 UTC
And to be extra safe, do the check before incrementing the batch pointer:

diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c
index be02a78..6a5241a 100644
--- a/src/sna/sna_display.c
+++ b/src/sna/sna_display.c
@@ -3392,6 +3392,14 @@ static bool sna_emit_wait_for_scanline_hsw(struct sna *sna,
        b = kgem_get_batch(&sna->kgem);
        sna->kgem.nbatch += 5;
 
+       /* Reduce the range slightly to accommodate some latency
+        * in detection: https://bugs.freedesktop.org/show_bug.cgi?id=69119
+        */
+       y1 = (y1 + 7) & ~7;
+       y2 = (y2 - 7) & ~7;
+       if (y2 <= y1)
+               return false;
+
        /* The documentation says that the LOAD_SCAN_LINES command
         * always comes in pairs. Don't ask me why. */
        switch (pipe) {
Comment 9 Dan Doel 2013-09-09 14:04:18 UTC
I'm afraid that didn't seem to change anything. Still getting lockups and hangcheck errors in dmesg.

By the way, the 'extra safe' patch looks identical to the previous patch to me. Was it meant to move the lines before one or more of the upper context lines?

If so, I'll test this evening.
Comment 10 Chris Wilson 2013-09-09 14:13:38 UTC
The last version just moved the check earlier. I doubt it made a difference to your testing (though there is a very, very small chance it could cause a hang of its own.)

Next I would test:


diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c
index be02a78..20c67c9 100644
--- a/src/sna/sna_display.c
+++ b/src/sna/sna_display.c
@@ -3411,6 +3411,7 @@ static bool sna_emit_wait_for_scanline_hsw(struct sna *sna,
        }
        b[4] = MI_WAIT_FOR_EVENT | event;
 
+       sna->kgem.batch_flags |= I915_EXEC_SECURE;
        return true;
 }
 

just in case
Comment 11 Chris Wilson 2013-09-09 15:01:45 UTC
Ok, this only "works" on pipe A due to a typo.
Comment 12 Chris Wilson 2013-09-09 15:28:05 UTC
commit 8ff8eb2b38dc705f5c86f524c1cd74a811a7b04c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Sep 9 16:23:04 2013 +0100

    sna/hsw: Scanline waits require both DERRMR and forcewake
    
    Oh my, bspec is missing a few details on how to perform a scanline wait
    on Haswell. But by using the extra steps required for Ivybridge, we can
    successfully send events from the scanout to the BCS ring. Sadly this
    again means that to use vsync on Haswell requires preventing the GPU
    from sleeping whilst it waits for the scanout to advance.
    
    Reported-by: Dan Doel <dan.doel@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69119
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

That seemed to fix it for my test cases (windowed swaps on all 3 pipes), please do reopen if it is still broken for you.
Comment 13 Dan Doel 2013-09-09 21:48:41 UTC
Fix confirmed here. Excellent work.
Comment 14 Chris Wilson 2015-02-20 12:21:28 UTC
*** Bug 89216 has been marked as a duplicate of this bug. ***

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.