Created attachment 85463 [details] /sys/kernel/debug/dri/0/i915_error_state Greetings, I recently got a second display, and have been experiencing reliable GPU lockups when starting full screen games on one of the displays. The steps to reproduce go as follows: 1) Run gnome-shell 3.8.4 as your desktop interface (this is sufficient, at least) 2) Have a multi-monitor setup 3) Start a full screen game that only appears on one monitor The result is several lockups that are eventually freed, and errors in dmesg and such. I believe that OpenGL acceleration of the desktop is a necessary prerequisite, as this behavior doesn't occur in, say, fluxbox. But, I've observed the behavior with multiple games, some of which don't (I think) use OpenGL. At the least, I've seen it with Awesomenauts (which may be using OpenGL) and Icewind Dale running in wine (which I would suspect isn't, but I may be wrong). This is definitely related to multi-monitors, as well, because if I disable one monitor in the gnome system settings, the problem goes away. Also, if the game does full screen by switching the monitors to mirror one another, the problem will not occur (probably because that is equivalent to a single screen setup). I observed this with Battle for Wesnoth. I had thought this was related to Bug 54226, but was informed that this was unrelated, and should post a new bug. I'll attach my Xorg.0.log, dmesg and /sys/kernel/debug/dri/0/i915_error_state from after a lockup. Relevant hardware/software is as follows: CPU/GPU: i7 4770/HD4600 Motherboard: ASrock z87 extreme6 arch linux, kernel 3.10.10 gnome-shell 3.8.4 mesa 9.2.0 xserver 1.14.2 intel drivers 2.21.15 Let me know if you need any other information, and I'll attempt to provide it.
Created attachment 85464 [details] Xorg.0.log
Created attachment 85465 [details] dmesg
Ok, we are waiting on a scanline outside of the CRTC bounds. Very naughty.
Actually on a second read, it may just not like the vsync to full-height. I'll prep a patch tomorrow.
Actually this gets more bizarre, the BCS can't do wait-on-vblank. So, I'll just have to try with a reduced range.
Does this do anything: diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c index be02a78..f94799f 100644 --- a/src/sna/sna_display.c +++ b/src/sna/sna_display.c @@ -3392,6 +3392,12 @@ static bool sna_emit_wait_for_scanline_hsw(struct sna *sna, b = kgem_get_batch(&sna->kgem); sna->kgem.nbatch += 5; + /* Reduce the range slightly to accommodate some latency + * in detection: https://bugs.freedesktop.org/show_bug.cgi?id=69119 + */ + y1 = (y1 + 7) & ~7; + y2 = (y2 - 7) & ~7; + /* The documentation says that the LOAD_SCAN_LINES command * always comes in pairs. Don't ask me why. */ switch (pipe) {
I'm scratching my head here for lack of ideas. I could just about force this to use the RCS and a wait-for-vblank-event, but really, really don't want to. So hopefully reducing the scan-line window and thereby creating a larger window for the event to fire will do the trick. The one other possibility is that IVB requires that both y1 and y2 refer to the line before (and not just y2). This is not mentioned in the bspec for hsw. Just to be safe, I added and extra check: diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c index be02a78..6a5241a 100644 --- a/src/sna/sna_display.c +++ b/src/sna/sna_display.c @@ -3392,6 +3392,14 @@ static bool sna_emit_wait_for_scanline_hsw(struct sna *sna, b = kgem_get_batch(&sna->kgem); sna->kgem.nbatch += 5; + /* Reduce the range slightly to accommodate some latency + * in detection: https://bugs.freedesktop.org/show_bug.cgi?id=69119 + */ + y1 = (y1 + 7) & ~7; + y2 = (y2 - 7) & ~7; + if (y2 <= y1) + return false; + /* The documentation says that the LOAD_SCAN_LINES command * always comes in pairs. Don't ask me why. */ switch (pipe) {
And to be extra safe, do the check before incrementing the batch pointer: diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c index be02a78..6a5241a 100644 --- a/src/sna/sna_display.c +++ b/src/sna/sna_display.c @@ -3392,6 +3392,14 @@ static bool sna_emit_wait_for_scanline_hsw(struct sna *sna, b = kgem_get_batch(&sna->kgem); sna->kgem.nbatch += 5; + /* Reduce the range slightly to accommodate some latency + * in detection: https://bugs.freedesktop.org/show_bug.cgi?id=69119 + */ + y1 = (y1 + 7) & ~7; + y2 = (y2 - 7) & ~7; + if (y2 <= y1) + return false; + /* The documentation says that the LOAD_SCAN_LINES command * always comes in pairs. Don't ask me why. */ switch (pipe) {
I'm afraid that didn't seem to change anything. Still getting lockups and hangcheck errors in dmesg. By the way, the 'extra safe' patch looks identical to the previous patch to me. Was it meant to move the lines before one or more of the upper context lines? If so, I'll test this evening.
The last version just moved the check earlier. I doubt it made a difference to your testing (though there is a very, very small chance it could cause a hang of its own.) Next I would test: diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c index be02a78..20c67c9 100644 --- a/src/sna/sna_display.c +++ b/src/sna/sna_display.c @@ -3411,6 +3411,7 @@ static bool sna_emit_wait_for_scanline_hsw(struct sna *sna, } b[4] = MI_WAIT_FOR_EVENT | event; + sna->kgem.batch_flags |= I915_EXEC_SECURE; return true; } just in case
Ok, this only "works" on pipe A due to a typo.
commit 8ff8eb2b38dc705f5c86f524c1cd74a811a7b04c Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Sep 9 16:23:04 2013 +0100 sna/hsw: Scanline waits require both DERRMR and forcewake Oh my, bspec is missing a few details on how to perform a scanline wait on Haswell. But by using the extra steps required for Ivybridge, we can successfully send events from the scanout to the BCS ring. Sadly this again means that to use vsync on Haswell requires preventing the GPU from sleeping whilst it waits for the scanout to advance. Reported-by: Dan Doel <dan.doel@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69119 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> That seemed to fix it for my test cases (windowed swaps on all 3 pipes), please do reopen if it is still broken for you.
Fix confirmed here. Excellent work.
*** Bug 89216 has been marked as a duplicate of this bug. ***
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.