Bug 65099 - [GM45] Video corruption on wide desktop
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-28 21:25 UTC by Paul Donohue
Modified: 2013-06-19 15:51 UTC
CC List: 0 users

See Also:
i915 platform:
i915 features:


Attachments
xrandr --verbose using 2.20.9-0ubuntu2.1 (8.31 KB, text/plain)
2013-05-28 21:27 UTC, Paul Donohue
xrandr --verbose using 2.21.6-0ubuntu4 (8.31 KB, text/plain)
2013-05-28 21:40 UTC, Paul Donohue
dmesg using 2.21.6-0ubuntu4 (63.51 KB, text/plain)
2013-05-28 22:56 UTC, Paul Donohue
Xorg.0.log using 2.21.6-0ubuntu4 (36.28 KB, text/plain)
2013-05-28 22:56 UTC, Paul Donohue
Desktop before extending onto external monitor (1.34 MB, image/jpeg)
2013-05-29 14:07 UTC, Paul Donohue
Desktop after xrandr ... --right-of LVDS1 (1.63 MB, image/jpeg)
2013-05-29 14:08 UTC, Paul Donohue
xrandr --right-of LVDS1 --fb 7000x2000 (1.80 MB, image/jpeg)
2013-05-29 14:10 UTC, Paul Donohue
Xorg.0.log using nightly kernel and xf86-video-intel master branch with debug=full (1.30 MB, text/plain)
2013-05-30 15:36 UTC, Paul Donohue
Xorg.0.log using nightly kernel and xf86-video-intel master branch with debug=full (444.30 KB, text/plain)
2013-06-18 18:37 UTC, Paul Donohue
Xorg.0.log using 3.8.0-23.34 kernel and xf86-video-intel master branch with debug=full (1.29 MB, text/plain)
2013-06-18 18:41 UTC, Paul Donohue
Output of intel_reg_dumper during corruption (13.57 KB, text/plain)
2013-06-18 23:53 UTC, Paul Donohue
Output of intel_reg_dumper using xrandr --above instead of --right-of (external monitor enabled but no corruption) (13.53 KB, text/plain)
2013-06-18 23:54 UTC, Paul Donohue
Output of intel_reg_dumper using UXA (13.54 KB, text/plain)
2013-06-19 13:42 UTC, Paul Donohue
Output of xwd while screen is corrupted (188.16 KB, text/plain)
2013-06-19 13:49 UTC, Paul Donohue
Detect invalid scanout pitches (1.87 KB, patch)
2013-06-19 15:03 UTC, Chris Wilson

Description Paul Donohue 2013-05-28 21:25:55 UTC
I have a ThinkPad laptop with a GM45 chipset that I connect to two external monitors via a Matrox DualHead2Go box.

I recently upgraded from Ubuntu Precise to Raring.  After the upgrade, if I enable the external monitors using `xrandr --output VGA1 --mode 3840x1200 --right-of LVDS1` or `xrandr --output VGA1 --mode 3840x1200 --left-of LVDS1`, the external monitors turn on, but the entire screen (including LVDS1) is corrupted.

Using '--above' or '--below' instead of '--right-of' or '--left-of' does not cause screen corruption.  Reverting xserver-xorg-video-intel from 2.21.6-0ubuntu4 to 2.20.9-0ubuntu2.1 without making any other changes fixes the problem (I can use '--right-of' or '--left-of' again without screen corruption).

I'll submit additional information as I collect it.
Comment 1 Paul Donohue 2013-05-28 21:27:02 UTC
Created attachment 79913 [details]
xrandr --verbose using 2.20.9-0ubuntu2.1
Comment 2 Paul Donohue 2013-05-28 21:40:08 UTC
Created attachment 79916 [details]
xrandr --verbose using 2.21.6-0ubuntu4
Comment 3 Chris Wilson 2013-05-28 21:45:41 UTC
Xorg.0.log, dmesg and photograph.
Comment 4 Paul Donohue 2013-05-28 22:56:20 UTC
Created attachment 79924 [details]
dmesg using 2.21.6-0ubuntu4
Comment 5 Paul Donohue 2013-05-28 22:56:51 UTC
Created attachment 79925 [details]
Xorg.0.log using 2.21.6-0ubuntu4
Comment 6 Paul Donohue 2013-05-28 23:06:24 UTC
I misspoke before: I upgraded from Ubuntu Quantal to Raring, not from Ubuntu Precise to Raring.

This problem was apparently triggered by the switch from UXA to SNA.  Compiling 2.20.9 from git or Ubuntu with '--with-default-accel=sna' shows the corruption; compiling 2.21.6 from git or Ubuntu without '--with-default-accel=sna' works fine.  So it's not a bug introduced between those versions.

I don't have a camera with me now - I'll try to get a photograph tomorrow.
Comment 7 Paul Donohue 2013-05-28 23:15:45 UTC
A few additional observations:

Running `xrandr --output VGA1 --off` when the screen is corrupted causes LVDS1 to revert to normal and the external monitor to turn back off.  (The corruption isn't permanent.)

Using a real 1920x1200 monitor instead of the 3840x1200 virtual monitor via the Matrox box doesn't have any problems.

Messing with the '--fb' setting on xrandr changes the behavior of the corruption but doesn't make it go away.
Comment 8 Paul Donohue 2013-05-29 14:07:19 UTC
Created attachment 79967 [details]
Desktop before extending onto external monitor
Comment 9 Paul Donohue 2013-05-29 14:08:41 UTC
Created attachment 79968 [details]
Desktop after xrandr ... --right-of LVDS1
Comment 10 Paul Donohue 2013-05-29 14:10:08 UTC
Created attachment 79969 [details]
xrandr --right-of LVDS1 --fb 7000x2000
Comment 11 Paul Donohue 2013-05-29 14:24:08 UTC
Three photos containing the same desktop contents (a full-screen browser window and a terminal on the LVDS1 screen, and nothing on the VGA screens).

In the first corrupted case (without --fb), the corrupted image changes significantly approximately once per second (I have a clock displayed in the bottom left corner of the screen that counts seconds, so it may be triggered by that).

In the second corrupted case (with --fb), the corrupted image is static and does not change.  Passing different values into --fb causes different corruption patterns on the screen, but passing the same value in repeatedly results in similar (identical?) corruption patterns.

In both cases it looks like the top ~25 rows of pixels on the LVDS1 screen are correct (not corrupted), but the rest of LVDS1 and the entire VGA output are corrupted.
Comment 12 Chris Wilson 2013-05-29 15:34:51 UTC
You need a new kernel:

commit 4878cae22a2405b6d33318e2dc99a9c1367fee44
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Mon Feb 18 19:08:48 2013 +0200

    drm/i915: Really wait for pending flips when panning

which will be in v3.10.
Comment 13 Paul Donohue 2013-05-30 13:50:11 UTC
I tried compiling kernel 3.9.0 with the changes in 4878cae22a2405b6d33318e2dc99a9c1367fee44 applied, but that didn't help.
Comment 14 Chris Wilson 2013-05-30 13:52:51 UTC
Can you try the kernel package from ppa:mainline drm-intel-nightly to be sure? If the bug continues to persist, can you please build xf86-video-intel from scratch with --enable-sna --enable-debug=full?
Comment 16 Paul Donohue 2013-05-30 15:36:51 UTC
Created attachment 80045 [details]
Xorg.0.log using nightly kernel and xf86-video-intel master branch with debug=full
Comment 17 Paul Donohue 2013-05-30 15:40:04 UTC
BTW, using xf86-video-intel master branch, if I enable my external monitor with xrandr --above or --below, I get a new type of corruption (this time on the external monitor only - the LVDS is unaffected).
Comment 18 Chris Wilson 2013-05-30 17:08:59 UTC
Looks like a pending flip comes in after changing the fb size, yet another variant of the same race as in the kernel.

Can you please try:

diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c
index 2d59831..9b324e2 100644
--- a/src/sna/sna_display.c
+++ b/src/sna/sna_display.c
@@ -2686,6 +2686,7 @@ sna_mode_resize(ScrnInfoPtr scrn, int width, int height)
                visit.new = sna->front;
                TraverseTree(root(screen), sna_visit_set_window_pixmap, &visit);
                assert(screen->GetWindowPixmap(root(screen)) == sna->front);
+               sna_dri_destroy_window(root(screen));
        }
        screen->SetScreenPixmap(sna->front);
        assert(screen->GetScreenPixmap(screen) == sna->front);
Comment 19 Paul Donohue 2013-05-30 17:56:21 UTC
X starts, my auto-start apps load, I can do stuff for a brief moment, but about a second or two later, the screen flashes, and I end up with a black screen with only a cursor on it.  The cursor is frozen, but I can switch to a different tty and kill X off.  I never get to the point where I can enable the external monitor to test it.
Comment 20 Chris Wilson 2013-05-30 18:27:59 UTC
Sounds like an assert failure, sadly not captured as part of Xorg.0.log - but should be on the vt where X was launched from (stderr) or captured by a login dm (e.g. /var/log/xdm.log, /var/log/lightdm/:0.log).
Comment 21 Paul Donohue 2013-05-31 19:11:11 UTC
Sorry, haven't had time to try it again ... I'm out on vacation from tomorrow until the 11th, so I won't get another chance to look at it until the 12th.
Comment 22 Chris Wilson 2013-05-31 19:42:26 UTC
No worries. I think the patch should be another step towards the final solution, and so will apply it in the meantime. When you have time, I'd like to finish resolving what is going wrong on your machine. Thanks.
Comment 23 Chris Wilson 2013-06-01 15:54:03 UTC
Changed my mind - the patch is wrong and should be unnecessary. The abort in the log is a misplaced assert(). When you get a chance, please do grab a fresh debug=full Xorg.log with xf86-video-intel.git.
Comment 24 Paul Donohue 2013-06-18 18:34:14 UTC
Last week was busy trying to catch up after my vacation, but I can work on this again now.

I tried using http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/current/linux-image-3.10.0-994-generic_3.10.0-994.201306140422_amd64.deb and xf86-video-intel commit c3695c3c6b7bc13b5e642c9d92648e8228411bed ... emerald failed to start with the following error:
X Error: BadWindow (invalid Window parameter) 3
  Major opcode: 20 (X_GetProperty)
  Resource id:  0xe00005
Compiz also printed this message, although I'm not sure if it is related to the emerald failure:
intel_do_flush_locked failed: No such file or directory

Regardless, I was able to enable the external monitor (without a fully working window manager), but the output was still corrupted.

I switched back to the stock 3.8.0-23.34 kernel (but still using xf86-video-intel commit c3695c3c6b7bc13b5e642c9d92648e8228411bed), and compiz and emerald were back to normal ... still the same problem with the external monitor.
Comment 25 Paul Donohue 2013-06-18 18:37:50 UTC
Created attachment 81033 [details]
Xorg.0.log using nightly kernel and xf86-video-intel master branch with debug=full
Comment 26 Paul Donohue 2013-06-18 18:41:10 UTC
Created attachment 81034 [details]
Xorg.0.log using 3.8.0-23.34 kernel and xf86-video-intel master branch with debug=full
Comment 27 Chris Wilson 2013-06-18 21:47:23 UTC
The bug in -nightly should be fixed by now. However, I am still not seeing a good explanation for the failure. The logs and assertions are all consistent with it behaving normally, so where it goes wrong on the way to the scanout is still a mystery.

Can you please check out intel-gpu-tools and run intel_reg_dumper when the output goes haywire?
Comment 28 Paul Donohue 2013-06-18 23:53:44 UTC
Created attachment 81044 [details]
Output of intel_reg_dumper during corruption
Comment 29 Paul Donohue 2013-06-18 23:54:46 UTC
Created attachment 81045 [details]
Output of intel_reg_dumper using xrandr --above instead of --right-of (external monitor enabled but no corruption)
Comment 30 Chris Wilson 2013-06-19 08:13:19 UTC
Not seeing the error I was expecting.

Can you please grab intel_reg_dumper output with UXA and the wide configuration? And please grab a screenshot of the corruption. (The screenshot should read back the framebuffer and help isolate where in the process it becomes corrupted.)
Comment 31 Paul Donohue 2013-06-19 13:42:35 UTC
Created attachment 81071 [details]
Output of intel_reg_dumper using UXA
Comment 32 Paul Donohue 2013-06-19 13:44:21 UTC
I tried taking a screenshot of the corruption with `xwd -root`, but it does not appear corrupted in the screenshot.
Comment 33 Paul Donohue 2013-06-19 13:49:40 UTC
Created attachment 81072 [details]
Output of xwd while screen is corrupted
Comment 34 Chris Wilson 2013-06-19 14:22:50 UTC
Ok, that isolates the issue to the scanout configuration. The key difference between UXA and SNA here is that UXA does not enable tiling.
Comment 35 Paul Donohue 2013-06-19 14:46:20 UTC
Hmm... I tried 'Option "Tiling" "false"' and verified in Xorg.0.log that it was off:
[ 74407.915] (WW) intel(0): Tiling disabled, expect poor performance and increased power consumption.
But it still shows up corrupted.  However, I get messages like this too, so maybe it's not really off?
[ 74407.916] kgem_choose_tiling: TLB miss between lines 1920x1200 (pitch=7680), forcing tiling 1
Comment 36 Chris Wilson 2013-06-19 15:00:49 UTC
DSPBSTRIDE—Display B/Sprite Stride Register:
... When using tiled memory, the actual memory buffer stride is limited to a maximum of 16K bytes.

I was not expecting that limitation! For later generations, this limit matches the CRTC maximum.
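
[Editor's illustration, not part of the original comment] The 16K-byte tiled stride limit quoted above can be checked against the configurations reported in this thread with some simple arithmetic. The sketch below assumes a 32 bpp framebuffer (consistent with the "1920x1200 (pitch=7680)" log line in Comment 35) and ignores any pitch alignment the driver may add, which could only make the pitch larger. The function names and the LVDS1 width of 1440 pixels are hypothetical; the panel width is not stated in the thread.

```python
# Tiled scanout stride limit on this hardware, per the DSPBSTRIDE note quoted above.
TILED_STRIDE_LIMIT = 16 * 1024  # bytes


def pitch_bytes(width_px, bits_per_pixel=32):
    """Bytes per framebuffer row, before any driver-specific alignment."""
    return width_px * bits_per_pixel // 8


def fits_tiled_scanout(width_px, bits_per_pixel=32):
    """True if a tiled framebuffer of this width stays within the stride limit."""
    return pitch_bytes(width_px, bits_per_pixel) <= TILED_STRIDE_LIMIT


LVDS_WIDTH = 1440  # hypothetical panel width, for illustration only

cases = [
    ("VGA1 3840 wide, --above/--below", 3840),               # 15360 bytes
    ("VGA1 3840 wide, --right-of LVDS1", 3840 + LVDS_WIDTH), # 21120 bytes
    ("--fb 7000x2000", 7000),                                # 28000 bytes
    ("1920-wide monitor, --right-of LVDS1", 1920 + LVDS_WIDTH),  # 13440 bytes
]
for label, width in cases:
    print(f"{label}: pitch={pitch_bytes(width)} fits={fits_tiled_scanout(width)}")
```

Under these assumptions the limit corresponds to 4096 pixels of desktop width, so a 3840-wide output fits on its own but not next to any panel wider than 256 pixels. That matches the reports earlier in the thread: stacked layouts and the single 1920x1200 monitor work, while side-by-side layouts with the 3840x1200 output are corrupted.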
Comment 37 Chris Wilson 2013-06-19 15:01:44 UTC
Option "Tiling" only applies to intermediate Pixmaps. For controlling the framebuffer, you want Option "LinearFramebuffer". Yes, that sucks.
Comment 38 Chris Wilson 2013-06-19 15:02:51 UTC
Should be fixed with:

commit cc08f6e0ef54744434fe0fd6d76348ee6099a62d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jun 19 15:50:01 2013 +0100

    sna: Apply scanout stride limits to tiling selection
    
    gen4 has a restricted DSPSTRIDE limit for tiled surfaces lower than the
    maximum supported size of the CRTC. So we need to double check
    whether tiling the scanout is supported before attempting to allocate a
    tiled scanout.
    
    Reported-by: Paul Donohue <freedesktop-bugs@PaulSD.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65099
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 39 Chris Wilson 2013-06-19 15:03:45 UTC
Created attachment 81074 [details] [review]
Detect invalid scanout pitches

And the missing layer of defense for the kernel.
Comment 40 Chris Wilson 2013-06-19 15:18:09 UTC
Bah, this should work better - I already had the infrastructure in place for bypassing scanout restrictions on the frontbuffer:

commit f165d2e21358703c5f4ed302a4a57219db482a59
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jun 19 16:15:32 2013 +0100

    sna: Switch to a per-crtc pixmap if the pitch exceeds scanout limitations
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=65099
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
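
[Editor's illustration, not part of the original comment] The idea named in the commit title above can be sketched roughly as follows: when the shared desktop framebuffer's pitch exceeds the tiled scanout limit, each CRTC scans out from its own pixmap sized to its mode, so the pitch depends on the mode width rather than the desktop width. This is only a model of the decision, not the driver's code. The `Crtc` class, `plan_scanout`, the 32 bpp assumption, and the 1440-pixel LVDS1 width are all hypothetical.

```python
from dataclasses import dataclass

TILED_STRIDE_LIMIT = 16 * 1024  # bytes, from the DSPBSTRIDE note quoted in Comment 36
BYTES_PER_PIXEL = 4             # assumes a 32 bpp framebuffer


@dataclass
class Crtc:
    name: str
    x: int      # horizontal offset within the desktop, in pixels
    width: int  # mode width, in pixels


def plan_scanout(crtcs):
    """Map each CRTC name to (buffer kind, pitch in bytes)."""
    desktop_width = max(c.x + c.width for c in crtcs)
    shared_pitch = desktop_width * BYTES_PER_PIXEL
    plan = {}
    for c in crtcs:
        if shared_pitch <= TILED_STRIDE_LIMIT:
            # The shared front buffer is narrow enough to scan out directly.
            plan[c.name] = ("shared", shared_pitch)
        else:
            # Too wide: scan out from a buffer sized to this CRTC's mode.
            plan[c.name] = ("per-crtc", c.width * BYTES_PER_PIXEL)
    return plan


# LVDS1 width of 1440 is a made-up value for illustration.
side_by_side = [Crtc("LVDS1", 0, 1440), Crtc("VGA1", 1440, 3840)]
stacked = [Crtc("LVDS1", 0, 1440), Crtc("VGA1", 0, 3840)]

print(plan_scanout(side_by_side))
print(plan_scanout(stacked))
```

In this model the side-by-side layout falls back to per-CRTC buffers with pitches of 5760 and 15360 bytes, both within the limit, while the stacked layout keeps using the shared buffer.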
Comment 41 Paul Donohue 2013-06-19 15:37:23 UTC
That did the trick!

Thank you so much for hunting this down!
Comment 42 Chris Wilson 2013-06-19 15:51:38 UTC
It was you who did all the leg work, so many thanks!

