Description
Paul Donohue
2013-05-28 21:25:55 UTC
Created attachment 79913 [details]
xrandr --verbose using 2.20.9-0ubuntu2.1
Created attachment 79916 [details]
xrandr --verbose using 2.21.6-0ubuntu4
Xorg.0.log, dmesg and photograph.

Created attachment 79924 [details]
dmesg using 2.21.6-0ubuntu4
Created attachment 79925 [details]
Xorg.0.log using 2.21.6-0ubuntu4
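For reference, a hedged sketch of how these diagnostics can be gathered (the file names are arbitrary; the attachments above are the authoritative data):

  xrandr --verbose > xrandr-verbose.txt   # once per driver version
  dmesg > dmesg.txt
  cp /var/log/Xorg.0.log Xorg.0.log.txt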
I misspoke before ... I upgraded from Ubuntu Quantal to Raring, not Ubuntu Precise to Raring.

This problem was apparently triggered by the switch from UXA to SNA. Compiling 2.20.9 from git or Ubuntu with '--with-default-accel=sna' exhibits the problem; compiling 2.21.6 from git or Ubuntu without '--with-default-accel=sna' works fine. So it's not a bug introduced between those versions.

I don't have a camera with me now - I'll try to get a photograph tomorrow.

A few additional observations (the xrandr commands involved are sketched after the attachment list below):
Running `xrandr --output VGA1 --off` when the screen is corrupted causes LVDS1 to revert to normal and the external monitor to turn back off. (The corruption isn't permanent.)
Using a real 1920x1200 monitor instead of the 3840x1200 virtual monitor via the Matrox box doesn't cause any problems.
Messing with the '--fb' setting on xrandr changes the behavior of the corruption but doesn't make it go away.

Created attachment 79967 [details]
Desktop before extending onto external monitor
Created attachment 79968 [details]
Desktop after xrandr ... --right-of LVDS1
Created attachment 79969 [details]
xrandr --right-of LVDS1 --fb 7000x2000
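As promised above, a hedged sketch of the xrandr invocations behind these attachments; the elided flags are not recorded in the thread, so the use of --auto to select the 3840x1200 mode on VGA1 is an assumption based on the attachment descriptions:

  # extend the desktop onto the 3840x1200 virtual monitor (triggers the corruption under SNA)
  xrandr --output VGA1 --auto --right-of LVDS1
  # the same, but with an explicit framebuffer size (changes the corruption pattern)
  xrandr --output VGA1 --auto --right-of LVDS1 --fb 7000x2000
  # turn the external output off again; LVDS1 reverts to normal
  xrandr --output VGA1 --off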
Three photos containing the same desktop contents (a full-screen browser window and a terminal on the LVDS1 screen, and nothing on the VGA screens).

In the first corrupted case (without --fb), the corrupted image changes significantly approximately once per second (I have a clock displayed in the bottom left corner of the screen that counts seconds, so it may be triggered by that). In the second corrupted case (with --fb), the corrupted image is static and does not change. Passing different values into --fb causes different corruption patterns on the screen, but passing the same value in repeatedly results in similar (identical?) corruption patterns.

In both cases it looks like ~25 vertical pixels across the top of the LVDS1 screen are correct (not corrupted), but the rest of LVDS1 and the entire VGA output are corrupted.

You need a new kernel:

commit 4878cae22a2405b6d33318e2dc99a9c1367fee44
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Mon Feb 18 19:08:48 2013 +0200

    drm/i915: Really wait for pending flips when panning

which will be in v3.10.

I tried compiling kernel 3.9.0 with the changes in 4878cae22a2405b6d33318e2dc99a9c1367fee44 applied, but that didn't help.

Can you try the kernel package from ppa:mainline drm-intel-nightly to be sure? If the bug continues to persist, can you please build xf86-video-intel from scratch with --enable-sna --enable-debug=full?

Still have the issue with:
http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/current/linux-image-3.10.0-994-generic_3.10.0-994.201305300434_amd64.deb
I'll try debug=full next.

Created attachment 80045 [details]
Xorg.0.log using nightly kernel and xf86-video-intel master branch with debug=full
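A hedged sketch of how the nightly kernel and the debug driver build might be set up; the repository URL and the note about --prefix are assumptions, the package name and configure flags are taken from the comments above:

  # install the drm-intel-nightly kernel package linked above
  sudo dpkg -i linux-image-3.10.0-994-generic_3.10.0-994.201305300434_amd64.deb
  # build xf86-video-intel from git with SNA and full debug output
  git clone git://anongit.freedesktop.org/xorg/driver/xf86-video-intel
  cd xf86-video-intel
  ./autogen.sh --enable-sna --enable-debug=full   # a --prefix matching the distro's module path may be needed
  make && sudo make install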
BTW, using xf86-video-intel master branch, if I enable my external monitor with xrandr --above or --below, I get a new type of corruption (this time on the external monitor only - the LVDS is unaffected).

Looks like a pending flip comes in after changing the fb size, yet another variant of the same race as in the kernel. Can you please try:

diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c
index 2d59831..9b324e2 100644
--- a/src/sna/sna_display.c
+++ b/src/sna/sna_display.c
@@ -2686,6 +2686,7 @@ sna_mode_resize(ScrnInfoPtr scrn, int width, int height)
 		visit.new = sna->front;
 		TraverseTree(root(screen), sna_visit_set_window_pixmap, &visit);
 		assert(screen->GetWindowPixmap(root(screen)) == sna->front);
+		sna_dri_destroy_window(root(screen));
 	}
 	screen->SetScreenPixmap(sna->front);
 	assert(screen->GetScreenPixmap(screen) == sna->front);

X starts, my auto-start apps load, I can do stuff for a brief moment, but about a second or two later, the screen flashes, and I end up with a black screen with only a cursor on it. The cursor is frozen, but I can switch to a different tty and kill X off. I never get to the point where I can enable the external monitor to test it.

Sounds like an assert failure, sadly not captured as part of Xorg.0.log - but should be on the vt where X was launched from (stderr) or captured by a login dm (e.g. /var/log/xdm.log, /var/log/lightdm/:0.log).

Sorry, haven't had time to try it again ... I'm out on vacation from tomorrow until the 11th, so I won't get another chance to look at it until the 12th.

No worries. I think the patch should be another step towards the final solution, and so will apply it in the meantime. When you have time, I'd like to finish resolving what is going wrong on your machine. Thanks.

Changed my mind - the patch is wrong and should be unnecessary. The abort in the log is a misplaced assert(). When you get a chance, please do grab a fresh debug=full Xorg.0.log with xf86-video-intel.git.

Last week was busy trying to catch up after my vacation, but I can work on this again now. I tried using http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/current/linux-image-3.10.0-994-generic_3.10.0-994.201306140422_amd64.deb and xf86-video-intel commit c3695c3c6b7bc13b5e642c9d92648e8228411bed ... emerald failed to start with the following error:

X Error: BadWindow (invalid Window parameter) 3
  Major opcode: 20 (X_GetProperty)
  Resource id:  0xe00005

Compiz also printed this message, although I'm not sure if it is related to the emerald failure:

intel_do_flush_locked failed: No such file or directory

Regardless, I was able to enable the external monitor (without a fully working window manager), but the output was still corrupted.

I switched back to the stock 3.8.0-23.34 kernel (but still using xf86-video-intel commit c3695c3c6b7bc13b5e642c9d92648e8228411bed), and compiz and emerald were back to normal ... still the same problem with the external monitor.

Created attachment 81033 [details]
Xorg.0.log using nightly kernel and xf86-video-intel master branch with debug=full
Created attachment 81034 [details]
Xorg.0.log using 3.8.0-23.34 kernel and xf86-video-intel master branch with debug=full
The bug in -nightly should be fixed by now. However, I am still not seeing a good explanation for the failure. The logs and assertions are all consistent with it behaving normally, so where it goes wrong on the way to the scanout is still a mystery.

Can you please check out intel-gpu-tools and run intel_reg_dumper when the output goes haywire?

Created attachment 81044 [details]
Output of intel_reg_dumper during corruption
Created attachment 81045 [details]
Output of intel_reg_dumper using xrandr --above instead of --right-of (external monitor enabled but no corruption)
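A hedged sketch of how these register dumps can be captured; the repository location and the path to the built binary are assumptions, only the tool name comes from the comments above:

  git clone git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  cd intel-gpu-tools
  ./autogen.sh && make
  # dump the display registers while the corruption is on screen (needs root)
  sudo ./tools/intel_reg_dumper > intel_reg_dumper.txt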
Not seeing the error I was expecting. Can you please grab an intel_reg_dumper dump with UXA and the wide configuration? And please grab a screenshot of the corruption? (The screenshot should read back the framebuffer and help isolate where in the process it becomes corrupted.)

Created attachment 81071 [details]
Output of intel_reg_dumper using UXA
I tried taking a screenshot of the corruption with `xwd -root`, but it does not appear corrupted in the screenshot.

Created attachment 81072 [details]
Output of xwd while screen is corrupted
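For completeness, a hedged sketch of the capture step; switching to UXA for the comparison is usually done with Option "AccelMethod" "uxa" in the Device section of xorg.conf, and the PNG conversion assumes ImageMagick is installed:

  # grab the root window while the corruption is visible
  xwd -root -out corrupted.xwd
  # convert to PNG for viewing or attaching
  convert corrupted.xwd corrupted.png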
Ok, that isolates the issue to the scanout configuration. The key difference between UXA and SNA here is that UXA does not enable tiling.

Hmm... I tried 'Option "Tiling" "false"' and verified in Xorg.0.log that it was off:

[ 74407.915] (WW) intel(0): Tiling disabled, expect poor performance and increased power consumption.

But it still shows up corrupted. However, I get messages like this too, so maybe it's not really off?

[ 74407.916] kgem_choose_tiling: TLB miss between lines 1920x1200 (pitch=7680), forcing tiling 1

From the hardware documentation: DSPBSTRIDE—Display B/Sprite Stride Register: ... When using tiled memory, the actual memory buffer stride is limited to a maximum of 16K bytes.

I was not expecting that limitation! For later generations, this limit matches the CRTC maximum.

Option "Tiling" only affects intermediate Pixmaps. For controlling the framebuffer, you want Option "LinearFramebuffer".

Yes, that sucks. Should be fixed with:

commit cc08f6e0ef54744434fe0fd6d76348ee6099a62d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jun 19 15:50:01 2013 +0100

    sna: Apply scanout stride limits to tiling selection

    gen4 has a restricted DSPSTRIDE limit for tiled surfaces lower than the
    maximum supported size of the CRTC. So we need to double check whether
    tiling the scanout is supported before attempting to allocate a tiled
    scanout.

    Reported-by: Paul Donohue <freedesktop-bugs@PaulSD.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65099
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Created attachment 81074 [details] [review]
Detect invalid scanout pitches

And the missing layer of defense for the kernel.

Bah, this should work better - I already had the infrastructure in place for bypassing scanout restrictions on the frontbuffer:

commit f165d2e21358703c5f4ed302a4a57219db482a59
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jun 19 16:15:32 2013 +0100

    sna: Switch to a per-crtc pixmap if the pitch exceeds scanout limitations

    References: https://bugs.freedesktop.org/show_bug.cgi?id=65099
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

That did the trick! Thank you so much for hunting this down!

It was you who did all the leg work, so many thanks!
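For anyone hitting the same limit on a driver without the fix, the arithmetic (assuming 32 bits per pixel) works out as follows: the 3840-pixel-wide virtual monitor alone needs a 3840 * 4 = 15360-byte stride, which is just under the 16K-byte tiled limit quoted above, but placing LVDS1 beside it pushes the combined scanout stride past that limit, whereas the combination with a real 1920x1200 monitor stays under it. A hedged sketch of the workaround mentioned in the comments (the boolean value and the surrounding section contents are assumptions):

  Section "Device"
      Identifier "card0"
      Driver     "intel"
      Option     "LinearFramebuffer" "true"
  EndSection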