Bug 54707

Summary: SNA corrupted text on second output using ZaphodHeads and pixman glyph cache
Product: xorg Reporter: Stephen Liang <inteldriver>
Component: Driver/intelAssignee: Chris Wilson <chris>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: git   
Hardware: x86-64 (AMD64)   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
Corrupted text on left side, completely fine on right.
none
screenshot of display 0.1
none
screenshot of display 0.0
none
log
none
screenshot of display 0.1 v2
none
xorg.0.log
none
xorg.0.log of affected x session none

Description Stephen Liang 2012-09-09 23:17:28 UTC
Created attachment 66898 [details]
Corrupted text on left side, completely fine on right.

This seems to affect any application containing text, including terminal. On Ubuntu 12.04  Linux x86_64 3.2.0-30-generic. If I move the application to LVDS1, the text is rendered correctly. If it's on HDMI1, it is corrupted.

Using version intel driver 2.20.7 from git

Hardware: intel(0): Integrated Graphics Chipset: Intel(R) Sandybridge
Mobile (GT2)

HDMI1 is connected through DisplayPort to a Displayport adapter to DVI.
Comment 1 Stephen Liang 2012-09-09 23:19:57 UTC
Forgot to mention that this is configured through xorg.conf with two screens using ZaphodHead to configure LVDS1 and HDMI1.
Comment 2 Chris Wilson 2012-09-10 07:32:28 UTC
Please also attach your Xorg.log so that we are not missing anything else.
Comment 3 Chris Wilson 2012-09-10 12:58:17 UTC
Ok, this should do the trick:

commit b0d14071f7b60729c223af925935227393fbd3ee
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Sep 10 13:53:45 2012 +0100

    sna: Workaround issue with global glyph privates and shared ZaphodHeads
    
    Under ZaphodHeads we end up with multple screens accessing the common
    sna_glyph_key and so cause conflicting updates and erroneous references
    into the screen-local texture atlases.
    
    Two approaches can be tried here. Transition to a screen-specific
    private key introduced with xorg-1.13, or to move the glyph cache (and
    the rest of the gpu state tracker) down into the device private rather
    than screen private. This is neither of those, but a workaround to avoid
    reusing the incorrect entries from shared screens.
    
    Reported-by: Stephen Liang <inteldriver@angrywalls.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54707
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 4 m 2014-03-09 04:46:37 UTC
I am experiencing this issue on arch linux with xf86-video-intel-2.99.910-1.

text on display 0.1 becomes progressively display more and more artifacts, as if a cache is slowly getting more corrupted. I don't this left or right, really matters, what appears to matter is which display is configured as 0.1

I posted details on the arch forum:

https://bbs.archlinux.org/viewtopic.php?id=177988

and am happy to assist with any debugging/logging to assist resolving this bug.

Thanks

Mat
Comment 5 m 2014-03-09 04:48:49 UTC
that last comment should have said "text on display 0.1 becomes progressively display affected, displaying more and more artifacts.."
Comment 6 Chris Wilson 2014-03-09 07:21:45 UTC
If you can grab a screenshot that would help to confirm if it is the cache being corrupted or something else. The issue with ZaphodHeads is that the glyph cache is global, so if it was the glyphs being corrupted the effects should be visible on both displays (depending upon the exact nature of the corruption).

Anyway screenshot and an Xorg.0.log would be useful. Also compiling with --enable-debug would help rule out the most idiotic mistakes (by crashing the Xserver with an assertion!).
Comment 7 m 2014-03-09 09:27:42 UTC
Created attachment 95399 [details]
screenshot of display 0.1

The corruption has just begun and only a few characters are affected, some of the digit characters
Comment 8 m 2014-03-09 09:29:21 UTC
Created attachment 95400 [details]
screenshot of display 0.0

the corruption is not identical on display 0.0, however a small error can be seen, the '9' character in the tab is the wrong size
Comment 9 m 2014-03-09 09:32:32 UTC
Created attachment 95402 [details]
log

there are more examples of the mis-sized 9 here, and some blank lines in the list of themes, where text has not been written, If i move over these lines or change focus to another window, it will cause the window to re-draw, and these blank lines will be written
Comment 10 m 2014-03-09 09:33:32 UTC
Comment on attachment 95402 [details]
log

mis post - please delete
Comment 11 m 2014-03-09 09:35:25 UTC
Created attachment 95403 [details]
screenshot of display 0.1 v2

more mis-sized nines are visible. Also blank rows in the list where text has not been written. When window focus changes or if I hover over these rows the text will be written.
Comment 12 m 2014-03-09 09:36:25 UTC
Comment on attachment 95402 [details]
log

test
Comment 13 m 2014-03-09 09:38:47 UTC
Created attachment 95404 [details]
xorg.0.log

xorg.0.log of zaphodheads session with text corruption
Comment 14 m 2014-03-09 09:44:18 UTC
I've attached screenshots and the xorg.0.log.

It may turn out to be relevant that display 0.1 is set to dpi 192, while 0.0 is left at dpi 96. I have also tried the reverse arrangement, with the dpi 192 screen set as display 0.0. In both cases the corruption occured mostly, but not exclusively on 0.1 regardless of which screen was 0.1.

I will trying compiling the drives with the suggested flag and see what happens.
Comment 15 Chris Wilson 2014-03-09 17:35:39 UTC
One thing you can try for me is:

diff --git a/src/sna/sna_glyphs.c b/src/sna/sna_glyphs.c
index c72c5e5..a88d29c 100644
--- a/src/sna/sna_glyphs.c
+++ b/src/sna/sna_glyphs.c
@@ -2051,7 +2051,7 @@ glyphs_via_image(struct sna *sna,
        }
 
        memset(pixmap->devPrivate.ptr, 0, pixmap->devKind*height);
-#if HAS_PIXMAN_GLYPHS
+#if HAS_PIXMAN_GLYPHS && 0
        if (sna->render.glyph_cache) {
                pixman_glyph_t stack_glyphs[N_STACK_GLYPHS];
                pixman_glyph_t *pglyphs = stack_glyphs;
Comment 16 m 2014-03-10 10:24:34 UTC
It looks like you've identified the location of the bug.

I have built the driver with your patch.
So far I haven't seen any corruption, I think I would have seen some at this point if the error was occuring. I'm going to run this session for a few days to see if the errors re-occur and will post an update.

Thank you, please let me know if I can test any other patches for you.
Comment 17 Chris Wilson 2014-03-10 22:25:58 UTC
I am still puzzled. I haven't spotted anything that looks strange in that path for glyph rendering via pixman, so I fear some subtly with the global caches being accessed from multiple screens. :|
Comment 18 m 2014-03-11 21:21:14 UTC
I might try going back to the unpatched driver to test whether the different dpi setting on the two displays is an issue
Comment 19 m 2014-03-11 21:22:09 UTC
though I can also confirm, after a day of using the pathced driver I have had no text corruption
Comment 20 Chris Wilson 2014-03-12 14:20:48 UTC
One thing we should do is run ZaphodHeads with xf86-video-intel compiled with ./configure --enable-debug. There just may be a very silly bug already being detected.
Comment 21 Chris Wilson 2014-03-12 14:31:54 UTC
The only difference between the two paths is that the old path uses the glyph image directly (but it is still cached per glyph, not per glyph per screen) and the new path uses the pixman_glyph_cache on top. I am more confident that the error lies inside the pixman glyph routines rather than in our use of it.
Comment 22 Chris Wilson 2014-03-12 17:44:09 UTC
Yet another thing to try is updating to 

commit 4368a74b1c904625158f0b6baf98ef2a60bb54d4
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Mar 12 17:37:01 2014 +0000

    sna: Add DBG around using pixman's glyph cache

and applying

diff --git a/src/sna/sna_glyphs.c b/src/sna/sna_glyphs.c
index 563bbb4..93f564b 100644
--- a/src/sna/sna_glyphs.c
+++ b/src/sna/sna_glyphs.c
@@ -67,6 +67,9 @@
 #include "sna_render_inline.h"
 #include "fb/fbpict.h"
 
+#undef DBG
+#define DBG(x) ErrorF x
+
 #define FALLBACK 0
 #define NO_GLYPH_CACHE 0
 #define NO_GLYPHS_TO_DST 0

then attach the Xorg.0.log when we start to see corruption.
Comment 23 m 2014-04-06 09:06:18 UTC
Hello Chris,
sorry to take so long to respond,
I'm patching this now and will get back to you.
Comment 24 m 2014-04-21 08:06:18 UTC
Created attachment 97667 [details]
xorg.0.log of affected x session

first 10000 lines of xorg.0.log of a session with the patched driver which exihibited the glyph corruption
Comment 25 m 2014-04-21 08:11:09 UTC
Hello Chris, I compiled with your logging patch and with 
./configure --enable-debug
glyph errors occured as expected when run in zaphodheads mode

I have included the first 10000 lines, beyond that that the log seems to be very repetitive for the next 38 meg

cheers

Mat
Comment 26 m 2014-04-21 08:25:44 UTC
also, I misunderstood how the Arch Build System works, so I have missed any of your recent commits. cheers mat
Comment 27 m 2014-04-21 09:08:18 UTC
the patch was applied to arch repository version of the driver xf86-video-intel-2.99.910
Comment 28 Chris Wilson 2014-04-22 08:06:35 UTC
Ok, I can reproduce this. So far, everything points back to the pixman glyph cache.
Comment 29 Chris Wilson 2014-04-22 08:45:48 UTC
commit f6bc0e390bcba97f7aacab4d87251867ca309c17
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 22 09:41:21 2014 +0100

    sna: Use a global pixman glyph cache
    
    In Zaphod mode, we use a common pool of glyph images but insert them
    individually into a cache for each head. However, we only remove the
    image from the first cache, leaving a stale slot in the second head.
    Upon subsequent reuse of the glyph id, the second head renders garbage.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54707
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 30 m 2014-04-24 11:46:07 UTC
great work chris, thanks for the help

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.