Bug 15700

Summary: Black Boxes with Geode on X.org 1.5
Product: xorg Reporter: Warren Togami <wtogami>
Component: Driver/geodeAssignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: critical    
Priority: medium CC: benjamin, bernie, frankr.huang, jordan
Version: git   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
Xorg log with geode 2.8.0
none
Properly drawn button
none
Improperly drawn button
none
Python script that exposes the issue (using cairo)
none
Patch to fix mask transforms
none
a Xlib program to do the rendering work
none
Updated rendering test application
none
Result with the rendering application on Radeon X1200
none
Result with the rendering application on Geode LX
none
the src picture of a gtk progressbar program drawing process on Geode
none
the dst picture of a gtk progressbar program drawing process on Geode
none
the src picture of a gtk progressbar program drawing process on Radeon
none
the dst picture of a gtk progressbar program drawing process on Radeon none

Description Warren Togami 2008-04-24 14:58:48 UTC
xorg-x11-drv-geode-2.8.0-3.fc9.i386
xorg-x11-server-Xorg-1.4.99.901-23.20080415.fc9.i386

http://people.redhat.com/wtogami/temp/geode-bad-redraw.png
Some parts of the screen are drawing as black boxes like this.  Some buttons in dialogs (Logout button in the Logout dialog) and Nautilus text labels are appearing black as well.  It is usually only one thing at a time that appears black.

Downgrading to geode 2.7.7.7 and xorg-server from early March 2008 didn't help.
Comment 1 Warren Togami 2008-04-24 22:22:16 UTC
Created attachment 16171 [details]
Xorg log with geode 2.8.0
Comment 2 Warren Togami 2008-04-26 07:22:14 UTC
Option "EXANoComposite" "yes"

This option works around the problem.
Comment 3 Andreas K. Förster 2008-07-03 05:35:54 UTC
On my system the problem was not black boxes, but in Firefox the background color of boxes where white, or extremely lightened up.

The same workaround solved the problem, so I think it's the same bug.

(In reply to comment #2)
> Option "EXANoComposite" "yes"
> 
> This option works around the problem.
> 

Comment 4 Andreas K. Förster 2008-07-05 05:22:25 UTC
After some testing, it turned out that the best solution for my problems was the following option as the only option:

Option  "MigrationHeuristic"  "greedy"

See manpage exa(4).
Comment 5 Warren Togami 2008-07-08 19:41:36 UTC
Option  "MigrationHeuristic"  "greedy"

This seems to make the black rectangles go away for me too.

I talked with ajax earlier today about this.  He says this means there is a bug in the Geode driver because the migration heuristic should not affect correctness.  By "greedy" seeming to make the visual glitches go away, it is actually going away most of the time, but we are likely to see some glitches on rare occasions due to the bug that was exposed by the default "always".
Comment 6 Jordan Crouse 2008-10-09 15:10:43 UTC
Okay, I've been playing with this a little bit.  The "Greedy" heuristic works, because it seems to skip the EXA composite step.  The situation here is that that the GTK theme is trying to draw a pretty gradient using an alpha mask over the button (or panel in this case).   In the composite stage, the Geode driver is given an alpha mask for the button and a color.  In this case, the color is #000507 which is pretty nearly black.  I think that typically the alpha mask would be full of pretty low alpha values which would let more of the background show through (the background being #ece9e9).  In our case, the mask is full of ffs, which results in the solid black color.

So the driver is behaving correctly, and rendering what it is given.  The question is - why is it being given something clearly bogus?  One possibility might be that in an earlier step, EXA wants the driver to try to construct a PICT_a8 image - we refuse, because the Geode can't do 8 bit alpha operations.  Its possible that the software render that we punt back to is not building the mask correctly.

Investigations continue.


Comment 7 Andreas K. Förster 2008-10-10 01:04:48 UTC
(In reply to comment #6)
> In our case, the mask is full of ffs, which results in the solid black color.
> So the driver is behaving correctly, and rendering what it is given.  The
> question is - why is it being given something clearly bogus?  One possibility

Just an idea: maybe it is not a "mask". Maybe you shouldn't interpret it as "what to filter out", but as "what to let through". So maybe you simply have to use the reverse value to make it a mask.
Comment 8 Andreas K. Förster 2008-10-10 03:34:01 UTC
BTW. With "reverse value" I don't mean the binary inverse, but more "value = maximum - value;". The name "mask" may also be okay, but it's a "positive mask" then.
But again: this is just a theory.
Comment 9 Jordan Crouse 2008-10-10 08:45:47 UTC
We're not allowed to put our own interpretation on the bytes.  This is a composite OVER operation applying a source color to the destination according
to the mask.  There really isn't any room for creativity there.

Here is the top left corner of the mask that is being uploaded to video memory:

4b e4 ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
e3 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

By examining the results of the operation, you can clearly see that the driver is rendering the operation correctly.  I have uploaded screen shots of a good button (drawn with the greedy heuristic) and a bad button drawn with composite enabled.  You can see that the above mask is being applied faithfully in the bad example.  Notice the upper left corner - the rendering results match the mask I posted above.

So for now, I'm sticking with my theory that the mask is wrong - I wanted to blame pixman for a while, but if pixman was broken, it wouldn't work in any case.
Comment 10 Jordan Crouse 2008-10-10 08:47:52 UTC
Created attachment 19572 [details]
Properly drawn button
Comment 11 Jordan Crouse 2008-10-10 08:48:17 UTC
Created attachment 19573 [details]
Improperly drawn button
Comment 12 Michel Dänzer 2008-10-10 09:04:42 UTC
As it doesn't seem to affect everything, it could be some kind of coherency issue between the CPU copying the mask data to the offscreen area and the accelerator reading it for the composite operation, or something along those lines.

Out of curiosity, are composite operations to PICT_a8 pictures rejected in the CheckComposite or PrepareComposite hook?
Comment 13 Michel Dänzer 2008-10-10 09:07:51 UTC
(In reply to comment #12)
> As it doesn't seem to affect everything, it could be some kind of coherency
> issue between the CPU copying the mask data to the offscreen area and the
> accelerator reading it for the composite operation, or something along those
> lines.

Hmm scratch that, in that case you should have captured different mask data...
Comment 14 Jordan Crouse 2008-10-10 10:54:15 UTC
(In reply to comment #12) 
> Out of curiosity, are composite operations to PICT_a8 pictures rejected in the
> CheckComposite or PrepareComposite hook?

In CheckComposite. 

I've tried everything I can think of on the driver side.  Interestingly enough, in Rawhide with X server 1.5.1 this problem seems to only be restricted to buttons.  The theme has gradients all over the place - for example, pygtk-demo displays a gradient on the tab buttons, that renders fine.   It seems to just be this one operation that is freaking us out, and I have no idea why.  I guess its time to start hacking on the X server.


Comment 15 Michel Dänzer 2008-10-11 03:36:38 UTC
FWIW, this problem looks related to text rather than gradients.

It might be interesting to try and replicate the problem with other drivers, e.g. by making them also reject PICT_a8 destinations in the CheckComposite hook. Actually, it looks like xf86-video-intel doesn't support PICT_a8 destinations on i8xx either.
Comment 16 Michel Dänzer 2008-10-12 16:32:04 UTC
BTW, does this still happen with xserver Git master?
Comment 17 Jordan Crouse 2008-10-13 08:12:37 UTC
(In reply to comment #15)
> FWIW, this problem looks related to text rather than gradients.

There is also a text problem that is definitely tied to PICT_a8 that several other drivers are grappling with.  I am not convinced that this is the same problem but neither am I sure that it isn't.

I'll try GIT head today and see if it makes a difference.  Up to now, I've only been using the Rawhide RPMs (up to 1.5.1).

Comment 18 Michel Dänzer 2008-10-13 11:19:00 UTC
(In reply to comment #17)
> There is also a text problem that is definitely tied to PICT_a8 that several
> other drivers are grappling with.

Any pointers to more information about that?
Comment 19 Benjamin Berg 2008-12-28 10:17:15 UTC
I ran into an issue that is probably related. What happens is that drawing a A8 mask works without any issue, but as soon as the mask is scaled, it all breaks. 

I'll attach a small python script that exposes the issue. The problem goes away if you set the scale to 1 in the script or set any of the options that were mentioned earlier.
Comment 20 Benjamin Berg 2008-12-28 10:20:04 UTC
Created attachment 21519 [details]
Python script that exposes the issue (using cairo)
Comment 21 Michel Dänzer 2008-12-29 01:55:44 UTC
(In reply to comment #19)
> I ran into an issue that is probably related. What happens is that drawing a A8
> mask works without any issue, but as soon as the mask is scaled, it all breaks. 
> 
> I'll attach a small python script that exposes the issue. The problem goes away
> if you set the scale to 1 in the script or set any of the options that were
> mentioned earlier.

You didn't specify the expected output, but using the radeon driver, I'm getting one red square and one green square at either scale, so I'm assuming that's it. Maybe the geode driver doesn't handle transforms correctly or something like that.

BTW, the text problem mentioned in comment #17 was a glyph cache bug which is fixed on xserver master and server-1.6-branch.
Comment 22 Benjamin Berg 2008-12-29 02:30:49 UTC
(In reply to comment #21)
> You didn't specify the expected output, but using the radeon driver, I'm
> getting one red square and one green square at either scale, so I'm assuming
> that's it. Maybe the geode driver doesn't handle transforms correctly or
> something like that.

Yes, the red and green square is the expected output. The scale can be changed in line 20.
Comment 23 Mart Raudsepp 2009-01-09 10:05:28 UTC
Created attachment 21836 [details] [review]
Patch to fix mask transforms

This patch fixes the test case in attachment #21519 [details], but not the initial reported issue with buttons, and some other clearlooks drawn widgets (progress bars and scrollbars for me).
Comment 24 Mart Raudsepp 2010-03-16 05:46:28 UTC
(In reply to comment #23)
> Created an attachment (id=21836) [details]
> Patch to fix mask transforms

As an update to the casual reader here, that patch to fix mask transforms was applied to the driver a very long time ago, but the bug is still tracking the original issue about blacked out buttons (vertical gradients or whatever), as this wasn't the root cause afterall.

The original bug is making the driver unusable when allowed to do any acceleration on a modern desktop, so I guess I'll tweak the severity in one go as well. Current everyone is basically forced to disable acceleration through EXANoComposite and/or greedy migration heuristics, which might be at present the best idea anyway, as many things are unaccelerated and pixmap migration ping-pong is killing any performance gains hw acceleration might give.
Comment 25 Bernie Innocenti 2010-03-25 13:17:18 UTC
Confirmed, the work-around fixes "blacked out buttons" on the XO1 with Fedora 11.

Can we make MigrationHeuristic greedy by default in the driver, for the time being?
Comment 26 Michel Dänzer 2010-04-09 00:42:51 UTC
Does this still happen with xserver 1.7.x or newer? EXA now uses an a8r8g8b8 mask for text rendering if the driver can't render to a8.
Comment 27 Mart Raudsepp 2010-04-19 19:30:48 UTC
(In reply to comment #26)
> Does this still happen with xserver 1.7.x or newer? EXA now uses an a8r8g8b8
> mask for text rendering if the driver can't render to a8.

I don't think this bug here has anything to do with text rendering?

I have not been able to get completely black boxes myself with my themes, but other rendering issues I believe to possibly be the same issue still remain with xorg-server-1.8 (so I have considering this bug to cover this until I can prove they are unrelated by fixing my reproducable issue not fixing black buttons):


http://dev.gentoo.org/~leio/xorg/geode-rendering-bug_buttons.png (border issues)

http://dev.gentoo.org/~leio/xorg/geode-rendering-bug_scrollbar-progressbar.png (scrollbar has too thick and dark border, progress bar is black for one of the colors)
Comment 28 Michel Dänzer 2010-04-20 02:49:19 UTC
(In reply to comment #27)
> (In reply to comment #26)
> > Does this still happen with xserver 1.7.x or newer? EXA now uses an a8r8g8b8
> > mask for text rendering if the driver can't render to a8.
> 
> I don't think this bug here has anything to do with text rendering?

You're probably right, I was getting mislead - if the problem was related to text, it shouldn't affect the button outside of the text.

Does

	Option		"EXAOptimizeMigration" "off"

make any difference for this?

Otherwise, it's basically about narrowing down what kind of composite operations are affected. BTW, AFAICT more failure cases could already be detected in the CheckComposite hook, might be interesting to try if that makes any difference.

P.S. AFAICT the driver would probably crash when passed in source-only (solid or gradient) pictures, where pSrc/pMask->pDrawable and pxSrc/Msk == NULL.
Comment 29 frank huang 2010-04-27 20:17:39 UTC
Keep my focus on this bug with help from Mart. 
Check this bug trace from top to the bottom. Is this bug has transfered from the "black button" to other rendering issue?
Comment 30 Mart Raudsepp 2010-04-28 01:53:18 UTC
I'm hoping that fixing the scrollbar/progress bar/button border problem will fix the black button problem as well. At least the progress bar and black button seem on the outside like the same thing - some vertical gradient drawing is rendered fully black instead, both in that progress bar case, and black button - except with my clearlooks version I haven't been able to reproduce the black button issue myself.
So if we find a fix for the progress bar, hopefully Warren can test if this fixes the black button case as well. Doesn't really matter to me if we handle it in the same bug, or make separate ones for scrollbars and button borders while we still think they might be the same bug. I would suggest definitely creating a new bug for any further rendering issues and once we find out that black button fix doesn't fix scrollbars and button borders as well as demonstrate in my screenshots.

Has anyone tested Option "EXAOptimizeMigration" "off" yet? I suspect it won't help, as this sounds like a newer xorg-server EXA option, and we have had these black button/progressbars since xorg-server-1.5 or even earlier
Comment 31 frank huang 2010-05-11 04:33:29 UTC
Created attachment 35565 [details]
a Xlib program to do the rendering work

We can use this instead a gtk program to trace the rendering work on geode. Simple XRenderComposite() is called.
Comment 32 frank huang 2010-05-11 04:34:40 UTC
This program creates a 100x100 window(pDst) on the screen , a 40x40 picture(pSrc, green rectangle, PICT_x8r8g8b8) and a 20x20 picture(pMask, do the mask with src, PICT_a8) to do the XRenderComposite(). The server  will use exaTryDriverComposite() to reply. If lx_check_composite() and lx_prepare_composite() can satisfy the rendering condition for Geode(return true), the real HW lx_do_composite() will be called. I use the XFlush() to directly send the package via network.  The result(png file) is attached.  Actually, the HW lx_do_composite() is not called because of  the code in lx_prepare_composite() below:
		/* FIXME: what to do here? */
		If(pSrc->pDrawable->width != 1 || pSrc->pDrawable->height != 1)
			return FALSE;
You can find that in the source code, there is still a question left here. Why the geode driver need the pSrc->pDrawable->width and pSrc->pDrawable->height to be all ‘1’? Otherwise, the lx_do_composite() will not triggered. After remarking this part code and recompiled driver, the Xlib application can call the lx_do_composite() function. But the result is not correct.  I still work to see the reason.

Frank
Comment 33 frank huang 2010-05-12 02:20:38 UTC
Update the test application
            What the application does is as below :
1)It creates a 100x100 window (color: white) for the destination picture

2)It creates a picture of 1x1 with the format PICT_x8r8g8b8(color: green) for the source picture, the source picture has the repeat attribute.
3)It creates a picture of 20x20 with the format PICT_a8(only alpha value to do alpha blend) for the mask picture.
4)Call the XRenderComposite() to trigger. The dst_x and dst_y are all 50, width and height are all 40. the mask_x and mask_y are all 5. So the alpha blend green region is 15x15
See the result of Radeon_X1200.png. It is the standard result. Run on my RS690/SB600 workstation. The rending function FUNC_NAME(RadeonComposite) on X1200 will be called. 
Because our driver can only support limit rendering, I choose this test case. This will use lx_do_composite() in the geode LX. I don’t know why our driver must require the pSrc->width and pSrc->height with value of 1 in lx_prepare_composite(). I follow this requirement in this application. You can see the HW rendering result with the picture of Geode_lx.png on geode platform.Apparently, there must be some HW rendinging bug in our driver. I will use this program instead of the progressbar gtk application to go on debugging.
  Source code is in render_test.zip. 
  You can compiled it with “gcc –g –v –Wall –o render main.c ops.c tests_10x10.c –Lxxx –lX11 –lXrender”
  With the help of this application’s debug, we will close to the root cause of this long-long ago historical rendering bug on geode platform.
Comment 34 frank huang 2010-05-12 02:22:21 UTC
Created attachment 35589 [details]
Updated rendering test application

Updated rendering test application
Comment 35 frank huang 2010-05-12 02:23:35 UTC
Created attachment 35590 [details]
Result with the rendering application on Radeon X1200

Result with the rendering application on Radeon X1200
Comment 36 frank huang 2010-05-12 02:24:00 UTC
Created attachment 35591 [details]
Result with the rendering application on Geode LX

Result with the rendering application on Geode LX
Comment 37 frank huang 2010-05-12 23:39:17 UTC
This program use the pMask value to do the composite, I have add the following line in lx_prepare_composite() 
+  if(pMsk)
+      return FALSE;
Right now, the ugly progressbar and scrollbar display quite normal. That is to say that our render part with mask value to do the rendering in composite has problem.
In our driver, we can only handle the PICT_a8 and PICT_a4 with the mask format. Let me check deeply what happens with this alpha mask picture in our driver.


Thanks,
Frank
Comment 38 frank huang 2010-05-24 02:26:35 UTC
Attache the picture of the progressbar drawing process on two platform. One is on geode while the other is on RS690. Please do a comparison.
Comment 39 frank huang 2010-05-24 02:30:10 UTC
Created attachment 35814 [details]
the src picture of a gtk progressbar program drawing process on Geode
Comment 40 frank huang 2010-05-24 02:31:55 UTC
Created attachment 35815 [details]
the dst picture of a gtk progressbar program drawing process on Geode
Comment 41 frank huang 2010-05-24 02:32:38 UTC
Created attachment 35816 [details]
the src picture of a gtk progressbar program drawing process on Radeon
Comment 42 frank huang 2010-05-24 02:33:23 UTC
Created attachment 35817 [details]
the dst picture of a gtk progressbar program drawing process on Radeon
Comment 43 frank huang 2010-05-27 23:33:01 UTC
I have fixed the Src operation on blend and composite operation. 
And right now the most urgent bug in rendering is Over operation. Most of the rendering issue are arosed by this. Seems our driver does not handle this operation correctly.
You can add the following lines in lx_check_composite() to avoid the rendering bug.
		If(op == PictOpOver)
			return false;
I’ll fix this bug.


Thanks,
Frank
Comment 44 Bernie Innocenti 2010-06-12 06:30:30 UTC
The patch applied to git for #28352 fixed this bug.

Thank you very much, Frank.
Comment 45 Martin-Éric Racine 2010-06-27 10:04:28 UTC
Bernie said in comment 44 that the patch used for #28352 fixed it, but it seems that it might not be the case after all. 

Bernie, could you clarify exactly which patch worked for you, so that we may test on other hardware and, if successful, merge it in GIT for the 2.11.9 release?
Comment 46 Martin-Éric Racine 2010-06-27 10:23:51 UTC
According to Bernie, https://bugs.freedesktop.org/attachment.cgi?id=36092 fixes it. However, the formatting of this patch is not suitable, so I would hope that we can find an updated version that would be suitable for committing to GIT.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.