Bug 15513 - r200's render accel broken
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Server/Acceleration/EXA
Version: git
Hardware: x86 (IA32) Linux (All)
Importance: medium normal
Assigned To: Xorg Project Team
QA Contact: Xorg Project Team
Duplicates: 15610
Depends on:
Blocks:
Reported: 2008-04-14 20:43 UTC by Andrew Randrianasulu
Modified: 2010-12-22 02:53 UTC
CC List: 5 users

See Also:


Attachments
xorg.conf (with "virtual 2048 2048" line) (6.42 KB, text/plain)
2008-04-14 20:45 UTC, Andrew Randrianasulu
X.org log (42.98 KB, text/plain)
2008-04-14 20:45 UTC, Andrew Randrianasulu
both kde3 and gtk2 programs are affected (141.28 KB, image/jpeg)
2008-04-15 09:26 UTC, Andrew Randrianasulu
desktop being messed up with renderaccell and virtual screen enabled (184.38 KB, image/jpeg)
2008-04-16 05:32 UTC, Christian Schmitt
second shot (172.35 KB, image/jpeg)
2008-04-16 05:38 UTC, Christian Schmitt
Some minor offscreen area eviction improvements (3.94 KB, patch)
2008-04-20 03:14 UTC, Michel Dänzer
Snapshot from F9 KDE4 desktop with konsole & Dolphin (621.62 KB, image/png)
2008-04-20 03:32 UTC, Stefan Becker

Description Andrew Randrianasulu 2008-04-14 20:43:04 UTC
hw:

01:00.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 9200 SE] (rev 01) (prog-if 00 [VGA])
        Subsystem: Hightech Information System Ltd. Excalibur 9200SE VIVO 128M
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 19
        Memory at d0000000 (32-bit, prefetchable) [size=128M]
        I/O ports at c800 [size=256]
        Memory at dfef0000 (32-bit, non-prefetchable) [size=64K]
        Expansion ROM at dfec0000 [disabled] [size=128K]
        Capabilities: [58] AGP version 2.0
        Capabilities: [50] Power Management version 2

software:

kernel - 2.6.24.1 (also tested: 2.6.21.7 and 2.6.23.14)
libpixman - commit 53882228c9bbd50609e2858502b9bc087ca76903
xf86-video-ati - commit 1286fe5ce1c77453d57817b9b26b1bdb32ca7bc8 (6.8.0 also affected)
xserver - commit 35982bc109d424c464551ab22ec90af69908c884


Start X with a huge virtual screen (2048x2048) or create a lot of windows, and soon you will see some artefacts, mostly on small things like glyphs.

Reverting 93d876891dbba41b920a9a29a5de77f647f43928 ("EXA: Improve the algorithm used for tracking offscreen pixmap use.") and 8074676d2df8d577b443e3fa5e22d7c71c944bd1 ("EXA: Optimize the eviction scanning loop in exaOffscreenAlloc.") seems to fix this. Option "RenderAccel" "0" fixes it too. The bug was not visible with the latest nouveau on a GF2-MX400 (32 MB VRAM).

Playing with AGP settings and AccelDFS, or even disabling AGP completely, doesn't help here.
Comment 1 Andrew Randrianasulu 2008-04-14 20:45:06 UTC
Created attachment 15919
xorg.conf (with "virtual 2048 2048" line)
Comment 2 Andrew Randrianasulu 2008-04-14 20:45:43 UTC
Created attachment 15920
X.org log
Comment 3 Michel Dänzer 2008-04-15 02:01:32 UTC
Does the problem still occur if you revert just 8074676d2df8d577b443e3fa5e22d7c71c944bd1? Does leaving AccelDFS disabled really make no difference?

One thing I notice now is that the new algorithm may assign a non-zero eviction cost to available areas, but I'm not sure how that could cause corruption... it may just be the different (de-)allocation patterns exposing driver bugs.

Any other ideas, Fredrik?
Comment 4 Andrew Randrianasulu 2008-04-15 03:14:03 UTC
If I revert only 8074676d2df8d577b443e3fa5e22d7c71c944bd1, the bug is still here. And Option "AccelDFS" "0" makes no difference.
Comment 5 Michel Dänzer 2008-04-15 07:54:05 UTC
Can you attach a screenshot showing the problem?
Comment 6 Fredrik Höglund 2008-04-15 08:52:40 UTC
I have seen occasional glyph corruption with an R200, but that's with EXAOptimizeMigration, so I didn't interpret that as a regression in the new algorithm.

One thought though is that the new algorithm favors evicting small pixmaps, and that means glyphs.
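Why a size-aware policy would keep picking glyphs can be illustrated with a toy cost function. This is not EXA's code: the `Area` record, `eviction_cost` and `pick_victim` are hypothetical, and the size-divided-by-age formula is an assumed heuristic chosen only because it has the property described here (small, idle pixmaps are the cheapest to evict).

```python
from dataclasses import dataclass


@dataclass
class Area:
    """Hypothetical offscreen area record (names are illustrative, not EXA's)."""
    name: str
    size: int      # bytes of offscreen memory occupied
    last_use: int  # value of a global use counter at the area's last access


def eviction_cost(area: Area, now: int) -> float:
    """Assumed heuristic: cost grows with size and shrinks with age, so small
    pixmaps that have not been touched recently come out cheapest."""
    age = max(now - area.last_use, 1)  # a just-used area still gets age 1
    return area.size / age


def pick_victim(areas: list[Area], now: int) -> Area:
    """Return the single area with the lowest eviction cost."""
    return min(areas, key=lambda a: eviction_cost(a, now))


areas = [
    Area("glyph", 1024, last_use=90),
    Area("window", 1024 * 1024, last_use=90),
    Area("icon", 4096, last_use=99),
]
print(pick_victim(areas, now=100).name)  # → glyph
```

Under any policy shaped like this, glyph pixmaps are evicted and migrated back in far more often than before, which is how a change in eviction order can expose an unrelated bug in the migration or driver paths.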
Comment 7 Andrew Randrianasulu 2008-04-15 09:26:35 UTC
Created attachment 15929
both kde3 and gtk2 programs are affected
Comment 8 Michel Dänzer 2008-04-15 10:19:52 UTC
(In reply to comment #6)
> I have seen occasional glyph corruption with an R200, but that's with
> EXAOptimizeMigration, so I didn't interpret that as a regression in the new
> algorithm.

If you're referring to the manpage description for EXAOptimizeMigration, that's a different issue, mostly affecting existing windows that get redirected when a compositing manager like compiz starts, or 'special' windows such as the xterm context menus.

I did see exactly one corrupted glyph before, but I thought it was a fluke... until now. Again it's just exactly one glyph though, which gets misrendered in most but not all applications of my session. Interestingly, while it's always the same glyph, the corruption varies between different rendering 'passes'. So it seems not to be some kind of random corruption of the glyph pixmap data, but something more systematic such as corruption of the pixmap data structures. Weird...

> One thought though is that the new algorithm favors evicting small pixmaps, and
> that means glyphs.

Right, the new pattern may just expose a bug somewhere else...

P.S. I'm not sure the Eterm corruption in the screenshot is the same problem - AFAICT it doesn't even use the RENDER extension for text rendering.
Comment 9 James Cloos 2008-04-15 11:36:23 UTC
I'm having a similar problem on r100 (7500 m7; pci 1002:4c57).

I've tried several variations of the config, and the only one which avoided the rendering anomalies was too slow to use.

My next test is to enable EXAOptimizeMigration.

(It has been a while, but I *think* I recall that MigrationHeuristic greedy was fastest but with more anomalies, smart has a few and is almost as fast as radeon was before libpciaccess, and the default is too slow.)

Most of the time I only see the bug with urxvt, and with smart I generally only see it when moving the cursor around in a command line or in top(1)'s display.

I did, however, lose the 'f' glyph the other day in the font I use for icewm's icon row and 'zilla's tabs.  Switching to another vt and back to X fixed that.

With the settings that gave the best speed, Emacs also had some corruption.

In all cases this is with RGBA subpixel and black text on an off-white background.

(I had to hold back for a while when pciaccess was merged because of the bug, hit by things like icewm's menus, that locked up the server. I made the switch after the render accel commit was posted to the commits list, but before I found the cause of that input bug I had made so many changes trying to avoid it that I can't compare my old config with my current one... But it is a lot slower than before, with X using a lot more CPU.)
Comment 10 Christian Schmitt 2008-04-16 03:31:44 UTC
Let me report my findings too:

r300, with xorg and Mesa from git. The corruption makes the machine unusable when I enable both RenderAccel and a virtual screen (2304x1024). Disabling RenderAccel makes it usable again, and so does disabling the virtual screen while keeping RenderAccel enabled. I still get some slight corruption from time to time, but nothing compared to the mess when both options are enabled.

I could post a screenshot if you want me to.


Comment 11 Michel Dänzer 2008-04-16 03:57:18 UTC
(In reply to comment #10)
> I could post a screenshot if you want me to.

Please do; I haven't been able to reproduce more than corruption of a single glyph.
Comment 12 Christian Schmitt 2008-04-16 05:32:37 UTC
Created attachment 15949
desktop being messed up with renderaccell and virtual screen enabled

Voilà. The screenshot shows a medium level of corruption. Sometimes text in a terminal is also rendered in many different colors.
Comment 13 Christian Schmitt 2008-04-16 05:38:52 UTC
Created attachment 15950
second shot
Comment 14 Michel Dänzer 2008-04-16 06:37:42 UTC
(In reply to comment #9)
> I did, however, lose the 'f' glyph the other day in the font I use for icewm's
> icon row and 'zilla's tabs.  Switching to another vt and back to X fixed that.

Interesting, can others confirm that switching VTs fixes the corrupted glyphs?

> In all cases this is with RGBA subpixel and black text on an off-white
> background.

Does subpixel AA vs. none seem to make any difference? I'm not using it.
Comment 15 Fredrik Höglund 2008-04-16 10:09:48 UTC
(In reply to comment #8)
> I did see exactly one corrupted glyph before, but I thought it was a fluke...
> until now. Again it's just exactly one glyph though, which gets misrendered in
> most but not all applications of my session. Interestingly, while it's always
> the same glyph, the corruption varies between different rendering 'passes'. So
> it seems not to be some kind of random corruption of the glyph pixmap data, but
> something more systematic such as corruption of the pixmap data structures.
> Weird...

I've seen it happen with more than one glyph, maybe up to 3 or 4.
I can also confirm the corruption sometimes changing between rendering passes.

To me it looks as if it's rendering the wrong pixmap with the wrong pitch, because with subpixel AA the color still looks reasonably uniform. In one case I've actually seen an 'M' replaced by an 'O', and thought it was a typo at first.
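The wrong-pitch part of this theory can be pictured with a few bytes: if rows are read back with a different pitch than they were written with, each row starts at a slightly wrong offset, so the shape shears while the pixel values are still genuine glyph data rather than noise. A minimal sketch, assuming an 8 bpp (A8) glyph mask; `read_rows` is a hypothetical helper, not an EXA or driver function.

```python
def read_rows(buf: bytes, width: int, height: int, pitch: int) -> list[bytes]:
    """Read `height` rows of `width` bytes, each row `pitch` bytes after the last.

    Hypothetical helper: 8 bits per pixel, so byte offset = y * pitch + x.
    """
    return [buf[y * pitch : y * pitch + width] for y in range(height)]


# A 4x3 A8 glyph written with a pitch of 4 bytes (rows packed back to back),
# followed by unrelated bytes that happen to sit after it in memory.
memory = bytes([
    0, 255, 255, 0,
    255, 0, 0, 255,
    0, 255, 255, 0,
]) + bytes(8)

correct = read_rows(memory, 4, 3, pitch=4)
skewed = read_rows(memory, 4, 3, pitch=5)  # reader assumes the wrong pitch

for row in skewed:
    print("".join("#" if b else "." for b in row))
```

With `pitch=5` this prints `.##.`, `..#.` and `#...` instead of the original `.##.`, `#..#`, `.##.`: a plausible-looking but wrong shape built from real coverage values.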

Another possibility is that the pixmap data isn't uploaded when the pixmap is migrated in, so it renders what was in the area before.
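That second possibility, a pixmap treated as resident in offscreen memory without its data having been copied there, can be sketched the same way. Everything here is hypothetical (`Pixmap`, `migrate_in` and the `upload` switch are illustrative, not EXA's migration code); the switch exists only to show what a skipped upload would look like.

```python
class Pixmap:
    """Hypothetical pixmap with a system-memory copy and an optional offscreen copy."""

    def __init__(self, data: bytes) -> None:
        self.sys = bytearray(data)  # authoritative contents in system memory
        self.fb = None              # offscreen (VRAM) area, None while not resident
        self.fb_valid = False       # whether self.fb is trusted to match self.sys


def migrate_in(pix: Pixmap, area: bytearray, upload: bool = True) -> None:
    """Give the pixmap an offscreen area, which may still hold whatever was
    evicted from it earlier."""
    pix.fb = area
    if upload:
        pix.fb[:] = pix.sys  # copy the real contents over the leftovers
    # The offscreen copy is trusted from now on, whether or not it was filled.
    pix.fb_valid = True


# A recycled area that still contains the previous occupant's data.
recycled = bytearray(b"OOOO")
glyph_m = Pixmap(b"MMMM")
migrate_in(glyph_m, recycled, upload=False)
print(bytes(glyph_m.fb))  # → b'OOOO': the 'M' glyph renders as the old 'O'
```

This failure mode would also fit the observation that a redraw can fix things: once the pixmap's contents are uploaded properly, the glyph renders correctly again.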
Comment 16 Michel Dänzer 2008-04-19 10:22:39 UTC
*** Bug 15610 has been marked as a duplicate of this bug. ***
Comment 17 Michel Dänzer 2008-04-20 03:14:54 UTC
Created attachment 16051
Some minor offscreen area eviction improvements

Would this change happen to help at all for this by any chance? It mostly prevents available areas from contributing non-zero eviction cost.
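What "available areas contributing non-zero eviction cost" means in practice can be sketched with a naive scan for the cheapest contiguous run of offscreen areas. This is a hypothetical illustration, not the actual exaOffscreenAlloc loop: the `Area` record, the state strings, and the `free_is_free` switch (which toggles whether free areas count as zero cost) are assumptions made for the example.

```python
from dataclasses import dataclass

AVAIL, REMOVABLE, LOCKED = "avail", "removable", "locked"


@dataclass
class Area:
    """Hypothetical offscreen area: free, holding an evictable pixmap, or pinned."""
    state: str
    size: int
    cost: float = 0.0  # eviction cost recorded for the area (may be stale if free)


def area_cost(area: Area, free_is_free: bool) -> float:
    if free_is_free and area.state == AVAIL:
        return 0.0  # reusing a free area evicts nothing
    return area.cost


def cheapest_window(areas: list[Area], need: int, free_is_free: bool = True):
    """Return (start, end, cost) for the contiguous run of areas that frees
    at least `need` bytes at the lowest total cost, or None if none does.
    `end` is exclusive. Naive O(n^2) scan for clarity."""
    best = None
    for start in range(len(areas)):
        size, cost = 0, 0.0
        for end in range(start, len(areas)):
            area = areas[end]
            if area.state == LOCKED:
                break  # pinned areas cannot be evicted, so the run stops here
            size += area.size
            cost += area_cost(area, free_is_free)
            if size >= need:
                if best is None or cost < best[2]:
                    best = (start, end + 1, cost)
                break
    return best


areas = [
    Area(REMOVABLE, 64, cost=5),
    Area(AVAIL, 64, cost=9),  # free, but with a leftover non-zero cost
    Area(AVAIL, 64),
    Area(LOCKED, 64),
    Area(REMOVABLE, 128, cost=1),
]
print(cheapest_window(areas, 128))                      # → (1, 3, 0.0)
print(cheapest_window(areas, 128, free_is_free=False))  # → (4, 5, 1.0)
```

With free areas costing nothing, the two adjacent free areas win. If a free area carries a leftover cost, the scan instead picks the 128-byte area at index 4, evicting a live pixmap even though enough free space exists, which changes the eviction pattern without being a corruption bug by itself.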
Comment 18 Stefan Becker 2008-04-20 03:32:30 UTC
Created attachment 16053
Snapshot from F9 KDE4 desktop with konsole & Dolphin

My findings:

 - it can happen with multiple glyphs. It gets worse over time

 - broken glyphs can have random contents or another glyph's contents, e.g. "3" instead of "u"

 - vt switch does not fix it

 - a window refresh can fix it

 - I can reproduce it most of the time by opening a Dolphin window and dragging the mouse over file icons and the tool & menu bar (causes redraws)
Comment 19 Christian Schmitt 2008-04-20 05:25:41 UTC
(In reply to comment #17)
> Created an attachment (id=16051)
> Some minor offscreen area eviction improvements
> 
> Would this change happen to help at all for this by any chance? It mostly
> prevents available areas from contributing non-zero eviction cost.
> 

I just applied the patch and so far it looks like it helped indeed. I re-enabled the virtual screen and could not see any corruptions so far.
Comment 20 Andrew Randrianasulu 2008-04-20 13:43:36 UTC
(In reply to comment #17)
> Created an attachment (id=16051)
> Some minor offscreen area eviction improvements
> 
> Would this change happen to help at all for this by any chance? It mostly
> prevents available areas from contributing non-zero eviction cost.
> 

xserver from git 14396fdebac1868df17559220ed7aaa34c34251e +  exa-offscreen-eviction.diff = no corruption. Thanks!
Comment 21 Stefan Becker 2008-04-20 23:57:30 UTC
xserver patch seems to help. Didn't see any glyph corruption since yesterday...

When will you commit it to the xserver git?
Comment 22 Michel Dänzer 2008-04-21 02:12:00 UTC
Patch pushed to the master branch, thanks for testing. Unfortunately, I think I've seen the single glyph corruption even with this change, so there may still be another problem that just happens to be more or less visible depending on the eviction pattern.
Comment 23 James Cloos 2008-04-22 23:48:16 UTC
A followup to my earlier comment:

It is MigrationHeuristic greedy which is slow but without corruption;
using smart is faster but some corruption occurs, primarily whenever
glyphs are overwritten in apps like rxvt-unicode, but also on rare
occasions when there is no overwriting.

(By overwriting, I mean things like backspacing through a command line,
or top(1)’s display.  In the former case the corruption is limited to
just the glyph currently right of the cursor (and thus most recently
rewritten) but the latter case hits every instance of the glyph
currently displayed.)

Corruption when there was no overwriting always hits every instance of
the given glyph.

The recent patch to xserver did not help; I’m currently at git commit
8e3c1dfc48930c455529313a42efa35e3b9071b2.
Comment 24 Michel Dänzer 2008-04-23 02:43:04 UTC
(In reply to comment #23)
> The recent patch to xserver did not help;

But your problems disappear if you go back before the commits mentioned in the original report here? If not, they should probably be tracked separately.

Either way though, please provide your full xorg.conf and Xorg.0.log files.
Comment 25 James Cloos 2008-04-23 12:15:32 UTC
> Either way though, please provide your full xorg.conf and Xorg.0.log files.

I'll attach the current ones; these will be with greedy.  I also have a
log with smart from a few days ago, but there is no significant difference.
Comment 26 Michel Dänzer 2009-02-11 06:26:44 UTC
Is this still an issue with current xserver and driver?
Comment 27 Bernie Innocenti 2009-05-11 09:28:35 UTC
I think it's not r200 specific.  These screenshots come from intel_drv 2.7.0 on F-11:

 http://www.codewiz.org/pub/xorg_intel_font_and_pixmap_mess.png
 http://www.codewiz.org/pub/xorg_intel_font_mess.png
 http://www.codewiz.org/pub/xorg_intel_stripes_bug.png

If you agree it's the same bug, please update the bug summary to reflect this.
Comment 28 Michel Dänzer 2009-05-12 02:26:46 UTC
(In reply to comment #27)
> I think it's not r200 specific.  These screenshots come from intel_drv 2.7.0 on
> F-11:

I doubt it's the same bug, but even if it was, you're using UXA (right?) so you have to track it separately anyway.
Comment 29 Bernie Innocenti 2009-05-12 03:06:14 UTC
(In reply to comment #28)
> I doubt it's the same bug, but even if it was, you're using UXA (right?) so you
> have to track it separately anyway.

Thanks Michel, will do.
Comment 30 Michel Dänzer 2010-12-22 02:53:37 UTC
Assuming that any core EXA issues here have been fixed. Any remaining similar symptoms should probably be tracked against the drivers at least initially.