Bug 20023 - XDamageAdd problem (with Gallium3D)
Summary: XDamageAdd problem (with Gallium3D)
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/General
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
 
Reported: 2009-02-09 12:43 UTC by Pekka Paalanen
Modified: 2009-02-11 11:06 UTC



Attachments
screenshot of the corruption (895.53 KB, image/png)
2009-02-09 12:43 UTC, Pekka Paalanen

Description Pekka Paalanen 2009-02-09 12:43:31 UTC
Created attachment 22728
screenshot of the corruption

This bug is mostly for myself, as a developer, to keep track of the issue.
It appears on both my nv20 and nv28. Current testing is on nv28.

The Nouveau Gallium3D nv20 driver has recently got e.g. trivial/tri working. Trivial/tri-repeat is a version of trivial/tri that cycles the colors and renders as fast as possible (RAFAP). Another RAFAP application is glxgears, which still shows only a black window.

When a RAFAP application is running, moving focus to a terminal window (aterm, xterm) sometimes paints a rectangle filled with a solid color into the terminal window. Examples of this are seen in the attached screenshot. The color of the solid color fill (SCF) is not random: the aterm cursor is a yellow block, the irc client has a blue status bar that updates occasionally, and gray is the basic color in the terminal. I have no idea where the red comes from.

When the SCF first appears, it is a single complete solid rectangle, properly clipped by any overlapping windows. The holes seen in the screenshot are due to the terminal repainting some of its contents. The SCF seems to respect the terminal window borders, which does not make any sense. Sometimes the SCF stops at the same x-coordinate as the tri-repeat window, for instance.

The composite extension is disabled in the X server, so the terminal windows should not have a backing pixmap in VRAM. If they did, it would be easy to hypothesise that NV20 Gallium is overwriting some VRAM with a solid color, and separate backing pixmaps would explain why the SCF respects the window borders and is clipped by other windows.

The fact that the SCF is a rectangle suggests that whatever draws it has the "correct" pitch. Otherwise it would not have straight vertical edges.
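
To illustrate the pitch argument, here is a minimal sketch that models a linear framebuffer as a flat list. It is not driver code: the function names, the tiny 16x6 framebuffer and the pitch counted in pixels (real hardware usually counts bytes) are all assumptions made for the illustration. A fill that uses the same pitch as scanout produces straight vertical edges, while a fill whose pitch is off by one pixel produces edges that drift by one pixel per scanline.

```python
def fill_rect(fb, pitch, x, y, w, h, value):
    """Fill a w-by-h rectangle at (x, y) in a linear framebuffer.

    `pitch` is the distance between the starts of two consecutive scanlines,
    counted in pixels here for simplicity (an assumption of this sketch).
    """
    for row in range(h):
        start = (y + row) * pitch + x
        for col in range(w):
            fb[start + col] = value


def left_edges(fb, scanout_pitch, value):
    """Return the x position of the first `value` pixel on every scanline
    that contains one, as seen by a reader using `scanout_pitch`."""
    edges = []
    for offset in range(0, len(fb), scanout_pitch):
        line = fb[offset:offset + scanout_pitch]
        if value in line:
            edges.append(line.index(value))
    return edges


def render(fb, scanout_pitch):
    """Draw the framebuffer as text, one scanline per output line."""
    return "\n".join(
        "".join("#" if px else "." for px in fb[o:o + scanout_pitch])
        for o in range(0, len(fb), scanout_pitch)
    )


if __name__ == "__main__":
    WIDTH, HEIGHT = 16, 6  # hypothetical framebuffer size
    # First fill with the scanout pitch, then with a pitch one pixel too small.
    for writer_pitch in (WIDTH, WIDTH - 1):
        fb = [0] * (WIDTH * HEIGHT)
        fill_rect(fb, writer_pitch, 8, 1, 3, 4, 1)
        print(f"writer pitch {writer_pitch}: left edges {left_edges(fb, WIDTH, 1)}")
        print(render(fb, WIDTH))
```

With the matching pitch the left edge stays in column 8 on every filled scanline. With the mismatched pitch it moves from column 7 to column 4, so the shape is a slanted parallelogram rather than a rectangle. The SCF in the screenshot looks like the first case.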
Comment 1 Pekka Paalanen 2009-02-09 12:51:52 UTC
Hypothesis: the Gallium hw ctx and the EXA hw ctx are leaking state into each other because of incomplete context switching.

Test: disable all EXA acceleration.

NV04EXAPrepareSolid(), NV04EXAPrepareCopy() and NV10EXACheckComposite() were modified to return FALSE immediately. Additionally, UTS (UploadToScreen) and DFS (DownloadFromScreen) were disabled via xorg.conf.

Result: no impact. The SCF appears as before.

Even though EXA should not be using hw acceleration, the DRM and EXA hw ctxs still exist. However, it is now unlikely that the SCF is caused by state leaking from one context to another.
Comment 2 Pekka Paalanen 2009-02-09 12:52:53 UTC
Hypothesis: NV2X_GRCTX_SIZE too small, which leads to contexts corrupting each other.

Doubled it, no change.
Comment 3 Pekka Paalanen 2009-02-09 13:16:01 UTC
The SCF appears only when a RAFAP application is running and a terminal window draws something. The "drawing something" is most easily triggered in aterm by switching the focus between a RAFAP window and the aterm window. Aterm fills the text cursor with yellow when it receives focus, and this fill may "blow up".

The SCF seems to be drawn directly onto the framebuffer: any rendering whatsoever will fix it for the rendered area. The exception is fluxbox window borders while moving windows (no window content is shown while moving). Switching virtual desktops is a good way to clear it. The SCF never seems to touch the background or the root window.

Having three trivial/tri-repeat applications running at the same time works fine, but the SCF then seems to happen more frequently/easily.

Todo:
- confirm it's not happening with swrast
- try disabling all 2D rendering (fill, clear, copy) in NV20 Gallium one by one

Ideas:
- could nv2x_graph_context_init (DRM) be incorrect?
- buggy hw ctx switching? Hope not! But nvidia did use sw ctx switching for some reason.
Comment 4 Pekka Paalanen 2009-02-09 13:44:09 UTC
Yay for malc0!

http://lists.freedesktop.org/archives/xorg/2009-February/043604.html
The program in that email triggers the same SCF as trivial/tri-repeat.
Compile it with:
 gcc -W -Wall -O2 Xdamage.c -o Xdamage -lX11 -lXfixes -lXdamage

Trivial/tri-repeat with swrast does not trigger the SCF.
Comment 5 Pekka Paalanen 2009-02-09 14:03:39 UTC
Trying to move this bug to Xorg.

Added Thomas Hellström to CC, since he made the minimal program that reproduces this issue. Changed the summary according to Thomas' email.

Correction: I'm not adding Hellström to CC, since bugzilla does not allow it.
Comment 6 Michel Dänzer 2009-02-10 09:16:13 UTC
Bug 20037 could explain this; please try the fix referenced there.
Comment 7 Pekka Paalanen 2009-02-10 12:28:57 UTC
I updated xorg-server from git master, and then also evdev and mesa from git master. Thomas' test app no longer triggers the SCF.

When visuals get fixed and Mesa/OpenGL works again, I'll test with NV20 Gallium, and then close this bug as fixed.
Comment 8 Pekka Paalanen 2009-02-11 11:06:17 UTC
Okay, the visuals got fixed, and the SCF problem is fixed, too, in xorg-server master.

The damage accounting does not seem to fully work with partially occluded GL windows. If I have the top left corner of trivial/tri-repeat occluded by another window and then raise the GL window, fluxbox does not redraw the GL window decorations. But this is a completely different matter.

Closing as fixed. This bug really worried me, since I thought NV20 hw context switching might be broken. Phew.

