Bug 6046

Summary: r300 problem when moving 3d window
Product: DRI
Reporter: Benjamin Herrenschmidt <benh>
Component: General
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: high
CC: glisse, schnake
Version: XOrg git
Hardware: Other
OS: Linux (All)
Whiteboard:
Attachments:
Description: Fix aliasing bug    Flags: none

Description Benjamin Herrenschmidt 2006-02-26 18:05:23 UTC
On my rv350 machine, when I launch glxgears, it displays fine.

Then, if I start moving the window by only a few pixels (with any window
manager, even raw X with twm), the content of the window suddenly "snaps" left
by about half of the window width. That is, only the right part of the wheels
is displayed, in the left part of the window. This is the "snap". From there,
you can continue moving the window and the wheels won't move along: they stay
in their "snapped" position, so only the part of the wheels covered by the new
window position is displayed (they do get clipped properly).

For example, before snap:

 |------|
 |  O o |
 |------|

After the snap (moving the window by maybe one pixel):

 |------|
 |) o   |
 |------|

Then, if I move the window right a bit:

  |------|
  | o    |
  |------|

The same build works fine on an rv250.

In the end, full-screen apps work fine, but windowed apps are broken. This was
verified with both EXA and XAA. Everything is from CVS HEAD as of today
(server, r300 DRI/Mesa source, kernel DRM). The machine is a PPC laptop (big
endian). The chip is an rv350 (4e50).
Comment 1 Benjamin Herrenschmidt 2006-02-27 15:45:24 UTC
An additional detail: while moving the window in any direction causes the
initial "snap to left", the gears properly follow the window vertically; only
the horizontal offset stays locked to the left.
Comment 2 Benjamin Herrenschmidt 2006-02-27 16:12:40 UTC
In fact, more than just moving the window causes the "snap to left". For
example, here just partially covering the window makes it happen... I would
appreciate some pointers to where the various window offsets are calculated
and used in the DRI, so I can try to track down what's going wrong. I suspect
something isn't properly re-initialized when swapping contexts, but I'm not
too sure...
Comment 3 Benjamin Herrenschmidt 2006-02-27 16:25:10 UTC
Ok, by randomly poking around, I found a workaround which may help somebody who
understands that code better find out precisely what is wrong:

In r300InvalidateState(), if I comment out the #ifndef CB_DPATH so that it does
call r300ResetHwState(), then the bug is gone.

I suspect there is some state that we initialize properly when doing that reset
but don't restore properly later on; what precisely it is, is beyond my
understanding at this point.
Comment 4 Aapo Tahkola 2006-02-28 00:24:56 UTC
(In reply to comment #3)
> Ok, by randomly poking around, I found a workaround which may help somebody who
> understand that code better find out precisely what is wrong:
> 
> In r300InvalidateState(), if I comment out the #ifndef CB_DPATH so that it does
> call r300ResetHwState(), then the bug is gone.

Try calling r300UpdateWindow in r300InvalidateState.

If that helps, try:
	R300_FIREVERTICES(rmesa);
	R300_STATECHANGE(rmesa, vpt);

Comment 5 Benjamin Herrenschmidt 2006-02-28 09:22:57 UTC
Ok, so:

 - Adding r300UpdateWindow(ctx); instead of r300ResetHwState(r300); (at the same
location) inside r300InvalidateState() does fix the problem. Good.

 - However, just doing 

       R300_FIREVERTICES(r300);
       R300_STATECHANGE(r300, vpt);

instead does _NOT_ fix it.

 - Now, if I duplicate r300UpdateWindow() as a new r300UpdateWindow2() that
does exactly what r300UpdateWindow() does _without_ the 2 lines above
(basically keeping only the part that updates the various
rmesa->hw.vpt.cmd[] entries), the problem is fixed too.

 - I finally went all the way down to isolating which bit actually fixes it, and it's:

	rmesa->hw.vpt.cmd[R300_VPT_XOFFSET] = r300PackFloat32(tx);

Just that line fixes the problem.

So for some reason, that XOFFSET field gets corrupted... I don't know enough
about what's going on in there; maybe it's normal that some or all of these get
clobbered and they should indeed be restored... or maybe not. I'll let you guys
decide on the proper fix. That routine seems to be called fairly often, at
least in a windowed environment, so it seems performance-critical enough to be
worth optimizing.
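To see why one clobbered word gives exactly this symptom, here is a simplified, hypothetical model of the viewport words kept in rmesa->hw.vpt.cmd[]. The slot names, update_window() and its parameters are illustrative assumptions, not the actual r300 code. In this model only the X offset depends on the window's horizontal position, so if that single word is overwritten and never recomputed, the image stays put horizontally while the Y words keep tracking the window.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical slot indices; the real R300_VPT_* offsets may differ. */
enum { VPT_XSCALE, VPT_XOFFSET, VPT_YSCALE, VPT_YOFFSET, VPT_CMDSIZE };

/* Store the bit pattern of a float in a 32-bit command word
 * (memcpy is well-defined under C's aliasing rules). */
static uint32_t pack_float32(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Inverse of pack_float32, handy for debug dumps of cmd[]. */
static float unpack_float32(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Map normalized device coordinates to screen pixels for a window whose
 * top-left corner is at (win_x, win_y) and whose height is win_h.
 * GL's y axis points up and the screen's points down, hence the flip. */
static void update_window(uint32_t cmd[VPT_CMDSIZE],
                          float vp_x, float vp_y, float vp_w, float vp_h,
                          float win_x, float win_y, float win_h)
{
    float sx = vp_w * 0.5f;
    float tx = win_x + vp_x + sx;                  /* only word using win_x */
    float sy = -vp_h * 0.5f;
    float ty = win_y + win_h - vp_y - vp_h * 0.5f; /* tracks win_y */

    cmd[VPT_XSCALE]  = pack_float32(sx);
    cmd[VPT_XOFFSET] = pack_float32(tx);
    cmd[VPT_YSCALE]  = pack_float32(sy);
    cmd[VPT_YOFFSET] = pack_float32(ty);
}
```

For a 300x300 window at (100, 50) with a full-window viewport, the X offset word holds 250.0f (bit pattern 0x437a0000). In this model, moving the window only changes win_x and win_y, which is consistent with the finding that rewriting R300_VPT_XOFFSET alone hides the horizontal "snap".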

While I was there, I also noticed an unrelated bit:

r300Viewport() does

	R300_FIREVERTICES(R300_CONTEXT(ctx));
	r300UpdateWindow(ctx);

Isn't the first line redundant? (That is, r300UpdateWindow already does the
FIREVERTICES thing.)
Comment 6 Benjamin Herrenschmidt 2006-02-28 09:43:21 UTC
So I added some more debugging, and it seems that R300_VPT_XOFFSET is regularly
getting clobbered with the value 0xf729860 (always the same) when I move the
window, cover it, or whatever...

I'll try to investigate what is causing this later.
Comment 7 Benjamin Herrenschmidt 2006-02-28 17:54:14 UTC
Ok, that's getting fun :)

So building with -O1 -> no bug

Rebuilding with -O3 -> a different bug is back (the wheels disappear completely
when moving the window; calling r300UpdateWindow still fixes it, so I think
another field gets corrupted in there).

Smells like either a compiler bug or a strict-aliasing violation in the driver
or Mesa. I'll investigate more.
Comment 8 Felix Kühling 2006-03-01 03:12:50 UTC
(In reply to comment #7)
> Ok, that's getting fun :)
> 
> So building with -O1 -> no bug
> 
> Rebuilding with -O3 -> a different bug is back (the wheel completely disappear
> when moving the window, calling r300UpdateWindow still fixes it, thus I think
> another field gets corrupted in there).
> 
> Smells like either a compiler bug or a bug in the driver or Mesa vs. strict
> aliasing. I'll investigate more.

I'm building the binary snapshots with -fno-strict-aliasing because there were
problems, I think in mach64, related to clipping. This is probably a different
issue, though.
Comment 9 Benjamin Herrenschmidt 2006-03-01 04:47:55 UTC
OK, I verified that just rebuilding r300_state.c with -fno-strict-aliasing
indeed fixes it. Now I'll have to dig through that file to figure out where the
aliasing bug is, probably a dodgy cast.
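The patch in attachment 4784 is not reproduced here. As a generic illustration of the kind of "dodgy cast" meant, assuming the common pattern of storing a float into a uint32_t command word: accessing a float object through a uint32_t pointer violates C's strict-aliasing rule, so at higher optimization levels the compiler may assume the two accesses don't overlap and reorder them. That would fit both the -O1 vs -O3 difference and the -fno-strict-aliasing workaround. The function names below are hypothetical, and IEEE 754 single-precision floats are assumed.

```c
#include <stdint.h>
#include <string.h>

/* The pattern strict aliasing forbids (shown only as a comment):
 *
 *     uint32_t bad = *(uint32_t *)&f;   // undefined behaviour in ISO C
 *
 * With -fstrict-aliasing (enabled by GCC at -O2 and above) the compiler
 * may assume a uint32_t load never reads a float object. */

/* Option 1: type punning through a union, which GCC documents as
 * supported even when strict aliasing is enabled. */
static uint32_t pack_float_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* Option 2: memcpy, well-defined in ISO C; compilers usually reduce it
 * to a single register move. */
static uint32_t pack_float_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Both variants return the raw bit pattern, e.g. 0x3f800000 for 1.0f. The memcpy form is the more portable choice; the union form relies on a compiler guarantee that GCC provides.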
Comment 10 Benjamin Herrenschmidt 2006-03-01 09:41:40 UTC
Created attachment 4784 [details] [review]
Fix aliasing bug
Comment 11 Benjamin Herrenschmidt 2006-03-01 10:39:44 UTC
Fixed in CVS. I'll leave the bug open until we have audited radeon, r200 and
mach64 for similar bugs.
Comment 12 Michael Schnake 2006-04-04 19:54:37 UTC
FYI, I have exactly the same "half screen" problem with glxgears (and others) 
here on a ThinkPad A30 / Radeon Mobility M6 LY / Xorg 7.0 / xf86-video-ati 
6.5.7.3 / Gentoo

Additionally, I found an effect described (and shown) for XGL at
http://gentoo-wiki.com/HOWTO_XGL:Troubleshooting#Half-screen_problem that looks
quite similar.
Comment 13 Roland Scheidegger 2006-04-04 23:56:33 UTC
(In reply to comment #12)
> FYI, I have exactly the same "half screen" problem with glxgears (and others) 
> here on a ThinkPad A30 / Radeon Mobility M6 LY / Xorg 7.0 / xf86-video-ati 
> 6.5.7.3 / Gentoo
I have fixed the dodgy casts in the viewport updates for radeon and r200, so if
it's caused by that, the problem should no longer be present with Mesa CVS (or
the 6.5 release). That said, there are still potential aliasing problems in the
code; I'd suggest you compile with -fno-strict-aliasing for the time being.
