Bug 5876 - big xvideo issues with ATI Radeon RV100 family (Mobility M6 LY, ...)
Summary: big xvideo issues with ATI Radeon RV100 family (Mobility M6 LY, ...)
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/Radeon (show other bugs)
Version: git
Hardware: x86 (IA32) All
: high major
Assignee: xf86-video-ati maintainers
QA Contact:
URL:
Whiteboard:
Keywords: regression
: 8938 (view as bug list)
Depends on:
Blocks: xorg-7.2
  Show dependency treegraph
 
Reported: 2006-02-13 23:48 UTC by Marc Espie
Modified: 2007-02-22 02:42 UTC (History)
5 users (show)

See Also:
i915 platform:
i915 features:


Attachments
crude patch disabling the erratas all together (961 bytes, patch)
2006-09-21 05:59 UTC, Matthieu Herrb
no flags Details | Splinter Review

Description Marc Espie 2006-02-13 23:48:24 UTC
mplayer and ogle show badly broken video on mpeg2, mpeg4 (avi), realmedia
at least, when xvideo is used.

Some chunks of the video are flickering, and the video freezes for a few 
seconds from time to time.

This is very specific to this Radeon model.  This has been shown to be 
reproducible by both Bernd Ahlers (bernd@openbsd.org) and Robert Nagy 
(robert@openbsd.org) on three distinct laptops: two thinkpads, and a
 Dell model.

This is a regression from the pre-6.9.0 radeon driver.  Replacing the driver 
modules with the old driver, compiled with the newer XOrg framework works just 
fine, without changing anything else.

You can ask Matthieu Herb (matthieu@openbsd.org) for specific details about X 
integration in OpenBSD.
Comment 1 Alex Deucher 2006-02-14 06:47:20 UTC
this is probably related to the radeon gatos code that got merged.
Comment 2 Alex Deucher 2006-04-01 01:22:07 UTC
can you try with cvs HEAD or the ati-1.0-branch?  I think Roland fixed this.
Comment 3 Marc Espie 2006-04-01 22:38:35 UTC
(In reply to comment #2)
> can you try with cvs HEAD or the ati-1.0-branch?  I think Roland fixed this.

Tried it with matthieu's help, no change. Still broken in the exact same way.
Comment 4 Michel Dänzer 2006-07-02 13:17:25 UTC
Can you try isolating the change that caused the regression with git bisect?
Comment 5 Erik Andren 2006-07-28 10:11:25 UTC
Did the bisect generate any results?
Comment 6 Matthieu Herrb 2006-09-21 04:40:38 UTC
I finally got access to Marc's hardware, and after some manual CVS dissect,
it's this commit which broke things: 5be4bf9000bdf58584a10a6b8e285d0f173304fa
Comment 7 Matthieu Herrb 2006-09-21 05:48:51 UTC
I also tried current drivers from git both master and agd5f-superpatch still
mis-behave on this hardware (Mobility M6).
Comment 8 Matthieu Herrb 2006-09-21 05:59:21 UTC
Created attachment 7109 [details] [review]
crude patch disabling the erratas all together

The attached crude patch effectively fixes the problem. So now, the question
is, how to disable those errata only on Mobility M6 chipsets ?
Comment 9 Matthieu Herrb 2006-10-08 06:09:47 UTC
Could this be fixed before 7.2? 
Comment 10 Alex Deucher 2006-10-08 15:22:37 UTC
(In reply to comment #8)
> Created an attachment (id=7109) [edit]
> crude patch disabling the erratas all together
> 
> The attached crude patch effectively fixes the problem. So now, the question
> is, how to disable those errata only on Mobility M6 chipsets ?

The problem is that the errata are need on M6 chips to avoid lockups or clock
problems IIRC.  Can you try adding a usleep of various periods after the OUTPLL
calls in radeon_video.c?  If that doesn't help you might try forcing the overlay
clock on as we do in for the IGP chips in RADEONAllocAdaptor().
Comment 11 Daniel Stone 2006-11-04 09:41:26 UTC
marc/matthieu: ping?
Comment 12 Roland Scheidegger 2006-12-03 15:10:52 UTC
*** Bug 8938 has been marked as a duplicate of this bug. ***
Comment 13 Roland Scheidegger 2006-12-03 15:27:26 UTC
So, is the slow video only caused by that OUTPLL in radeonDisplayVideo? Could
you try that out by not reverting the patch but just commenting it out there
(the other OUTPLL calls in radeon_video.c shouldn't be problematic)? If so,
there isn't really a reason we do this every video frame. Could just recalc the
ecp div and only submit to hw if it actually changes - just need to be careful
the value still gets fixed up after console switching etc.
Comment 14 Roland Scheidegger 2006-12-03 15:54:55 UTC
Also, what framerate do you get with the test program (from #8938) with that
OUTPLL removed (or with the patch disabling the pll patches, if it makes any
difference)? I'm just asking because I'm wondering what's up with that 50fps
limitation - the driver just waits 5ms on a OUTPLL, which should still be enough
for 200fps...
Comment 15 Alex Deucher 2006-12-03 18:33:30 UTC
(In reply to comment #13)
> So, is the slow video only caused by that OUTPLL in radeonDisplayVideo? Could
> you try that out by not reverting the patch but just commenting it out there
> (the other OUTPLL calls in radeon_video.c shouldn't be problematic)? If so,
> there isn't really a reason we do this every video frame. Could just recalc the
> ecp div and only submit to hw if it actually changes - just need to be careful
> the value still gets fixed up after console switching etc.
> 

I don't think it's the OUTPLL itself per se, but the additional stuff introduced
by the OUTPLL errata hander.
Comment 16 Pierre Ossman 2006-12-03 22:38:53 UTC
I'll do your test later tonight, but until then some other insight:

The test program now gives around 300 fps (previously 50).

Also, the bug isn't gone now, just harder to trigger. So I suspect that the real
bug is that double buffering is slightly broken, and these bugs are just
instances of running into that problem.

I don't know how double buffering on the radeon works, but I assume that the hw
has two buffers (A and B), and switches between them on vsync blank. But what
makes sure that if A is displayed and we're writing into B, that the switch
doesn't occur until we've written an entire frame to B?
Comment 17 Michel Dänzer 2006-12-04 03:14:02 UTC
(In reply to comment #16)
> 
> I don't know how double buffering on the radeon works, but I assume that the hw
> has two buffers (A and B), and switches between them on vsync blank. But what
> makes sure that if A is displayed and we're writing into B, that the switch
> doesn't occur until we've written an entire frame to B?

The driver only initiates the flip after the upload of the new frame is complete
(with DMAForXv/DRI disabled, anyway...). One thing it doesn't do yet though is
wait for the previous flip to complete before uploading a new frame, which could
result in the upload overwriting the frame that's being displayed. Patch #7940
from bug 8938 was an attempt at fixing that, does it make any difference (for
better or worse) now?
Comment 18 Pierre Ossman 2006-12-04 03:51:42 UTC
It is my intention to test that patch, but I haven't had the time yet. Hopefully
this evening.
Comment 19 Pierre Ossman 2006-12-04 10:10:17 UTC
Unfortunately, the patch had no real effect this time either... :/
Comment 20 Pierre Ossman 2006-12-04 10:43:40 UTC
Removing just that call to OUTPLL is insufficient, so there is something more to it.

I've been trying to get my test program to provoke the bug, but no luck so far.

It seems the movies that trigger the problem are also those heavily compressed,
and hence CPU hungry. So perhaps there is some race in the driver where you
check some register, do something else, then act on that register. And in the
cases where X doesn't get enough CPU time, that action is now wrong because the
register has changed.

I can't really decrypt the register jumble going on in these routines so I'm
having trouble checking this myself. :)
Comment 21 Pierre Ossman 2006-12-04 11:07:52 UTC
I think I've managed to prove that this is a not a problem with the hardware
flipping buffers at least (provided my test is correct).

What I've done is add a large usleep() (or several in a for loop since something
seems to abort the sleeps) in RADEONPutImage() and/or RADEONDisplayVideo(). The
behaviour I get is consistent with the initial problem, but I can now produce it
on both cards.

My big discovery here is that I can see that the tear stays present for an
entire frame. So it's not a matter of sync problem between RAMDAC and buffer
shuffling, but that the buffer is indeed incorrect for some reason.
Comment 22 Pierre Ossman 2006-12-04 11:13:22 UTC
This gives me almost identical behaviour on both cards (including the
intermittent "hang"):

diff --git a/src/radeon_driver.c b/src/radeon_driver.c
diff --git a/src/radeon_video.c b/src/radeon_video.c
index 5138ce1..ffc6429 100644
--- a/src/radeon_video.c
+++ b/src/radeon_video.c
@@ -2810,6 +2810,7 @@ RADEONPutImage(
    int top, left, npixels, nlines, bpp;
    BoxRec dstBox;
    CARD32 tmp;
+   int i;
 
    /*
     * s2offset, s3offset - byte offsets into U and V plane of the
@@ -2956,6 +2957,9 @@ RADEONPutImage(
        break;
     }
 
+    for (i = 0;i < 2;i++)
+       usleep(10000);
+
     /* update cliplist */
     if(!REGION_EQUAL(pScrn->pScreen, &pPriv->clip, clipBoxes))
     {
Comment 23 Roland Scheidegger 2006-12-04 12:39:26 UTC
(In reply to comment #20)
> Removing just that call to OUTPLL is insufficient, so there is something more
to it.
Are you sure you got the right OUTPLL? I've verified that no other PLL reads or
writes occur other than there when playing a video.
I've also tried enabling the chip errata on my chip rv250. Guess what happens -
nothing. Well actually something does happen, cpu time will be LOWER. The likely
reason being that X won't consume cpu cycles during the usleep, otherwise it
probably would have had to wait for RADEONWaitForFIFO anyway since the gpu is
still busy uploading the data (this effect probably could only be seen with dma
for xv)...
And I was wrong assuming you could still get 200fps even with those pll erratas.
INPLL causes this wait just as well, so the max would be 100fps.
Comment 24 Pierre Ossman 2006-12-04 12:50:47 UTC
I commented out this thing:

    OUTPLL(pScrn, RADEON_VCLK_ECP_CNTL,
	   (INPLL(pScrn, RADEON_VCLK_ECP_CNTL) & 0xfffffCff) | (ecp_div << 8));
Comment 25 Roland Scheidegger 2006-12-04 12:56:03 UTC
(In reply to comment #24)
> I commented out this thing:
> 
>     OUTPLL(pScrn, RADEON_VCLK_ECP_CNTL,
> 	   (INPLL(pScrn, RADEON_VCLK_ECP_CNTL) & 0xfffffCff) | (ecp_div << 8));
Well ok then (I just asked because it's easy to get the wrong one as the same
code can be found in another function...). I've no explanation why it wouldn't
work even without that OUTPLL/INPLL pair.
Comment 26 Pierre Ossman 2006-12-04 13:09:58 UTC
How about the delay patch? Does it seem reasonable? Since it is right after the
data copy, I assume it should be something the driver can handle (but it isn't).
Comment 27 Roland Scheidegger 2006-12-04 15:38:42 UTC
I actually think now the behaviour is correct. At least the "tearing" I got when
making the driver wait wasn't tearing, but simply the result of the asynchronous
X protocol. What happens if the driver is too slow is that somewhere those
XvPutImage requests get queued up. Since mplayer only uses 2 different buffers,
and the driver just does all requests in-order, it will end up copying buffers
which no longer exist (i.e. have been replaced with newer data), if it gets
enough cpu time (that is, if it had 20 requests queued up, and would now get
enough cpu time it would copy the newest 2 buffers for 10 times till the queue
is empty), which will obviously look broken as the image gets back in time...
Comment 28 Pierre Ossman 2006-12-04 22:24:19 UTC
So mplayer assumes that by the time it wants to fill that buffer again, X has
finished copying it. That sounds very broken, and not really consistent with
observed behaviour.

If mplayer didn't pay attention to what X is doing with the buffer, then it
wouldn't be able to detect and whine about "your machine is too slow to play this".

Right, in libvo/vo_xv.c we can find this:

static void flip_page(void)
{
    put_xvimage( xvimage[current_buf] );

    /* remember the currently visible buffer */
    visible_buf = current_buf;

    if (num_buffers > 1)
    {
        current_buf =
            vo_directrendering ? 0 : ((current_buf + 1) % num_buffers);
        XFlush(mDisplay);
    } else
        XSync(mDisplay, False);
    return;
}
Comment 29 Roland Scheidegger 2006-12-05 02:43:46 UTC
(In reply to comment #28)
> So mplayer assumes that by the time it wants to fill that buffer again, X has
> finished copying it. That sounds very broken, and not really consistent with
> observed behaviour.
> 
> If mplayer didn't pay attention to what X is doing with the buffer, then it
> wouldn't be able to detect and whine about "your machine is too slow to play
this".
> 
> Right, in libvo/vo_xv.c we can find this:
> 
> static void flip_page(void)
> {
>     put_xvimage( xvimage[current_buf] );
> 
>     /* remember the currently visible buffer */
>     visible_buf = current_buf;
> 
>     if (num_buffers > 1)
>     {
>         current_buf =
>             vo_directrendering ? 0 : ((current_buf + 1) % num_buffers);
>         XFlush(mDisplay);
>     } else
>         XSync(mDisplay, False);
>     return;
> }
So I can't see where it would wait or notice X hasn't finished the last request.
XFlush does nothing, and XSync isn't called. It will never show the "too slow"
message unless mplayer itself does not get enough cpu time to decode a picture
or the event queue would be full (I think - not sure there what happens in that
case). Yes this looks a bit broken, but under the assumption that the decoding
time of mplayer is larger than what the driver needs, and the schedular is at
least somewhat fair, it's only going to be a problem if the box is too slow
anyway, so who cares.



Comment 30 Michel Dänzer 2006-12-05 03:10:56 UTC
(In reply to comment #23)
> The likely reason being that X won't consume cpu cycles during the usleep,
> otherwise it probably would have had to wait for RADEONWaitForFIFO anyway
> since the gpu is still busy uploading the data

BTW, most if not all of these RADEONWaitForFIFO() calls are probably bogus. I
doubt the overlay registers go through the FIFO, but even if they do,
RADEONWaitForFIFO() is useless while the CP is busy, as it would likely fill up
any available FIFO slots anyway.

> And I was wrong assuming you could still get 200fps even with those pll erratas.
> INPLL causes this wait just as well, so the max would be 100fps.

That's assuming the delays are accurate... which I suspect is rather unlikely,
although in both directions due to scheduling latency and signals.
Comment 31 Pierre Ossman 2006-12-05 03:49:26 UTC
(In reply to comment #29)
> So I can't see where it would wait or notice X hasn't finished the last request.
> XFlush does nothing, and XSync isn't called. It will never show the "too slow"
> message unless mplayer itself does not get enough cpu time to decode a picture
> or the event queue would be full (I think - not sure there what happens in that
> case).

Actually, XFlush is only for double buffering, which is disabled by default. So
during my tests it has been calling XSync.
Comment 32 Roland Scheidegger 2006-12-05 04:38:45 UTC
(In reply to comment #31)
> Actually, XFlush is only for double buffering, which is disabled by default. So
> during my tests it has been calling XSync.
Umm, mplayer certainly does double buffering by default. The manpage explicitly
mentions -nodouble is only meant for debugging...
Comment 33 Pierre Ossman 2006-12-05 04:46:09 UTC
Seems you're right. I did a grep for vo_doublebuffering and got "int
vo_doublevuffering=0;". What I failed to notice was that it was in the file
mencoder.c. In libvo/video_out.c we can find "int vo_doublebuffering = 1;".

So if we assume the problem is entirely mplayer at this point, running mplayer
with -nodouble should result in the tearing going away, right? The driver should
still be doing its double buffering (based on XV_DOUBLE_BUFFER).
Comment 34 Roland Scheidegger 2006-12-05 05:00:45 UTC
(In reply to comment #33)
> So if we assume the problem is entirely mplayer at this point, running mplayer
> with -nodouble should result in the tearing going away, right? The driver should
> still be doing its double buffering (based on XV_DOUBLE_BUFFER).
should work, if mplayer leaves the XV_DOUBLE_BUFFER alone.
Comment 35 Pierre Ossman 2006-12-05 14:27:33 UTC
Ok, I have now confirmed that there is no problem with tearing if you run
mplayer with -nodouble. That seems to confirm that there is no inherent tearing
problem with the radeon driver's double buffering.

So I guess this bug is really that the errata fix slows down video to the point
that many movies fail to render at a reasonable speed. But I don't think we have
a fix that everyone is satisfied with?

Out of curiosity, is there a way to do mplayer's double buffering properly?
I.e., can you check if the pixmap you're about to overwrite has been copied yet?
Comment 36 Roland Scheidegger 2006-12-05 14:56:24 UTC
(In reply to comment #35)
> So I guess this bug is really that the errata fix slows down video to the point
> that many movies fail to render at a reasonable speed.
Indeed.
> But I don't think we have
> a fix that everyone is satisfied with?
No, as backing out the erratas is not an option, and since apparently backing
out only the outpll/inpll in radeondisplayvideo doesn't do anything neither (I
think it would be a good idea to do that anyway though, of course with making
sure we still get the correct ecp div) I'm at a loss what else should be done
(or even what's causing the slowdown).

> Out of curiosity, is there a way to do mplayer's double buffering properly?
> I.e., can you check if the pixmap you're about to overwrite has been copied yet?
It should be possible to do this with XSync, just don't call it right after the
XVPutImage (pointless doing double buffering...), and probably not right before
uploading neither (due to the possible high latency it might have for instance
over network) but somewhere in the middle of decoding a new frame (but do it
from a different thread so you don't need to wait in the decoding thread?), so
that in the normal case X should already be finished and no unneeded waiting
happens. Or maybe use XPending or friends. Dunno.
Comment 37 Pierre Ossman 2006-12-06 12:35:17 UTC
I've spent the night fiddling back and forth with the driver, and I have some
good news. Removing the OUTPLL has the same results as removing the entire errata.

My earlier report was based on too few samples, so it wasn't the complete truth.
I've been running both versions during several tests here, and the results are
the same when you even things out. The difference is just a bit of dumb luck
whether X or mplayer is the one getting shafted CPU wise.

As for a solution, as long as I run X at nice -20, the tearing goes away. So if
you can kill of that OUTPLL then I'll be a happy camper. :)
Comment 38 Benjamin Herrenschmidt 2006-12-06 13:24:39 UTC
That makes sense... look at the implementation of OUTPLL.. It has specific
workarounds for chip errata, which on the M6, afaik, ends up doing a
usleep(5000); on every access. That might definitely cause performance problems
for Xv.
Comment 39 Roland Scheidegger 2006-12-07 18:04:29 UTC
I've commited a fix to avoid the INPLL/OUTPLL pair causing a 10ms delay, though
I'm not sure it fixes all problems. After all, for 24fps video this would "only"
have caused a 240ms additional max delay (in case of dri/dmaforxv less because
the chip is busy anyway there).
Comment 40 Pierre Ossman 2006-12-07 22:40:18 UTC
Great, I'll give it a try.

As for the delay, it actually seems reasonable. At 300 fps (OUTPLL removed), it
takes 3 ms to push an image to the card. At 50 fps (with OUTPLL), it is 20 ms
per frame, or about 8 ms extra delay per sleep.

Considering that we tell the kernel to sleep for _at least_ 5 ms, and the kernel
has a HZ of 250, it must sleep for at least two ticks, which is precisely 8 ms.
Comment 41 Pierre Ossman 2006-12-18 11:30:58 UTC
Terribly sorry, I completely forgot about this issue.

I've tested the patch and I can verify that it works nicely. Feel free to close
the bug.
Comment 42 Timo Jyrinki 2007-02-22 02:42:16 UTC
Reported to be fixed, closing.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.