Bug 8938 - [ATI/radeon] Xv tearing
Summary: [ATI/radeon] Xv tearing
Status: RESOLVED DUPLICATE of bug 5876
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/Radeon
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: high normal
Assignee: xf86-video-ati maintainers
Reported: 2006-11-07 13:41 UTC by Pierre Ossman
Modified: 2006-12-03 15:10 UTC

Attachments
xorg.conf (3.34 KB, text/plain), 2006-11-11 15:00 UTC, Pierre Ossman
Xorg.0.log (76.90 KB, text/plain), 2006-11-11 15:01 UTC, Pierre Ossman
Test program (9.27 KB, text/plain), 2006-11-12 06:12 UTC, Pierre Ossman
patch for using native planar yuv format (10.95 KB, patch), 2006-11-30 08:09 UTC, Roland Scheidegger
improved planar video patch... (10.84 KB, patch), 2006-11-30 10:23 UTC, Roland Scheidegger
Wait for previous flip to take effect before uploading data (1.12 KB, patch), 2006-12-01 07:36 UTC, Michel Dänzer

Description Pierre Ossman 2006-11-07 13:41:47 UTC
I recently bought a vintage radeon card, and I'm having trouble getting decent
video from it. When doing Xv, I get severe problems with tearing.

The card is a PCI Radeon 7000. The same machine also has an AGP Radeon 7200
which works fine, so I'm suspecting it's a bandwidth issue (either on-card
memory, or the bus).

I noticed something very strange when I tried a test program I hacked together.
The program tries to output two alternating Xv images as fast as possible. On
the 7200, the images are consumed at a very high rate, but on the 7000 images go
at a steady rate of 50 fps. Very strange considering it claims to be running the
monitor at 60 Hz and should not even be waiting for that.

Ideas on what might be wrong and what I can do about it?
Comment 1 Michel Dänzer 2006-11-11 08:58:25 UTC
Please attach config and log files.
Comment 2 Pierre Ossman 2006-11-11 15:00:27 UTC
Created attachment 7747
xorg.conf
Comment 3 Pierre Ossman 2006-11-11 15:01:06 UTC
Created attachment 7748
Xorg.0.log
Comment 4 Michel Dänzer 2006-11-12 04:42:47 UTC
(In reply to comment #0)
> The card is a PCI Radeon 7000. The same machine also has an AGP Radeon 7200
> which works fine, so I'm suspecting it's a bandwidth issue (either on-card
> memory, or the bus).

Yeah, depending on the resolution of the video, the PCI bus might be the
bottleneck. Can you try starting X on the PCI card only so the DRI gets enabled
and see if that makes any difference?

> I noticed something very strange when I tried a test program I hacked together.
> The program tries to output two alternating Xv images as fast as possible.

Can you attach the test program?

BTW, it looks like your driver is patched, does this also happen with the stock
driver?
Comment 5 Pierre Ossman 2006-11-12 05:37:08 UTC
(In reply to comment #4)
> 
> Yeah, depending on the resolution of the video, the PCI bus might be the
> bottleneck.

Shouldn't it just be losing frames at that point? It's the tearing that is
driving me nuts.

> Can you try starting X on the PCI card only so the DRI gets enabled
> and see if that makes any difference?

The machine crashes when I try to enable DRI on the PCI card. I'll see if
letting BIOS boot the PCI card can get it running.

> 
> Can you attach the test program?
> 

Will do.

> BTW, it looks like your driver is patched, does this also happen with the stock
> driver?

Yes it does. The only patch I have is the [in]famous TV output patch for radeon.
Comment 6 Pierre Ossman 2006-11-12 06:12:57 UTC
Created attachment 7753
Test program

This is a bit crude, so you need to set autopaint color key.
Comment 7 Pierre Ossman 2006-11-12 13:04:47 UTC
Btw, should I be shifting back to ASSIGNED myself when I think I've provided you
with what you need or do you want to do that yourself? :)
Comment 8 Pierre Ossman 2006-11-12 13:24:50 UTC
Ok, using just the PCI card made things a bit better, but still not very good.

One test video is 608x336 at 30 fps and mplayer claims to be using YV12 for it.
So by my calculations, this should result in about 74 Mbps of bandwidth. That's
not even a tenth of the theoretical PCI bandwidth...
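
The estimate above can be reproduced with a short calculation. This is only a sketch: it assumes YV12 averages 12 bits per pixel (planar 4:2:0) and that the slot is a conventional 32-bit, 33.33 MHz PCI bus, which the report does not state. The function names are made up for illustration.

```c
#include <assert.h>

/* Bits per second needed to upload YV12 video. YV12 is planar 4:2:0:
 * a full-resolution Y plane plus two chroma planes at quarter resolution,
 * which averages 12 bits (1.5 bytes) per pixel. */
double yv12_bits_per_second(int width, int height, double fps)
{
    assert(width > 0 && height > 0 && fps > 0.0);
    double bytes_per_frame = (double)width * (double)height * 1.5;
    return bytes_per_frame * 8.0 * fps;
}

/* Theoretical peak of a 32-bit, 33.33 MHz PCI bus in bits per second.
 * ASSUMPTION: the bus type is not given in the report. */
double pci_peak_bits_per_second(void)
{
    return 32.0 * 33.33e6;
}

/* Fraction of the assumed PCI peak that the video stream would occupy. */
double pci_utilisation(int width, int height, double fps)
{
    return yv12_bits_per_second(width, height, fps) / pci_peak_bits_per_second();
}
```

For the 608x336 clip at 30 fps, `yv12_bits_per_second(608, 336, 30.0)` gives 73,543,680 bit/s (about 74 Mbps), which is roughly 7% of the assumed theoretical PCI peak.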
Comment 9 Pierre Ossman 2006-11-19 05:25:27 UTC
*ping*

Any ideas?
Comment 10 Pierre Ossman 2006-11-22 13:20:43 UTC
Some more info... When this issue is really bad (high res movies), the playback
stops for a few seconds now and then. top reveals that it is just X that is
eating CPU (~40%), but nothing in the log. Also, no complaints from mplayer
strangely enough, it just stops...
Comment 11 Michel Dänzer 2006-11-22 22:29:38 UTC
Can you try profiling this with sysprof or oprofile?
Comment 12 Pierre Ossman 2006-11-22 22:40:17 UTC
Sure, I can do that tonight.

Do you have any insight into why it is blocking at 50 fps? It seems to me like
Xv is a very fast operation that completes immediately (and doesn't wait for the
actual transfer), and throws away any frames it doesn't have time to handle.
Comment 13 Pierre Ossman 2006-11-23 13:03:55 UTC
Ok, I've now done a profile of the freeze. I.e. I did "opcontrol --start" when I
saw the freeze, and "opcontrol --stop" when it resumed. Nothing too revealing though:

CPU: PIII, speed 864.552 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit
mask of 0x00 (No unit mask) count 100000
samples  %        image name               app name                 symbol name
34652    67.1850  radeon_drv.so            radeon_drv.so            RADEONPutImage
5632     10.9196  no-vmlinux               no-vmlinux               (no symbols)
5023      9.7388  libavcodec.so.51.22.0    libavcodec.so.51.22.0    (no symbols)
1805      3.4996  .nfs00a8c06300000090     .nfs00a8c06300000090     (no symbols)
1344      2.6058  libpython2.4.so.1.0      libpython2.4.so.1.0      (no symbols)
Comment 14 Michel Dänzer 2006-11-28 06:10:20 UTC
I suspect the compiler inlines RADEONCopy*() and the time is actually spent
there. Maybe you can play around a little to confirm that.
Comment 15 Pierre Ossman 2006-11-28 13:13:30 UTC
Any suggestions on how best to do that? :)

And isn't the radeon driver using this newfangled DMA technology for Xv these
days? ;)
Comment 16 Michel Dänzer 2006-11-29 02:31:35 UTC
(In reply to comment #15)
> Any suggestions on how best to do that? :)

Some ideas:

* Make the RADEONCopy*() functions non-static and profile again.
* Print the gettimeofday() values at the beginning and end of RADEONPutImage()
and RADEONCopy*().
* ...
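
The gettimeofday() idea from the list above could look roughly like this. It is a generic sketch, not driver code: `elapsed_usec` and `time_call_usec` are hypothetical helpers, and in the driver the two samples would be taken at the top and bottom of RADEONPutImage() and RADEONCopy*() rather than around a function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Microseconds elapsed between two gettimeofday() samples. */
long long elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    assert(start != NULL && end != NULL);
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
         + (long long)(end->tv_usec - start->tv_usec);
}

/* Hypothetical helper: sample the clock before and after one call of fn
 * and return how long the call took, in microseconds. */
long long time_call_usec(void (*fn)(void *), void *arg)
{
    struct timeval t0, t1;

    assert(fn != NULL);
    gettimeofday(&t0, NULL);   /* timestamp at function entry */
    fn(arg);
    gettimeofday(&t1, NULL);   /* timestamp at function exit */
    return elapsed_usec(&t0, &t1);
}
```

Logging the two durations side by side would show whether the time attributed to RADEONPutImage() is really spent in the inlined copy routine.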

> And isn't the radeon driver using this newfangled DMA technology for Xv these
> days? ;)

grep DMA /var/log/Xorg.0.log

should say whether it does, but even if it does, it still has to copy the data
into GART memory.
Comment 17 Pierre Ossman 2006-11-29 03:41:41 UTC
(In reply to comment #16)
> 
> grep DMA /var/log/Xorg.0.log
> 
> should say whether it does, but even if it does, it still has to copy the data
> into GART memory.

Oh, so it can't do any "normal" DMA? This is a PCI card after all, so there
shouldn't be any GART involved.
Comment 18 Michel Dänzer 2006-11-29 06:57:26 UTC
(In reply to comment #17)
> 
> This is a PCI card after all, so there shouldn't be any GART involved.

The GPU needs a linear view of the source for the blit, it has an internal GART
for that. The source data could be mapped into the GART dynamically with
something like the TTM (if its base address and pitch are aligned suitably for
the GPU), but the overhead of page table and TLB manipulations might not always
make that a clear win either.
Comment 19 Pierre Ossman 2006-11-29 13:02:24 UTC
Ok, we can rule out DMA as it seems to depend on DRI, which I cannot enable when
both cards are active:

    if ( info->directRenderingEnabled && info->DMAForXv )

(in RADEONCopyMungedData())

That function is also the one eating all the CPU:

33964    74.2253  radeon_drv.so            radeon_drv.so            RADEONCopyMungedData

Not too surprising I suppose.
Comment 20 Michel Dänzer 2006-11-30 06:59:06 UTC
(In reply to comment #19)
> Ok, we can rule out DMA as it seems to depend on DRI, [...]

Yes, hence my suggestion from comment #4.

> 33964    74.2253  radeon_drv.so            radeon_drv.so            RADEONCopyMungedData
> 
> Not too surprising I suppose.

Right. You may want to double-check in /proc/mtrr that write-combining is
enabled for the framebuffer of the PCI card. If that's not the problem, another
random idea would be playing with PCI settings in the BIOS setup. Also, I'm
adding Roland Scheidegger to the CC list, he may have a pointer to a patch for
native planar YUV support, which might help somewhat.
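
The write-combining check can be scripted instead of read by eye. The sketch below assumes the /proc/mtrr line layout shown in its comment (as printed by kernels of that era) and a hypothetical function name; the framebuffer's physical address has to come from elsewhere, for example the X log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `line` (one line of /proc/mtrr) describes a write-combining
 * region that contains physical address `addr`, otherwise 0.
 * ASSUMED line layout:
 *   reg01: base=0xe8000000 (3712MB), size=  64MB: write-combining, count=1 */
int wc_region_covers(const char *line, unsigned long long addr)
{
    unsigned long long base = 0, size = 0;
    char unit = 0;

    assert(line != NULL);
    if (strstr(line, "write-combining") == NULL)
        return 0;
    /* Skip the register number and the "(...MB)" annotation, then read
     * the base address, the size, and the size unit (K or M). */
    if (sscanf(line, "reg%*d: base=0x%llx (%*[^)]), size=%llu%cB",
               &base, &size, &unit) != 3)
        return 0;
    if (unit == 'M')
        size *= 1024ULL * 1024ULL;
    else if (unit == 'K')
        size *= 1024ULL;
    else
        return 0;
    return addr >= base && addr - base < size;
}
```

Feeding each line of /proc/mtrr to this function with the PCI card's framebuffer base address shows whether any write-combining region covers it.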
Comment 21 Roland Scheidegger 2006-11-30 08:09:13 UTC
Created attachment 7927
patch for using native planar yuv format

Here's a cleaned up version of the patch. Unfortunately, it stubbornly refuses
to run correctly and has even more bugs than the previous version.
Anyway, this should give you an idea what you could expect performance-wise.
However, since it looks like writing to the framebuffer over the PCI bus is the
limiting factor, this is unlikely to make much of a difference: the bandwidth
required will only drop by roughly 25%, to your calculated 75 Mbps (the packed YUV
format the driver was using previously needs more space).
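
The "roughly 25%" figure follows from the per-pixel sizes of the two layouts. The sketch below assumes the packed format is a 16-bit-per-pixel 4:2:2 layout such as YUY2 or UYVY (the comment does not name it) and that the planar format is 12-bit-per-pixel 4:2:0; the function names are illustrative only.

```c
#include <assert.h>

/* Bytes per frame for a packed 4:2:2 format (ASSUMED: YUY2/UYVY-style,
 * 16 bits per pixel). */
long packed_422_frame_bytes(int width, int height)
{
    assert(width > 0 && height > 0);
    return (long)width * height * 2;
}

/* Bytes per frame for planar 4:2:0 (YV12/I420-style, 12 bits per pixel).
 * Even dimensions are required so the chroma planes divide evenly. */
long planar_420_frame_bytes(int width, int height)
{
    assert(width > 0 && height > 0);
    assert(width % 2 == 0 && height % 2 == 0);
    return (long)width * height * 3 / 2;
}
```

For a 608x336 frame this is 408,576 bytes packed versus 306,432 bytes planar, so the planar upload is exactly three quarters of the packed one.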
Comment 22 Roland Scheidegger 2006-11-30 10:23:06 UTC
Created attachment 7928
improved planar video patch...

works much better without stupid typos...
though moving a video window beyond screen border is still broken, but I should
be able to fix that up.
Comment 23 Pierre Ossman 2006-11-30 13:13:05 UTC
mtrr was ok, so I'm testing the patch.

Spot the error: ;)

-    left >>= 1; src_w >>= 1;
+    src_w >>= 1; left >> 1;
Comment 24 Pierre Ossman 2006-11-30 13:28:36 UTC
Patch made things slightly better, but again not very good. The hangs are gone
(at least for the test movie I have), but the tearing is still present.

The tearing is now also very funky. Since it is a planar format, I constantly
get the luma and chroma out of sync. Looks like some mushroom bearing hippies
have invaded my machine. :)

From my point of view, there are three unanswered questions (which might all be
related):

A. Why is there a bandwidth issue and where? There should be enough bandwidth
over both the PCI bus and internally on the card.

B. Why is there such a speed difference in pushing data to the AGP card (over 10
times as fast). AFAIK, the AGP bus isn't that much faster, especially on such an
old machine as this.

C. Why am I getting tearing when I have XV_DOUBLE_BUFFER set to 1? Shouldn't
there be a cache of at least one image on the card that we flip to?
Comment 25 Roland Scheidegger 2006-11-30 16:06:56 UTC
(In reply to comment #23)
> Spot the error: ;)
> 
> -    left >>= 1; src_w >>= 1;
> +    src_w >>= 1; left >> 1;
Ah, very good find. Unfortunately left is almost always 0 anyway, so it doesn't
change much. If you move the video window beyond the left screen edge it will get
bogus, and if you move it beyond the upper edge X will segfault. Moreover, if
the source is only 16-pixel aligned but not 32-pixel aligned, only garbage will be shown.
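
For readers who did not spot it: `left >> 1;` computes a shifted value and discards it, whereas `left >>= 1;` stores it back. A minimal sketch of the difference, with hypothetical function names and the variables passed by pointer so the effect is observable:

```c
#include <assert.h>
#include <stddef.h>

/* Intended behaviour of the original line: halve both values in place. */
void halve_both(int *left, int *src_w)
{
    assert(left != NULL && src_w != NULL);
    *left >>= 1;
    *src_w >>= 1;
}

/* Behaviour of the line with the typo: src_w is halved, but the shift of
 * left is a plain expression whose result is thrown away. */
void halve_with_typo(int *left, int *src_w)
{
    assert(left != NULL && src_w != NULL);
    *src_w >>= 1;
    (void)(*left >> 1);   /* no assignment: *left keeps its old value */
}
```

Starting from left = 8 and src_w = 608, the first function yields 4 and 304, while the second leaves left at 8. Compilers typically flag a bare `left >> 1;` as a statement with no effect when warnings are enabled.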
Comment 26 Roland Scheidegger 2006-11-30 16:51:46 UTC
(In reply to comment #24)
> Patch made things slightly better, but again not very good. The hangs are gone
> (at least for the test movie I have), but the tearing is still present.
> 
> The tearing is now also very funky. Since it is a planar format, I constantly
> get the luma and chroma out of sync. Looks like some mushroom bearing hippies
> have invaded my machine. :)
> 
> From my point of view, there are three unanswered questions (which might all be
> related):
> 
> A. Why is there a bandwidth issue and where? There should be enough bandwidth
> over both the PCI bus and internally on the card.
Not sure. You said that the MTRRs are correct, so there don't seem to be many
possibilities left. What sort of chipset is that? There are indeed some with
a very weak PCI implementation where you can hardly expect more than
ISA-like performance... The card itself could be the problem: there are versions with
only 64-bit SDR RAM, and while in theory the 10 MB/s you need to upload the video
isn't much, things tend to fall apart with those cards if they are too
bandwidth-limited.

> B. Why is there such a speed difference in pushing data to the AGP card (over 10
> times as fast). AFAIK, the AGP bus isn't that much faster, especially on such an
> old machine as this.
It shouldn't be that much faster, then again there are strange slowdowns with
dri with pci rv250-like chips, way beyond what you'd expect (say factor 5 or so,
even when compared to AGP 1x). That is probably a different problem though as it
seems related to the cp fetching things over the pci bus, which shouldn't be an
issue here.

> C. Why am I getting tearing when I have XV_DOUBLE_BUFFER set to 1? Shouldn't
> there be a cache of at least one image on the card that we flip to?
Not sure. Looks like it shouldn't happen. Maybe a frame could be completely
missed so you'd upload a new frame to the current buffer.
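
The scenario guessed at here can be illustrated with a toy model of a two-buffer overlay. This is not the radeon driver's logic; the struct, the function names, and the rule for choosing the upload target are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a double-buffered overlay (hypothetical, not driver code). */
typedef struct {
    int displayed;     /* buffer currently being scanned out: 0 or 1 */
    int pending_flip;  /* 1 if a flip was requested but not yet latched */
} overlay_state;

/* The client uploads to whichever buffer is not the requested front. */
int next_upload_target(const overlay_state *s)
{
    assert(s != NULL);
    int requested_front = s->pending_flip ? !s->displayed : s->displayed;
    return !requested_front;
}

/* Ask the hardware to show the other buffer at the next vertical blank. */
void request_flip(overlay_state *s)
{
    assert(s != NULL);
    s->pending_flip = 1;
}

/* Vertical blank: a pending flip takes effect. */
void vblank(overlay_state *s)
{
    assert(s != NULL);
    if (s->pending_flip) {
        s->displayed = !s->displayed;
        s->pending_flip = 0;
    }
}

/* Writing into the buffer that is still on screen shows up as tearing. */
int upload_would_tear(const overlay_state *s)
{
    return next_upload_target(s) == s->displayed;
}
```

In this model, uploading a second frame after `request_flip()` but before `vblank()` targets the buffer that is still on screen, which is one way tearing could appear even with double buffering enabled. Waiting for the flip to latch before uploading avoids it in the model; whether the real hardware behaves this way is not established in this thread.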
Comment 27 Pierre Ossman 2006-11-30 23:40:52 UTC
(In reply to comment #26)
> (In reply to comment #24)
> > A. Why is there a bandwidth issue and where? There should be enough bandwidth
> > over both the PCI bus and internally on the card.
> Not sure. You said that the MTRRs are correct, so there don't seem to be many
> possibilities left. What sort of chipset is that? There are indeed some with
> a very weak PCI implementation where you can hardly expect more than
> ISA-like performance...

I recently changed motherboards on this machine, and the problem occurred on both.
The previous board was Intel-based (unsure exactly which chipset), and the
new one is a VIA VT82-something.

> The card itself could be the problem: there are versions with
> only 64-bit SDR RAM, and while in theory the 10 MB/s you need to upload the video
> isn't much, things tend to fall apart with those cards if they are too
> bandwidth-limited.
> 

This is a Radeon 7000, which according to wikipedia is one with only 64-bits for
the memory bus.

> > B. Why is there such a speed difference in pushing data to the AGP card (over 10
> > times as fast). AFAIK, the AGP bus isn't that much faster, especially on such an
> > old machine as this.
> It shouldn't be that much faster, then again there are strange slowdowns with
> dri with pci rv250-like chips, way beyond what you'd expect (say factor 5 or so,
> even when compared to AGP 1x). That is probably a different problem though as it
> seems related to the cp fetching things over the pci bus, which shouldn't be an
> issue here.
> 

Also, DRI is disabled here.

> > C. Why am I getting tearing when I have XV_DOUBLE_BUFFER set to 1? Shouldn't
> > there be a cache of at least one image on the card that we flip to?
> Not sure. Looks like it shouldn't happen. Maybe a frame could be completely
> missed so you'd upload a new frame to the current buffer.

Any way to determine what's going on here?
Comment 28 Roland Scheidegger 2006-12-01 02:25:49 UTC
(In reply to comment #27)
> I recently changed motherboards on this machine, and the problem occurred on both.
> The previous board was Intel-based (unsure exactly which chipset), and the
> new one is a VIA VT82-something.
Ok if it happened with both that probably isn't the issue.

> > The card itself could be the problem: there are versions with
> > only 64-bit SDR RAM, and while in theory the 10 MB/s you need to upload the video
> > isn't much, things tend to fall apart with those cards if they are too
> > bandwidth-limited.
> > 
> 
> This is a Radeon 7000, which according to wikipedia is one with only 64-bits for
> the memory bus.
Sure but most of them have ddr sdram and not sdr. The log should tell you that.

> Also, DRI is disabled here.
Well, writing fb shouldn't be that slow without dma (though reading sure is).

> > > C. Why am I getting tearing when I have XV_DOUBLE_BUFFER set to 1? Shouldn't
> > > there be a cache of at least one image on the card that we flip to?
> > Not sure. Looks like it shouldn't happen. Maybe a frame could be completely
> > missed so you'd upload a new frame to the current buffer.
> 
> Any way to determine what's going on here?
Dunno.
Comment 29 Michel Dänzer 2006-12-01 07:36:41 UTC
Created attachment 7940
Wait for previous flip to take effect before uploading data

Does this patch make any difference for the tearing?

AFAIK it's quite common to get much less than the theoretical bandwidth out of
a PCI bus, especially with multiple devices accessing it concurrently.
Comment 30 Pierre Ossman 2006-12-01 08:38:04 UTC
(In reply to comment #28)
> (In reply to comment #27)
> > This is a Radeon 7000, which according to wikipedia is one with only 64-bits for
> > the memory bus.
> Sure but most of them have ddr sdram and not sdr. The log should tell you that.
> 

Ah, didn't know that. Working AGP card:

(--) RADEON(0): Mapped VideoRAM: 32768 kByte (128 bit SDR SDRAM)

Buggy PCI card:

(--) RADEON(1): Mapped VideoRAM: 65536 kByte (64 bit DDR SDRAM)
Comment 31 Pierre Ossman 2006-12-01 08:45:20 UTC
Tried the patch and the results aren't good. If anything, things got worse... :(
Comment 32 Michel Dänzer 2006-12-01 09:04:50 UTC
(In reply to comment #31)
> Tried the patch and the results aren't good. If anything, things got worse... :(

Can you be more specific? What changed?
Comment 33 Pierre Ossman 2006-12-01 09:11:14 UTC
The tearing got more frequent.
Comment 34 Pierre Ossman 2006-12-01 09:11:48 UTC
I'm currently on #xorg if you want to have more live conversation.
Comment 35 Roland Scheidegger 2006-12-01 15:46:01 UTC
After doing some more tests, writing to the framebuffer without dma probably
really is just slow. At least on this box here, when playing a full hd (h.264)
video, mplayer eats up between 40 and 70% of the cpu time, and X cpu time barely
registers with xv dma, but when not using dma cpu usage is roughly split between
mplayer and X, with framedrops, lots of tearing, and sometimes video just stops
completely for some seconds... Sounds a lot like what you're experiencing.
Granted, we're talking roughly 65MB/s here, but with AGP1x (it's an AGP4x but
without fast writes those writes only happen at AGP1x speed) that's only 1/4th
the bus limit.
(As a side note, I'm actually surprised how well those H.264 full HD videos
play. That's just with a lonely A64 at 2 GHz, no dual-core, no fancy next-gen
graphics card, though you could say it's cheating due to the downscaling the
graphics chip has to do for full HD video...)
Comment 36 Pierre Ossman 2006-12-02 14:28:23 UTC
DRI causes the machine to lock up when I use both cards, so that isn't an option
right now... I could yank out the AGP card and do some tests on just the PCI
card though.
Comment 37 Roland Scheidegger 2006-12-02 16:57:26 UTC
(In reply to comment #36)
> DRI causes the machine to lock up when I use both cards, so that isn't an option
> right now... I could yank out the AGP card and do some tests on just the PCI
> card though.

Just thought, could that be similar to this bug,
https://bugs.freedesktop.org/show_bug.cgi?id=5876 ?
Comment 38 Pierre Ossman 2006-12-03 04:27:05 UTC
Sounds similar enough. I'll try the patch on that page.
Comment 39 Pierre Ossman 2006-12-03 13:37:23 UTC
Ok, I tried the patch on that page and it solved all problems. So start reverting ;)

I guess this bug should be closed as duplicate then.
Comment 40 Roland Scheidegger 2006-12-03 15:10:52 UTC

*** This bug has been marked as a duplicate of 5876 ***

