I recently bought a vintage Radeon card, and I'm having trouble getting decent video from it. With Xv, I get severe tearing. The card is a PCI Radeon 7000. The same machine also has an AGP Radeon 7200 which works fine, so I suspect it's a bandwidth issue (either on-card memory, or the bus). I noticed something very strange when I tried a test program I hacked together. The program tries to output two alternating Xv images as fast as possible. On the 7200, the images are consumed at a very high rate, but on the 7000 they go at a steady 50 fps. That is very strange, considering the card claims to be running the monitor at 60 Hz and Xv should not even be waiting for the refresh. Any ideas on what might be wrong and what I can do about it?
Please attach config and log files.
Created attachment 7747 [details] xorg.conf
Created attachment 7748 [details] Xorg.0.log
(In reply to comment #0)
> The card is a PCI Radeon 7000. The same machine also has an AGP Radeon 7200
> which works fine, so I'm suspecting it's a bandwidth issue (either on-card
> memory, or the bus).
Yeah, depending on the resolution of the video, the PCI bus might be the bottleneck. Can you try starting X on the PCI card only so the DRI gets enabled and see if that makes any difference?
> I noticed something very strange when I tried a test program I hacked together.
> The program tries to output two alternating Xv images as fast as possible.
Can you attach the test program?
BTW, it looks like your driver is patched, does this also happen with the stock driver?
(In reply to comment #4)
> Yeah, depending on the resolution of the video, the PCI bus might be the
> bottleneck.
Shouldn't it just be losing frames at that point? It's the tearing that is driving me nuts.
> Can you try starting X on the PCI card only so the DRI gets enabled
> and see if that makes any difference?
The machine crashes when I try to enable DRI on the PCI card. I'll see if letting BIOS boot the PCI card can get it running.
> Can you attach the test program?
Will do.
> BTW, it looks like your driver is patched, does this also happen with the stock
> driver?
Yes it does. The only patch I have is the [in]famous TV output patch for radeon.
Created attachment 7753 [details] Test program
This is a bit crude, so you need to set autopaint color key.
Btw, should I be shifting back to ASSIGNED myself when I think I've provided you with what you need or do you want to do that yourself? :)
Ok, using just the PCI card made things a bit better, but still not very good. One test video is 608x336 at 30 fps and mplayer claims to be using YV12 for it. So by my calculations, this should result in about 74 Mbps of bandwidth. That's not even a tenth of the theoretical PCI bandwidth...
*ping* Any ideas?
Some more info... When this issue is really bad (high res movies), the playback stops for a few seconds now and then. top reveals that it is just X that is eating CPU (~40%), but nothing in the log. Also, no complaints from mplayer strangely enough, it just stops...
Can you try profiling this with sysprof or oprofile?
Sure, I can do that tonight. Do you have any insight into why it is blocking at 50 fps? It seems to me like Xv is a very fast operation that completes immediately (and doesn't wait for the actual transfer), and throws away any frames it doesn't have time to handle.
Ok, I've now done a profile of the freeze, i.e. I did "opcontrol --start" when I saw the freeze and "opcontrol --stop" when it resumed. Nothing too revealing though:
CPU: PIII, speed 864.552 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name             app name               symbol name
34652    67.1850  radeon_drv.so          radeon_drv.so          RADEONPutImage
5632     10.9196  no-vmlinux             no-vmlinux             (no symbols)
5023      9.7388  libavcodec.so.51.22.0  libavcodec.so.51.22.0  (no symbols)
1805      3.4996  .nfs00a8c06300000090   .nfs00a8c06300000090   (no symbols)
1344      2.6058  libpython2.4.so.1.0    libpython2.4.so.1.0    (no symbols)
I suspect the compiler inlines RADEONCopy*() and the time is actually spent there. Maybe you can play around a little to confirm that.
Any suggestions on how best to do that? :) And isn't the radeon driver using this newfangled DMA technology for Xv these days? ;)
(In reply to comment #15)
> Any suggestions on how best to do that? :)
Some ideas:
* Make the RADEONCopy*() functions non-static and profile again.
* Print the gettimeofday() values at the beginning and end of RADEONPutImage() and RADEONCopy*().
* ...
> And isn't the radeon driver using this newfangled DMA technology for Xv these
> days? ;)
grep DMA /var/log/Xorg.0.log
should say whether it does, but even if it does, it still has to copy the data into GART memory.
(In reply to comment #16)
> grep DMA /var/log/Xorg.0.log
>
> should say whether it does, but even if it does, it still has to copy the data
> into GART memory.
Oh, so it can't do any "normal" DMA? This is a PCI card after all, so there shouldn't be any GART involved.
(In reply to comment #17)
> This is a PCI card after all, so there shouldn't be any GART involved.
The GPU needs a linear view of the source for the blit, it has an internal GART for that. The source data could be mapped into the GART dynamically with something like the TTM (if its base address and pitch are aligned suitably for the GPU), but the overhead of page table and TLB manipulations might not always make that a clear win either.
Ok, we can rule out DMA as it seems to depend on DRI, which I cannot enable when both cards are active:
if ( info->directRenderingEnabled && info->DMAForXv )
(in RADEONCopyMungedData())
That function is also the one eating all the CPU:
33964    74.2253  radeon_drv.so          radeon_drv.so          RADEONCopyMungedData
Not too surprising I suppose.
(In reply to comment #19)
> Ok, we can rule out DMA as it seems to depend on DRI, [...]
Yes, hence my suggestion from comment #4.
> 33964 74.2253 radeon_drv.so radeon_drv.so
> RADEONCopyMungedData
>
> Not too surprising I suppose.
Right. You may want to double-check in /proc/mtrr that write-combining is enabled for the framebuffer of the PCI card. If that's not the problem, another random idea would be playing with PCI settings in the BIOS setup.
Also, I'm adding Roland Scheidegger to the CC list, he may have a pointer to a patch for native planar YUV support, which might help somewhat.
Created attachment 7927 [details] [review] patch for using native planar yuv format
Here's a cleaned up version of the patch. Unfortunately, it stubbornly refuses to run correctly and has even more bugs than the previous version. Anyway, this should give you an idea what you could expect performance-wise. However, since it looks like writing to the fb over the PCI bus is the limiting factor, this is unlikely to make much of a difference - bandwidth required will only drop roughly 25% to your calculated 75 Mbps (the packed yuv the driver was using previously needs more space).
Created attachment 7928 [details] [review] improved planar video patch...
works much better without stupid typos... though moving a video window beyond screen border is still broken, but I should be able to fix that up.
mtrr was ok, so I'm testing the patch. Spot the error: ;)
- left >>= 1; src_w >>= 1;
+ src_w >>= 1; left >> 1;
Patch made things slightly better, but again not very good. The hangs are gone (at least for the test movie I have), but the tearing is still present.
The tearing is now also very funky. Since it is a planar format, I constantly get the luma and chroma out of sync. Looks like some mushroom bearing hippies have invaded my machine. :)
From my point of view, there are three unanswered questions (which might all be related):
A. Why is there a bandwidth issue and where? There should be enough bandwidth over both the PCI bus and internally on the card.
B. Why is there such a speed difference in pushing data to the AGP card (over 10 times as fast)? AFAIK, the AGP bus isn't that much faster, especially on such an old machine as this.
C. Why am I getting tearing when I have XV_DOUBLE_BUFFER set to 1? Shouldn't there be a cache of at least one image on the card that we flip to?
(In reply to comment #23)
> Spot the error: ;)
>
> - left >>= 1; src_w >>= 1;
> + src_w >>= 1; left >> 1;
Ah, very good find. Unfortunately left is typically 0 anyway, so it doesn't change much. If you move the video window beyond the left screen edge it will get bogus, and if you move it beyond the upper edge X will finally segfault. Moreover, if the source is only 16-pixel aligned but not 32-pixel aligned, only garbage will be shown.
(In reply to comment #24)
> Patch made things slightly better, but again not very good. The hangs are gone
> (at least for the test movie I have), but the tearing is still present.
>
> The tearing is now also very funky. Since it is a planar format, I constantly
> get the luma and chroma out of sync. Looks like some mushroom bearing hippies
> have invaded my machine. :)
>
> From my point of view, there are three unanswered questions (which might all be
> related):
>
> A. Why is there a bandwidth issue and where? There should be enough bandwidth
> over both the PCI bus and internally on the card.
Not sure. You said that the MTRRs are correct, so there don't seem to be many possibilities left. What sort of chipset is that? There are indeed some with a very weak PCI implementation where you can hardly expect more than ISA-like performance... The card could be problematic too: there exist versions with only 64-bit SDR RAM. While in theory the 10 MB/s you need to upload the video isn't really much, things tend to fall apart with those cards if they are too bandwidth-limited.
> B. Why is there such a speed difference in pushing data to the AGP card (over 10
> times as fast). AFAIK, the AGP bus isn't that much faster, especially on such an
> old machine as this.
It shouldn't be that much faster. Then again, there are strange slowdowns with DRI on PCI rv250-like chips, way beyond what you'd expect (say a factor of 5 or so, even when compared to AGP 1x). That is probably a different problem though, as it seems related to the CP fetching things over the PCI bus, which shouldn't be an issue here.
> C. Why am I getting tearing when I have XV_DOUBLE_BUFFER set to 1? Shouldn't
> there be a cache of at least one image on the card that we flip to?
Not sure. Looks like it shouldn't happen. Maybe a frame could be completely missed, so you'd upload a new frame to the buffer that is currently being displayed.
(In reply to comment #26)
> (In reply to comment #24)
> > A. Why is there a bandwidth issue and where? There should be enough bandwidth
> > over both the PCI bus and internally on the card.
> Not sure. You said that mtrr are correct, so there doesn't seem to be many
> possibilities left. What sort of chipset is that? There exist indeed some which
> have a very weak pci implementation where you can hardly expect more than
> ISA-like performance...
I recently changed motherboards on this machine, and the problem was on both. The previous board was Intel-based (unsure exactly which chipset), and the new one is a VIA VT82-something.
> The card could be problematic, there exist versions with
> only 64bit sdr ram, while in theory those 10MB/s you need to upload the video
> aren't really much things tend to fall apart with those cards if they are too
> bandwidth-limited.
This is a Radeon 7000, which according to Wikipedia is one with only 64 bits for the memory bus.
> > B. Why is there such a speed difference in pushing data to the AGP card (over 10
> > times as fast). AFAIK, the AGP bus isn't that much faster, especially on such an
> > old machine as this.
> It shouldn't be that much faster, then again there are strange slowdowns with
> dri with pci rv250-like chips, way beyond what you'd expect (say factor 5 or so,
> even when compared to AGP 1x). That is probably a different problem though as it
> seems related to the cp fetching things over the pci bus, which shouldn't be an
> issue here.
Also, DRI is disabled here.
> > C. Why am I getting tearing when I have XV_DOUBLE_BUFFER set to 1? Shouldn't
> > there be a cache of at least one image on the card that we flip to?
> Not sure. Looks like it shouldn't happen. Maybe a frame could be completely
> missed so you'd upload a new frame to the current buffer.
Any way to determine what's going on here?
(In reply to comment #27)
> I recently changed motherboards on this machine, and the problem was on both.
> The previous board was an Intel based (unsure exactly which chipset), and the
> new is a VIA VT82-something.
Ok, if it happened with both, that probably isn't the issue.
> > The card could be problematic, there exist versions with
> > only 64bit sdr ram, while in theory those 10MB/s you need to upload the video
> > aren't really much things tend to fall apart with those cards if they are too
> > bandwidth-limited.
>
> This is a Radeon 7000, which according to wikipedia is one with only 64-bits for
> the memory bus.
Sure, but most of them have DDR SDRAM and not SDR. The log should tell you that.
> Also, DRI is disabled here.
Well, writing to the fb shouldn't be that slow without DMA (though reading sure is).
> > > C. Why am I getting tearing when I have XV_DOUBLE_BUFFER set to 1? Shouldn't
> > > there be a cache of at least one image on the card that we flip to?
> > Not sure. Looks like it shouldn't happen. Maybe a frame could be completely
> > missed so you'd upload a new frame to the current buffer.
>
> Any way to determine what's going on here?
Dunno.
Created attachment 7940 [details] [review] Wait for previous flip to take effect before uploading data
Does this patch make any difference for the tearing?
AFAIK it's quite common to get much less than the theoretical bandwidth out of a PCI bus, especially with multiple devices accessing it concurrently.
(In reply to comment #28)
> (In reply to comment #27)
> > This is a Radeon 7000, which according to wikipedia is one with only 64-bits for
> > the memory bus.
> Sure but most of them have ddr sdram and not sdr. The log should tell you that.
Ah, didn't know that.
Working AGP card:
(--) RADEON(0): Mapped VideoRAM: 32768 kByte (128 bit SDR SDRAM)
Buggy PCI card:
(--) RADEON(1): Mapped VideoRAM: 65536 kByte (64 bit DDR SDRAM)
Tried the patch and the results aren't good. If anything, things got worse... :(
(In reply to comment #31)
> Tried the patch and the results aren't good. If anything, things got worse... :(
Can you be more specific? What changed?
The tearing got more frequent.
I'm currently on #xorg if you want to have more live conversation.
After doing some more tests, writing to the framebuffer without DMA probably really is just slow. At least on this box here, when playing a full HD (H.264) video with Xv DMA, mplayer eats up between 40 and 70% of the CPU time and X CPU time barely registers. Without DMA, CPU usage is roughly split between mplayer and X, with frame drops, lots of tearing, and sometimes the video just stops completely for some seconds... Sounds a lot like what you're experiencing. Granted, we're talking roughly 65 MB/s here, but with AGP 1x (it's an AGP 4x card, but without fast writes those writes only happen at AGP 1x speed) that's only 1/4 of the bus limit. (As a side note, I'm actually surprised how well those H.264 full HD videos play. That's just with a lonely A64 at 2 GHz, no dual-core, no fancy next-gen graphics card - though you could say it's cheating due to the downscaling the graphics chip has to do for full HD video...)
DRI causes the machine to lock up when I use both cards, so that isn't an option right now... I could yank out the AGP card and do some tests on just the PCI card though.
(In reply to comment #36)
> DRI causes the machine to lock up when I use both cards, so that isn't an option
> right now... I could yank out the AGP card and do some tests on just the PCI
> card though.
Just a thought: could that be similar to this bug, https://bugs.freedesktop.org/show_bug.cgi?id=5876 ?
Sounds similar enough. I'll try the patch on that page.
Ok, I tried the patch on that page and it solved all problems. So start reverting ;) I guess this bug should be closed as duplicate then.
*** This bug has been marked as a duplicate of 5876 ***