Summary: | RFE: Implement X{Get,Put}Image acceleration for EXA | ||
---|---|---|---|
Product: | xorg | Reporter: | Roland Mainz <roland.mainz> |
Component: | Server/Acceleration/EXA | Assignee: | Eric Anholt <eric> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | enhancement | ||
Priority: | high | CC: | alexdeucher, aplattner, aritger, leio, michel, roland.mainz, thomas |
Version: | unspecified | ||
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Description
Roland Mainz
2005-04-11 19:33:41 UTC
this should properly be done in the EXA layer, utilizing the DownloadFromScreen hook. the reverse operation should also be performed for XPutImage with UploadToScreen. Created attachment 3485 [details] [review] exa-accelerate-getputimage-2.patch work in progress, but this is the basic idea. taking Preliminary testing with the radeon driver and the -7 patch: The only big performance gains seem to be for the XY primitives, but the x11perf XY PutImage tests look incorrect. Some games like circuslinux also look incorrect, looks like some offset is taken into account incorrectly or not at all. michel, what hardware were your tests done with? was the patch at least no slower than without? (In reply to comment #5) > michel, what hardware were your tests done with? Titanium PowerBook IV with a Mobility 9000. > was the patch at least no slower than without? Only GetImage 100x100 and 500x500 were slightly slower, and only with MMIO. So it's still a win overall, just not as much as I'd hoped. Are you able to reproduce the incorrect rendering? Created attachment 4342 [details] [review] exa-accelerate-getputimage-7.patch just for checkpointing where i was at with this. was also available in my ~ but wanted to post it here too. i'll be taking another pass at this today. Hi! Thanks for implementing this. I have tested it on a Unichrome K8M800 Athlon64 which is particularly slow for system <-> framebuffer transfers, so this acceleration is very welcome. When HW accs kicks in, I get approx 4x speedup on uploads, more details to follow. First some comments on the patch: 1) The test (planeMask == -1) fails on 64 bit architectures where planeMask is an unsigned long, (and perhaps 16 bit depths?). I had to do ((planeMask & depthMask) == depthMask) where depthMask = (1 << bitsPerPixel) -1. This test precedes both uploads and downloads. 2) For uploads, I had to add xi += pDrawable->x and yi = pDrawable->y to have uploaded pictures end up at the correct location. This seems correct as it is done also in the fallback. Is this needed also for download? 3) Sync after accelerated EXA copy. 4) Driver downloads may fail due to alignment violations. When this happens, it seems like exaGetImage does an accel copy to an offscreen pixmap and retries the download. This is good, since the rest of EXA seems to only be doing downloads from already aligned pixmaps, and this fix would otherwise have to be done in the driver. 5) At the FIME: point in upload, wouldn't it be sufficient to just set pScratch->devPrivate.ptr = bits; (and perhaps pScratch->devKind = pitch) to have exaPixmapUseScreen() do the job in the first run, and skip the subsequent UTS? 6) shmPutImage? /Thomas (In reply to comment #8) > When HW accs kicks in, I get approx 4x speedup on uploads, more details to > follow. First some comments on the patch: Nice! How does GetImage compare? > 1) The test (planeMask == -1) fails on 64 bit architectures where planeMask is > an unsigned long, (and perhaps 16 bit depths?). I had to do ((planeMask & > depthMask) == depthMask) where depthMask = (1 << bitsPerPixel) -1. This test > precedes both uploads and downloads. Good catch. > 2) For uploads, I had to add xi += pDrawable->x and yi = pDrawable->y to have > uploaded pictures end up at the correct location. This seems correct as it is > done also in the fallback. Is this needed also for download? I would assume this is needed in both directions, yes. I get lost in the maze of offsets, so whatever makes xwd look correct I suppose. > 3) Sync after accelerated EXA copy. Right. > 5) At the FIME: point in upload, wouldn't it be sufficient to just set > pScratch->devPrivate.ptr = bits; (and perhaps > pScratch->devKind = pitch) > to have exaPixmapUseScreen() do the job in the first run, and skip the > subsequent UTS? That comment was written before anholt fixed dirty tracking for this case, so the code should be correct as written now. http://lists.freedesktop.org/archives/xorg-commit/2005-October/005535.html > 6) shmPutImage? ShmPutImage is implemented pretty strangely, iirc we only use the fbShmPutImage version, which goes through the ->CopyArea path which just totally overkills things. There is the ShmRegisterFuncs() hook, which exa could use to hook in, but it's not exported from the server yet. Still probably a good thing to look into, looks like xgl has an example of how to do that. > 2) For uploads, I had to add xi += pDrawable->x and yi = pDrawable->y to have
> uploaded pictures end up at the correct location. This seems correct as it is
> done also in the fallback. Is this needed also for download?
I tested the patch unchanged and the only elements that don't get drawn at the
correct position are flash applets in firefox everything else on the desktop
works fine. (Tested on a radeon r300 based card)
Thomas, could you attach a patch of your current code for testing? Thanks. Created attachment 4434 [details]
Timings on unichrome K8M800 / AMD64
Unichrome K8M800 EXA upload / download timings. Note that this is on an AMD64
which is particularly slow for these types of transfer. Upload is actually
implemented by pipelining a memcopy to AGP and texture from there to
framebuffer, but this only gives an advantage on fast processors (not via C3).
Download is implemented using PCI SG DMA.
Upload turned out to be approx 3X faster than without the patch.
Download about 20X faster.
/Thomas
Created attachment 4435 [details] [review] Adam's patch 4342 with some minor fixes applied. The patch I used for the test. Basically identical to 4342 with some of the above fixes hacked in. Made against monolithic exa dir. I still get some rendering errors with some fallback paths, but this might be due to Unichrome's planemask problems? Haven't had time to look into it further. /Thomas Created attachment 4436 [details]
Timings on a Radeon 9500pro/AthlonXP 1600+
(In reply to comment #14) > Created an attachment (id=4436) [edit] > Timings on a Radeon 9500pro/AthlonXP 1600+ it should be noted that since the radeon driver just uses memcpy to implement DFS, the lack of change in GetImage numbers is totally expected. Created attachment 4871 [details] [review] Merge -7 to current EXA FWIW, I've been using the -7 patch for a while and haven't noticed any issues. Here's my merge of it to current EXA. Only compile-tested so far. The patch also removes two unused variables and fixes the move-in condition in EnableDisableFBAccess. I've just committed a patch for EXA to do GetImage acceleration for the only case (bitsPerPixel >= 8, ZPixmap, pm is like FB_ALLONES) that I think matters, based slightly on the attachment #4871 [details] [review]. I'm still interested in pulling out the interesting case of PutImage, and looking at the SHM questions. I'm really not interested in the XYPixmap (for example) support, particuarly given how many bugs I've noticed in it so far. I'll go ahead and take this bug as maintainer, at this point. And now I've committed some PutImage acceleration. It's a similar limited set of operations accelerated, but I think they happen to be the ones we care about. So, I think "Implement X{Get,Put}Image acceleration for EXA" is done at this point and I'll mark it resolved. While the original request to have this done for nv isn't complete, I think this bug has been sufficiently hijacked that nobody should want to continue talking about NV in here, and would just open a new bug to deal with actual nv exa-fication. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.