Bug 2986 - RFE: Implement X{Get,Put}Image acceleration for EXA
Summary: RFE: Implement X{Get,Put}Image acceleration for EXA
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/Acceleration/EXA (show other bugs)
Version: unspecified
Hardware: x86 (IA32) Linux (All)
: high enhancement
Assignee: Eric Anholt
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-04-11 19:33 UTC by Roland Mainz
Modified: 2006-03-29 21:27 UTC (History)
7 users (show)

See Also:
i915 platform:
i915 features:


Attachments
exa-accelerate-getputimage-2.patch (4.52 KB, patch)
2005-10-04 15:56 UTC, Adam Jackson
no flags Details | Splinter Review
exa-accelerate-getputimage-7.patch (8.32 KB, patch)
2006-01-14 07:25 UTC, Adam Jackson
no flags Details | Splinter Review
Timings on unichrome K8M800 / AMD64 (2.66 KB, text/plain)
2006-01-23 10:46 UTC, Thomas Hellström
no flags Details
Adam's patch 4342 with some minor fixes applied. (8.46 KB, patch)
2006-01-23 10:50 UTC, Thomas Hellström
no flags Details | Splinter Review
Timings on a Radeon 9500pro/AthlonXP 1600+ (2.50 KB, text/plain)
2006-01-23 13:33 UTC, Dennis Jacobfeuerborn
no flags Details
Merge -7 to current EXA (10.71 KB, patch)
2006-03-10 02:14 UTC, Michel Dänzer
no flags Details | Splinter Review

Description Roland Mainz 2005-04-11 19:33:41 UTC
RFE: Implement accerlation for |XGetImage()| requests for the opensource "nv"
driver.
It may be a nice improvement for some applications to have XGetImage()-requests
accerlated like it is (AFAIK) currently done in the closed-source "nvidia" driver.
Comment 1 Adam Jackson 2005-09-27 23:43:35 UTC
this should properly be done in the EXA layer, utilizing the DownloadFromScreen
hook.  the reverse operation should also be performed for XPutImage with
UploadToScreen.
Comment 2 Adam Jackson 2005-10-04 15:56:09 UTC
Created attachment 3485 [details] [review]
exa-accelerate-getputimage-2.patch

work in progress, but this is the basic idea.
Comment 3 Adam Jackson 2005-11-08 09:29:29 UTC
taking
Comment 4 Michel Dänzer 2005-11-17 01:00:53 UTC
Preliminary testing with the radeon driver and the -7 patch: The only big
performance gains seem to be for the XY primitives, but the x11perf XY PutImage
tests look incorrect. Some games like circuslinux also look incorrect, looks
like some offset is taken into account incorrectly or not at all.
Comment 5 Adam Jackson 2005-12-01 09:29:25 UTC
michel, what hardware were your tests done with?  was the patch at least no
slower than without?
Comment 6 Michel Dänzer 2005-12-02 01:47:55 UTC
(In reply to comment #5)
> michel, what hardware were your tests done with?  

Titanium PowerBook IV with a Mobility 9000.

> was the patch at least no slower than without?

Only GetImage 100x100 and 500x500 were slightly slower, and only with MMIO. So
it's still a win overall, just not as much as I'd hoped.

Are you able to reproduce the incorrect rendering?
Comment 7 Adam Jackson 2006-01-14 07:25:02 UTC
Created attachment 4342 [details] [review]
exa-accelerate-getputimage-7.patch

just for checkpointing where i was at with this.  was also available in my ~
but wanted to post it here too.

i'll be taking another pass at this today.
Comment 8 Thomas Hellström 2006-01-16 19:59:20 UTC
Hi!

Thanks for implementing this. I have tested it on a Unichrome K8M800 Athlon64
which is particularly slow for system <-> framebuffer transfers, so this
acceleration is very welcome.

When HW accs kicks in, I get approx 4x speedup on uploads, more details to
follow. First some comments on the patch:

1) The test (planeMask == -1) fails on 64 bit architectures where planeMask is
an unsigned long, (and perhaps 16 bit depths?). I had to do ((planeMask &
depthMask) == depthMask) where depthMask = (1 << bitsPerPixel) -1. This test
precedes both uploads and downloads.

2) For uploads, I had to add xi += pDrawable->x and yi = pDrawable->y to have
uploaded pictures end up at the correct location. This seems correct as it is
done also in the fallback. Is this needed also for download?

3) Sync after accelerated EXA copy.

4) Driver downloads may fail due to alignment violations. When this happens, it
seems like exaGetImage does an accel copy to an offscreen pixmap and retries the
download. This is good, since the rest of EXA seems to only be doing downloads
from already aligned pixmaps, and this fix would otherwise have to be done in
the driver.

5) At the FIME: point in upload, wouldn't it be sufficient to just set 
pScratch->devPrivate.ptr = bits; (and perhaps
pScratch->devKind = pitch)
to have exaPixmapUseScreen() do the job in the first run, and skip the
subsequent UTS? 

6) shmPutImage?

/Thomas
Comment 9 Adam Jackson 2006-01-21 07:38:48 UTC
(In reply to comment #8)
> When HW accs kicks in, I get approx 4x speedup on uploads, more details to
> follow. First some comments on the patch:

Nice!  How does GetImage compare?

> 1) The test (planeMask == -1) fails on 64 bit architectures where planeMask is
> an unsigned long, (and perhaps 16 bit depths?). I had to do ((planeMask &
> depthMask) == depthMask) where depthMask = (1 << bitsPerPixel) -1. This test
> precedes both uploads and downloads.

Good catch.

> 2) For uploads, I had to add xi += pDrawable->x and yi = pDrawable->y to have
> uploaded pictures end up at the correct location. This seems correct as it is
> done also in the fallback. Is this needed also for download?

I would assume this is needed in both directions, yes.  I get lost in the maze
of offsets, so whatever makes xwd look correct I suppose.

> 3) Sync after accelerated EXA copy.

Right.

> 5) At the FIME: point in upload, wouldn't it be sufficient to just set 
> pScratch->devPrivate.ptr = bits; (and perhaps
> pScratch->devKind = pitch)
> to have exaPixmapUseScreen() do the job in the first run, and skip the
> subsequent UTS? 

That comment was written before anholt fixed dirty tracking for this case, so
the code should be correct as written now.

http://lists.freedesktop.org/archives/xorg-commit/2005-October/005535.html

> 6) shmPutImage?

ShmPutImage is implemented pretty strangely, iirc we only use the fbShmPutImage
version, which goes through the ->CopyArea path which just totally overkills
things.  There is the ShmRegisterFuncs() hook, which exa could use to hook in,
but it's not exported from the server yet.  Still probably a good thing to look
into, looks like xgl has an example of how to do that.
Comment 10 Dennis Jacobfeuerborn 2006-01-22 09:50:16 UTC
> 2) For uploads, I had to add xi += pDrawable->x and yi = pDrawable->y to have
> uploaded pictures end up at the correct location. This seems correct as it is
> done also in the fallback. Is this needed also for download?

I tested the patch unchanged and the only elements that don't get drawn at the
correct position are flash applets in firefox everything else on the desktop
works fine. (Tested on a radeon r300 based card)
Comment 11 Michel Dänzer 2006-01-23 00:47:03 UTC
Thomas, could you attach a patch of your current code for testing? Thanks.
Comment 12 Thomas Hellström 2006-01-23 10:46:28 UTC
Created attachment 4434 [details]
Timings on unichrome K8M800 / AMD64

Unichrome K8M800 EXA upload / download timings. Note that this is on an AMD64
which is particularly slow for these types of transfer. Upload is actually
implemented by pipelining a memcopy to AGP and texture from there to
framebuffer, but this only gives an advantage on fast processors (not via C3).
Download is implemented using PCI SG DMA.

Upload turned out to be approx 3X faster than without the patch.
Download about 20X faster.

/Thomas
Comment 13 Thomas Hellström 2006-01-23 10:50:30 UTC
Created attachment 4435 [details] [review]
Adam's patch 4342 with some minor fixes applied.

The patch I used for the test. Basically identical to 4342 with some of the
above fixes hacked in. Made against monolithic exa dir.
I still get some rendering errors with some fallback paths, but this might be
due to Unichrome's planemask problems? Haven't had time to look into it
further.

/Thomas
Comment 14 Dennis Jacobfeuerborn 2006-01-23 13:33:59 UTC
Created attachment 4436 [details]
Timings on a Radeon 9500pro/AthlonXP 1600+
Comment 15 Adam Jackson 2006-03-07 08:36:33 UTC
(In reply to comment #14)
> Created an attachment (id=4436) [edit]
> Timings on a Radeon 9500pro/AthlonXP 1600+

it should be noted that since the radeon driver just uses memcpy to implement
DFS, the lack of change in GetImage numbers is totally expected.
Comment 16 Michel Dänzer 2006-03-10 02:14:46 UTC
Created attachment 4871 [details] [review]
Merge -7 to current EXA

FWIW, I've been using the -7 patch for a while and haven't noticed any issues.
Here's my merge of it to current EXA. Only compile-tested so far. The patch
also removes two unused variables and fixes the move-in condition in
EnableDisableFBAccess.
Comment 17 Eric Anholt 2006-03-30 08:31:01 UTC
I've just committed a patch for EXA to do GetImage acceleration for the only
case (bitsPerPixel >= 8, ZPixmap, pm is like FB_ALLONES) that I think matters,
based slightly on the attachment #4871 [details] [review].  I'm still interested in pulling out the
interesting case of PutImage, and looking at the SHM questions.

I'm really not interested in the XYPixmap (for example) support, particuarly
given how many bugs I've noticed in it so far.  I'll go ahead and take this bug
as maintainer, at this point.
Comment 18 Eric Anholt 2006-03-30 15:27:59 UTC
And now I've committed some PutImage acceleration.  It's a similar limited set
of operations accelerated, but I think they happen to be the ones we care about.

So, I think "Implement X{Get,Put}Image acceleration for EXA" is done at this
point and I'll mark it resolved.  While the original request to have this done
for nv isn't complete, I think this bug has been sufficiently hijacked that
nobody should want to continue talking about NV in here, and would just open a
new bug to deal with actual nv exa-fication.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.