After trying various demos in Firefox 4 and Chromium, Firefox seems to lag behind a lot of the time. Mozilla devs are claiming this is an issue in the radeon driver; see https://bugzilla.mozilla.org/show_bug.cgi?id=620065 for reference. I don't see how this makes much sense, since the driver is the same in both cases. Anyone care to comment?
lspci:
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Radeon Mobility X1400 [1002:7145] (prog-if 00 [VGA controller])
	Subsystem: Dell Device [1028:2003]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 43
	Region 0: Memory at d0000000 (32-bit, prefetchable) [size=256M]
	Region 1: I/O ports at ee00 [size=256]
	Region 2: Memory at efdf0000 (32-bit, non-prefetchable) [size=64K]
	[virtual] Expansion ROM at efd00000 [disabled] [size=128K]
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v1) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE- FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee0100c  Data: 4189
	Kernel driver in use: radeon
	Kernel modules: radeon
It's not clear to me whether this is about Firefox using OpenGL or XRender. If the bad performance is accompanied by high CPU usage, a profile from sysprof or oprofile might be illuminating.
Created attachment 44756 [details]
sysprof output for HWACCEL demo

Xrender. Attaching sysprof from HWACCEL run (http://demos.hacks.mozilla.org/openweb/HWACCEL/).
Created attachment 44757 [details]
sysprof output for fluidSim demo

fluidSim demo: http://nerget.com/fluidSim/
Created attachment 44758 [details]
sysprof output for jsnes emulator

jsnes emulator: http://benfirshman.com/projects/jsnes/
(In reply to comment #3)
> Xrender. Attaching sysprof from HWACCEL run

Indeed, this profile shows it hitting a software fallback for the XRender Composite operation. If you can rebuild the driver with RADEON_TRACE_FALL defined to 1 in src/radeon_exa_shared.h, the debugging messages enabled by that might give an idea.

The other profiles show most of the CPU usage in the Firefox process, so they may be separate problems elsewhere.
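For reference, the change is just flipping the existing define in that header (a minimal sketch of the edit, not a patch against a specific driver version):

/* src/radeon_exa_shared.h: turn on fallback tracing so each software
   fallback logs its reason to Xorg.0.log */
#define RADEON_TRACE_FALL 1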
Created attachment 44764 [details]
sysprof output for HWACCEL demo on Chromium

Okay, I'll try rebuilding the xorg-x11-drv-ati package (I'm on Fedora 14, if it makes any difference). Where should I look for the traces afterwards?

Meanwhile, attaching sysprof output from running the HWACCEL demo on Chromium - it runs much faster, maybe there are some clues there...
Got xorg-x11-drv-ati-6.13.1-0.4.20100705git37b348059.fc14.src.rpm -- but I don't see radeon_exa_shared.h in there. Is it just way too old? If I need to build/install from Git, is there a quick FAQ on how to get started?
Created attachment 44765 [details]
sysprof output for HWACCEL demo on bleeding edge driver

I've built and installed the bleeding edge ati driver, attached is the sysprof output for it. It runs faster (6FPS vs. previous 1FPS) but still much slower than Chromium (which runs at 10FPS, regardless - guess they use internal sw renderer after all)

Note: this sysprof was taken with #define RADEON_TRACE_FALL 0
Created attachment 44766 [details]
Xorg.0.log for HWACCEL demo on bleeding edge driver with tracing enabled

Ran the HWACCEL demo twice after logging in.
(In reply to comment #9)
> I've built and installed the bleeding edge ati driver, attached is the sysprof
> output for it. It runs faster (6FPS vs. previous 1FPS) but still much slower
> than Chromium (which runs at 10FPS, regardless - guess they use internal sw
> renderer after all)

Yes, Chromium doesn't use XRender; that's why it's fast.
I can only see two causes for fallbacks in the log file:

R300CheckComposite: Component alpha not supported with source alpha and source value blending.
R300CheckComposite: Source w/h too large (640,7760).

The former is usually related to sub-pixel text anti-aliasing and is harmless, as EXA can still accelerate that well in two passes. The latter shows a source picture being higher than the maximum supported by your GPU (4096).

Though I suspect that even if this gets fixed in Firefox or the test, there might be more issues down the road.

P.S. This test also runs very slowly (about 1 FPS) here on a Mac Pro running Mac OS X...
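For anyone following along, the fast-reject path behind that message has roughly the following shape in an EXA driver. This is a minimal sketch, not the actual radeon code; the function name and MAX_TEX_SIZE are placeholders (4096 stands in for the per-chip texture limit):

#include "picturestr.h"
#include "exa.h"

#define MAX_TEX_SIZE 4096   /* placeholder for the per-chip texture limit */

static Bool
ExampleCheckComposite(int op, PicturePtr pSrcPicture,
                      PicturePtr pMaskPicture, PicturePtr pDstPicture)
{
    /* Reject source pictures the 3D engine can't sample as textures;
       EXA then performs the whole composite operation in software. */
    if (pSrcPicture->pDrawable &&
        (pSrcPicture->pDrawable->width > MAX_TEX_SIZE ||
         pSrcPicture->pDrawable->height > MAX_TEX_SIZE))
        return FALSE;

    return TRUE;
}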
(In reply to comment #12)
> The latter shows a source picture being higher than the maximum supported by
> your GPU (4096).

So if I understand this correctly, Firefox asks the hardware to render an image way off screen? How come there isn't some fast rejection in place to clip the part that won't be visible on screen anyway?

> Though I suspect that even if this gets fixed in Firefox or
> the test, there might be more issues down the road.

Are you saying that this is a bug in Firefox? Because they claim it's a buggy XRender implementation...

> P.S. This test also runs very slowly (about 1 FPS) here on a Mac Pro running
> Mac OS X...

So, since other lemmings are jumping off a cliff... :)

I think that if Chromium is able to run this demo at 10FPS with a userland software renderer, there must be something that can be done to speed it up with hardware access...
(In reply to comment #13)
> (In reply to comment #12)
> > The latter shows a source picture being higher than the maximum supported by
> > your GPU (4096).
> 
> So if I understand this correctly, Firefox asks the hardware to render an
> image way off screen?

No. It's doing a composite operation with a source picture of height 7760, but the 3D engine of your GPU only supports textures up to a height of 4096.
So what's the status of the bug? Is it an accepted bug of the driver? Or is it a bug in caller (Firefox)? Or something else?
(In reply to comment #15)
> So what's the status of the bug?
> Is it an accepted bug of the driver? Or is it a bug in caller (Firefox)? Or
> something else?

The source images are larger than the hardware can handle:

R300CheckComposite: Source w/h too large (640,7760).

The max dimensions your card can handle are 4096x4096. 7760 is almost twice as tall as the card's hardware limit.
(In reply to comment #16)
> (In reply to comment #15)
> > So what's the status of the bug?
> > Is it an accepted bug of the driver? Or is it a bug in caller (Firefox)? Or
> > something else?
> 
> The source images are larger than the hardware can handle:
> R300CheckComposite: Source w/h too large (640,7760).
> The max dimensions your card can handle are 4096x4096. 7760 is almost twice
> as tall as the card's hardware limit.

The website in question should use smaller images if it wants to work on a wider range of hardware.
If you look at the code on the page, it does indeed have a single image that's 640x7760 in size, but it only renders a 640x480 part of it on each draw request:

ctx.drawImage(img, 0, offset, 640, 480, 0, 0, 64, 48);

My guess is that since this is a single call, the driver rejects the operation from being done in hardware (because the source image is too large) and falls back entirely to software rendering -- software then "crops" the image and performs all the necessary transformations.

If that's the case, can't the transformation be done in hardware after the crop?
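For what it's worth, the "crop first, then transform" idea can be expressed with cairo subsurfaces. A minimal sketch, not Firefox's actual code path (assumes cairo >= 1.10; 'cr', 'sheet' and 'offset' come from the hypothetical caller):

#include <cairo.h>

static void
draw_frame(cairo_t *cr, cairo_surface_t *sheet, double offset)
{
    /* Expose only a 640x480 window of the tall sprite sheet, so the
       source picture handed to the X server stays within the 4096 limit. */
    cairo_surface_t *frame =
        cairo_surface_create_for_rectangle(sheet, 0, offset, 640, 480);

    cairo_save(cr);
    cairo_scale(cr, 0.1, 0.1);   /* 640x480 -> 64x48, as in the drawImage call above */
    cairo_set_source_surface(cr, frame, 0, 0);
    cairo_paint(cr);
    cairo_restore(cr);

    cairo_surface_destroy(frame);
}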
Okay, for the sake of argument I've modified the demo to use a 1920x1940 source image. When running it, I get this fallback warning, instead:

R300CheckCompositeTexture: REPEAT_NONE unsupported for transformed xRGB source
(In reply to comment #19)
> Okay, for the sake of argument I've modified the demo to use a 1920x1940 source
> image. When running it, I get this fallback warning, instead:
> 
> R300CheckCompositeTexture: REPEAT_NONE unsupported for transformed xRGB source

Commented out the block that causes this and rendering speed went up to 24FPS in the modified demo. Is this a flag that Firefox should flip on its end (as per https://bugs.freedesktop.org/show_bug.cgi?id=27139#c11)?
(In reply to comment #20)
> (In reply to comment #19)
> > R300CheckCompositeTexture: REPEAT_NONE unsupported for transformed xRGB source
> 
> Commented out the block that causes this and rendering speed went up to 24FPS
> in the modified demo.

Does that result in incorrect rendering? If not, Firefox/Cairo could probably use RepeatPad instead of RepeatNone in this case. Otherwise, the fallback could be avoided by using a source picture format with an alpha channel.
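For illustration, on the client side that suggestion amounts to something like the following. A minimal sketch, not the actual Firefox/cairo code ('cr' and 'src' are assumed to exist in the caller):

#include <cairo.h>

static void
paint_transformed_source(cairo_t *cr, cairo_surface_t *src)
{
    cairo_pattern_t *pattern = cairo_pattern_create_for_surface(src);

    /* EXTEND_PAD maps to RENDER's RepeatPad; the default EXTEND_NONE
       maps to RepeatNone, which R300CheckCompositeTexture rejects for
       transformed xRGB sources. */
    cairo_pattern_set_extend(pattern, CAIRO_EXTEND_PAD);

    cairo_set_source(cr, pattern);
    cairo_paint(cr);

    cairo_pattern_destroy(pattern);
}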
(In reply to comment #21)
> (In reply to comment #20)
> > (In reply to comment #19)
> > > R300CheckCompositeTexture: REPEAT_NONE unsupported for transformed xRGB source
> > 
> > Commented out the block that causes this and rendering speed went up to 24FPS
> > in the modified demo.
> 
> Does that result in incorrect rendering? If not, Firefox/Cairo could probably
> use RepeatPad instead of RepeatNone in this case. Otherwise, the fallback could
> be avoided by using a source picture format with an alpha channel.

Didn't look obviously incorrect, though TBH it was moving so fast I wouldn't notice. I'm sure it can be slowed down and checked pixel-by-pixel.

As for alpha channel -- is the caller (Firefox) made aware of this limitation, i.e. can it add the alpha channel automatically?
(In reply to comment #22)
> Didn't look obviously incorrect, though TBH it was moving so fast I wouldn't
> notice. I'm sure it can be slowed down and checked pixel-by-pixel.

I'd expect the problem to be more obvious than that. Basically, something being (not) there when it's (not) supposed to be, around the edges of the transformed picture.

> As for alpha channel -- is the caller (Firefox) made aware of this limitation,
> i.e. can it add the alpha channel automatically?

No, I'm afraid users of the RENDER extension have to actively avoid the quirks of its semantics.
One way to hardware accelerate it would be to treat the texture as argb, but first copy the alpha channel of the border pixels somewhere else, then replace the alpha channel with 0xff, then copy the old alpha channel back.

Also, with:

Option "ShadowFB" "true"
Option "noaccel"

and pixman master I get 13-18 FPS.
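For anyone wanting to reproduce that, those options go in the Device section of xorg.conf -- a minimal sketch (the Identifier is whatever your existing config already uses):

Section "Device"
    Identifier "Radeon"
    Driver     "radeon"
    Option     "ShadowFB" "true"
    Option     "noaccel"
EndSection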
(In reply to comment #24)
> One way to hardware accelerate it would be to treat the texture as argb, but
> first copy the alpha channel of the border pixels somewhere else, then replace
> the alpha channel with 0xff, then copy the old alpha channel back.
> 
> Also, with:
> 
> Option "ShadowFB" "true"
> Option "noaccel"
> 
> and pixman master I get 13-18 FPS.

Hopefully that won't be necessary -- looks like the Moz guys are planning to switch to RepeatPad.

But how come ShadowFB+noaccel is so much faster than fallback? (I see ~5FPS in that case)
Does the new generation hardware still have the source image size limits? I'm guessing they went up but not to infinity.

I've created a test with a 20x15,520 image -- it's obviously slow on my X1400m laptop but runs fast on a Windows desktop with a fairly old ATI card, and on another Linux desktop with a slightly newer nvidia card and proprietary driver.

There must be something wrong with the driver, after all. That is to say, looks like the proprietary drivers optimize for this somehow.
(In reply to comment #24)
> One way to hardware accelerate it would be to treat the texture as argb, but
> first copy the alpha channel of the border pixels somewhere else, then replace
> the alpha channel with 0xff, then copy the old alpha channel back.

Or keep a shadow picture. Are you volunteering? :)

(In reply to comment #25)
> But how come ShadowFB+noaccel is so much faster than fallback?

Alternating between hardware acceleration and software fallbacks incurs GPU<->CPU synchronization and memory migration overhead.

(In reply to comment #26)
> Does the new generation hardware still have the source image size limits?

The texture size limits advertised via OpenGL or Direct3D usually reflect the hardware capabilities.

> There must be something wrong with the driver, after all. That is to say, looks
> like the proprietary drivers optimize for this somehow

It's mostly a matter of (lack of) manpower to spend on such workarounds. At least in this case, it seems like it should be easy for Firefox / cairo to avoid the problem, which will also benefit already deployed X stacks.
(In reply to comment #26)
> Does the new generation hardware still have the source image size limits?

radeon family - max texture coord
r1xx-r4xx - 2k
r5xx - 4k
r6xx-r7xx - 8k
evergreen-ni - 16k

Other vendors have similar limits for their chips in the same time periods.
(In reply to comment #27)
> (In reply to comment #26)
> > There must be something wrong with the driver, after all. That is to say, looks
> > like the proprietary drivers optimize for this somehow
> 
> It's mostly a matter of (lack of) manpower to spend on such workarounds. At
> least in this case, it seems like it should be easy for Firefox / cairo to
> avoid the problem, which will also benefit already deployed X stacks.

I think they don't feel they have to do this, since most major drivers are already optimized for it.

How involved would a fix be? I assume it's probably somewhat harder than casting a max texture size onto the crazy-sized image... I'd be interested in helping out if you can tell me where to dig :)

(In reply to comment #28)
> (In reply to comment #26)
> > Does the new generation hardware still have the source image size limits?
> 
> radeon family - max texture coord
> r1xx-r4xx - 2k
> r5xx - 4k
> r6xx-r7xx - 8k
> evergreen-ni - 16k
> 
> Other vendors have similar limits for their chips in the same time periods.

Thanks! Should I make a 30k-long test instead of 15k? Then you can try it on an evergreen card and see how it's still slow :(

The main problem is that the "design pattern" for web developers is copy/paste without understanding the code. And those who don't do that tend to optimize for the most common setups -- D2D/D3D on Windows, fglrx/nvidia on Linux.

Same for the Firefox/Cairo guys, really - they'd rather call the driver "broken" and get on with something more fun.
(In reply to comment #29)
> I'd be interested in helping out if you can tell me where to dig :)

Basically, look at the cases where the driver is deciding to fall back to software, understand the reasons for that decision, and think about possible ways to work around them. Note that it may make more sense to do the workaround in EXA itself or the driver, depending on the situation.
(In reply to comment #30)
> (In reply to comment #29)
> > I'd be interested in helping out if you can tell me where to dig :)
> 
> Basically, look at the cases where the driver is deciding to fall back to
> software, understand the reasons for that decision, and think about possible
> ways to work around them. Note that it may make more sense to do the workaround
> in EXA itself or the driver, depending on the situation.

Well, yeah... I know that much myself, and I've already checked out where the fallback happens (R300CheckComposite). Looks like exa calls CheckComposite in advance to prevent switching back and forth between hardware & software rendering.

I was hoping you could tell me the most likely places to check and/or maybe even a suggestion for how you would go about fixing it.

As for exa - would I need to recompile the whole X to make a change to it or can I play with it as a module?
(In reply to comment #31)
> Looks like exa calls CheckComposite in advance to prevent switching back and
> forth between hardware & software rendering.

Not really; the CheckComposite hook allows the driver to quickly reject operations it won't be able to accelerate, without incurring any overhead required before the PrepareComposite hook, e.g. for migrating pixmap contents into GPU accessible memory.

> I was hoping you could tell me the most likely places to check and/or maybe
> even a suggestion for how you would go about fixing it.

See comment #24 and comment #27 for some examples, but if there were clear, simple steps, somebody probably would have taken them... It would probably be better to take this to the xorg-devel mailing list.

> As for exa - would I need to recompile the whole X to make a change to it or
> can I play with it as a module?

make -C exa && make -C hw/xfree86/exa

gives you hw/xfree86/exa/.libs/libexa.so with any changes you made. Obviously, if the changes affect the ABI between EXA and the driver (which should be avoided if feasible for a final solution, but might be useful for prototyping) or the rest of the X server, those will need rebuilding as well.
I tried http://demos.hacks.mozilla.org/openweb/HWACCEL/ on 2 PCs with Firefox 5:

Celeron 1200, Radeon 9600 XT: 1 fps
Core 2 3200, GeForce 7300 GT: 12 fps
Firefox 6 switches to RepeatPad for canvas, which I think covers this demo. Firefox 7 is needed to completely avoid RepeatNone in all non-canvas images.
Well, actually, Firefox 7 avoids cairo's EXTEND_NONE (AFAICS), but cairo turns EXTEND_PAD into RepeatNone when it thinks the extend/repeat doesn't matter:
http://cgit.freedesktop.org/cairo/tree/src/cairo-pattern.c?id=ae2b7b13cd5fdeaee44496056bb99f497346e262#n2428
(In reply to comment #26)
> There must be something wrong with the driver, after all. That is to say, looks
> like the proprietary drivers optimize for this somehow

Some better than others, I guess. (At least some) NVIDIA drivers "optimize" by implementing RepeatPad incorrectly, extending with black: https://bugzilla.mozilla.org/show_bug.cgi?id=636192
Ref. bug 43397, which claims the bug is not in Cairo. Please discuss.
Mass closure: This bug has been untouched for more than six years, and is not obviously still valid. Please reopen this bug or file a new report if you continue to experience issues with current releases.