Bug 35579 - Poor performance in Firefox 4
Summary: Poor performance in Firefox 4
Status: RESOLVED INVALID
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/Radeon
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Priority: medium
Severity: normal
Assignee: xf86-video-ati maintainers
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on: 43397
Blocks:
Reported: 2011-03-22 20:12 UTC by Konstantin Svist
Modified: 2018-06-12 19:08 UTC
CC: 4 users

See Also:
i915 platform:
i915 features:


Attachments:
sysprof output for HWACCEL demo (155.01 KB, application/x-bzip2) - 2011-03-23 08:13 UTC, Konstantin Svist
sysprof output for fluidSim demo (90.06 KB, application/x-bzip2) - 2011-03-23 08:15 UTC, Konstantin Svist
sysprof output for jsnes emulator (215.83 KB, application/x-bzip2) - 2011-03-23 08:16 UTC, Konstantin Svist
sysprof output for HWACCEL demo on chromium (147.84 KB, application/x-bzip2) - 2011-03-23 11:13 UTC, Konstantin Svist
sysprof output for HWACCEL demo on bleeding edge driver (117.65 KB, application/x-bzip2) - 2011-03-23 13:48 UTC, Konstantin Svist
Xorg.0.log for HWACCEL demo on bleeding edge driver with tracing enabled (38.76 KB, application/x-bzip2) - 2011-03-23 13:51 UTC, Konstantin Svist

Description Konstantin Svist 2011-03-22 20:12:50 UTC
After trying various demos in Firefox 4 and Chromium, Firefox seems to lag behind a lot of the time. Mozilla devs are claiming this is an issue in the radeon driver.

See https://bugzilla.mozilla.org/show_bug.cgi?id=620065 for reference

I don't see how this makes much sense since the driver is the same in both cases.
Anyone care to comment?
Comment 1 Konstantin Svist 2011-03-22 20:14:23 UTC
lspci:

01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Radeon Mobility X1400 [1002:7145] (prog-if 00 [VGA controller])
	Subsystem: Dell Device [1028:2003]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 43
	Region 0: Memory at d0000000 (32-bit, prefetchable) [size=256M]
	Region 1: I/O ports at ee00 [size=256]
	Region 2: Memory at efdf0000 (32-bit, non-prefetchable) [size=64K]
	[virtual] Expansion ROM at efd00000 [disabled] [size=128K]
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v1) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE- FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee0100c  Data: 4189
	Kernel driver in use: radeon
	Kernel modules: radeon
Comment 2 Michel Dänzer 2011-03-23 01:43:30 UTC
It's not clear to me whether this is about Firefox using OpenGL or XRender. If the bad performance is accompanied by high CPU usage, a profile from sysprof or oprofile might be illuminating.
Comment 3 Konstantin Svist 2011-03-23 08:13:27 UTC
Created attachment 44756 [details]
sysprof output for HWACCEL demo

Xrender. Attaching sysprof from HWACCEL run (http://demos.hacks.mozilla.org/openweb/HWACCEL/).
Comment 4 Konstantin Svist 2011-03-23 08:15:00 UTC
Created attachment 44757 [details]
sysprof output for fluidSim demo

fluidSim demo http://nerget.com/fluidSim/
Comment 5 Konstantin Svist 2011-03-23 08:16:07 UTC
Created attachment 44758 [details]
sysprof output for jsnes emulator

jsnes emulator http://benfirshman.com/projects/jsnes/
Comment 6 Michel Dänzer 2011-03-23 09:14:41 UTC
(In reply to comment #3)
> Xrender. Attaching sysprof from HWACCEL run

Indeed, this profile shows it hitting a software fallback for the XRender Composite operation. If you can rebuild the driver with RADEON_TRACE_FALL defined to 1 in src/radeon_exa_shared.h, the debugging messages enabled by that might give an idea.
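
For context: with RADEON_TRACE_FALL enabled, the driver's fallback returns also log why they fired. The macro involved looks roughly like this (a paraphrase for illustration, not an exact copy of the driver source):

#if RADEON_TRACE_FALL
#define RADEON_FALLBACK(x)              \
do {                                    \
    ErrorF("%s: ", __FUNCTION__);       \
    ErrorF x;                           \
    return FALSE;                       \
} while (0)
#else
#define RADEON_FALLBACK(x) return FALSE
#endif

Each rejected operation then prints its reason to the X server log (Xorg.0.log), which is where the fallback messages quoted later in this report come from.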


The other profiles show most of the CPU usage in the Firefox process, so they may be separate problems elsewhere.
Comment 7 Konstantin Svist 2011-03-23 11:13:39 UTC
Created attachment 44764 [details]
sysprof output for HWACCEL demo on chromium

Okay, I'll try rebuilding the xorg-x11-drv-ati package (I'm on Fedora 14, if it makes any difference). Where should I look for the traces afterwards?

Meanwhile, attaching sysprof output from running the HWACCEL demo on Chromium - it runs much faster; maybe there are some clues there...
Comment 8 Konstantin Svist 2011-03-23 11:17:29 UTC
Got xorg-x11-drv-ati-6.13.1-0.4.20100705git37b348059.fc14.src.rpm -- but I don't see radeon_exa_shared.h in there. Is it just way too old?
If I need to build/install from git, is there a quick FAQ on how to get started?
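
There is no single FAQ, but the usual autotools sequence for an xorg driver is roughly the following (the git URL and install prefix are assumptions; the X server development headers must be installed first):

git clone git://anongit.freedesktop.org/xorg/driver/xf86-video-ati
cd xf86-video-ati
./autogen.sh --prefix=/usr
make
sudo make install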
Comment 9 Konstantin Svist 2011-03-23 13:48:50 UTC
Created attachment 44765 [details]
sysprof output for HWACCEL demo on bleeding edge driver

I've built and installed the bleeding edge ati driver; attached is the sysprof output for it. It runs faster (6 FPS vs. the previous 1 FPS) but still much slower than Chromium (which runs at 10 FPS regardless - guess they use an internal software renderer after all).

Note: this sysprof was taken with #define RADEON_TRACE_FALL 0
Comment 10 Konstantin Svist 2011-03-23 13:51:02 UTC
Created attachment 44766 [details]
Xorg.0.log for HWACCEL demo on bleeding edge driver with tracing enabled

Ran the HWACCEL demo twice after logging in.
Comment 11 Robert O'Callahan 2011-03-23 15:14:28 UTC
(In reply to comment #9)
> I've built and installed the bleeding edge ati driver; attached is the sysprof
> output for it. It runs faster (6 FPS vs. the previous 1 FPS) but still much
> slower than Chromium (which runs at 10 FPS regardless - guess they use an
> internal software renderer after all).

Yes. Chromium doesn't use XRender; that's why it's fast.
Comment 12 Michel Dänzer 2011-03-24 00:28:20 UTC
I can only see two causes for fallbacks in the log file:

R300CheckComposite: Component alpha not supported with source alpha and source value blending.
R300CheckComposite: Source w/h too large (640,7760).

The former is usually related to sub-pixel text anti-aliasing and is harmless, as EXA can still accelerate that well in two passes.

The latter shows a source picture being higher than the maximum supported by your GPU (4096). Though I suspect that even if this gets fixed in Firefox or the test, there might be more issues down the road.
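
To illustrate the kind of fast-reject check that produces that message, here is a minimal standalone sketch (not the actual driver source; the real test lives in R300CheckComposite and operates on the driver's pixmap structures):

#include <stdbool.h>
#include <stdio.h>

/* Reject a composite whose source exceeds the GPU's texture limit;
 * EXA then performs the whole operation in software instead. */
static bool source_fits_texture_limit(int w, int h, int max_dim)
{
    if (w > max_dim || h > max_dim) {
        fprintf(stderr, "Source w/h too large (%d,%d)\n", w, h);
        return false;
    }
    return true;
}

With the demo's 640x7760 source and this GPU's 4096 limit, the check fails on height alone.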

P.S. This test also runs very slowly (about 1 FPS) here on a Mac Pro running Mac OS X...
Comment 13 Konstantin Svist 2011-03-24 01:20:38 UTC
(In reply to comment #12)
> The latter shows a source picture being higher than the maximum supported by
> your GPU (4096).

So if I understand this correctly, Firefox asks the hardware to render an image way off screen? How come there isn't some fast rejection in place to clip the part that won't be visible on screen anyway?


>  Though I suspect that even if this gets fixed in Firefox or
> the test, there might be more issues down the road.

Are you saying that this is a bug in Firefox? Because they claim it's a buggy XRender implementation...


> P.S. This test also runs very slowly (about 1 FPS) here on a Mac Pro running
> Mac OS X...

So, since other lemmings are jumping off a cliff... :)
I think that if Chromium is able to run this demo at 10FPS in a userland software renderer, there must be some things that can be done to speed this up with hardware access...
Comment 14 Michel Dänzer 2011-03-24 01:25:48 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > The latter shows a source picture being higher than the maximum supported by
> > your GPU (4096).
> 
> So if I understand this correctly, Firefox asks the hardware to render an
> image way off screen?

No. It's doing a composite operation with a source picture of height 7760, but the 3D engine of your GPU only supports textures up to height 4096.
Comment 15 Konstantin Svist 2011-03-24 13:04:10 UTC
So what's the status of the bug?
Is it an accepted bug of the driver? Or is it a bug in the caller (Firefox)? Or something else?
Comment 16 Alex Deucher 2011-03-24 14:56:48 UTC
(In reply to comment #15)
> So what's the status of the bug?
> Is it an accepted bug of the driver? Or is it a bug in the caller (Firefox)? Or
> something else?

The source images are larger than the hardware can handle:
R300CheckComposite: Source w/h too large (640,7760).
The max dimensions your card can handle are 4096x4096.  7760 is almost twice the card's hardware limit.
Comment 17 Alex Deucher 2011-03-24 14:57:52 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > So what's the status of the bug?
> > Is it an accepted bug of the driver? Or is it a bug in the caller (Firefox)? Or
> > something else?
> 
> The source images are larger than the hardware can handle:
> R300CheckComposite: Source w/h too large (640,7760).
> The max dimensions your card can handle are 4096x4096.  7760 is almost twice
> the card's hardware limit.

The website in question should use smaller images if it wants to work on a wider range of hardware.
Comment 18 Konstantin Svist 2011-03-24 15:41:21 UTC
If you look at the code on the page, it does indeed have a single image that's 640x7760 in size, but it only renders a 640x480 part of it on each draw request:
ctx.drawImage(img, 0, offset, 640, 480, 0, 0, 64, 48);

My guess is that since this is a single call, the driver rejects the operation from being done in hardware (because the source image is too large) and switches fully to software rendering -- software then "crops" the image and performs all the necessary transformations.
If that's the case, can't the transformation be done in hardware after the crop?
Comment 19 Konstantin Svist 2011-03-24 17:20:57 UTC
Okay, for the sake of argument I've modified the demo to use a 1920x1940 source image. When running it, I get this fallback warning, instead:

R300CheckCompositeTexture: REPEAT_NONE unsupported for transformed xRGB source
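
The condition behind that message boils down to the following sketch (an illustration of the logic, not the driver code; the real test is in R300CheckCompositeTexture):

#include <stdbool.h>

/* RepeatNone requires samples outside the picture to read as transparent
 * black. An xRGB format has no alpha bits to represent "transparent", so
 * once a transform lets the sampling footprint reach past the edges, the
 * hardware path cannot produce correct results. */
static bool needs_repeat_none_fallback(bool repeat_none, bool has_alpha,
                                       bool transformed)
{
    return repeat_none && !has_alpha && transformed;
}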
Comment 20 Konstantin Svist 2011-03-24 22:39:36 UTC
(In reply to comment #19)
> Okay, for the sake of argument I've modified the demo to use a 1920x1940 source
> image. When running it, I get this fallback warning, instead:
> 
> R300CheckCompositeTexture: REPEAT_NONE unsupported for transformed xRGB source

Commented out the block that causes this and rendering speed went up to 24FPS in the modified demo. Is this a flag that Firefox should flip on its side (as per https://bugs.freedesktop.org/show_bug.cgi?id=27139#c11)?
Comment 21 Michel Dänzer 2011-03-25 04:21:28 UTC
(In reply to comment #20)
> (In reply to comment #19)
> > R300CheckCompositeTexture: REPEAT_NONE unsupported for transformed xRGB source
> 
> Commented out the block that causes this and rendering speed went up to 24FPS
> in the modified demo.

Does that result in incorrect rendering? If not, Firefox/Cairo could probably use RepeatPad instead of RepeatNone in this case. Otherwise, the fallback could be avoided by using a source picture format with an alpha channel.
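
On the client side, the change suggested here amounts to setting the picture's repeat attribute (real Xrender API; the display and picture handles are assumed to exist already):

#include <X11/extensions/Xrender.h>

/* Clamp to the edge pixels (RepeatPad) instead of treating everything
 * outside the picture as undefined/transparent (RepeatNone). */
static void set_repeat_pad(Display *dpy, Picture pict)
{
    XRenderPictureAttributes attr;
    attr.repeat = RepeatPad;
    XRenderChangePicture(dpy, pict, CPRepeat, &attr);
}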
Comment 22 Konstantin Svist 2011-03-25 05:34:57 UTC
(In reply to comment #21)
> (In reply to comment #20)
> > (In reply to comment #19)
> > > R300CheckCompositeTexture: REPEAT_NONE unsupported for transformed xRGB source
> > 
> > Commented out the block that causes this and rendering speed went up to 24FPS
> > in the modified demo.
> 
> Does that result in incorrect rendering? If not, Firefox/Cairo could probably
> use RepeatPad instead of RepeatNone in this case. Otherwise, the fallback could
> be avoided by using a source picture format with an alpha channel.


It didn't look obviously incorrect, though TBH it was moving so fast I wouldn't have noticed. I'm sure it can be slowed down and checked pixel by pixel.
As for alpha channel -- is the caller (Firefox) made aware of this limitation, i.e. can it add the alpha channel automatically?
Comment 23 Michel Dänzer 2011-03-25 11:10:12 UTC
(In reply to comment #22)
> It didn't look obviously incorrect, though TBH it was moving so fast I wouldn't
> have noticed. I'm sure it can be slowed down and checked pixel by pixel.

I'd expect the problem to be more obvious than that. Basically, something being (not) there when it's (not) supposed to be, around the edges of the transformed picture.

> As for alpha channel -- is the caller (Firefox) made aware of this limitation,
> i.e. can it add the alpha channel automatically?

No, I'm afraid users of the RENDER extension have to actively avoid the quirks of its semantics.
Comment 24 Søren Sandmann Pedersen 2011-03-25 17:56:16 UTC
One way to hardware accelerate it would be to treat the texture as argb, but first copy the alpha channel of the border pixels somewhere else, then replace the alpha channel with 0xff, then copy the old alpha channel back.
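
As a sketch, that three-step dance could look like the following; every helper here is a hypothetical stand-in for driver-internal work, named purely for illustration:

typedef struct pixmap pixmap;  /* placeholder for the driver's pixmap type */

void save_border_alpha(pixmap *p, unsigned char *saved);          /* hypothetical */
void fill_border_alpha(pixmap *p, unsigned char value);           /* hypothetical */
void restore_border_alpha(pixmap *p, const unsigned char *saved); /* hypothetical */
void composite_as_argb(pixmap *src, pixmap *mask, pixmap *dst);   /* hypothetical */

static void composite_xrgb_via_argb(pixmap *src, pixmap *mask, pixmap *dst,
                                    unsigned char *scratch)
{
    save_border_alpha(src, scratch);    /* 1. stash the border pixels' alpha */
    fill_border_alpha(src, 0xff);       /* 2. make the xRGB data sample
                                         *    correctly as ARGB */
    composite_as_argb(src, mask, dst);
    restore_border_alpha(src, scratch); /* 3. put the old alpha back */
}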

Also, with:

        Option "ShadowFB" "true"
        Option "noaccel"

and pixman master I get 13-18 FPS.
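
For anyone reproducing that configuration: those options go in the Device section of xorg.conf, roughly as below (the Identifier is whatever your existing config uses; a boolean Option without a value also means "true"):

        Section "Device"
                Identifier "Radeon"
                Driver     "radeon"
                Option     "ShadowFB" "true"
                Option     "NoAccel"  "true"
        EndSection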
Comment 25 Konstantin Svist 2011-03-25 18:22:15 UTC
(In reply to comment #24)
> One way to hardware accelerate it would be to treat the texture as argb, but
> first copy the alpha channel of the border pixels somewhere else, then replace
> the alpha channel with 0xff, then copy the old alpha channel back.
> 
> Also, with:
> 
>         Option "ShadowFB" "true"
>         Option "noaccel"
> 
> and pixman master I get 13-18 FPS.

Hopefully that won't be necessary -- looks like the Moz guys are planning to switch to RepeatPad.

But how come ShadowFB+noaccel is so much faster than the fallback? (I see ~5FPS in that case.)
Comment 26 Konstantin Svist 2011-03-25 23:28:07 UTC
Does the new generation hardware still have the source image size limits? I'm guessing they went up, but not to infinity. I've created a test with a 20x15,520 image -- it's obviously slow on my X1400m laptop but runs fast on a Windows desktop with a fairly old ATI card and on another Linux desktop with a slightly newer nvidia card and the proprietary driver.

There must be something wrong with the driver, after all. That is to say, looks like the proprietary drivers optimize for this somehow
Comment 27 Michel Dänzer 2011-03-26 01:42:05 UTC
(In reply to comment #24)
> One way to hardware accelerate it would be to treat the texture as argb, but
> first copy the alpha channel of the border pixels somewhere else, then replace
> the alpha channel with 0xff, then copy the old alpha channel back.

Or keep a shadow picture. Are you volunteering? :)


(In reply to comment #25)
> But how come ShadowFB+noaccel is so much faster than the fallback?

Alternating between hardware acceleration and software fallbacks incurs GPU<->CPU synchronization and memory migration overhead.


(In reply to comment #26)
> Does the new generation hardware still have the source image size limits?

The texture size limits advertised via OpenGL or Direct3D usually reflect the hardware capabilities.

> There must be something wrong with the driver, after all. That is to say, looks
> like the proprietary drivers optimize for this somehow

It's mostly a matter of (lack of) manpower to spend on such workarounds. At least in this case, it seems like it should be easy for Firefox / cairo to avoid the problem, which will also benefit already deployed X stacks.
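
The advertised limit mentioned above can be read back with a plain OpenGL query (real GL API; creating and binding a context is omitted):

#include <stdio.h>
#include <GL/gl.h>

/* Requires a current GL context, e.g. one made current via glXMakeCurrent(). */
static void print_max_texture_size(void)
{
    GLint max_dim = 0;
    glGetIntegerv(GL_MAX_TEXTURE_SIZE, &max_dim);
    printf("max texture dimension: %d\n", max_dim);  /* e.g. 4096 on r5xx */
}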
Comment 28 Alex Deucher 2011-03-26 09:38:57 UTC
(In reply to comment #26)
> Does the new generation hardware still have the source image size limits? 

radeon family - max texture coord
r1xx-r4xx     - 2k
r5xx          - 4k
r6xx-r7xx     - 8k
evergreen-ni  - 16k

Other vendors have similar limits for their chips in the same time periods.
Comment 29 Konstantin Svist 2011-03-26 10:56:46 UTC
(In reply to comment #27)
> (In reply to comment #26)
> > There must be something wrong with the driver, after all. That is to say, looks
> > like the proprietary drivers optimize for this somehow
> 
> It's mostly a matter of (lack of) manpower to spend on such workarounds. At
> least in this case, it seems like it should be easy for Firefox / cairo to
> avoid the problem, which will also benefit already deployed X stacks.

I think they don't feel they have to do this since most major drivers are already optimized for it.
How involved would a fix be? I assume it's probably somewhat harder than just clamping the crazy-sized image to the maximum texture size...
I'd be interested in helping out if you can tell me where to dig :)


(In reply to comment #28)
> (In reply to comment #26)
> > Does the new generation hardware still have the source image size limits? 
> 
> radeon family - max texture coord
> r1xx-r4xx     - 2k
> r5xx          - 4k
> r6xx-r7xx     - 8k
> evergreen-ni  - 16k
> 
> Other vendors have similar limits for their chips in the same time periods.

Thanks! Should I make a 30k-long test instead of 15k? Then you can try it on an evergreen card and see how it's still slow :(

The main problem is that the "design pattern" for web developers is copy/paste without understanding the code. And those who don't do that tend to optimize for the most common setups -- D2D/D3D on Windows, fglrx/nvidia on Linux.
Same for the Firefox/Cairo guys, really - they'd rather call the driver "broken" and get on with something more fun.
Comment 30 Michel Dänzer 2011-03-29 06:10:05 UTC
(In reply to comment #29)
> I'd be interested in helping out if you can tell me where to dig :)

Basically, look at the cases where the driver is deciding to fall back to software, understand the reasons for that decision, and think about possible ways to work around them. Note that it may make more sense to do the workaround in EXA itself or the driver, depending on the situation.
Comment 31 Konstantin Svist 2011-03-29 13:09:25 UTC
(In reply to comment #30)
> (In reply to comment #29)
> > I'd be interested in helping out if you can tell me where to dig :)
> 
> Basically, look at the cases where the driver is deciding to fall back to
> software, understand the reasons for that decision, and think about possible
> ways to work around them. Note that it may make more sense to do the workaround
> in EXA itself or the driver, depending on the situation.

Well, yeah... I know that much myself, and I've already checked out where the fallback happens (R300CheckComposite). Looks like EXA calls CheckComposite in advance to prevent switching back and forth between hardware and software rendering.
I was hoping you could tell me the most likely places to check, and/or maybe even suggest how you would go about fixing it.

As for exa - would I need to recompile the whole X to make a change to it or can I play with it as a module?
Comment 32 Michel Dänzer 2011-03-30 00:35:59 UTC
(In reply to comment #31)
> Looks like EXA calls CheckComposite in advance to prevent switching back and
> forth between hardware and software rendering.

Not really, the CheckComposite hook allows the driver to quickly reject operations it won't be able to accelerate, without incurring any overhead required before the PrepareComposite hook, e.g. for migrating pixmap contents into GPU accessible memory.
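
For reference, the two hooks have these signatures in the X server's exa.h (driver-supplied members of the EXA driver record):

Bool (*CheckComposite)(int op, PicturePtr pSrcPicture,
                       PicturePtr pMaskPicture, PicturePtr pDstPicture);
Bool (*PrepareComposite)(int op, PicturePtr pSrcPicture,
                         PicturePtr pMaskPicture, PicturePtr pDstPicture,
                         PixmapPtr pSrc, PixmapPtr pMask, PixmapPtr pDst);

CheckComposite sees only the Pictures, so it can reject cheaply before any pixmap migration happens; PrepareComposite runs after migration and receives the pixmaps themselves.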


> I was hoping you could tell me the most likely places to check, and/or maybe
> even suggest how you would go about fixing it.

See comment #24 and comment #27 for some examples, but if there were clear, simple steps, somebody probably would have taken them...

It would probably be better to take this to the xorg-devel mailing list.


> As for exa - would I need to recompile the whole X to make a change to it or
> can I play with it as a module?

make -C exa && make -C hw/xfree86/exa

gives you hw/xfree86/exa/.libs/libexa.so with any changes you made. Obviously, if the changes affect the ABI between EXA and the driver (which should be avoided if feasible for a final solution, but might be useful for prototyping) or the rest of the X server, those will need rebuilding as well.
Comment 33 wbrana 2011-07-13 04:53:26 UTC
I tried http://demos.hacks.mozilla.org/openweb/HWACCEL/
on 2 PCs with Firefox 5
Celeron 1200, Radeon 9600 XT: 1 fps
Core 2 3200, Geforce 7300 GT: 12 fps
Comment 34 Karl Tomlinson 2011-07-13 14:37:44 UTC
Firefox 6 switches to RepeatPad for canvas, which I think covers this demo.
Firefox 7 is needed to completely avoid RepeatNone in all non-canvas images.
Comment 35 Karl Tomlinson 2011-08-21 21:15:51 UTC
Well, actually, Firefox 7 avoids cairo's EXTEND_NONE (AFAICS), but cairo turns EXTEND_PAD into RepeatNone when it thinks the extend/repeat doesn't matter.

http://cgit.freedesktop.org/cairo/tree/src/cairo-pattern.c?id=ae2b7b13cd5fdeaee44496056bb99f497346e262#n2428
Comment 36 Karl Tomlinson 2011-08-21 21:28:55 UTC
(In reply to comment #26)
> There must be something wrong with the driver, after all. That is to say, looks
> like the proprietary drivers optimize for this somehow

Some better than others, I guess. (At least some) NVIDIA drivers "optimize" by implementing RepeatPad incorrectly, extending with black.
https://bugzilla.mozilla.org/show_bug.cgi?id=636192
Comment 37 Konstantin Svist 2011-12-05 20:38:51 UTC
Ref. bug 43397, which claims the bug is not in Cairo. Please discuss.
Comment 38 Adam Jackson 2018-06-12 19:08:13 UTC
Mass closure: This bug has been untouched for more than six years, and is not
obviously still valid. Please reopen this bug or file a new report if you continue to experience issues with current releases.

