Bug 6814 - regression since rotation support added
Summary: regression since rotation support added
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i915
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: high normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-05-02 20:24 UTC by Dave Airlie
Modified: 2006-05-08 02:20 UTC

See Also:
i915 platform:
i915 features:


Attachments
log 7.0.0 (61.21 KB, text/plain)
2006-05-03 08:27 UTC, Lukas Hejtmanek

Description Dave Airlie 2006-05-02 20:24:20 UTC
The i915 driver seems to have had a major speed regression since rotation
support was added.

Using Xorg 7.0 and Mesa 3D from just before rotation support, I get 1110 FPS.
With the latest driver and Mesa from just after rotation support, I only get 780 FPS.
Comment 1 Dave Airlie 2006-05-02 20:26:53 UTC
btw that's with glxgears... but it has been reported in other apps.
Comment 2 Alan Hourihane 2006-05-02 20:44:26 UTC
The problem here is batchbuffers.

Due to rotation being able to shuffle memory around it wasn't possible to
support batchbuffers in agp space as the 2D driver could rip up memory
allocation upon a rotation event. 

So the 3D driver falls back to the cmdbuffer path from system memory (rather
than AGP memory) which can be guaranteed.

To support batchbuffers again we'll need the new memory manager.
Comment 3 Alan Hourihane 2006-05-02 20:50:22 UTC
to activate the batchbuffer path again for now though, you should be able to do...

INTEL_BATCH=1 glxgears
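
As an illustration of how an environment variable can switch between the two submission paths described above, here is a minimal C sketch. It is not the i915 driver's actual code: the helper names, the enum, and the rule that an empty value or "0" means "off" are all assumptions made for this example. Only the variable name INTEL_BATCH comes from this report.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for the two submission paths discussed in this bug. */
enum submit_path { SUBMIT_CMDBUFFER, SUBMIT_BATCHBUFFER };

/* Hypothetical helper: returns 1 when the variable is set to a non-empty
 * value other than "0". The real driver may interpret the value differently. */
static int env_flag_enabled(const char *name)
{
    const char *value = getenv(name);

    if (value == NULL || value[0] == '\0')
        return 0;
    return strcmp(value, "0") != 0;
}

/* Default to the cmdbuffer path (safe across rotation events) and use the
 * batchbuffer path only when the user explicitly asks for it. */
static enum submit_path choose_submit_path(void)
{
    return env_flag_enabled("INTEL_BATCH") ? SUBMIT_BATCHBUFFER
                                           : SUBMIT_CMDBUFFER;
}
```

With INTEL_BATCH=1 in the environment, choose_submit_path() in this sketch selects the batchbuffer path; with the variable unset it falls back to the cmdbuffer path.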

Comment 4 Alan Hourihane 2006-05-02 20:51:18 UTC
This will undoubtedly crash things if you rotate though.
Comment 5 Dave Airlie 2006-05-02 21:12:02 UTC
well I've no intention of rotating anything :-),

okay, with that I get glxgears back up to 890 FPS, which still isn't the 1190 I
was getting before, but it's better than the non-batch path.
Comment 6 Alan Hourihane 2006-05-02 21:27:01 UTC
Might be worth checking MTRRs.
Comment 7 Dave Airlie 2006-05-02 21:43:56 UTC
Weirdly, on my Xorg 7.0 build I don't have MTRRs that work, so I don't get MTRRs
at all when I get the 1100 FPS; however, I do get MTRRs okay with the newer code...

If I disable MTRRs on the latest trees, I get 694 FPS instead of the 888 with
batchbuffers enabled.
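
For anyone wanting to check MTRRs programmatically, the following is a rough C sketch that counts write-combining entries in the text exposed by Linux at /proc/mtrr. The function names are made up for this example, and it only looks for the string "write-combining" on each line; it does not check that the range actually covers the graphics aperture.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if one line of /proc/mtrr-style text describes a
 * write-combining range, 0 otherwise. */
static int mtrr_line_is_write_combining(const char *line)
{
    return strstr(line, "write-combining") != NULL;
}

/* Counts write-combining entries in the file at `path` (normally
 * "/proc/mtrr"). Returns -1 if the file cannot be opened, for example
 * when the kernel has no MTRR support. */
static int count_write_combining(const char *path)
{
    FILE *fp = fopen(path, "r");
    char line[256];
    int count = 0;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (mtrr_line_is_write_combining(line))
            count++;
    }
    fclose(fp);
    return count;
}
```

A result of 0 from count_write_combining("/proc/mtrr") would suggest that no write-combining range has been set up at all, which is the situation described for the Xorg 7.0 build above.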

I'll see what I can discover, I've got a day or two to look into this for a project.
Comment 8 Dave Airlie 2006-05-02 22:49:21 UTC
there is still another problem here...

I'll track down where the FPS go, bit by bit,

The changes to the batchbuffer code seem to be suspect for me at the moment,
adding the code for emit_invarient stuff I lose 100FPS from my 1100...

Comment 9 Lukas Hejtmanek 2006-05-03 02:24:05 UTC
(In reply to comment #2)

> To support batchbuffers again we'll need the new memory manager.

Generally the new memory manager or something that someone is working on? 
If the former case is true then I think there is a place for config option
whether to enable rotation at all (as I believe many ordinary users will prefer
faster 3D than rotation).
Comment 10 Keith Whitwell 2006-05-03 07:22:41 UTC
(In reply to comment #9)
> (In reply to comment #2)
> 
> > To support batchbuffers again we'll need the new memory manager.
> 
> Generally the new memory manager or something that someone is working on? 
> If the former case is true then I think there is a place for config option
> whether to enable rotation at all (as I believe many ordinary users will prefer
> faster 3D than rotation).

It's also worth checking if tiling is being set up correctly for the depth and
back buffers.  Unfortunately I'm not able to do more than make suggestions at
this point...
Comment 11 Lukas Hejtmanek 2006-05-03 08:00:39 UTC
(In reply to comment #10)

> It's also worth checking if tiling is being set up correctly for the depth and
> back buffers.  Unfortunately I'm not able to do more than make suggestions at
> this point...

Looks like they are:

(II) I810(0): MakeTiles failed for the FRONT buffer
(II) I810(0): Activating tiled memory for the back buffer.
(II) I810(0): Activating tiled memory for the depth buffer.

However this may be related. 
Using the same config, Xorg 6.9.0 allocates memory like this:
(II) I810(0): Allocating at least 1280 scanlines for pixmap cache
(II) I810(0): Initial framebuffer allocation size: 8192 kByte
...
(II) I810(0): Allocated 3072 kB for the back buffer at 0xf800000.
(II) I810(0): Allocated 3072 kB for the depth buffer at 0xf400000.
(II) I810(0): Allocated 32 kB for the logical context at 0xf3f8000.
(II) I810(0): Allocated 50432 kB for textures at 0x880000

while Xorg 7.x.x allocates memory like this:
(II) I810(0): Allocating at least 1248 scanlines for pixmap cache
(II) I810(0): Initial framebuffer allocation size: 20224 kByte
...
(II) I810(0): Allocated 32 kB for the logical context at 0xffe2000.
(II) I810(0): Allocated 6400 kB for the back buffer at 0xf000000.
(II) I810(0): Allocated 6400 kB for the depth buffer at 0xe800000.
(II) I810(0): Allocated 31104 kB for textures at 0xc9a0000
Comment 12 Alan Hourihane 2006-05-03 08:19:00 UTC
Sounds to me like you may be running different resolutions between those two runs.
Comment 13 Lukas Hejtmanek 2006-05-03 08:27:49 UTC
Created attachment 5550 [details]
log 7.0.0
Comment 14 Lukas Hejtmanek 2006-05-03 08:38:35 UTC
(In reply to comment #12)
> Sounds to me like you may be running different resolutions between those two runs.

Yes, sorry.

However, setting the same resolution (using BIOS hack) I got:
6.9.0: (II) I810(0): Allocating at least 1248 scanlines for pixmap cache
       (II) I810(0): Initial framebuffer allocation size: 16384 kByte 
7.0.0: (II) I810(0): Allocating at least 1248 scanlines for pixmap cache
       (II) I810(0): Initial framebuffer allocation size: 20224 kByte

(although this could be caused by the lack of HDTV XV support in 6.9.0)
Comment 15 Dave Airlie 2006-05-03 09:10:28 UTC
it's not any of the obvious things, I've been messing with these drivers for
long enough to know that :-), 

I'm not getting back to this until tomorrow at the earliest, but on my system
when I left it last night, I had an 1100FPS Mesa tree running on the rotated DDX
driver, I've just ported over the necessary interface changes to get gears to
run with batch buffers,

However the last piece of the patch I applied last night was the code to
emit_invarient_state and allocate the batch buffer differently, once I applied
that I lost 100FPS, I'm expecting I'll find another 100 tomorrow at some point...
Comment 16 Dave Airlie 2006-05-04 13:12:37 UTC
Okay I invalidated some of my previous test results due to me being a dumbass,
and having some drm debug turned on for some of them,

But I tracked it down, Lukas can you test the latest Mesa tree but comment out

intel_batchbuffer.c:768
      /* KW: temporary - this make crashes & lockups more frequent, so
       * leave in until they are solved.
       */
      //intel->alloc.size = 8 * 1024;

Is what I have, that plus INTEL_BATCH has gotten me back most of my FPS in gears..
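
One plausible reason a small fixed buffer costs frame rate is that the same amount of command data needs more submissions, each with some fixed overhead. This thread does not confirm that this is the mechanism; the C sketch below only shows the arithmetic. The function name and the 1 MiB and 64 KiB figures in the note that follows are illustrative assumptions; only the 8 * 1024 size comes from the code quoted above.

```c
#include <stddef.h>

/* Number of submissions needed to push `total_bytes` of commands through a
 * buffer that holds `buffer_bytes` at a time (rounded up). Returns 0 for a
 * zero-sized buffer rather than dividing by zero. */
static size_t flushes_needed(size_t total_bytes, size_t buffer_bytes)
{
    if (buffer_bytes == 0)
        return 0;
    return (total_bytes + buffer_bytes - 1) / buffer_bytes;
}
```

For a hypothetical 1 MiB of command data, flushes_needed(1024 * 1024, 8 * 1024) gives 128 submissions, while a hypothetical 64 KiB buffer would need only 16.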

Comment 17 Keith Whitwell 2006-05-04 14:34:48 UTC
Well, I guess you don't get more guilty than that.  Sorry for the hassle and
thanks Dave for tracking it down...
Comment 18 Lukas Hejtmanek 2006-05-04 17:13:25 UTC
(In reply to comment #16)
> But I tracked it down, Lukas can you test the latest Mesa tree but comment out
> 
> intel_batchbuffer.c:768
>       /* KW: temporary - this make crashes & lockups more frequent, so
>        * leave in until they are solved.
>        */
>       //intel->alloc.size = 8 * 1024;
> 
> Is what I have, that plus INTEL_BATCH has gotten me back most of my FPS in gears..

I can confirm glxgears is back to 1100 FPS, but ppracer still only does 12 FPS :(

Is there additional copy of the frame buffer? (compared to 6.9.0 version)
Comment 19 Dave Airlie 2006-05-04 17:47:12 UTC
Okay that's one thing back, I'm not sure what affects ppracer, I've got ppracer
running at 800x600x32-bit at 16FPS without batch and about 20 with batch...

Comment 20 Lukas Hejtmanek 2006-05-04 17:56:51 UTC
(In reply to comment #19)
> Okay that's one thing back, I'm not sure what affects ppracer, I've got ppracer
> running at 800x600x32-bit at 16FPS without batch and about 20 with batch...

I'm running at 1024x768@32bit.
Stencil buffer enabled, Show UI Snow, Reflections, Shadows, FPS, Fog.

Progress bar is disabled.

For me, it's about 11FPS without batch and 13FPS with batch.

Anyway, thanks for tracking down the back buffer issue.

Comment 21 Dave Airlie 2006-05-04 18:21:33 UTC
okay tomorrow I'll go track down the ppracer issue..

I'm seeing ppracer with 1024x768-32 Xorg 7.0 + Mesa 6.4.2 + MTRR enabled by hand
=  about 35FPS.

same setup with latest trees is 14FPS..

I need to sort out these issues for a customer app anyways..
Comment 22 Lukas Hejtmanek 2006-05-04 18:43:55 UTC
(In reply to comment #21)
> okay tomorrow I'll go track down the ppracer issue..
> 
> I'm seeing ppracer with 1024x768-32 Xorg 7.0 + Mesa 6.4.2 + MTRR enabled by hand
> =  about 35FPS.

This should be correct speed, it's the same I got using Xorg 6.9.0 + Mesa 6.4.1.

Btw, did you use INTEL_BATCH quirks for it or Xorg 7.0 + Mesa 6.4.2 + MTRR works
just fine?
Comment 23 Dave Airlie 2006-05-04 18:53:26 UTC
> 
> This should be correct speed, it's the same I got using Xorg 6.9.0 + Mesa 6.4.1.
> 
> Btw, did you use INTEL_BATCH quirks for it or Xorg 7.0 + Mesa 6.4.2 + MTRR works
> just fine?

Xorg 7.0 + Mesa 6.4.2 + hand adding the MTRR. The INTEL_BATCH stuff only matters
after rotation support was added; before that, the batchbuffer path was the default....

I've got a nice git tree with the Mesa stuff in it and I can move between
branches very easily... 

Comment 24 Lukas Hejtmanek 2006-05-04 19:35:47 UTC
(In reply to comment #23)
> Xorg 7.0 + Mesa 6.4.2 + hand adding the MTRR, the INTEL_BATCH stuff only matters
> when after rotation support was added before that it was the default....
> 
> I've got a nice git tree with the Mesa stuff in it and I can move between
> branches very easily... 

Ah, I thought that rotation support had been included in Xorg since 7.0.
Comment 26 Dave Airlie 2006-05-05 15:37:52 UTC
Okay I eventually found the ppracer regression thanks to the power of git, I
went down a couple of bad alleys in my bisections but I tracked it down to 

http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/main/stencil.c?r1=1.34&r2=1.35

causing most of the regression, perhaps Brian or Keith can comment and suggest a
fix?

literally this patch halves the performance of ppracer on i915.
Comment 27 Dave Airlie 2006-05-05 15:39:59 UTC
Also, compiler optimisations on HEAD are turned off for some reason; it looks like
an accidental commit by Brian. Should they be on?
Comment 28 Dave Airlie 2006-05-05 16:14:49 UTC
maybe something to do with the check = 0xff in intel_ioctl.c??
Comment 29 Dave Airlie 2006-05-05 16:54:02 UTC
Okay, I've checked in two fixes to the i915 driver in CVS: one removes Keith's
temporary batchbuffer comment/fix and the other fixes the stencil thing to use
~0U instead of 0xff.

Comment 30 Dave Airlie 2006-05-05 17:07:47 UTC
I think SiS and unichrome might be broken similarly... but I've no hw to test
those on..
Comment 31 Alan Hourihane 2006-05-05 17:30:43 UTC
Nice catch Dave. Maybe worth sending an email off to the mesa3d lists, as people
might not be subscribed to dri-devel.
Comment 32 Lukas Hejtmanek 2006-05-05 17:57:13 UTC
(In reply to comment #27)
> also compiler optimisation on HEAD are turned off for some reason, it looks like
> an accidental commit by Brian should they be on?

Personally, I tweak compiler options like this:
-O3 -fomit-frame-pointer -msse2 -mmmx -ffast-math -march=pentium4 -mfpmath=sse

Thanks for tracking down the bug.
Comment 33 Lukas Hejtmanek 2006-05-05 18:47:31 UTC
(In reply to comment #29)
> Okay I've checked in two fixes to i915 driver in CVS, one removes Keiths
> temporary batchbuffer comment/fix and the other fixes the stencil thing to use
> ~0U instead of 0xff.

I can confirm that ppracer FPS is doubled. 

However, it's still 2-3FPS slower than 6.9.0 version, but this can be due to
rotation overhead or something like that.
Comment 34 Keith Whitwell 2006-05-05 18:59:49 UTC
(In reply to comment #26)
> Okay I eventually found the ppracer regression thanks to the power of git, I
> went down a couple of bad alleys in my bisections but I tracked it down to 
> 
> http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/main/stencil.c?r1=1.34&r2=1.35
> 
> causing most of the regression, perhaps Brian or Keith can comment and suggest a
> fix?

The code you've committed looks good, though I think for generality it's better
to do something like

#define I915_STENCIL_MASK 0xff

and use the condition:

if ((ctx->Stencil.Xyz & I915_STENCIL_MASK) == I915_STENCIL_MASK)

The trouble with directly testing against ~0u is that an application might
specify 0xff explicitly for that value (as it knows how deep the stencil buffer
is) and we want the same behaviour in either case - the hardware only cares
about the low 8 bits.
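
The difference between the strict comparison and the masked one can be sketched in C as follows. The function names are hypothetical and this is not the driver code; I915_STENCIL_MASK follows the suggestion above, written as 0xffu here to keep the arithmetic unsigned. The sketch assumes a 32-bit-or-wider unsigned mask whose default may be either 0xff or ~0u.

```c
#include <stdbool.h>

#define I915_STENCIL_MASK 0xffu

/* Strict test: only recognises a mask of exactly 0xff, so an all-ones
 * value such as ~0u is treated as a partial mask. */
static bool stencil_mask_is_full_strict(unsigned mask)
{
    return mask == 0xffu;
}

/* Masked test: looks only at the low 8 bits the hardware cares about,
 * so 0xff and ~0u are treated the same way. */
static bool stencil_mask_is_full_masked(unsigned mask)
{
    return (mask & I915_STENCIL_MASK) == I915_STENCIL_MASK;
}
```

In this sketch the strict test rejects ~0u, while the masked test accepts both ~0u and an explicit 0xff and still rejects a genuinely partial mask such as 0x0f.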

You're correct that several other drivers are affected by the change also.
Comment 35 Brian Paul 2006-05-06 04:41:28 UTC
Dave, I'm not sure which accidental commit you're referring to.  I
reordered/cleaned-up CFLAGS in a few places but I don't think I made any net
changes to the flags.

Go ahead and fix whatever you think needs to be done.
Comment 36 Dave Airlie 2006-05-06 13:14:44 UTC
well maybe it wasn't accidental just the checkin message didn't reflect the
checkin :)

http://webcvs.freedesktop.org/mesa/Mesa/configs/linux-dri?r1=1.34&r2=1.35

you dropped the -O from the OPT_FLAGS... the checkin doesn't mention that...
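
A quick way to notice this kind of change from inside a build is the __OPTIMIZE__ macro, which GCC predefines whenever optimisation is enabled. The C sketch below relies on that; the function name is made up for this example, and other compilers may not define the macro.

```c
/* Reports whether this translation unit was compiled with optimisation.
 * GCC predefines __OPTIMIZE__ in optimising compilations (-O and above);
 * without an -O flag the macro is absent and this returns 0. */
static int built_with_optimisation(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}
```

Printing this value at driver start-up, or checking it in a test build, would show whether the -O flag from OPT_FLAGS actually reached the compiler.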

(In reply to comment #35)
> Dave, I'm not sure which accidental commit you're referring to.  I
> reordered/cleaned-up CFLAGS in a few places but I don't think I made any net
> changes to the flags.
> 
> Go ahead and fix whatever you think needs to be done.
> 

Comment 37 Brian Paul 2006-05-07 08:40:15 UTC
OK, I've restored the -O flag.  I totally didn't see that.  Thanks, Dave.
Comment 38 Angka H. K. 2006-05-08 14:22:14 UTC
(In reply to comment #37)
> OK, I've restored the -O flag.  I totally didn't see that.  Thanks, Dave.
> 

Has this patch already been committed to CVS? Or do the OPT_FLAGS really affect
anything other than speed?
My source was updated on May 5 and I still get the bug. My FreeBSD box, with
Xorg 7 and Mesa and drm from CVS, crashes when exiting an Xorg session. The chip
is an i915 and the OS is FreeBSD 7.
Comment 39 Dave Airlie 2006-05-08 18:31:22 UTC
> 
> Does this patch already commited to the cvs ? Or the OPT_FLAGS really effect the
> result other then speed ?
> Cause my source was updated on May 5 and I still got the bug. My FreeBSD box
> with xorg 7, Mesa , drm from cvs, crash when exiting from Xorg session. The chip
> is i915 and freebsd 7.

please don't add a bug to this bug, this bug never mentions Xorg crashing on
exit, so I don't know how you think anything on this bug can cause/fix it....

I'll close this when Keith checks in the stencil fixes.

Comment 40 Dave Airlie 2006-05-08 19:20:31 UTC
Okay, Keith has checked in his fixes... there might be a minor regression, but
I'm not sure it is worth tracking down.

