Bug 14696 - [i965 Openarena] openarena performance drops by 22%
[i965 Openarena] openarena performance drops by 22%
Status: VERIFIED FIXED
Product: DRI
Classification: Unclassified
Component: DRM/other
XOrg git
Other Linux (All)
: high normal
Assigned To: Thomas Hellström
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-02-26 21:08 UTC by Shuang He
Modified: 2008-03-11 05:45 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
xorg conf (3.59 KB, text/plain)
2008-02-26 21:17 UTC, Shuang He
no flags Details
xorg log (18.57 KB, text/plain)
2008-02-26 21:18 UTC, Shuang He
no flags Details
Replace multiple large kmallocs with a single vmalloc. (3.73 KB, patch)
2008-02-28 00:29 UTC, Thomas Hellström
no flags Details | Splinter Review
Don't invalidate the presumed_offset flag if buffer has moved. (801 bytes, patch)
2008-02-28 00:32 UTC, Thomas Hellström
no flags Details | Splinter Review
Fix-check patch (10.24 KB, patch)
2008-02-28 03:25 UTC, Thomas Hellström
no flags Details | Splinter Review
Helps pipelining copy-backs and validation / relocation waits. (4.74 KB, patch)
2008-03-01 08:10 UTC, Thomas Hellström
no flags Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Shuang He 2008-02-26 21:08:00 UTC
System Environment:
--------------------------

--Platform: Q965

--Architecture(32-bit,64-bit,compatiblity): 32-bit 

--2D driver:
commit 6935c732c351585f31e2094c4201a00e3d5529b3

--3D driver:
commit 9bd2cb7f90ac434ec5be7d388f899976bf902dc1

--Xserver:
commit aec0d06469a2fa7440fdd5ee03dc256a68704e77

--Drm:
commit e87cec19687089f9f268ec0eb81b57e6fb8de6a9

--Kernel:
2.6.23


Bug detailed description:
-------------------------
The last time I run Openarena benchmark, it get performance at about 104 FPS
but now it is at about 81 FPS, drops by about 22%
and seems it's introduced by following 2 DRM commits:

commit e87cec19687089f9f268ec0eb81b57e6fb8de6a9
Author: Thomas Hellstrom <thomas-at-tungstengraphics-dot-com>
Date:   Tue Feb 26 10:47:05 2008 +0100

    [i915] Relocation fixes.

commit 56bb29cf37c27b283efcd1a32d3583393e5208ea
Author: Thomas Hellstrom <thomas-at-tungstengraphics-dot-com>
Date:   Tue Feb 26 00:01:09 2008 +0100

    Make the execbuffer code reasonably safe against errors.

    In particular -EAGAINs, which should be common during Xserver operation.
    Also handle the fence creation failure case.

Reproduce steps:
----------------
1. start X
2. run openarena benchmark according to http://dri.freedesktop.org/wiki/Benchmarking



Current result:
----------------
the performance now is at about 81 FPS


Expected result:
----------------
the performance now should be at about 104 FPS
Comment 1 Shuang He 2008-02-26 21:17:19 UTC
Created attachment 14609 [details]
xorg conf
Comment 2 Shuang He 2008-02-26 21:18:19 UTC
Created attachment 14610 [details]
xorg log
Comment 3 Thomas Hellström 2008-02-27 07:17:24 UTC
Can you check which of the commits listed is causing this?

/Thomas
Comment 4 Thomas Hellström 2008-02-27 10:52:41 UTC
Can you check with latest git HEAD?
Comment 5 Eric Anholt 2008-02-27 12:43:16 UTC
Offhand guess, moving the wait for idle from the actually-applying-a-relocation path to before the check-and-apply-relocations results in much more idling.  The next master commit of don't-ever-idle-for-relocations is broken.
Comment 6 Thomas Hellström 2008-02-27 12:56:57 UTC
(In reply to comment #5)
> Offhand guess, moving the wait for idle from the actually-applying-a-relocation
> path to before the check-and-apply-relocations results in much more idling. 
> The next master commit of don't-ever-idle-for-relocations is broken.
> 

Yes, that was my guess too.
Should hopefully be fixed now.

/Thomas
Comment 7 Shuang He 2008-02-27 17:13:57 UTC
(In reply to comment #3)
> Can you check which of the commits listed is causing this?
> 
> /Thomas
> 

both commits are causing this.
one is from 104FPS to 95FPS
one is from 95FPS to 81FPS
Comment 8 Shuang He 2008-02-27 17:16:48 UTC
(In reply to comment #4)
> Can you check with latest git HEAD?
> 

I tried latest git HEAD, it's around 95FPS now.
Comment 9 Thomas Hellström 2008-02-28 00:29:54 UTC
Created attachment 14634 [details] [review]
Replace multiple large kmallocs with a single vmalloc.

Patch 1
Comment 10 Thomas Hellström 2008-02-28 00:32:38 UTC
Created attachment 14635 [details] [review]
Don't invalidate the presumed_offset flag if buffer has moved.

The remaining fixes are actually valid bug fixes, and I'm not sure why they should cause a performance decrease.

Can you try the above two pathes and see whether any of them fixes the problem. That would help narrow it down.
Comment 11 Shuang He 2008-02-28 00:58:23 UTC
(In reply to comment #9)
> Created an attachment (id=14634) [details]
> Replace multiple large kmallocs with a single vmalloc.
> 
> Patch 1
> 

95FPS->93FPS
this one doesn't help. 
Comment 12 Shuang He 2008-02-28 01:01:30 UTC
(In reply to comment #10)
> Created an attachment (id=14635) [details]
> Don't invalidate the presumed_offset flag if buffer has moved.
> 
> The remaining fixes are actually valid bug fixes, and I'm not sure why they
> should cause a performance decrease.
> 
> Can you try the above two pathes and see whether any of them fixes the problem.
> That would help narrow it down.
> 

95FPS->around 92FPS
this one doesn't help as well
Comment 13 Thomas Hellström 2008-02-28 03:25:37 UTC
Created attachment 14639 [details] [review]
Fix-check patch

OK, so I've attached a patch that should revert back and make it possible to add the fixes one by one.

Could you
1) Apply the patch, and verify that performance is as expected.
2) Define FIX1 at the top of i915_dma.c, recheck.
3) Define FIX2 at the top of i915_dma.c, recheck.
4) Define FIX3 at the top of i915_dma.c, recheck.

Thanks,
Thomas
Comment 14 Shuang He 2008-02-28 05:08:08 UTC
(In reply to comment #13)
> Created an attachment (id=14639) [details]
> Fix-check patch
> 
> OK, so I've attached a patch that should revert back and make it possible to
> add the fixes one by one.
> 
> Could you
> 1) Apply the patch, and verify that performance is as expected.
> 2) Define FIX1 at the top of i915_dma.c, recheck.
> 3) Define FIX2 at the top of i915_dma.c, recheck.
> 4) Define FIX3 at the top of i915_dma.c, recheck.
> 
> Thanks,
> Thomas
> 

seems our Q965 can not be connected at the moment.
I will try these out when I get back to the machine tomorrow.
Comment 15 Shuang He 2008-02-28 17:20:47 UTC
(In reply to comment #13)
> Created an attachment (id=14639) [details]
> Fix-check patch
> 
> OK, so I've attached a patch that should revert back and make it possible to
> add the fixes one by one.
> 
> Could you
> 1) Apply the patch, and verify that performance is as expected.
this one gets about 100.6 fps
> 2) Define FIX1 at the top of i915_dma.c, recheck.
this one gets about 99.7 fps
> 3) Define FIX2 at the top of i915_dma.c, recheck.
this one gets about 92.6 fps
> 4) Define FIX3 at the top of i915_dma.c, recheck.
this one gets about 92.9 fps
> 
> Thanks,
> Thomas
> 

Comment 16 Thomas Hellström 2008-02-29 10:57:05 UTC
(In reply to comment #15)
> (In reply to comment #13)
> > Created an attachment (id=14639) [details] [details]
> > Fix-check patch
> > 
> > OK, so I've attached a patch that should revert back and make it possible to
> > add the fixes one by one.
> > 
> > Could you
> > 1) Apply the patch, and verify that performance is as expected.
> this one gets about 100.6 fps
> > 2) Define FIX1 at the top of i915_dma.c, recheck.
> this one gets about 99.7 fps
> > 3) Define FIX2 at the top of i915_dma.c, recheck.
> this one gets about 92.6 fps

OK, so the problem is FIX2. 
What this patch is doing is to save all buffer object validation replies until we're completely sure we don't hit an EAGAIN. If we hit an EAGAIN, the IOCTL should be rerun with the original arguments and they are overwritten by the replies.

EAGAINS are not common during normal operation, but should be fairly common with X server command submissions, due to silken mouse operation.

I can't really see how this could affect performance on i965 unless the app is already heavily CPU bound and uses a lot of small textures.

It's possible that this can be restructured to restore the original arguments only if we hit an EAGAIN. I don't have access to an i965 until in a week or so to test.

/Thomas
 
Comment 17 Thomas Hellström 2008-03-01 08:10:57 UTC
Created attachment 14734 [details] [review]
Helps pipelining copy-backs and validation / relocation waits.

Ok so this patch will restore the original copy-order and have 
EAGAIN as an exception where the original arguments will be restored.

Can you try out and see whether you get back the original performance?

/Thomas
Comment 18 Shuang He 2008-03-01 19:18:25 UTC
(In reply to comment #17)
> Created an attachment (id=14734) [details]
> Helps pipelining copy-backs and validation / relocation waits.
> 
> Ok so this patch will restore the original copy-order and have 
> EAGAIN as an exception where the original arguments will be restored.
> 
> Can you try out and see whether you get back the original performance?
> 
> /Thomas
> 

I tried this patch with drm git tip, it does not help.
with/without the patch I both get 73.6FPS
Comment 19 Thomas Hellström 2008-03-01 23:19:50 UTC
(In reply to comment #18)
> (In reply to comment #17)
> > Created an attachment (id=14734) [details] [details]
> > Helps pipelining copy-backs and validation / relocation waits.
> > 
> > Ok so this patch will restore the original copy-order and have 
> > EAGAIN as an exception where the original arguments will be restored.
> > 
> > Can you try out and see whether you get back the original performance?
> > 
> > /Thomas
> > 
> 
> I tried this patch with drm git tip, it does not help.
> with/without the patch I both get 73.6FPS
> 

But you reported git tip at 95FPS previously?
None of the subsequent commits should affect i965.

/Thomas
Comment 20 Shuang He 2008-03-02 04:51:35 UTC
(In reply to comment #19)
> (In reply to comment #18)
> > (In reply to comment #17)
> > > Created an attachment (id=14734) [details] [details] [details]
> > > Helps pipelining copy-backs and validation / relocation waits.
> > > 
> > > Ok so this patch will restore the original copy-order and have 
> > > EAGAIN as an exception where the original arguments will be restored.
> > > 
> > > Can you try out and see whether you get back the original performance?
> > > 
> > > /Thomas
> > > 
> > 
> > I tried this patch with drm git tip, it does not help.
> > with/without the patch I both get 73.6FPS
> > 
> 
> But you reported git tip at 95FPS previously?
> None of the subsequent commits should affect i965.
> 
> /Thomas
> 

oh, I can't get 95FPS even I roll back DRM to that commit. seems some other component is impacting this as well(we update all components every day). I'll try to revert everything back first, and try this patch again.
Comment 21 Shuang He 2008-03-02 06:03:41 UTC
(In reply to comment #20)
> (In reply to comment #19)
> > (In reply to comment #18)
> > > (In reply to comment #17)
> > > > Created an attachment (id=14734) [details] [details] [details] [details]
> > > > Helps pipelining copy-backs and validation / relocation waits.
> > > > 
> > > > Ok so this patch will restore the original copy-order and have 
> > > > EAGAIN as an exception where the original arguments will be restored.
> > > > 
> > > > Can you try out and see whether you get back the original performance?
> > > > 
> > > > /Thomas
> > > > 
> > > 
> > > I tried this patch with drm git tip, it does not help.
> > > with/without the patch I both get 73.6FPS
> > > 
> > 
> > But you reported git tip at 95FPS previously?
> > None of the subsequent commits should affect i965.
> > 
> > /Thomas
> > 
> 
> oh, I can't get 95FPS even I roll back DRM to that commit. seems some other
> component is impacting this as well(we update all components every day). I'll
> try to revert everything back first, and try this patch again.
> 

I just found that this is because we have set vblank_mode to 2, but forget to set it back to 0.

and I have tried this patch again:
without the patch, we get 94.8FPS
with the patch, we get 95.3 FPS
Comment 22 Thomas Hellström 2008-03-11 02:40:17 UTC
OK, So I got my hands on a dual-processor i965, and tried the benchmark out.

With TTM I get 50.0 fps before the commit, and 49.6 fps after.
Without TTM I see 114 fps after the commit.

Were you testing with or without TTM?

Regards,
Thomas

 
Comment 23 Shuang He 2008-03-11 02:51:19 UTC
(In reply to comment #22)
> OK, So I got my hands on a dual-processor i965, and tried the benchmark out.
> 
> With TTM I get 50.0 fps before the commit, and 49.6 fps after.
> Without TTM I see 114 fps after the commit.
> 
> Were you testing with or without TTM?
> 
> Regards,
> Thomas
> 
> 
> 

I'm testing with TTM, with the same openarena, we get about 81 FPS with mesa_7_0_2
Comment 24 Shuang He 2008-03-11 02:58:08 UTC
(In reply to comment #22)
> OK, So I got my hands on a dual-processor i965, and tried the benchmark out.
> 
> With TTM I get 50.0 fps before the commit, and 49.6 fps after.
> Without TTM I see 114 fps after the commit.
> 
> Were you testing with or without TTM?
> 
> Regards,
> Thomas
> 
> 
> 

And I just tried export INTEL_NO_TTM=1, and get 81.9 FPS
Comment 25 Thomas Hellström 2008-03-11 03:03:32 UTC
(In reply to comment #23)
> (In reply to comment #22)
> > OK, So I got my hands on a dual-processor i965, and tried the benchmark out.
> > 
> > With TTM I get 50.0 fps before the commit, and 49.6 fps after.
> > Without TTM I see 114 fps after the commit.
> > 
> > Were you testing with or without TTM?
> > 
> > Regards,
> > Thomas
> > 
> > 
> > 
> 
> I'm testing with TTM, with the same openarena, we get about 81 FPS with
> mesa_7_0_2
> 

But there is no TTM-enabled i965 with mesa_7_0_2???

/Thomas

Comment 26 Thomas Hellström 2008-03-11 04:34:23 UTC
Additionally with UP, I get (TTM) 66.0 fps before the commit and 65.9 after.
I'd say that's within the margin of error. 

I'm closing this bug, as I can't reproduce it, and the app is clearly heavily throttled on something else. There are probably some minor optimizations that can be done, but let's return to those once we get similar performance w & wo TTM.

/Thomas
Comment 27 Shuang He 2008-03-11 05:36:09 UTC
(In reply to comment #23)
> (In reply to comment #22)
> > OK, So I got my hands on a dual-processor i965, and tried the benchmark out.
> > 
> > With TTM I get 50.0 fps before the commit, and 49.6 fps after.
> > Without TTM I see 114 fps after the commit.
> > 
> > Were you testing with or without TTM?
> > 
> > Regards,
> > Thomas
> > 
> > 
> > 
> 
> I'm testing with TTM, with the same openarena, we get about 81 FPS with
> mesa_7_0_2
> 

Oh, I think I mislead you, I'm always using mesa git tip for testing. 
here I just give a in-hand non-TTM(with mesa_7_0_2) result for your reference.
Comment 28 Shuang He 2008-03-11 05:45:45 UTC
(In reply to comment #26)
> Additionally with UP, I get (TTM) 66.0 fps before the commit and 65.9 after.
> I'd say that's within the margin of error. 
> 
> I'm closing this bug, as I can't reproduce it, and the app is clearly heavily
> throttled on something else. There are probably some minor optimizations that
> can be done, but let's return to those once we get similar performance w & wo
> TTM.
> 
> /Thomas
> 

OK, fair enough, one of your commit have brought the performance back to about 97FPS, which is almost the same as before(104FPS)