Bug 71523 - [IVB/HSW/BYT-M bisected] lightsmark v2008 performance reduced ~30%
Summary: [IVB/HSW/BYT-M bisected] lightsmark v2008 performance reduced ~30%
Status: VERIFIED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: All Linux (All)
Importance: high major
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-12 06:47 UTC by zhoujian
Modified: 2013-11-21 01:51 UTC

See Also:
i915 platform:
i915 features:


Attachments

Description zhoujian 2013-11-12 06:47:27 UTC
System Environment:       
----------------------------------------------
Platform: HSW/BYT/IVB
Libdrm:(master)2.4.47-11-gda738d1ed0a0941a0c
Mesa:(master) git-a594cec
Xf86_video_intel:(master)2.99.905-102-g922a8bab89c1a59
Cairo:(master)56a195a76554abe1d5567c733ba679058fe01303
Kernel:(drm-intel-nightly) 164a4cb4c1431a0

Bug detailed description:
----------------------------------------------
lightsmark v2008 performance is reduced by ~30%. The problem exists under both gnome-session and raw X.
It is a Mesa regression; bisecting shows that the first bad commit is:

59b01ca252bd6706f08cd80a864819d71dfe741c
Author: Fredrik Höglund <fredrik@kde.org>
Date:   Tue Apr 9 20:54:25 2013 +0200
    mesa: Add ARB_vertex_attrib_binding
    update_array() and update_array_format() are changed to update the new
attrib and binding states, and the client arrays become derived state.

    Reviewed-by: Eric Anholt <eric@anholt.net>

Performance
--------------------------------------------------------------------
Test lightsmark on ULT with Raw X
git-a594cec: 49.40
git-c6a3fb6: 75.68
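
For reference, the relative drop implied by the two figures above can be worked out as follows. This is only a sketch: it assumes the numbers are frames per second and that git-c6a3fb6 is the known-good baseline, neither of which is stated explicitly in the report. The helper name percent_drop is made up for illustration.

```python
def percent_drop(before: float, after: float) -> float:
    """Return the relative drop from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before * 100.0


# Figures quoted in the report (ULT, raw X).
# Assumption: git-c6a3fb6 is the good baseline, git-a594cec the regressed build.
good_score = 75.68  # git-c6a3fb6
bad_score = 49.40   # git-a594cec

drop = percent_drop(good_score, bad_score)
print(f"regression: {drop:.1f}%")
```

Under those assumptions the drop on ULT comes out somewhat above the ~30% quoted in the summary.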

Reproduce steps:
---------------------------------------------
1. xinit &
2. vblank_mode=0 ./backend silent 1920x1080
Comment 1 Fredrik Höglund 2013-11-12 21:03:24 UTC
The most likely explanation is the overhead from computing the gl_client_arrays in _mesa_update_state(). Lightsmark makes a lot of calls to glDrawElements() each frame, and changes the vertex arrays between those calls.

That being said, I can't reproduce the regression with r600g. I get approximately 115 FPS with and without the ARB_vertex_attrib_binding changes.

Could you investigate further?
Comment 2 Eero Tamminen 2013-11-14 10:37:37 UTC
Looking at other test data from relevant time period:

* Why is the Lightsmark test marked as "blocked" on HSW D?

* On HSW ULT & Mobile, performance drop is ~30% between 11-10 and 11-11, on IVB D & M it's 37%, on BYT, the performance drop between 11-07 and 11-11 is ~45%.

* On HSW D, x11perf "aa10text" regressed during this interval 13% and "rgb10text" regressed by 68%.  x11perf stuff is CPU bound, but it could be unrelated as on BYT "aa10text" got 45% increase in its performance.

* There's no performance drop in any other tests on any other tested HW.


> Lightsmark makes a lot of calls to glDrawElements() each frame,

Nope.  It's an old program using glBegin/glEnd...

Stuff that Lightsmark does in every frame:
   44922 glBindTexture (avg = 27.4 calls in each of 1638 frames)
   42369 glEnable (avg = 25.9 calls in each of 1638 frames)
   42336 glActiveTexture (avg = 25.8 calls in each of 1638 frames)
   30908 glVertex2f (avg = 18.9 calls in each of 1638 frames)
   30908 glMultiTexCoord2f (avg = 18.9 calls in each of 1638 frames)
   22780 glGetUniformLocation (avg = 13.9 calls in each of 1638 frames)
   22000 glMatrixMode (avg = 13.4 calls in each of 1637 frames)
   20253 glDisable (avg = 12.4 calls in each of 1638 frames)
   13310 glGetIntegerv (avg = 8.1 calls in each of 1638 frames)
    9737 glUniform1i (avg = 5.9 calls in each of 1638 frames)
    9582 glBlendFunc (avg = 5.9 calls in each of 1637 frames)
    7727 glEnd (avg = 4.7 calls in each of 1638 frames)
    7727 glBegin (avg = 4.7 calls in each of 1638 frames)
    7586 glUseProgram (avg = 4.6 calls in each of 1638 frames)
    7255 glUniform4f (avg = 4.4 calls in each of 1638 frames)
    6714 glPushMatrix (avg = 4.1 calls in each of 1637 frames)
    6714 glPopMatrix (avg = 4.1 calls in each of 1637 frames)
    6639 glLoadMatrixd (avg = 4.1 calls in each of 1637 frames)
    6534 glDepthMask (avg = 4.0 calls in each of 1638 frames)
    4917 glXMakeContextCurrent (avg = 3.0 calls in each of 1638 frames)
    4837 glClear (avg = 3.0 calls in each of 1637 frames)
    4762 glAlphaFunc (avg = 2.9 calls in each of 1637 frames)
    3954 glTexImage2D (avg = 33.5 calls in each of 118 frames)
    3754 glMultMatrixd (avg = 2.3 calls in each of 1637 frames)
    3274 glLoadIdentity (avg = 2.0 calls in each of 1637 frames)
    3267 glGetBooleanv (avg = 2.0 calls in each of 1638 frames)
    2708 glViewport (avg = 1.7 calls in each of 1638 frames)
    2509 glCallList (avg = 1.5 calls in each of 1637 frames)
    2464 glGetFloatv (avg = 1.5 calls in each of 1637 frames)
    1991 glColor3f (avg = 1.2 calls in each of 1637 frames)
    1922 glUniform3f (avg = 1.2 calls in each of 1637 frames)
    1736 glUniform4fv (avg = 1.1 calls in each of 1635 frames)
    1638 glFlush (avg = 1.0 calls in each of 1638 frames)
    1637 glOrtho (avg = 1.0 calls in each of 1637 frames)
    1637 glColor4f (avg = 1.0 calls in each of 1637 frames)

Stuff that Lightsmark does in some frames:
   60558 glClientActiveTexture (avg = 348.0 calls in each of 174 frames)
   60378 glTexCoordPointer (avg = 347.0 calls in each of 174 frames)
   30787 glVertexPointer (avg = 176.9 calls in each of 174 frames)
   30786 glDrawElements (avg = 176.9 calls in each of 174 frames)
   30441 glNormalPointer (avg = 174.9 calls in each of 174 frames)
   29929 glColorPointer (avg = 173.0 calls in each of 173 frames)
   28877 glCullFace (avg = 166.0 calls in each of 174 frames)
    6702 glTexParameteri (avg = 36.8 calls in each of 182 frames)
    3685 glPixelStorei (avg = 31.2 calls in each of 118 frames)
    2347 glBindFramebufferEXT (avg = 3.5 calls in each of 680 frames)
    1583 glFramebufferTexture2DEXT (avg = 2.3 calls in each of 680 frames)
    1583 glEnableClientState (avg = 9.1 calls in each of 174 frames)
    1569 glGenTextures (avg = 65.4 calls in each of 24 frames)
    1360 glColorMask (avg = 2.0 calls in each of 680 frames)
     933 glGetTexLevelParameteriv (avg = 1.4 calls in each of 684 frames)
     883 glDisableClientState (avg = 5.1 calls in each of 174 frames)
     819 glCheckFramebufferStatusEXT (avg = 1.2 calls in each of 680 frames)
     736 glPolygonOffset (avg = 1.1 calls in each of 680 frames)
     680 glClearDepth (avg = 1.0 calls in each of 680 frames)
     218 glUniformMatrix4fv (avg = 2.2 calls in each of 101 frames)
     177 glColor4ub (avg = 1.0 calls in each of 174 frames)
     167 glClearColor (avg = 2.0 calls in each of 84 frames)
     167 glReadBuffer (avg = 2.0 calls in each of 83 frames)
     167 glDrawBuffer (avg = 2.0 calls in each of 83 frames)
      83 glUniform2f (avg = 1.0 calls in each of 83 frames)
      83 glReadPixels (avg = 1.0 calls in each of 83 frames)
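
The "avg" column in the two lists above appears to be the total call count divided by the number of frames in which the call occurs at all. A rough sketch of how one such line can be parsed and checked is below. The regular expression and the parse_stat helper are assumptions made up for this illustration; they are not part of whatever tool produced the listing.

```python
import re

# Matches lines such as:
#   "   30786 glDrawElements (avg = 176.9 calls in each of 174 frames)"
LINE_RE = re.compile(
    r"^\s*(?P<total>\d+)\s+(?P<name>\w+)\s+"
    r"\(avg = (?P<avg>[\d.]+) calls in each of (?P<frames>\d+) frames\)\s*$"
)


def parse_stat(line: str):
    """Return (name, total_calls, frames, recomputed_average) for one line."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    total = int(m["total"])
    frames = int(m["frames"])
    return m["name"], total, frames, total / frames


name, total, frames, avg = parse_stat(
    "   30786 glDrawElements (avg = 176.9 calls in each of 174 frames)"
)
print(f"{name}: {total} calls over {frames} frames, {avg:.1f} per frame")
```

Read this way, glDrawElements is absent from most of the 1638 captured frames, but is called about 177 times in each of the 174 frames where it does appear.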
Comment 3 zhoujian 2013-11-18 02:19:54 UTC
Verified it with Mesa git-45a56ce39.
Comment 4 Ian Romanick 2013-11-18 20:09:48 UTC
(In reply to comment #3)
> Verified it with Mesa git-45a56ce39.

Can you bisect that?  I don't see anything in the range a594cec..45a56ce39.  The only thing that worries me more than unexplained bugs is unexplained fixes. :(

Also, is the original performance drop reproducible on the 10.0 branch?

Did the kernel change at any point?
Comment 5 zhoujian 2013-11-19 07:38:38 UTC
(In reply to comment #4)
By bisecting, the commit that fixed it is:
commit ff353c218a1ab1fd3fb797a4780612ec4b0451d8
Author: Fredrik Höglund <fredrik@kde.org>
Date:   Mon Nov 11 18:54:15 2013 +0100

    mesa: Fix derived vertex state not being updated in glCallList()

    AEcontext::NewState is not always set when the vertex array state
    is changed.
Comment 6 zhoujian 2013-11-19 07:39:52 UTC
(In reply to comment #4)
It's reproducible on 10.0 branch.
Comment 7 Fredrik Höglund 2013-11-20 11:45:08 UTC
(In reply to comment #6)
> (In reply to comment #4)
> It's reproducible on 10.0 branch.

But ff353c218a1ab1fd3fb797a4780612ec4b0451d8 was cherry-picked
to the 10.0 branch on November 15 according to git.
Comment 8 zhoujian 2013-11-21 01:51:14 UTC
(In reply to comment #7)
Hi, it's reproducible on 10.0 branch because git-59b01ca also exists on 10.0.
It's also fixed with patch ff353c218 on 10.0.

