Bug 102334 - [BXT]15% performance drop in SynMark v7.0 Multithread use-case
Summary: [BXT]15% performance drop in SynMark v7.0 Multithread use-case
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: Other All
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-08-21 12:46 UTC by Eero Tamminen
Modified: 2018-02-13 16:38 UTC (History)
1 user (show)

See Also:
i915 platform: BXT
i915 features:


Attachments
Fix wake_affine regression (3.20 KB, patch)
2017-08-24 21:20 UTC, Chris Wilson
no flags Details | Splinter Review

Description Eero Tamminen 2017-08-21 12:46:13 UTC
drm-tip kernel has caused ~15% drop in SynMark v7.0 Multithread 3D use-case between these commits:

kernel git://anongit.freedesktop.org/drm-tip at 560582ef03d1f6bf9010232fcc6613188a00d198 2017-07-17_16-27-42 drm-tip: 2017y-07m-17d-16h-27m-16s UTC integration manifest

kernel git://anongit.freedesktop.org/drm-tip at 10de1e17faaab452782e5a1baffd1b30a639a261 2017-07-18_10-09-14 drm-tip: 2017y-07m-18d-10h-08m-42s UTC integration manifest

(This test has quite a large variance, but 15% drop in this period is very clear from the longer term trend.)

This drop is BXT specific and visible on 2 different BXT (18 EU) J4205 devices.

Multithread test has 4 work threads doing low and hi resolution reflections / environment map rendering, with main thread rendering the final HDR image with tone mapping and bloom.  This requires enough draw calls & syching that this test is (mostly) CPU bound.

Note: The less CPU bound version (HdrBloom) of this test, which has only 1 work thread instead of 4, and which uses higher resolution & details for that, doesn't show any (clear) performance regression.
Comment 1 Chris Wilson 2017-08-22 18:25:05 UTC
I don't have those commit ids available, but that date does coincide with the rc1 merge. Is that true? Just checking the merge base would be useful to narrow it down between i915 vs the world.
Comment 2 Chris Wilson 2017-08-24 13:48:50 UTC
Ok, between v4.12 and drm-tip I see a drop from 6 to 4.5 on my bxt. Should poke Martin about ezbench bisection...
Comment 3 Chris Wilson 2017-08-24 16:33:15 UTC
First bisection gives:

commit 3fed382b46baac83703130fe4cd3d9147f427fb9
Author: Rik van Riel <riel@redhat.com>
Date:   Fri Jun 23 12:55:29 2017 -0400

    sched/numa: Implement NUMA node level wake_affine()
    
    Since select_idle_sibling() can place a task anywhere on a socket,
    comparing loads between individual CPU cores makes no real sense
    for deciding whether to do an affine wakeup across sockets, either.
    
    Instead, compare the load between the sockets in a similar way the
    load balancer and the numa balancing code do.
    
    Signed-off-by: Rik van Riel <riel@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: jhladky@redhat.com
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20170623165530.22514-4-riel@redhat.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
Comment 4 Chris Wilson 2017-08-24 21:20:41 UTC
Created attachment 133763 [details] [review]
Fix wake_affine regression

Playing around to restore the old code; gives us the spread across cores required to make OglMultithread happier.
Comment 5 Chris Wilson 2017-08-24 21:36:01 UTC
Ok, found a version by PeterZ: 20170801121912.fnykqlq3r5jcbtn2@hirez.programming.kicks-ass.net
Comment 6 Chris Wilson 2017-08-24 22:53:36 UTC
Gave it a cursory test and it appears to have fixed this particular regression, so pushed to topic/core-for-CI for wider testing:

commit 367a3aa35e7bed10a74bed139f6728916f5d1508
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Tue Aug 1 23:43:07 2017 +0200

    sched/fair: Fix wake_affine() for !NUMA_BALANCING

It should only affect !llc platforms, but the logic is slightly different than the original prior to

commit 3fed382b46baac83703130fe4cd3d9147f427fb9
Author: Rik van Riel <riel@redhat.com>
Date:   Fri Jun 23 12:55:29 2017 -0400

    sched/numa: Implement NUMA node level wake_affine()

so testing definitely required.
Comment 7 Chris Wilson 2017-09-06 18:20:38 UTC
Part 1 upstream:

commit 90001d67be2fa2acbe3510d1f64fa6533efa30ef
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Mon Jul 31 17:50:05 2017 +0200

    sched/fair: Fix wake_affine() for !NUMA_BALANCING

Needs another fix which is pending.
Comment 8 Chris Wilson 2017-09-07 08:24:11 UTC
            Commit-ID:  a731ebe6f17bff9e7ca12ef227f9da4d5bdf8425
            Gitweb:     http://git.kernel.org/tip/a731ebe6f17bff9e7ca12ef227f9da4d5bdf8425
            Author:     Peter Zijlstra <peterz@infradead.org>
            AuthorDate: Wed, 6 Sep 2017 12:51:31 +0200
            Committer:  Ingo Molnar <mingo@kernel.org>
            CommitDate: Thu, 7 Sep 2017 09:29:31 +0200
            
            sched/fair: Fix wake_affine_llc() balancing rules
            
            Chris Wilson reported that the SMT balance rules got the +1 on the
            wrong side, resulting in a bias towards the current LLC; which the
            load-balancer would then try and undo.
            
            Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
            Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
            Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
            Cc: Andy Lutomirski <luto@kernel.org>
            Cc: Linus Torvalds <torvalds@linux-foundation.org>
            Cc: Mike Galbraith <efault@gmx.de>
            Cc: Peter Zijlstra <peterz@infradead.org>
            Cc: Thomas Gleixner <tglx@linutronix.de>
            Cc: linux-kernel@vger.kernel.org
            Fixes: 90001d67be2f ("sched/fair: Fix wake_affine() for !NUMA_BALANCING")
            Link: http://lkml.kernel.org/r/20170906105131.gqjmaextmn3u6tj2@hirez.programming.kicks-ass.net
            Signed-off-by: Ingo Molnar <mingo@kernel.org>
Comment 9 Eero Tamminen 2017-09-08 07:25:34 UTC
There have been *two*, equally large BXT improvements to SynMark Multithread performance.

First one, which already more than fixed the earlier regression was between:
650ffdaee4 at 2017-08-24 14:30:13 UTC
4823f19b97 at 2017-08-25 15:55:28 UTC

and the second was between:
587d276952 at 2017-09-06 17:54:04 UTC
2300dd9c44 at 2017-09-07 17:00:01 UTC

I assume latter to be the above referred commit.  That actually improved Multithread performance slightly also on several other platforms.

While I can now verify this, I'm wondering a bit about the first commit...

Thanks!
Comment 10 Elizabeth 2018-02-13 16:38:40 UTC
Closing old verified.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.