90671 – [HSW bisected] SynMark2_v6_0_0_OglBatch1 to OglBatch5 performance reduce by 14% with gnome-session


Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	unspecified
Hardware:	All Linux (All)

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2015-05-27 06:23 UTC by ye.tian
Modified:	2017-10-06 14:29 UTC
CC List:	1 user

See Also:
i915 platform:
i915 features:

Attachments
dmesg info (51.91 KB, text/plain) 2015-05-27 06:27 UTC, ye.tian
Xorg.0.log info (18.83 KB, text/plain) 2015-05-27 06:28 UTC, ye.tian
Refine dri2 copyregion ring selection (1.56 KB, patch) 2015-05-27 06:42 UTC, Chris Wilson

Description ye.tian 2015-05-27 06:23:41 UTC

System Environment:       
Platform:  HSW
Kernel:  (drm-intel-nightly)7e83d3b288f06c29d5ab4f870b84db5fccfa3a08
Libdrm: (master)libdrm-2.4.61-2-ga1acffd4e0968ffa65b673163574188a00c9ab7e
Mesa: (master)3ec18152858fd9aadb398d78d5ad2d2b938507c1
Xserver:	(master)xorg-server-1.17.0-151-gad02d0df75318660c3f7cd6063eac409327fe560
Xf86_video_intel:	(master)2.99.917-312-ga6dd2655cb41000943e554ccea16e5781bcbf012
Cairo: (master)2cf2d8e340a325adb205baf8e4bd64e1d1858008
Libva:  (master)4763db1c2133d4e6b92355938ecb6f23a7767b6b
Libva_intel_driver: (master)4a1c4d21f3428b08ef765d7f7de75b97006514ac

Bug detailed description:
--------------------------------------------------
SynMark2_v6_0_0_OglBatch1 performance is reduced by 12% with gnome-session on HSW. It is a kernel regression. Bisecting shows that b471618 is the first bad commit.
This bug affects the following cases:
OglBatch1 to OglBatch5 reduced by 14%
OglGeomTriStrip reduced by 12%
OglPSBump2 reduced by 9%
OglVSDiffuse1 reduced by 9%
OglVSDiffuse8 reduced by 9%
OglVSTangent reduced by 9%

commit b47161858ba13c9c7e03333132230d66e008dd55
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Mon Apr 27 13:41:17 2015 +0100
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Thu May 21 15:11:42 2015 +0200

    drm/i915: Implement inter-engine read-read optimisations

    Currently, we only track the last request globally across all engines.
    This prevents us from issuing concurrent read requests on e.g. the RCS
    and BCS engines (or more likely the render and media engines). Without
    semaphores, we incur costly stalls as we synchronise between rings -
    greatly impacting the current performance of Broadwell versus Haswell in
    certain workloads (like video decode). With the introduction of
    reference counted requests, it is much easier to track the last request
    per ring, as well as the last global write request so that we can
    optimise inter-engine read read requests (as well as better optimise
    certain CPU waits).
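
The tracking change described in this commit message can be illustrated with a toy model. This is only a sketch, not the i915 code: the class names, the `submit` method and the (engine, seqno) tuples are all hypothetical. It assumes that a read has to wait only for the last write from another engine, while a write has to wait for outstanding reads on every other engine.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Request = Tuple[str, int]  # (engine name, sequence number); illustrative only


@dataclass
class GlobalTracking:
    """Old scheme (toy model): one 'last request' per buffer, across all engines."""
    last: Optional[Request] = None

    def submit(self, engine: str, seqno: int, write: bool) -> Set[Request]:
        # Any access from a different engine waits for the last request,
        # even when both accesses are reads.
        waits = {self.last} if self.last and self.last[0] != engine else set()
        self.last = (engine, seqno)
        return waits


@dataclass
class PerEngineTracking:
    """New scheme (toy model): last read per engine plus the last write."""
    last_read: Dict[str, Request] = field(default_factory=dict)
    last_write: Optional[Request] = None

    def submit(self, engine: str, seqno: int, write: bool) -> Set[Request]:
        waits: Set[Request] = set()
        # Readers and writers both wait for a write made on another engine.
        if self.last_write and self.last_write[0] != engine:
            waits.add(self.last_write)
        if write:
            # A writer also waits for outstanding reads on other engines.
            waits |= {r for e, r in self.last_read.items() if e != engine}
            self.last_read.clear()
            self.last_write = (engine, seqno)
        # Every access counts as the latest read on its own engine.
        self.last_read[engine] = (engine, seqno)
        return waits


old = GlobalTracking()
old.submit("rcs", 1, write=False)
old_waits = old.submit("bcs", 1, write=False)   # read after read still stalls

new = PerEngineTracking()
new.submit("rcs", 1, write=False)
new_waits = new.submit("bcs", 1, write=False)   # concurrent reads, no stall
write_waits = new.submit("vcs", 1, write=True)  # the write waits for both reads
```

In this model the second read stalls only under the global scheme, which is the read-read case the commit aims to optimise.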

Please see the attached Xorg.0.log and dmesg output.

Reproduce steps:
----------------------------
1. xinit &
2. gnome-session &
3. ./synmark2 OglBatch1

Comment 1 ye.tian 2015-05-27 06:27:05 UTC

Created attachment 116066 [details]
dmesg info

Comment 2 ye.tian 2015-05-27 06:28:04 UTC

Created attachment 116067 [details]
Xorg.0.log info

Comment 3 Chris Wilson 2015-05-27 06:28:45 UTC

Sigh. Any with raw? And with other compositing managers? And with working swap elision?

Anyway, it is more likely that reducing contention here leads to other clients being able to schedule more work, reducing output latency but also reducing throughput of the hog.

Comment 4 Chris Wilson 2015-05-27 06:42:16 UTC

Created attachment 116068 [details] [review]
Refine dri2 copyregion ring selection

The other side-effect is that given multiple active rings, we have a chance to flip onto the blt ring more often, so this should help:

Comment 5 ye.tian 2015-05-27 07:30:08 UTC

(In reply to Chris Wilson from comment #4)
> Created attachment 116068 [details] [review]
> Refine dri2 copyregion ring selection
> 
> The other side-effect is that given multiple active rings, we have a chance
> to flip onto the blt ring more often, so this should help:

Tested with this patch; the problem no longer occurs.

Comment 6 Chris Wilson 2015-05-27 07:40:23 UTC

commit fb1643f0f904eb258da71cd0b8deb8d3ec6dafed
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed May 27 07:37:35 2015 +0100

    sna/dri2: Refine ring selection with multiple active rings
    
    The preference given multiple rings is to the previous writer, or if
    none, to the render ring if active.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=90671
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
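
The preference order stated in this commit message could be sketched roughly as follows. This is not the xf86-video-intel code: the `select_ring` function, the ring name strings, the single-ring shortcut and the blt fallback are assumptions made for illustration. Only the order "previous writer, else render ring if active" comes from the commit message.

```python
from typing import Optional, Set

RENDER = "render"  # hypothetical ring names, for illustration only
BLT = "blt"


def select_ring(active: Set[str], last_writer: Optional[str],
                fallback: str = BLT) -> str:
    """Pick a ring for a copy, given the rings the buffer is busy on."""
    if len(active) <= 1:
        # Assumption: with zero or one active ring, stay on it (or fall back).
        return next(iter(active), fallback)
    if last_writer is not None:
        # Multiple active rings: prefer the previous writer.
        return last_writer
    if RENDER in active:
        # No previous writer: prefer the render ring if it is active.
        return RENDER
    # Assumption: otherwise use the fallback ring.
    return fallback
```

For example, `select_ring({"render", "blt"}, None)` picks the render ring, while `select_ring({"render", "blt"}, "blt")` stays on the blt ring because it was the previous writer.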

Comment 7 ye.tian 2015-05-27 07:43:46 UTC

Verified it.

Comment 8 Elizabeth 2017-10-06 14:29:48 UTC

Closing old verified bug.
