Bug 79715 - [DRI3 IVB/HSW/BYT-M/BDW Bisected] DRI3 environment Xonotic 0.7 performance show 60% worse than DRI2's
Status: VERIFIED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/General
Version: git
Hardware: All Linux (All)
Importance: high major
Assignee: Keith Packard
QA Contact: Xorg Project Team
Reported: 2014-06-06 08:56 UTC by zhoujian
Modified: 2015-10-21 08:40 UTC (History)

Description zhoujian 2014-06-06 08:56:08 UTC
Platform: IVB/HSW/BYT-M/BDW
Libdrm: (master)libdrm-2.4.54-9-g8fc62ca8ac010659023bb63c4759eb683de4f9af
Mesa: (master)cf29913aa156accbe60cb35f9a0bd2c21726cfa3
Xserver: (master)xorg-server-1.15.99.903
Xf86_video_intel: (master)2.99.911-256-g08148896196443a8582c30b47ff546acca78d69c
Cairo: (master)ead5c7909f3db1d0d81121fc2775c458871891b2
Libva: (staging)35e70cb9b9c77dfb99fb370e319ed501f0c31b17
Libva_intel_driver: (staging)fbbe401aa28a0b3859d587ef08f0df15a2f7c8f2
Kernel:(drm-intel-nightly)git-0a37b5

Bug detailed description:
----------------------------------------------
Warsow 1.0, Xonotic 0.7, GpuTest_v0.5_triangle_fullscreen, Counter-Strike Source and Half-Life 2 performance is reduced by 20%~60% on IVB/HSW/BYT-M/BDW. The problem exists under both gnome-session and raw X. It is an Xserver regression.
Bisecting shows that the first bad commit is:
e6f5d9d7b7efdacea0f22f1808efca849bcede4c
Author:     Keith Packard <keithp@keithp.com>
AuthorDate: Mon Jan 27 11:23:58 2014 -0800
Commit:     Keith Packard <keithp@keithp.com>
CommitDate: Wed Jun 4 22:03:35 2014 -0700
    present: Queue flips for later execution


Reproduce steps:
---------------------------------------------
1. xinit &
2. vblank_mode=0 ./warsow.x86_64 +set vid_fullscreen 1 +logconsole pts-log exec rofiles/high+.cfg +timedemo 1 +cg_showFPS 1 +cl_maxfps 999 +cl_checkForUpdate 0 +demo pts-demo10 +next quit +r_mode -1 +vid_customwidth 1920 +vid_customheight 1080 +vid_restart
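
To compare the same command under DRI2 and DRI3 on one installation, the environment could be prepared as in the rough sketch below. This is only an illustration: the report does not say how DRI2 and DRI3 were selected, and the use of Mesa's LIBGL_DRI3_DISABLE variable here is an assumption. The helper name benchmark_env is hypothetical.

```python
import os


def benchmark_env(use_dri3, base_env=None):
    """Build the environment for one benchmark run (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    # Same setting as in the reproduce step: do not sync swaps to vblank.
    env["vblank_mode"] = "0"
    if use_dri3:
        # Make sure a stale setting does not silently force DRI2.
        env.pop("LIBGL_DRI3_DISABLE", None)
    else:
        # Assumption: ask Mesa's GLX code to skip DRI3 and fall back to DRI2.
        env["LIBGL_DRI3_DISABLE"] = "1"
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run() together with the warsow command line from step 2.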
Comment 1 meng 2014-06-06 09:02:22 UTC
The first bad commit git-e6f5d9d7 also causes an X crash when running synmark2 v5.3.0 (bug 79709).
Comment 2 zhoujian 2014-06-27 07:26:38 UTC
Verified it.
Comment 3 zhoujian 2014-06-30 02:33:48 UTC
Sorry, I found that the issue still exists.
Comment 4 Gordon Jin 2014-07-31 01:45:40 UTC
Keith, could you please look into this xserver regression?
Comment 5 zhoujian 2014-07-31 04:54:53 UTC
I have retested this bug and found that the issue still exists on IVB/BYT-M/BDW/HSW.
Comment 6 Eero Tamminen 2014-08-22 11:19:58 UTC
Does the issue still happen if you apply Keith's quad buffer patch to Mesa:
  http://lists.freedesktop.org/archives/dri-devel/2014-July/062842.html
?

(It still applies fine to the latest Mesa.)
Comment 7 zhoujian 2014-08-26 07:16:06 UTC
I have tried this patch: http://lists.freedesktop.org/archives/dri-devel/2014-July/062842.html. It fixes this issue, and in some cases (e.g. OglBatch0) performance improves by ~30% with the patch applied, compared with the latest nightly result.
Comment 8 meng 2014-09-16 06:10:33 UTC
The issue still exists on the 2014Q3 stack, as below:
Mesa:      (10.3)9f67c26d1b424b8f3d86b5435c8f74d0a81eb86d
Kernel:  3.16.2
Xf86_video_intel:   (master)2.99.914
Libdrm:         (master)libdrm-2.4.56
Libva:           (master) 1.4.0.pre1
vaapi-intel-driver:    (master) 1.4.0.pre1
Cairo:        1.12.16
Xserver:    (server-1.16-branch)xorg-server-1.16.0
Comment 9 wendy.wang 2014-11-06 08:44:11 UTC
Tested DRI3 on a BDW GT2 configuration using the latest GFX SW stack (2014-11-05), which includes Keith's quad buffer patch (merged into Mesa on 9-30 as commit f7a355556ef5fe23056299a77414f9ad8b5e5a1d).

The test results show that Warsow1.0 and GpuTest_v0.5_triangle_fullscreen performance has recovered to the same level as with DRI2.
However, performance in the cases below is still worse than with DRI2.

DRI3 vs. DRI2
etqwET: Quake Wars Demo:  -13%
etqw-demo:                -14%
Xonotic v0.7:             -61%
Half Life 2:              -26%
SynMarkDrvCtx:            -30%
SynMarkShMapPcf:          -24%
SynMarkShMapPcf:			 -24%
Comment 10 Eero Tamminen 2014-11-06 11:33:49 UTC
(In reply to wendy.wang from comment #9)
> From the test result, find Warsow1.0 and GpuTest_v0.5_triangle_fullscreen
> performance recovers to the same value as DRI2 setting.
> But below cases performance data still not good compare with DRI2.
> 
> DRI3 vs. DRI2
> etqwET: Quake Wars Demo :  -13%
> etqw-demo: 				 -14%
> Xonotic v0.7				 -61%
> Half Life 2: 				 -26%
> SynMarkDrvCtx: 			 -30%
> SynMarkShMapPcf:			 -24%

I think context (re-)creation is the least significant of these problems, although on my HSW GT3e setup it drops to <1/5th with DRI3.  We can ignore that for now.

However, it seems strange that a shadow mapping test suffers from DRI3, especially as the other shadow mapping test is fine.  In my case, on HSW GT3e, ShMapPcf FPS with DRI3 is *exactly* the same as the screen update frequency (on multiple runs), so it seems that DRI3 for some reason causes it to be synced to fullscreen page flips, despite the zero swap interval setting and the Mesa quad buffer workaround.

Wendy, does ShMapPcf and/or the games in your case get limited to 60 FPS?
Comment 11 wendy.wang 2014-11-07 03:33:18 UTC
(In reply to Eero Tamminen from comment #10)
> (In reply to wendy.wang from comment #9)
> > From the test result, find Warsow1.0 and GpuTest_v0.5_triangle_fullscreen
> > performance recovers to the same value as DRI2 setting.
> > But below cases performance data still not good compare with DRI2.
> > 
> > DRI3 vs. DRI2
> > etqwET: Quake Wars Demo :  -13%
> > etqw-demo: 				 -14%
> > Xonotic v0.7				 -61%
> > Half Life 2: 				 -26%
> > SynMarkDrvCtx: 			 -30%
> > SynMarkShMapPcf:			 -24%
> 
> I think context (re-)creation is least significant of these problems,
> although on my HSW GT3e setup that drops to <1/5th with DRI3.  We can ignore
> that for now.
> 
> However, shadow mapping test suffering from DRI3 seems strange, especially
> as the other shadow mapping test is fine.  In my case, on HSW GT3e, ShMapPcf
> FPS with DRI3 is *exactly* same as screen update frequency (on multiple
> runs), so it seems that DRI3 for some reason causes it to be synched to
> fullscreen page flip, despite zero swap interval setting and Mesa quad
> buffer workaround.
> 
> Wendy, does ShMapPcf and/or the games in your case get limited to 60 FPS?

Yes. Over 3 runs, the ShMapPcf benchmark shows exactly 60 FPS.
Comment 12 Eero Tamminen 2014-11-12 15:26:39 UTC
From comparing Xonotic and Half-Life 2 frame timings between DRI2 and DRI3, it's clear that the huge difference in performance is because DRI3 limits the performance to 60 FPS.

While in the ShMapPcf and Half-Life 2 cases DRI3 Vsync-limits everything that would go >60 FPS, in the Xonotic case it for some reason Vsyncs only parts of the run (2 parts, taking ~40% of the whole run).
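
One way to estimate how much of a run is locked to the refresh rate from per-frame timings is sketched below. This is an illustration, not the tooling used for the comparison above; the function name, the 60 Hz default and the 5% tolerance are assumptions.

```python
def vsync_limited_fraction(frame_times_ms, refresh_hz=60.0, tolerance=0.05):
    """Fraction of total run time spent in frames lasting about one refresh period."""
    period_ms = 1000.0 / refresh_hz
    total_ms = sum(frame_times_ms)
    if total_ms == 0:
        return 0.0
    # A frame counts as Vsync-limited if its duration is within the
    # tolerance of one refresh period (about 16.7 ms at 60 Hz).
    locked_ms = sum(
        t for t in frame_times_ms if abs(t - period_ms) <= tolerance * period_ms
    )
    return locked_ms / total_ms
```

Weighting by time rather than by frame count matters here, because the unthrottled parts of a run produce many more frames per second than the throttled parts.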
Comment 13 Eero Tamminen 2014-11-13 10:36:54 UTC
Wendy, are there any additional warning messages on the console with DRI3 when you run these tests?

If yes, could you check whether performance gets fixed by redirecting the warnings elsewhere, e.g. with "2> stderr.txt"?
Comment 14 Eero Tamminen 2014-11-13 14:38:50 UTC
With a Mesa that is a few days newer, I no longer see the 60 FPS issue in ShMapPcf, but now GfxBench 3.0 T-Rex performance has almost halved because of it.  I.e. where this issue hits seems a bit random.

Looking at the frame timing information, in the ETQW and DrvCtx cases the issue isn't unwanted 60 FPS syncing.

The DrvCtx issue is related to context creation.  Perf shows that the functions consuming the most CPU are allocs and frees.  Valgrind/Kcachegrind [1] callgraphs show that with DRI3, the test spends additional CPU in xcb_poll_for_event() (~6% of total) and in fprintf calls (~6% of total).  Only with DRI3, I (constantly) get the following warning:
xgeWireToEvent: Unknown extension 148, this should never happen
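
To see how often a warning like this repeats, the redirected stderr output can be tallied. The following is a small sketch with a hypothetical helper name; it only counts identical lines and makes no assumption about where the messages come from.

```python
from collections import Counter


def count_repeated_warnings(stderr_lines):
    """Return (message, count) pairs for non-empty lines, most frequent first."""
    counts = Counter(line.strip() for line in stderr_lines if line.strip())
    return counts.most_common()
```

For example, after running a test with "2> stderr.txt", opening that file and passing the file object to the function gives the tally.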

Keith, where does that warning come from?  Valve/Steam game forums also have complaints about it in relation to Ubuntu 14.10.


[1] Btw, for some reason Valgrind doesn't work with the latest Mesa unless one compiles it with GCC -fstack-protector (like the rest of the Ubuntu 14.10 libs).  Without that option, GL programs SIGTRAP on the first glXSwapBuffers call under Valgrind.
Comment 15 Keith Packard 2014-11-13 18:56:56 UTC
The xgeWireToEvent warning is bogus and comes from libXext. It has been patched upstream, but no new upstream release has been made yet.
Comment 16 Eero Tamminen 2014-12-15 16:53:39 UTC
With the latest Mesa and everything else from Ubuntu 14.10, just about everything is now again synced to 60 FPS under DRI3 in fullscreen.  At first, tests start at full speed, but then after 1-10s, something jumps on the brakes and frames slow down to 60.

Wendy, are you also seeing the same when all the 3D stack components are the latest versions?

(Could be related to Mario Kleiner's Mesa changes from yesterday.)
Comment 17 Eero Tamminen 2014-12-16 10:40:20 UTC
(In reply to Eero Tamminen from comment #16)
> With latest Mesa and everything else from Ubuntu 14.10, about everything is
> now again synched to 60 FPS under DRI3 on fullscreen.  First tests start at
> full speed, but then after 1-10s, something jumps on the brakes and frames
> slow down to 60.

I don't see this with the latest of everything.  The flickering issue also seems to be gone.  Full testing is needed to see whether some tests still show a perf regression / 60 FPS limit.
Comment 18 wendy.wang 2014-12-30 05:54:06 UTC
(In reply to Eero Tamminen from comment #16)
> With latest Mesa and everything else from Ubuntu 14.10, about everything is
> now again synched to 60 FPS under DRI3 on fullscreen.  First tests start at
> full speed, but then after 1-10s, something jumps on the brakes and frames
> slow down to 60.
> 
> Wendy, are you seeing also the same when all the 3D stack components are
> latest versions?
> 
> (Could be related to Mario Kleiner's Mesa changes from yesterday.)

Checked on HSW-GT3e and BDW GT2 F0 machines.

SW stack used: 2014-11-07 X11R7  + 2014-12-18 Mesa + 2014-12-18 Kernel 
  + DRI3 enabled

Did not see the 60 FPS limit in hl2 or in the Synmark 2 OglShMapPcf test.
Comment 19 Eero Tamminen 2014-12-30 09:17:40 UTC
(In reply to wendy.wang from comment #18)
> Checked HSW-GT3e and BDW GT2 F0 machine,
> 
> Used SW stack: 2014-11-07 X11R7  + 2014-12-18 Mesa + 2014-12-18 Kernel 
>   + DRI3 enable
> 
> Did not see hl2 and Synmark 2 tests OglShMapPcf case limited to 60 FPS
> problem.

What about the other tests (etqw, xonotic, drvctx) whose perf had previously regressed with DRI3? Is their perf now OK with it?
Comment 20 Eero Tamminen 2015-01-30 15:17:12 UTC
Compared DRI3 vs. DRI2 on BSW & BYT.

Instead of doubling the performance of the trivial GpuTest Triangle test compared to DRI2, DRI3 clearly regresses it on BSW (when using the latest 3D stack components from git: mesa, kernel, X).

On BYT, instead of doubling GpuTest Triangle performance in fullscreen (as it did earlier), DRI3 no longer affects it at all.

Any idea what has caused this change in the last few months?

NOTE: there were tests where DRI3 still (as expected) improves performance, just no longer in the trivial Triangle Fullscreen test, which it should help most (as that has the highest FPS).
Comment 21 wendy.wang 2015-10-21 08:40:17 UTC
Verified with the latest GFX SW stack:
Kernel	4.3.0-rc4_drm-intel-nightly_c38f2c_20151010+
Mesa	git-82b324c
xf86	2.99.917-478-gdf72bc5
xserver	1.17.99.901

This DRI3 issue has been verified as fixed, so closing it.

