Bug 96820 - intel-virtual-output: bad performance
Summary: intel-virtual-output: bad performance
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-07-05 12:49 UTC by main.haarp
Modified: 2016-07-18 10:00 UTC
0 users

See Also:
i915 platform:
i915 features:


Attachments
5000 lines of ivo -v log (208.72 KB, text/plain)
2016-07-06 16:18 UTC, main.haarp
ivo log without Xfwm4's compositor (203.01 KB, text/plain)
2016-07-11 08:02 UTC, main.haarp
ivo log with xf86-video-intel of today's git (573.03 KB, text/plain)
2016-07-15 07:47 UTC, main.haarp

Description main.haarp 2016-07-05 12:49:03 UTC
Hey,

I'm using intel-virtual-output on a Thinkpad W520 (Sandy Bridge, Nvidia GPU). xf86-video-intel-2.99.917.

I'm facing bad performance on the virtual screens. Cursor movement, for instance, is choppy. It feels as if the monitors are running at a low frame rate, even though they are physically running at their full 59.95 Hz refresh.

When there is movement on the virtual screen, e.g. a video playing, CPU usage of the Xorg process running on the Nvidia GPU rises sharply (package power consumption as shown by turbostat rises from 8 W to 30 W).

I'd like to improve this. Is there anything I can do to diagnose this or provide useful information?
Comment 1 Chris Wilson 2016-07-05 13:07:39 UTC
intel-virtual-output -v will give you a lot of output about what it is doing, including the copying from one screen to another and importantly the method. Ideally we could share a buffer between Intel and Nvidia so that all copies are performed by the GPU. On the Intel side, this is easy to setup (and should already be) but I've not spent much time trying to find a way to do zero-copy transfer to nvidia.

Start by confirming it is the copy that is rate-limiting, that should be clear from the log and if you feel confident have a look through the code and see what improvements you can make. (Or offer a few pointers.)
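A minimal command-line sketch for capturing that verbose log so it can be attached (assuming intel-virtual-output is in PATH; the log file name is arbitrary):

```shell
# Run intel-virtual-output in verbose mode, keeping a copy of everything
# it prints (the verbose output goes to stderr as well as stdout).
intel-virtual-output -v 2>&1 | tee ivo.log
```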
Comment 2 main.haarp 2016-07-06 16:18:16 UTC
Created attachment 124936 [details]
5000 lines of ivo -v log

Thanks.

I ran ivo with the -v flag while attaching two 2560x1440 monitors over DP. This produces quite a lot of log output; I've attached it.

However I can't find any hints regarding being rate-limited. What exactly am I looking for?

I've also played with virtualgl vs. primus, but did not notice a difference between the two. ivo uses its own independent method of copying between X servers, correct?
Comment 3 Chris Wilson 2016-07-06 16:53:36 UTC
(In reply to main.haarp from comment #2)
> Created attachment 124936 [details]
> 5000 lines of ivo -v log
> 
> Thanks.
> 
> I ran ivo with the -v flag while attaching two 2560x1440 monitors over
> DP. This produces quite a lot of log output; I've attached it.
> 
> However I can't find any hints regarding being rate-limited. What exactly am
> I looking for?

I actually thought there was more timing info in there, but it's the frequency and duration of the rendering and its synchronisation that's important.
 
> I've also played with virtualgl vs. primus, but did not notice a difference
> between the two. ivo uses its own independent method of copying between X
> servers, correct?

ivo is just using plain X (which iirc virtualgl and primus also do). What I am hoping for is for nvidia to support DRI3/Present as that would give a very simple way to do efficient transfers. Alternatively using glMapBuffer() may fare better (although it should not from a technical standpoint).

The first step you may want to try is to stop your desktop wm from marking the whole framebuffer as being damaged on every frame.
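As an illustration of that first step: on Xfce, for example, the compositor can be toggled from the command line with xfconf-query (assuming Xfwm4 is the window manager; other desktops have their own switches):

```shell
# Disable Xfwm4's built-in compositor for the running session.
xfconf-query -c xfwm4 -p /general/use_compositing -s false

# Re-enable it later with:
#   xfconf-query -c xfwm4 -p /general/use_compositing -s true
```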
Comment 4 main.haarp 2016-07-11 08:02:32 UTC
Created attachment 125003 [details]
ivo log without Xfwm4's compositor

Hey,

Thanks for your response. It appears that Xfwm4's compositor was responsible for damaging the entire framebuffer on every frame. Disabling it improves performance a bit.

If nothing is going on, the mouse is now smooth. However, if there is movement (e.g. a movie playing), the mouse becomes laggy again and CPU usage rises. So disabling the compositor did not fix the underlying problem.


So, if I'm interpreting things correctly, it's mostly a problem of copying frames from the Nvidia X to Intel X, correct? What exactly is the bottleneck? CPU, memory bandwidth, PCIe bandwidth?
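Since the question of which resource is the bottleneck comes up, a back-of-the-envelope estimate of the raw copy bandwidth involved may help frame it (assuming 4-byte XRGB8888 pixels and worst-case full-screen updates at 60 fps; the numbers are illustrative only):

```python
# Rough bandwidth estimate for copying two 2560x1440 screens at 60 fps,
# assuming 4 bytes per pixel (XRGB8888) and full-frame damage every frame.
width, height, bpp, fps, monitors = 2560, 1440, 4, 60, 2

bytes_per_frame = width * height * bpp          # one full-screen copy
per_second = bytes_per_frame * fps * monitors   # sustained rate, both heads

print(f"{bytes_per_frame / 2**20:.1f} MiB per frame")   # ~14.1 MiB
print(f"{per_second / 2**30:.2f} GiB/s sustained")      # ~1.65 GiB/s
```

Well within PCIe link bandwidth, but a significant load if any hop is an unaccelerated CPU copy.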
Comment 5 Chris Wilson 2016-07-11 08:48:06 UTC
(In reply to main.haarp from comment #4)
> So, if I'm interpreting things correctly, it's mostly a problem of copying
> frames from the Nvidia X to Intel X, correct? What exactly is the
> bottleneck? CPU, memory bandwidth, PCIe bandwidth?

Other way. The copy is from the intel screen to the nvidia screen. The Intel half is fully accelerated, the nvidia portion is not. Ideally, we can eliminate one of those copies by passing a buffer between instead.
Comment 6 main.haarp 2016-07-15 07:47:46 UTC
Created attachment 125083 [details]
ivo log with xf86-video-intel of today's git

Oh, right. Yes, that direction makes more sense :)


I've tested today's git of xf86-video-intel (which finally fixes my crashing/freezing X on VT switch!) and discovered that ivo's performance went way down for accelerated applications.

Case in point: Chrome and Firefox. Switching tabs takes multiple seconds. The browser I'm writing this in lags behind my typing by about half a minute; I can see a couple of letters appearing in bursts almost precisely once per second.

Unaccelerated applications seem to be unaffected, and switching off hardware acceleration in Firefox also helps. But that's just a workaround.

ivo log attached.
Comment 7 Chris Wilson 2016-07-15 08:37:42 UTC
Hmm, it is doing full-screen updates via DRI3. You can try disabling DRI3 (xorg.conf, section Device, Option "DRI" "2") or hacking out that path in DRI2. Meanwhile, time to think about what is actually happening and why DRI3 clients are most affected.
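A minimal sketch of that xorg.conf change (the Identifier is hypothetical; only the Option line matters):

```
Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    Option     "DRI" "2"
EndSection
```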
Comment 8 main.haarp 2016-07-15 09:29:50 UTC
Correct, DRI3 was the problem. Using DRI2 I am back at the original level of performance. Thanks for the hint!
Comment 9 Chris Wilson 2016-07-15 09:35:55 UTC
(In reply to main.haarp from comment #8)
> Correct, DRI3 was the problem. Using DRI2 I am back at the original level of
> performance. Thanks for the hint!

If you tried disabling DRI3 in xorg.conf, can you try hacking out DRI3 from ivo? Or vice versa? (That will help narrow the search.)
Comment 10 main.haarp 2016-07-18 10:00:13 UTC
(In reply to Chris Wilson from comment #9)
> (In reply to main.haarp from comment #8)
> > Correct, DRI3 was the problem. Using DRI2 I am back at the original level of
> > performance. Thanks for the hint!
> 
> If you tried disabling DRI3 in xorg.conf, can you try hacking out DRI3 from
> ivo? Or vice versa? (That will help narrow the search.)

OK, I had a quick test with it. #undef-ing DRI3 in virtual.c did produce a different binary, but the performance issue when DRI3 is enabled in X remains.

