I'm using intel-virtual-output on a ThinkPad W520 (Sandy Bridge, Nvidia GPU), with xf86-video-intel 2.99.917.
I'm seeing poor performance on the virtual screens. Cursor movement, for instance, is choppy. It feels as if the monitors are running at a low frame rate, despite physically running at their full 59.95 Hz.
When there is movement on the virtual screen, e.g. a video playing, CPU usage of the Xorg process driving the Nvidia GPU rises sharply (package power consumption as shown by turbostat goes from 8 W to 30 W).
I'd like to improve this. Is there anything I can do to diagnose this or provide useful information?
intel-virtual-output -v will give you a lot of output about what it is doing, including the copying from one screen to another and, importantly, the method used. Ideally we could share a buffer between Intel and Nvidia so that all copies are performed by the GPU. On the Intel side, this is easy to set up (and should already be the case), but I've not spent much time trying to find a way to do a zero-copy transfer to nvidia.
Start by confirming it is the copy that is rate-limiting; that should be clear from the log. If you feel confident, have a look through the code and see what improvements you can make (or offer a few pointers).
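For reference, one way to capture that log while reproducing the stutter. This is a sketch: the -f flag (keep intel-virtual-output in the foreground) and the turbostat interval option may differ between versions, so check the usage output first.

```shell
# Run intel-virtual-output in the foreground with verbose logging,
# capturing everything it prints about each copy into a file.
intel-virtual-output -f -v > ivo.log 2>&1

# In a second terminal, watch package power while e.g. a video plays:
sudo turbostat --interval 1
```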
Created attachment 124936 [details]
5000 lines of ivo -v log
I ran ivo with the -v flag while attaching two 2560x1440 monitors over DisplayPort. This produces quite a lot of log output; I've attached it.
However, I can't find any hints regarding rate-limiting. What exactly am I looking for?
I've also tried VirtualGL vs. primus, but did not notice a difference between the two. ivo uses its own independent method of copying between X servers, correct?
(In reply to main.haarp from comment #2)
> Created attachment 124936 [details]
> 5000 lines of ivo -v log
> I've ran ivo with the -v flag while attaching two 2560x1440 monitors over
> DP. This produces quite a lot of log, I've attached it.
> However I can't find any hints regarding being rate-limited. What exactly am
> I looking for?
I actually thought there was more timing info in there, but it's the frequency and duration of the rendering, and its synchronisation, that are important.
> I've also played with virtualgl vs. primus, but did not notice a difference
> between the two. ivo uses its own independent method of copying between X
> servers, correct?
ivo is just using plain X (which, IIRC, VirtualGL and primus also do). What I am hoping for is for nvidia to support DRI3/Present, as that would give a very simple way to do efficient transfers. Alternatively, using glMapBuffer() may fare better (although from a technical standpoint it should not).
The first step you may want to try is to stop your desktop WM from marking the whole framebuffer as damaged on every frame.
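As a rough illustration of what the glMapBuffer() route looks like, here is a sketch (not ivo's actual code): a pixel-buffer-object readback, assuming a current GL context on the source screen, a BGRA framebuffer, and GL >= 2.1 headers with GL_GLEXT_PROTOTYPES.

```c
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>
#include <string.h>

/* Copy the current read buffer (w x h, 32bpp) into dst via a PBO.
 * glReadPixels into a bound PBO lets the driver use a DMA for the
 * readback; glMapBuffer then exposes the result to the CPU. */
void copy_frame_via_pbo(int w, int h, void *dst)
{
    GLuint pbo;
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_PACK_BUFFER, (GLsizeiptr)w * h * 4,
                 NULL, GL_STREAM_READ);

    /* With a PBO bound to GL_PIXEL_PACK_BUFFER, the last argument is
     * an offset into the buffer, not a client pointer. */
    glReadPixels(0, 0, w, h, GL_BGRA, GL_UNSIGNED_BYTE, 0);

    void *src = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    if (src) {
        memcpy(dst, src, (size_t)w * h * 4);
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    glDeleteBuffers(1, &pbo);
}
```

Even in the best case this still ends with a CPU memcpy, which is why sharing a buffer (DRI3/Present) would be preferable.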
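For Xfce, one way to do that is to switch off Xfwm4's compositor at runtime (a sketch; the property path assumes a stock Xfwm4 configuration):

```shell
# Disable Xfwm4's built-in compositor; set back to true to re-enable.
xfconf-query -c xfwm4 -p /general/use_compositing -s false
```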
Created attachment 125003 [details]
ivo log without Xfwm4's compositor
Thanks for your response. It appears that Xfwm4's compositor was indeed responsible for damaging the entire framebuffer on every frame. Disabling it improves performance a bit.
If nothing is going on, the mouse is now smooth. However, if there is movement (e.g. a movie playing), the mouse becomes laggy again and CPU usage rises again. So disabling the compositor did not really fix the underlying problem.
So, if I'm interpreting things correctly, it's mostly a problem of copying frames from the Nvidia X to Intel X, correct? What exactly is the bottleneck? CPU, memory bandwidth, PCIe bandwidth?
(In reply to main.haarp from comment #4)
> So, if I'm interpreting things correctly, it's mostly a problem of copying
> frames from the Nvidia X to Intel X, correct? What exactly is the
> bottleneck? CPU, memory bandwidth, PCIe bandwidth?
Other way around: the copy is from the Intel screen to the Nvidia screen. The Intel half is fully accelerated; the Nvidia portion is not. Ideally, we could eliminate one of those copies by passing a buffer between the two instead.
Created attachment 125083 [details]
ivo log with xf86-video-intel of today's git
Oh, right. Yes, that direction makes more sense :)
I've tested today's git of xf86-video-intel (which finally fixes my crashing/freezing X on VT switch!) and discovered that ivo's performance went way down for accelerated applications.
Case in point: Chrome and Firefox. It takes multiple seconds just to switch tabs. The browser I'm writing this in is lagging behind my typing by about half a minute; I can see a few letters appear in a burst almost precisely once a second.
Unaccelerated applications seem to be unaffected, and switching off hardware acceleration in Firefox also helps, but that's just a workaround.
ivo log attached.
Hmm, it is doing full-screen updates via DRI3. You can try disabling DRI3 (xorg.conf, Section "Device", Option "DRI" "2") or hacking that path out of ivo. Meanwhile, time to think about what is actually happening and why DRI3 clients are the most affected.
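For reference, a minimal xorg.conf fragment that forces the intel driver back to DRI2 (the Identifier here is a placeholder; it should match, or be merged into, your existing Device section):

```
Section "Device"
    Identifier "Intel Graphics"
    Driver     "intel"
    Option     "DRI" "2"
EndSection
```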
Correct, DRI3 was the problem. Using DRI2 I am back at the original level of performance. Thanks for the hint!
(In reply to main.haarp from comment #8)
> Correct, DRI3 was the problem. Using DRI2 I am back at the original level of
> performance. Thanks for the hint!
If you tried disabling DRI3 in xorg.conf, can you try hacking out DRI3 from ivo? Or vice versa? (That will help narrow the search.)
(In reply to Chris Wilson from comment #9)
> (In reply to main.haarp from comment #8)
> > Correct, DRI3 was the problem. Using DRI2 I am back at the original level of
> > performance. Thanks for the hint!
> If you tried disabling DRI3 in xorg.conf, can you try hacking out DRI3 from
> ivo? Or vice versa? (That will help narrow the search.)
OK, I had a quick test with it. #undef DRI3 in virtual.c did produce a different binary, but the performance issue remains when DRI3 is enabled in the X server.
Is there any progress on this? Both intel-virtual-output and reverse PRIME result in huge CPU usage whenever anything on screen is updated. Disabling compositing helps somewhat, but not sufficiently.