With SNA acceleration, the Blobby Volley game is not playable: it runs much slower than with UXA. Tested on: blobby 0.9c with OpenGL; Intel 965GM; driver version 2.20.5; mesa 7.12.2 and 8.0.4; kernel 3.2.23 and 3.5.2.
Since this is a GL game, it suggests that something seriously broke when you tried to enable SNA. Please attach your Xorg.0.log, glxinfo and for my convenience a link to a deb ;-)
Even in the non-GL mode it takes <10% cpu on a 965gm @ 800x600.
Created attachment 66353 [details] glxinfo
Created attachment 66354 [details] xorg log
Version 0.9 is no longer in the repo :( But there were few changes since 0.8, so it will probably behave the same.
http://ftp.debian.org/debian/pool/main/b/blobby/blobby-data_0.8-dfsg-2_all.deb
http://ftp.debian.org/debian/pool/main/b/blobby/blobby_0.8-dfsg-2_i386.deb
Ok, they look sane enough. What is top like whilst running blobby? And perf top?
I'm not sure what you are asking. I'm running blobby in fullscreen, with opengl set in the preferences. Btw. windowed mode works faster. The game has about 5% cpu usage.
I'm trying to get some metrics to diagnose where your problem lies. Let's start with basic glxgears to check everything is in order. Can you run glxgears with and without export vblank_mode=0, windowed and fullscreen (glxgears -fullscreen) under UXA and SNA?
Created attachment 66370 [details] glxgears - SNA and UXA
Added the glxgears logs as an attachment. Generally, SNA works better on my laptop. I have problems only with this game. Other games run the same or faster.
Ok, the GL setup itself is functioning as expected. What environment are you running blobby under (kwin, gnome-shell etc)?
As far as I can tell, blobby is equally responsive under UXA and SNA on a thinkpad t61 (with 965gm). Other than something interfering with the presentation of frames (like a compositor), the only other suggestion is that for some reason blobby is not being woken up after a swapbuffers in a timely manner. Probably time to compile xf86-video-intel manually. Check that the issue still occurs and then add --enable-debug=full as a configure option and attach the Xorg.0.log for a game session.
The log is large, about 150 MB, so I compressed it and uploaded it here: http://www.speedyshare.com/a3pUR/Xorg.0.log-debug.tar.bz2
I use openbox 3.5. But if I remember correctly, it was the same on gnome 3.4. The previous logs were from my own compilation of the driver too.
What is immediately apparent from the debug file is that it often takes a couple of vblanks for the swapbuffers to complete. To test that the scheduling is snafu can you try:

diff --git a/src/sna/sna_dri.c b/src/sna/sna_dri.c
index 5fa17d7..b380ab4 100644
--- a/src/sna/sna_dri.c
+++ b/src/sna/sna_dri.c
@@ -1708,6 +1708,25 @@ sna_dri_immediate_blit(struct sna *sna,
 	DBG(("%s: emitting immediate blit, throttling client\n", __FUNCTION__));
 	VG_CLEAR(vbl);
+	{
+		info->type = DRI2_SWAP;
+		info->bo = sna_dri_copy_to_front(sna, draw, NULL,
+						 get_private(info->front)->bo,
+						 get_private(info->back)->bo,
+						 true);
+
+		vbl.request.type =
+			DRM_VBLANK_RELATIVE |
+			DRM_VBLANK_NEXTONMISS |
+			DRM_VBLANK_EVENT |
+			pipe_select(info->pipe);
+		vbl.request.sequence = 0;
+		vbl.request.signal = (unsigned long)info;
+		if (sna_wait_vblank(sna, &vbl))
+			sna_dri_frame_event_info_free(sna, draw, info);
+		return;
+	}
+
 	if ((sna->flags & SNA_NO_WAIT) == 0) {
 		info->type = DRI2_SWAP_THROTTLE;
 		if (sna_dri_window_get_chain((WindowPtr)draw) == info) {
Created attachment 66374 [details] [review] Immediate vsync swapbuffers
Created attachment 66376 [details] Xorg.0.log - debug with patch
With this patch (Immediate vsync swapbuffers) blobby runs slower than without it.
Still taking multiple vblanks from the time the blit for the swap is queued to the time it takes effect. At this point the indication would be that it is simply taking too long for blobby to render its frames (the key difference between UXA and SNA here is that SNA checks that the swap occurs before signalling the client, UXA just signals the client regardless after queueing the blit.) Can you build trace-cmd (git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git) and record what the driver sees with 'sudo trace-cmd record -e i915' then attach the output of 'trace-cmd report'.
I don't know if I use it correctly:

root@deve:/home/Dokumenty/trace-cmd# ./trace-cmd record -e i915
/sys/kernel/debug/tracing/events/i915/filter
/sys/kernel/debug/tracing/events/*/i915/filter
Hit Ctrl^C to stop recording
^CKernel buffer statistics:
  Note: "entries" are the entries left in the kernel ring buffer and are not
        recorded in the trace data. They should all be zero.

CPU: 0
entries: 0
overrun: 0
commit overrun: 0
bytes: 1084
oldest event ts: 0.348270
now ts: 112.072546

CPU0 data recorded at offset=0x12e000
    2482176 bytes in size

I ran blobby before hitting Ctrl^C. The trace-cmd report is attached.
Created attachment 66378 [details] trace-cmd report
(In reply to comment #19)
> I don't know if I use it correctly:

You did just fine. The report contains the interactions between blobby, X and fluxbox. Now I just need to piece it together to work out why the frame latency is so high. Just for my information, which -intel did you use whilst generating the trace?
Ok, I have a theory based on that trace, and the fault lies in -intel submitting the next vsync'ed blit immediately upon receiving the vblank - and so blocks the client rendering for a whole frame. Then we queue up two frames the client and the cycle continues. That should be demonstrable with

diff --git a/src/sna/sna_dri.c b/src/sna/sna_dri.c
index 0210d73..87881ba 100644
--- a/src/sna/sna_dri.c
+++ b/src/sna/sna_dri.c
@@ -1182,7 +1182,7 @@ static void chain_swap(struct sna *sna,
 		chain->bo = sna_dri_copy_to_front(sna, draw, NULL,
 						  get_private(chain->front)->bo,
 						  get_private(chain->back)->bo,
-						  true);
+						  false);
 		type = DRI2_BLIT_COMPLETE;
 	}
Perhaps the bigger problem is the 10ms latency from the vblank firing to X waking up... Hmm.
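To put rough numbers on that, here is a toy calculation (not taken from the driver). It assumes a 60 Hz panel, i.e. a vblank period of about 16667 us, which is an assumption and not something stated in this report; both function names are made up for the illustration. It only shows which vblank a swap can land on when it is queued some delay after vblank 0, and what frame rate results if every swap occupies more than one vblank interval.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model, not driver code. Vblanks are assumed to fire at
 * t = 0, P, 2P, ... microseconds, where P is period_us.
 */

/* Index of the first vblank at or after time t_us. */
uint64_t first_vblank_at_or_after(uint64_t t_us, uint64_t period_us)
{
	assert(period_us > 0);
	/* Integer ceiling of t_us / period_us. */
	return (t_us + period_us - 1) / period_us;
}

/*
 * Frame rate in milli-frames per second when each swap occupies
 * vblanks_per_swap vblank intervals (integer maths, rounds down).
 */
uint64_t effective_mfps(uint64_t period_us, uint64_t vblanks_per_swap)
{
	assert(period_us > 0 && vblanks_per_swap > 0);
	return 1000000000ULL / (period_us * vblanks_per_swap);
}
```

Under those assumptions, first_vblank_at_or_after(10000, 16667) is 1, so a 10ms wakeup delay on its own would still catch the next vblank, while anything pushed past 16667 us slips to vblank 2. effective_mfps(16667, 2) is 29999, roughly half of the one-vblank case (59998), which is the kind of drop you would expect when swaps take a couple of vblanks to complete.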
All logs were from intel driver 2.20.5. To generate the trace-cmd report I used the debug version with SNA. The trace-cmd run was on fluxbox because the debug version of the driver often crashes my X environment. I thought it was caused by openbox. But it probably doesn't matter.
With the latest change (comment 22), at first sight, blobby runs perfectly :)
Created attachment 66399 [details] Xorg.0.log - debug log with patch from comment22
Ok, how about this little line:

diff --git a/src/sna/sna_dri.c b/src/sna/sna_dri.c
index 0210d73..a04427d 100644
--- a/src/sna/sna_dri.c
+++ b/src/sna/sna_dri.c
@@ -1268,7 +1268,7 @@ void sna_dri_vblank_handler(struct sna *sna, struct drm_ev
 		break;
 	case DRI2_SWAP_THROTTLE:
-		if (!sna_dri_blit_complete(sna, info))
+		if (0&&!sna_dri_blit_complete(sna, info))
 			return;
 		DBG(("%s: %d complete, frame=%d tv=%d.%06d\n",
With the patch from comment 27 it runs very similarly to comment 22. I can't tell if it is faster, maybe yes. Logs are attached.
Created attachment 66438 [details] Xorg.0.log - debug with patch from comment 27
Most important is that we don't see any ill-effects, especially starvation of other clients...

commit f837807cc257fbedd35fa6101dc0b4a57e5ad78d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Sep 1 10:36:09 2012 +0100

    sna/dri: Remove busy-wait spin from vblank chaining of swapbuffers

    The issue being that, due to the delay, the chained swap would miss its
    intended vblank and so cause an unwanted reduction in frame throughput
    and increase output latency even further. Since both client and server
    have other rate-limiting processes in place, we can forgo the stall here
    and still keep the clients in check.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54274
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>