Something is causing an extra wait for vsync in xbmc when it needs to read back pixels. This seems to be specific to the radeon driver, as I'm not seeing it with nouveau.

Test scenario:
- Configure xbmc to wait for vsync
- Install something that activates xbmc's capture code (e.g. xbmc's boblight plugin)
- Play a video whose framerate > screen refresh / 2

You'll see the frame rate reliably locked to screen refresh / 2 under these circumstances. This is a major problem, as you generally want to run with a 24 Hz refresh rate for 24 Hz films.

I'm unable to pinpoint exactly which GL command is at fault, as xbmc's rendering is quite complex. I hope you're more experienced in finding these things. :)

The rough sequence of events is (a sketch of steps 3-5 follows below):

1. xbmc renders the scene
2. Wait for vsync
3. Map PBO and memcpy data (from the previous frame)
4. Render a small version of the scene
5. glReadPixels() into the PBO
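For reference, a minimal sketch of the readback part of that sequence, assuming a current GL context; capture_pbo, cpu_copy, width, height and render_small_scene() are hypothetical stand-ins, not xbmc's actual identifiers:

    /* Steps 3-5: asynchronous readback through a pixel pack buffer. */
    glBindBuffer(GL_PIXEL_PACK_BUFFER, capture_pbo);

    /* Step 3: map the PBO and copy out the *previous* frame's pixels. */
    void *pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    if (pixels) {
        memcpy(cpu_copy, pixels, width * height * 4);
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }

    /* Step 4: render the small capture version of the scene. */
    render_small_scene();

    /* Step 5: queue the readback. With a PBO bound, the last argument is a
       byte offset into the buffer, so this call should return without
       waiting for the GPU to finish rendering. */
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, (void *)0);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);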
Hardware and software info:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Turks PRO [Radeon HD 6570/7570/8550]

mesa-libGL-10.1.3-1.20140509.fc20.x86_64
kernel-3.14.4-200.fc20.x86_64
Ok, so it is glReadPixels() that is waiting for vsync. I added some measurements around that call and it is definitely where the delay happens. For good measure I removed the mapping (step 3) and the extra render (step 4).

So it does seem that glReadPixels() is simply broken and isn't doing the nice asynchronous read it is supposed to. Oddly enough, I could not reproduce the problem with a simple test program, so something else is required to provoke the issue. Ideas?
I'd try to narrow down where exactly in glReadPixels the delay is incurred, either using some profiling / tracing tool, or just by adding printfs with timestamps in strategic places.
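For example, a crude timestamped measurement around the suspect call (a sketch; w and h stand for the capture size):

    #include <stdio.h>
    #include <time.h>

    /* Returns the current monotonic time in milliseconds. */
    static double now_ms(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
    }

    /* Wrap the call under suspicion: */
    double t0 = now_ms();
    glReadPixels(0, 0, w, h, GL_BGRA, GL_UNSIGNED_BYTE, (void *)0);
    printf("glReadPixels: %.3f ms\n", now_ms() - t0);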
Any appropriate tracing tools for this? I'm also seeing something else waiting for vsync. With glReadPixels() out of the picture, and glXSwapIntervalMESA(0), I'm still getting 60 fps and no tearing. Possibly related?
(In reply to comment #4)
> I'm also seeing something else waiting for vsync. With glReadPixels() out of
> the picture, and glXSwapIntervalMESA(0), I'm still getting 60 fps and no
> tearing. Possibly related?

Either

Option "SwapbuffersWait" "off"

and/or

Option "EnablePageFlip" "off"

should avoid that.
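Those go in the Device section of xorg.conf, along these lines (the Identifier is just an example):

    Section "Device"
        Identifier "Radeon"                  # example identifier
        Driver     "radeon"
        Option     "SwapbuffersWait" "off"
        Option     "EnablePageFlip"  "off"
    EndSection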
(In reply to comment #5)
> (In reply to comment #4)
> > I'm also seeing something else waiting for vsync. With glReadPixels() out of
> > the picture, and glXSwapIntervalMESA(0), I'm still getting 60 fps and no
> > tearing. Possibly related?
>
> Either
>
> Option "SwapbuffersWait" "off"
>
> and/or
>
> Option "EnablePageFlip" "off"
>
> should avoid that.

I see. In the interest of being honest with applications, shouldn't glXGetSwapIntervalMESA() return 1 when those settings are on?
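That way an application that wanted to adapt could query the effective interval, roughly like this (a sketch, assuming GLX_MESA_swap_control is advertised):

    /* Resolve the entry point from GLX_MESA_swap_control. */
    typedef int (*PFNGLXGETSWAPINTERVALMESAPROC)(void);
    PFNGLXGETSWAPINTERVALMESAPROC GetSwapIntervalMESA =
        (PFNGLXGETSWAPINTERVALMESAPROC)
        glXGetProcAddress((const GLubyte *)"glXGetSwapIntervalMESA");

    if (GetSwapIntervalMESA && GetSwapIntervalMESA() != 0)
        printf("swaps will be synchronised to vblank\n");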
Anyhoo, getting back to the matter at hand. I set up a new machine so I can debug this more easily. The hardware is instead:

00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Richland [Radeon HD 8510G]

Not identical, but it should be a very similar chip (ARUBA vs. TURKS).

I am still seeing the frame rate problem on this machine. It is however not constant; it comes and goes. I am running a lower resolution here, which might be a factor. Also, for some reason I am not getting delays in glReadPixels(), so whatever goes wrong must be happening somewhere else.

I turned on some of the gallium HUD overlays (see below). Not sure if this tells you anything:
- Normal playback: buffer wait time of 15k-20k
- With glReadPixels(): ~35k
- When the bug strikes: jumps to ~50k

The frame rate drop and the buffer wait time increase are perfectly synchronised every time.
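For reference, the overlays came from Mesa's GALLIUM_HUD environment variable, something like the following (assuming the radeon driver exposes the buffer-wait-time query):

    # enable the gallium HUD with fps and buffer wait time graphs
    GALLIUM_HUD="fps,buffer-wait-time" xbmc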
The vsynced blit path for presentation is implemented by *stalling* GPU command processing. That's a big hammer, and might be related to your problems. You can skip this (at the cost of tearing in some cases) by disabling SwapBuffersWait. Have you tried that, just for testing?
(In reply to comment #8)
> The vsynced blit path for presentation is implemented by *stalling* GPU
> command processing. That's a big hammer, and might be related to your
> problems. You can skip this (at the cost of tearing in some cases) by
> disabling SwapBuffersWait. Have you tried that, just for testing?

No effect, I'm afraid. Since this is a full screen application, I assume it was already using page flipping and not blitting anyway?
Ok, some kind of progress. The call that is causing the delay now is glUseProgram(0). The latency grows gradually from nothing up to 17 ms (i.e. one frame @ 60 Hz), then it goes back down again and the cycle repeats. Ideas?
Pinpointed it further to:

FLUSH_VERTICES(ctx, _NEW_PROGRAM | _NEW_PROGRAM_CONSTANTS);

But that's enough for tonight. Will have to resume this another day.
More digging. Down to dri2_drawable_get_buffers() now. I assume I'll be hitting a point where I'll have to switch over to looking in the X server soon...
(In reply to comment #9)
> No effect, I'm afraid. Since this is a full screen application, I assume it
> was already using page flipping and not blitting anyway?

Probably. For the sake of testing, have you tried disabling page flipping in addition to SwapBuffersWait?

(In reply to comment #12)
> Down to dri2_drawable_get_buffers() now. I assume I'll be hitting a point
> where I'll have to switch over to looking in the X server soon...

Yep, it's waiting for DRI2 buffer information from the X server, which is delayed until the previous buffer swap actually finishes.

FWIW, you might get somewhat less confusing timings if you call glFinish() before glXSwapBuffers().
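I.e. something along these lines in the swap path, purely as a debugging aid:

    /* Debugging aid only: force all queued rendering to complete before the
       swap, so timings measured around glXSwapBuffers() aren't polluted by
       earlier, still-pending commands. */
    glFinish();
    glXSwapBuffers(dpy, drawable);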
(In reply to comment #13)
> (In reply to comment #9)
> > No effect, I'm afraid. Since this is a full screen application, I assume it
> > was already using page flipping and not blitting anyway?
>
> Probably. For the sake of testing, have you tried disabling page flipping in
> addition to SwapBuffersWait?

With both disabled it behaves the same as with just SwapBuffersWait=off, as far as I can tell. I.e. with glXSwapInterval(1) I still get problems. With glXSwapInterval(0) I get a consistent 60 fps, but tearing instead, and it's difficult to see if I get some frame jitter. (SwapBuffersWait=on, EnablePageFlip=on and glXSwapInterval(0) also gets rid of the massive fps drops, but has jitter instead.)

> (In reply to comment #12)
> > Down to dri2_drawable_get_buffers() now. I assume I'll be hitting a point
> > where I'll have to switch over to looking in the X server soon...
>
> Yep, it's waiting for DRI2 buffer information from the X server, which is
> delayed until the previous buffer swap actually finishes.

Is that expected behaviour? I.e. am I chasing an already known limitation? The xbmc code seems to assume that those GL commands will execute asynchronously, as it is using queries to determine when the rendering and glReadPixels() are done.

> FWIW, you might get somewhat less confusing timings if you call glFinish()
> before glXSwapBuffers().

I'll play around with it. But that doesn't sound like a good solution upstream, as I guess it would remove parallelism for the drivers that can do all of this in the background?
Ok, I took a step back and decided to look at this at a higher level again. A single wait for vsync can't be causing these problems; there have to be at least two. So I set out to find the other one, and I think I've figured this out (somewhat).

First this though:

(In reply to comment #7)
> I am still seeing the frame rate problem on this machine. It is however not
> constant; it comes and goes. I am running a lower resolution here, which
> might be a factor.

Turns out it was caused by a different configuration in xbmc. I hadn't turned on the setting where it tries to properly keep track of when to display frames (which I've found necessary in many cases to keep good A/V sync). With that setting on, I'm reliably getting a constant halved frame rate.

Scenario 1 - No glReadPixels()
==============================

This is the normal case that works, but seemingly by luck. This is how xbmc expects things to go (a rough sketch of this loop follows at the end of this comment):

1. Render the frame
2. Wait for vblanks until the right timestamp
3. glXSwapBuffers()

Now what happens here is that 1. blocks and waits for the last swap. By the time we've reached 2., we've already passed the proper timestamp and we want to wait for -8 ms. This returns instantly, we move on to 3., and then the cycle repeats.

This design seems broken even in the best of cases. Say that 1. is non-blocking. Then we'd get: the rendering (1.) finishes instantly. We're still at the start of the screen refresh, so we have 8 ms to wait in step 2. But since it waits in whole vblanks, we'll wait for 17 ms instead. When we then do the swap (3.), it will not happen until yet another 17 ms later, completely overshooting the desired presentation time. And if glXSwapBuffers() is blocking, then we'd be even worse off, limited to just one frame every other screen refresh.

Scenario 2 - With glReadPixels()
================================

This to some extent degenerates into that bad scenario in the last paragraph. We now have these steps:

1. Render the frame
2. Wait for vblanks until the right timestamp
3. glXSwapBuffers()
4. Render the capture frame
5. glReadPixels()

What happens here is precisely the same as with a blocking glXSwapBuffers(). 1. will no longer block, as 4. has already forced a wait for a buffer swap. 2. will then wait for at least 8 ms, but in practice for another vblank. The swap is scheduled (3.), and then immediately waited on by doing more rendering (4.). By the time we come back to 1., xbmc realises too much time has passed and drops a frame.
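To make the timing concrete, here is a rough sketch of the Scenario 1 loop; render_frame(), wait_for_vblank_until() and the timestamp handling are hypothetical stand-ins for xbmc's abstractions:

    /* Sketch of the presentation loop xbmc expects (Scenario 1). */
    for (;;) {
        render_frame();                    /* 1. may silently block on the
                                                 previous swap completing */
        wait_for_vblank_until(target_ts);  /* 2. waits in whole vblanks, so
                                                 it can overshoot target_ts */
        glXSwapBuffers(dpy, window);       /* 3. the actual flip happens at
                                                 the next vblank after this */
        target_ts += frame_duration;
    }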
Right now it feels like a lot of the blame falls on xbmc. But at the same time, this works on nouveau and presumably the proprietary drivers. I guess the behaviour that xbmc is expecting is that the only time it will wait for a vblank is the explicit vblank wait in step 2?

Now is that an unreasonable expectation? Or should the radeon driver be fixed to support this?
(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #12)
> > > Down to dri2_drawable_get_buffers() now. I assume I'll be hitting a point
> > > where I'll have to switch over to looking in the X server soon...
> >
> > Yep, it's waiting for DRI2 buffer information from the X server, which is
> > delayed until the previous buffer swap actually finishes.
>
> Is that expected behaviour? I.e. am I chasing an already known limitation?

Yes, it's a limitation of DRI2 (at least without triple buffering, which is hard to do with DRI2).

> The xbmc code seems to assume that those GL commands will execute
> asynchronously, as it is using queries to determine when the rendering and
> glReadPixels() are done.

glReadPixels is currently always synchronous with all Gallium based drivers, as there's no hardware acceleration for PBOs yet.

> > FWIW, you might get somewhat less confusing timings if you call glFinish()
> > before glXSwapBuffers().
>
> I'll play around with it. But that doesn't sound like a good solution
> upstream, as I guess it would remove parallelism for the drivers that can do
> all of this in the background?

Of course, this wasn't intended as a solution but just as a debugging aid, to try and make your timings correspond better to where the time is actually spent in the driver / hardware.

That said, in the scenarios you described, there would need to be at least a glFlush() call before waiting for vblank; otherwise the driver / hardware may not even start actually rendering the frame before the glXSwapBuffers() call.

(In reply to comment #16)
> I guess the behaviour that xbmc is expecting is that the only time it will
> wait for a vblank is the explicit vblank wait in step 2?
>
> Now is that an unreasonable expectation?

I'm afraid so. It would be better to use something like GLX_OML_sync_control's glXSwapBuffersMscOML() for timing buffer swaps, instead of explicitly waiting for vblank and then calling glXSwapBuffers().
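A rough sketch of that approach, assuming the GLX_OML_sync_control entry points have been resolved via glXGetProcAddress (vblanks_until() is a hypothetical helper mapping the desired presentation time onto a vblank count):

    /* Schedule a swap for a specific vblank count (MSC) instead of
       sleeping on vblanks manually and then calling glXSwapBuffers(). */
    int64_t ust, msc, sbc;
    glXGetSyncValuesOML(dpy, window, &ust, &msc, &sbc);

    /* Pick the vblank whose timestamp best matches the frame's desired
       presentation time (application logic, hypothetical here). */
    int64_t target_msc = msc + vblanks_until(target_ts, ust, msc);

    /* Queue the swap for that vblank; the call returns without waiting. */
    glXSwapBuffersMscOML(dpy, window, target_msc, 0, 0);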
(In reply to comment #17)
> glReadPixels is currently always synchronous with all Gallium based drivers,
> as there's no hardware acceleration for PBOs yet.

Hmm... But I'm not consistently seeing a delay around glReadPixels(). The area is small though, so maybe it just goes too fast and any delays I do see are waits for vblank...

> That said, in the scenarios you described, there would need to be at least a
> glFlush() call before waiting for vblank; otherwise the driver / hardware
> may not even start actually rendering the frame before the glXSwapBuffers()
> call.

I dug around more and there is at least one glFlush() earlier. I can't swear it covers all the drawing, but it covers at least parts of it.

> I'm afraid so. It would be better to use something like
> GLX_OML_sync_control's glXSwapBuffersMscOML() for timing buffer swaps,
> instead of explicitly waiting for vblank and then calling glXSwapBuffers().

It seems to fit this scenario well, yes. Unfortunately xbmc is very non-trivial, and it also has a lot of abstraction to support other backends (like DirectX). Their current solution seems very similar to glXWaitForMscOML() though, but implemented using m_glXWaitVideoSyncSGI() and conditionals. I guess the easiest solution for now is to look at that wait function (2.) and get rid of the degeneration condition.

As for this bug, I'm not sure if you want to close it or not?
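For context, the SGI-based wait pattern underneath that function looks roughly like this (a sketch; the entry points come from GLX_SGI_video_sync and require a current GL context):

    /* Block until the next vertical retrace: wait until the retrace
       counter modulo 2 equals the parity of count + 1. */
    unsigned int count;
    glXGetVideoSyncSGI(&count);
    glXWaitVideoSyncSGI(2, (count + 1) % 2, &count);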
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/494.