Bug 102597

Summary: [Regression] mpv, high rendering times (two to three times higher)
Product: Mesa Reporter: Cris <cristiano04>
Component: Mesa coreAssignee: mesa-dev
Status: RESOLVED FIXED QA Contact: mesa-dev
Severity: normal    
Priority: medium    
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:

Description Cris 2017-09-07 22:34:19 UTC
Known good version: 17.1.6
Known bad version: 17.2

To reproduce, one need only run mpv with the arguments --vo=opengl-hq --scale=ewa_lanczossharp
Comment 1 Kenneth Graunke 2017-09-08 01:30:38 UTC
We can't really look into this unless you tell us what hardware you're using.
Comment 2 Cris 2017-09-08 10:58:14 UTC
(In reply to Kenneth Graunke from comment #1)
> We can't really look into this unless you tell us what hardware you're using.

R9 280x, mpv git, linux 4.13. mpv-stats.lua is required to measure timings in a hassle free manner.
Comment 3 Niklas Haas 2017-09-11 23:23:39 UTC
Hello, this is the author of the affected `mpv` code.

I've reproduced and bisected the mesa issue, and the first bad commit is:

bd4b224fa6630262df2b70fd6a6fc8080ad59086 is the first bad commit
commit bd4b224fa6630262df2b70fd6a6fc8080ad59086
Author: Marek Olšák <marek.olsak@amd.com>
Date:   Mon May 15 17:27:25 2017 +0200

    gallium/radeon: use a top-of-pipe timestamp for the start of TIME_ELAPSED
    
    Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

:040000 040000 22f52d00a837608d7a43e50498519850bf6ccdf6 eebe49b7fe167a7d6f96fe2a18218f350dededce M	src

This leads me to believe it's just a cosmetic issue, rather than a geneuine performance regression. In order to figure out whether the results were more correct before or after the change, I compared the numbers of all of my render passes (added together) against the measurements of a single TIME_ELAPSED query that surrounds the entire frame. The numbers matched perfectly before this commit. Since the commit, the per-pass numbers are all way higher than they should be.

I suspect this is because the driver is now “over-estimating” the amount of time taken in between successive passes. Consider the following call sequence:

glBeginQuery(GL_TIME_ELAPSED)
// submit some draw calls
glEndQuery(GL_TIME_ELAPSED)
glBeginQuery(GL_TIME_ELAPSED)
// submit some more draw calls
glEndQuery(GL_TIME_ELAPSED)

If what I suspect is right, and these are assembled into the same command buffer, then the “record TOP_OF_PIPE timestamp” for both queries would possibly fire at the same time (out of order with the draw calls), leading to an over-counting of the time taken for the second set of draw calls.

Perhaps it would be more correct to submit all of the timestamps with the pipeline stage set to BOTTOM_OF_PIPE bit, so that the timestamps are (respectively) only recorded once the previous commands in the command buffer have been fully realized?
Comment 4 Niklas Haas 2017-09-11 23:46:55 UTC
Also, even with this patch reverted, the reported timers are unrealistic: The TIME_ELAPSED queries tell me my rendering takes ~8ms, but I'm getting 200 fps. (With higher settings, it tells me it takes ~20ms but I'm getting 90 fps)
Comment 5 Niklas Haas 2017-09-11 23:51:34 UTC
> The TIME_ELAPSED queries tell me my rendering takes ~8ms, but I'm getting 200 fps.

Oops, my bad - I was measuring FPS incorrectly. With the measurement fixed, and the patch in question reverted, the numbers match up perfectly. GL_TIME_ELAPSED queries sum up to ~19ms and I'm getting ~50 fps.
Comment 6 Marek Olšák 2017-09-12 00:10:55 UTC
Here's why the times are different: Draw calls are not executed sequentially, but instead they are executed in parallel when it's possible. If you start 10 draw calls wrapped around 10 TIME_ELAPSED queries, the start times will be similar (they all should start quickly if compute units aren't overloaded), but the end times will be different. The TIME_ELAPSED query then returns (end - start).

Now some theory. The top-of-pipe timestamp means "before the next draw call started". The bottom-of-pipe timestamp means "after the last draw call finished". The driver subtracts top-of-pipe from bottom-of-pipe to get TIME_ELAPSED.

That is an accurate implementation for measuring how much time each draw call spent in this highly pipelined and parallel system that is the GPU. However, it's actually not what the spec "suggests" in the Issues section of ARB_timer_query. It suggests that TIME_ELAPSED should record both start and end timestamps at bottom-of-pipe. That seems kinda silly, because it means that TIME_ELAPSED should be (draw_finish_time - previous_draw_finish_time). What if they finish nearly at the same time? TIME_ELAPSED would be close to 0, and that is not how much time the draw call spent in the system.

This all becomes a non-issue if we simply assume that TIME_ELAPSED results are not additive.
Comment 7 Niklas Haas 2017-09-12 04:53:40 UTC
> That seems kinda silly, because it means that TIME_ELAPSED should be (draw_finish_time - previous_draw_finish_time). What if they finish nearly at the same time? TIME_ELAPSED would be close to 0, and that is not how much time the draw call spent in the system.

Doesn't it become a question of what these time_elapsed queries are supposed to be useful for? If a pass becomes "free" because it's executed in parallel with another pass, then I would (personally) expect the measured execution time to be low. Basically, what I'm interested in measuring is the critical path; rather than the “sum” of all GPU time spent in the program. I also highly rely on the additive property of pass queries, which is perhaps what khronos also intended when they advised using bottom_of_pipe queries.

As for practical solutions - I've already experimented using GL_TIMESTAMP instead of GL_TIME_ELAPSED - the main problem I had with this approach was that it completely broke on OS X and Windows (ANGLE), because apparently google and apple are unwilling to implement them, citing “lack of a way to communicate overflow” as a reason (and apparently oblivious to the possibility of using 64-bit timestamps like everybody else).

Worst comes to worst, I'll have to work around this particular issue by using GL_TIMESTAMP where available and falling back to GL_TIME_ELAPSED otherwise, but either way, I still think this is a rather silly violation of both “common sense” and the khronos specification.

Specifically:
> When BeginQuery and EndQuery are called with a <target> of
> TIME_ELAPSED, the GL prepares to start and stop the timer used for
> timer queries.  The timer is started or stopped when the effects from all
> previous commands on the GL client and server state and the framebuffer
> have been fully realized.
http://developer.download.nvidia.com/opengl/specs/GL_ARB_timer_query.txt

This is not just an advisal, this directly implies a dependency between the BeginQuery and the previous draw calls. My application is coded around the behavior described by this paragraph, which your patch appears to violate.
Comment 8 Cris 2017-09-20 15:02:56 UTC
Well? Thousands of mpv users rely on mesa's now incorrect behaviour to diagnose performance issues. Are you going to revert that commit in accordance with the spec or are you going to be stubborn and continue to violate it?
Comment 9 Niklas Haas 2017-10-02 21:38:05 UTC
> Thousands of mpv users rely on mesa's now incorrect behaviour to diagnose performance issues.

Well, to be fair, that's going to end up being misleading one way or the other - the timers are unreliable on so many platforms and configurations that using them as the basis of a performance evaluation needs to be taken with a heavy grain of salt. I also think these timers are only really relevant to developers anyway; for users the only thing that really matters is whether or not it drops frames (which mpv already displays separately).

> Are you going to revert that commit in accordance with the spec or are you going to be stubborn and continue to violate it?

I don't think pestering the developers is going to help this issue progress; if anything, it will accomplish the opposite.
Comment 10 Dieter Nützel 2018-11-14 02:38:00 UTC
Code fix under way:

https://lists.freedesktop.org/archives/mesa-dev/2018-November/209473.html

With this patch mpv drops notably, apart that '--vo=opengl-hq' isn't available any longer. Was replaced by '--vo=gpu'.
Comment 11 Emil Velikov 2018-12-07 16:09:54 UTC
The following commit reverts to the original behaviour, this we can close this. Feel free to reopen if it doesn't help.


commit ea9f95e2a67eca90bb84eea24e7b4b804b3b1345
Author: Marek Olšák <marek.olsak@amd.com>
Date:   Tue Nov 13 16:19:42 2018 -0500

    radeonsi: go back to using bottom-of-pipe for beginning of TIME_ELAPSED

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.