89014 – PIPE_QUERY_GPU_FINISHED is not acting as expected on SI


Summary: PIPE_QUERY_GPU_FINISHED is not acting as expected on SI

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/Gallium/radeonsi
Version:	git
Hardware:	Other All

Importance:	medium normal
Assignee:	Default DRI bug account
QA Contact:	Default DRI bug account

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2015-02-06 16:31 UTC by Axel Davy
Modified:	2015-02-17 16:49 UTC
CC List:	0 users

See Also:

Attachments
Hack used to use pipe fences instead of PIPE_QUERY_GPU_FINISHED (2.43 KB, text/plain) 2015-02-06 16:38 UTC, Axel Davy
patch 1 (4.25 KB, patch) 2015-02-07 12:22 UTC, Marek Olšák
patch 2 (3.71 KB, patch) 2015-02-07 12:23 UTC, Marek Olšák

Description Axel Davy 2015-02-06 16:31:36 UTC

I filed this as a radeonsi bug, but it is probably an r600 bug too, since the implementation seems to be shared between the two drivers.

Some d3d9 games do manual throttling as advised at the end of:
http://www.nvidia.com/object/General_FAQ.html#G4

"A second solution is to use DirectX 9's Asynchronous Query functionality (analogous to using fences in OpenGL).  At the end of your frame, insert a D3DQUERYTYPE_EVENT query into your rendering stream.  You can then poll whether the GPU has reached this event yet by using GetData."

Games like Heroes of Might and Magic V use two d3d9 event queries (mapped to PIPE_QUERY_GPU_FINISHED) to do manual throttling:

end query A
loop until query B is OK (d3d9: loop on GetData; pipe: loop on pipe->get_query_result)
render
present frame
end query B
loop until query A is OK
render
present frame
etc
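
The two-query scheme above can be modeled with a toy in-order GPU. This is a minimal sketch, not Gallium Nine or Wine code: the `gpu` and `event_query` types and all helper functions are hypothetical, the GPU only makes progress while the CPU polls, and every frame is assumed to be a fixed number of commands. It shows the property the scheme relies on: as long as an event query only waits for the work submitted before it was ended, the CPU can get at most two frames ahead of the GPU.

```c
#include <stdbool.h>

/* Hypothetical in-order GPU: commands are counted as sequence numbers. */
struct gpu {
    unsigned submitted; /* commands handed to the GPU so far */
    unsigned retired;   /* commands the GPU has completed */
};

/* Hypothetical stand-in for a D3DQUERYTYPE_EVENT query. */
struct event_query {
    unsigned seq; /* value of gpu.submitted when the query was ended */
};

/* Let the GPU retire one pending command, if there is one. */
static void gpu_step(struct gpu *g)
{
    if (g->retired < g->submitted)
        g->retired++;
}

/* "end query": remember the current position in the command stream. */
static void query_end(struct event_query *q, const struct gpu *g)
{
    q->seq = g->submitted;
}

/* "GetData": non-blocking, true once all work before the end has retired. */
static bool query_done(const struct event_query *q, const struct gpu *g)
{
    return g->retired >= q->seq;
}

/* Runs the A/B scheme and returns the largest number of commands the GPU
 * still had pending right after a frame was submitted. */
static unsigned throttle_max_backlog(unsigned frames, unsigned cmds_per_frame)
{
    struct gpu g = { 0, 0 };
    struct event_query q[2] = { { 0 }, { 0 } };
    unsigned max_backlog = 0;

    for (unsigned frame = 0; frame < frames; frame++) {
        struct event_query *cur = &q[frame % 2];        /* A, B, A, ... */
        struct event_query *prev = &q[(frame + 1) % 2]; /* B, A, B, ... */

        query_end(cur, &g);
        while (!query_done(prev, &g)) /* loop on GetData */
            gpu_step(&g);             /* the GPU keeps working meanwhile */

        for (unsigned i = 0; i < cmds_per_frame; i++) /* render */
            g.submitted++;
        /* present frame */

        unsigned backlog = g.submitted - g.retired;
        if (backlog > max_backlog)
            max_backlog = backlog;
    }
    return max_backlog;
}
```

With four commands per frame, `throttle_max_backlog(10, 4)` returns 8, i.e. two frames in flight.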

Only old apps seem to do this manual throttling, likely because recent drivers, like Mesa, do it automatically.

Both Gallium Nine and Wine get poor performance with this scheme, and both get the same performance, which also matches the performance obtained by forcing a glFinish.

Not advertising the query under Gallium Nine gives the app an enormous performance boost. Similarly, advertising the query but implementing it with pipe fences instead of PIPE_QUERY_GPU_FINISHED gives the expected performance.
In both cases, forcing glFinish gives the same bad performance as before.
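
The fence-based workaround can be modeled as follows. This is a hedged sketch with hypothetical `ctx`, `fence` and `fence_query` types, not Gallium code, and the attachment's actual code is not reproduced here. The sketch assumes that ending the query flushes the commands recorded so far and keeps a fence for them, and that the result check only polls that fence. In Gallium terms this roughly corresponds to flushing the context to obtain a pipe fence and later checking it with a zero timeout; that mapping is an assumption about the hack, not a description of it.

```c
#include <stdbool.h>

/* Hypothetical driver context: commands are batched on the CPU and only
 * become visible to the GPU when the batch is flushed. */
struct ctx {
    unsigned queued;    /* commands recorded but not yet flushed */
    unsigned flushed;   /* commands handed to the GPU so far */
    unsigned completed; /* commands the GPU has finished */
};

/* Hypothetical fence: signaled once the GPU has completed 'seq' commands. */
struct fence {
    unsigned seq;
};

/* Hypothetical fence-backed replacement for the event query. */
struct fence_query {
    struct fence fence;
};

static void ctx_draw(struct ctx *c)
{
    c->queued++;
}

/* Hand the current batch to the GPU and return a fence covering it. */
static struct fence ctx_flush(struct ctx *c)
{
    c->flushed += c->queued;
    c->queued = 0;
    struct fence f = { c->flushed };
    return f;
}

/* Simulate the GPU completing up to n flushed commands. */
static void gpu_progress(struct ctx *c, unsigned n)
{
    unsigned pending = c->flushed - c->completed;
    c->completed += n < pending ? n : pending;
}

/* Non-blocking check: never waits and never flushes anything. */
static bool fence_signaled(const struct ctx *c, const struct fence *f)
{
    return c->completed >= f->seq;
}

/* "end query": flush what was recorded so far and keep its fence. */
static void fence_query_end(struct fence_query *q, struct ctx *c)
{
    q->fence = ctx_flush(c);
}

/* "GetData": only asks whether that fence has signaled. */
static bool fence_query_ready(const struct fence_query *q, const struct ctx *c)
{
    return fence_signaled(c, &q->fence);
}
```

In this model, work recorded after the query was ended stays queued and does not delay the answer, so polling the query never turns into a full flush-and-wait.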

Thus the PIPE_QUERY_GPU_FINISHED implementation seems to have a bug that makes it act like glFinish, instead of only waiting until the work submitted before the end query has been rendered.
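
The difference can be stated as two readiness checks over a toy in-order GPU timeline. This is a hypothetical model of the observed symptom only: the report does not say which driver code caused it, and the fix referenced in comment 5 is not reproduced here. The `timeline` type and both functions are illustrative names; `end_seq` is the number of commands that had been submitted when the query was ended.

```c
#include <stdbool.h>

/* Hypothetical in-order GPU timeline, tracked as command counts. */
struct timeline {
    unsigned submitted; /* commands submitted so far */
    unsigned retired;   /* commands the GPU has completed */
};

/* Expected event-query semantics: ready once every command submitted
 * before the query was ended has retired. */
static bool event_ready(const struct timeline *t, unsigned end_seq)
{
    return t->retired >= end_seq;
}

/* The behavior reported here: the result only becomes available once
 * everything submitted so far has retired, as with glFinish. */
static bool finish_like_ready(const struct timeline *t, unsigned end_seq)
{
    (void)end_seq; /* the query's own position is effectively ignored */
    return t->retired >= t->submitted;
}
```

In the throttling loop above, the second check means the CPU waits for the frame it has just submitted as well, so CPU and GPU work can no longer overlap.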

What seems strange is that Wine uses ARB_sync to implement the query, and ARB_sync doesn't seem to be implemented with PIPE_QUERY_GPU_FINISHED in Mesa.

Comment 1 Axel Davy 2015-02-06 16:38:42 UTC

Created attachment 113232
Hack used to use pipe fences instead of PIPE_QUERY_GPU_FINISHED

Comment 2 Marek Olšák 2015-02-07 12:22:06 UTC

Created attachment 113242
patch 1

Comment 3 Marek Olšák 2015-02-07 12:23:25 UTC

Created attachment 113243
patch 2

Can you try these patches? Patch 1 is there only to avoid merge conflicts.

Comment 4 Axel Davy 2015-02-08 18:01:27 UTC

Yes, I confirm the patch does the trick.

Also, regarding my comment that Wine was also getting the same performance as with glFinish: I checked twice, and I think it is mere coincidence that its performance is around that level. In specific scenes, Wine did better than that.

Comment 5 Marek Olšák 2015-02-17 16:49:48 UTC

Fixed by 5f1cef76f9bbaae772120dcb38e0b98d68a93f26. Closing.
