Created attachment 142700
According to the Vulkan spec, we need VK_QUERY_RESULT_WAIT_BIT here to ensure that all previous query commands have written a result. Using it on RADV with a large query count causes a GPU hang.
I've attached a test case that reproduces this. It hangs for me on a Vega 64 with the latest git master and the 18.3 branch. It doesn't hang if the query count is reduced (in my testing, 512 queries hang but 256 don't) or if VK_QUERY_RESULT_WAIT_BIT is not used.
FWIW, this doesn't occur on AMDVLK. As far as I can see, it just uses a barrier rather than waiting for each individual query value to become available.
Also, it looks like RADV only waits on the low 32 bits of the query value. Couldn't you get very unlucky and get a valid timestamp whose low 32 bits are 0xffffffff, which would hang?
I can confirm this; working on it.
The problem was introduced by:
Author: Samuel Pitoiset <email@example.com>
Date: Tue Sep 25 20:26:58 2018 +0200

    radv: do not use the availability bit for timestamp queries

    It's unnecessary because we can just check if the timestamp
    is to different to the default value when a pool is created
    or resetted. Instead of waiting for the availability bit to
    be 1, we have to emit a not equal WAIT_REG_MEM for checking
    if the timestamp is ready.

    Signed-off-by: Samuel Pitoiset <firstname.lastname@example.org>
    Reviewed-by: Dave Airlie <email@example.com>
Can you try this patch https://patchwork.freedesktop.org/series/53482/ ?
Created attachment 142732
New test case
Should be fixed with