Bug 107460 - radv: OpControlBarrier does not always work correctly (bisected)
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/radeon (show other bugs)
Version: git
Hardware: Other All
Importance: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
Reported: 2018-08-02 20:04 UTC by Philip Rebohle
Modified: 2018-08-15 14:22 UTC (History)


Attachments
Probably affected compute shader (70.45 KB, application/octet-stream)
2018-08-02 20:04 UTC, Philip Rebohle
Screenshot that shows the issue (2.14 MB, image/png)
2018-08-14 14:49 UTC, Philip Rebohle

Description Philip Rebohle 2018-08-02 20:04:14 UTC
Created attachment 140941: Probably affected compute shader

Hello,

a regression that affects OpControlBarrier instructions in compute shaders causes major rendering issues in Final Fantasy XV. The attached shader seems to be the one running into this issue.

In short, DXVK translates sync_g_t instructions to the following:

    OpControlBarrier Workgroup, Workgroup, WorkgroupMemory | AcquireRelease

This currently does not work as expected. Interestingly, inserting an additional OpMemoryBarrier seems to fix the problem:

    OpMemoryBarrier Workgroup, WorkgroupMemory | AcquireRelease
    OpControlBarrier Workgroup, Workgroup, WorkgroupMemory | AcquireRelease

While this is closer to what glslang emits for equivalent GLSL barriers, emitting the extra OpMemoryBarrier instruction should not be necessary according to the SPIR-V spec for OpControlBarrier.
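
For reference, here is a minimal C sketch of what the operands of that single barrier encode. The numeric values are the enumerants from the SPIR-V specification (Scope Workgroup = 2, AcquireRelease = 0x8, WorkgroupMemory = 0x100). The SPV_* names, the struct and the helper function are made up for illustration only; they are not the identifiers used by the official SPIR-V headers, by DXVK or by radv.

```c
#include <stdint.h>

/* Scope enumerant from the SPIR-V specification. */
enum { SPV_SCOPE_WORKGROUP = 2 };

/* Memory Semantics bits from the SPIR-V specification. */
enum {
    SPV_SEMANTICS_ACQUIRE_RELEASE  = 0x8,
    SPV_SEMANTICS_WORKGROUP_MEMORY = 0x100
};

/* Hypothetical container for the three operands of OpControlBarrier:
 * execution scope, memory scope and memory semantics. */
struct control_barrier_operands {
    uint32_t execution_scope;
    uint32_t memory_scope;
    uint32_t semantics;
};

/* Operands of the barrier quoted above:
 * OpControlBarrier Workgroup, Workgroup, WorkgroupMemory | AcquireRelease */
static struct control_barrier_operands workgroup_barrier(void)
{
    struct control_barrier_operands ops;
    ops.execution_scope = SPV_SCOPE_WORKGROUP;
    ops.memory_scope    = SPV_SCOPE_WORKGROUP;
    ops.semantics       = SPV_SEMANTICS_WORKGROUP_MEMORY |
                          SPV_SEMANTICS_ACQUIRE_RELEASE;
    return ops;
}
```

Because the semantics operand is not None, the control barrier on its own already carries the memory ordering for workgroup memory, which is why the separate OpMemoryBarrier should be redundant.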



The commit which introduced the problem is:

    [f2b3e96e754a5d722f2b0fa1bd5efa1c0640ed3b]
    radv: drop copy of ac_create_target_machine.

Looking at the difference between the two implementations, there's the following line in ac_create_target_machine:

    bool barrier_does_waitcnt = family != CHIP_VEGA20;

Changing this to 'false' fixes the problem for me. My GPU is an RX 480 (Polaris 10).
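
To make the local experiment explicit, here is a small stand-alone sketch of the two variants. It only models the quoted line; the enum is a hypothetical two-entry stand-in for Mesa's real chip family enum, and both function names are invented for this example. How the flag is consumed further down in ac_create_target_machine is not shown here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for Mesa's chip family enum; only the two
 * entries needed for this sketch, with arbitrary values. */
enum sketch_family {
    SKETCH_CHIP_POLARIS10,
    SKETCH_CHIP_VEGA20
};

/* Mirrors the line quoted above: true for everything except Vega 20. */
static bool barrier_does_waitcnt_as_committed(enum sketch_family family)
{
    return family != SKETCH_CHIP_VEGA20;
}

/* Local test variant: hard-coded to false, as described above. */
static bool barrier_does_waitcnt_local_test(enum sketch_family family)
{
    (void)family; /* the family no longer matters in this variant */
    return false;
}
```

With the committed logic the flag evaluates to true on Polaris 10, so the RX 480 takes the path that the local change disables.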



Unfortunately, it is quite hard to isolate the exact issue in the game. A RenderDoc capture was not useful because the bug got baked into it for some reason, so here is a D3D11 apitrace instead. It needs to be replayed with DXVK in order to reproduce the bug.

https://drive.google.com/file/d/1ywMEhn-P68Sino1_5yBkceLcMACTtDAI/view?usp=sharing
Comment 1 Samuel Pitoiset 2018-08-14 14:30:28 UTC
Can you upload a screenshot of the rendering issue, please?
Comment 2 Philip Rebohle 2018-08-14 14:49:23 UTC
Created attachment 141081: Screenshot that shows the issue

Here's a screenshot. Basically, the water geometry is extremely messed up and the game renders randomly flickering garbage.

Please note that I implemented the OpMemoryBarrier workaround in DXVK 0.64, so this can only be reproduced with 0.63.
Comment 3 Samuel Pitoiset 2018-08-14 16:46:49 UTC
I can confirm the issue with DXVK 0.63 on my RX480, as well as the potential fix. However, the attached compute shader doesn't seem to be affected. I have to find the right one first. I will let you know.

