Bug 108894 - [anv] vkCmdCopyBuffer() and vkCmdCopyQueryPoolResults() write-after-write hazard
Summary: [anv] vkCmdCopyBuffer() and vkCmdCopyQueryPoolResults() write-after-write hazard
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Vulkan/intel (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-29 09:06 UTC by Józef Kucia
Modified: 2018-11-30 09:45 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments

Description Józef Kucia 2018-11-29 09:06:47 UTC
It seems to be impossible to synchronize vkCmdCopyBuffer() and vkCmdCopyQueryPoolResults() commands. I think it should be enough to issue a pipeline barrier with VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT between vkCmdCopyBuffer() and vkCmdCopyQueryPoolResults() to avoid write-after-write hazards. However, in my test case, I consistently get the value written by vkCmdCopyBuffer(). vkCmdCopyBuffer() is issued before vkCmdCopyQueryPoolResults().

--- api dump ---
vkCmdResetQueryPool(commandBuffer, queryPool, firstQuery, queryCount) returns void:
    commandBuffer:                  VkCommandBuffer = 0x55c2b2e7d430
    queryPool:                      VkQueryPool = 0x55c2b2dbd050
    firstQuery:                     uint32_t = 0
    queryCount:                     uint32_t = 1

vkCmdWriteTimestamp(commandBuffer, pipelineStage, queryPool, query) returns void:
    commandBuffer:                  VkCommandBuffer = 0x55c2b2e7d430
    pipelineStage:                  VkPipelineStageFlagBits = 8192 (VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT)
    queryPool:                      VkQueryPool = 0x55c2b2dbd050
    query:                          uint32_t = 0

vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, regionCount, pRegions) returns void:
    commandBuffer:                  VkCommandBuffer = 0x55c2b2e7d430
    srcBuffer:                      VkBuffer = 0x55c2b2e80150
    dstBuffer:                      VkBuffer = 0x55c2b2dbcfc0
    regionCount:                    uint32_t = 1
    pRegions:                       const VkBufferCopy* = 0x7ffc0d802d20
        pRegions[0]:                    const VkBufferCopy = 0x7ffc0d802d20:
            srcOffset:                      VkDeviceSize = 0
            dstOffset:                      VkDeviceSize = 0
            size:                           VkDeviceSize = 32

// we should issue a pipeline barrier here

vkCmdCopyQueryPoolResults(commandBuffer, queryPool, firstQuery, queryCount, dstBuffer, dstOffset, stride, flags) returns void:
    commandBuffer:                  VkCommandBuffer = 0x55c2b2e7d430
    queryPool:                      VkQueryPool = 0x55c2b2dbd050
    firstQuery:                     uint32_t = 0
    queryCount:                     uint32_t = 1
    dstBuffer:                      VkBuffer = 0x55c2b2dbcfc0
    dstOffset:                      VkDeviceSize = 0
    stride:                         VkDeviceSize = 8
    flags:                          VkQueryResultFlags = 3 (VK_QUERY_RESULT_64_BIT | VK_QUERY_RESULT_WAIT_BIT)
--- api dump ---

The expected result is to get the value written by vkCmdCopyQueryPoolResults(). On Anvil, the result is the value written by vkCmdCopyBuffer().

I have a test case in the form of a vkd3d test: https://source.winehq.org/git/vkd3d.git/blob/HEAD:/tests/d3d12.c#l18279
The test passes on RADV and Nvidia. Note that the current version of vkd3d doesn't issue a required pipeline barrier between vkCmdCopyBuffer() and vkCmdCopyQueryPoolResults().
Comment 1 Lionel Landwerlin 2018-11-29 12:03:27 UTC
(In reply to Józef Kucia from comment #0)

> 
> I have a test case in the form of a vkd3d test:
> https://source.winehq.org/git/vkd3d.git/blob/HEAD:/tests/d3d12.c#l18279
> The test passes on RADV and Nvidia. Note that the current version of vkd3d
> doesn't issue a required pipeline barrier between vkCmdCopyBuffer() and
> vkCmdCopyQueryPoolResults().

Are you sure this is the test_query_timestamp() test?
It's passing for me and tracing the commands of that test, I can't see any vkCmdCopyBuffer() call.
Comment 2 Józef Kucia 2018-11-29 12:04:42 UTC
It is test_resolve_non_issued_query_data().
Comment 3 Lionel Landwerlin 2018-11-29 12:22:10 UTC
Got it.

Yes, I think you need to put barriers because in Anv we're using the 3d pipeline to copy buffers, whereas timestamps are copied using the command streamer.
Those being 2 different pieces of hardware, there is no guarantee about which one is going to write to memory first/last.
Comment 4 Józef Kucia 2018-11-29 12:25:50 UTC
Yes, but even with barriers I am not able to get the test passing on Anv.
Comment 5 Lionel Landwerlin 2018-11-29 12:27:50 UTC
(In reply to Józef Kucia from comment #4)
> Yes, but even with barriers I am not able to get the test passing on Anv.

I'm unfamiliar with d3d, what should I add to insert a barrier?
Comment 6 Jason Ekstrand 2018-11-29 13:33:58 UTC
The issue is most likely because we accumulate barriers and delay actually applying the them until the next draw or copy command.  Should be an easy fix.
Comment 7 Józef Kucia 2018-11-29 14:16:24 UTC
(In reply to Lionel Landwerlin from comment #5)
> I'm unfamiliar with d3d, what should I add to insert a barrier?

transition_resource_state(command_list, readback_buffer,                    
   D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_COPY_DEST);    

produces the following Vulkan barrier:

vkCmdPipelineBarrier(commandBuffer, srcStageMask, dstStageMask, dependencyFlags, memoryBarrierCount, pMemoryBarriers, bufferMemoryBarrierCount, pBufferMemoryBarriers, imageMemoryBarrierCount, pImageMemoryBarriers) returns void:
    commandBuffer:                  VkCommandBuffer = 0x55a1a81b7430
    srcStageMask:                   VkPipelineStageFlags = 4096 (VK_PIPELINE_STAGE_TRANSFER_BIT)
    dstStageMask:                   VkPipelineStageFlags = 4096 (VK_PIPELINE_STAGE_TRANSFER_BIT)
    dependencyFlags:                VkDependencyFlags = 0
    memoryBarrierCount:             uint32_t = 0
    pMemoryBarriers:                const VkMemoryBarrier* = NULL
    bufferMemoryBarrierCount:       uint32_t = 1
    pBufferMemoryBarriers:          const VkBufferMemoryBarrier* = 0x7fffeabcdd00
        pBufferMemoryBarriers[0]:       const VkBufferMemoryBarrier = 0x7fffeabcdd00:
            sType:                          VkStructureType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER (44)
            pNext:                          const void* = NULL
            srcAccessMask:                  VkAccessFlags = 4096 (VK_ACCESS_TRANSFER_WRITE_BIT)
            dstAccessMask:                  VkAccessFlags = 4096 (VK_ACCESS_TRANSFER_WRITE_BIT)
            srcQueueFamilyIndex:            uint32_t = 4294967295
            dstQueueFamilyIndex:            uint32_t = 4294967295
            buffer:                         VkBuffer = 0x55a1a80f6fc0
            offset:                         VkDeviceSize = 0
            size:                           VkDeviceSize = 18446744073709551615
    imageMemoryBarrierCount:        uint32_t = 0
    pImageMemoryBarriers:           const VkImageMemoryBarrier* = NULL


with this change the test fails intermittently on Anv.
Comment 8 Lionel Landwerlin 2018-11-29 14:44:00 UTC
(In reply to Jason Ekstrand from comment #6)
> The issue is most likely because we accumulate barriers and delay actually
> applying the them until the next draw or copy command.  Should be an easy
> fix.

A call to genX(cmd_buffer_apply_pipe_flushes) seems to fix it for me.
Comment 9 Jason Ekstrand 2018-11-29 14:56:17 UTC
> A call to genX(cmd_buffer_apply_pipe_flushes) seems to fix it for me.

I think we need to do a bit more than just that because MI writes aren't pipelined.  Probably something like:

/* Because MI writes aren't pipelined, we need a CS stall */
if (flushes & ANV_PIPE_FLUSH_BITS)
   flushes |= ANV_PIPE_CS_STALL_BIT;
genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer);
Comment 10 Lionel Landwerlin 2018-11-29 22:18:32 UTC
Should be fixed in :

commit 37f9788e9a8e443772b5ad6f339567e6ae6a8320
Author: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Date:   Thu Nov 29 13:02:03 2018 +0000

    anv: flush pipeline before query result copies
Comment 11 Józef Kucia 2018-11-30 09:45:11 UTC
*** Bug 108909 has been marked as a duplicate of this bug. ***


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.