I've had crashes that I was unable to diagnose in the same application (a custom one that I wrote) on mesa 18, but I never had debug symbols or anything else helpful. I installed 19.0.0-rc2 and tried it again out of curiosity. It still crashes, but at least I now get an assertion error:

v4: ../mesa-19.0.0-rc2/src/amd/vulkan/radv_cmd_buffer.c:2765: radv_CmdBindDescriptorSets: Assertion `dyn_idx < dynamicOffsetCount' failed.

Thread 8 "v4" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff6e58700 (LWP 9437)]
0x00007ffff7c23d7f in raise () from /usr/lib/libc.so.6
(gdb) bt
#0 0x00007ffff7c23d7f in raise () from /usr/lib/libc.so.6
#1 0x00007ffff7c0e672 in abort () from /usr/lib/libc.so.6
#2 0x00007ffff7c0e548 in __assert_fail_base.cold.0 () from /usr/lib/libc.so.6
#3 0x00007ffff7c1c396 in __assert_fail () from /usr/lib/libc.so.6
#4 0x00007ffff49af3bf in ?? () from /usr/lib/libvulkan_radeon.so
#5 0x00007ffff4173548 in ?? () from /usr/lib/libVkLayer_unique_objects.so
#6 0x00007ffff40d936d in ?? () from /usr/lib/libVkLayer_unique_objects.so
#7 0x00007fffd54770e9 in ?? () from /usr/lib/libVkLayer_core_validation.so
#8 0x00007fffd50c6374 in ?? () from /usr/lib/libVkLayer_object_lifetimes.so
#9 0x00007fffd50351b0 in ?? () from /usr/lib/libVkLayer_object_lifetimes.so
#10 0x00007fffd4c34001 in ?? () from /usr/lib/libVkLayer_parameter_validation.so
#11 0x00007fffd494f684 in ?? () from /usr/lib/libVkLayer_thread_safety.so
#12 0x00007fffd48e5a43 in ?? () from /usr/lib/libVkLayer_thread_safety.so
#13 0x000055555588990c in ash::vk::DeviceFnV1_0::cmd_bind_descriptor_sets ()

This assertion only triggers with the standard validation layers turned on. When I re-run my app without them, the crash happens in the same call, but as a segfault with no assertion. This may indicate some odd interaction with the validation layers.

Thread 2 "v4" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7a5e700 (LWP 12868)]
0x00007ffff4bb01cd in ?? () from /usr/lib/libvulkan_radeon.so
(gdb) bt
#0 0x00007ffff4bb01cd in ?? () from /usr/lib/libvulkan_radeon.so
#1 0x000055555588711c in ash::vk::DeviceFnV1_0::cmd_bind_descriptor_sets ()

Because this is a segfault and not an explicit assertion failure, these may be two separate problems. Either way, I am not using any dynamic offsets, and this code works on the proprietary AMD driver on Windows. Strangely, it does not work on amdvlk; in fact, it puts my GPU in an infinite loop and I have to reboot the whole machine. It also works on a Skylake iGPU using anv, although there it hangs during shader execution for a different reason.

Please advise on what data to extract for debugging purposes.
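For context on the `dyn_idx < dynamicOffsetCount` assertion: as far as I understand the Vulkan spec, the dynamicOffsetCount passed to vkCmdBindDescriptorSets has to equal the total number of *_DYNAMIC descriptors across the set layouts being bound. The following is only a simplified model of that rule in plain Rust. The types and function names are hypothetical and are not taken from Mesa or ash.

```rust
// Simplified, hypothetical model of dynamic-offset bookkeeping.
// This is NOT Mesa's or ash's code; all names are made up for illustration.

/// Descriptor types relevant to this sketch (mirroring Vulkan's naming).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DescriptorType {
    UniformBuffer,
    StorageBuffer,
    UniformBufferDynamic,
    StorageBufferDynamic,
}

/// A trimmed-down stand-in for one descriptor set layout binding.
#[derive(Clone, Copy, Debug)]
struct Binding {
    descriptor_type: DescriptorType,
    descriptor_count: u32,
}

/// Number of dynamic offsets that a single set layout consumes.
fn dynamic_descriptor_count(layout: &[Binding]) -> u32 {
    layout
        .iter()
        .filter(|b| {
            matches!(
                b.descriptor_type,
                DescriptorType::UniformBufferDynamic | DescriptorType::StorageBufferDynamic
            )
        })
        .map(|b| b.descriptor_count)
        .sum()
}

/// Checks that the caller supplies exactly as many dynamic offsets
/// as the bound layouts require in total.
fn check_dynamic_offsets(
    layouts: &[&[Binding]],
    dynamic_offset_count: u32,
) -> Result<(), String> {
    let required: u32 = layouts.iter().map(|l| dynamic_descriptor_count(l)).sum();
    if required == dynamic_offset_count {
        Ok(())
    } else {
        Err(format!(
            "layouts need {} dynamic offsets, caller passed {}",
            required, dynamic_offset_count
        ))
    }
}
```

Under this model, a layout containing only UNIFORM_BUFFER bindings needs zero dynamic offsets, so binding it with dynamicOffsetCount = 0 should pass the check.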
This is on 19.0.0-rc2 compiled with --buildtype=debug, but debug symbols are still missing? I'm not sure why that is the case.
First, are you sure you are using dynamic bindings correctly? Can you share a link to your custom app?
I'm not sure what you mean; I don't use any of the *_DYNAMIC variants of DescriptorType.

This is how I define the layout of the 2nd descriptor set I'm trying to bind in the failing call:
https://github.com/farnoy/renderer/blob/3d42799a1fa6f6881c86506d9f76670899f0f7a9/src/forward_renderer/renderer.rs#L113-L142

And this is how I bind it:
https://github.com/farnoy/renderer/blob/3d42799a1fa6f6881c86506d9f76670899f0f7a9/src/forward_renderer/renderer.rs#L703-L710
If I try binding just the 2nd one, it does not crash and lets me submit that command buffer to the compute queue. The problem must be with the first set (called mvp_set in my app). It fails to bind in a graphics context as well. I will keep fiddling to see if I can change the definition until it binds successfully. Its (failing) layout is:

vk::DescriptorSetLayoutBinding {
    binding: 0,
    descriptor_type: vk::DescriptorType::UNIFORM_BUFFER,
    descriptor_count: 1,
    stage_flags: vk::ShaderStageFlags::VERTEX | vk::ShaderStageFlags::COMPUTE,
    p_immutable_samplers: ptr::null(),
}
So I've duplicated the binding of that set into two separate bindings, one for the VERTEX stage and one for COMPUTE. And I updated them the same way:

Thread 0, Frame 0:
vkUpdateDescriptorSets(device, descriptorWriteCount, pDescriptorWrites, descriptorCopyCount, pDescriptorCopies) returns void:
    device: VkDevice = 0x5577e9626f10
    descriptorWriteCount: uint32_t = 2
    pDescriptorWrites: const VkWriteDescriptorSet* = 0x7ffdeda2aee0
        pDescriptorWrites[0]: const VkWriteDescriptorSet = 0x7ffdeda2aee0:
            sType: VkStructureType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET (35)
            pNext: const void* = NULL
            dstSet: VkDescriptorSet = 0x5577e9626c00
            dstBinding: uint32_t = 0
            dstArrayElement: uint32_t = 0
            descriptorCount: uint32_t = 1
            descriptorType: VkDescriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER (6)
            pImageInfo: const VkDescriptorImageInfo* = UNUSED
            pBufferInfo: const VkDescriptorBufferInfo* = 0x7ffdeda2ad80
                pBufferInfo[0]: const VkDescriptorBufferInfo = 0x7ffdeda2ad80:
                    buffer: VkBuffer = 0x5577e9626b10
                    offset: VkDeviceSize = 0
                    range: VkDeviceSize = 262144
            pTexelBufferView: const VkBufferView* = UNUSED
        pDescriptorWrites[1]: const VkWriteDescriptorSet = 0x7ffdeda2af20:
            sType: VkStructureType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET (35)
            pNext: const void* = NULL
            dstSet: VkDescriptorSet = 0x5577e9626c00
            dstBinding: uint32_t = 1
            dstArrayElement: uint32_t = 0
            descriptorCount: uint32_t = 1
            descriptorType: VkDescriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER (6)
            pImageInfo: const VkDescriptorImageInfo* = UNUSED
            pBufferInfo: const VkDescriptorBufferInfo* = 0x7ffdeda2ad80
                pBufferInfo[0]: const VkDescriptorBufferInfo = 0x7ffdeda2ad80:
                    buffer: VkBuffer = 0x5577e9626b10
                    offset: VkDeviceSize = 0
                    range: VkDeviceSize = 262144
            pTexelBufferView: const VkBufferView* = UNUSED
    descriptorCopyCount: uint32_t = 0
    pDescriptorCopies: const VkCopyDescriptorSet* = 0x5577e82ee320

Now I get a different assertion error, but only when using the api_dump layer (without other validation errors):

v4: ../mesa-19.0.0-rc2/src/amd/vulkan/radv_cmd_buffer.c:2727: radv_bind_descriptor_set: Assertion `!(set->layout->flags & VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR)' failed.

This really confuses me, because I am not using or enabling the push descriptor extension. How could this bit be set?
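To spell out what that assertion tests: it is a single-bit check on the layout's creation flags. The sketch below shows the same bit test in plain Rust. The constant's value (0x1) matches the Vulkan headers as far as I know, while the function name is hypothetical and not part of Mesa or ash.

```rust
/// Value of VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR
/// (0x00000001 in the Vulkan headers, to the best of my knowledge).
const PUSH_DESCRIPTOR_BIT_KHR: u32 = 0x0000_0001;

/// Hypothetical helper: true if the layout creation flags mark the
/// layout as a push descriptor layout.
fn is_push_descriptor_layout(flags: u32) -> bool {
    (flags & PUSH_DESCRIPTOR_BIT_KHR) != 0
}
```

A layout created with flags set to 0 should never satisfy this test, which is why hitting the assertion without using push descriptors is surprising.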
I've made some unrelated changes (the mvp_set and its layout are still the same), and even started using advanced descriptor indexing features with partially bound descriptor sets, and suddenly it's working. Not with validation layers though; that still segfaults when recording commands.
If you explain to me how to build your app, I should be able to reproduce the problem myself and investigate.
Alright, so here goes:

# glTF-Sample-Models submodule is pretty big
$ git clone --recurse-submodules https://github.com/farnoy/renderer.git

Install rustup from https://rustup.rs (Linux distros usually package their own). Then you need a nightly version of the Rust compiler:

$ rustup install nightly

Then from the cloned repo:

# to get the version that uses descriptor indexing and partially bound sets:
$ git checkout a8be0be1c44cac83e1f24fc066a97ee7b82be516
$ rustup run nightly cargo run --release
# this should work and render some helmets and a ground mesh
# WSAD and the mouse get you around
$ rustup run nightly cargo run --release --features validation
# this enables lunarg validation layers which for me produce a SEGFAULT

# to get the old version of my app from the initial report:
$ git checkout 3d42799a1fa6f6881c86506d9f76670899f0f7a9
$ rustup run nightly cargo run --release --features validation
# On vulkan ICD loader 1.1.96, these caused the weird dynamicOffset assertions in radv,
# on loader 1.1.99, these hang my GPU and I need to reset my machine

Let me know if you have any questions. I also upgraded the kernel from 4.20.6 to 4.20.7 since the original report, but I think it's more likely that the vulkan loader upgrade changed the behavior of my program at version 3d42799a1fa6f6881c86506d9f76670899f0f7a9.
Do you still have a problem with binding offsets?
Nope, I made some unrelated changes and I'm not hitting this problem anymore.
I assume it was a bug on your side?
I don't think so. Like I said, I wasn't using dynamic offsets at all. Maybe the validation layers added something, but I'm not sure anymore.