Created attachment 138110 [details]
My system is Linux archlinux 4.15.8-1-ARCH #1 SMP PREEMPT Sat Mar 10 00:00:33 UTC 2018 x86_64 GNU/Linux.
My GPU is Intel HD Graphics 630.
I installed the latest Mesa from git (18.1.0_devel.100894.fcf267ba08).
When recording a command buffer I get this error:
Program received signal SIGBUS, Bus error.
anv_state_stream_alloc (stream=stream@entry=0x55559dbf9dd8, size=64, alignment=alignment@entry=32) at vulkan/anv_allocator.c:913
913 VG_NOACCESS_WRITE(&sb->block, stream->block);
0 in anv_state_stream_alloc of vulkan/anv_allocator.c:913
1 in anv_cmd_buffer_alloc_dynamic_state of vulkan/anv_batch_chain.c:654
2 in anv_cmd_buffer_push_constants of vulkan/anv_cmd_buffer.c:729
3 in cmd_buffer_flush_push_constants of vulkan/genX_cmd_buffer.c:2420
4 in gen9_cmd_buffer_flush_state of vulkan/genX_cmd_buffer.c:2571
5 in gen9_CmdDrawIndexed of vulkan/genX_cmd_buffer.c:2709
6 in ?? of /usr/lib/libVkLayer_core_validation.so
7 in ?? of /usr/lib/libVkLayer_parameter_validation.so
8 in ?? of /usr/lib/libVkLayer_threading.so
9 in vkcmd_create_secondary_command_buffer of vkcmd.c:207
10 in vkcmd_create_secondary_command_buffer_for_inst of vkcmd.c:88
11 in scn_load_scene of scene.c:407
12 in create_scene of main.c:903
13 in main of main.c:583
I send push constants with a matrix and a color (80 bytes) to both stages. The limit is 128 bytes, so it should be fine. I get no validation output from the standard layers.
The function that creates the secondary command buffer: https://pastebin.com/vN2WjA1W
I zipped the vktrace file and put it on Dropbox because it is too big (40MB): https://www.dropbox.com/s/ko5cqlrb5baj2wy/trace.7z?dl=0
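For reference, the 80-byte figure above can be checked with a small sketch. The layout is an assumption (a 4x4 float matrix followed by an RGBA float color); the struct and helper names are hypothetical and do not come from the pastebin code. The 128-byte limit is the value quoted above, which a Vulkan implementation reports as maxPushConstantsSize.

```c
#include <stddef.h>

/* Assumed layout: 4x4 float matrix (64 bytes) + RGBA float color (16 bytes). */
struct push_constants {
    float matrix[16];
    float color[4];
};

/* Hypothetical helper: returns 1 if the block fits within `limit` bytes. */
static int push_constants_fit(size_t limit)
{
    return sizeof(struct push_constants) <= limit;
}
```

With 4-byte floats the struct is 80 bytes, so push_constants_fit(128) holds and the push constants themselves are unlikely to be the problem.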
I recompiled Mesa with --enable-debug and now I get a failed assert, which is more useful.
0 in raise of /usr/lib/libc.so.6
1 in abort of /usr/lib/libc.so.6
2 in __assert_fail_base of /usr/lib/libc.so.6
3 in __assert_fail of /usr/lib/libc.so.6
4 in anv_block_pool_expand_range of vulkan/anv_allocator.c:327
5 in anv_block_pool_grow of vulkan/anv_allocator.c:521
6 in anv_block_pool_alloc_new of vulkan/anv_allocator.c:564
7 in anv_block_pool_alloc of vulkan/anv_allocator.c:582
8 in anv_fixed_size_state_pool_alloc_new of vulkan/anv_allocator.c:655
9 in anv_state_pool_alloc_no_vg of vulkan/anv_allocator.c:772
10 in anv_state_stream_alloc of vulkan/anv_allocator.c:909
11 in anv_cmd_buffer_alloc_dynamic_state of vulkan/anv_batch_chain.c:654
12 in anv_cmd_buffer_push_constants of vulkan/anv_cmd_buffer.c:729
13 in cmd_buffer_flush_push_constants of vulkan/genX_cmd_buffer.c:2420
14 in gen9_cmd_buffer_flush_state of vulkan/genX_cmd_buffer.c:2571
15 in gen9_CmdDrawIndexed of vulkan/genX_cmd_buffer.c:2709
This assert fails:
assert(size - center_bo_offset <=
BLOCK_POOL_MEMFD_SIZE - BLOCK_POOL_MEMFD_CENTER);
It fails because size=1073741824 (1 << 30) no longer fits: block pools are limited to 1GB:
/* Block pools are backed by a fixed-size 1GB memfd */
#define BLOCK_POOL_MEMFD_SIZE (1ul << 30)
I still don't know how to fix it. Is this valid behavior? Why isn't it caught by the validation layers? Should I use a new command pool for the next portion of objects?
So the root cause is in anv_allocator. Each command buffer takes up a lot of mapped memory: 2320 + 4096 + 16384 bytes for anv_cmd_buffer, surface state, and dynamic state. The allocator then rounds this up to a power of 2, which is 32KB, so one command buffer takes up 32KB. The maximum amount of mapped memory per device is 1GB, which is enough for ~32000 command buffers. After that mmap fails.
So the default allocator was designed for a small number of command buffers.
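The arithmetic behind that estimate can be sketched as follows. This is only an illustration of the numbers quoted above, not the allocator's actual code; the macro and helper names are hypothetical, and it assumes the three sizes are summed before rounding, as described.

```c
#include <stdint.h>

/* Sizes quoted above, in bytes. */
#define CMD_BUFFER_STRUCT_SIZE 2320u
#define SURFACE_STATE_SIZE     4096u
#define DYNAMIC_STATE_SIZE     16384u

/* 1GB limit, same value as BLOCK_POOL_MEMFD_SIZE (1ul << 30). */
#define POOL_LIMIT_BYTES ((uint64_t)1 << 30)

/* Hypothetical helper: round v up to the next power of two (v > 0). */
static uint64_t round_up_pow2(uint64_t v)
{
    uint64_t p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

/* Mapped bytes charged to one command buffer after rounding. */
static uint64_t per_cmd_buffer_bytes(void)
{
    return round_up_pow2(CMD_BUFFER_STRUCT_SIZE +
                         SURFACE_STATE_SIZE +
                         DYNAMIC_STATE_SIZE);
}

/* How many command buffers fit under the 1GB limit. */
static uint64_t max_cmd_buffers(void)
{
    return POOL_LIMIT_BYTES / per_cmd_buffer_bytes();
}
```

The sum is 22800 bytes, which rounds up to 32768 (32KB), and 1GB / 32KB gives 32768 command buffers, matching the ~32000 estimate.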
The workaround is to use one command buffer per frame (or per thread): just record everything inside the primary command buffer. It is worse for performance, but it is better for mmap and it solved the problem.
(In reply to Vyacheslav from comment #4)
> The workaround is to use one command buffer per frame (or per thread): just
> record everything inside the primary command buffer. It is worse for
> performance, but it is better for mmap and it solved the problem.
If you are using enough secondary command buffers to blow past the 1GB limit, then you are probably going to get garbage performance anyway. We have to do a full GPU on every vkCmdExecuteCommands so, while performance is probably fine if secondary command buffers are used sparingly, over-use of them can cause significant perf problems.
I think there is still something of a driver bug here, in that we really should be throwing VK_ERROR_OUT_OF_DEVICE_MEMORY instead of asserting or getting you into a situation where you can get SIGBUS.
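A rough sketch of that idea, checking the limit and reporting an error rather than asserting, might look like the following. Everything here is hypothetical and is not the anv code: the pool struct, the function name, and the result enum are stand-ins, with values mirroring VkResult (VK_SUCCESS is 0, VK_ERROR_OUT_OF_DEVICE_MEMORY is -2) so the block compiles without the Vulkan headers.

```c
#include <stdint.h>

/* Same value as BLOCK_POOL_MEMFD_SIZE (1ul << 30) in anv_allocator.c. */
#define SKETCH_POOL_LIMIT ((uint64_t)1 << 30)

/* Hypothetical result codes; values mirror the corresponding VkResult ones. */
enum sketch_result {
    SKETCH_SUCCESS = 0,
    SKETCH_ERROR_OUT_OF_DEVICE_MEMORY = -2,
};

/* Hypothetical stand-in for a block pool: only tracks bytes handed out. */
struct sketch_pool {
    uint64_t used; /* invariant: used <= SKETCH_POOL_LIMIT */
};

/* Grow the pool by `extra` bytes, or report failure without changing it. */
static enum sketch_result
sketch_pool_grow(struct sketch_pool *pool, uint64_t extra)
{
    /* Written as a subtraction so that used + extra cannot overflow. */
    if (extra > SKETCH_POOL_LIMIT - pool->used)
        return SKETCH_ERROR_OUT_OF_DEVICE_MEMORY;

    pool->used += extra;
    return SKETCH_SUCCESS;
}
```

The caller would then have to propagate the error up to the command buffer, which is presumably the harder part in the real driver, since vkCmd* entry points return void and the failure can only surface later, for example from vkEndCommandBuffer.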