In this example output, we fail to evict pages.

[ 284.937397] CPU: 2 PID: 2113 Comm: RenderThread 2 Tainted: G E 4.3.0-0.rc7.git2.2.fc23.x86_64+debug+ #1
[ 284.937916] Hardware name: Dell Inc. Precision M6800/05NG6V, BIOS A15 09/29/2015
[ 284.938277] 0000000000000000 00000000a3aa6e6e ffff88078b2af350 ffffffff813a496f
[ 284.938699] ffff880523ce68a0 ffff88078b2af370 ffffffffa01d6132 ffff88003f9c4738
[ 284.939093] 0000000000000002 ffff88078b2af3b0 ffffffffa013e5f4 0000000000000000
[ 284.939563] Call Trace:
[ 284.939740] [<ffffffff813a496f>] dump_stack+0x44/0x55
[ 284.939999] [<ffffffffa01d6132>] radeon_ttm_io_mem_reserve+0xd2/0x100 [radeon]
[ 284.940417] [<ffffffffa013e5f4>] ttm_mem_io_reserve+0x64/0x110 [ttm]
[ 284.940797] [<ffffffffa013eb53>] ttm_mem_reg_ioremap+0x53/0x140 [ttm]
[ 284.941111] [<ffffffffa013f0b0>] ttm_bo_move_memcpy+0xe0/0x680 [ttm]
[ 284.941481] [<ffffffffa01d69f0>] radeon_bo_move+0x190/0x200 [radeon]
[ 284.941893] [<ffffffffa013cd62>] ttm_bo_handle_move_mem+0x2c2/0x530 [ttm]
[ 284.942262] [<ffffffffa013d537>] ? ttm_bo_mem_space+0x137/0x3b0 [ttm]
[ 284.942617] [<ffffffffa013d121>] ttm_bo_evict+0x151/0x220 [ttm]
[ 284.942952] [<ffffffffa013d388>] ttm_mem_evict_first+0x198/0x210 [ttm]
[ 284.943308] [<ffffffffa013d6fa>] ttm_bo_mem_space+0x2fa/0x3b0 [ttm]
[ 284.943715] [<ffffffffa013dbd9>] ttm_bo_validate+0x199/0x210 [ttm]
[ 284.944016] [<ffffffffa0141998>] ? ttm_eu_reserve_buffers+0x168/0x300 [ttm]
[ 284.944428] [<ffffffffa01d88ec>] radeon_bo_list_validate+0xcc/0x210 [radeon]
[ 284.944809] [<ffffffffa01ee6c3>] radeon_cs_parser_relocs+0x393/0x460 [radeon]
[ 284.945181] [<ffffffffa01ef049>] radeon_cs_ioctl+0x269/0x780 [radeon]
[ 284.945532] [<ffffffffa00d3408>] drm_ioctl+0x138/0x500 [drm]
[ 284.945944] [<ffffffffa01eede0>] ? radeon_cs_parser_init+0x490/0x490 [radeon]
[ 284.946305] [<ffffffff811d7aae>] ? handle_mm_fault+0xb6e/0x1840
[ 284.946674] [<ffffffff8177e96e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
[ 284.946999] [<ffffffffa01b904c>] radeon_drm_ioctl+0x4c/0x80 [radeon]
[ 284.947380] [<ffffffff81235435>] do_vfs_ioctl+0x295/0x470
[ 284.947741] [<ffffffff81065104>] ? __do_page_fault+0x1b4/0x400
[ 284.948024] [<ffffffff81235689>] SyS_ioctl+0x79/0x90
[ 284.948294] [<ffffffff8177eeee>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 284.948635] radeon_ttm_io_mem_reserve: Check if it's bus.offset + bus.size greater than BAR SIZE: is 6a977000 > 10000000?
[ 284.949261] [TTM] in ttm_bo_handle_move_mem(): Failing! - OTHER, from ttm_bo_move_memcpy() return
[ 284.949796] [TTM] Buffer eviction failed
[ 284.950004] [TTM] No space for ffff880523ce6868 (1367 pages, 5468K, 5M)
[ 284.950422] [TTM] placement[0]=0x00060002 (1)
[ 284.950796] [TTM] has_type: 1
[ 284.950958] [TTM] use_type: 1
[ 284.951130] [TTM] flags: 0x0000000A
[ 284.951333] [TTM] gpu_offset: 0x80000000
[ 284.951709] [TTM] size: 2097152
[ 284.951892] [TTM] available_caching: 0x00070000
[ 284.952141] [TTM] default_caching: 0x00010000
[ 284.953945] [TTM] placement[1]=0x00060001 (0)
[ 284.954193] [TTM] has_type: 1
[ 284.954359] [TTM] use_type: 1
[ 284.954593] [TTM] flags: 0x00000002
[ 284.954780] [TTM] gpu_offset: 0x00000000
[ 284.954986] [TTM] size: 0
[ 284.955160] [TTM] available_caching: 0x00070000
[ 284.955421] [TTM] default_caching: 0x00010000

In discussions on IRC, a current workaround in radeonsi DRI is this:

--- r600_buffer_common.c 2015-11-02 01:56:10.796446185 -0500
+++ r600_buffer_common.c.workaround 2015-11-01 21:16:55.398517539 -0500
@@ -133,7 +133,7 @@ bool r600_init_resource(struct r600_comm
 	case PIPE_USAGE_IMMUTABLE:
 	default:
 		/* Not listing GTT here improves performance in some apps. */
-		res->domains = RADEON_DOMAIN_VRAM;
+		res->domains = RADEON_DOMAIN_VRAM | RADEON_DOMAIN_GTT;
 		flags |= RADEON_FLAG_GTT_WC;
 		break;
 	}
@@ -158,7 +158,7 @@ bool r600_init_resource(struct r600_comm
 	/* Tiled textures are unmappable. Always put them in VRAM. */
 	if (res->b.b.target != PIPE_BUFFER &&
 	    rtex->surface.level[0].mode >= RADEON_SURF_MODE_1D) {
-		res->domains = RADEON_DOMAIN_VRAM;
+		res->domains = RADEON_DOMAIN_VRAM | RADEON_DOMAIN_GTT;
 		flags &= ~RADEON_FLAG_CPU_ACCESS;
 		flags |= RADEON_FLAG_NO_CPU_ACCESS;
 	}

One solution discussed is to split up the transfer into smaller chunks in radeon_ttm.
(In reply to Shawn Starr from comment #0)
> One solution discussed is to split up the transfer into smaller chunks in
> radeon_ttm.

Specifically, here's how I think a fallback could be implemented in the kernel driver which can never fail because of fragmentation or resource starvation:

During initialization, reserve some pinned GTT memory for bounce buffers. When a BO can't be bound to GTT for eviction as in the case reported here, instead do the eviction directly from VRAM to CPU domain in one or several passes of:

1. Copy part of the BO from VRAM to one of the reserved bounce buffers in GTT using the GPU.
2. Copy that part of the BO from the bounce buffer to the BO's system RAM pages using the CPU.
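
To make the two-step pass above concrete, here is a minimal user-space sketch of the loop. It is not driver code: `BOUNCE_SIZE`, `gpu_copy_to_bounce()`, `cpu_copy_from_bounce()` and `evict_via_bounce()` are hypothetical names made up for illustration, and plain `memcpy()` stands in for both the GPU copy and the CPU copy. The only point it tries to show is that a single fixed-size buffer, reserved up front, is enough to move a BO of any size in several passes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size of the bounce buffer reserved at init time. */
#define BOUNCE_SIZE 4096

/* Stand-in for a pinned GTT bounce buffer. In a real driver this would be
 * allocated and pinned once during initialization (assumption). */
static unsigned char bounce[BOUNCE_SIZE];

/* Step 1 stand-in: "GPU" copy of one chunk from VRAM into the bounce buffer. */
static void gpu_copy_to_bounce(const unsigned char *vram, size_t len)
{
    assert(len <= BOUNCE_SIZE);
    memcpy(bounce, vram, len);
}

/* Step 2 stand-in: CPU copy of that chunk from the bounce buffer into the
 * BO's system RAM pages. */
static void cpu_copy_from_bounce(unsigned char *sys, size_t len)
{
    assert(len <= BOUNCE_SIZE);
    memcpy(sys, bounce, len);
}

/* Evict `size` bytes from `vram` to `sys` in chunks of at most BOUNCE_SIZE.
 * Returns the number of passes that were needed. */
static size_t evict_via_bounce(const unsigned char *vram, unsigned char *sys,
                               size_t size)
{
    size_t off = 0;
    size_t passes = 0;

    while (off < size) {
        size_t chunk = size - off;

        if (chunk > BOUNCE_SIZE)
            chunk = BOUNCE_SIZE;

        gpu_copy_to_bounce(vram + off, chunk);
        cpu_copy_from_bounce(sys + off, chunk);

        off += chunk;
        passes++;
    }
    return passes;
}
```

Because the bounce buffer is reserved once and reused for every pass, the loop needs no new allocation or GTT binding at eviction time, which is the property the fallback relies on.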
(In reply to Michel Dänzer from comment #1)
> (In reply to Shawn Starr from comment #0)
> > One solution discussed is to split up the transfer into smaller chunks in
> > radeon_ttm.
>
> Specifically, here's how I think a fallback could be implemented in the
> kernel driver which can never fail because of fragmentation or resource
> starvation:
>
> During initialization, reserve some pinned GTT memory for bounce buffers.
> When a BO can't be bound to GTT for eviction as in the case reported here,
> instead do the eviction directly from VRAM to CPU domain in one or several
> passes of:
> 1. Copy part of the BO from VRAM to one of the reserved bounce buffers in
> GTT using the GPU.
> 2. Copy that part of the BO from the bounce buffer to the BO's system RAM
> pages using the CPU.

If we could split a large BO into two parts, one in VRAM and one in GTT, that also seems like it would help in this case.
Actually, it doesn't need to be so complicated. Just take a look at amdgpu|radeon_move_vram_ram(). Instead of trying to reallocate and bind everything at once, we just need to bind the already allocated new_mem pages page by page and copy page by page.
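
A rough user-space sketch of that page-by-page idea, for illustration only. It does not reproduce amdgpu|radeon_move_vram_ram(); `PAGE_SIZE_SIM`, `struct gtt_window`, `window_bind()`, `window_unbind()` and `move_page_by_page()` are hypothetical names, and `memcpy()` stands in for the copy. The assumption being illustrated is that a single one-page window is bound to each destination page in turn, so no large contiguous GTT range is ever required.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical page size used by this simulation. */
#define PAGE_SIZE_SIM 4096

/* Stand-in for a one-page GTT window that can be bound to one page at a time. */
struct gtt_window {
    unsigned char *bound_page;
};

/* Point the window at one already allocated destination page. */
static void window_bind(struct gtt_window *w, unsigned char *page)
{
    assert(w->bound_page == NULL); /* only one page may be bound at a time */
    w->bound_page = page;
}

/* Release the window so the next page can be bound. */
static void window_unbind(struct gtt_window *w)
{
    w->bound_page = NULL;
}

/* Copy `num_pages` pages from `vram` into the (possibly non-contiguous)
 * destination pages, binding and copying one page per iteration.
 * Returns the number of pages copied. */
static size_t move_page_by_page(const unsigned char *vram,
                                unsigned char **new_pages, size_t num_pages)
{
    struct gtt_window win = { NULL };
    size_t i;

    for (i = 0; i < num_pages; i++) {
        window_bind(&win, new_pages[i]);
        memcpy(win.bound_page, vram + i * PAGE_SIZE_SIM, PAGE_SIZE_SIM);
        window_unbind(&win);
    }
    return i;
}
```

The destination pages do not have to be adjacent or in order; each one is reached only through the window while it is bound.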
I believe this can be closed, given the massive changes in amdgpu. I haven't seen the issue since.