Bug 92775 - [radeon][TTM] Contention when evicting large buffers between VRAM and GTT
Status: RESOLVED WORKSFORME
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2015-11-02 09:02 UTC by Shawn Starr
Modified: 2017-07-17 20:32 UTC

Description Shawn Starr 2015-11-02 09:02:37 UTC
In this example output, we fail to evict pages.

[  284.937397] CPU: 2 PID: 2113 Comm: RenderThread 2 Tainted: G            E   4.3.0-0.rc7.git2.2.fc23.x86_64+debug+ #1
[  284.937916] Hardware name: Dell Inc. Precision M6800/05NG6V, BIOS A15 09/29/2015
[  284.938277]  0000000000000000 00000000a3aa6e6e ffff88078b2af350 ffffffff813a496f
[  284.938699]  ffff880523ce68a0 ffff88078b2af370 ffffffffa01d6132 ffff88003f9c4738
[  284.939093]  0000000000000002 ffff88078b2af3b0 ffffffffa013e5f4 0000000000000000
[  284.939563] Call Trace:
[  284.939740]  [<ffffffff813a496f>] dump_stack+0x44/0x55
[  284.939999]  [<ffffffffa01d6132>] radeon_ttm_io_mem_reserve+0xd2/0x100 [radeon]
[  284.940417]  [<ffffffffa013e5f4>] ttm_mem_io_reserve+0x64/0x110 [ttm]
[  284.940797]  [<ffffffffa013eb53>] ttm_mem_reg_ioremap+0x53/0x140 [ttm]
[  284.941111]  [<ffffffffa013f0b0>] ttm_bo_move_memcpy+0xe0/0x680 [ttm]
[  284.941481]  [<ffffffffa01d69f0>] radeon_bo_move+0x190/0x200 [radeon]
[  284.941893]  [<ffffffffa013cd62>] ttm_bo_handle_move_mem+0x2c2/0x530 [ttm]
[  284.942262]  [<ffffffffa013d537>] ? ttm_bo_mem_space+0x137/0x3b0 [ttm]
[  284.942617]  [<ffffffffa013d121>] ttm_bo_evict+0x151/0x220 [ttm]
[  284.942952]  [<ffffffffa013d388>] ttm_mem_evict_first+0x198/0x210 [ttm]
[  284.943308]  [<ffffffffa013d6fa>] ttm_bo_mem_space+0x2fa/0x3b0 [ttm]
[  284.943715]  [<ffffffffa013dbd9>] ttm_bo_validate+0x199/0x210 [ttm]
[  284.944016]  [<ffffffffa0141998>] ? ttm_eu_reserve_buffers+0x168/0x300 [ttm]
[  284.944428]  [<ffffffffa01d88ec>] radeon_bo_list_validate+0xcc/0x210 [radeon]
[  284.944809]  [<ffffffffa01ee6c3>] radeon_cs_parser_relocs+0x393/0x460 [radeon]
[  284.945181]  [<ffffffffa01ef049>] radeon_cs_ioctl+0x269/0x780 [radeon]
[  284.945532]  [<ffffffffa00d3408>] drm_ioctl+0x138/0x500 [drm]
[  284.945944]  [<ffffffffa01eede0>] ? radeon_cs_parser_init+0x490/0x490 [radeon]
[  284.946305]  [<ffffffff811d7aae>] ? handle_mm_fault+0xb6e/0x1840
[  284.946674]  [<ffffffff8177e96e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
[  284.946999]  [<ffffffffa01b904c>] radeon_drm_ioctl+0x4c/0x80 [radeon]
[  284.947380]  [<ffffffff81235435>] do_vfs_ioctl+0x295/0x470
[  284.947741]  [<ffffffff81065104>] ? __do_page_fault+0x1b4/0x400
[  284.948024]  [<ffffffff81235689>] SyS_ioctl+0x79/0x90
[  284.948294]  [<ffffffff8177eeee>] entry_SYSCALL_64_fastpath+0x12/0x71
[  284.948635] radeon_ttm_io_mem_reserve: Check if it's bus.offset + bus.size greater than BAR SIZE: is 6a977000 > 10000000?
[  284.949261] [TTM] in ttm_bo_handle_move_mem(): Failing! - OTHER, from ttm_bo_move_memcpy() return
[  284.949796] [TTM] Buffer eviction failed
[  284.950004] [TTM] No space for ffff880523ce6868 (1367 pages, 5468K, 5M)
[  284.950422] [TTM]   placement[0]=0x00060002 (1)
[  284.950796] [TTM]     has_type: 1
[  284.950958] [TTM]     use_type: 1
[  284.951130] [TTM]     flags: 0x0000000A
[  284.951333] [TTM]     gpu_offset: 0x80000000
[  284.951709] [TTM]     size: 2097152
[  284.951892] [TTM]     available_caching: 0x00070000
[  284.952141] [TTM]     default_caching: 0x00010000
[  284.953945] [TTM]   placement[1]=0x00060001 (0)
[  284.954193] [TTM]     has_type: 1
[  284.954359] [TTM]     use_type: 1
[  284.954593] [TTM]     flags: 0x00000002
[  284.954780] [TTM]     gpu_offset: 0x00000000
[  284.954986] [TTM]     size: 0
[  284.955160] [TTM]     available_caching: 0x00070000
[  284.955421] [TTM]     default_caching: 0x00010000
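
For illustration, the check reported by the radeon_ttm_io_mem_reserve line in the log above can be sketched as a range test against the CPU-visible aperture. This is a minimal sketch, not the driver's code: `range_fits_in_aperture` and `check_logged_values` are hypothetical names. Only the end of the range (0x6a977000) and the BAR size (0x10000000, i.e. 256 MiB) come from the log; the split into offset and size used below is an assumption made so that the sum matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when [offset, offset + size) lies
 * entirely inside a CPU-visible aperture of aperture_size bytes.
 * Written so that offset + size is never computed and cannot overflow. */
static bool range_fits_in_aperture(uint64_t offset, uint64_t size,
                                   uint64_t aperture_size)
{
    return size <= aperture_size && offset <= aperture_size - size;
}

/* Replays the comparison from the log: a range ending at 0x6a977000
 * against a 0x10000000-byte BAR. The one-page size is an assumption;
 * the log only gives the sum of offset and size. */
static void check_logged_values(void)
{
    const uint64_t bar_size = 0x10000000ULL;   /* 256 MiB, from the log */
    const uint64_t size     = 0x1000ULL;       /* assumed: one 4 KiB page */
    const uint64_t offset   = 0x6a977000ULL - size;

    assert(!range_fits_in_aperture(offset, size, bar_size));
}
```

With the logged values the range ends well beyond the 256 MiB aperture, which is consistent with the failure returned from ttm_bo_move_memcpy() in the next log line.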

A workaround discussed on IRC is to patch the radeonsi driver so that these resources may also be placed in GTT:

--- r600_buffer_common.c        2015-11-02 01:56:10.796446185 -0500
+++ r600_buffer_common.c.workaround     2015-11-01 21:16:55.398517539 -0500
@@ -133,7 +133,7 @@ bool r600_init_resource(struct r600_comm
        case PIPE_USAGE_IMMUTABLE:
        default:
                /* Not listing GTT here improves performance in some apps. */
-               res->domains = RADEON_DOMAIN_VRAM;
+               res->domains = RADEON_DOMAIN_VRAM | RADEON_DOMAIN_GTT;
                flags |= RADEON_FLAG_GTT_WC;
                break;
        }
@@ -158,7 +158,7 @@ bool r600_init_resource(struct r600_comm
        /* Tiled textures are unmappable. Always put them in VRAM. */
        if (res->b.b.target != PIPE_BUFFER &&
            rtex->surface.level[0].mode >= RADEON_SURF_MODE_1D) {
-               res->domains = RADEON_DOMAIN_VRAM;
+               res->domains = RADEON_DOMAIN_VRAM | RADEON_DOMAIN_GTT;
                flags &= ~RADEON_FLAG_CPU_ACCESS;
                flags |= RADEON_FLAG_NO_CPU_ACCESS;
        }

One solution discussed is to split up the transfer into smaller chunks in radeon_ttm.
Comment 1 Michel Dänzer 2015-12-25 07:41:41 UTC
(In reply to Shawn Starr from comment #0)
> One solution discussed is to split up the transfer into smaller chunks in
> radeon_ttm.

Specifically, here's how I think a fallback could be implemented in the kernel driver which can never fail because of fragmentation or resource starvation:

During initialization, reserve some pinned GTT memory for bounce buffers. When a BO can't be bound to GTT for eviction as in the case reported here, instead do the eviction directly from VRAM to CPU domain in one or several passes of:
1. Copy part of the BO from VRAM to one of the reserved bounce buffers in GTT using the GPU.
2. Copy that part of the BO from the bounce buffer to the BO's system RAM pages using the CPU.
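
The two steps above can be sketched in user-space C as follows. This is only a simulation under stated assumptions, not kernel code: `evict_via_bounce` and `gpu_copy_to_bounce` are hypothetical names, plain arrays stand in for VRAM, the pinned GTT bounce buffer and the BO's system RAM pages, and the "GPU copy" is just a memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for step 1: a GPU copy from VRAM into the pinned bounce
 * buffer. In this simulation it is an ordinary memcpy. */
static void gpu_copy_to_bounce(unsigned char *bounce,
                               const unsigned char *vram, size_t len)
{
    memcpy(bounce, vram, len);
}

/* Copies bo_size bytes from vram to sysram through a fixed-size bounce
 * buffer, one chunk per pass. Returns the number of passes. The bounce
 * buffer is allocated up front by the caller, so the loop itself never
 * needs to allocate anything. */
static size_t evict_via_bounce(unsigned char *sysram,
                               const unsigned char *vram, size_t bo_size,
                               unsigned char *bounce, size_t bounce_size)
{
    size_t passes = 0;
    size_t done = 0;

    assert(bounce_size > 0);

    while (done < bo_size) {
        size_t len = bo_size - done;
        if (len > bounce_size)
            len = bounce_size;

        gpu_copy_to_bounce(bounce, vram + done, len); /* step 1: GPU */
        memcpy(sysram + done, bounce, len);           /* step 2: CPU */

        done += len;
        passes++;
    }
    return passes;
}
```

In this sketch a 10-byte buffer evicted through a 4-byte bounce buffer takes three passes. The point of the design is that the only resource the loop depends on is the bounce buffer reserved during initialization.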
Comment 2 david1.zhou@amd.com 2015-12-25 08:16:12 UTC
(In reply to Michel Dänzer from comment #1)
> (In reply to Shawn Starr from comment #0)
> > One solution discussed is to split up the transfer into smaller chunks in
> > radeon_ttm.
> 
> Specifically, here's how I think a fallback could be implemented in the
> kernel driver which can never fail because of fragmentation or resource
> starvation:
> 
> During initialization, reserve some pinned GTT memory for bounce buffers.
> When a BO can't be bound to GTT for eviction as in the case reported here,
> instead do the eviction directly from VRAM to CPU domain in one or several
> passes of:
> 1. Copy part of the BO from VRAM to one of the reserved bounce buffers in
> GTT using the GPU.
> 2. Copy that part of the BO from the bounce buffer to the BO's system RAM
> pages using the CPU.


If we could split a large BO into two parts, one in VRAM and one in GTT, that also seems like it would help in this case.
Comment 3 Christian König 2015-12-25 15:51:49 UTC
Actually it doesn't need to be so complicated. Just take a look at amdgpu|radeon_move_vram_ram().

Instead of trying to reallocate and bind everything at once, we just need to bind the already allocated new_mem pages page by page and copy page by page.
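
A rough user-space sketch of that page-by-page idea, under stated assumptions: `struct gtt_slot`, `bind_page`, `unbind_page` and `move_vram_to_pages` are hypothetical names and do not correspond to functions in radeon or amdgpu. A single pointer simulates one GTT slot, and memcpy stands in for the GPU copy. The destination pages are separate allocations, as already allocated system pages would be.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Simulated GTT slot: at most one system page is bound at a time. */
struct gtt_slot {
    unsigned char *page;
};

static void bind_page(struct gtt_slot *slot, unsigned char *page)
{
    slot->page = page;
}

static void unbind_page(struct gtt_slot *slot)
{
    slot->page = NULL;
}

/* Moves num_pages pages from a contiguous VRAM range into the given
 * (possibly scattered) system pages, binding and copying one page at a
 * time instead of binding the whole buffer at once. Returns the number
 * of pages copied. */
static size_t move_vram_to_pages(unsigned char **pages, size_t num_pages,
                                 const unsigned char *vram)
{
    struct gtt_slot slot = { NULL };
    size_t copied = 0;

    for (size_t i = 0; i < num_pages; i++) {
        assert(pages[i] != NULL);

        bind_page(&slot, pages[i]);
        /* Stand-in for a GPU copy of one page from VRAM to the slot. */
        memcpy(slot.page, vram + i * SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
        unbind_page(&slot);

        copied++;
    }
    return copied;
}
```

Because only one slot is needed at any time, the move in this sketch does not depend on finding a contiguous GTT range large enough for the whole buffer.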
Comment 4 Shawn Starr 2017-07-17 20:32:32 UTC
I believe this can be closed, given the massive changes in amdgpu. I haven't seen issues anymore.

