Bug 107328

Summary: radeon_gart_table_vram_pin takes 473 ms during ACPI S3 resume
Product: DRI Reporter: Paul Menzel <pmenzel+bugs.freedesktop.org>
Component: DRM/RadeonAssignee: Default DRI bug account <dri-devel>
Status: CLOSED INVALID QA Contact:
Severity: normal    
Priority: medium    
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
Screenshot from HTML output of `sleepgraph.py` with `devicefilter: radeon` and `maxdepth: 20` none

Description Paul Menzel 2018-07-22 07:45:52 UTC
Created attachment 140760 [details]
Screenshot from HTML output of `sleepgraph.py` with `devicefilter: radeon` and `maxdepth: 20`

On a ASRock E350M1 with Linux 4.18-rc5+, profiling ACPI S3 suspend and resume time with `sleepgraph.py` from pm-graph [1], the radeon module over half a second to resume, which is also visible adding `initcall_debug` to the command line.

> radeon @ 0000:00:01.0 {radeon} async_device (Total Suspend: 36.541 ms Total Resume: 687.797 ms)

evergreen_startup [radeon] (562.983 ms @ 403.615437)
→ radeon_gart_table_vram_pin [radeon] (473.376 ms @ 403.617537)

The function from `drivers/gpu/drm/radeon/radeon_gart.c` looks like below, and the problem is the for loop in the end.

/**
 * radeon_gart_table_vram_pin - pin gart page table in vram
 *
 * @rdev: radeon_device pointer
 *
 * Pin the GART page table in vram so it will not be moved
 * by the memory manager (pcie r4xx, r5xx+).  These asics require the
 * gart table to be in video memory.
 * Returns 0 for success, error for failure.
 */
int radeon_gart_table_vram_pin(struct radeon_device *rdev)
{
        uint64_t gpu_addr;
        int r;

        r = radeon_bo_reserve(rdev->gart.robj, false);
        if (unlikely(r != 0))
                return r;
        r = radeon_bo_pin(rdev->gart.robj,
                                RADEON_GEM_DOMAIN_VRAM, &gpu_addr);
        if (r) {
                radeon_bo_unreserve(rdev->gart.robj);
                return r;
        }
        r = radeon_bo_kmap(rdev->gart.robj, &rdev->gart.ptr);
        if (r)  
                radeon_bo_unpin(rdev->gart.robj);
        radeon_bo_unreserve(rdev->gart.robj);
        rdev->gart.table_addr = gpu_addr;

        if (!r) {
                int i;

                /* We might have dropped some GART table updates while it wasn't
                 * mapped, restore all entries
                 */
                for (i = 0; i < rdev->gart.num_gpu_pages; i++)
                        radeon_gart_set_page(rdev, i, rdev->gart.pages_entry[i]);
                mb();
                radeon_gart_tlb_flush(rdev);
        }

        return r;
}

Is there a way to get rid of the for loop? Some memset equivalent?

[1]: https://github.com/01org/pm-graph
Comment 1 Christian König 2018-07-22 18:00:38 UTC
(In reply to Paul Menzel from comment #0)
> Is there a way to get rid of the for loop? Some memset equivalent?

At least not easily. 

The hardware is not initialized yet, so we have to writeback the GART table using the CPU.

Depending on the size of the table (which in turn depends on installed system memory) it can easily take even more than a second to do this.
Comment 2 Paul Menzel 2018-07-24 09:26:58 UTC
Just as a follow-up, the long time seems to have been caused by ftrace.

With the patch below on top of drm-tip from this morning, it only takes 7 ms. 

diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c
index 1cef155cc933..f79a2f8b8d8e 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -175,10 +175,13 @@ int radeon_gart_table_vram_pin(struct radeon_device *rdev)
                /* We might have dropped some GART table updates while it wasn't
                 * mapped, restore all entries
                 */
+               DRM_INFO("GART: Restore entries: num cpu pages %u, num gpu pages %u\n",
+                               rdev->gart.num_cpu_pages, rdev->gart.num_gpu_pages);
                for (i = 0; i < rdev->gart.num_gpu_pages; i++)
                        radeon_gart_set_page(rdev, i, rdev->gart.pages_entry[i]);
                mb();
                radeon_gart_tlb_flush(rdev);
+               DRM_INFO("GART: Done restoring entries\n");
        }
 
        return r;
Comment 3 Christian König 2018-07-24 12:54:53 UTC
(In reply to Paul Menzel from comment #2)
> Just as a follow-up, the long time seems to have been caused by ftrace.
> 
> With the patch below on top of drm-tip from this morning, it only takes 7
> ms. 

Interesting. I wasn't aware that ftrace could have such negative side effects.

Anyway if somebody has a radeon in a box with halve a TB system memory he can probably live with a few ms resume time :)

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.