107328 – radeon_gart_table_vram_pin takes 473 ms during ACPI S3 resume

Bug 107328 - radeon_gart_table_vram_pin takes 473 ms during ACPI S3 resume

Summary: radeon_gart_table_vram_pin takes 473 ms during ACPI S3 resume

Status:	CLOSED INVALID

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Radeon (show other bugs)
Version:	DRI git
Hardware:	Other All

Importance:	medium normal
Assignee:	Default DRI bug account
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2018-07-22 07:45 UTC by Paul Menzel
Modified:	2018-07-24 12:54 UTC (History)
CC List:	0 users

See Also:
i915 platform:
i915 features:

Attachments
Screenshot from HTML output of `sleepgraph.py` with `devicefilter: radeon` and `maxdepth: 20` (512.10 KB, application/x-7z-compressed) 2018-07-22 07:45 UTC, Paul Menzel	no flags	Details
View All

Description Paul Menzel 2018-07-22 07:45:52 UTC

Created attachment 140760 [details]
Screenshot from HTML output of `sleepgraph.py` with `devicefilter: radeon` and `maxdepth: 20`

On a ASRock E350M1 with Linux 4.18-rc5+, profiling ACPI S3 suspend and resume time with `sleepgraph.py` from pm-graph [1], the radeon module over half a second to resume, which is also visible adding `initcall_debug` to the command line.

> radeon @ 0000:00:01.0 {radeon} async_device (Total Suspend: 36.541 ms Total Resume: 687.797 ms)

evergreen_startup [radeon] (562.983 ms @ 403.615437)
→ radeon_gart_table_vram_pin [radeon] (473.376 ms @ 403.617537)

The function from `drivers/gpu/drm/radeon/radeon_gart.c` looks like below, and the problem is the for loop in the end.

/**
 * radeon_gart_table_vram_pin - pin gart page table in vram
 *
 * @rdev: radeon_device pointer
 *
 * Pin the GART page table in vram so it will not be moved
 * by the memory manager (pcie r4xx, r5xx+).  These asics require the
 * gart table to be in video memory.
 * Returns 0 for success, error for failure.
 */
int radeon_gart_table_vram_pin(struct radeon_device *rdev)
{
        uint64_t gpu_addr;
        int r;

        r = radeon_bo_reserve(rdev->gart.robj, false);
        if (unlikely(r != 0))
                return r;
        r = radeon_bo_pin(rdev->gart.robj,
                                RADEON_GEM_DOMAIN_VRAM, &gpu_addr);
        if (r) {
                radeon_bo_unreserve(rdev->gart.robj);
                return r;
        }
        r = radeon_bo_kmap(rdev->gart.robj, &rdev->gart.ptr);
        if (r)  
                radeon_bo_unpin(rdev->gart.robj);
        radeon_bo_unreserve(rdev->gart.robj);
        rdev->gart.table_addr = gpu_addr;

        if (!r) {
                int i;

                /* We might have dropped some GART table updates while it wasn't
                 * mapped, restore all entries
                 */
                for (i = 0; i < rdev->gart.num_gpu_pages; i++)
                        radeon_gart_set_page(rdev, i, rdev->gart.pages_entry[i]);
                mb();
                radeon_gart_tlb_flush(rdev);
        }

        return r;
}

Is there a way to get rid of the for loop? Some memset equivalent?

[1]: https://github.com/01org/pm-graph

Comment 1 Christian König 2018-07-22 18:00:38 UTC

(In reply to Paul Menzel from comment #0)
> Is there a way to get rid of the for loop? Some memset equivalent?

At least not easily. 

The hardware is not initialized yet, so we have to writeback the GART table using the CPU.

Depending on the size of the table (which in turn depends on installed system memory) it can easily take even more than a second to do this.

Comment 2 Paul Menzel 2018-07-24 09:26:58 UTC

Just as a follow-up, the long time seems to have been caused by ftrace.

With the patch below on top of drm-tip from this morning, it only takes 7 ms. 

diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c
index 1cef155cc933..f79a2f8b8d8e 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -175,10 +175,13 @@ int radeon_gart_table_vram_pin(struct radeon_device *rdev)
                /* We might have dropped some GART table updates while it wasn't
                 * mapped, restore all entries
                 */
+               DRM_INFO("GART: Restore entries: num cpu pages %u, num gpu pages %u\n",
+                               rdev->gart.num_cpu_pages, rdev->gart.num_gpu_pages);
                for (i = 0; i < rdev->gart.num_gpu_pages; i++)
                        radeon_gart_set_page(rdev, i, rdev->gart.pages_entry[i]);
                mb();
                radeon_gart_tlb_flush(rdev);
+               DRM_INFO("GART: Done restoring entries\n");
        }
 
        return r;

Comment 3 Christian König 2018-07-24 12:54:53 UTC

(In reply to Paul Menzel from comment #2)
> Just as a follow-up, the long time seems to have been caused by ftrace.
> 
> With the patch below on top of drm-tip from this morning, it only takes 7
> ms. 

Interesting. I wasn't aware that ftrace could have such negative side effects.

Anyway if somebody has a radeon in a box with halve a TB system memory he can probably live with a few ms resume time :)

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.