| Summary: | Raven: pci_pm_suspend takes over 1 second | | |
|---|---|---|---|
| Product: | DRI | Reporter: | Paul Menzel <pmenzel+bugs.freedesktop.org> |
| Component: | DRM/AMDgpu | Assignee: | Default DRI bug account <dri-devel> |
| Status: | RESOLVED MOVED | QA Contact: | |
| Severity: | enhancement | | |
| Priority: | medium | CC: | pmenzel+bugs.freedesktop.org |
| Version: | DRI git | | |
| Hardware: | Other | | |
| OS: | All | | |
Description
Paul Menzel
2018-07-18 14:47:47 UTC
Created attachment 140699
HTML output of `sudo ./sleepgraph.py -config config/suspend-callgraph.cfg -maxdepth=5`

According to the trace, most of the time is spent in the functions below.

> amdgpu_bo_evict_vram [amdgpu] (306.331 ms @ 74.694620)
> amdgpu_fence_driver_suspend [amdgpu] (0.023 ms @ 75.000953)
> amdgpu_device_ip_suspend [amdgpu] (694.390 ms @ 75.000977)
> amdgpu_bo_evict_vram [amdgpu] (24.217 ms @ 75.695369)

There isn't much you can do here:

> amdgpu_bo_evict_vram [amdgpu] (306.331 ms @ 74.694620)

This is evacuating the content of VRAM to RAM/disk to make sure we don't lose screen content while suspended.

> amdgpu_fence_driver_suspend [amdgpu] (0.023 ms @ 75.000953)

Waiting for the evacuation to be completed.

> amdgpu_device_ip_suspend [amdgpu] (694.390 ms @ 75.000977)

This is hardware teardown and rather interesting and the only point we could actually do something. Can you figure out what takes so long here?

> amdgpu_bo_evict_vram [amdgpu] (24.217 ms @ 75.695369)

Again evacuating VRAM which was locked before because the hardware was still using it.

(In reply to Christian König from comment #2)
> There isn't much you can do here:
> > amdgpu_bo_evict_vram [amdgpu] (306.331 ms @ 74.694620)
>
> This is evacuating the content of VRAM to RAM/disk to make sure we don't lose screen content while suspended.

I do not understand that. The integrated graphics device uses the system RAM as VRAM doesn’t it? So why does it have to be evicted at all? Also, I believe it’s 1 GB of VRAM. That means the speed would be 3 GB/s, where it should be much higher with DDR4 shouldn’t it?

> > amdgpu_fence_driver_suspend [amdgpu] (0.023 ms @ 75.000953)
>
> Waiting for the evacuation to be completed.
>
> > amdgpu_device_ip_suspend [amdgpu] (694.390 ms @ 75.000977)
>
> This is hardware teardown and rather interesting and the only point we could actually do something. Can you figure out what takes so long here?

I’ll try to figure that out.
> > amdgpu_bo_evict_vram [amdgpu] (24.217 ms @ 75.695369)
>
> Again evacuating VRAM which was locked before because the hardware was still using it.

(In reply to Paul Menzel from comment #3)
> (In reply to Christian König from comment #2)
> > There isn't much you can do here:
> > > amdgpu_bo_evict_vram [amdgpu] (306.331 ms @ 74.694620)
> >
> > This is evacuating the content of VRAM to RAM/disk to make sure we don't lose screen content while suspended.
>
> I do not understand that. The integrated graphics device uses the system RAM as VRAM doesn’t it? So why does it have to be evicted at all? Also, I believe it’s 1 GB of VRAM. That means the speed would be 3 GB/s, where it should be much higher with DDR4 shouldn’t it?

It's not a problem with S3 (suspend to RAM), but it is for S4 (suspend to disk) because power will be lost and the VRAM carve-out area is not managed by the OS. Currently S3 and S4 share the same code paths, so they would need to be reworked to handle S3 and S4 differently. As for the time it takes, depending on how much system memory is in use, some stuff may have to be swapped out to disk to make room for all of the buffers in VRAM, so a lot of shuffling may need to take place.

Created attachment 140702
quick hack for S3

Here's a quick hack for S3, but it will need more work to not break S4 support as well.

(In reply to Paul Menzel from comment #3)
> (In reply to Christian König from comment #2)
[…]
> > > amdgpu_device_ip_suspend [amdgpu] (694.390 ms @ 75.000977)
> >
> > This is hardware teardown and rather interesting and the only point we could actually do something. Can you figure out what takes so long here?
>
> I’ll try to figure that out.

I increased the maximum depth to 10, and according to the trace the loop in `gfx_v9_0_enter_rlc_safe_mode()` is the culprit. Also, in total, the function is called three times.
    static void gfx_v9_0_enter_rlc_safe_mode(struct amdgpu_device *adev)
    {
        uint32_t rlc_setting, data;
        unsigned i;

        if (adev->gfx.rlc.in_safe_mode)
            return;

        /* if RLC is not enabled, do nothing */
        rlc_setting = RREG32_SOC15(GC, 0, mmRLC_CNTL);
        if (!(rlc_setting & RLC_CNTL__RLC_ENABLE_F32_MASK))
            return;

        if (adev->cg_flags & (AMD_CG_SUPPORT_GFX_CGCG | AMD_CG_SUPPORT_GFX_MGCG |
                              AMD_CG_SUPPORT_GFX_3D_CGCG)) {
            data = RLC_SAFE_MODE__CMD_MASK;
            data |= (1 << RLC_SAFE_MODE__MESSAGE__SHIFT);
            WREG32_SOC15(GC, 0, mmRLC_SAFE_MODE, data);

            /* wait for RLC_SAFE_MODE */
            for (i = 0; i < adev->usec_timeout; i++) {
                if (!REG_GET_FIELD(SOC15_REG_OFFSET(GC, 0, mmRLC_SAFE_MODE), RLC_SAFE_MODE, CMD))
                    break;
                udelay(1);
            }
            adev->gfx.rlc.in_safe_mode = true;
        }
    }

(In reply to Paul Menzel from comment #6)
> I increased the maximum depth to 10,

Can you attach the resulting HTML output, or is it too large?

> and according to the trace the loop in `gfx_v9_0_enter_rlc_safe_mode()` is the culprit.

Can you find out if the loop times out or not?

(In reply to Michel Dänzer from comment #7)
> (In reply to Paul Menzel from comment #6)
> > I increased the maximum depth to 10,
>
> Can you attach the resulting HTML output, or is it too large?

With a max-depth of 10 it’s too large, and even with 16 GB memory the browser takes ages to render it. I can try to trim the trace file. It should be possible to filter stuff, but I haven’t figured out how to pass the device. Though, you can easily try it yourself.

```
$ git clone https://github.com/01org/pm-graph
$ cd pm-graph
$ vim config/suspend-callgraph.cfg # change maxdepth to 10
$ sudo ./sleepgraph.py -config config/suspend-callgraph.cfg # wait
$ ls -ltr # shows you the output directory
```

It is normally easier to look at the trace file directly.

> > and according to the trace the loop in `gfx_v9_0_enter_rlc_safe_mode()` is the culprit.
>
> Can you find out if the loop times out or not?

I’ll try.

Created attachment 140713
Trimmed ftrace output
Here is the trimmed ftrace output.
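The wait loop quoted above applies `REG_GET_FIELD` to `SOC15_REG_OFFSET(...)`, which is a register offset rather than the register's contents, so the `CMD` test is evaluated on a constant. The following is a minimal user-space sketch of that effect, assuming the intended behavior is to test the value read back from the register. Everything in it (`fake_hw`, `read_reg`, `CMD_MASK`, the offsets, and the timeout used in the note below) is a hypothetical stand-in and not the amdgpu definitions.

```c
#include <stdint.h>

/* Hypothetical bit position for the CMD field; not the real amdgpu mask. */
#define CMD_MASK 0x1u

/* Simulated hardware: CMD clears itself after a number of register reads. */
struct fake_hw {
    uint32_t safe_mode;   /* simulated register contents */
    unsigned reads_left;  /* reads remaining until CMD is cleared */
};

/* Stand-in for a register read; the read that exhausts reads_left sees CMD cleared. */
static uint32_t read_reg(struct fake_hw *hw)
{
    if (hw->reads_left > 0 && --hw->reads_left == 0)
        hw->safe_mode &= ~CMD_MASK;
    return hw->safe_mode;
}

/* Loop that tests a constant offset: the result never depends on the hardware. */
static unsigned wait_testing_offset(uint32_t reg_offset, unsigned timeout)
{
    unsigned i;

    for (i = 0; i < timeout; i++) {
        if (!(reg_offset & CMD_MASK))
            break;
        /* the driver calls udelay(1) here */
    }
    return i; /* number of 1 us delays that would have been spent */
}

/* Loop that tests the value read from the register on every iteration. */
static unsigned wait_testing_value(struct fake_hw *hw, unsigned timeout)
{
    unsigned i;

    for (i = 0; i < timeout; i++) {
        if (!(read_reg(hw) & CMD_MASK))
            break;
        /* the driver calls udelay(1) here */
    }
    return i;
}
```

In this sketch, an offset whose low bit happens to be set makes `wait_testing_offset` run for the full timeout on every call, while an offset with the bit clear makes it return without waiting at all. `wait_testing_value` stops as soon as the simulated hardware clears `CMD`.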
Created attachment 140714
drm/amdgpu: Fix RLC safe mode test in gfx_v9_0_enter_rlc_safe_mode

Does this patch help?

Created attachment 140734
HTML output of `sudo ./sleepgraph.py -config config/suspend-callgraph.cfg` with filter for amdgpu

(In reply to Michel Dänzer from comment #10)
> Created attachment 140714
> drm/amdgpu: Fix RLC safe mode test in gfx_v9_0_enter_rlc_safe_mode
>
> Does this patch help?

It looks like it.

Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>

> amdgpu @ 0000:38:00.0 {amdgpu} async_device (Total Suspend: 140.661 ms Total Resume: 1339.330 ms)

(Filtering with amdgpu works now, so I upload the HTML file.)

Now the remaining hot spots are listed below.

amdgpu_device_ip_suspend [amdgpu] (102.888 ms @ 534.324902)
→ … amdgpu_vcn_suspend [amdgpu] (23.804 ms @ 534.325065)
→ … drm_atomic_helper_suspend [drm_kms_helper] (53.051 ms @ 534.349412) → __drm_atomic_helper_disable_all.constprop.28 [drm_kms_helper] (52.919 ms @ 534.349543)
→ … psp_v10_0_ring_stop [amdgpu] (19.894 ms @ 534.406785)
[…]
amdgpu_bo_evict_vram [amdgpu] (17.507 ms @ 534.427790) → fixed by *quick hack for S3*
[…]
pci_set_power_state (20.071 ms @ 534.445402)

(In reply to Paul Menzel from comment #11)
> > Does this patch help?
>
> It looks like it.
>
> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>

Cool, thanks for testing. Looks like that cut down suspend time by ~600 ms. Would that and a solution for VRAM eviction (reducing suspend time to 140 ms) be enough to resolve this report?

(In reply to Michel Dänzer from comment #12)
> (In reply to Paul Menzel from comment #11)
> > > Does this patch help?
> >
> > It looks like it.
> >
> > Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
>
> Cool, thanks for testing. Looks like that cut down suspend time by ~600 ms. Would that and a solution for VRAM eviction (reducing suspend time to 140 ms) be enough to resolve this report?

Yes, that would be enough.
For the remaining optimization possibilities separate issues/tickets should be created. I can also create a new issue/ticket for the VRAM eviction issue, if you like. Then this issue/ticket can be closed directly after your commit is submitted.

For the record, bug #100941 [1] is the same for radeon (like Fusion devices).

[1]: https://bugs.freedesktop.org/show_bug.cgi?id=100941

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/454.
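As a sanity check on the eviction figures discussed in this report, the sketch below computes the average throughput implied by moving about 1 GB of VRAM in 306.331 ms. The 1 GB size is the reporter's estimate, 1 GB is taken as 10^9 bytes, and the helper name `throughput_gb_per_s` is made up for illustration.

```c
/* Average throughput in GB/s (1 GB = 1e9 bytes) for moving `bytes` in `ms` milliseconds. */
static double throughput_gb_per_s(double bytes, double ms)
{
    double seconds = ms / 1e3;   /* milliseconds to seconds */

    return (bytes / 1e9) / seconds;
}
```

With the figures from the first trace, `throughput_gb_per_s(1e9, 306.331)` evaluates to roughly 3.26, which is consistent with the 3 GB/s estimate given in comment #3.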