Bug 107277 - Raven: pci_pm_suspend takes over 1 second
Summary: Raven: pci_pm_suspend takes over 1 second
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu (show other bugs)
Version: DRI git
Hardware: Other All
: medium enhancement
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-18 14:47 UTC by Paul Menzel
Modified: 2018-07-31 16:43 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
Linux 4.18-rc5+ messages (63.13 KB, text/plain)
2018-07-18 14:47 UTC, Paul Menzel
no flags Details
HTML output of `sudo ./sleepgraph.py -config config/suspend-callgraph.cfg -maxdepth=5` (3.99 MB, text/html)
2018-07-18 14:50 UTC, Paul Menzel
no flags Details
quick hack for S3 (1.32 KB, patch)
2018-07-18 15:44 UTC, Alex Deucher
no flags Details | Splinter Review
Trimmed ftrace output (608.31 KB, text/plain)
2018-07-19 15:56 UTC, Paul Menzel
no flags Details
drm/amdgpu: Fix RLC safe mode test in gfx_v9_0_enter_rlc_safe_mode (1.15 KB, patch)
2018-07-19 16:35 UTC, Michel Dänzer
no flags Details | Splinter Review
HTML output of `sudo ./sleepgraph.py -config config/suspend-callgraph.cfg` with filter for amdgpu (157.10 KB, text/html)
2018-07-20 15:28 UTC, Paul Menzel
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Paul Menzel 2018-07-18 14:47:47 UTC
Created attachment 140698 [details]
Linux 4.18-rc5+ messages

Profiling the suspend and resume time [1] on a MSI B350M MORTAR (MS-7A37) with AMD Ryzen 3 2200G with Radeon Vega Graphics with Linux 4.18-rc5+, `pci_pm_suspend()` takes over one second, which is too long.

[   74.694600] amdgpu 0000:38:00.0: calling pci_pm_suspend+0x0/0x120 @ 1664, parent: 0000:00:08.1
[   74.700292] sd 8:0:0:0: scsi_bus_suspend+0x0/0x20 [scsi_mod] returned 0 after 25514 usecs
[   74.700326] scsi target8:0:0: calling scsi_bus_suspend+0x0/0x20 [scsi_mod] @ 175, parent: host8
[   74.700346] scsi target8:0:0: scsi_bus_suspend+0x0/0x20 [scsi_mod] returned 0 after 6 usecs
[   74.700382] scsi host8: calling scsi_bus_suspend+0x0/0x20 [scsi_mod] @ 181, parent: ata9
[   74.700404] scsi host8: scsi_bus_suspend+0x0/0x20 [scsi_mod] returned 0 after 6 usecs
[   74.700439]  ata9: calling ata_port_pm_suspend+0x0/0x40 [libata] @ 173, parent: 0000:39:00.0
[   74.700522]  ata9: ata_port_pm_suspend+0x0/0x40 [libata] returned 0 after 67 usecs
[   74.700544] ahci 0000:39:00.0: calling pci_pm_suspend+0x0/0x120 @ 7, parent: 0000:00:08.2
[   74.700565] ahci 0000:39:00.0: pci_pm_suspend+0x0/0x120 returned 0 after 14 usecs
[   74.700588] pcieport 0000:00:08.2: calling pci_pm_suspend+0x0/0x120 @ 1666, parent: pci0000:00
[   74.700620] pcieport 0000:00:08.2: pci_pm_suspend+0x0/0x120 returned 0 after 25 usecs
[   74.712063] usb 1-10: usb_dev_suspend+0x0/0x10 [usbcore] returned 0 after 36874 usecs
[   74.712092] usb usb1: calling usb_dev_suspend+0x0/0x10 [usbcore] @ 174, parent: 0000:15:00.0
[   74.712156] usb usb1: usb_dev_suspend+0x0/0x10 [usbcore] returned 0 after 51 usecs
[   74.712173] xhci_hcd 0000:15:00.0: calling pci_pm_suspend+0x0/0x120 @ 1669, parent: 0000:00:01.2
[   74.714213] xhci_hcd 0000:15:00.0: pci_pm_suspend+0x0/0x120 returned 0 after 1985 usecs
[   74.714233] pcieport 0000:00:01.2: calling pci_pm_suspend+0x0/0x120 @ 1671, parent: pci0000:00
[   74.714248] pcieport 0000:00:01.2: pci_pm_suspend+0x0/0x120 returned 0 after 11 usecs
[   75.670011] amdgpu 0000:38:00.0: 000000007bb4b3e1 unpin not necessary
[   75.736060] amdgpu 0000:38:00.0: pci_pm_suspend+0x0/0x120 returned 0 after 1017036 usecs

[1]: https://github.com/01org/pm-graph/
Comment 1 Paul Menzel 2018-07-18 14:50:48 UTC
Created attachment 140699 [details]
HTML output of `sudo ./sleepgraph.py -config config/suspend-callgraph.cfg -maxdepth=5`

According to the trace, most of the time is spent in the functions below.

> amdgpu_bo_evict_vram [amdgpu] (306.331 ms @ 74.694620)
> amdgpu_fence_driver_suspend [amdgpu] (0.023 ms @ 75.000953)
> amdgpu_device_ip_suspend [amdgpu] (694.390 ms @ 75.000977)
> amdgpu_bo_evict_vram [amdgpu] (24.217 ms @ 75.695369)
Comment 2 Christian König 2018-07-18 15:05:31 UTC
There isn't much you can do here:
> amdgpu_bo_evict_vram [amdgpu] (306.331 ms @ 74.694620)

This is evacuating the content of VRAM to RAM/disk to make sure we don't lose screen content while suspended.

> amdgpu_fence_driver_suspend [amdgpu] (0.023 ms @ 75.000953)

Waiting for the evacuation to be completed.

> amdgpu_device_ip_suspend [amdgpu] (694.390 ms @ 75.000977)

This is hardware teardown and rather interesting and the only point we could actually do something. Can you figure out what takes so long here?

> amdgpu_bo_evict_vram [amdgpu] (24.217 ms @ 75.695369)

Again evacuating VRAM which was locked before because the hardware was still using it.
Comment 3 Paul Menzel 2018-07-18 15:19:21 UTC
(In reply to Christian König from comment #2)
> There isn't much you can do here:
> > amdgpu_bo_evict_vram [amdgpu] (306.331 ms @ 74.694620)
> 
> This is evacuating the content of VRAM to RAM/disk to make sure we don't
> lose screen content while suspended.

I do not understand that. The integrated graphics device uses the system RAM  as VRAM doesn’t it? So why does it have to be evicted at all? Also, I believe it’s 1 GB of VRAM. That means the speed would be 3 GB/s, where it should be much higher with DDR4 shouldn’t it?

> > amdgpu_fence_driver_suspend [amdgpu] (0.023 ms @ 75.000953)
> 
> Waiting for the evacuation to be completed.
> 
> > amdgpu_device_ip_suspend [amdgpu] (694.390 ms @ 75.000977)
> 
> This is hardware teardown and rather interesting and the only point we could
> actually do something. Can you figure out what takes so long here?

I’ll try to figure that out.

> > amdgpu_bo_evict_vram [amdgpu] (24.217 ms @ 75.695369)
> 
> Again evacuating VRAM which was locked before because the hardware was still
> using it.
Comment 4 Alex Deucher 2018-07-18 15:37:10 UTC
(In reply to Paul Menzel from comment #3)
> (In reply to Christian König from comment #2)
> > There isn't much you can do here:
> > > amdgpu_bo_evict_vram [amdgpu] (306.331 ms @ 74.694620)
> > 
> > This is evacuating the content of VRAM to RAM/disk to make sure we don't
> > lose screen content while suspended.
> 
> I do not understand that. The integrated graphics device uses the system RAM
> as VRAM doesn’t it? So why does it have to be evicted at all? Also, I
> believe it’s 1 GB of VRAM. That means the speed would be 3 GB/s, where it
> should be much higher with DDR4 shouldn’t it?

It's not a problem with S3 (suspend to ram), but it is for S4 (suspend to disk) because power will be lost and the vram carve out area is not managed by the OS.  Currently S3 and S4 share the same code paths, so they would need to be reworked to handle S3 and S4 differently.  As for the time it takes, depending on how much system memory is in use, some stuff may have to be swapped out to disk to make room for all of the buffers in vram, so a lot of shuffling may need to take place.
Comment 5 Alex Deucher 2018-07-18 15:44:04 UTC
Created attachment 140702 [details] [review]
quick hack for S3

Here's a quick hack for S3, but it will need more work to not break S4 support as well.
Comment 6 Paul Menzel 2018-07-18 17:17:21 UTC
(In reply to Paul Menzel from comment #3)
> (In reply to Christian König from comment #2)

[…]

> > > amdgpu_device_ip_suspend [amdgpu] (694.390 ms @ 75.000977)
> > 
> > This is hardware teardown and rather interesting and the only point we could
> > actually do something. Can you figure out what takes so long here?
> 
> I’ll try to figure that out.

I increased the maximum depth to 10, and according to the trace the loop in `gfx_v9_0_enter_rlc_safe_mode()` is the culprit. Also, in all the function is called three times.

static void gfx_v9_0_enter_rlc_safe_mode(struct amdgpu_device *adev)
{
        uint32_t rlc_setting, data;
        unsigned i;

        if (adev->gfx.rlc.in_safe_mode)
                return;

        /* if RLC is not enabled, do nothing */
        rlc_setting = RREG32_SOC15(GC, 0, mmRLC_CNTL);
        if (!(rlc_setting & RLC_CNTL__RLC_ENABLE_F32_MASK))
                return;

        if (adev->cg_flags &
            (AMD_CG_SUPPORT_GFX_CGCG | AMD_CG_SUPPORT_GFX_MGCG |
             AMD_CG_SUPPORT_GFX_3D_CGCG)) {
                data = RLC_SAFE_MODE__CMD_MASK;
                data |= (1 << RLC_SAFE_MODE__MESSAGE__SHIFT);
                WREG32_SOC15(GC, 0, mmRLC_SAFE_MODE, data);

                /* wait for RLC_SAFE_MODE */
                for (i = 0; i < adev->usec_timeout; i++) {
                        if (!REG_GET_FIELD(SOC15_REG_OFFSET(GC, 0, mmRLC_SAFE_MODE), RLC_SAFE_MODE, CMD))
                                break;
                        udelay(1);
                }
                adev->gfx.rlc.in_safe_mode = true;
        }
}
Comment 7 Michel Dänzer 2018-07-19 15:15:19 UTC
(In reply to Paul Menzel from comment #6)
> I increased the maximum depth to 10,

Can you attach the resulting HTML output, or is it too large?

> and according to the trace the loop in `gfx_v9_0_enter_rlc_safe_mode()` is the culprit. 

Can you find out if the loop times out or not?
Comment 8 Paul Menzel 2018-07-19 15:46:19 UTC
(In reply to Michel Dänzer from comment #7)
> (In reply to Paul Menzel from comment #6)
> > I increased the maximum depth to 10,
> 
> Can you attach the resulting HTML output, or is it too large?

With a max-depth of 10 it’s too large, and even with 16 GB memory the browser takes ages to render it. I can try to trim the trace file. It should be possible to filter stuff, but I haven’t figured out how to pass the device.

Though, you can easily try it yourself.

```
$ git clone https://github.com/01org/pm-graph
$ cd pm-graph
$ vim config/suspend-callgraph.cfg # change maxdepth to 10
$ sudo ./sleepgraph.py -config config/suspend-callgraph.py
# wait
$ ls -ltr # shows you the output directory
```

It normally easier to look at the trace file directly.

> > and according to the trace the loop in `gfx_v9_0_enter_rlc_safe_mode()` is the culprit. 
> 
> Can you find out if the loop times out or not?

I’ll try.
Comment 9 Paul Menzel 2018-07-19 15:56:44 UTC
Created attachment 140713 [details]
Trimmed ftrace output

Here is the trimmed ftrace output.
Comment 10 Michel Dänzer 2018-07-19 16:35:39 UTC
Created attachment 140714 [details] [review]
drm/amdgpu: Fix RLC safe mode test in gfx_v9_0_enter_rlc_safe_mode

Does this patch help?
Comment 11 Paul Menzel 2018-07-20 15:28:38 UTC
Created attachment 140734 [details]
HTML output of `sudo ./sleepgraph.py -config config/suspend-callgraph.cfg` with filter for amdgpu

(In reply to Michel Dänzer from comment #10)
> Created attachment 140714 [details] [review] [review]
> drm/amdgpu: Fix RLC safe mode test in gfx_v9_0_enter_rlc_safe_mode
> 
> Does this patch help?

It looks like it.

Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>

> amdgpu @ 0000:38:00.0 {amdgpu} async_device (Total Suspend: 140.661 ms Total Resume: 1339.330 ms)

(Filtering with amdgpu works now, so I upload the HTML file.)

Now the remaining hot spots are listed below.

amdgpu_device_ip_suspend [amdgpu] (102.888 ms @ 534.324902)
    → … amdgpu_vcn_suspend [amdgpu] (23.804 ms @ 534.325065)
    → … drm_atomic_helper_suspend [drm_kms_helper] (53.051 ms @ 534.349412)
        → __drm_atomic_helper_disable_all.constprop.28 [drm_kms_helper] (52.919 ms @ 534.349543)
    → … psp_v10_0_ring_stop [amdgpu] (19.894 ms @ 534.406785)
[…]
amdgpu_bo_evict_vram [amdgpu] (17.507 ms @ 534.427790) → fixed by *quick hack for S3*
[…]
pci_set_power_state (20.071 ms @ 534.445402)
Comment 12 Michel Dänzer 2018-07-20 16:30:49 UTC
(In reply to Paul Menzel from comment #11)
> > Does this patch help?
> 
> It looks like it.
> 
> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>

Cool, thanks for testing. Looks like that cut down suspend time by ~600 ms. Would that and a solution for VRAM eviction (reducing suspend time to 140 ms) be enough to resolve this report?
Comment 13 Paul Menzel 2018-07-23 09:36:58 UTC
(In reply to Michel Dänzer from comment #12)
> (In reply to Paul Menzel from comment #11)
> > > Does this patch help?
> > 
> > It looks like it.
> > 
> > Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
> 
> Cool, thanks for testing. Looks like that cut down suspend time by ~600 ms.
> Would that and a solution for VRAM eviction (reducing suspend time to 140
> ms) be enough to resolve this report?

Yes, that would be enough. For the remaining optimization possibilities separate issues/tickets should be created.

I can also create a new issue/ticket for the VRAM eviction issue, if you like. Then this issue/ticket can be closed directly after your commit is submitted.
Comment 14 Paul Menzel 2018-07-31 16:43:13 UTC
For the record, bug #100941 [1] is the same for radeon (like Fusion devices).

[1]: https://bugs.freedesktop.org/show_bug.cgi?id=100941


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.