100761 – `radeon_ib_ring_tests` takes 0.5 s on resume

Bug 100761 - `radeon_ib_ring_tests` takes 0.5 s on resume

Summary: `radeon_ib_ring_tests` takes 0.5 s on resume

Status:	RESOLVED NOTABUG

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Radeon (show other bugs)
Version:	DRI git
Hardware:	Other All

Importance:	medium normal
Assignee:	Default DRI bug account
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2017-04-23 08:45 UTC by Paul Menzel
Modified:	2017-05-05 06:54 UTC (History)
CC List:	1 user (show)

See Also:
i915 platform:
i915 features:

Attachments
HTML page generated by pm-graph (`sudo ./analyze_suspend.py -config config/suspend-callgraph.cfg -filter radeon`) (40.17 KB, text/html) 2017-04-23 08:45 UTC, Paul Menzel	no flags	Details
ftrace log (3.12 MB, text/plain) 2017-04-23 08:56 UTC, Paul Menzel	no flags	Details
View All

Description Paul Menzel 2017-04-23 08:45:54 UTC

Created attachment 130987 [details]
HTML page generated by pm-graph (`sudo ./analyze_suspend.py -config config/suspend-callgraph.cfg -filter radeon`)

The ASRock E350M1 has a Radeon HD 6310.

```
$ sudo lspci -s 0:01.0 -nn -v
00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Wrestler [Radeon HD 6310] [1002:9802] (prog-if 00 [VGA controller])
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Wrestler [Radeon HD 6310] [1002:9802]
	Flags: bus master, fast devsel, latency 0, IRQ 28
	Memory at e0000000 (32-bit, prefetchable) [size=256M]
	I/O ports at 2000 [size=256]
	Memory at f0100000 (32-bit, non-prefetchable) [size=256K]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [50] Power Management version 3
	Capabilities: [58] Express Root Complex Integrated Endpoint, MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Kernel driver in use: radeon
	Kernel modules: radeon
```

With Debian Sid/unstable with Linux 4.9.18 suspend and resume times are benchmarked with pm-graph [1], and the command below.

```
sudo ./analyze_suspend.py -config config/suspend-callgraph.cfg -filter radeon
```

It turns out that with 700 ms the radeon module takes the majority of the time during resume.

```
[ 1182.585217] rtc_cmos 00:00: System wakeup disabled by ACPI
[ 1182.589380] [drm] Found smc ucode version: 0x00010601
[ 1182.595454] sd 0:0:0:0: [sda] Starting disk
[ 1182.608583] [drm] PCIE GART of 1024M enabled (table at 0x0000000000162000).
[ 1182.608873] radeon 0000:00:01.0: WB enabled
[ 1182.608880] radeon 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000018000c00 and cpu addr 0xffa80c00
[ 1182.608884] radeon 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000018000c0c and cpu addr 0xffa80c0c
[ 1182.609743] radeon 0000:00:01.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xf8e32118
[ 1182.611934] r8169 0000:03:00.0 enp3s0: link down
[ 1182.626087] [drm] ring test on 0 succeeded in 1 usecs
[ 1182.626096] [drm] ring test on 3 succeeded in 3 usecs
[ 1182.671894] [drm] ring test on 5 succeeded in 1 usecs
[ 1182.691820] [drm] UVD initialized successfully.
[ 1182.691935] [drm] ib test on ring 0 succeeded in 0 usecs
[ 1182.691977] [drm] ib test on ring 3 succeeded in 0 usecs
[ 1182.909916] ata5: SATA link down (SStatus 0 SControl 300)
[ 1182.909975] ata6: SATA link down (SStatus 0 SControl 300)
[ 1182.910021] ata3: SATA link down (SStatus 0 SControl 300)
[ 1182.910095] ata2: SATA link down (SStatus 0 SControl 300)
[ 1182.910133] ata4: SATA link down (SStatus 0 SControl 300)
[ 1183.070885] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1183.073953] ata1.00: configured for UDMA/133
[ 1183.219034] [drm] ib test on ring 5 succeeded
[ 1183.290348] PM: resume of devices complete after 705.896 msecs
```

Looking at the generated graph 540 ms is spent in `radeon_ib_ring_tests`, which contributes most of the time.

It’d be awesome to improve these times.


[1] https://github.com/01org/pm-graph

Comment 1 Paul Menzel 2017-04-23 08:56:27 UTC

Created attachment 130988 [details]
ftrace log

Comment 2 Christian König 2017-04-24 07:44:31 UTC

(In reply to Paul Menzel from comment #0)
> Looking at the generated graph 540 ms is spent in `radeon_ib_ring_tests`,
> which contributes most of the time.
> 
> It’d be awesome to improve these times.

And impossible as well. UVD hardware startup just takes about 400-500ms on the older boxes. No idea why.

Comment 3 Paul Menzel 2017-04-25 07:21:55 UTC

(In reply to Christian König from comment #2)
> (In reply to Paul Menzel from comment #0)
> > Looking at the generated graph 540 ms is spent in `radeon_ib_ring_tests`,
> > which contributes most of the time.
> > 
> > It’d be awesome to improve these times.
> 
> And impossible as well. UVD hardware startup just takes about 400-500ms on
> the older boxes. No idea why.

Interesting. What part is responsible for this? AtomBIOS?

Does the UVD hardware need to start up before displaying something on the screen, or could it be moved to the user space driver?

How long does it take on the new hardware?

Comment 4 Christian König 2017-04-25 08:25:40 UTC

(In reply to Paul Menzel from comment #3)
> Interesting. What part is responsible for this? AtomBIOS?

No, that is completely unrelated to AtomBIOS.

The PLL just needs awfully long to settle and booting the firmware doesn't necessary succeed on the first try sometimes.

If the later happens this can even take multiple seconds until everything is settled.

> Does the UVD hardware need to start up before displaying something on the
> screen, or could it be moved to the user space driver?

Yes, all blocks must be started correctly for power management to work because we have a lot of interactions between the different blocks of the hardware.

> How long does it take on the new hardware?

I honestly don't know in detail. On newer hardware you give a single command to the power management uC and it does everything needed in the background.

Comment 5 Christian König 2017-04-25 08:28:00 UTC

What I've forgot: One thing that could possible help would be an HDP flush directly before we wait for the fence.

Just in case the value written for the fence is stuck in the HDP (unlikely, but worth a try).

Comment 6 Paul Menzel 2017-05-05 06:54:08 UTC

(In reply to Christian König from comment #5)
> What I've forgot: One thing that could possible help would be an HDP flush
> directly before we wait for the fence.
> 
> Just in case the value written for the fence is stuck in the HDP (unlikely,
> but worth a try).

Sorry, I unfortunately don’t know how to write such a patch. It’d be awesome, if you provided a patch, that I could try.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.