| Field | Value |
|---|---|
| Summary | [HSW IVB bisected] i915 module resume time regression |
| Product | DRI |
| Component | DRM/Intel |
| Version | unspecified |
| Hardware | Other |
| OS | All |
| Status | CLOSED WONTFIX |
| Severity | major |
| Priority | high |
| Reporter | cancan,feng <cancan.feng> |
| Assignee | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| QA Contact | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| CC | cancan.feng, yangweix.shui |
| Whiteboard | |
| i915 platform | |
| i915 features | |
| Attachments | dmesg of first bad commit (attachment 81054); dmesg of good commit (attachment 81055) |
Description
cancan,feng
2013-06-19 08:07:14 UTC
Rotflmao, Jesse! Can you please attach drm.debug=6 dmesg logs of both the good and the bad resume so that we can see what is going on?

Created attachment 81054 [details]: dmesg of first bad commit

Created attachment 81055 [details]: dmesg of good commit
During resume:

good -> [ 24.789721] call 0000:00:02.0+ returned 0 after 253503 usecs
bad -> [ 31.337787] call 0000:00:02.0+ returned 0 after 38479 usecs

So switchless is indeed doing its job in speeding up resume. So I guess we need to look at what happens after the kernel finishes its resume. How are you measuring the resume time?

ALSA snd-intel resume time is 5s.

Despite being switchless, we spend a good 1s after resuming doing the hotplug dance to reconfigure the fbcon. I hope that is sped up by the hotplug encoder work, so it would be good to also see the current state of affairs in -nightly.

But I see nothing in that log that correlates with your measurement, so please do describe your experimental procedure.

(In reply to comment #4)
> During resume:
>
> good -> [ 24.789721] call 0000:00:02.0+ returned 0 after 253503 usecs
> bad -> [ 31.337787] call 0000:00:02.0+ returned 0 after 38479 usecs

It looks like you have these the wrong way round?

> So switchless is indeed doing its job in speeding up resume. So I guess we
> need to look at what happens after the kernel finishes its resume. How are
> you measuring the resume time?

This is documented in the QA wiki page "HowtoGetResponseTime". But I think Cancan should put the details here.

Aye, the resume time for the device itself is dramatically improved by that patch. But that is not the whole story if userspace measures a long resume process.

(In reply to comment #4)
> During resume:
>
> good -> [ 24.789721] call 0000:00:02.0+ returned 0 after 253503 usecs
> bad -> [ 31.337787] call 0000:00:02.0+ returned 0 after 38479 usecs
>
> So switchless is indeed doing its job in speeding up resume. So I guess we
> need to look at what happens after the kernel finishes its resume. How are
> you measuring the resume time?

1. After booting the machine, I first do an S3 suspend-resume cycle, then run:

dmesg | grep -i -e "i915_init\|PM:.*complete\|00:02.0+"

We get messages like the following (here I used attachment good_commit.dmesg, the dmesg of the good commit):

[ 1.381662] calling i915_init+0x0/0x68 [i915] @ 1249
[ 1.826391] initcall i915_init+0x0/0x68 [i915] returned 0 after 434521 usecs
[ 30.819536] calling 0000:00:02.0+ @ 3917, parent: pci0000:00
[ 30.860073] call 0000:00:02.0+ returned 0 after 39605 usecs
[ 30.936632] PM: suspend of devices complete after 124.805 msecs
[ 30.936957] PM: late suspend of devices complete after 0.322 msecs
[ 30.981010] calling 0000:00:02.0+ @ 3841, parent: pci0000:00
[ 30.981011] call 0000:00:02.0+ returned 0 after 0 usecs
[ 30.981071] PM: noirq suspend of devices complete after 44.137 msecs
[ 31.199416] calling 0000:00:02.0+ @ 3841, parent: pci0000:00
[ 31.210059] call 0000:00:02.0+ returned 0 after 10398 usecs
[ 31.298228] PM: noirq resume of devices complete after 98.936 msecs
[ 31.298385] PM: early resume of devices complete after 0.129 msecs
[ 31.298404] calling 0000:00:02.0+ @ 3914, parent: pci0000:00
[ 31.337787] call 0000:00:02.0+ returned 0 after 38479 usecs
[ 36.313220] PM: resume of devices complete after 5017.604 msecs

So we get the resume time: 10398 usecs + 38479 usecs = 48877 usecs = 48.877 msecs. (A shell sketch automating this calculation is appended at the end of this report.)
2. For the bad_commit.dmesg:

[ 1.871578] initcall i915_init+0x0/0x68 [i915] returned 0 after 478124 usecs
[ 24.050262] calling 0000:00:02.0+ @ 3861, parent: pci0000:00
[ 24.080930] call 0000:00:02.0+ returned 0 after 29964 usecs
[ 24.167496] PM: suspend of devices complete after 123.843 msecs
[ 24.167819] PM: late suspend of devices complete after 0.321 msecs
[ 24.211848] calling 0000:00:02.0+ @ 3753, parent: pci0000:00
[ 24.211849] call 0000:00:02.0+ returned 0 after 0 usecs
[ 24.211909] PM: noirq suspend of devices complete after 44.112 msecs
[ 24.431277] calling 0000:00:02.0+ @ 3753, parent: pci0000:00
[ 24.441924] call 0000:00:02.0+ returned 0 after 10402 usecs
[ 24.530095] PM: noirq resume of devices complete after 98.940 msecs
[ 24.530252] PM: early resume of devices complete after 0.129 msecs
[ 24.530276] calling 0000:00:02.0+ @ 25, parent: pci0000:00
[ 24.789721] call 0000:00:02.0+ returned 0 after 253503 usecs
[ 29.546061] PM: resume of devices complete after 5018.598 msecs

The resume time is: 10402 usecs + 253503 usecs = 263905 usecs = 263.905 msecs.

Now I find that our IVB machines also have this issue, so I will change the subject to something more reasonable...

(In reply to comment #5)
> ALSA snd-intel resume time is 5s.
>
> Despite being switchless, we spend a good 1s after resuming doing the
> hotplug dance to reconfigure the fbcon. I hope that is sped up by the
> hotplug encoder work, so would be good to also see the current state of
> affairs in -nightly.
>
> But, I see nothing in that log that correlates with your measurement - so
> please do describe your experimental procedure.

My experimental procedure (a scripted sketch of these steps is appended at the end of this report):
1. Plug in an HDMI (or eDP, or DP) display; the dmesg I attached used HDMI.
2. Reboot the machine with the kernel command line options: initcall_debug drm.debug=0xe
3. Do one S3 suspend-resume cycle.
4. Run: dmesg | grep -i -e "i915_init\|PM:.*complete\|00:02.0+"

Then we can calculate the resume time.

I had my good/bad logs reversed. This just seems to be a pipe_config reconstruction issue along with the usual slow modesetting sequence due to waiting on inactive vblanks. The extra time is mostly due to the 4 missed vblanks during the modeset. VT-switchless resume simply moved the modeset time from fbcon into the driver. Of course we then still suck down the fail which is called fbcon output probing :(

In any case this isn't a regression (and neither are any other resume time regressions bisected to this commit), but just an artifact of how the code worked previously and of the fact that we didn't actually measure the full gfx resume sequence properly. Speeding up the modeset to fit into the envelope Jesse set will require Real Work which needs to be properly tracked. Hence I'll close this.

Still, please keep a close eye on these numbers, and if they regress by a significant amount (anything above +5 ms) please report those regressions.

Closing old verified.
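Editor's note: the four-step procedure above could be scripted roughly as follows once the machine has been booted with initcall_debug and drm.debug=0xe on the kernel command line and a display is connected. This is only a sketch; using rtcwake to trigger the S3 cycle and the 15-second wake delay are assumptions for illustration, not part of the QA wiki procedure referenced in the comments.

```sh
#!/bin/sh
# Sketch: one S3 suspend/resume cycle followed by extraction of the PM and
# i915 initcall timing lines used above to compute the resume time.
# Assumes the kernel command line contains: initcall_debug drm.debug=0xe
# Run as root.

# Suspend to RAM and wake automatically after 15 seconds (assumed method).
rtcwake -m mem -s 15

# Pull out the i915 init, PM completion, and 00:02.0 callback lines.
dmesg | grep -i -e "i915_init\|PM:.*complete\|00:02.0+"
```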
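The resume-time arithmetic used in the measurements above (summing the two post-suspend "call 0000:00:02.0+ returned 0 after N usecs" durations) can also be automated. The following is a minimal sketch, assuming dmesg contains exactly one suspend/resume cycle in the format shown above and that the last two matching lines are the noirq resume and normal resume callbacks; it is an illustration, not the reporter's documented method.

```sh
#!/bin/sh
# Sketch: compute the i915 (PCI 0000:00:02.0) resume time from dmesg by
# summing the last two "call 0000:00:02.0+ returned 0 after N usecs" entries
# (noirq resume + normal resume). Assumes initcall_debug is enabled and
# exactly one S3 suspend/resume cycle has been performed since boot.
dmesg | grep "call 0000:00:02.0+ returned 0 after" | tail -n 2 |
awk '{ for (i = 1; i <= NF; i++) if ($i == "after") sum += $(i + 1) }
     END { printf "resume time: %d usecs (%.3f msecs)\n", sum, sum / 1000 }'
```

For the good-commit log above this would yield 10398 + 38479 = 48877 usecs, matching the hand calculation.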