System Environment:
--------------------------
Platform: Haswell ULT, Haswell desktop, Haswell mobile
Kernel: (drm-intel-next-queued) 04c23c6203954cc38a34c86b4dd0686a1b1a53de

Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date:   Wed Jun 12 17:27:30 2013 -0300

Bug detailed description:
-------------------------
I did a suspend-resume cycle of S3 on the latest drm-intel-next-queued kernel and found that the resume time is slower than expected. This happens on all HSW platforms; I bisected it on HSW desktop, and the result applies to the other two platforms as well.

Bisect info:
-------------------------
24576d23976746cb52e7700c4cadbf4bc1bc3472 is the first bad commit
commit 24576d23976746cb52e7700c4cadbf4bc1bc3472
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Tue Mar 26 09:25:45 2013 -0700

    drm/i915: enable VT switchless resume v3

    With the other bits in place, we can do this safely.

    v2: disable backlight on suspend to prevent premature enablement on resume
    v3: disable CRTCs on suspend to allow RTD3 (Kristen)

    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

The following table compares the bad value (from the first bad commit) with the good value (from the first bad commit's parent commit) on HSW desktop:

resume time:
-------------------------------------
Display port    eDP              HDMI             DP
Bad             710.256 msecs    262.942 msecs    190.676 msecs
Good            52.750 msecs     56.695 msecs     67.484 msecs
Rotflmao. Jesse! Can you please attach drm.debug=6 logs of both the good and the bad resume so that we can see what is going on?
Created attachment 81054 [details] dmesg of first bad commit
Created attachment 81055 [details] dmesg of good commit
During resume:

good -> [   24.789721] call 0000:00:02.0+ returned 0 after 253503 usecs
bad  -> [   31.337787] call 0000:00:02.0+ returned 0 after 38479 usecs

So switchless is indeed doing its job in speeding up resume. So I guess we need to look at what happens after the kernel finishes its resume. How are you measuring the resume time?
ALSA snd-intel resume time is 5s.

Despite being switchless, we spend a good 1s after resuming doing the hotplug dance to reconfigure the fbcon. I hope that is sped up by the hotplug encoder work, so it would be good to also see the current state of affairs in -nightly.

But I see nothing in that log that correlates with your measurement, so please do describe your experimental procedure.
(In reply to comment #4)
> During resume:
>
> good -> [   24.789721] call 0000:00:02.0+ returned 0 after 253503 usecs
> bad  -> [   31.337787] call 0000:00:02.0+ returned 0 after 38479 usecs

It looks like you have the good/bad labels reversed?

> So switchless is indeed doing its job in speeding up resume. So I guess we
> need to look at what happens after the kernel finishes its resume. How are
> you measuring the resume time?

It is documented in the QA wiki page "HowtoGetResponseTime", but I think Cancan should put the details here.
Aye, the resume time for the device itself is dramatically improved by that patch. But that is not the whole story if userspace measures a long resume process.
(In reply to comment #4)
> During resume:
>
> good -> [   24.789721] call 0000:00:02.0+ returned 0 after 253503 usecs
> bad  -> [   31.337787] call 0000:00:02.0+ returned 0 after 38479 usecs
>
> So switchless is indeed doing its job in speeding up resume. So I guess we
> need to look at what happens after the kernel finishes its resume. How are
> you measuring the resume time?

1. After booting the machine, I first do an S3 suspend-resume cycle, then:

   dmesg | grep -i -e "i915_init\|PM:.*complete\|00:02.0+"

This gives messages like the following (here I used attachment good_commit.dmesg, the dmesg of the good commit):

[    1.381662] calling  i915_init+0x0/0x68 [i915] @ 1249
[    1.826391] initcall i915_init+0x0/0x68 [i915] returned 0 after 434521 usecs
[   30.819536] calling  0000:00:02.0+ @ 3917, parent: pci0000:00
[   30.860073] call 0000:00:02.0+ returned 0 after 39605 usecs
[   30.936632] PM: suspend of devices complete after 124.805 msecs
[   30.936957] PM: late suspend of devices complete after 0.322 msecs
[   30.981010] calling  0000:00:02.0+ @ 3841, parent: pci0000:00
[   30.981011] call 0000:00:02.0+ returned 0 after 0 usecs
[   30.981071] PM: noirq suspend of devices complete after 44.137 msecs
[   31.199416] calling  0000:00:02.0+ @ 3841, parent: pci0000:00
[   31.210059] call 0000:00:02.0+ returned 0 after 10398 usecs
[   31.298228] PM: noirq resume of devices complete after 98.936 msecs
[   31.298385] PM: early resume of devices complete after 0.129 msecs
[   31.298404] calling  0000:00:02.0+ @ 3914, parent: pci0000:00
[   31.337787] call 0000:00:02.0+ returned 0 after 38479 usecs
[   36.313220] PM: resume of devices complete after 5017.604 msecs

So we get the resume time: 10398 usecs + 38479 usecs = 48877 usecs = 48.877 msecs

2. 
For the bad_commit.dmesg:

[    1.871578] initcall i915_init+0x0/0x68 [i915] returned 0 after 478124 usecs
[   24.050262] calling  0000:00:02.0+ @ 3861, parent: pci0000:00
[   24.080930] call 0000:00:02.0+ returned 0 after 29964 usecs
[   24.167496] PM: suspend of devices complete after 123.843 msecs
[   24.167819] PM: late suspend of devices complete after 0.321 msecs
[   24.211848] calling  0000:00:02.0+ @ 3753, parent: pci0000:00
[   24.211849] call 0000:00:02.0+ returned 0 after 0 usecs
[   24.211909] PM: noirq suspend of devices complete after 44.112 msecs
[   24.431277] calling  0000:00:02.0+ @ 3753, parent: pci0000:00
[   24.441924] call 0000:00:02.0+ returned 0 after 10402 usecs
[   24.530095] PM: noirq resume of devices complete after 98.940 msecs
[   24.530252] PM: early resume of devices complete after 0.129 msecs
[   24.530276] calling  0000:00:02.0+ @ 25, parent: pci0000:00
[   24.789721] call 0000:00:02.0+ returned 0 after 253503 usecs
[   29.546061] PM: resume of devices complete after 5018.598 msecs

The resume time is: 10402 usecs + 253503 usecs = 263905 usecs = 263.905 msecs
Now I find that our IVB machines also have this issue, so I will change the subject to something more accurate...
(In reply to comment #5)
> ALSA snd-intel resume time is 5s.
>
> Despite being switchless, we spend a good 1s after resuming doing the
> hotplug dance to reconfigure the fbcon. I hope that is sped up by the
> hotplug encoder work, so would be good to also see the current state of
> affairs in -nightly.
>
> But, I see nothing in that log that correlates with your measurement - so
> please do describe your experimental procedure.

My experimental procedure:
1. Plug in an HDMI display (or eDP, or DP); the dmesg I attached used HDMI.
2. Reboot the machine with the kernel command line: initcall_debug drm.debug=0xe
3. Do a suspend-resume cycle of S3.
4. dmesg | grep -i -e "i915_init\|PM:.*complete\|00:02.0+"

Then we can calculate the resume time.
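The arithmetic in the comments above (summing the durations the i915 PCI device reports during the resume phase) can be automated. As a hedged sketch, not part of the original report: the helper below assumes initcall_debug output in the exact format quoted in this bug, and treats every "call 0000:00:02.0+ returned 0 after N usecs" line appearing after the "PM: noirq suspend of devices complete" marker as part of the resume path.

```python
import re

def i915_resume_usecs(dmesg_text):
    """Sum the 0000:00:02.0 call durations that occur during resume.

    Lines before the noirq-suspend-complete marker belong to the suspend
    phase and are ignored; everything after it is the resume path.
    """
    in_resume = False
    total = 0
    for line in dmesg_text.splitlines():
        if "PM: noirq suspend of devices complete" in line:
            in_resume = True
            continue
        if in_resume:
            m = re.search(r"call 0000:00:02\.0\+ returned 0 after (\d+) usecs",
                          line)
            if m:
                total += int(m.group(1))
    return total

# Excerpt from the good-commit dmesg quoted in comment #8:
log = """\
[   30.981071] PM: noirq suspend of devices complete after 44.137 msecs
[   31.199416] calling  0000:00:02.0+ @ 3841, parent: pci0000:00
[   31.210059] call 0000:00:02.0+ returned 0 after 10398 usecs
[   31.298404] calling  0000:00:02.0+ @ 3914, parent: pci0000:00
[   31.337787] call 0000:00:02.0+ returned 0 after 38479 usecs
"""
print(i915_resume_usecs(log) / 1000.0, "msecs")  # prints: 48.877 msecs
```

This reproduces the 10398 + 38479 = 48877 usecs = 48.877 msecs figure computed by hand above.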
I had my good/bad logs reversed. This just seems to be a pipe_config reconstruction issue along with the usual slow modesetting sequence due to waiting on inactive vblanks. The extra time is mostly due to the 4 missed vblanks during the modeset.
VT-switchless resume simply moved the modeset time from fbcon into the driver. Of course we then still suck down the fail that is called fbcon output probing :(

In any case this isn't a regression (and neither are any other resume time regressions bisected to this commit), but just an artifact of how the code worked previously and of the fact that we didn't actually measure the full gfx resume sequence properly.

Speeding up the modeset to fit into the envelope Jesse set will require Real Work which needs to be properly tracked. Hence I'll close this. Still, please keep a close eye on these numbers, and if they regress by a significant amount (anything above +5ms) please report those regressions.
Closing old verified.