| Field | Value |
|---|---|
| Summary | [HSW IVB bisected] i915 module resume time regression |
| Product | DRI |
| Component | DRM/Intel |
| Version | unspecified |
| Hardware | Other |
| OS | All |
| Status | CLOSED WONTFIX |
| Severity | major |
| Priority | high |
| Reporter | cancan,feng <cancan.feng> |
| Assignee | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| QA Contact | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| CC | cancan.feng, yangweix.shui |
| Whiteboard | |
| i915 platform | |
| i915 features | |
| Attachments | dmesg of first bad commit (attachment 81054); dmesg of good commit (attachment 81055) |
Description
cancan,feng
2013-06-19 08:07:14 UTC
Rotflmao, Jesse! Can you please attach drm.debug=6 dmesg logs of both the good and the bad resume so that we can see what is going on?

Created attachment 81054 [details]: dmesg of first bad commit

Created attachment 81055 [details]: dmesg of good commit
During resume:

good -> [ 24.789721] call 0000:00:02.0+ returned 0 after 253503 usecs
bad -> [ 31.337787] call 0000:00:02.0+ returned 0 after 38479 usecs

So switchless is indeed doing its job in speeding up resume. So I guess we need to look at what happens after the kernel finishes its resume. How are you measuring the resume time?

ALSA snd-intel resume time is 5s.

Despite being switchless, we spend a good 1s after resuming doing the hotplug dance to reconfigure the fbcon. I hope that is sped up by the hotplug encoder work, so it would be good to also see the current state of affairs in -nightly.

But I see nothing in that log that correlates with your measurement, so please do describe your experimental procedure.

(In reply to comment #4)
> During resume:
>
> good -> [ 24.789721] call 0000:00:02.0+ returned 0 after 253503 usecs
> bad -> [ 31.337787] call 0000:00:02.0+ returned 0 after 38479 usecs

It looks like you have these the wrong way round?

> So switchless is indeed doing its job in speeding up resume. So I guess we
> need to look at what happens after the kernel finishes its resume. How are
> you measuring the resume time?

This is documented in the QA wiki page "HowtoGetResponseTime". But I think Cancan should put the details here.

Aye, the resume time for the device itself is dramatically improved by that patch. But that is not the whole story if userspace measures a long resume process.

(In reply to comment #4)
> During resume:
>
> good -> [ 24.789721] call 0000:00:02.0+ returned 0 after 253503 usecs
> bad -> [ 31.337787] call 0000:00:02.0+ returned 0 after 38479 usecs
>
> So switchless is indeed doing its job in speeding up resume. So I guess we
> need to look at what happens after the kernel finishes its resume. How are
> you measuring the resume time?

1. After booting the machine, I first do an S3 suspend-resume cycle, then run:

dmesg | grep -i -e "i915_init\|PM:.*complete\|00:02.0+"

We get messages like the following (here I used attachment good_commit.dmesg, the dmesg of the good commit):

[ 1.381662] calling i915_init+0x0/0x68 [i915] @ 1249
[ 1.826391] initcall i915_init+0x0/0x68 [i915] returned 0 after 434521 usecs
[ 30.819536] calling 0000:00:02.0+ @ 3917, parent: pci0000:00
[ 30.860073] call 0000:00:02.0+ returned 0 after 39605 usecs
[ 30.936632] PM: suspend of devices complete after 124.805 msecs
[ 30.936957] PM: late suspend of devices complete after 0.322 msecs
[ 30.981010] calling 0000:00:02.0+ @ 3841, parent: pci0000:00
[ 30.981011] call 0000:00:02.0+ returned 0 after 0 usecs
[ 30.981071] PM: noirq suspend of devices complete after 44.137 msecs
[ 31.199416] calling 0000:00:02.0+ @ 3841, parent: pci0000:00
[ 31.210059] call 0000:00:02.0+ returned 0 after 10398 usecs
[ 31.298228] PM: noirq resume of devices complete after 98.936 msecs
[ 31.298385] PM: early resume of devices complete after 0.129 msecs
[ 31.298404] calling 0000:00:02.0+ @ 3914, parent: pci0000:00
[ 31.337787] call 0000:00:02.0+ returned 0 after 38479 usecs
[ 36.313220] PM: resume of devices complete after 5017.604 msecs

So we get the resume time: 10398 usecs + 38479 usecs = 48877 usecs = 48.877 msecs. (A shell sketch automating this calculation is appended at the end of this report.)
2. For the bad_commit.dmesg:

[ 1.871578] initcall i915_init+0x0/0x68 [i915] returned 0 after 478124 usecs
[ 24.050262] calling 0000:00:02.0+ @ 3861, parent: pci0000:00
[ 24.080930] call 0000:00:02.0+ returned 0 after 29964 usecs
[ 24.167496] PM: suspend of devices complete after 123.843 msecs
[ 24.167819] PM: late suspend of devices complete after 0.321 msecs
[ 24.211848] calling 0000:00:02.0+ @ 3753, parent: pci0000:00
[ 24.211849] call 0000:00:02.0+ returned 0 after 0 usecs
[ 24.211909] PM: noirq suspend of devices complete after 44.112 msecs
[ 24.431277] calling 0000:00:02.0+ @ 3753, parent: pci0000:00
[ 24.441924] call 0000:00:02.0+ returned 0 after 10402 usecs
[ 24.530095] PM: noirq resume of devices complete after 98.940 msecs
[ 24.530252] PM: early resume of devices complete after 0.129 msecs
[ 24.530276] calling 0000:00:02.0+ @ 25, parent: pci0000:00
[ 24.789721] call 0000:00:02.0+ returned 0 after 253503 usecs
[ 29.546061] PM: resume of devices complete after 5018.598 msecs

The resume time is: 10402 usecs + 253503 usecs = 263905 usecs = 263.905 msecs.

Now I find that our IVB machines also have this issue, so I will change the subject to something more reasonable...

(In reply to comment #5)
> ALSA snd-intel resume time is 5s.
>
> Despite being switchless, we spend a good 1s after resuming doing the
> hotplug dance to reconfigure the fbcon. I hope that is sped up by the
> hotplug encoder work, so would be good to also see the current state of
> affairs in -nightly.
>
> But, I see nothing in that log that correlates with your measurement - so
> please do describe your experimental procedure.

My experimental procedure (a scripted sketch of these steps is appended at the end of this report):
1. Plug in an HDMI (or eDP, or DP) display; the dmesg I attached used HDMI.
2. Reboot the machine with the kernel command line options: initcall_debug drm.debug=0xe
3. Do one S3 suspend-resume cycle.
4. Run: dmesg | grep -i -e "i915_init\|PM:.*complete\|00:02.0+"

Then we can calculate the resume time.

I had my good/bad logs reversed. This just seems to be a pipe_config reconstruction issue along with the usual slow modesetting sequence due to waiting on inactive vblanks. The extra time is mostly due to the 4 missed vblanks during the modeset. VT-switchless resume simply moved the modeset time from fbcon into the driver. Of course we then still suck down the fail which is called fbcon output probing :(

In any case this isn't a regression (and neither are any other resume time regressions bisected to this commit), but just an artifact of how the code worked previously and of the fact that we didn't actually measure the full gfx resume sequence properly. Speeding up the modeset to fit into the envelope Jesse set will require Real Work which needs to be properly tracked. Hence I'll close this.

Still, please keep a close eye on these numbers, and if they regress by a significant amount (anything above +5 ms) please report those regressions.

Closing old verified.
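Editor's note: the four-step procedure above could be scripted roughly as follows once the machine has been booted with initcall_debug and drm.debug=0xe on the kernel command line and a display is connected. This is only a sketch; using rtcwake to trigger the S3 cycle and the 15-second wake delay are assumptions for illustration, not part of the QA wiki procedure referenced in the comments.

```sh
#!/bin/sh
# Sketch: one S3 suspend/resume cycle followed by extraction of the PM and
# i915 initcall timing lines used above to compute the resume time.
# Assumes the kernel command line contains: initcall_debug drm.debug=0xe
# Run as root.

# Suspend to RAM and wake automatically after 15 seconds (assumed method).
rtcwake -m mem -s 15

# Pull out the i915 init, PM completion, and 00:02.0 callback lines.
dmesg | grep -i -e "i915_init\|PM:.*complete\|00:02.0+"
```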
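The resume-time arithmetic used in the measurements above (summing the two post-suspend "call 0000:00:02.0+ returned 0 after N usecs" durations) can also be automated. The following is a minimal sketch, assuming dmesg contains exactly one suspend/resume cycle in the format shown above and that the last two matching lines are the noirq resume and normal resume callbacks; it is an illustration, not the reporter's documented method.

```sh
#!/bin/sh
# Sketch: compute the i915 (PCI 0000:00:02.0) resume time from dmesg by
# summing the last two "call 0000:00:02.0+ returned 0 after N usecs" entries
# (noirq resume + normal resume). Assumes initcall_debug is enabled and
# exactly one S3 suspend/resume cycle has been performed since boot.
dmesg | grep "call 0000:00:02.0+ returned 0 after" | tail -n 2 |
awk '{ for (i = 1; i <= NF; i++) if ($i == "after") sum += $(i + 1) }
     END { printf "resume time: %d usecs (%.3f msecs)\n", sum, sum / 1000 }'
```

For the good-commit log above this would yield 10398 + 38479 = 48877 usecs, matching the hand calculation.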