Summary: | [SKL/KBL] Init time is high | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | cprigent <christophe.prigent> | ||||||||
Component: | DRM/Intel | Assignee: | cprigent <christophe.prigent> | ||||||||
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||||||
Severity: | major | ||||||||||
Priority: | high | CC: | intel-gfx-bugs | ||||||||
Version: | unspecified | ||||||||||
Hardware: | x86-64 (AMD64) | ||||||||||
OS: | Linux (All) | ||||||||||
Whiteboard: | |||||||||||
i915 platform: | KBL, SKL | i915 features: | |||||||||
Attachments: |
|
Description
cprigent
2016-03-27 10:20:46 UTC
What are the results with i915.fastboot=1 module parameter? At the moment I checked on SKL-Y with same SW environment: init: 404139 usecs Suspend: 293523 usecs Resume: 508757 usecs Created attachment 123043 [details] kbl-fasboot-4.6.0-rc3_d9131d6-kern.log Results with 4.6.0-rc3 d9131d6: init: 315.30 ms suspend: 320.84 ms resume: 501.59 ms Reproduced with fastboot (kern.log attached): init 326.53 msecs suspend: 297.38 msecs resume: 497.26 msecs Hardware Platform: KABY LAKE-U CPU : Intel(R) Core(TM) @ 2.60GHz MCP : KBL-U G0 2+2 (ou ULT-G0) QDF : QYQ8 Chipset PCH: SPT-LP C1 CRB : KABY LAKE U DDR3L RVP7 CRB FAB1 Software BIOS : KBLSE2R1.R00.X015.B01.1511271314 ME FW : 11.5.0.1008 Ksc (EC FW): 1.20 Linux distribution: Ubuntu 15.10 64 bits Kernel: drm-intel-nightly 4.6.0-rc3_d9131d6 from http://cgit.freedesktop.org/drm-intel/ commit d9131d62d18ba94fb3ca019f1156c22b5f4ce23c Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Date: Fri Apr 15 14:54:26 2016 +0100 drm-intel-nightly: 2016y-04m-15d-13h-53m-44s UTC integration manifestdrm: tag libdrm-2.4.66-33-gf884af9 libdrm 2.4.67-25 cc9a53f from git://git.freedesktop.org/git/mesa/drm mesa 11.1.2 7bcd827 from git://git.freedesktop.org/git/mesa/mesa cairo 1.15.2 db8a7f1 from git://git.freedesktop.org/git/cairo xorg/xserver 1.18.0-274 8437955 from git://git.freedesktop.org/git/xorg/xserver xf86-video-intel 2.99.917-634 81029be from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel vaapi/libva 1.7.0-1 2339d10 from git://git.freedesktop.org/git/vaapi/libva vaapi/intel-driver 1.7.0-8 2c1bec0 from git://git.freedesktop.org/git/vaapi/intel-driver Was SKL always slow and we never noticed or is this a new regression? Does the time change when you completely remove the firmware files? In order to fully remove the firmware from the picture besides removing /lib/firmware/i915 and updating the initrd I'd also change i915_drv.h - #define HAS_CSR(dev) (IS_GEN9(dev)) + #define HAS_CSR(dev) false - #define HAS_GUC_UCODE(dev) (IS_GEN9(dev)) - #define HAS_GUC_SHED(dev) (IS_GEN9(dev)) - #define HAS_GUC_UCODE(dev) false - #define HAS_GUC_SHED(dev) false ops: - #define HAS_GUC_UCODE(dev) (IS_GEN9(dev)) - #define HAS_GUC_SHED(dev) (IS_GEN9(dev)) + #define HAS_GUC_UCODE(dev) false + #define HAS_GUC_SHED(dev) false (In reply to Rodrigo Vivi from comment #4) > Was SKL always slow and we never noticed or is this a new regression? This is known on SKL: bug 90892 (In reply to Rodrigo Vivi from comment #7) > ops: > > - #define HAS_GUC_UCODE(dev) (IS_GEN9(dev)) > - #define HAS_GUC_SHED(dev) (IS_GEN9(dev)) > + #define HAS_GUC_UCODE(dev) false > + #define HAS_GUC_SHED(dev) false I did following changes, is it correct? - #define HAS_CSR(dev) (IS_GEN9(dev)) + #define HAS_CSR(dev) false - #define HAS_GUC_UCODE(dev) (IS_GEN9(dev) && !IS_KABYLAKE(dev)) - #define HAS_GUC_SCHED(dev) (IS_GEN9(dev) && !IS_KABYLAKE(dev)) + #define HAS_GUC_UCODE(dev) false + #define HAS_GUC_SCHED(dev) false I measured: init: 238643 usecs suspend: 280102 usecs resume: 507243 usecs Tested with kernel drm-intel-nightly 4.6.0-rc5 e005db1 from http://cgit.freedesktop.org/drm-intel/ commit e005db1cb2c60d18abe837ac683d8993ea77b239 Author: Jani Nikula <jani.nikula@intel.com> Date: Tue Apr 26 15:52:20 2016 +0300 drm-intel-nightly: 2016y-04m-26d-12h-51m-57s UTC integration manifest Milestone criteria blocker so increasing priority Something that make a lot of difference here: How is the display connected, and what kind of display is it? eDP, DP, HDMI, CRT? Do you have more than one display connected? Any adapters, or is the display connected using its native mode? I'm not sure whether the screen resolution affects things in any significant manner, but that might be useful information too. As can be seen from the log, Mar 25 18:30:21 KBLU1 kernel: [ 1.449273] [drm:intel_dp_init_panel_power_sequencer] panel power up delay 200, power down delay 50, power cycle delay 600 Mar 25 18:30:21 KBLU1 kernel: [ 1.449274] [drm:intel_dp_init_panel_power_sequencer] backlight on delay 1, panel off delay 200 The backlight off-delay is 200ms and the display power down delay is 50ms. That makes the target of 62.5ms totally unrealistic; with such delays a target of 300ms is realistic (and since we hit 285ms that seems reasonable). As far as resume goes, backlight on delay is 1ms and panel on delay is 200ms, so the target is still unrealistic, however there's definitely a larger gap here. On resume, however, we also do much more work, so we definitely should expect to use more time than for suspend. I cannot say without examining the code further how much we can shave off from the current times, but as long as we have a panel on delay of 200ms we can definitely not expect to meet a target of 187.5ms... Just my 2ยข There are a couple of other things to consider. The init/resume time being measured here is how long we are blocking the boot/resume from starting userspace. It is quite acceptable for userspace to start before the panel is lit, for example. From boot, the target is set assuming that the BIOS has already prelit all the outputs, in which case we are measuring and trying to minimise the latency of the driver taking over the existing hw state. How much of an improvement is it to set i915.fastboot=1? We still haven't enabled that by default because we had a regresion on that. BTW, do you use some special patch to get that "call 0000:00:02.0+ returned 0 after" debug message? I cannot find it ian my logs at least. (In reply to Chris Wilson from comment #12) > There are a couple of other things to consider. The init/resume time being > measured here is how long we are blocking the boot/resume from starting > userspace. It is quite acceptable for userspace to start before the panel is > lit, for example. > > From boot, the target is set assuming that the BIOS has already prelit all > the outputs, in which case we are measuring and trying to minimise the > latency of the driver taking over the existing hw state. That only applies to the init-target, however, not the suspend/resume targets also listed here; I don't think we can rely on just taking over the existing hw state from BIOS for S0/S0ix/S3. Agreed? (In reply to David Weinehall from comment #14) > BTW, do you use some special patch to get that "call 0000:00:02.0+ returned > 0 after" debug message? I cannot find it ian my logs at least. initcall_debug kernel parameter. I've tested with and without fastboot; the difference is -- sadly -- negligible.(In reply to Jani Nikula from comment #13) > How much of an improvement is it to set i915.fastboot=1? We still haven't > enabled that by default because we had a regresion on that. On both Broadwell and KBL-Y i915.fastboot=1 actually takes around twice as *long*. So clearly something isn't quite right with fastboot... I must've messed up with Friday's measurements with/without fastboot. The real outcome is that it doesn't make any discernible different on/off. Sorry for the false alarm. Measurements using SuspendResume on a KBL-Y yields roughly these numbers: suspend: 320ms -- out of this 300ms is spent turning off panel + backlight; the delays here cannot be optimised away, so at most we can shave off a few milliseconds. This values will vary greatly depending on the panel used. resume: 470ms -- out of this 30ms is spent in restore_gtt, 230ms is spent turning on panel + backlight, 170ms on probing for new displays, and 30ms in one of the opregion-related calls. The probing for new displays can be parallelised once the atomic modeset code is finalised. There are various other things that can be optimised here too, as the numbers probably show, but -- and here's the important part, for resume we are not blocking the critical path. There are several components that take longer to resume, and they are resuming in parallel to our driver. Finally, for init, after consultation with Daniel Vetter it seems unlikely that the init-time target is realistic. According to him none of our earlier platforms have been able to meet this target, and the basis for the requirement isn't fully understood. It should, however, be noted that on startup all modules load in parallel, so we're not actually blocking anything. Conclusion -- the situation is much better than the initial bug report indicates. A lot of the time spent during init/suspend/resume is spent in delays that are specified by hardware (enabling/disabling/probing) and thus not amenable to optimisation. The main remaining area is likely to see improvements once the atomic modeset is finalised. (In reply to David Weinehall from comment #19) > Measurements using SuspendResume on a KBL-Y yields roughly these numbers: > > suspend: 320ms -- out of this 300ms is spent turning off panel + backlight; > the delays here cannot be optimised away, so at most we can shave off a few > milliseconds. This values will vary greatly depending on the panel used. > > resume: 470ms -- out of this 30ms is spent in restore_gtt, 230ms is spent > turning on panel + backlight, 170ms on probing for new displays, and 30ms in > one of the opregion-related calls. Could you test https://patchwork.freedesktop.org/patch/82422/ that should help the restore_gtt slightly. > The probing for new displays can be parallelised once the atomic modeset > code is finalised. There are various other things that can be optimised here > too, as the numbers probably show, but -- and here's the important part, for > resume we are not blocking the critical path. There are several components > that take longer to resume, and they are resuming in parallel to our driver. Actually improving the logic to show the graph and the length of the critical path would help your argument immensely and should allow efforts to be focused on shortening the critical path. i.e. replace using initcall_debug alone with scripts/bootgraph.pl. But we should assume the ideal case where all other time is 0, and i915.ko is the critical path. (In reply to Chris Wilson from comment #20) > (In reply to David Weinehall from comment #19) > > Measurements using SuspendResume on a KBL-Y yields roughly these numbers: > > > > suspend: 320ms -- out of this 300ms is spent turning off panel + backlight; > > the delays here cannot be optimised away, so at most we can shave off a few > > milliseconds. This values will vary greatly depending on the panel used. > > > > resume: 470ms -- out of this 30ms is spent in restore_gtt, 230ms is spent > > turning on panel + backlight, 170ms on probing for new displays, and 30ms in > > one of the opregion-related calls. > > Could you test https://patchwork.freedesktop.org/patch/82422/ that should > help the restore_gtt slightly. Not just slightly. On my Broadwell it cuts restore_gtt from 15ms to less than 5ms. The Kabylake-Y I've been testing on is busy today -- will run similar tests on it tomorrow to see whether the savings are similar there. No milestone blocker, reducing priority I'm currently in the process of investigating of what actions are necessary for our driver to perform on suspend/resume, to allow us to provide more accurate targets. The targets will, necessarily, need to vary a bit from platform to platform (due to hardware workarounds, etc.), and should exclude the panel/backlight on/off timings. With Chris's patch from earlier applied, as well as a patch for another issue I discovered yesterday, we're already very close to optimal; most time spent is caused by delays in hardware; ACPI/BIOS/hardware) suspend-timings on the platforms I've tested so far (with one bug that needs further investigation). An area to keep a watchful eye on, though, is if we end up with multiple panels/backlights(/segmented panel displays?) that need disabling, since this is currently done in a serial manner. Next up to resume, where Chris is already doing a lot of work. Measurement on KBL-Y with my patch applied: Total suspend: 308ms 200ms is spent waiting for the backlight to disable (timing from VBT). 16ms is spent disabling the pipe. 60ms is spent waiting for the panel to disable (timing from VBT). 12ms is spent in late suspend. 20ms is spent doing various other things (reading from/writing to hardware, ACPI, etc.) The late suspend is for disabling power domains and setting the PCI state to D3_hot. So, after subtracting the 260ms from panel/backlight timings, we actually meet the expected result for suspend (it's reasonable to assume that such values would be the outcome if we were to measure without a display connected via eDP). For the resume case: Total resume: 460ms 13.5ms in resume noirq (PCI powerup and state restore) 13ms in resume early (init powerdomains) 40ms setting up + initialising opregions 3ms to restore gtt mappings 200ms to enable the panel (timing from VBT) + 4ms linktraining 2 * 88ms for HDMI detect (2 ports) 10.5ms doing other things Reasonably the HDMI detection should be possible to parallelise (either only with each other, thus saving us 88ms, or with the eDP, thus saving us 176ms on platforms that uses eDP, and 88ms on platforms that don't). The opregion setup looks like a target we could optimise (possibly even postpone). If we subtract the 200ms panel timing, we still have 260ms left (of which 176ms is HDMI detect), so clearly there's some room for improvement, but also the targets might need readjusting (if the total time allotment for suspend + resume is 187.5ms and suspend takes ~43ms, we've got 144.5ms to resume. The PCI powerup/restore/powerdomain bits are non-negotiable (to my understanding), so that's 26.5ms we cannot shave off. At the very least the device is bound to be connected to one display, so we'll have at least either 88ms or 200ms to contend with. Link training is also unavoidable, as well as restoring gtt mappings. The "doing other things" bit is quite likely to contain mostly necessary things too. Thus, unless we can optimise away some bits of the opregion setup cost, the target needs adjusting (also, I think it makes sense to split the target into separate suspend & resume values). Preliminary suggestions for targets would 50ms for suspend (might need adjusting for hardware) on a non-eDP device, and maybe something like 150ms for resume. But this assumes that it is feasible to optimise things, obviously. I have tested this issue on KBL wit the following config: |=== Software information ===| ++ Kernel version : 4.6.0-rc7-patch-82422+ ++ Linux distribution : Ubuntu 15.10 ++ Architecture : 64-bit ++ xf86-video-intel version : 2.99.917 ++ Xorg-Xserver version : 1.18.99.1 ++ DRM version : 2.4.68 ++ Cairo version : 1.15.2 ++ Intel GPU Tools version : Tag [intel-gpu-tools-1.14-319-gb88bce4] / Commit [b88bce4] ++ Kernel driver in use : i915 ++ Hardware acceleration : ++ Bios revision : 36.2 ++ KSC revision : 1.20 |=== Hardware information ===| ++ Platform : KBL (RVP7) ++ Motherboard model : Kabylake Client platform ++ Motherboard type : Skylake U DDR3L RVP7 Laptop ++ Motherboard manufacturer : Intel Corporation ++ CPU family : Other ++ CPU information : Genuine Intel(R) CPU 0000 @ 2.60GHz ++ GPU Card : Intel Corporation Device 5916 (prog-if 00 [VGA controller]) |== Kernel parameters ==| quiet splash drm.debug=0xe i915.preliminary_hw_support=1 initcall_debug i915.fastboot=1 and i get the following results: results without i915.fastboot=1 module parameter init: [ 1.727641] initcall i915_init+0x0/0x99 [i915] returned 0 after 387249 usecs Suspend: [ 180.621025] call 0000:00:02.0+ returned 0 after 295806 usecs [ 180.621118] PM: suspend of devices complete after 321.303 msecs REsume: [ 180.639935] call 0000:00:02.0+ returned 0 after 13731 usecs [ 180.651224] call 0000:00:02.0+ returned 0 after 1 usecs [ 180.903152] call 0000:00:02.0+ returned 0 after 18360 usecs [ 180.922977] call 0000:00:02.0+ returned 0 after 6197 usecs [ 181.472492] call 0000:00:02.0+ returned 0 after 498686 usecs [ 181.472629] PM: resume of devices complete after 510.904 msecs ====================================================================================== results with i915.fastboot=1 module parameter init: [ 1.718740] initcall i915_init+0x0/0x99 [i915] returned 0 after 385555 usecs Suspend: [ 111.358279] call 0000:00:02.0+ returned 0 after 1214290 usecs [ 111.358384] PM: suspend of devices complete after 1251.146 msecs Resume: [ 111.378979] call 0000:00:02.0+ returned 0 after 15174 usecs [ 111.383042] call 0000:00:02.0+ returned 0 after 0 usecs [ 111.635269] call 0000:00:02.0+ returned 0 after 11823 usecs [ 111.661837] call 0000:00:02.0+ returned 0 after 6190 usecs [ 112.216082] call 0000:00:02.0+ returned 0 after 503149 usecs [ 112.216266] PM: resume of devices complete after 515.524 msecs As additional comment, the platform only has an eDP panel attached, and kernel has the patch -82422-. After adding the parameter i915.fastboot=1 the platform is able to boot up, but eDP gets blank screen. You can find the full dmesg_Patch_KBL_U.log as attached file. Created attachment 123924 [details]
dmesgPatch_KBL_U.log
Improvement done by Chris on resume and submitted here: https://patchwork.freedesktop.org/series/11557/ Patches are already upstreamed. I see good improvement in the resume. Module init [ 2.912242] initcall i915_init+0x0/0x5c [i915] returned 0 after 65486 usecs [ 252.874012] initcall i915_init+0x0/0x5c [i915] returned 0 after 850954 usecs [ 253.937338] initcall i915_init+0x0/0x5c [i915] returned 0 after 124 usecs [ 254.245110] initcall i915_init+0x0/0x5c [i915] returned 0 after 58279 usecs [ 254.573160] initcall i915_init+0x0/0x5c [i915] returned 0 after 58709 usecs [ 254.865172] initcall i915_init+0x0/0x5c [i915] returned 0 after 64745 usecs [ 255.970036] initcall i915_init+0x0/0x5c [i915] returned 0 after 845737 usecs Suspend and resume [ 450.737868] call 0000:00:02.0+ returned 0 after 302597 usecs [ 450.761468] call 0000:00:02.0+ returned 0 after 20290 usecs [ 450.766728] call 0000:00:02.0+ returned 0 after 0 usecs [ 451.090349] call 0000:00:02.0+ returned 0 after 18898 usecs [ 451.264150] call 0000:00:02.0+ returned 0 after 4975 usecs [ 451.676679] call 0000:00:02.0+ returned 0 after 291634 usecs [ 513.460443] call 0000:00:02.0+ returned 0 after 296654 usecs [ 513.486126] call 0000:00:02.0+ returned 0 after 22126 usecs [ 513.490065] call 0000:00:02.0+ returned 0 after 0 usecs [ 513.814883] call 0000:00:02.0+ returned 0 after 18913 usecs [ 513.990726] call 0000:00:02.0+ returned 0 after 6952 usecs [ 514.396310] call 0000:00:02.0+ returned 0 after 291534 usecs [ 576.946503] call 0000:00:02.0+ returned 0 after 290107 usecs [ 577.094638] call 0000:00:02.0+ returned 0 after 22783 usecs [ 577.097770] call 0000:00:02.0+ returned 0 after 0 usecs [ 577.443501] call 0000:00:02.0+ returned 0 after 18824 usecs [ 577.619260] call 0000:00:02.0+ returned 0 after 6854 usecs [ 578.024994] call 0000:00:02.0+ returned 0 after 292845 usecs [ 640.460094] call 0000:00:02.0+ returned 0 after 310371 usecs [ 640.631295] call 0000:00:02.0+ returned 0 after 22593 usecs [ 640.637127] call 0000:00:02.0+ returned 0 after 0 usecs [ 640.920048] call 0000:00:02.0+ returned 0 after 18874 usecs [ 641.095886] call 0000:00:02.0+ returned 0 after 6960 usecs [ 641.537489] call 0000:00:02.0+ returned 0 after 292647 usecs Platform: KABY LAKE-U Processor : Genuine Intel(R) CPU 0000 @ 1.80GHz (cpu family: 6, model: 142, stepping: 9) MCP : KBL-U J0 2+3e QDF : QL9J PCH: PCH-LP C1 CRB : KABY LAKE U DDR3L RVP7 Rework: O-16 Software BIOS: 45.1 3KBLSE2R1.R00.X045.P01.1606291634 from https://ubit-artifactory-ba.intel.com/artifactory/owr-repos/Submissions/ifwi/KBL_ORANGE_IFWI_2016_WW27_3_03_SR'17/ ME FW: 11.6.0.1065 EC FW: 1.24 KSC: 1.24 Linux distribution: Ubuntu 16.04 64 bits DMC 1.01 from https://01.org/sites/default/files/downloads/intelr-graphics-linux/kbldmcver101.tar.bz2 GuC 9.14 Kernel: 4.9.0-rc2 1e1c92a from http://cgit.freedesktop.org/drm-intel/ commit: 1e1c92a58e4a4a88c90dc274d495caae7898475e author: Daniel Vetter <daniel.vetter@ffwll.ch> committed_date: Mon Oct 10 08:36:38 2016 +0000 'drm-intel-nightly: 2016y-10m-10d-08h-36m-04s UTC integration manifest' libdrm-2.4.71 9e24d0c from git://anongit.freedesktop.org/mesa/drm mesa: mesa-12.0.0 8b06176 from git://anongit.freedesktop.org/mesa/mesa cairo 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo xorg-server-1.18.99.901-80 5dcb066 from git://git.freedesktop.org/git/xorg/xserver xf86-video-intel 2.99.917-720 388fd4a from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel libva-1.7.2-38 3b7e499 from git://git.freedesktop.org/git/vaapi/libva vaapi-intel-driver: 1.7.2-140 852cea1 from git://git.freedesktop.org/git/vaapi/intel-driver External screen: Dell U2311hb (DP) I propose we close it until new criteria is defined and it will be checked again using a production device. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.