Bug 96319 - [CHT] Starting X on Surface 3 takes ~15 seconds
Summary: [CHT] Starting X on Surface 3 takes ~15 seconds
Status: CLOSED INVALID
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-06-02 02:00 UTC by Stephen Just
Modified: 2017-07-24 22:41 UTC (History)
1 user (show)

See Also:
i915 platform: BSW/CHT
i915 features:


Attachments
dmesg with drm-intel-nightly (Jun 1 2016) (87.08 KB, text/x-log)
2016-06-02 02:00 UTC, Stephen Just
no flags Details
dmesg with timestamps (99.43 KB, text/plain)
2016-06-02 18:31 UTC, Stephen Just
no flags Details
xorg log with timestamps (44.12 KB, text/plain)
2016-06-02 18:31 UTC, Stephen Just
no flags Details
dmesg with timestamps, after blacklist (89.26 KB, text/plain)
2016-06-02 19:06 UTC, Stephen Just
no flags Details
xorg log with timestamps, after blacklist (44.12 KB, text/plain)
2016-06-02 19:07 UTC, Stephen Just
no flags Details
dmesg, Comment 10 (96.46 KB, text/plain)
2016-06-02 20:08 UTC, Stephen Just
no flags Details
xorg log, Comment 10 (22.83 KB, text/plain)
2016-06-02 20:09 UTC, Stephen Just
no flags Details
dmesg, gdm, comment 14 (348.71 KB, text/plain)
2016-06-02 20:59 UTC, Stephen Just
no flags Details
dmesg, startx, comment 14 (211.01 KB, text/plain)
2016-06-02 21:00 UTC, Stephen Just
no flags Details
perf timechart output (3.14 MB, application/octet-stream)
2016-06-03 20:24 UTC, Stephen Just
no flags Details

Description Stephen Just 2016-06-02 02:00:03 UTC
Created attachment 124252 [details]
dmesg with drm-intel-nightly (Jun 1 2016)

When starting X on the Surface 3, either through starting GDM or a window manager directly, it takes approximately 15-20 seconds for the display to show up after executing the start command, which seems very slow.

This might be a duplicate to Bug 93929, but since then the dmesg has gotten much less noisy.

It's not clear to me exactly what part of the gpu initialization is taking so long from the logs, there are some gaps of about 2s in logging while the card initializes, but they don't seem to correspond with any one log message.
Comment 1 Chris Wilson 2016-06-02 09:47:01 UTC
Xorg.0.log would be relevant as well.
Comment 2 Stephen Just 2016-06-02 18:31:23 UTC
Created attachment 124280 [details]
dmesg with timestamps
Comment 3 Stephen Just 2016-06-02 18:31:59 UTC
Created attachment 124281 [details]
xorg log with timestamps
Comment 4 Stephen Just 2016-06-02 18:34:05 UTC
I've attached another dmesg and xorg log with matching timestamps, so that they can hopefully be followed together.
Comment 5 Ville Syrjala 2016-06-02 18:45:58 UTC
> [Thu Jun  2 08:23:21 2016] [Firmware Bug]: battery: (dis)charge rate invalid.

Tried removing the battery driver? IIRC that was bogging down on of my CHV machines at some point. That was a long time ago, so might have just been an issue with early firmware. Audio driver was another culprit, but that too was long ago.
Comment 6 Stephen Just 2016-06-02 19:05:42 UTC
I blacklisted that, and several other "noisy" modules (mwifiex_pcie, snd_soc_*), and did not seem to help. I'll update the xorg log and dmesg.
Comment 7 Stephen Just 2016-06-02 19:06:20 UTC
Created attachment 124283 [details]
dmesg with timestamps, after blacklist
Comment 8 Stephen Just 2016-06-02 19:07:24 UTC
Created attachment 124284 [details]
xorg log with timestamps, after blacklist
Comment 9 Chris Wilson 2016-06-02 19:47:53 UTC
Hmm, just happens the last thing in the log before the pause is libinput, then logind transferring a bunch of fd.

Maybe

Section "ServerFlags"
  Option "NoAutoAddDevices"
EndSection

?

You won't have any input unless you manually add one, but for a quick test of whether it starts any quicker may be worthwhile if you have a remote login. Or running Xorg under strace -f -tt and see if there is a long wait for a device.
Comment 10 Stephen Just 2016-06-02 20:07:54 UTC
(In reply to Chris Wilson from comment #9)
> Maybe
> 
> Section "ServerFlags"
>   Option "NoAutoAddDevices"
> EndSection

I tried this, and X didn't come up any faster. Will attach another log pair.
Comment 11 Stephen Just 2016-06-02 20:08:37 UTC
Created attachment 124285 [details]
dmesg, Comment 10
Comment 12 Stephen Just 2016-06-02 20:09:11 UTC
Created attachment 124286 [details]
xorg log, Comment 10
Comment 13 Chris Wilson 2016-06-02 20:25:50 UTC
The logs more or less say nothing is happening. You see the same delay when launching X with startx? Could you use drm.debug=0xf (so we have atomics and ioctls) then

$ echo -e "\nstartx\n" > /dev/kmsg
$ startx
...
in a terminal or whatever: $ echo -e "\nhello world\n" > /dev/ksmg

just so we have timestamps on everything. You may need to increase the dmesg logbuffer size.
Comment 14 Stephen Just 2016-06-02 20:59:05 UTC
Actually, "startx" seems to be pretty immediate, it's just starting gdm that's really really slow.

I've got a dmesg for each case. In the gdm case, there are several second breaks above some drm_stob_open calls... In the "startx" case that just launches twm, there are no large breaks in logging.

Search for "startx" as suggested. (With gdm, I switched back to a VT to add the "hello world" log.)
Comment 15 Stephen Just 2016-06-02 20:59:55 UTC
Created attachment 124289 [details]
dmesg, gdm, comment 14
Comment 16 Stephen Just 2016-06-02 21:00:25 UTC
Created attachment 124290 [details]
dmesg, startx, comment 14
Comment 17 Chris Wilson 2016-06-03 18:14:18 UTC
There are some long delays in there (setgama > 0.5s, getconnector > 0.2s), but the pauses do not correlate with device or driver activity. I can confidently say userspace is just being slow. Maybe stalling for disk?
Comment 18 Stephen Just 2016-06-03 18:58:55 UTC
(In reply to Chris Wilson from comment #17)
> There are some long delays in there (setgama > 0.5s, getconnector > 0.2s),
> but the pauses do not correlate with device or driver activity. I can
> confidently say userspace is just being slow. Maybe stalling for disk?

It's possible. I tried caching all of /bin and /lib in memory prior to starting GDM (using vmtouch) and it did bring the delay down to approx 10s, and I am using a slow disk.

If you guys are confident that this is just the result of a slow disk, then I must apologize for the noise.

Thanks for your help!
Comment 19 Chris Wilson 2016-06-03 19:02:51 UTC
Perhaps "perf timechart"? That's certainly a tool I wish got move love.
Comment 20 Stephen Just 2016-06-03 20:24:36 UTC
Created attachment 124309 [details]
perf timechart output

I'm not totally sure I'm reading the output correctly, but as far as I can tell, from the point where GDM is launched until when it shows up on screen, there is pretty much constant disk IO.
Comment 21 Chris Wilson 2016-06-03 23:08:28 UTC
Yes, it does look that way. stracing the suspects may reveal a clue in the pattern of syscall that are slow (which files etc), or investigate the blktrace output, I wonder if that will you tell which driver is blocking and why. More than likely it is one slow device clogging the queues. Or the disk-io is just a red herring.
Comment 22 Stephen Just 2016-06-14 03:55:03 UTC
I installed to a faster medium, and now GDM starts almost immediately.

So, it seems like it was just a slow disk after all.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.