Bug 99147

Summary: xorg hangs at initial startup on linux-4.9.0
Product: xorg
Component: Driver/nouveau
Status: RESOLVED DUPLICATE
Severity: normal
Priority: medium
Version: unspecified
Hardware: Other
OS: All
Reporter: Gleb Nemshilov <gleb>
Assignee: Nouveau Project <nouveau>
QA Contact: Xorg Project Team <xorg-team>
CC: bastian.beischer, gleb, peter
Attachments:
Description                 Flags
nouveau                     none
kernel syslog               none
1st attempt: before lscpi   none
1st attempt: after lscpi    none
bisect.log                  none

Description Gleb Nemshilov 2016-12-19 19:48:55 UTC
Created attachment 128563 [details]
nouveau

The issue started to appear after upgrading to kernel 4.9.0. Before that, on 4.8.14, everything seemed to be fine.

I have a Lenovo B560 laptop with an Intel Core i5 560M, an Iron Lake GPU and an Nvidia 310M, so it's an Optimus laptop. I have an external monitor connected to the HDMI port.

Components with which the problem could be reproduced:
- Gentoo (almost entirely stable packages)
- mesa 12.0.1 or 13.0.2
- libdrm 2.4.68 or 2.4.73
- xorg 1.18.4 or 1.19.0
- intel xf86 driver snapshot or modesetting driver
- nouveau 1.0.12 or 1.0.13
- libinput 1.5.3, libevdev 1.5.2
- xfce, polkit, consolekit, dbus, etc.


Just after all services have started and lightdm is expected to show its login screen, I can only see the console login prompt, and nothing appears on tty7 either.

Xorg.log.0 shows no errors, and dmesg is also empty (even with debug level 4). I can't reboot the system after that because Xorg is shown with a red D (uninterruptible sleep) state and cannot be killed, so I have to use SysRq.
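
A minimal sketch, assuming a Linux /proc filesystem, of how the processes stuck in the D state could be listed without a process viewer. The function names are illustrative only and not part of any tool mentioned in this report.

```python
from pathlib import Path


def parse_state(stat_line: str) -> tuple[int, str, str]:
    """Return (pid, command name, state letter) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split around the LAST closing parenthesis.
    head, _, tail = stat_line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    return int(pid_text), comm, tail.split()[0]


def uninterruptible_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, name) of every process currently in state D."""
    found = []
    for stat_file in Path(proc_root).glob("[0-9]*/stat"):
        try:
            pid, comm, state = parse_state(stat_file.read_text())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while it was being read
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

A process that stays in this list across several runs is blocked inside the kernel, which would match an Xorg that cannot be killed.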

Blacklisting the nouveau driver helped: I can boot into the system normally, and after that I can load the nouveau module manually and even restart Xorg.

Xorg.log.0 stops at the following line:
[    14.361] (II) xfree86: Adding drm device (/dev/dri/card1)

Attached is the kernel syslog filtered for nouveau (I enabled the debug level only once while the problem was occurring).
Comment 1 Gleb Nemshilov 2016-12-19 19:49:42 UTC
Created attachment 128564 [details]
kernel syslog
Comment 2 Karol Herbst 2016-12-19 20:01:58 UTC
Does booting with nouveau not blacklisted and with "nouveau.runpm=0" help?
Comment 3 Gleb Nemshilov 2016-12-19 20:12:54 UTC
Yes, the kernel boots with this option. The only disadvantage is that in this case the Nvidia GPU will always stay powered on; this could probably be solved with bbswitch.
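
A minimal sketch, assuming the option was passed on the kernel command line, of how to confirm that "nouveau.runpm=0" actually reached the running kernel by reading /proc/cmdline. The helper name is hypothetical, and the simple whitespace split does not handle quoted values. If the option is set through a modprobe configuration file instead, it will not show up here.

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of the last `name=value` token, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val  # later occurrences override earlier ones
    return value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as handle:
            cmdline = handle.read()
    except OSError:
        cmdline = ""
    print("nouveau.runpm =", kernel_param(cmdline, "nouveau.runpm"))
```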
Comment 4 Karol Herbst 2016-12-19 21:34:40 UTC
(In reply to Gleb from comment #3)
> Yes, kernel boots with this option. The only disadvantage is that in this
> case nvidia GPU will always stay powered on, probably could be solved with
> bbswitch.

bbswitch won't help if nouveau is loaded.

If this is indeed a regression in 4.9, could you try and bisect this?
Comment 5 Gleb Nemshilov 2016-12-19 21:58:16 UTC
Well, I may need a guide, or at least a short explanation of what to do, as this is completely new to me. Will this guide work for this task?
https://wiki.gentoo.org/wiki/Kernel_git-bisect
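
As a rough illustration of what a bisection does, here is a minimal sketch of the underlying binary search. It assumes a simplified linear list of commits (real kernel history has merges, which git bisect handles itself) and a hypothetical `is_bad` callback that stands for "build this kernel, boot it, and check whether Xorg hangs".

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str],
              is_bad: Callable[[str], bool]) -> tuple[str, int]:
    """Return the first bad commit and the number of kernels tested.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression was introduced after mid
    return commits[bad], tests


if __name__ == "__main__":
    # Hypothetical history: c0 stands for the good release, c15 for the bad one.
    history = [f"c{i}" for i in range(16)]
    print(first_bad(history, lambda c: int(c[1:]) >= 11))
```

Each test halves the remaining range, so even the several thousand commits between two kernel releases need only a dozen or so builds.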
Comment 6 Peter Wu 2016-12-19 22:02:56 UTC
Nothing unusual in the logs. Can you reproduce the lockup issue if you boot to a console (disable sddm), then invoke "lspci", wait for at least 5 seconds, then run "lspci" again? (Have a look at dmesg for unusual messages.)

If lspci hangs, then there is a problem (and hopefully dmesg is more helpful). Can you also confirm that downgrading just the kernel (while keeping the rest of userspace the same) makes the problem disappear?
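
The test above can be sketched as a small script: run a command, wait a few seconds, run it again, and report whether each run finished. This is only an illustration; the function names and timeouts are assumptions, and the pause is presumably there to give the idle Nvidia GPU time to runtime-suspend before the second run.

```python
import subprocess
import sys
import time


def finishes_within(command: list[str], timeout_s: float) -> bool:
    """Start `command` and report whether it exits within `timeout_s` seconds."""
    proc = subprocess.Popen(command, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return True
        time.sleep(0.1)
    # Deliberately no wait() here: a process stuck in state D ignores
    # signals, so waiting for it would hang this script as well.
    return False


def probe_twice(command: list[str], pause_s: float = 6.0,
                timeout_s: float = 10.0) -> list[bool]:
    """Run `command`, pause, then run it again; one result per run."""
    results = [finishes_within(command, timeout_s)]
    if results[0]:
        time.sleep(pause_s)
        results.append(finishes_within(command, timeout_s))
    return results


if __name__ == "__main__":
    print(probe_twice(sys.argv[1:] or ["lspci"]))
```

A `False` entry means that run did not return in time, which is the hang the test is looking for.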
Comment 7 Gleb Nemshilov 2016-12-19 22:33:00 UTC
lspci doesn't hang when I invoke it from the console. I also tried startx and startxfce4 from the console and they worked, so is it probably something else?


I can confirm that the issue occurred on 4.9.0 and did not occur on 4.8.14 with the same userspace. I only started upgrading userspace after multiple unsuccessful reboots into 4.9.0. So I had the stable packages and tested 4.8.14 (no problem) and 4.9.0 (problem persists). Then I upgraded the packages and tested again on 4.8.14 (no problem) and 4.9.0 (problem persists).


At some point before testing lspci I also recompiled 4.9.0 (not specifically for that; I will try to revert the change) so that nouveau is built into the kernel rather than as a module, but it still has the same issue. Does it matter in which form nouveau is built?


I want to clarify that the only process that hangs is Xorg (and probably another one that I noticed just now: libvirtd, which also has the red D symbol). The system continues to work; I have access to the console and can run things from there.
Comment 8 Gleb Nemshilov 2016-12-20 14:17:54 UTC
Created attachment 128590 [details]
1st attempt: before lscpi

TL;DR `lspci` hangs on first invocation

I tried a few more things, step by step:
1) removed kernel 4.9.0, its modules, config, etc.
2) tried 4.8.15 -- everything is fine, no hanging
3) installed 4.9.0 from scratch:
- redownloaded the sources (gentoo-sources)
- copied the old config from 4.8.{14,15}, so nouveau is built as an external module
- ran make silentoldconfig, pressing <enter> for every new option
- built and installed the kernel with the same steps as before

Conclusion:
- 4.8.14 -- OK
- 4.8.15 -- OK
- 4.9.0 -- fail (xorg hangs)

Then I tried the suggested steps:
- disabled xdm, so the system boots to a text console
- in the 2nd attempt, additionally disabled libvirtd

In both cases `lspci` just hangs without any output, with a red D status. In the first attempt libvirtd also had a red D status; in the second, libvirtd was not started.

The userspace was the same for all of these steps:
- mesa-13.0.2
- libdrm-2.4.74
- xorg-1.19.0
- xf86-video-intel (snapshot from 12 December 2016)
- xf86-video-nouveau-1.0.13
- libinput-1.5.3
- libevdev-1.5.2

dmesg is attached, captured with "drm.debug=0x1e log_buf_len=1M": two attempts, with 2 log files (before invoking lspci and after).
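
For reference, a minimal sketch of how the drm.debug bitmask breaks down into message categories. The bit names below are an assumption taken from the drm module's parameter description for kernels of this era and should be checked against `modinfo drm` on the running kernel.

```python
# Assumed category bits of the drm.debug module parameter (verify locally).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask: int) -> list[str]:
    """Return the names of the debug categories enabled by `mask`."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0x1E))
```

Under that assumption, 0x1e enables everything in this table except the CORE and VBL messages.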
Comment 9 Gleb Nemshilov 2016-12-20 14:18:52 UTC
Created attachment 128591 [details]
1st attempt: after lscpi

I think those logs will be enough. If necessary I can attach more, but I think the 2nd attempt looks the same.
Comment 10 Gleb Nemshilov 2016-12-21 14:57:20 UTC
Created attachment 128607 [details]
bisect.log

So I bisected the kernel and came to the same conclusion as in this bug: https://bugs.freedesktop.org/show_bug.cgi?id=98690

bisect.log is attached. I assume that this bug can probably be closed as a duplicate and I can move to the earlier one?
Comment 11 Peter Wu 2017-01-30 14:17:35 UTC
Seems related indeed, marking as dup.

*** This bug has been marked as a duplicate of bug 98690 ***
