| Field | Value |
|---|---|
| Summary | lspci blocks forever with a GP107M |
| Product | xorg |
| Component | Driver/nouveau |
| Status | RESOLVED MOVED |
| Severity | normal |
| Priority | medium |
| Version | unspecified |
| Hardware | Other |
| OS | All |
| Reporter | Kenneth Graunke <kenneth> |
| Assignee | Nouveau Project <nouveau> |
| QA Contact | Xorg Project Team <xorg-team> |
| CC | andrey+freedesktop, eurbah, jan.public, kai.heng.feng, peter, rhyskidd |
| See Also | https://bugs.freedesktop.org/show_bug.cgi?id=100228 https://bugs.freedesktop.org/show_bug.cgi?id=104621 |

Description
Kenneth Graunke
2017-06-30 17:13:52 UTC
I believe including a full dmesg (from boot) would be helpful to show what may be going wrong.

Note that running with nouveau.runpm=0 will prevent the suspend from happening. However that will, of course, cause the GPU to remain on. [I believe it will remain on without nouveau loading as well, but with this new PCIe PM stuff, I'm not sure anymore.]

With the "new PCIe PM stuff", if nouveau is not loaded and something else enabled automatic runtime PM (via powertop, via TLP or manually by writing "auto" to /sys/bus/pci/devices/.../power/control) for the Nvidia PCI devices, then indeed the problematic ACPI methods could be triggered.

Kenneth, can you upload your acpidump?

    sudo pacman -S acpidump && sudo acpidump > acpidump.txt

Most likely you are affected by https://bugzilla.kernel.org/show_bug.cgi?id=156341

This is because lspci reads the config file, which then triggers a full GPU wake up, which is a silly thing to do in the first place.

What we need is something like this in the kernel: https://github.com/karolherbst/linux/commit/cb918e4c926990dfcfce92e1ecd905e0896de605 and then make use of those in userspace, so that we don't need to read config every time anymore.

I am trying to use 'nouveau' with GP107M [GeForce GTX 1050 Ti Mobile].

With Linux Kernel 4.13.0-16 from Ubuntu 17.10 Artful, 'lspci' systematically makes immediately 1 CPU freeze.

Therefore, I am testing Linux kernels from http://kernel.ubuntu.com/~kernel-ppa/mainline

With Linux kernel 4.14.0-rc7, this issue does NOT show up :

    $ lspci -nn -v -s 1:0
    01:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Ti Mobile] [10de:1c8c] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] GP107M [GeForce GTX 1050 Ti Mobile] [1462:11c8]
        Flags: bus master, fast devsel, latency 0, IRQ 134
        Memory at de000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at df000000 [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nouveau
        Kernel modules: nvidiafb, nouveau

With Linux kernel 4.14.0-rc8, 'lspci' systematically makes immediately the whole computer freeze.

With Linux kernel 4.14.0 (released yesterday), 'lspci' systematically fails to answer, and makes the whole computer freeze after some time.

So, there is probably a regression.

I have also reported this issue at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1729736

With following Linux kernels, 'lspci' systematically fails to answer, and makes the whole machine immediately freeze :
- 4.13.0-17 from Ubuntu 17.10 (Artful)
- 4.14.1 from http://kernel.ubuntu.com/~kernel-ppa/mainline

With Linux kernel 4.15.0-041500rc1 from http://kernel.ubuntu.com/~kernel-ppa/mainline, 'lspci' systematically fails to answer, and makes the whole machine freeze after some time.

With Linux kernel 4.15.0-041500rc2 from http://kernel.ubuntu.com/~kernel-ppa/mainline, 'lspci' systematically fails to answer, and makes the whole machine freeze after some time.

With Linux kernel 4.15.0-041500rc3 from http://kernel.ubuntu.com/~kernel-ppa/mainline, 'lspci' systematically fails to answer, and makes the whole machine freeze after some time.

Besides, inside 'kern.log', I have detected following messages :

    nouveau 0000:01:00.0: DRM: BIT table 'A' not found
    nouveau 0000:01:00.0: DRM: BIT table 'L' not found
    nouveau 0000:01:00.0: DRM: Pointer to TMDS table invalid

@Étienne Could you please provide the information that was asked in comment #1 and comment #2 of this bug report?
Adding `nouveau.runpm=0` to the kernel command line should avoid the freeze but will prevent the NVIDIA card from being suspended.

Looking at the bug report mentioned in comment #2, could you try booting with `acpi_osi=! acpi_osi="Windows 2009"` and/or `acpi_rev_override=5` on the kernel command line (without `nouveau.runpm=0`)?

Created attachment 136105
lspci for GP107M [GeForce GTX 1050 Ti Mobile]
Many thanks to Pierre Moreau for his suggestions of options for the kernel command line:
- Adding just 'nouveau.runpm=0' prevents the whole machine from freezing, but 'nouveau' FAILS to drive an external display at 3840 x 2160 and 60 Hz through DisplayPort.
- Adding just 'acpi_rev_override=5' does NOT prevent the whole machine from freezing.
- Adding just 'acpi_osi=! acpi_osi="Windows 2009"' allows 'lspci' to succeed, and 'nouveau' successfully drives an external display at 3840 x 2160 and 60 Hz through DisplayPort.
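
When comparing results like the ones above across boots, it can help to confirm which of these parameters the running kernel actually received. The following is a minimal Python sketch, not something posted in this report: `parse_cmdline` and `read_cmdline` are hypothetical helper names, and it assumes the command line is exposed at /proc/cmdline and that quoted values such as acpi_osi="Windows 2009" can be split with shell-like quoting rules.

```python
import shlex
from collections import defaultdict


def parse_cmdline(text):
    """Split a kernel command line into {name: [values, ...]}.

    A parameter may appear several times (acpi_osi does in this thread),
    so every name maps to a list. Bare flags without '=' get the value None.
    Assumption: shell-like quoting is close enough to how the value was
    quoted by the boot loader.
    """
    params = defaultdict(list)
    for token in shlex.split(text):
        name, sep, value = token.partition("=")
        params[name].append(value if sep else None)
    return dict(params)


def read_cmdline(path="/proc/cmdline"):
    """Parse the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return parse_cmdline(f.read())
```

On a booted system, `read_cmdline().get("nouveau.runpm")` would return `["0"]` if that parameter was passed, and `None` if it was not.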
With Linux kernel 4.16.0-041600rc5 from http://kernel.ubuntu.com/~kernel-ppa/mainline, 'lspci' systematically fails to answer, and makes the whole machine freeze after some time.

With Linux kernel 4.17.0-041700rc1 from http://kernel.ubuntu.com/~kernel-ppa/mainline, graphical login fails, and the machine is frozen.

With Linux kernel 4.17.0-041700rc1 from http://kernel.ubuntu.com/~kernel-ppa/mainline, I confirm that systematically :
- Inside a Linux console, 'lspci' fails to answer, and makes the whole machine immediately freeze.
- Graphical login fails, and makes the whole machine immediately freeze.

I'm experiencing the same issue on an XPS 9560 with Ubuntu 18.04 (same symptoms, same dmesg output). Any new information required?

With Linux kernel 4.18.0-041800rc1 from http://kernel.ubuntu.com/~kernel-ppa/mainline : I tried to open a Linux console with Ctrl Alt F2, but did NOT succeed. Systematically, graphical login fails, and makes the whole machine immediately freeze.

Created attachment 143031
dmesg from thinkpad x1 extreme / GTX 1050 Ti on kernel 4.20 with hybrid graphics
I may be blind but I haven't seen anyone attaching full dmesg output, sorry if I missed it.
The laptop in question is Thinkpad X1 Extreme with NVIDIA Corporation GP107M [GeForce GTX 1050 Ti Mobile] (rev a1). It has a BIOS setting to switch between "Hybrid" and "Discrete only" graphics.
This dmesg output is from a boot with "Hybrid" graphics, where running 'lspci' hangs and causes the system fans to spin. X server hangs on start too (using modesetting DDX)
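
Since 'lspci' itself hangs in this configuration, the runtime PM settings mentioned earlier in this report (the power/control file under /sys/bus/pci/devices/...) can be read from sysfs instead. Below is a minimal Python sketch under stated assumptions: `runtime_pm_state` is a hypothetical helper name, the PCI address is a parameter that has to be supplied for the machine at hand, and it relies on the standard power/control and power/runtime_status sysfs attributes. Whether reading them avoids waking the GPU on an affected machine is an assumption and was not verified in this thread.

```python
from pathlib import Path


def runtime_pm_state(address, sysfs_root="/sys/bus/pci/devices"):
    """Return the runtime PM attributes of one PCI device as a dict.

    address    -- PCI address in sysfs form, e.g. "0000:01:00.0"
    sysfs_root -- overridable so the function can be tried on a fake tree

    Attributes that are missing or unreadable are reported as None
    instead of raising, because not every kernel exposes all of them.
    """
    power_dir = Path(sysfs_root) / address / "power"
    state = {}
    for attr in ("control", "runtime_status"):
        try:
            state[attr] = (power_dir / attr).read_text().strip()
        except OSError:
            state[attr] = None
    return state
```

For example, `runtime_pm_state("0000:01:00.0")` uses the address that appears in the kern.log lines quoted earlier; on another machine the address may differ. A "control" value of "auto" means runtime PM is allowed for the device, "on" means it is kept powered.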
Created attachment 143032
dmesg from thinkpad x1 extreme / GTX 1050 Ti on kernel 4.20 with discrete only graphics
Not sure if that helps with the diagnostics, but on the same Thinkpad X1 Extreme laptop with "Discrete only" BIOS setting, lspci works fine, X starts and works, but there's a timeout logged by nouveau.
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/358.