Summary: | Memory corruption on Lenovo t440p with runpm | |
---|---|---|---
Product: | xorg | Reporter: | Nikolay Amiantov <nikoamia>
Component: | Driver/nouveau | Assignee: | Nouveau Project <nouveau>
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team>
Severity: | normal | |
Priority: | medium | CC: | dion, jaak, peter, tim
Version: | unspecified | |
Hardware: | Other | |
OS: | All | |
Whiteboard: | | |
i915 platform: | | i915 features: |
Description
Nikolay Amiantov
2014-05-10 15:22:28 UTC
Also, the Dell XPS 15z with recent BIOSes is affected (reported by mattdistro in the GitHub thread).

Just to confirm -- this happens without bumblebee as well, right? Does it happen with the blob driver?

This happens even without any drivers at all: just acpi_call to make _PS3, _PS0 calls.

Manually making acpi calls isn't the most prudent thing to do. Please confirm that this happens
(a) With just nouveau loaded. No bumblebee anywhere at all.
(b) With the blob driver.

Okay -- I should try blob with bumblebee, right?

(In reply to comment #5)
> Okay -- I should try blob with bumblebee, right?

I'm not familiar with the blob situation wrt runtime pm. If they have any runtime pm-style support, please use that instead of bumblebee. If they have no support for that, then I guess it's fine to try it with bumblebee.

(In reply to comment #6)
> I'm not familiar with the blob situation wrt runtime pm. If they have any
> runtime pm-style support, please use that instead of bumblebee. If they have
> no support for that, then I guess it's fine to try it with bumblebee.

I'm not too familiar with it either, but I thought that they haven't added a feature like this yet -- just asked to confirm. I'll try bumblebee then.

I've tested two configurations on kernel 3.14.3, bbswitch 0.8 and nvidia 337.12:
(1) disabled acpi_call and my custom script, disabled bbswitch and bumblebeed (all bumblebee components), modprobe'd nouveau, tried to start X
(2) started bumblebeed, loaded bbswitch, started X without nouveau, ran "primusrun glxgears"
In both cases, I got fs corruption issues, iwlwifi errors and other distinctive errors pointing at memory corruption.

This problem appears to be fixed in recent kernels after adding "Windows 2013" to the kernel's built-in ACPI OSI list. Bisection to the resolving kernel commit shows: https://github.com/Bumblebee-Project/bbswitch/issues/78#issuecomment-48600484

Nikolay, would you mind closing the bug after verifying it's resolved for you?
The OSI fix indeed solves the issue. Closing the bug.

Unfortunately, it wasn't a fix -- we've got another ACPI problem which prevented nvidia from being disabled at all, so everything "started to work". You can find more about the new problem at https://github.com/Bumblebee-Project/bbswitch/issues/78#issuecomment-48768044. I don't think we need another bug for this, do we?

Hi, I also have an affected T440p machine that corrupts everything once runtime PM is enabled or after calling the ACPI method to resume the card. It was stated in the bumblebee GitHub thread that adding "memmap=99G$0x100000000" to the kernel command line fixes the issue on affected systems. My case looks a bit interesting because I have only 4GB of RAM right now, so disabling everything above 4GB should not change behavior. But it does! Adding the memmap= magic fixes the issue for me.

I've tried to compare /proc/iomem with and without boot options and found one difference. Once booted with memmap=99G$0x100000000 I'm getting one large reserved region:

bceff000-18ffffffff : reserved
  bda00000-bf9fffff : Graphics Stolen Memory
  bfa00000-febfffff : PCI Bus 0000:00
    c0000000-d1ffffff : PCI Bus 0000:02
      c0000000-cfffffff : 0000:02:00.0
      d0000000-d1ffffff : 0000:02:00.0
    e0000000-efffffff : 0000:00:02.0
    ...

All PCI devices are inside this one large region. But if I boot with default options, iomem is different:

bceff000-bf9fffff : reserved
  bda00000-bf9fffff : Graphics Stolen Memory
bfa00000-febfffff : PCI Bus 0000:00
  c0000000-d1ffffff : PCI Bus 0000:02
    c0000000-cfffffff : 0000:02:00.0
    d0000000-d1ffffff : 0000:02:00.0
  e0000000-efffffff : 0000:00:02.0
  f0000000-f0ffffff : PCI Bus 0000:02
    f0000000-f0ffffff : 0000:02:00.0
  f1000000-f13fffff : 0000:00:02.0
  f1400000-f14fffff : PCI Bus 0000:04

So now this reserved region starting at bceff000 covers all PCI devices. [ I'm attaching both iomap files ]

To check this I've tried to explicitly reserve the whole region by booting with the memmap=1100M$0xbfa00000 parameter, and got an iomap pretty similar to the "memmap=99G" one.
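The comparison above can be made mechanical. The following is only a minimal sketch, not anything the reporters ran: `memmap_range`, `parse_iomem` and `uncovered_by_reserved` are hypothetical helper names, the parser assumes plain `start-end : name` lines as quoted above, and only the simple `memmap=<size>[KMG]$<start>` form is handled. The sample listings are trimmed copies of the excerpts in this comment.

```python
import re

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def memmap_range(param):
    """Return the inclusive (start, end) range marked reserved by memmap=<size>$<start>."""
    m = re.fullmatch(r"memmap=(\d+)([KMG]?)\$(0x[0-9a-fA-F]+|\d+)", param)
    if m is None:
        raise ValueError(f"unsupported memmap parameter: {param!r}")
    size = int(m.group(1)) * UNITS[m.group(2)]
    start = int(m.group(3), 0)
    return start, start + size - 1


def parse_iomem(text):
    """Parse 'start-end : name' lines (hex addresses) into (start, end, name) tuples."""
    entries = []
    for line in text.splitlines():
        m = re.match(r"\s*([0-9a-f]+)-([0-9a-f]+) : (.+)", line)
        if m:
            entries.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3).strip()))
    return entries


def uncovered_by_reserved(entries):
    """Names of non-reserved entries that are not fully inside any 'reserved' range."""
    reserved = [(s, e) for s, e, name in entries if name == "reserved"]
    return [
        name
        for s, e, name in entries
        if name != "reserved" and not any(rs <= s and e <= r_end for rs, r_end in reserved)
    ]


# Trimmed copies of the two listings quoted in this comment.
WITH_MEMMAP = """\
bceff000-18ffffffff : reserved
bda00000-bf9fffff : Graphics Stolen Memory
bfa00000-febfffff : PCI Bus 0000:00
c0000000-cfffffff : 0000:02:00.0
"""
DEFAULT = """\
bceff000-bf9fffff : reserved
bda00000-bf9fffff : Graphics Stolen Memory
bfa00000-febfffff : PCI Bus 0000:00
c0000000-cfffffff : 0000:02:00.0
"""

if __name__ == "__main__":
    for param in ("memmap=1100M$0xbfa00000", "memmap=99G$0x100000000", "memmap=99G$0x40000000"):
        start, end = memmap_range(param)
        print(f"{param}: {start:#x}-{end:#x}")
    print(uncovered_by_reserved(parse_iomem(WITH_MEMMAP)))  # []
    print(uncovered_by_reserved(parse_iomem(DEFAULT)))  # ['PCI Bus 0000:00', '0000:02:00.0']
```

By this arithmetic, memmap=1100M$0xbfa00000 spans 0xbfa00000-0x1045fffff, which includes the whole PCI Bus 0000:00 window ending at febfffff. The 0x40000000 variant named in the attachment titles ends at 0x18ffffffff, the same end address as the large reserved region quoted above, while the 0x100000000 variant would end at 0x19bfffffff.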
But the system still crashes after runtime pm.

I was also able to capture the PCI configuration space for the NVIDIA card from Win8 (where everything works). So I can confirm that after acpi_call Windows also shows just 0xFF bytes. But once resumed, it's a bit different from Linux. Both files attached.

Any ideas? Maybe the card is somehow misconfigured?

Thanks

Created attachment 105598 [details]
iomem when booted with memmap=99G$0x40000000
Created attachment 105599 [details]
dmesg when booted with memmap=99G$0x40000000
Created attachment 105600 [details]
iomem default
Created attachment 105601 [details]
dmesg default
Created attachment 105602 [details]
pci config space linux
Created attachment 105603 [details]
pci config space windows
Created attachment 105604 [details]
pci space diff (linux vs win)
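For anyone wanting to repeat the config space comparison, here is a rough sketch of how the "all 0xFF" check and the Linux-vs-Windows diff could be scripted. It is not the method used to produce the attachments above: the helper names are hypothetical, the sysfs path is the standard Linux location for raw config space, and the bus address 0000:02:00.0 in the commented-out call is only assumed from the iomem listing to be the NVIDIA card. The example bytes are synthetic, not taken from the attached dumps.

```python
from pathlib import Path


def read_config(bdf):
    """Read raw PCI config space from sysfs (Linux only; non-root reads return 64 bytes)."""
    return Path(f"/sys/bus/pci/devices/{bdf}/config").read_bytes()


def looks_powered_off(config):
    """True if every byte reads back as 0xFF, as described for the card after _PS3."""
    return len(config) > 0 and all(b == 0xFF for b in config)


def diff_config(a, b):
    """List (offset, byte_in_a, byte_in_b) for every offset where two dumps differ."""
    return [(off, x, y) for off, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    # Synthetic stand-ins for two 64-byte dumps; real data would come from
    # read_config("0000:02:00.0") and from a dump captured under Windows.
    linux_dump = bytes(range(64))
    windows_dump = bytearray(linux_dump)
    windows_dump[4] = 0x07  # pretend the command register low byte differs

    print(looks_powered_off(bytes([0xFF]) * 64))  # True
    for off, x, y in diff_config(linux_dump, bytes(windows_dump)):
        print(f"offset {off:#04x}: linux={x:#04x} windows={y:#04x}")
```

Diffing by offset rather than eyeballing hex dumps makes it easier to map each differing byte back to a register in the PCI header.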
Any ideas on this?

Has anybody tried the new BIOS 1.27-1.28? WARN: Once updated, there will be no way to revert to pre-1.26.

@doudou on GitHub managed to solve this problem[1][2] -- Nouveau can port the same fix, I think.

[1]: https://github.com/Bumblebee-Project/bbswitch/issues/78#issuecomment-67741841
[2]: https://github.com/Bumblebee-Project/bbswitch/pull/102

Fixed in v4.8-rc1

commit 692a17dcc2922a91c6bcf11b3321503a3377b1b1
Author: Peter Wu <peter@lekensteyn.nl>
Date:   Fri Jul 15 15:12:18 2016 +0200

    drm/nouveau/acpi: fix lockup with PCIe runtime PM

It was confirmed to fix the memory corruption; if it still happens, please re-open.