Summary: | [NV92] Regression in Linux 3.15: GPU lockup after suspend | |
---|---|---|---
Product: | xorg | Reporter: | Agustín Dall'Alba <agustin>
Component: | Driver/nouveau | Assignee: | Nouveau Project <nouveau>
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team>
Severity: | normal | |
Priority: | medium | CC: | bugs, freedesktop, freedesktop, gordon, svenjoac
Version: | git | |
Hardware: | All | |
OS: | Linux (All) | |
Description
Agustín Dall'Alba
2014-07-10 00:35:07 UTC
Created attachment 102509: Logs for git kernel with noaccel=1 nofbaccel=1
Created attachment 102510: Logs for Linux 3.15.4 with commit ecf24de reverted
I've experienced a similar issue when resuming from suspend-to-RAM. The screen was blank, and dmesg showed several kernel messages from the nouveau module. I'm running Linux 3.16.0 (gentoo-sources package from Gentoo) with xorg-server 1.16.0, x11-drivers/xf86-video-nouveau-1.10.0-r1 and libdrm-2.4.54. I will attach part of /var/log/messages with the nouveau errors.

Created attachment 104373: kernel messages during wake-up from resume

Forgot to mention my card model:
# lspci -v|fgrep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation G84 [GeForce 8600 GT] (rev a1) (prog-if 00 [VGA controller])

Same problem here on NV86 [GeForce 8500 GT]; reverting commit ecf24de071f4f6cea79ecef5d990794df5875ee1 in 3.16.0 helps.

Update: I got tired of reverting the ecf24de commit on every Linux update, so I tried booting with nouveau.nofbaccel=1 (instead of nofbaccel=1). It works fine. The system still does not resume properly without it on Linux v3.16.1, but that boot option is a better workaround than reverting.

Same issue. Dell M4800 with QHD+ display -- NVIDIA Corporation GK106GLM [Quadro K2100M] (rev a1), 3.16.6-gentoo (I tried 3.17, that didn't even give me a usable display). None of the workarounds were effective for me: nouveau.nofbaccel=1 causes suspend to fail, and so did reverting ecf24de071f4f6cea79ecef5d990794df5875ee1:
A dependency job for suspend.target failed. See 'journalctl -xn' for details.
...
Oct 25 15:21:16 hostname kernel: WARNING: CPU: 0 PID: 2852 at lib/iomap.c:43 bad_io_access+0x36/0x38()
Oct 25 15:21:16 hostname kernel: Bad IO access at port 0x24 (outl(val,port))

Another update: I tried running with a secondary monitor. Unfortunately, under that setup the nouveau.nofbaccel=1 workaround doesn't cut it anymore, and only one monitor works after resume. Unplugging and replugging, or using xrandr, after this has happened doesn't make the other monitor work, and once even left me with no screen.
I found some new kernel messages, in particular:
<6>[ 0.336621] nouveau [ PFB][0000:01:00.0] RAM type: GDDR3
<6>[ 0.336623] nouveau [ PFB][0000:01:00.0] RAM size: 512 MiB
<3>[ 0.336620] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000007 FAULT at 0x00e180
--snip--
<6>[ 0.365519] nouveau [ DRM] VRAM: 512 MiB
<6>[ 0.365521] nouveau [ DRM] GART: 1048576 MiB
--snip--
<3>[ 0.366886] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00e070
<3>[ 0.368257] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00e070
right after nouveau loads, another:
<6>[ 75.935933] nouveau [ DRM] suspending console...
<6>[ 75.935944] nouveau [ DRM] suspending display...
<6>[ 75.936012] nouveau [ DRM] evicting buffers...
<6>[ 76.206568] nouveau [ DRM] waiting for kernel channels to go idle...
<6>[ 76.206573] nouveau [ DRM] suspending client object trees...
<6>[ 76.207261] nouveau [ DRM] suspending kernel object tree...
<3>[ 76.267516] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00e070
immediately before suspend, and:
<6>[ 78.110864] nouveau [ DRM] re-enabling device...
<6>[ 78.110870] nouveau [ DRM] resuming kernel object tree...
<6>[ 78.110882] nouveau [ VBIOS][0000:01:00.0] running init tables
<3>[ 78.200040] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00e074
<6>[ 78.274292] nouveau [ VOLT][0000:01:00.0] GPU voltage: 1000000uv
<6>[ 78.274303] nouveau [ PTHERM][0000:01:00.0] fan management: automatic
<6>[ 78.274378] nouveau [ CLK][0000:01:00.0] --: core 399 MHz shader 810 MHz memory 499 MHz
<3>[ 78.275977] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00e070
<3>[ 78.277301] nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00e070
<6>[ 78.277474] nouveau [ DRM] resuming client object trees...
<6>[ 78.277902] nouveau [ DRM] resuming display...
on resume. Maybe this is another bug?
So now I'm using linux-lts 3.14.22. No problems there; suspend and multi-monitor setups work great.

Ed,
Considering that the workarounds mentioned do not work in your case and that you have a different card (the reporter has NV92, while yours is GK106), we can safely conclude that you're having a different issue. Please open another bug report and let us know if it is a regression, and if so, which commit broke it.
Agustín,
These two should be non-fatal and the fix for them is in 3.18. It should end up in 3.16 and 3.17 as well.
> FAULT at 0x00e070
> FAULT at 0x00e074
Now this one, I have no idea. Do you get this error with 3.14 and dual monitors?
> FAULT at 0x00e180
Linux 3.17 includes quite a few fixes in the area of s/r; can you give it a try?

On 3.14.22 I get no FAULTs and suspend works fine.
On Linux 3.17.1 I get a FAULT at 0x00e070 and 0x00e074 on boot, suspend, resume, and when plugging in the second monitor for the first time. But I can't reproduce a FAULT at 0x00e180 in any way. Checking the logs, it looks like it's quite rare (it happens every twenty or so FAULTs) and unrelated to the second monitor.
If I use nouveau.nofbaccel=1, only one of the monitors comes back after resume. If I don't, I get the garbled display, 'GPU lockup' and PGRAPH errors as in the original post.
I'm downloading Linux mainline now to test.

The upstream commit addressing the e07{0,4} messages (ignore the typo in the commit message) is
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/nouveau?id=b485a7005faba38286bc02ab1d80e2cbf61c1002
^^ is just in case 3.18 causes some other unwanted behaviour.

Brilliant, Linux 3.18-rc2 resumes both monitors with nouveau.nofbaccel=1. :D
So it indeed was a different issue. The original GPU lockup bug is still there, though.

Fixed in Linux 3.19 :)
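
For anyone comparing kernels the way this thread does (3.14.22 vs 3.17.1 vs 3.18-rc2), the PBUS FAULT addresses in a log can be tallied with a short script. This is a minimal sketch and not part of the original report: the function name `count_faults` and the regular expression are assumptions modeled only on the "MMIO write of ... FAULT at ..." lines quoted above, and other kernel versions may format these messages differently.

```python
import re
import sys
from collections import Counter

# Assumption: pattern modeled on the PBUS lines quoted in this report.
# Only "write" faults appear there; "read" is accepted as a guess.
FAULT_RE = re.compile(
    r"MMIO (?:read|write) of 0x[0-9a-fA-F]+ FAULT at 0x(?P<addr>[0-9a-fA-F]+)"
)


def count_faults(lines):
    """Return a Counter mapping each faulting MMIO address to its hit count."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            counts[match.group("addr").lower()] += 1
    return counts


if __name__ == "__main__":
    # Read kernel log text from standard input and print the most
    # frequent faulting addresses first.
    for addr, hits in count_faults(sys.stdin).most_common():
        print(f"0x{addr}: {hits}")
```

Saved under a hypothetical name such as count_faults.py, it could be fed the output of dmesg or one of the attached logs on standard input, which makes it easier to see whether a rare address like 0x00e180 shows up at all on a given kernel.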