Summary: | [GF108][Regression] Unable to handle NULL pointer dereference in nouveau_mem_host since kernel 4.15.3
---|---
Product: | xorg
Component: | Driver/nouveau
Status: | RESOLVED FIXED
Severity: | normal
Priority: | medium
Version: | unspecified
Hardware: | x86-64 (AMD64)
OS: | All
Reporter: | Dominik 'Rathann' Mierzejewski <dominik>
Assignee: | Nouveau Project <nouveau>
QA Contact: | Xorg Project Team <xorg-team>
CC: | auxsvr, gabriel, hugh, philip.raets, tiwai
See Also: | https://bugzilla.opensuse.org/show_bug.cgi?id=1082308 https://bugzilla.redhat.com/show_bug.cgi?id=1551401 https://bugs.freedesktop.org/show_bug.cgi?id=105626 https://bugs.freedesktop.org/show_bug.cgi?id=105319

Description
Dominik 'Rathann' Mierzejewski
2018-02-20 11:06:05 UTC
After logging in I get no output on the second screen attached to the HDMI port, and the Xorg session doesn't start fully. I can only see the wallpaper on the built-in display. The mouse cursor moves but doesn't respond to clicks. The machine remains accessible via ssh. There are no errors or warnings in the Xorg log. 4.14.18-300.fc27.x86_64 is the last working Fedora kernel.

Thank you for the bug report. Is this easily reproducible (and if yes, how)? Otherwise, what were you doing when this happened?

It happens on every login. My user display configuration switches the HDMI output on and the internal LCD off, while the login screen (lightdm) uses the internal LCD only. Immediately after logging in, the display freezes when it tries to drive the HDMI output. The mouse cursor still moves. Apparently the internal LCD is driven by i915 directly while the HDMI output goes via nouveau.

Created attachment 137532 [details]
Log from journalctl
I have the same error in the kernel with the nouveau driver on openSUSE Tumbleweed (kernel 4.15.3 and 4.15.4). This is on a desktop (Dell Optiplex 790 with an NVIDIA GT218 (NVS300), dual display). The errors seem to happen at random on my system; I can't pinpoint an action that triggers them. Attached are the logs from journalctl. My bug report on openSUSE: https://bugzilla.opensuse.org/show_bug.cgi?id=1082308

Greetings, Philip

I got here by googling for "IP: nouveau_mem_host+0x47/0x1b0 [nouveau]". This leads me to think that my problem is (partly) this problem. As I type this (using a different computer) my screen is hung, but my computer is working. The machine is running Fedora 27 with all updates (except the latest proprietary nvidia driver, which fails). The kernel is 4.15.4-300.fc27.x86_64.

I'm intending that the nouveau driver be suppressed in favour of the nvidia driver. Historically (i.e. for about 3 years) nouveau didn't work on this setup (GTX 650 driving a Seiki UltraHD monitor at 30Hz). For some reason, nouveau is running and the nvidia driver is not, even though I have these kernel parameters: rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1

In any case, nouveau is now running, and I can log in, but when I start firefox (with a LOT of tabs), nouveau dereferences a NULL and the screen freezes (but not the mouse). I seem to remember that when I tried nouveau in the past, it would hang in a similar way. My hypothesis (untested) was that my large number of firefox tabs would exhaust some nouveau resource. I did not have that problem with the nvidia proprietary driver.

So I have multiple problems, but one of them is this nouveau bug. I don't expect a solution to my other problems to come up in this bz. I will attach a dmesg.

PS: why is the status NEEDSINFO? I don't see where there is an outstanding request for info. I will try to change the status to NEW.

Created attachment 137583 [details]
dmesg from hang on Hugh's system
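
The situation described in the comment above (nouveau running although the kernel command line blacklists it) can be checked mechanically. The following is only a minimal sketch, not part of the original report: the function names are hypothetical, and it assumes the two blacklist parameters quoted above take comma-separated module names and that the loaded-module list is in /proc/modules format.

```python
from typing import Set

# Parameter names as quoted in the comment above; both are assumed to take
# a comma-separated list of module names.
BLACKLIST_PARAMS = ("modprobe.blacklist", "rd.driver.blacklist")


def _normalize(name: str) -> str:
    """Treat '-' and '_' in module names as equivalent, as modprobe does."""
    return name.replace("-", "_")


def blacklisted_modules(cmdline: str) -> Set[str]:
    """Return the module names blacklisted on a kernel command line."""
    found: Set[str] = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in BLACKLIST_PARAMS:
            found.update(_normalize(n) for n in value.split(",") if n)
    return found


def loaded_modules(proc_modules: str) -> Set[str]:
    """Return module names from text in /proc/modules format
    (the module name is the first whitespace-separated field)."""
    return {
        _normalize(line.split()[0])
        for line in proc_modules.splitlines()
        if line.strip()
    }


def loaded_despite_blacklist(cmdline: str, proc_modules: str) -> Set[str]:
    """Return modules that are blacklisted and nevertheless loaded."""
    return blacklisted_modules(cmdline) & loaded_modules(proc_modules)
```

On a live system the two arguments would be the contents of /proc/cmdline and /proc/modules. A non-empty result only confirms the symptom; it does not explain why the blacklist was not honoured.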
Created attachment 137731 [details] [review]
Proposed patch

Could anyone please try the attached patch (from https://github.com/skeggsb/nouveau/pull/1)?

(In reply to D. Hugh Redelmeier from comment #7)
> PS: why is the status NEEDSINFO? I don't see where there is an outstanding
> request for info. I will try to change the status to NEW.

Simply because no one changed the status since the information was provided. :-)

Created attachment 137793 [details]
Bootlog patched kernel 4.15.7

Hi, I've tried a patched kernel provided by Takashi Iwai on openSUSE (see http://bugzilla.opensuse.org/show_bug.cgi?id=1082308), but with it my system crashes at startup. I have included my boot log with that kernel.

(In reply to Pierre Moreau from comment #9)
> Created attachment 137731 [details] [review]
> Proposed patch
>
> Could anyone please try the attached patch (from
> https://github.com/skeggsb/nouveau/pull/1)?

I can confirm that the patch fixes the bug for me when applied to the Fedora kernel (4.15.7-300.fc27 tested this time). Thanks!

See also https://bugzilla.redhat.com/show_bug.cgi?id=1551401

Created attachment 138127 [details]
Log from patched opensuse

I've installed a patched kernel for opensuse (details: https://bugzilla.opensuse.org/show_bug.cgi?id=1082308), but the problem still occurs when opening JPGs like https://www.dropbox.com/s/gex21o67q31aytx/PRP_5808.jpg?dl=0

These are JPGs that I exported from Darktable. Attached is the error from journalctl. I have to log in through ssh with my phone, and then I can force a reboot (the graphics freeze, only the cursor is working).

FYI, the attached patch has been submitted along with fixes to DRM. It doesn't look like it has landed yet, but it might be part of 4.16-rc6.

(In reply to Philip Raets from comment #13)
> Created attachment 138127 [details]
> Log from patched opensuse
>
> I've installed a patched kernel for opensuse (details:
> https://bugzilla.opensuse.org/show_bug.cgi?id=1082308)
>
> But the problem still occurs when opening JPGs like
> https://www.dropbox.com/s/gex21o67q31aytx/PRP_5808.jpg?dl=0
>
> These are JPGs that I exported from Darktable
>
> attached is the error from journalctl
>
> I have to log in through ssh with my phone and then I can force a reboot.
> (the graphics freeze, only the cursor is working)

Since the patch works for the bug report author but not for you, I think you are experiencing another (or an additional) issue. Please open a separate bug report.

And thank you all for your replies and trying out the patch :-)

(In reply to Pierre Moreau from comment #15)
> And thank you all for your replies and trying out the patch :-)

Thank you for the quick fix!

Created attachment 138305 [details]
dmesg with kernel-4.15.11-300.fc27

I am on Fedora 27 x86_64 with the MATE Desktop Environment. The fix posted here has been included in the new package kernel-4.15.11-300.fc27.x86_64.rpm. (Source: https://bugzilla.redhat.com/show_bug.cgi?id=1547037 )

I installed kernel-4.15.11-300.fc27 from the updates-testing repository, but it doesn't resolve the problem.

$ LANG=en dnf list kernel-4.15.11-300.fc27
Failed to set locale, defaulting to C
Last metadata expiration check: 0:00:15 ago on Thu Mar 22 15:50:44 2018.
Installed Packages
kernel.x86_64 4.15.11-300.fc27 @updates-testing

Same freeze after login via lightdm. Dmesg attached.

Created attachment 138354 [details]
journalctl (kernel-4.15.12-301.fc27.x86_64)
Fedora kernel-4.15.12-301.fc27.x86_64 from the updates-testing repository still doesn't resolve the problem.
Display adapter (from lspci) is:
01:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] (rev a1)
Attached is an excerpt from "journalctl -k -b -1 --no-pager --no-hostname".
(In reply to Stefano Biagiotti from comment #18)
> Created attachment 138354 [details]
> journalctl (kernel-4.15.12-301.fc27.x86_64)
>
> Fedora kernel-4.15.12-301.fc27.x86_64 from updates-testing repository still
> doesn't resolve.
>
> Display adapter (from lspci) is:
> 01:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS
> Rev. 2] (rev a1)
>
> Attached is an excerpt from "journalctl -k -b -1 --no-pager --no-hostname".

The patch was reported as working by the person who opened this bug report, so I am changing this bug report back to fixed. Since it does not seem to work for you (and you are using a GPU from a different family, Tesla vs Fermi), you should open a different bug report. There have been other reports of the patch not being enough on another Tesla card (though on a different chipset): you might want to look at https://bugs.freedesktop.org/show_bug.cgi?id=105626 and https://bugs.freedesktop.org/show_bug.cgi?id=105687.

Also, please try to avoid posting excerpts of logs: there can be other errors happening before this NULL pointer dereference, and seeing the different messages output by Nouveau during its initialisation can help shed some light on what is going wrong. For example, there is a bug report, also on G98, of EVO timing out since updating to 4.15 (https://bugs.freedesktop.org/show_bug.cgi?id=105319); maybe you are experiencing that as well?

Should fix the nouveau_mem_host issue: https://github.com/skeggsb/nouveau/commit/bdc36dcf3fe469e6bb2a1366452dcb16b84e8bcf
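
The request above for complete logs rather than excerpts rests on the point that earlier nouveau messages may matter more than the oops itself. As a rough illustration only (the function name is hypothetical and not part of any tool mentioned in this report), a full kernel log could be scanned for the nouveau_mem_host line while collecting every earlier line that mentions nouveau:

```python
from typing import Iterable, List, Optional, Tuple

# Symbol taken from the oops line quoted earlier in this report:
# "IP: nouveau_mem_host+0x47/0x1b0 [nouveau]".
OOPS_SYMBOL = "nouveau_mem_host"


def nouveau_context(lines: Iterable[str]) -> Tuple[Optional[int], List[str]]:
    """Locate the first log line mentioning OOPS_SYMBOL.

    Returns a pair: the zero-based index of that line (None if the symbol
    never appears) and the list of earlier lines that mention nouveau,
    in their original order.
    """
    earlier: List[str] = []
    for index, line in enumerate(lines):
        if OOPS_SYMBOL in line:
            return index, earlier
        if "nouveau" in line.lower():
            earlier.append(line.rstrip("\n"))
    return None, earlier
```

Given the lines of a full capture, for example the saved output of the "journalctl -k -b -1 --no-pager --no-hostname" command quoted above, the second element of the result lists the initialisation messages and earlier errors that an excerpt would cut off. This is a plain substring match, so it is a triage aid rather than a diagnosis.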