Bug 88893

Summary: [NV46] GPU lockup with Quadro NVS 110M
Product: xorg
Reporter: Richard <hobbes1069>
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: major
Priority: medium
CC: jwrdegoede, russianneuromancer
Version: 7.7 (2012.06)
Hardware: x86-64 (AMD64)
OS: Linux (All)
Attachments (description, flags):
dmesg log (flags: none)
System journal in lieu of X log (flags: none)
Logs with nouveau_noaccel=1 (flags: none)
Logs with nouveau_noaccel=1 (flags: none)

Description Richard 2015-02-01 02:56:48 UTC
System Info:
Dell Latitude D620
Nvidia Quadro NVS 110M
Fedora 21 x86_64

On the initial system install, with xf86-video-nouveau 1.0.9, I could get a usable system by booting with nouveau.noaccel=1.

After an update that brought in 1.0.11, I get the GPU lockup regardless of the kernel parameter. I also tried current git, with the same result.

libdrm is at the current version, 2.4.58.
Comment 1 Richard 2015-02-01 14:01:05 UTC
Created attachment 113016 [details]
dmesg log
Comment 2 Richard 2015-02-01 14:02:10 UTC
Created attachment 113017 [details]
System journal in lieu of X log

X seems to log to the journal instead of /var/log.
Comment 3 Ilia Mirkin 2015-02-01 14:06:53 UTC
FWIW I don't see anywhere that you're disabling acceleration with nouveau.noaccel=1...
Comment 4 Richard 2015-02-01 14:09:06 UTC
I turned it off to get an unmolested boot. I can reboot with it and attach another log.
Comment 5 Richard 2015-02-01 14:23:17 UTC
Created attachment 113018 [details]
Logs with nouveau_noaccel=1

Interesting thing to note... Without noaccel I could get back to a console and have a "usable" system. With noaccel set, once I started X I could not get back to the console and had to force the machine to power off.
Comment 6 Ilia Mirkin 2015-02-01 14:30:31 UTC
(In reply to Richard from comment #5)
> Created attachment 113018 [details]
> Logs with nouveau_noaccel=1
> 
> Interesting thing to note... Without noaccel I could get back to a console
> and have a "usable" system. With noaccel set, once I started X I could not
> get back to the console and had to force the machine to power off.

You had "nouveau.accel=1". That doesn't do anything (you can even see 'parameter accel ignored' in the logs).
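The mix-up above (nouveau.accel=1 and nouveau_noaccel=1 versus the intended nouveau.noaccel=1) is easy to check mechanically before rebooting again. Below is a minimal sketch, assuming the input is the text of /proc/cmdline and that module parameters appear as `nouveau.<name>=<value>` tokens; the helper names `nouveau_params` and `acceleration_disabled` are hypothetical, and treating only the value `1` as "on" is an assumption.

```python
from typing import Dict


def nouveau_params(cmdline: str) -> Dict[str, str]:
    """Collect nouveau module parameters ("nouveau.<name>=<value>") from a
    kernel command line string, e.g. the contents of /proc/cmdline."""
    params: Dict[str, str] = {}
    for token in cmdline.split():
        if not token.startswith("nouveau."):
            # Not a nouveau module parameter (e.g. "nouveau_noaccel=1").
            continue
        # Split on the first "=" only, so "config=NvMSI=0" keeps its value intact.
        name, sep, value = token[len("nouveau."):].partition("=")
        params[name] = value if sep else ""
    return params


def acceleration_disabled(cmdline: str) -> bool:
    # Only the exact key "noaccel" counts; "nouveau.accel=1" is parsed
    # but does not disable acceleration.
    return nouveau_params(cmdline).get("noaccel") == "1"
```

On the affected machine, the argument would be the result of reading /proc/cmdline; a `False` result means the option did not reach the nouveau module.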
Comment 7 Richard 2015-02-01 14:49:14 UTC
Created attachment 113019 [details]
Logs with nouveau_noaccel=1

Whoops...

I rebooted with the correct setting. I could log in through gdm, but it only got as far as a dark textured background and a cursor...
Comment 8 Ilia Mirkin 2015-02-01 14:53:55 UTC
If you're sure it's a new issue in xf86-video-nouveau, please bisect that and figure out what the guilty change was. However I'm more inclined to believe it was a change in the kernel that brought whatever issue about.

If you're bisecting the kernel, you can limit it to nouveau with

git bisect start v3.18 v3.17 -- drivers/gpu/drm/nouveau

assuming that v3.18 is bad and v3.17 is good.
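For readers unfamiliar with bisecting: git bisect is a binary search for the first bad commit between a known-good and a known-bad revision, so each build-and-test halves the remaining range. The sketch below shows that idea on a linear list of commits. It is a simplification (real git bisect walks the commit graph, and the path filter `-- drivers/gpu/drm/nouveau` only narrows which commits are candidates), and `first_bad` and the `is_bad` test callback are hypothetical names.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is
    also bad (the same assumption git bisect relies on).
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return commits[bad]
```

In practice the "is it bad?" step is booting the built kernel and marking the result with `git bisect good` or `git bisect bad`; when the check can be scripted, `git bisect run` automates the loop.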
Comment 9 Richard 2015-02-01 15:03:19 UTC
I still had kernel 3.17.4 installed and booted into it, with the same result: a dark background with a cursor, but nothing else.
Comment 10 Richard 2015-02-01 17:38:24 UTC
For the heck of it I tried an older kernel from F19, 3.14.27 and no different.

I haven't mentioned it before and I don't think it showed up in the logs but when I try to reboot after getting GPU lockup I also get errors like "failed to idle channel 0xcccc0000"
Comment 11 Ilia Mirkin 2015-02-01 17:40:52 UTC
(In reply to Richard from comment #10)
> For the heck of it I tried an older kernel from F19, 3.14.27 and no
> different.
> 
> I haven't mentioned it before and I don't think it showed up in the logs but
> when I try to reboot after getting GPU lockup I also get errors like "failed
> to idle channel 0xcccc0000"

Hm, that's surprising if acceleration is disabled. Perhaps fb accel is breaking things... try also adding nouveau.nofbaccel=1 . If it still happens, try removing all traces of vdpau from your system (just removing libvdpau.so.1 should be enough).
Comment 12 Richard 2015-02-01 19:57:01 UTC
(In reply to Ilia Mirkin from comment #11)
> (In reply to Richard from comment #10)
> > For the heck of it I tried an older kernel from F19, 3.14.27 and no
> > different.
> > 
> > I haven't mentioned it before and I don't think it showed up in the logs but
> > when I try to reboot after getting GPU lockup I also get errors like "failed
> > to idle channel 0xcccc0000"
> 
> Hm, that's surprising if acceleration is disabled. Perhaps fb accel is
> breaking things... try also adding nouveau.nofbaccel=1 . If it still
> happens, try removing all traces of vdpau from your system (just removing
> libvdpau.so.1 should be enough).

I need to be more complete in my statements :) Sorry, I'm busy at home, so I'm jumping back to the office when I can. I tried a "normal" boot with 3.14.27. I am trying to see if I can find a point where everything just "works". I can find issues with this specific laptop that were "fixed" back in 2010, so I assume this problem is different.

I did go back and try both options on 3.18.3: I can log in with gdm, but I get no gnome-shell desktop.
Comment 13 Richard 2015-02-03 02:47:53 UTC
Ok, a bit more info!

It turns out that X wasn't completely hanging, but it was taking a REALLY long time to get to gnome-shell: long enough that the monitor goes to sleep, and when I wake it up I get the lock screen.

Current versions:
nouveau: current git (as of a few days ago)
libdrm: 2.4.59

kernel options: nouveau.noaccel=1

I can attach the journal log but nothing is jumping out except maybe:
kernel: perf interrupt took too long (10103 > 9615), lowering kernel.perf_event_max_sample_rate to 13000

But I'm not sure that's related.
Comment 14 Richard 2015-02-11 03:52:52 UTC
I'm willing to provide additional information if it will help. I understand if there's no real motivation to get a card this old to work.

I finally got a workaround but I don't like it. I installed the nvidia 304xx drivers from RPM Fusion but gdm/gnome-shell doesn't like it either. I finally switched to lightdm and cinnamon and now I have a usable system.
Comment 15 russianneuromancer 2015-10-21 13:00:29 UTC
Richard, try out Linux 4.3 (see bug 89775 for more info) and KDE with the OpenGL ES 2 rendering engine (after logging in, press Alt+Shift+F12 to disable OpenGL effects, go to System Settings > Display > Effects, and switch to OpenGL ES 2 there), or a Gnome Shell Wayland session (this works for me on Ubuntu Gnome 15.10 with a GeForce 7300 Go).
For some reason, OpenGL gives me issues on NV46 (desktop content doesn't get updated with OpenGL desktop effects enabled), but OpenGL ES 2 surprisingly works fine (at least I haven't noticed any issues with OpenGL ES 2 yet).
Comment 16 Richard 2015-10-21 13:12:37 UTC
Thanks for the heads up. It looks like 4.3 has landed in Fedora Rawhide so I'll keep an eye on it. Currently running Fedora 21 but I'll probably upgrade when 23 is out of beta. Hopefully 4.3 will make it in, otherwise I can just install the rawhide kernel.
Comment 17 Hans de Goede 2015-10-21 13:23:20 UTC
(In reply to russianneuromancer from comment #15)
> Richard, try out Linux 4.3

Or try an older kernel with nouveau.config=NvMSI=0 on the command line. NV46s are reported to have unspecified issues with MSI-style interrupt processing. We've disabled MSI for them in 4.3; adding that command-line option should give the same result.
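One way to confirm the workaround actually took effect is to look at how the nouveau interrupt is delivered in /proc/interrupts: with MSI in use, the interrupt type column typically reads something like PCI-MSI, while legacy interrupts show an IO-APIC type. The sketch below assumes that layout (the exact columns vary between kernel versions), and `nouveau_irq_uses_msi` is a hypothetical helper, not part of any tool mentioned in this bug.

```python
from typing import Optional


def nouveau_irq_uses_msi(interrupts_text: str) -> Optional[bool]:
    """Given the text of /proc/interrupts, return True if the nouveau IRQ
    line mentions MSI, False if it does not, or None if no nouveau line is
    found. Assumes device names are the trailing, comma-separated columns."""
    for line in interrupts_text.splitlines():
        names = [field.rstrip(",") for field in line.split()]
        if "nouveau" in names:
            # e.g. "PCI-MSI-edge" vs. "IO-APIC-fasteoi" in the type column
            return "MSI" in line
    return None
```

After booting with nouveau.config=NvMSI=0 (or on 4.3+), the expected result on an NV46 would be `False`; `True` would suggest the option was not applied.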
Comment 18 Richard 2015-10-23 13:13:31 UTC
Thanks Hans, that worked out well. I was able to switch back to gnome-shell without any issue.
