Bug 65827

Summary: computer often fails to boot with nouveau and GeForce GT 425M on newest Linux kernel
Product: xorg
Reporter: j.pertres
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: critical
Priority: medium
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments (description, flags):
- screen shot of my screen on boot (flags: none)
- dmesg on kernel 3.10.3 (flags: none)

Description j.pertres 2013-06-16 13:49:33 UTC
Created attachment 80911 [details]
screen shot of my screen on boot

When I turn my computer on, at some point during boot I get an endless cascade of almost-identical error messages like "[   45.935830] nouveau E(   PFIFO][0000:01:00.0] SUBFIFO0: (unknown bits 0x00004000)" (see attached image). The computer apparently stops responding to any input, so the only thing I can do is force a shutdown by holding down the power button.

Most of the time the system boots fine on the second try, but not always. If it still fails after several attempts, I choose the LTS kernel (3.0.82 instead of 3.9.6) from the GRUB menu, and then the problem does not occur.

It looks like these error messages are not properly saved to log files, probably because I have to interrupt the boot process this way. Running 'grep -R "unknown bits"' in /var/log only finds matches in five *.journal binary files in the directory /var/log/journal. This is why I am attaching a photograph of my screen instead of a log file.

One of the last times that I managed to boot with the 3.9 kernel, I noticed that in the file /proc/fb I only had "0 inteldrmfb". Now I am on the 3.0 kernel and I have
"0 inteldrmfb
1 nouveaufb"
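
A minimal sketch of how the two /proc/fb listings above could be compared automatically. The function names are hypothetical; the only assumption about the file format is what the listings show, namely one "index name" pair per line.

```python
def parse_proc_fb(text):
    """Map framebuffer index -> driver name from /proc/fb-style text."""
    framebuffers = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split into index and the rest
        if len(parts) == 2 and parts[0].isdigit():
            framebuffers[int(parts[0])] = parts[1].strip()
    return framebuffers


def has_framebuffer(text, name):
    """True if a framebuffer with the given driver name is registered."""
    return name in parse_proc_fb(text).values()


# The two listings reported above (3.9 kernel vs. 3.0 kernel).
ON_3_9 = "0 inteldrmfb\n"
ON_3_0 = "0 inteldrmfb\n1 nouveaufb\n"

print(has_framebuffer(ON_3_9, "nouveaufb"))
print(has_framebuffer(ON_3_0, "nouveaufb"))
```

On a live system the text would come from reading /proc/fb directly, for example with open("/proc/fb").read().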

On the other hand, the command 'lspci -v | grep VGA' returns similar output on both kernels:
- On 3.9 kernel:
00:02.0 VGA compatible controller: Intel Corporation Core Processor Integrated Graphics Controller (rev 02) (prog-if 00 [VGA controller])
01:00.0 VGA compatible controller: NVIDIA Corporation GF108M [GeForce GT 425M] (rev ff) (prog-if ff)

- And on 3.0 kernel:
00:02.0 VGA compatible controller: Intel Corporation Core Processor Integrated Graphics Controller (rev 02) (prog-if 00 [VGA controller])
01:00.0 VGA compatible controller: NVIDIA Corporation GF108M [GeForce GT 425M] (rev a1) (prog-if 00 [VGA controller])

This has been happening for several months already, so the problem didn't start with kernel 3.9 but earlier.

All this said, even when I manage to boot, it looks like my NVIDIA card is never used by any program, only the Intel one. The NVIDIA GPU never seems to be available.
Comment 1 Emil Velikov 2013-07-27 00:17:52 UTC
Hi there

First and foremost, what is your experience with 3.10.x kernels? Quite a few issues have been addressed there, and a few others in 3.11-rc2+. Can you try them both?

(In reply to comment #0)
> Created attachment 80911 [details]
> screen shot of my screen on boot
> 
> When I turn my computer on, at some point of the booting process I get an
> endless cascade of almost-identical error messages like "[   45.935830]
> nouveau E(   PFIFO][0000:01:00.0] SUBFIFO0: (unknown bits 0x00004000)" (see
> image attached). Apparently the computer doesn't respond to any input, so
> the only thing I can do is force it to shut down by pressing the power
> button long.
> 
Does the system respond to REISUB (the Magic SysRq key sequence) or to ssh?

> Most of the times, the system boots OK at the second try, but not even
> always. If I'm not successful after several attempts, I choose the LTS
> kernel (3.0.82 instead of 3.9.6) from my GRUB and then this won't happen.
> 
It would be great if you could attach the dmesg output from both 3.0.82 and 3.9.6.

> It looks like these error logs messages are not properly kept as log files,
> probably because I have to interrupt the boot process this way. A 'grep -R
> "unknown bits"' from /var/log only finds matches in five *.journal binary
> files in the directory /var/log/journal. This is why I attach a real-world
> picture of my screen instead of a log file.
> 
The journal is kind of funny on my system wrt correctly storing the logs :\

You can filter the journal to display only the messages coming from the kernel by using "journalctl _TRANSPORT=kernel".
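
Once the kernel messages have been dumped to text (for example by redirecting the output of the journalctl command above to a file), a short script can count how often each PFIFO error occurs. This is only a sketch: the function name is hypothetical, and the pattern is modelled on the single message quoted in comment #0, so the real driver output may differ in detail.

```python
import re
from collections import Counter

# Pattern modelled on the one message quoted in comment #0. The report shows
# "E(" before PFIFO; "E[" is accepted as well (assumption about the format).
PFIFO_ERROR = re.compile(
    r"nouveau\s+E[\[(]\s*PFIFO\]\[(?P<dev>[0-9a-f:.]+)\]\s+"
    r"(?P<unit>\w+):\s+\(unknown bits (?P<bits>0x[0-9a-f]+)\)"
)


def summarize_pfifo_errors(log_text):
    """Count PFIFO 'unknown bits' errors per (device, unit, bits) triple."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PFIFO_ERROR.search(line)
        if match:
            key = (match.group("dev"), match.group("unit"), match.group("bits"))
            counts[key] += 1
    return counts


# The line quoted in comment #0, repeated twice for illustration only.
QUOTED = ("[   45.935830] nouveau E(   PFIFO][0000:01:00.0] "
          "SUBFIFO0: (unknown bits 0x00004000)")
SAMPLE = QUOTED + "\n" + QUOTED + "\n"

for key, count in summarize_pfifo_errors(SAMPLE).items():
    print(key, count)
```

Grouping by device, unit and bit mask keeps an "endless cascade" of near-identical lines readable, and would show at a glance whether more than one error type is involved.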

> One of the last times that I managed to boot with the 3.9 kernel, I noticed
> that in the file /proc/fb I only had "0 inteldrmfb". Now I am on the 3.0
> kernel and I have
> "0 inteldrmfb
> 1 nouveaufb"
> 
> On the other hand, the command 'lspci -v | grep VGA' returns similar output
> on both kernels:
> - On 3.9 kernel:
> 00:02.0 VGA compatible controller: Intel Corporation Core Processor
> Integrated Graphics Controller (rev 02) (prog-if 00 [VGA controller])
> 01:00.0 VGA compatible controller: NVIDIA Corporation GF108M [GeForce GT
> 425M] (rev ff) (prog-if ff)
> 
The above looks a bit interesting. It seems that lspci cannot read the full data of the device, hence the 0xff values. Maybe the device is switched off and/or not initialised by the kernel.
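
As a rough illustration of that heuristic, the sketch below scans lspci output for devices whose revision reads back as ff. The function name is hypothetical, the regular expression only handles the short "bus:device.function" slot format shown in this report, and a revision of ff is treated as a hint that the device is not answering, not as proof.

```python
import re

# Matches an lspci line such as:
# 01:00.0 VGA compatible controller: NVIDIA ... (rev ff) (prog-if ff)
LSPCI_LINE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<desc>.*?)\s+\(rev (?P<rev>[0-9a-f]{2})\)"
)


def unreadable_devices(lspci_output):
    """Return the PCI slots whose revision field reads back as ff."""
    slots = []
    for line in lspci_output.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match and match.group("rev") == "ff":
            slots.append(match.group("slot"))
    return slots


# The 3.9 kernel output quoted in this report.
SAMPLE = (
    "00:02.0 VGA compatible controller: Intel Corporation Core Processor "
    "Integrated Graphics Controller (rev 02) (prog-if 00 [VGA controller])\n"
    "01:00.0 VGA compatible controller: NVIDIA Corporation GF108M "
    "[GeForce GT 425M] (rev ff) (prog-if ff)\n"
)

print(unreadable_devices(SAMPLE))
```

Applied to the 3.0 kernel output, where the NVIDIA device reports rev a1, the same function returns an empty list.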

> - And on 3.0 kernel:
> 00:02.0 VGA compatible controller: Intel Corporation Core Processor
> Integrated Graphics Controller (rev 02) (prog-if 00 [VGA controller])
> 01:00.0 VGA compatible controller: NVIDIA Corporation GF108M [GeForce GT
> 425M] (rev a1) (prog-if 00 [VGA controller])
> 
> This has been happening for several months already, so the problem didn't
> start with kernel 3.9 but earlier.
> 
It would be great if you could be more specific. If 3.10 and 3.11 still fail on your system, it would be appreciated if you could bisect to find the commit that introduced this issue.

> All this said, even if I manage to boot, it looks like my NVIDIA card is
> never used by any program, only the Intel one. GPU never seems to be
> available.
I'm assuming that this is an Optimus laptop, is that correct?

I would target the kernel module issue first and then work on "how to make my Optimus laptop work under Linux" :P

Cheers
Emil
Comment 2 Emil Velikov 2013-07-27 00:20:32 UTC
The original issue is not Mesa-related. Correcting the component.
Comment 3 j.pertres 2013-08-04 20:43:16 UTC
Created attachment 83628 [details]
dmesg on kernel 3.10.3
Comment 4 j.pertres 2013-08-04 20:52:30 UTC
Hi,

I'm running the 3.10 kernel and, even though not everything is perfect, at least this ugly problem now seems to be gone. It has not occurred in any of the last 12 boots of my laptop (over 6 days).

The result of 'lspci | grep VGA' is now:
00:02.0 VGA compatible controller: Intel Corporation Core Processor Integrated Graphics Controller (rev 02)
01:00.0 VGA compatible controller: NVIDIA Corporation GF108M [GeForce GT 425M] (rev ff)

/proc/fb is still only
0 inteldrmfb

I attached a dmesg on my current kernel (3.10.3).

If I have time I'll try to provide more information as you suggest. Thanks for your work!

Joan
Comment 5 Ilia Mirkin 2013-08-27 15:23:42 UTC
Per the reporter's comments, sounds like the issue is fixed. Feel free to re-open if I've misunderstood.
