Bug 23445 - NV46: GPU lockup with gdm prompt AND logged in fbcon
Summary: NV46: GPU lockup with gdm prompt AND logged in fbcon
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2009-08-21 16:01 UTC by Renato Caldas
Modified: 2011-12-05 11:10 UTC


Attachments
dmesg log (247.59 KB, text/plain)
2010-03-14 16:26 UTC, Alexander Trubitsyn
lsmod log (3.17 KB, text/plain)
2010-03-14 16:27 UTC, Alexander Trubitsyn
lspci -vv log (20.92 KB, text/plain)
2010-03-14 16:27 UTC, Alexander Trubitsyn
xorg.log (33.28 KB, text/x-log)
2010-03-14 16:28 UTC, Alexander Trubitsyn

Description Renato Caldas 2009-08-21 16:01:34 UTC
I got these errors in the logs after switching from Xorg to another VT and logging in:

(...)
Aug 21 23:39:50 pinguinus login[5571]: ROOT LOGIN  on '/dev/tty1'
Aug 21 23:40:01 pinguinus cron[5579]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Aug 21 23:40:02 pinguinus [   47.488542] nouveau 0000:01:00.0: PFIFO_CACHE_ERROR - Ch 0/3 Mthd 0x1268 Data 0x00000000
Aug 21 23:40:02 pinguinus [   47.488548] nouveau 0000:01:00.0: Unhandled PFIFO_INTR - 0x00010000
Aug 21 23:40:02 pinguinus [   47.488555] nouveau 0000:01:00.0: PFIFO_CACHE_ERROR - Ch 0/2 Mthd 0x0c6c Data 0x00000000
Aug 21 23:40:02 pinguinus [   47.488561] nouveau 0000:01:00.0: Unhandled PFIFO_INTR - 0x00010000
Aug 21 23:40:02 pinguinus [   47.488587] nouveau 0000:01:00.0: Unhandled PFIFO_INTR - 0x00010000
Aug 21 23:40:03 pinguinus [   47.730059] nouveau 0000:01:00.0: GPU lockup - switching to software fbcon
Aug 21 23:40:03 pinguinus [   47.977559] nouveau 0000:01:00.0: GPU lockup - switching to software fbcon
Aug 21 23:40:03 pinguinus [   48.220945] nouveau 0000:01:00.0: GPU lockup - switching to software fbcon

The "GPU lockup" message goes on and on, and the terminal is damn slow. After trying to switch back to X's VT, the system seemed to hang. Ctrl+Alt+SysRq still worked, so I managed to sync the disks and reboot:

(...)
Aug 21 23:42:58 pinguinus [  223.124393] nouveau 0000:01:00.0: GPU lockup - switching to software fbcon
Aug 21 23:42:58 pinguinus [  223.351695] nouveau 0000:01:00.0: GPU lockup - switching to software fbcon
Aug 21 23:43:03 pinguinus [  228.097813] SysRq : Emergency Sync
Aug 21 23:43:42 pinguinus syslog-ng[5220]: syslog-ng starting up; version='2.1.4'
(...)


Today I've switched to "all-git" versions of Xorg components, and it has been quite a ride so far :)
Comment 1 Renato Caldas 2009-08-21 16:04:34 UTC
Oh, I'm using the nouveau git kernel:

$ uname -r
2.6.31-rc6-g8fa1abd

I've made the git switch before rebooting, so I've never tested this kernel with the stable Xorg versions... My mistake. I can switch to a previous commit and test it, if this bug isn't obvious to you.
Comment 2 Renato Caldas 2009-08-21 16:06:04 UTC
(In reply to comment #1)
> I've made the git switch before rebooting, so I've never tested this kernel
I meant this particular commit. I've been using the nouveau git kernel for quite some time, and updated it yesterday.
Comment 3 Renato Caldas 2009-08-21 16:08:15 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > I've made the git switch before rebooting, so I've never tested this kernel
> I meant this particular commit. I've been using the nouveau git kernel for
> quite some time, and updated it yesterday.
...not counting today's update. I need to sleep...

Comment 4 Maarten Maathuis 2009-08-21 16:14:24 UTC
Why do you have 4 channels, normally you have channel 0 for the kernel and channel 1 for xorg. Have you installed the (unsupported) nv40 gallium driver?
Comment 5 Renato Caldas 2009-08-21 16:53:47 UTC
(In reply to comment #4)
> Why do you have 4 channels, normally you have channel 0 for the kernel and
> channel 1 for xorg. Have you installed the (unsupported) nv40 gallium driver?
Not anymore, but the problem remains, just slightly different:

Aug 22 00:47:44 pinguinus login[5527]: pam_unix(login:session): session opened for user root by LOGIN(uid=0)
Aug 22 00:47:44 pinguinus login[5569]: ROOT LOGIN  on '/dev/tty2'
Aug 22 00:47:48 pinguinus [   40.444285] nouveau 0000:01:00.0: Unhandled PFIFO_INTR - 0x00010000
Aug 22 00:47:48 pinguinus [   40.444292] nouveau 0000:01:00.0: Unhandled PFIFO_INTR - 0x00010000
Aug 22 00:47:48 pinguinus [   40.444298] nouveau 0000:01:00.0: Unhandled PFIFO_INTR - 0x00010000

(...)

Aug 22 00:47:48 pinguinus [   40.445339] nouveau 0000:01:00.0: Unhandled PFIFO_INTR - 0x00010000
Aug 22 00:47:48 pinguinus [   40.445344] nouveau 0000:01:00.0: Unhandled PFIFO_INTR - 0x00010000
Aug 22 00:47:48 pinguinus [   40.685178] nouveau 0000:01:00.0: GPU lockup - switching to software fbcon
Aug 22 00:48:04 pinguinus [   56.864228] SysRq : Emergency Sync
Aug 22 00:48:04 pinguinus [   56.866521] Emergency Sync complete
Aug 22 00:48:16 pinguinus [   68.854374] SysRq : Emergency Sync
Aug 22 00:48:16 pinguinus [   68.854854] Emergency Sync complete
Aug 22 00:48:59 pinguinus syslog-ng[5235]: syslog-ng starting up; version='2.1.4'
Comment 6 Renato Caldas 2009-08-22 05:42:24 UTC
VT switching isn't causing the problem; it only happens after logging in to the fb console. Could this be a kernel bug?

My next test will be to disable the automatic X startup, and see if it still locks up.
Comment 7 Renato Caldas 2009-08-22 06:13:47 UTC
Strangest thing: I can only reproduce this when X is showing the GDM prompt, not after I'm logged in to X.

So to sum up:
  -> Start X system-wide, and leave it in the GDM prompt
  -> Change VT to a console.

  -> Log in
  OR if already logged in,
  -> run dmesg

Problem shows up.
Comment 8 Pekka Paalanen 2009-08-22 06:38:49 UTC
Can you reproduce the "GPU lockup" message completely without X?

Log in, run some commands that print lots of text, and see if it appears in dmesg. It would be nice to rule out interference with X.
Comment 9 Renato Caldas 2009-08-22 08:28:34 UTC
(In reply to comment #8)
> Can you reproduce the "GPU lockup" message completely without X?

It's not reproducible without X. I've run things like "dmesg && dmesg && dmesg && dmesg && dmesg" and played a hi-res movie in mplayer using "-vo fbdev" and "-vo caca". Nothing.

I've tried with a "gtkterm" session, and the problem didn't occur. Then I tried starting a firefox session and changed VT while firefox was starting up: full lockup, and SysRq wouldn't even work (I was root, not sure if it matters).

Then I tested a gnome-session as a regular user, and the problem didn't show up.

So it seems to be related to changing VT under some circumstances. It seems that it's not possible to rule out X after all...
Comment 10 Renato Caldas 2009-08-22 10:20:13 UTC
Further testing suggests that it may be related to the graphics workload:

I started a firefox-only session via startx, then switched VT and ran dmesg a couple of times. It only crashed when the switch happened in the middle of rendering a heavy page, namely Gmail. Whenever I switched after the page had fully loaded, there was no lockup.

BTW, what are these PFIFO_INTR for? Who handles them? Could it be that the handler is uninstalled when switching VT, and the lockups occur when the interrupt is fired outside of X?
Comment 11 Maarten Maathuis 2009-08-22 10:27:23 UTC
The interrupt handler isn't uninstalled, we just don't know every interrupt and this is one of them.
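
The "Unhandled PFIFO_INTR - 0x00010000" lines print a raw interrupt status word. As a quick sanity check (not nouveau code; `set_bits` is a hypothetical helper), decoding that value shows that only bit 16 is set, presumably the unknown interrupt referred to above:

```python
def set_bits(status: int, width: int = 32) -> list[int]:
    """Return the indices of the bits set in an interrupt status word."""
    return [bit for bit in range(width) if (status >> bit) & 1]


# Value taken from the "Unhandled PFIFO_INTR - 0x00010000" log lines.
print(set_bits(0x00010000))  # [16]
```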

So this is purely a problem that occurs at/around VT switching?
Comment 12 Renato Caldas 2009-08-22 10:32:53 UTC
(In reply to comment #11)
> So this is purely a problem that occurs at/around VT switching?

Yes.

I've also been experiencing lockups with xrandr rotation, but I don't know if they're related. I'll check the logs to see if it's the same thing. Brb.
Comment 13 Renato Caldas 2009-08-22 10:39:19 UTC
No, it's not related. There is no actual lockup and no similar messages in the logs, just a black display. I'll submit another report for it.

So it is just VT switching related.
Comment 14 Renato Caldas 2009-08-28 15:46:23 UTC
The latest kernel commit (b3f15458cb2660f430a68a9134a8427f7838d450) changed the output a bit:

Aug 28 23:35:31 pinguinus login[5576]: ROOT LOGIN  on '/dev/tty1'
Aug 28 23:35:33 pinguinus [   39.975293] nouveau 0000:01:00.0: PFIFO_CACHE_ERROR - Ch 0/3 Mthd 0x1430 Data 0x00000000
Aug 28 23:35:33 pinguinus [   39.975305] nouveau 0000:01:00.0: Unhandled PFIFO_INTR - 0x00010000
Aug 28 23:35:33 pinguinus [   39.975312] nouveau 0000:01:00.0: Unhandled PFIFO_INTR - 0x00010000
(...)
Aug 28 23:35:33 pinguinus [   39.975678] nouveau 0000:01:00.0: Unhandled PFIFO_INTR - 0x00010000
Aug 28 23:35:33 pinguinus [   39.975702] nouveau 0000:01:00.0: AIII, invalid/inactive channel id 32
Aug 28 23:35:33 pinguinus [   39.975709] nouveau 0000:01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 28 23:35:33 pinguinus [   39.975714] nouveau 0000:01:00.0: PGRAPH_ERROR - Ch -1/2 Class 0x0000 Mthd 0x0c3c Data 0x00000000:0x00000000
Aug 28 23:35:33 pinguinus [   39.975724] nouveau 0000:01:00.0: AIII, invalid/inactive channel id 32
Aug 28 23:35:33 pinguinus [   39.975731] nouveau 0000:01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 28 23:35:33 pinguinus [   39.975735] nouveau 0000:01:00.0: PGRAPH_ERROR - Ch -1/2 Class 0x0000 Mthd 0x0be4 Data 0x00000000:0x00200108
Aug 28 23:35:33 pinguinus [   39.975746] nouveau 0000:01:00.0: AIII, invalid/inactive channel id 32
(...)
Aug 28 23:35:33 pinguinus [   39.976048] nouveau 0000:01:00.0: AIII, invalid/inactive channel id 32
Aug 28 23:35:33 pinguinus [   39.976054] nouveau 0000:01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 28 23:35:33 pinguinus [   39.976058] nouveau 0000:01:00.0: PGRAPH_ERROR - Ch -1/2 Class 0x0000 Mthd 0x0c20 Data 0x00000000:0x00000000
Aug 28 23:35:33 pinguinus [   39.976085] nouveau 0000:01:00.0: PFIFO_CACHE_ERROR - Ch 0/0 Mthd 0x0c24 Data 0x00000000
Aug 28 23:35:33 pinguinus [   39.976122] nouveau 0000:01:00.0: PFIFO_DMA_PUSHER - Ch 1
Aug 28 23:35:33 pinguinus [   39.976128] nouveau 0000:01:00.0: Unhandled PFIFO_INTR - 0x00010000
Aug 28 23:35:33 pinguinus [   39.976156] nouveau 0000:01:00.0: Unhandled PFIFO_INTR - 0x00010000
Aug 28 23:35:33 pinguinus [   39.976163] nouveau 0000:01:00.0: Unhandled PFIFO_INTR - 0x00010000
Aug 28 23:35:33 pinguinus [   40.039606] nouveau 0000:01:00.0: GPU lockup - switching to software fbcon
Aug 28 23:36:07 pinguinus [   74.083930] SysRq : Emergency Sync
Aug 28 23:36:46 pinguinus syslog-ng[5219]: syslog-ng starting up; version='2.1.4'

And the console doesn't slow down anymore, but that was the purpose :) I'm not sure whether this output is related to this bug, though. It may be a bug in the software fallbacks... Comments?
Comment 15 Renato Caldas 2009-09-06 03:21:47 UTC
Heads up, I can reproduce this in Fedora Rawhide.
Comment 16 Alexander Trubitsyn 2010-03-14 16:26:15 UTC
Created attachment 34044 [details]
dmesg log

1. Install openSUSE 11.2 64-bit.
2. Download and install the latest kernel, 2.6.33-31-desktop.
On every boot, dmesg shows the GPU lockup message.
Comment 17 Alexander Trubitsyn 2010-03-14 16:27:05 UTC
Created attachment 34045 [details]
lsmod log
Comment 18 Alexander Trubitsyn 2010-03-14 16:27:35 UTC
Created attachment 34046 [details]
lspci -vv log
Comment 19 Alexander Trubitsyn 2010-03-14 16:28:01 UTC
Created attachment 34047 [details]
xorg.log
Comment 20 Marcin Slusarz 2010-10-13 12:36:04 UTC
Can you test current git? I think it's already fixed (2.6.35 might be good too)

