Bug 40406 - Nouveau NVIDIA NV46: Freeze
Summary: Nouveau NVIDIA NV46: Freeze
Status: RESOLVED INVALID
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/nouveau
Version: 7.11
Hardware: Other All
Importance: medium normal
Assignee: Nouveau Project
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-26 13:36 UTC by Marco
Modified: 2013-08-18 18:09 UTC



Attachments
system information (5.08 KB, text/plain)
2011-08-26 13:36 UTC, Marco
glxinfo output (23.69 KB, text/plain)
2011-08-26 13:37 UTC, Marco
lspci (704 bytes, text/plain)
2011-08-26 13:37 UTC, Marco
Xorg.log (31.10 KB, text/plain)
2011-08-26 13:38 UTC, Marco
dmesg after reboot (64.35 KB, text/plain)
2011-08-26 13:38 UTC, Marco
screenshot (418.84 KB, image/jpeg)
2011-08-26 13:41 UTC, Marco
syslog file after crash (39.85 KB, text/plain)
2011-09-08 13:01 UTC, Marco
system freeze xorg-server-1.11.0 (518.62 KB, image/jpeg)
2011-10-01 08:15 UTC, Marco
syslog file after suspend2ram (43.87 KB, text/plain)
2011-10-04 01:11 UTC, Marco
Xorg.log after suspend2ram (55.60 KB, text/plain)
2011-10-04 01:12 UTC, Marco
dmesg shortly after black screen (154.59 KB, text/plain)
2011-10-09 13:14 UTC, Marco
syslog file (without cron job entries) shortly after black screen (12.21 KB, text/plain)
2011-10-09 13:14 UTC, Marco
Xorg.log shortly after black screen (43.55 KB, text/plain)
2011-10-09 13:15 UTC, Marco
screen after crash (341.28 KB, image/jpeg)
2011-10-11 05:25 UTC, Marco
syslog file (without pam messages) (306.17 KB, text/plain)
2011-10-11 05:26 UTC, Marco
oops in iput (1.71 KB, text/plain)
2011-10-11 08:56 UTC, Marcin Slusarz

Description Marco 2011-08-26 13:36:20 UTC
Created attachment 50594 [details]
system information

Hi.

As already mentioned in Bug 40336, I see sporadic freezes of my computer. So far, it has happened while using mplayer2, KDE Konsole, and Emacs. KDE desktop effects are deactivated.

Course of events:
At first, I notice kind of a "delay"/unresponsiveness of the system. The mouse works but some programs don't. Mostly, I can still use the SysRq keys. Usually, the result of a SysRq-Kill All Tasks is a frozen system.

Fortunately, today, I still had a screen and I could take pictures, but no log files. Maybe it is enough information to get a rough idea of the problem. The system freeze happened after I tried to log in and save the log files. ;)

I use:
* GeForce Go 7300
* Linux samson 3.0.3-gentoo #1 SMP Thu Aug 25 08:49:17 IDT 2011 i686 Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz GenuineIntel GNU/Linux
* mesa-7.11, libdrm-2.4.26, xf86-video-nouveau-0.0.16_pre20110801, and xorg-server-1.10.2
* suspend2ram
Comment 1 Marco 2011-08-26 13:37:25 UTC
Created attachment 50595 [details]
glxinfo output
Comment 2 Marco 2011-08-26 13:37:52 UTC
Created attachment 50596 [details]
lspci
Comment 3 Marco 2011-08-26 13:38:24 UTC
Created attachment 50597 [details]
Xorg.log
Comment 4 Marco 2011-08-26 13:38:52 UTC
Created attachment 50598 [details]
dmesg after reboot
Comment 5 Marco 2011-08-26 13:41:48 UTC
Created attachment 50599 [details]
screenshot
Comment 6 Marco 2011-09-08 13:01:04 UTC
While I watched a movie (mplayer) today, the same(?) problem resulted in a kernel oops. This time, the syslog daemon was able to write the data to the harddisk. I attached the syslog file.
Comment 7 Marco 2011-09-08 13:01:41 UTC
Created attachment 50990 [details]
syslog file after crash
Comment 8 Emil Velikov 2011-09-22 13:52:27 UTC
Hi Marco

The slowness issue appears to be X related - most likely "[mi] EQ overflowing. The server is probably stuck in an infinite loop."

If/when it happens, try to sync and reboot the system (sysrq) and attach the Xorg and dmesg logs.

But before that, I would recommend that you move/remove nouveau_dri.so, as 3D is currently not supported on your card [1].

As for the kernel oops stored in your log, it does not appear to be nouveau related. On the contrary, nouveau "locked up" because of it.

Cheers
Emil

[1] http://nouveau.freedesktop.org/wiki/MesaDrivers
Comment 9 Marco 2011-09-28 08:08:33 UTC
Hi Emil,

thanks for your reply.

Actually, I try that every time, but a SYSRQ-SYNC locks the system completely. The syslog file is the only file I have been able to save to the hard disk so far.

And thanks for the hint. I didn't realize that NV30-NV40 indicates the families and not the code names.

I am just asking, because I haven't found information about the why: why should I not file bug reports? Because the cards are too old and you gave up fixing/improving the nvfx driver? Because you are working on it?
Comment 10 Emil Velikov 2011-09-28 12:52:05 UTC
(In reply to comment #9)
> Hi Emil,
> 
> thanks for your reply.
> 
> Actually, I try that every time, but a SYSRQ-SYNC locks the system completely.
> The syslog file was the only file I could save on harddisk so far.
Now that sounds unusual, any chance of a serial console?

> 
> And thanks for the hint. I didn't realize that NV30-NV40 indicates the families
> and not the code names.
> 
> I am just asking, because I haven't found information about the why: why should
> I not file bug reports? Because the cards are too old and you gave up
> fixing/improving the nvfx driver? Because you are working on it?

A number of people have worked on the 3d/mesa/gallium driver for nv30/40.

The most recent, Luca Barbieri (lb1), did quite impressive/substantial work but vanished after that.
Since then no one has stepped in to develop/bugfix that code (with some exceptions), hence the lack of support.

Sorry
Comment 11 Marco 2011-09-29 01:59:55 UTC
Ok, I removed nouveau_dri.so. Unfortunately, it didn't help. I know that probability theory can be tricky. :) But the system lock occurred three times within a period of 24 hours, whereas the past frequency was approximately once a week. One could conclude that removing nouveau_dri.so made it worse.

Here is a short log of uptimes that were affected by this bug.
0 days, 01:49:19  | Linux 3.0.4-gentoo        Thu Sep 29 09:59:24 2011 (running)
9 days, 14:39:05  | Linux 3.0.3-gentoo        Thu Sep  8 22:38:29 2011
11 days, 08:28:40 | Linux 3.0.3-gentoo        Fri Aug 26 22:53:11 2011
6 days, 14:00:20  | Linux 3.0.1-gentoo        Mon Aug 15 19:39:40 2011
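
As a rough check of the "probability theory" remark above, the freezes can be treated as a Poisson process. This is only a sketch under an assumed model (independent freezes at a constant rate of about one per week, the frequency quoted above). The function name is made up for illustration.

```python
import math

def prob_at_least(k, rate_per_day, days=1.0):
    """P(N >= k) for a Poisson process with the given average daily rate."""
    lam = rate_per_day * days  # expected number of freezes in the window
    # Sum P(N = 0) ... P(N = k-1), then take the complement.
    below_k = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k

# Assumed baseline: one freeze per week; window: 24 hours; observed: 3 freezes.
p = prob_at_least(3, rate_per_day=1 / 7)
print(f"P(>= 3 freezes in 24 h at one per week) = {p:.5f}")
```

Under that assumption, three freezes within one day would be unlikely to happen by chance, although three events are a very small sample and the freezes may not be independent.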

Thus, I moved nouveau_dri.so to /usr/lib/mesa again.

This bug is not that bad since I am still able to work. But I am a little bit concerned about data loss after a crash.

> Now that sounds unusual, any chance of a serial console?

No, unfortunately not. I am working abroad at the moment and I only have this laptop without additional equipment. I might check it when I am back home in January, though. ;)

Is there anything else I can do?
I tried to save Xorg.log after a crash. I am not 100% sure if the log file is complete, but I attached it nevertheless. I can't see anything unusual.

> A number of people have worked on the 3d/mesa/gallium driver for nv30/40
> 
> The latest person Luca Barbieri (lb1) did quite an impressive/substantial work
> but vanished after that.
> Since then no one has stepped in to develop/bugfix (with some exceptions) that
> code, thus the lack of support

No problem. I have no experience in writing drivers. I guess, I am of little help here. But the drivers are very impressive. Thanks anyway.
Comment 12 Marco 2011-09-29 02:05:58 UTC
Damn, I can't find the log file. It seems, I copied messages instead of Xorg.log. Sorry. But the last few lines are always the same:

[ 78793.917] (II) Power Button: Close
[ 78793.917] (II) UnloadModule: "evdev"
[ 78793.917] (II) Unloading evdev
[ 78793.949] (II) Sleep Button: Close
[ 78793.949] (II) UnloadModule: "evdev"
[ 78793.949] (II) Unloading evdev
[ 78793.977] (II) AT Translated Set 2 keyboard: Close
[ 78793.977] (II) UnloadModule: "evdev"
[ 78793.977] (II) Unloading evdev
[ 78794.426] (II) UnloadModule: "synaptics"
[ 78794.426] (II) Unloading synaptics
[ 78794.451] (II) Logitech USB-PS/2 Optical Mouse: Close
[ 78794.451] (II) UnloadModule: "evdev"
[ 78794.451] (II) Unloading evdev
[ 78796.215] (II) NOUVEAU(0): NVLeaveVT is called.
[ 78796.216] (II) NOUVEAU(0): Closed GPU channel 1

I used SysRq-R/S/R/E/I/S/U/B
Comment 13 Marco 2011-10-01 08:11:40 UTC
Ok, now, it gets weird.

I upgraded xorg-server a few days ago:
Thu Sep 29 10:14:50 2011 >>> x11-base/xorg-drivers-1.11
Thu Sep 29 10:21:09 2011 >>> x11-base/xorg-server-1.11.0
Thu Sep 29 10:25:48 2011 >>> x11-drivers/xf86-input-synaptics-1.4.0
Thu Sep 29 10:26:14 2011 >>> x11-drivers/xf86-input-evdev-2.6.0
Thu Sep 29 10:27:10 2011 >>> x11-drivers/xf86-video-nouveau 0.0.16_pre20110801

The result is completely new behaviour. Instead of gradual sluggishness followed by a system lock after SYSRQ, I got an immediate system freeze. My music stopped playing, no X, no mouse, no keyboard, no log files. Just a screenshot (attached).
Comment 14 Marco 2011-10-01 08:15:59 UTC
Created attachment 51850 [details]
system freeze xorg-server-1.11.0
Comment 15 Emil Velikov 2011-10-01 10:24:07 UTC
Now that is nice :P

There are a couple of things
1. Is there a reliable way of reproducing the lock?
2. Can you try without mesa/gallium/(nouveau_dri.so) and provide some logs

Initially it would be great to try and resolve/narrow down the "GRAPH ERROR(s)"
Comment 16 Marco 2011-10-01 10:43:31 UTC
(In reply to comment #15)

> Now that is nice :P
I like surprises. ;)

> There are a couple of things
I really appreciate your help. Thanks.

> 1. Is there a reliable way of reproducing the lock
No. At the beginning, I thought that mplayer2 triggers the bug. It also happens when I am using chromium. Once, it happened when I was typing in a terminal. But this is a nice excuse to watch movies with mplayer instead of working. ;) I will try my best. ;)

> 2. Can you try without mesa/gallium/(nouveau_dri.so) and provide some logs
Could you be a little bit more precise here? Just removing nouveau_dri.so and using nouveau? And do you only need dmesg/messages/Xorg.log?
Comment 17 Emil Velikov 2011-10-01 11:15:25 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > 2. Can you try without mesa/gallium/(nouveau_dri.so) and provide some logs
> Could you be a little bit more precise here? Just removing nouveau_dri.so and
> using nouveau? And do you only need dmesg/messages/Xorg.log?

The idea is to narrow down what is causing these errors: X or GL.

* Move/remove the gallium driver (nouveau_dri.so) and make sure that you have swrast_dri.so in the same directory
* Try and reproduce the issue
* Append dmesg/messages or a picture
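
The first step above can be sketched as a small script. This is a hedged illustration, not an official procedure: the function name and the example paths are assumptions, and the DRI directory differs between distributions. The script refuses to move the gallium driver unless swrast_dri.so is present in the same directory.

```python
import shutil
from pathlib import Path

def disable_gallium_driver(dri_dir, backup_dir):
    """Move nouveau_dri.so out of dri_dir, but only if swrast_dri.so is there."""
    dri_dir, backup_dir = Path(dri_dir), Path(backup_dir)
    driver = dri_dir / "nouveau_dri.so"
    fallback = dri_dir / "swrast_dri.so"
    if not fallback.exists():
        # Without the software rasterizer there would be no GL driver left.
        raise FileNotFoundError(f"{fallback} is missing; not touching {driver}")
    if not driver.exists():
        return None  # already moved away, or never installed
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / driver.name
    shutil.move(str(driver), str(target))
    return target

# Example call (needs root; both paths are assumptions, adjust for your system):
# disable_gallium_driver("/usr/lib/dri", "/root/dri-backup")
```

Moving the file back from the backup directory restores the previous setup.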

Thanks
Comment 18 Marco 2011-10-04 01:11:00 UTC
Ok, I haven't experienced any crashes since I switched to swrast. But I noticed syslog messages after suspend2ram which might be connected to this problem (cf. line no 280). What do you say?
Comment 19 Marco 2011-10-04 01:11:55 UTC
Created attachment 51917 [details]
syslog file after suspend2ram
Comment 20 Marco 2011-10-04 01:12:32 UTC
Created attachment 51918 [details]
Xorg.log after suspend2ram
Comment 21 Marco 2011-10-04 01:32:02 UTC
Ok, 5min after I sent the last reply, I encountered the problem again. Black screen, no mouse, no logs, even no blinking wifi light.

It might be necessary to downgrade xorg-server from 1.11.0 to 1.10.4.
Comment 22 Marco 2011-10-04 03:41:44 UTC
As an aside: the same messages appear after a suspend2disk.
Comment 23 Marco 2011-10-09 13:12:26 UTC
Ok, fortunately (after downgrading Xorg server), I was able to save log files after a crash. But I am afraid that the crash was not as bad as in the past because a hard reboot was not necessary and the screen was visible again after I killed all tasks (SYSRQ-E). Nevertheless, I attached syslog, Xorg.log, and dmesg.
Comment 24 Marco 2011-10-09 13:14:00 UTC
Created attachment 52154 [details]
dmesg shortly after black screen
Comment 25 Marco 2011-10-09 13:14:49 UTC
Created attachment 52155 [details]
syslog file (without cron job entries) shortly after black screen
Comment 26 Marco 2011-10-09 13:15:58 UTC
Created attachment 52156 [details]
Xorg.log shortly after black screen
Comment 27 Marco 2011-10-11 05:24:41 UTC
Since I switched to swrast, I have had crashes on a daily basis. It is very annoying. Do you see a chance to find the problem now? I attached a new screenshot and syslog file (again, no information in Xorg.log).
Comment 28 Marco 2011-10-11 05:25:30 UTC
Created attachment 52209 [details]
screen after crash
Comment 29 Marco 2011-10-11 05:26:20 UTC
Created attachment 52210 [details]
syslog file (without pam messages)
Comment 30 Marcin Slusarz 2011-10-11 08:56:57 UTC
Created attachment 52218 [details]
oops in iput

The oops seems to be unrelated to nouveau. I think it has something to do with use of 32-bit kernel and HIGHMEM - please report it to bugzilla.kernel.org or LKML.
Comment 31 Marco 2011-10-12 00:38:39 UTC
(In reply to comment #30)
> The oops seems to be unrelated to nouveau. I think it has something to do with
> use of 32-bit kernel and HIGHMEM - please report it to bugzilla.kernel.org or
> LKML.

Ok. I am going to wait for 3.1 vanilla and try that first. If the bug persists, I will report it. Or I might temporarily switch to the nvidia driver.

But I am not 100% convinced that it is unrelated to nouveau. I am aware that the following points can also be an indicator for a bug of the kernel, but nevertheless, I want to summarize:

a) I receive lots of messages like

Oct  3 16:42:12 localhost kernel: [147178.508173] [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Oct  3 16:42:12 localhost kernel: [147178.508185] [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x000c3000) subc 3 class 0x0039 mthd 0x0bf8 data 0x00100008

after suspend2ram/suspend2disk. So far, the kernel oops only occurred after these messages.

b) Not every black screen led to a kernel oops. Sometimes, I can restart my computer without seeing what I am typing.

c) The kernel oops is not always the same. 2 examples:
IP: [<c10c828e>] shrink_icache_memory+0x1bd/0x236
IP: [<c10c5006>] iput+0x32/0x106

d) The Xorg server and the dri driver strongly influence the behavior.

e) I haven't seen this bug before I started using nouveau.
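
To check whether the PGRAPH messages from point a) really precede each oops, they can be pulled out of a syslog file and counted. The following is a minimal sketch. The regular expressions are derived only from the two lines quoted above, so other kernel versions may print a different format, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Patterns modelled on the two sample lines only (an assumption about the format).
ERROR_RE = re.compile(r"PGRAPH - ERROR nsource: (?P<nsource>.*?) nstatus: (?P<nstatus>.*)")
DETAIL_RE = re.compile(
    r"PGRAPH - ch (?P<ch>\d+) \(0x[0-9a-f]+\) subc (?P<subc>\d+) "
    r"class (?P<cls>0x[0-9a-f]+) mthd (?P<mthd>0x[0-9a-f]+) data (?P<data>0x[0-9a-f]+)"
)

def summarize_pgraph(lines):
    """Count (nsource, nstatus) pairs and (class, method) pairs in log lines."""
    sources, methods = Counter(), Counter()
    for line in lines:
        m = ERROR_RE.search(line)
        if m:
            sources[(m["nsource"].strip(), m["nstatus"].strip())] += 1
            continue
        m = DETAIL_RE.search(line)
        if m:
            methods[(m["cls"], m["mthd"])] += 1
    return sources, methods

sample = [
    "Oct  3 16:42:12 localhost kernel: [147178.508173] [drm] nouveau 0000:01:00.0: "
    "PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT",
    "Oct  3 16:42:12 localhost kernel: [147178.508185] [drm] nouveau 0000:01:00.0: "
    "PGRAPH - ch 0 (0x000c3000) subc 3 class 0x0039 mthd 0x0bf8 data 0x00100008",
]
sources, methods = summarize_pgraph(sample)
print(sources)
print(methods)
```

For a real log, pass an open file object instead of the sample list, e.g. `summarize_pgraph(open(path))`, where `path` points to the saved syslog.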
Comment 32 Ilia Mirkin 2013-08-18 18:09:27 UTC
It appears that this bug report has lain dormant for quite a while. Sorry we haven't gotten to it. Since we fix bugs all the time, chances are pretty good that your issue has been fixed with the latest software. Please give it a shot. (Linux kernel 3.10.7, xf86-video-nouveau 1.0.9, mesa 9.1.6, or their git versions.) If upgrading to the latest isn't an option for you, your distro's bugzilla is probably the right destination for your bug report.

In an effort to clean up our bug list, we're pre-emptively closing all bugs that haven't seen updates since 2011. If the original issue remains, please make sure to provide fresh info, see http://nouveau.freedesktop.org/wiki/Bugs/ for what we need to see, and re-open this one.

Thanks,

The Nouveau Team

