Bug 39980 - [NV86] freezing nouveau
Summary: [NV86] freezing nouveau
Status: RESOLVED INVALID
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: 7.6 (2010.12)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Duplicates: 45230
Reported: 2011-08-10 06:44 UTC by Christoph Anton Mitterer
Modified: 2014-07-14 01:38 UTC (History)

Attachments
xorg log (34.88 KB, text/x-log)
2011-08-10 06:44 UTC, Christoph Anton Mitterer

Description Christoph Anton Mitterer 2011-08-10 06:44:39 UTC
Created attachment 50095
xorg log 

Hi.

I have been experiencing quite frequent system freezes since I switched over to nouveau... :(

This system here runs Debian unstable on AMD64, the GPU is a G86 [GeForce 8400M G].

The symptom is that X suddenly freezes (although the system seems to keep running in the background). The mouse continues to work; the keyboard doesn't, although SysRq works to some extent (well, sync/umount/reboot).
Switching back to the console does not work.

I've attached an Xorg.log (although it doesn't contain any errors).


Also some kernel log messages, collected from various events:
kern.log.2.xz:Jul 30 20:19:44 heisenberg kernel: [16890.395990] [drm] nouveau 0000:01:00.0: fail ttm_validate
kern.log.2.xz:Jul 30 20:19:44 heisenberg kernel: [16890.395995] [drm] nouveau 0000:01:00.0: validate vram_list
kern.log.2.xz:Jul 30 20:19:44 heisenberg kernel: [16890.396031] [drm] nouveau 0000:01:00.0: validate: -12
kern.log.2.xz:Jul 30 20:21:38 heisenberg kernel: [17004.414063] [drm] nouveau 0000:01:00.0: EvoCh 2 Mthd 0x0080 Data 0x00000000 (0x000b 0x05)
kern.log.2.xz:Jul 30 20:22:55 heisenberg kernel: [17081.482834] [drm] nouveau 0000:01:00.0: fail ttm_validate
kern.log.2.xz:Jul 30 20:22:55 heisenberg kernel: [17081.482840] [drm] nouveau 0000:01:00.0: validate vram_list
kern.log.2.xz:Jul 30 20:22:55 heisenberg kernel: [17081.482867] [drm] nouveau 0000:01:00.0: validate: -12
kern.log.2.xz:Jul 30 20:22:55 heisenberg kernel: [17081.495997] [drm] nouveau 0000:01:00.0: fail ttm_validate
kern.log.2.xz:Jul 30 20:22:55 heisenberg kernel: [17081.496136] [drm] nouveau 0000:01:00.0: validate vram_list
kern.log.2.xz:Jul 30 20:22:55 heisenberg kernel: [17081.496155] [drm] nouveau 0000:01:00.0: validate: -12
kern.log.2.xz:Jul 30 20:22:58 heisenberg kernel: [17084.554033] [drm] nouveau 0000:01:00.0: fail ttm_validate
kern.log.2.xz:Jul 30 20:22:58 heisenberg kernel: [17084.554036] [drm] nouveau 0000:01:00.0: validate vram_list
kern.log.2.xz:Jul 30 20:22:58 heisenberg kernel: [17084.554056] [drm] nouveau 0000:01:00.0: validate: -16


kern.log.3.xz:Jul 21 20:12:56 heisenberg kernel: [29940.042744] [drm] nouveau 0000:01:00.0: Error allocating channel PRAMIN: -28
kern.log.3.xz:Jul 21 20:12:56 heisenberg kernel: [29940.042748] [drm] nouveau 0000:01:00.0: init pramin
kern.log.3.xz:Jul 21 20:12:56 heisenberg kernel: [29940.042750] [drm] nouveau 0000:01:00.0: gpuobj -28


kern.log.2.xz:Jul 29 20:48:19 heisenberg kernel: [ 5362.017674] [drm] nouveau 0000:01:00.0: PFIFO_CACHE_ERROR - Ch 4/1 Mthd 0x0060 Data 0xbeef0201


kern.log.2.xz:Jul 29 12:39:11 heisenberg kernel: [ 5129.943625] [drm] nouveau 0000:01:00.0: fail ttm_validate
kern.log.2.xz:Jul 29 12:39:11 heisenberg kernel: [ 5129.943627] [drm] nouveau 0000:01:00.0: validate vram_list
kern.log.2.xz:Jul 29 12:39:11 heisenberg kernel: [ 5129.943632] [drm] nouveau 0000:01:00.0: validate: -12
kern.log.2.xz:Jul 29 12:39:11 heisenberg kernel: [ 5129.981260] [drm] nouveau 0000:01:00.0: fail ttm_validate
kern.log.2.xz:Jul 29 12:39:11 heisenberg kernel: [ 5129.981261] [drm] nouveau 0000:01:00.0: validate vram_list
kern.log.2.xz:Jul 29 12:39:11 heisenberg kernel: [ 5129.981267] [drm] nouveau 0000:01:00.0: validate: -12


kern.log.1:Aug  5 21:14:12 heisenberg kernel: [11333.244292] [drm] nouveau 0000:01:00.0: PFIFO_CACHE_ERROR - Ch 2/1 Mthd 0x0060 Data 0xd8000001


kern.log.1:Aug  5 16:34:38 heisenberg kernel: [12607.195869] [drm] nouveau 0000:01:00.0: PFIFO_CACHE_ERROR - Ch 4/1 Mthd 0x0060 Data 0xbeef0201



In this case it apparently just killed my window manager, which fell back to metacity, and I was somehow able to reboot the system cleanly:
Aug 10 14:43:40 heisenberg kernel: [ 1061.085190] [drm] nouveau 0000:01:00.0: fail ttm_validate
Aug 10 14:43:40 heisenberg kernel: [ 1061.085195] [drm] nouveau 0000:01:00.0: validate vram_list
Aug 10 14:43:40 heisenberg kernel: [ 1061.085200] [drm] nouveau 0000:01:00.0: validate: -12
Aug 10 14:53:25 heisenberg kernel: [ 1646.803467] [drm] nouveau 0000:01:00.0: fail ttm_validate
Aug 10 14:53:25 heisenberg kernel: [ 1646.803472] [drm] nouveau 0000:01:00.0: validate vram_list
Aug 10 14:53:25 heisenberg kernel: [ 1646.803477] [drm] nouveau 0000:01:00.0: validate: -12
Aug 10 14:53:25 heisenberg kernel: [ 1646.861090] [drm] nouveau 0000:01:00.0: fail ttm_validate
Aug 10 14:53:25 heisenberg kernel: [ 1646.861096] [drm] nouveau 0000:01:00.0: validate vram_list
Aug 10 14:53:25 heisenberg kernel: [ 1646.861102] [drm] nouveau 0000:01:00.0: validate: -12
Aug 10 15:10:38 heisenberg kernel: [ 2679.960989] [drm] nouveau 0000:01:00.0: fail ttm_validate
Aug 10 15:10:38 heisenberg kernel: [ 2679.960994] [drm] nouveau 0000:01:00.0: validate vram_list
Aug 10 15:10:38 heisenberg kernel: [ 2679.961027] [drm] nouveau 0000:01:00.0: validate: -12
Aug 10 15:10:44 heisenberg kernel: [ 2685.310110] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 000000 warp 0, opcode 00000000 00000000
Aug 10 15:10:44 heisenberg kernel: [ 2685.310117] [drm] nouveau 0000:01:00.0: PGRAPH - TRAP 
Aug 10 15:10:44 heisenberg kernel: [ 2685.310121] [drm] nouveau 0000:01:00.0: PGRAPH - ch 5 (0x0006496000) subc 7 class 0x8297 mthd 0x1a1c data 0x00001111
Aug 10 15:10:44 heisenberg kernel: [ 2685.346157] [drm] nouveau 0000:01:00.0: PGRAPH_TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 000000 warp 0, opcode 00000000 00000000
Aug 10 15:10:44 heisenberg kernel: [ 2685.346165] [drm] nouveau 0000:01:00.0: PGRAPH - TRAP
Aug 10 15:10:44 heisenberg kernel: [ 2685.346169] [drm] nouveau 0000:01:00.0: PGRAPH - ch 5 (0x0006496000) subc 7 class 0x8297 mthd 0x1a1c data 0x00001111

(I was just doing some LibreOffice work, so no 3D should have been involved!?)



Some filesystem corruption usually follows from this,
which makes the whole thing barely usable.


I'm not sure whether this is related, but I previously had some severe nouveau freezes that seemed to be connected to the system running out of main memory.
(I had several VirtualBox VMs running.) This was quite easily reproducible.
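
On the memory angle: the trailing numbers in the nouveau lines above (-12, -16, -28) read like negative Linux errno values, which is the usual kernel convention for error returns. The following is only a minimal sketch for decoding them; the regular expression and the helper name `decode_errno` are assumptions based purely on the excerpts in this report, not part of nouveau or any tool. Note that an out-of-memory code reported by the driver need not refer to system RAM; it could equally concern video memory.

```python
import errno
import re

# Assumed pattern: a nouveau message that ends in a negative integer,
# e.g. "validate: -12" or "gpuobj -28" as seen in the excerpts above.
CODE_RE = re.compile(r"nouveau \S+: .*?(-\d+)\s*$")


def decode_errno(line):
    """Return (code, symbolic errno name) for a trailing negative code, else None."""
    match = CODE_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps positive errno numbers to names such as "ENOMEM".
    return code, errno.errorcode.get(-code, "UNKNOWN")


if __name__ == "__main__":
    samples = [
        "[drm] nouveau 0000:01:00.0: validate: -12",
        "[drm] nouveau 0000:01:00.0: validate: -16",
        "[drm] nouveau 0000:01:00.0: Error allocating channel PRAMIN: -28",
    ]
    for sample in samples:
        print(decode_errno(sample))
```

Read this way, -12 would be ENOMEM, -16 EBUSY and -28 ENOSPC, which at least fits the observation that the freezes appear under memory pressure.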



Any ideas? Or is there at least some reliable way to kill nouveau/X and get back to a working console?


If you need further data, please tell me which.


Cheers,
Chris.
Comment 1 Christoph Anton Mitterer 2011-08-10 06:47:32 UTC
Oh and some versions:

linux 3.0
$ apt-cache show xserver-xorg | grep Version
Version: 1:7.6+7
$ apt-cache show xserver-xorg-core | grep Version
Version: 2:1.10.3-1
$ apt-cache show xserver-xorg-video-nouveau | grep Version
Version: 1:0.0.16+git20110411+8378443-1+b1
$ apt-cache show libdrm-nouveau1a | grep Version
Version: 2.4.26-1
$ apt-cache show libgl1-mesa-dri-experimental | grep Version
Version: 7.10.3-4
Comment 2 Emil Velikov 2011-08-10 11:24:11 UTC
Hi Christoph

Would you mind removing the following package to establish whether it is related to or triggers the issue?

libgl1-mesa-dri-experimental

Note that by doing so your compositing manager will/should fall back to software rendering (you will not have any 3D/GL hardware acceleration), so your desktop experience is not going to be as smooth.

For future reference, would you mind attaching the whole log (dmesg) rather than pasting fragments of it into the report?

Can you please take a look at our Bugs [1] and FAQ [2] sections to eliminate some of the most common root causes?

Cheers
Emil

[1] http://nouveau.freedesktop.org/wiki/Bugs
[2] http://nouveau.freedesktop.org/wiki/FAQ
Comment 3 Christoph Anton Mitterer 2011-08-10 11:53:54 UTC
Hi.

>Would you mind removing the following package in order to
>establish if it's related/triggers the issue
>libgl1-mesa-dri-experimental
I feared you'd ask this ^^...
Well, I can give it a try. Of course I'll lose compiz, which I'm particularly used to... it may very well be that the freeze doesn't occur without it.

>For future reference would you mind attaching the whole log (dmesg)
>rather than pasting fragments of it in the report
OK... the full log was just a huge mess, and apart from the usual messages about detecting the card etc. there was no DRI-related output at all.


I just had the same(?) issue on another machine, with basically the same software config but a G94 (GeForce 9600 GT) with two monitors attached.
But this time there was no output in the logs at all, neither in Xorg.log nor in the kernel log.
And even the SysRq messages made it into the syslog before the system rebooted.
This particular system also shows a very easily reproducible bug, but I'll report that in a separate bug report.


>Can you please take a look at our Bugs [1] and FAQ [2]
>section to eliminite one of the most common root cause
I've actually read them, but honestly, some of the things you suggest there are rather "difficult" for end users to do (and I've studied computer science). Using git head for everything in particular is quite an effort, especially when you want to stay more or less in sync with your distro.


Cheers,
Chris.
Comment 4 Christoph Anton Mitterer 2011-08-10 11:55:52 UTC
Oh, by the way: wouldn't it be possible to add functionality so that, if nouveau detects a longer-lasting lockup, it kills itself and falls back to some basic graphics mode? Magic SysRq + g does not seem to work :(
Comment 5 Christoph Anton Mitterer 2012-03-26 19:42:09 UTC
*** Bug 45230 has been marked as a duplicate of this bug. ***
Comment 6 Christoph Anton Mitterer 2012-03-26 19:47:35 UTC
This basically still persists.

Now with:
Linux 3.2.12
libgl1-mesa-dri/libgl1-mesa-glx 7.11.2-1
libdrm-nouveau1a 2.4.32-1
xserver-xorg-video-nouveau 1:0.0.16+g


One thing I've noticed:
It's possible to see the problems approaching (even when not looking at the
kernel output)...
The screen starts to flicker (especially when switching virtual desktops, due
to the compiz animation stuff) in some areas,.. if you then continue to work it
will usually freeze.
I've also noticed that closing windows (e.g. some terminals) usually helps then
and also stops the flickering.

When the freeze does happen, however, one now usually has a few seconds to
switch to the console with Ctrl+Alt+F1 and press Ctrl+Alt+Del or trigger an ACPI power button event.
That way you at least shut down cleanly, though you still lose all your work.

:-(
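
Since the `fail ttm_validate` messages seem to show up before the actual freeze, one could watch the kernel log for a burst of them as an early warning. This is only a rough sketch: the function name `burst_detected`, the threshold of 3 and the 60-second window are arbitrary assumptions rather than values derived from this report, and it relies on the uptime stamp in square brackets, so it only makes sense within a single boot.

```python
import re
from collections import deque

# Assumed line shape, taken from the excerpts in this report:
# "... kernel: [17081.482834] [drm] nouveau 0000:01:00.0: fail ttm_validate"
FAIL_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+\[drm\] nouveau \S+ fail ttm_validate")


def burst_detected(lines, threshold=3, window=60.0):
    """Return True if `threshold` validate failures fall within `window` seconds."""
    recent = deque()
    for line in lines:
        match = FAIL_RE.search(line)
        if match is None:
            continue
        stamp = float(match.group(1))  # seconds since boot
        recent.append(stamp)
        # Drop failures that are older than the sliding window.
        while stamp - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False


if __name__ == "__main__":
    log = [
        "kernel: [17081.482834] [drm] nouveau 0000:01:00.0: fail ttm_validate",
        "kernel: [17081.495997] [drm] nouveau 0000:01:00.0: fail ttm_validate",
        "kernel: [17084.554033] [drm] nouveau 0000:01:00.0: fail ttm_validate",
    ]
    print(burst_detected(log))
```

A positive result would be the cue to save work and close a few windows, which is what seemed to help once the flickering started.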
Comment 7 freeclimbing 2013-04-25 13:56:46 UTC
Hello,

I have the same problem: X freezes, the mouse is working, the keyboard (mostly) does not work. But: I can still switch to the console with Ctrl+Alt+F1 or something like that.

In my syslog I find messages like these:

Apr 25 14:27:05 foo kernel: [1390386.065164] [drm] nouveau 0000:0f:00.0: fail ttm_validate
Apr 25 14:27:05 foo kernel: [1390386.065168] [drm] nouveau 0000:0f:00.0: validate vram_list
Apr 25 14:27:05 foo kernel: [1390386.065177] [drm] nouveau 0000:0f:00.0: validate: -12
Apr 25 14:28:00 foo kernel: [1390440.453306] [drm] nouveau 0000:0f:00.0: fail ttm_validate
Apr 25 14:28:00 foo kernel: [1390440.453310] [drm] nouveau 0000:0f:00.0: validate vram_list
Apr 25 14:28:00 foo kernel: [1390440.453337] [drm] nouveau 0000:0f:00.0: validate: -12
Apr 25 14:28:54 foo kernel: [1390494.359876] [drm] nouveau 0000:0f:00.0: fail ttm_validate
Apr 25 14:28:54 foo kernel: [1390494.359880] [drm] nouveau 0000:0f:00.0: validate vram_list
Apr 25 14:28:54 foo kernel: [1390494.359887] [drm] nouveau 0000:0f:00.0: validate: -12
Apr 25 14:34:01 foo gdm3][17808]: GLib-GIO-WARNING: Dropping signal ActiveSessionChanged of type (s) since the type from the expected interface is (o)
Apr 25 14:34:03 foo acpid: client 17732[0:0] has disconnected

My solution is normally to restart gdm; then it works again (for some time). It's absolutely not reproducible; for me it occurs totally at random.

Some of my system settings:
Debian testing
% uname -a
Linux foo 3.2.0-4-amd64 #1 SMP Debian 3.2.41-2 x86_64 GNU/Linux
% lspci G NVIDIA
0f:00.0 VGA compatible controller: NVIDIA Corporation G98 [Quadro NVS 295] (rev a1)
% show xserver-xorg G Version
Version: 1:7.7+2
% show xserver-xorg-core G Version
Version: 2:1.12.4-6
% show xserver-xorg-video-nouveau G Version
Version: 1:1.0.1-5
% show libdrm-nouveau1a G Version
Version: 2.4.40-1~deb7u2
% show libgl1-mesa-dri-experimental G Version
Version: 8.0.5-4

I'll try to provide any information you need! Thanks for any help!
Markus
Comment 8 Ilia Mirkin 2013-09-06 22:04:01 UTC
Does this still happen with recent software (kernel 3.11, mesa 9.2, xf86-video-nouveau 1.0.9)? If so, please post fresh dmesg/xorg logs.
Comment 9 Ilia Mirkin 2013-10-08 17:01:03 UTC
No response to re-test request after a month. Closing as invalid.
Comment 10 Christoph Anton Mitterer 2014-07-14 01:38:37 UTC
Sorry for not having responded... I forgot about this somehow. Anyway, I no longer have nvidia cards, so I couldn't have tested it anyway :(