Bug 71382

Summary: [NV4C] Nouveau dmesg error, computer hangs
Product: xorg
Component: Driver/nouveau
Status: RESOLVED INVALID
Severity: normal
Priority: medium
Version: unspecified
Hardware: All
OS: Linux (All)
Reporter: Martin <martin.kaffanke>
Assignee: Nouveau Project <nouveau>
QA Contact: Xorg Project Team <xorg-team>
CC: martin.kaffanke
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=87361
Attachments:
Description Flags
dmesg output as long as I can get ssh on my machine none
dmesg output until hang started none

Description Martin 2013-11-08 10:02:54 UTC
Created attachment 88884 [details]
dmesg output as long as I can get ssh on my machine

Hi there,

I'm slowly getting sick of the many hangs on my computer.  Sometimes, mostly when I log in to my account (using Ubuntu and Unity) and immediately start Firefox and Thunderbird at the same time, the screen shows bad stripes and the machine is no longer reachable via ssh.

~$ uname -a
Linux martin-desktop 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

I will add a dmesg. Maybe you can tell me whether this is a nouveau bug or whether it belongs to Unity or Compiz?  I hope you can tell me how to get additional information about it.

The dmesg output is of course incomplete, because I cannot run dmesg once the machine already hangs.  This is from just before it started to hang.

Thanks,
Martin
Comment 1 Ilia Mirkin 2013-11-08 16:03:20 UTC
I see you're using Ubuntu. I think its newer packages are compiled with gcc-4.8, and we recently fixed a nasty bug that shows up when libdrm is compiled with gcc-4.8: http://cgit.freedesktop.org/mesa/drm/commit/?id=482abbfafb56cbceaf5355c026434e638cddd0f1

Could you either try this with libdrm compiled with clang or gcc-4.7 (or earlier), or alternatively try a libdrm with that patch applied?

There's a deb here that should have the fix: http://ftp.de.debian.org/debian/pool/main/libd/libdrm/libdrm-nouveau2_2.4.46-4_amd64.deb

Although no idea if it'll install correctly on your system.
Comment 2 Martin 2013-11-09 11:18:55 UTC
Thanks so much, this seems to be the solution.

I took the original Ubuntu Saucy (13.10) package, applied the patch you linked by hand, compiled it and installed it, so it differs from the original Ubuntu package only by that patch.

I put it on launchpad 
https://launchpad.net/~martin-kaffanke/+archive/personalbugfixes

So ubuntu saucy (13.10) users can use it by doing

$ sudo add-apt-repository ppa:martin-kaffanke/personalbugfixes
$ sudo apt-get update && sudo apt-get upgrade

in the terminal.

Thanks.

PS. I have only tested this for a few hours so far. If the bug comes back,
I'll report it here. :)
Comment 3 Martin 2013-11-09 13:22:06 UTC
Ok, I was too fast. I had another hang with the bad stripes, but I don't know how to debug it.

Do you think there could be additional log output for this somewhere in the log files?


Do you think it's this?

Nov  9 13:09:41 martin-desktop kernel: [  129.247153] nouveau E[    PBUS][0000:00:0d.0] MMIO write of 0x00000000 FAULT at 0x00b020
Nov  9 13:09:41 martin-desktop kernel: [  129.251668] nouveau E[    PBUS][0000:00:0d.0] MMIO write of 0x0a220001 FAULT at 0x00b020

(Maybe you know how to patch my packages to get rid of that?)

Where can I find more information?

Thanks,
Martin
Comment 4 Martin 2013-11-09 13:38:30 UTC
It could also be this:

Nov  9 14:28:11 martin-desktop kernel: [ 4839.864023] nouveau E[Xorg[1047]] failed to idle channel 0xcccc0000 [Xorg[1047]]

or here:

Nov  9 14:28:26 martin-desktop kernel: [ 4855.176026] nouveau E[compiz[2301]] failed to idle channel 0xcccc0000 [compiz[2301]]

Maybe you can also tell me how to patch that?

Thanks,
Martin
Comment 5 Emil Velikov 2013-11-09 13:59:34 UTC
Hi Martin

Can you do us a favour and attach complete dmesg output :) Posting random snippets does not provide any context or background.

AFAIK messages like the following should be (relatively) harmless.
nouveau E[    PBUS][0000:00:0d.0] MMIO write of 0x???????? FAULT at 0x00b???

With that said, it would be great if you could answer the following questions
* How often does this happen
* How can we reproduce it

Cheers
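
For readers following along, a minimal sketch of how the full log asked for above could be saved and how the nouveau error lines could be pulled out of it. The function names and the default file name dmesg-full.txt are illustrative assumptions, not part of any tool mentioned in this report. On systems that restrict kernel log access, dmesg may need to be run with sudo.

```shell
#!/bin/sh
# Save the complete kernel log to a file for attaching to a bug report.
# The default file name is an arbitrary choice for this sketch.
save_dmesg() {
    dmesg > "${1:-dmesg-full.txt}"
}

# Read a kernel log on stdin and print only the nouveau error lines,
# i.e. lines containing the literal tag "nouveau E[".
# "|| true" keeps the function from failing when nothing matches.
nouveau_errors() {
    grep -F 'nouveau E[' || true
}

# Usage:
#   save_dmesg dmesg-full.txt
#   nouveau_errors < dmesg-full.txt
```

The filtered lines are only a quick overview; the complete file is still what should be attached, since the surrounding lines provide the context.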
Comment 6 Martin 2013-11-09 14:31:43 UTC
(In reply to comment #5)
> Can you do us a favour and attach complete dmesg output :) Posting random
> snippets does not provide any context or background.

Ok, I'll try to get a new one.

> AFAIK messages like the following should be (relatively) harmless.
> nouveau E[    PBUS][0000:00:0d.0] MMIO write of 0x???????? FAULT at 0x00b???

Ok, thanks.

> * How often does this happen

Sometimes it's hard to work on my computer, but the fix above seems to prevent many of the hangs. I will see over the next few days.  Normally it happens about twice a day.

The last time it happened was when I switched on the HUD for Unity.  I have disabled it now, and since then it works fine.  The HUD gave me:

Nov  9 14:28:13 martin-desktop kernel: [ 4841.054362] traps: hud-service[2006] trap int3 ip:7f8ce47263d9 sp:7fff1d1bcde0 error:0

I don't know if this can hang a machine...

> * How can we reproduce it

It would be great if I could find that out. :)  I cannot see any systematic pattern behind these hangs.

Martin
Comment 7 Martin 2013-11-15 18:51:41 UTC
I found a way to reproduce it, when I was searching for unity debugging.

Doing

/usr/lib/nux/unity_support_test -p 

in my terminal immediately hangs my system.  But now I don't know how to get debug information out of this.  Maybe you can help me with that step?

Thanks,
Martin
Comment 8 Camilo Gonzalez 2014-03-22 13:45:10 UTC
I have got the same problem with my NV4C chipset.

I have been looking for the same bug in the Fedora Bugzilla, but have not found it.

I am using 32-bits KDE Fedora 19 with latest version of nouveau.

There are 2 main ways to reproduce the hang, although it happens randomly:

1.- Trying Firefox startup.
2.- Launching KInfoCenter and clicking on Graphical Information -> OpenGL.

Attempting these 2 actions repeatedly should cause the hang.

There is another clue: the hang began to happen when I enabled hardware acceleration (kernel parameter nouveau.noaccel=0).

I provoked the hang while running the command 'dmesg -T -w -x > ./CGTdmesg.log' in a Konsole terminal.
I have attached the file CGTdmesg.log for more information.
However, there are no significant log entries in it.
As you can see, a message was still being written when the hang started.

I hope this information is useful.
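
A plain redirect like the one above can lose the last buffered lines when the machine locks up. A minimal sketch of a variant that flushes to disk after every line follows. The function name log_synced and the file name hang-dmesg.log are assumptions made for illustration, and even this cannot guarantee that the final messages survive a hard hang.

```shell
#!/bin/sh
# Copy stdin to a log file line by line and call sync after each line,
# so that as little as possible is lost if the machine locks up.
log_synced() {
    logfile=$1
    while IFS= read -r line; do
        printf '%s\n' "$line" >> "$logfile"
        sync
    done
}

# Usage, with the same dmesg options as in the command above:
#   dmesg -T -w -x | log_synced ./hang-dmesg.log
```

Calling sync for every line is slow, but kernel messages arrive rarely enough that this should not matter for a debugging session.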
Comment 9 Camilo Gonzalez 2014-03-22 13:51:30 UTC
Created attachment 96199 [details]
dmesg output until hang started
Comment 10 Jan Jasper de Kroon 2014-12-16 16:19:12 UTC
Hello everybody.

I also filed the same bug, but in the Mesa section.
It can be found here: https://bugs.freedesktop.org/show_bug.cgi?id=87361
The workaround that makes the system usable is to append a kernel boot parameter.
The parameter is: nouveau.config=NvMSI=0
MSI seems to be incompatible with the NV4C chipset.
Someone on IRC pointed out that this feature may need to be blacklisted for this chipset, so that it works out of the box for everybody.

Greetings Jan Jasper de Kroon
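
A minimal sketch for checking whether the running kernel was booted with the parameter mentioned above. The helper name has_kernel_param is hypothetical, and the note about making the setting persistent assumes a GRUB-based system with the stock /etc/default/grub layout, as on Ubuntu.

```shell
#!/bin/sh
# Return success if the given parameter appears as a whole word on the
# kernel command line. The optional second argument is a command line to
# check; it defaults to the running kernel's /proc/cmdline.
has_kernel_param() {
    param=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   has_kernel_param 'nouveau.config=NvMSI=0' && echo "parameter is set"
#
# To make it persistent (assumption: GRUB with /etc/default/grub), append
# the parameter to GRUB_CMDLINE_LINUX_DEFAULT and run: sudo update-grub
```

For a one-off test, the parameter can also be typed into the kernel line from the boot loader's edit screen before booting.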
Comment 11 Christopher M. Penalver 2016-02-23 05:46:23 UTC
Martin, Saucy is EOL as of July 17, 2014. For more on this, please see https://wiki.ubuntu.com/Releases.

If this is reproducible in a supported release, it would help immensely if you filed a new report with Ubuntu. Please make sure the package xdiagnose is installed, run the following from a terminal, and click the Yes button when asked to attach additional debugging information:
ubuntu-bug xorg

Also, please feel free to subscribe me to it.

For more on why this is helpful, please see https://wiki.ubuntu.com/ReportingBugs.
