Created attachment 93341: Full dmesg of a clean boot

This looks similar to bug #73445. For me, it happens:
- seemingly at random, at intervals ranging from hours to days (dammit...)
- not (yet) in OpenGL apps (I don't play games anymore)
- as hangs in Firefox and Evince
- while scrolling, but also while the display is 'stationary'

The symptoms are:
- relatively small corruption (about 10%), which changes slightly over time
- disk writes only happen in periodic spurts with long pauses; SysRq+S did not help

Eventually I'm dumped to a terminal with an oops, but I don't see anything nouveau-related in it. Attaching it anyway.

Booting with nouveau.config=NvMSI=0 'fixes' it. Or, put another way: it has never occurred while using this parameter.
Created attachment 93342: Screenshot of garbled OOPS
I forgot to rotate the photo, but it's unreadable anyway. Sorry about that. Oopses keep occurring even after the switch to a TTY, so this is probably not even the first one.
Yes, probably the same thing as that bug, although unfortunately the filer of #73445 never responded to my suggestion. It definitely sounds like something IRQ-ish is going bad. I think the ultimate resolution will be to just disable MSI on nv4e, but if you don't mind, could you attach the output of

    # lspci -vvvnn
    # cat /proc/interrupts

(Run as root; really, only the lspci part needs root.) Please do this for both the NvMSI=0 case and the default case. Ideally, for the default (MSI-enabled) case, capture it after the corruption begins. Also, can you attach a dmesg of a boot without NvMSI=0? Do you see any odd interrupt errors in dmesg (something like "irq 16: nobody cared")?
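When comparing the two /proc/interrupts dumps, the type column on the line carrying nouveau's handler shows which case you are in: an MSI vector is labelled PCI-MSI, while the legacy shared line shows up as an IO-APIC type. The script below is a minimal sketch of that check, assuming the handler is listed as "nouveau" (older kernels) or "nvkm" (newer kernels); the irq_mode function is hypothetical and not part of any existing tool.

```shell
#!/bin/sh
# Sketch: report how the nouveau interrupt is delivered, based on the
# type column of /proc/interrupts (or of a saved copy of it).
# Assumption: the handler is named "nouveau" (older kernels) or "nvkm"
# (newer kernels); irq_mode is a hypothetical helper.
irq_mode() {
    file=${1:-/proc/interrupts}
    # First line whose device list contains the handler as a whole word.
    line=$(grep -wE 'nouveau|nvkm' "$file" 2>/dev/null | head -n 1)
    case "$line" in
        "")         echo none ;;    # handler not found (driver not loaded?)
        *PCI-MSI*)  echo msi ;;     # MSI vector, e.g. "PCI-MSI-edge"
        *)          echo legacy ;;  # legacy line, e.g. "IO-APIC-fasteoi"
    esac
}

irq_mode "$@"
```

Run it once with and once without nouveau.config=NvMSI=0; a saved copy of /proc/interrupts can be passed as the first argument instead of reading the live file.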
Created attachment 93343: Logs without MSI
Created attachment 93344: Logs with MSI
Created attachment 93345: Logs directly after corruption in tty

I managed to reproduce it. I think it reproduces reliably, but confirming that will have to wait; I have to leave in 30 minutes. I reproduced it by scrolling like a madman in tty2. It was just a guess =) . However, nouveau seems to be the quiet type:

    [ 122.487034] nouveau E[ DRM] GPU lockup - switching to software fbcon

- log_msi=1.txt: logs without corruption
- corrupt.txt: logs with corruption
- log_msi=0.txt: a clean boot
I don't see any messages about IRQs. The nice thing about the tty lockup is that the system is actually still responsive. However, reproducing it is sometimes difficult. I also checked whether it triggers with NvMSI=0 (in case this is a separate issue), and I was not able to reproduce it. Any pointers?
Looks like the nv4x IGPs have some registers placed differently... I just cc'd you on a few patches; you can also get them at:
http://lists.freedesktop.org/archives/nouveau/2014-February/016032.html
http://lists.freedesktop.org/archives/nouveau/2014-February/016033.html
http://lists.freedesktop.org/archives/nouveau/2014-February/016034.html
I noticed, thanks!
(In reply to comment #9)
> I noticed, thanks!

Were you able to test them out? It would be nice to confirm whether they actually work as advertised, although I do tend to trust mwk on such things :)
It's on my TODO list together with bug #70213. The laptop is being used right now, so this could take a while.
Patches applied, let's hope for the best. Furthermore, hibernation seems to work on this laptop! It had stopped working once kernel DRM was enabled for the nouveau driver, until now... progress!
It went berserk shortly after the last post. I have the laptop hooked up to another machine with netconsole. Last time it produced so many OOPSes, BUGs and WARNs that they should keep you all busy for the next week :) . The bug is hard to reproduce, so this might take a while. It happens mostly while scrolling. It starts with parts of the screen not updating, keeping the stale contents from before the scroll.
It does not seem to send the errors over netconsole. It crashed twice this morning, running a git pull from 2 hours earlier. No output :/ I'll keep trying, though.
Is the system solid without MSI? Maybe we should just give up and disable MSI on it. NVIDIA never shipped drivers with MSI enabled for pre-nv50.
Yes, it never crashed with MSI disabled. I have the laptop hooked up anyway, so I will keep trying in the hope that it manages to send out what is going wrong.
Created attachment 94413: Partial output from hang with MSI enabled

I managed to capture some output during a hang. It's not much. The music stops playing when the PFIFO warnings show up, then resumes and stops again at the next PFIFO warning. Should I enable more verbose nouveau logs?
(In reply to Ilia Mirkin from comment #15)
> Is the system solid without MSI? Maybe we should just give up and disable
> MSI on it. NVIDIA never shipped drivers with MSI enabled for pre-nv50.

MSI should be disabled for NV4C and NV4E in semi-recent kernels.