Created attachment 83404 [details]
Shortly after resume from suspend on kernel 3.10.4, nouveau crashes (hard lockup). This is the first nouveau crash I've seen since using xfwm4 and no compositing. I've been hanging out on the 3.4.x LTS series though, so it's been safe.
Would you be able to bisect this issue? Hopefully the fix will be more trivial than the i2c one you've reported :]
Actually all the fence stuff was redone between... 3.4 and 3.5 or 3.5 and 3.6 (sorry, I forgot), and that created regressions for at least one other user (on a nvc0 card though). The unfortunate thing is that the fence redo actually had some suspend bugs in it that were fixed over time (but I guess not all of them!) so this may end up being tricky to bisect, if you indeed zero in on those fence commits.
Hi, I may try to bisect this but won't be able to get it done soon. There are a couple of hurdles to testing...
1) The bug itself causes a hard lockup requiring poweroff. I use a Crucial M4 SSD, which has a known issue where it can become unusable after a power loss. This is risky, so I'll need to image my install over to an external drive to test the bisect kernel builds.
2) There are no reliable steps to reproduce the issue. I was running 3.10.3 for several days with multiple suspend cycles prior to triggering this crasher. I'm not sure I can dedicate the time to this. As much as I'd like to help the nouveau driver have one less bug, the GPU in question is aged and I may have to just let this one slide, as I'd assume it's pretty low priority for newer kernels :]
Please leave this report open. I'll try to update it at some point with results of a bisect. Can I bisect the mainline tree, or must I use the nouveau git tree?
Shouldn't matter -- the trees are identical for the purpose of bisection. I bet running glxgears while suspending will help trigger it more often. Also, it's interesting that it's a hard hang. I would have expected X to die and come back... can you not, e.g., ssh in when this happens?
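For reference, a bisect on the mainline tree could be started roughly as sketched below. This is only an outline under assumptions: the function name `bisect_nouveau`, the tree path, and the `v3.4`/`v3.10` endpoints are placeholders chosen for illustration (the report only says 3.4.x is good and 3.10.4 is bad), and limiting the search to `drivers/gpu/drm` assumes the regression lives in the DRM code.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): start a bisect that only
# considers commits touching drivers/gpu/drm.
#   $1 = path to the kernel tree
#   $2 = known-good revision (e.g. v3.4, an assumed endpoint)
#   $3 = known-bad revision  (e.g. v3.10, an assumed endpoint)
bisect_nouveau() {
    tree=$1
    good=$2
    bad=$3
    # "git bisect start <bad> <good> -- <paths>" marks both endpoints and
    # checks out the first commit to test.
    git -C "$tree" bisect start "$bad" "$good" -- drivers/gpu/drm
}

# Typical loop after starting (run by hand, once per step):
#   build and boot the checked-out kernel, try to trigger the hang, then
#   git bisect good    # no hang on this build
#   git bisect bad     # hang reproduced on this build
# When git prints "<sha> is the first bad commit", finish with:
#   git bisect reset
```

Usage would be something like `bisect_nouveau ~/src/linux v3.4 v3.10`; if the path restriction turns out to be wrong, dropping the `-- drivers/gpu/drm` part bisects the whole tree at the cost of more steps.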
Thanks for the info. Unfortunately, it's a CPU hard lockup: no ssh connectivity, and I cannot sysrq out safely. Additionally, I've got a RAID array that doesn't get synced on this crash, on top of the SSD issue, so this becomes quite troublesome.
I had reported a similar nouveau crasher via downstream Debian BTS last year.
That's why I switched over to XFCE and build my own vanilla 3.4.x kernels + i2c/pwm patch.
Thanks Ilia...glxgears while resuming from suspend triggered the crash on the first attempt. I'll get a test install up and running and start bisecting this soon.
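In case it helps anyone else reproduce this, the test can be scripted along these lines. It is a rough sketch, not the exact procedure used here: the function name `repro_suspend_hang` and the `GL_CLIENT`, `SUSPEND_CMD`, and `WARMUP` variables are made up for illustration, and `systemctl suspend` is an assumed default that may need to be swapped for whatever suspend command the system provides (e.g. `pm-suspend`).

```shell
#!/bin/sh
# Hypothetical reproducer: keep a GL client rendering across one
# suspend/resume cycle. All three settings below are assumptions and
# can be overridden from the environment.
GL_CLIENT=${GL_CLIENT:-glxgears}
SUSPEND_CMD=${SUSPEND_CMD:-systemctl suspend}
WARMUP=${WARMUP:-5}

repro_suspend_hang() {
    # Start the GL client in the background so it is active during suspend.
    $GL_CLIENT >/dev/null 2>&1 &
    client_pid=$!
    sleep "$WARMUP"      # give the client time to start rendering
    sync                 # flush disks first: a hard lockup means a power-off
    $SUSPEND_CMD         # request suspend
    status=$?
    sleep "$WARMUP"      # keep rendering for a moment after resume
    # If the machine survived, stop the client and report the suspend status.
    kill "$client_pid" 2>/dev/null || true
    wait "$client_pid" 2>/dev/null || true
    return "$status"
}
```

Calling `repro_suspend_hang` once per bisect step should be enough if the hang triggers as readily as it did here; if the script returns at all, that build did not lock up on this attempt.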
Okay, I created a test environment and bisected this problem:
4f6029da58ba9204c98e33f4f3737fe085c87a6f is the first bad commit
Author: Ben Skeggs <email@example.com>
Date: Fri Nov 16 11:54:31 2012 +1000
drm/nv50-nvc0: switch to common disp impl, removing previous version
Signed-off-by: Ben Skeggs <firstname.lastname@example.org>
:040000 040000 9daeb0bd5ed3e9b22b53c21fab853bd2e392f6ed 4bdbb1d96e57d3f254affb8812788f04b7474bf7 M drivers
Created attachment 83948 [details]
Same bisect result as https://bugs.freedesktop.org/show_bug.cgi?id=67878 . NV98 vs NVA0 -- fairly similar cards, too.