Summary: | nouveau freeze Xorg on NV34
---|---
Product: | xorg
Reporter: | DarkRaven <drdarkraven>
Component: | Driver/nouveau
Assignee: | Nouveau Project <nouveau>
Status: | RESOLVED FIXED
QA Contact: | Xorg Project Team <xorg-team>
Severity: | normal
Priority: | highest
CC: | drdarkraven, hiyuh.root, mschiffer, w41ter
Version: | git
Hardware: | x86 (IA32)
OS: | Linux (All)

Description
DarkRaven
2010-08-15 06:45:44 UTC
Created attachment 37887 [details]
Kernel log
Created attachment 37888 [details]
Kernel configuration
Created attachment 37908 [details]
Xorg log
After '2.6.36-rc1' was merged into nouveau, the same problem appears with the nouveau kernel tree.
Here is my xorg.0.log
Created attachment 37909 [details]
Xorg log
Sorry, I submitted the wrong one.

Curious behavior of the nouveau driver (though I don't know if it's connected with this problem):
Once the nouveau driver is loaded, after the screen resolution has changed (which is normal), there is an area, about 640*480 in size, at the top-left corner of the screen that is totally white.

This behavior started maybe after 2.6.36-rc1.

(In reply to comment #3)
> Created an attachment (id=37908) [details]
> Xorg log
>
> After '2.6.36-rc1' is merged into nouveau,the same problem appears with nouveau
> kernel tree.
>

You could try to bisect this problem (see "man git-bisect").

> Here is my xorg.0.log

(In reply to comment #5)
> Curious behavior of nouveau driver(though I don't know if it's connect with
> this problem):
> Once nouveau driver is loaded,after the screen resolution changed(which is
> normal),there is a area,about 640*480 in size,at the top-left corner of the
> screen,is totally white.
>
> This behavior started maybe after 2.6.36-rc1

That's an unrelated issue, it's already fixed in Andrew Morton's -mm tree, commit "vt: fix console corruption on driver hand-over".

Bisect result:

58374713c9dfb4d231f8c56cac089f6fbdedc2ec is the first bad commit
commit 58374713c9dfb4d231f8c56cac089f6fbdedc2ec
Author: Arnd Bergmann <arnd@arndb.de>
Date: Sat Jul 10 23:51:39 2010 +0200

    drm: kill BKL from common code

(In reply to comment #7)
> Bisect result:
>
> 58374713c9dfb4d231f8c56cac089f6fbdedc2ec is the first bad commit
> commit 58374713c9dfb4d231f8c56cac089f6fbdedc2ec
> Author: Arnd Bergmann <arnd@arndb.de>
> Date: Sat Jul 10 23:51:39 2010 +0200
>
> drm: kill BKL from common code

Ben may have fixed this with e644bb2066cc46c01e5f8902bf840f19e1f942c6, could you try again with latest git?

Tried, doesn't fix the bug.

Yeah, I didn't expect my earlier fix to fix this or the other issues that have been reported. They fix another issue I encountered, however. I have been unable to reproduce these so far, but my kernel has additional changes which could possibly affect things. I will try in more depth tomorrow.

You could try to build a kernel with CONFIG_LOCKUP_DETECTOR enabled. The lockup detector logs a backtrace when the kernel stays stuck for a minute or so. It may be useful along with a serial/netconsole to track this problem down.

The all-is-frozen situation described in the comment only appeared once.
During my bisecting, while Xorg is frozen, I'm still able to move mouse and network is still working.
So according to the definition of 'Hardlockups':
  Hardlockups are bugs that cause the CPU to loop in kernel mode
  for more than 60 seconds, without letting other interrupts have a
  chance to run.
I don't think this is a hardlockup.

(In reply to comment #12)
> The all-is-frozen situation described in the comment only appeared once.
> During my bisecting,while Xorg is frozen,I'm still able to move mouse and
> network is still working.
> So according to the definition of 'Hardlockups':
> Hardlockups are bugs that cause the CPU to loop in kernel mode
> for more than 60 seconds, without letting other interrupts have a
> chance to run.
> I don't think this is a hardlockup.

Ah! Then CONFIG_DETECT_HUNG_TASK would be more appropriate.

(In reply to comment #13)
> (In reply to comment #12)
> > The all-is-frozen situation described in the comment only appeared once.
> > During my bisecting,while Xorg is frozen,I'm still able to move mouse and
> > network is still working.

Actually if it were some kind of kernel deadlock you wouldn't be able to move the mouse at all. Most likely the card has locked up for some reason, can you reboot with "drm.debug=3 log_buf_len=256k" in the kernel command line and attach new kernel logs after the hang?

> > So according to the definition of 'Hardlockups':
> > Hardlockups are bugs that cause the CPU to loop in kernel mode
> > for more than 60 seconds, without letting other interrupts have a
> > chance to run.
> > I don't think this is a hardlockup.
>
> Ah! Then CONFIG_DETECT_HUNG_TASK would be more appropriate.

Created attachment 38220 [details]
Kernel log
Only attached part of it, should be enough.
Don't think you would need the full 247k dmesg output.

BTW, X is in the R (running) state while hanging, and top shows sys taking all the CPU time.

FYI, commit 58374713c9dfb4d231f8c56cac089f6fbdedc2ec changed lock_kernel() and unlock_kernel() before and after the func() call to mutex_lock() and mutex_unlock().

(In reply to comment #15)
> Created an attachment (id=38220) [details]
> Kernel log
>
> Only attach part of it,should be enough.
> Don't think you would need the full 247k dmesg output.

It looks like fallout from an earlier problem, so yeah, full kernel logs would be interesting.

Created attachment 38224 [details]
More Kernel Log
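
For readers following along, the locking change mentioned in the FYI comment above can be pictured with a rough user-space sketch. This is not the kernel code from commit 58374713c9dfb4d231f8c56cac089f6fbdedc2ec: `dispatch_ioctl`, `global_ioctl_mutex` and `bump_counter` are hypothetical names made up for illustration, and a pthread mutex stands in for the kernel's mutex_lock()/mutex_unlock().

```c
/* User-space illustration only; all names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef int (*ioctl_func_t)(void *data);

/* Stand-in for the mutex that replaced the BKL around func(). */
static pthread_mutex_t global_ioctl_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Pattern described in the thread:
 *   before: lock_kernel();  ret = func(...); unlock_kernel();
 *   after:  mutex_lock(..); ret = func(...); mutex_unlock(..);
 * The mutex only excludes other callers that take the same mutex.
 * Code that never takes it can still run concurrently with func().
 */
static int dispatch_ioctl(ioctl_func_t func, void *data)
{
    int ret;

    assert(func != NULL);
    pthread_mutex_lock(&global_ioctl_mutex);
    ret = func(data);
    pthread_mutex_unlock(&global_ioctl_mutex);
    return ret;
}

/* Toy handler: increments the counter passed in and returns its new value. */
static int bump_counter(void *data)
{
    int *counter = data;

    *counter += 1;
    return *counter;
}
```

Calling `dispatch_ioctl(bump_counter, &counter)` twice on a counter that starts at 0 leaves it at 2. The sketch only shows the shape of the change; whether other kernel paths are still excluded while func() runs depends on what else the old lock implied, which is what the later comments in this bug discuss.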
(In reply to comment #19)
> Created an attachment (id=38224) [details]
> More Kernel Log
> [...]
>
> ----Many similar lines----
>

So you claim there are no errors there? Please, provide *full* kernel logs. BTW, can you reproduce this with the 3D drivers uninstalled?

Full kernel log here (too large for bugzilla): http://pastebin.com/5TKurz6u
And I can reproduce it without 3D driver (completely without 3D, nouveau_dri, libGL & mesa uninstalled)

BTW, I won't have chance to use this computer (with NV34 graphic card) for the coming month.

Created attachment 38240 [details] [review]
nouveau_ttm_preemption.patch

(In reply to comment #22)
> BTW,I won't have chance to use this computer (with NV34 graphic card) for the
> coming month.

Most likely this is a race between the X server thread and the TTM delayed work queue. It couldn't happen before because taking the BKL disables preemption. Are you still in time to test patches?

(In reply to comment #23)
> Created an attachment (id=38240) [details]
> nouveau_ttm_preemption.patch
>
> (In reply to comment #22)
> > BTW,I won't have chance to use this computer (with NV34 graphic card) for the
> > coming month.
>
> Most likely this is a race between the X server thread and the TTM delayed work
> queue. It couldn't happen before because taking the BKL disables preemption.
> Are you still in time to test patches?

Sorry, can't.

I've been having similar freezes with NV34 while using very recent kernels, so I would be very happy to test patches.

I'm using the nouveau support in Linus.git kernel, though, not nouveau.git. I could start using nouveau.git if that's what you'd be creating patches from.

(In reply to comment #25)
> I've been having similar freezes with NV34 while using very recent kernels, so
> I would be very happy to test patches.
>
> I'm using the nouveau support in Linus.git kernel, though, not nouveau.git. I
> could start using nouveau.git if that's what you'd be creating patches from.

Yeah, you could apply it over Linus' tree, if you want, but it has already been reported to solve the problem.

Today's kernel from Linus seems to fix the problem for me too, fingers crossed.

I've pushed the fix to master, closing.

*** Bug 29809 has been marked as a duplicate of this bug. ***

(In reply to comment #28)
> I've pushed the fix to master, closing.

The bug may be closed, but the dumb questions linger on forever :p

Yesterday I noticed a very similar hang on my NV4 machine (which I didn't see before yesterday).

Does it make sense that this same bug would also affect an NV4 chipset? Or should I start thinking about filing a different bug report?

Many thanks!

(In reply to comment #30)
> (In reply to comment #28)
> > I've pushed the fix to master, closing.
>
> The bug may be closed, but the dumb questions linger on forever :p
>
> Yesterday I noticed a very similar hang on my NV4 machine (which I didn't see
> before yesterday).
>
> Does it make sense that this same bug would also affect an NV4 chipset? Or
> should I start thinking about filing a different bug report?
>

Yeah it's probably the same issue, this bug affected the whole card range.

> Many thanks!