Summary: | [NV98] hangs in nvbios_init on probe (worked in 3.2) | | |
---|---|---|---|
Product: | xorg | Reporter: | Darcy Brás da Silva <dardevelin> |
Component: | Driver/nouveau | Assignee: | Nouveau Project <nouveau> |
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team> |
Severity: | normal | | |
Priority: | medium | CC: | ionut.radu |
Version: | unspecified | | |
Hardware: | x86-64 (AMD64) | | |
OS: | Linux (All) | | |
Description
Darcy Brás da Silva
2013-12-21 03:28:15 UTC
Created attachment 91070 [details]
full syslog attachment
It's obviously getting stuck somewhere in the vbios execution logic. I should have thought of this while we were talking on IRC, but can you try this again with

    modprobe nouveau modeset=1 debug=trace

That should produce a bunch more logs that show exactly what is being executed. Hopefully.

Hi, sorry for not being able to reply earlier. I am now attaching a syslog.1 with the debug=trace flag. Feel free to request any testing/data from me that may help you help me. :D
A copy of the log may also be found at http://cidadecool.com/z-tunes/debian/problem/syslog.1-debug=trace
PS: Dec 23 [around 5-6AM]

Created attachment 91141 [details]
syslog with debug=trace flag

Not sure why you assigned this to yourself... you shouldn't need to touch any of those fields.

Your latest log doesn't seem to contain any messages from nouveau. Did you remember to unload nouveau before running the modprobe command I had indicated? Perhaps the logs didn't hit disk... not sure. Maybe waiting longer (like a minute) before rebooting would allow the flush to go through.

Hi, I have followed all the previous steps and simply added the debug=trace flag. I also waited at least 2 minutes before rebooting. I will have a second run at it, though. Thanks for the heads-up.

Created attachment 91182 [details]
syslog with debug=trace flag, more wait time to hit the disk

I think it hit the disk this time. The date is DEC25. I am sorry if the process seems slow, but I am getting back to you as fast as I can :). I hope that does not put you off.
As usual, the log file is also available at http://cidadecool.com/z-tunes/debian/problem/syslog.1-debug=trace-DEC25-

It looks like I have the same issue. I'm a Fedora user: I was able to boot kernel 3.6.10 from the Fedora 18 live image, but I am not able to boot the Fedora 19 and Fedora 20 live images. I'm also not able to boot kernel-3.13.0: https://bugzilla.redhat.com/show_bug.cgi?id=1026073
Thanks, Ionut Radu.

(In reply to comment #7)
> Created attachment 91182 [details]
> syslog with debug=trace flag, more wait time to hit the disk
>
> I think it hit the disk this time. the date is DEC25.

The only messages on Dec 25 are from 3.2. Perhaps you have a second computer and can use netconsole to send the messages? It'd be really useful to get a log with the trace, since that should immediately identify the failure.

Created attachment 91531 [details] [review]
make jump execution conditional

Please try this patch, I'm pretty sure it will help things out.
The problem VBIOS has the following snippet:

    0xd9d0: 74 64 00                                 TIME 0x0064
    0xd9d3: 75 10                                    CONDITION 0x10
    0xd9d5: 38                                       NOT
    0xd9d6: 6e 24 e8 00 00 ff ff ff ff 00 00 20 00   NV_REG R[0x00e824] &= 0xffffffff |= 0x00200000
    0xd9e3: 6e 20 e8 00 00 ff ff ff ff 00 00 00 80   NV_REG R[0x00e820] &= 0xffffffff |= 0x80000000
    0xd9f0: 6e 18 e8 00 00 ff ff ff ff 00 00 00 08   NV_REG R[0x00e818] &= 0xffffffff |= 0x08000000
    0xd9fd: 6e 18 e8 00 00 ff ff ff 7f 00 00 00 00   NV_REG R[0x00e818] &= 0x7fffffff |= 0x00000000
    0xda0a: 6e 18 e8 00 00 ff ff ff 7f 00 00 00 80   NV_REG R[0x00e818] &= 0x7fffffff |= 0x80000000
    0xda17: 74 64 00                                 TIME 0x0064
    0xda1a: 5c d0 d9                                 JUMP 0xd9d0

With the old code, the JUMP was always executed, so there was no way to break out of the loop. The new code makes JUMP conditional in the same way that NV_REG and the other opcodes are.

Created attachment 91612 [details]
vmcore log excerpt

Hi,
For me the issue is not fixed. Please see the attached vmcore-excerpt.log. For vmcore, please check: https://www.dropbox.com/sh/e77p700zr8g1v4z/y3ldY3npQB
thanks, Ionut Radu.

(In reply to comment #11)
> Created attachment 91612 [details]
> vmcore log excerpt
>
> Hi,
>
> For me the issue is not fixed.
> Please see the attached vmcore-excerpt.log.

That's very unfortunate. Can you please triple-check that you booted a kernel with that patch applied? Assuming that you did, would you mind adding "nouveau.debug=trace" to the kernel cmdline? That should reveal where it's looping. (Or perhaps the condition just never becomes true?) Do note that it will produce *vast* amounts of log lines if it is indeed looping the way I think it is...

Ionut, also, please upload a copy of your vbios (see http://nouveau.freedesktop.org/wiki/DumpingVideoBios/ for instructions on retrieving it). Your issue might be different from Darcy's.

Hi Ilia,
It's very likely to be the same issue. I have the same graphics card as Darcy and a slightly different vbios version. I'm sure I have applied the patch correctly.
In fact the original kernel is 3.13.0...fc21, while I compiled it on Fedora 20 and it got the fc20 suffix. I'll try to obtain a vbios.rom file. Regarding debug=trace, I have issues with journalctl flooding, and the vmcore log gets truncated with no useful information added. Can't you use the vmcore to debug the issue? It should contain enough information.
thanks, Ionut Radu.

Well, I downloaded https://www.dropbox.com/sh/e77p700zr8g1v4z/kb_av1ZSS6/127.0.0.1-2014.01.07-20%3A43%3A35, which includes kernel-debuginfo-common-x86_64-3.13.0-0.rc6.git0.1.fc20.x86_64.rpm, which contains the source. I can only assume that this is the source that you built from. And the drivers/gpu/drm/nouveau/core/subdev/bios/init.c in there does not appear to have the patch applied to init_jump. So I'd like to ask again... can you check that the kernel you're booting is the kernel that has the patch applied to it? Am I misunderstanding the situation? You could prove it to yourself by modifying some common print, for example, that you would be able to identify from kernel messages. Or you could just build the kernel directly without fancy tools that obscure these things.

(I did try loading the nouveau.ko.debug in gdb, but I think it _only_ contains the debug symbols, since the init_jump function was full of add %al,(%rax) instructions, which I think is just opcode 0.)

Unfortunately I don't have another computer available at the hours when I can test this. I will try to get one during this week, but I am very likely to need some assistance with getting the messages out using netconsole. Regarding the patch, to which source tree should I try to apply it? Was that meant for me at all? Thanks in advance.

Hi,
Applying the patch provided by Ilia Mirkin to Linux kernel 3.13.0-rc6 resolves all the reported buggy behavior on my side. */me is impressed with Ilia Mirkin's dedication* Thanks a million.

Hi Ilia,
I was wrong. The fix is good in my case too. Great work. Thanks a lot.
Regards, Ionut.

So after all it was the same issue, as expected.
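
For anyone reading this later, the hang described in the patch comment can be reproduced with a toy model. The Python sketch below is not the nouveau code: it is a simplified interpreter with a single "execute" flag, and it assumes that CONDITION clears the flag when the condition is not met, that NOT inverts the flag, and that NV_REG only writes while the flag is set. The `run` function, the index-based jump target, and the `condition_after_three_polls` stand-in (condition 0x10 becoming true on the third poll) are all hypothetical and chosen only for illustration.

```python
# Toy model of the init-script loop from the VBIOS snippet (illustrative only).
# Opcodes are tuples; JUMP targets a list index instead of a byte offset.
SCRIPT = [
    ("TIME", 0x64),
    ("CONDITION", 0x10),
    ("NOT",),
    ("NV_REG", 0x00E824, 0xFFFFFFFF, 0x00200000),
    ("NV_REG", 0x00E820, 0xFFFFFFFF, 0x80000000),
    ("NV_REG", 0x00E818, 0xFFFFFFFF, 0x08000000),
    ("NV_REG", 0x00E818, 0x7FFFFFFF, 0x00000000),
    ("NV_REG", 0x00E818, 0x7FFFFFFF, 0x80000000),
    ("TIME", 0x64),
    ("JUMP", 0),
]


def condition_after_three_polls(polls):
    """Hypothetical hardware: the condition becomes true on the third poll."""
    return polls >= 3


def run(script, condition_met, jump_is_conditional, max_steps=1000):
    """Run the script; return (finished, regs).

    finished is False if max_steps was reached, i.e. the script is stuck.
    """
    pc, execute, polls, steps = 0, True, 0, 0
    regs = {}
    while pc < len(script):
        if steps >= max_steps:
            return False, regs
        steps += 1
        op, *args = script[pc]
        if op == "CONDITION":
            polls += 1
            if not condition_met(polls):
                execute = False  # assumed: unmet condition disables execution
        elif op == "NOT":
            execute = not execute  # assumed: invert the execute flag
        elif op == "NV_REG" and execute:
            reg, mask, value = args
            regs[reg] = (regs.get(reg, 0) & mask) | value
        elif op == "JUMP" and (execute or not jump_is_conditional):
            pc = args[0]
            continue
        pc += 1  # TIME (a delay) and skipped opcodes just advance
    return True, regs


if __name__ == "__main__":
    for conditional in (False, True):
        finished, _ = run(SCRIPT, condition_after_three_polls, conditional)
        print("conditional JUMP:", conditional, "-> finished:", finished)
```

In this model the loop body runs while the condition is unmet. Once the condition holds, NOT turns execution off, so the register writes are skipped, and the script can only fall out of the loop if JUMP is skipped as well. With an unconditional JUMP the next pass flips the flag back on and the script never ends.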