Bug 69928 - [NVAA] Boot of linux kernel 3.12-rc2 hangs
Status: RESOLVED INVALID
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2013-09-29 13:23 UTC by dirkneukirchen
Modified: 2015-10-22 04:15 UTC

Attachments
3.12-rc2 fails to boot with NVAA (47.43 KB, text/plain)
2013-09-29 13:23 UTC, dirkneukirchen
3.11.0 boots fine (109.16 KB, text/plain)
2013-09-29 13:24 UTC, dirkneukirchen
dmesg 3.12-rc2 boots successful with nouveau.config=NvMSI=0 (436.43 KB, text/plain)
2013-09-29 19:27 UTC, dirkneukirchen
mmiotrace loading module nvidia 304 (1.85 KB, text/plain)
2013-10-15 16:48 UTC, dirkneukirchen
output /proc/interrupts every 1 second with nvidia-304 (308.53 KB, text/plain)
2013-10-16 08:53 UTC, dirkneukirchen
mmiotrace of proc/interrupts run (1.78 MB, application/x-xz)
2013-10-16 08:57 UTC, dirkneukirchen
mmiotrace modprobe nvidia (1.85 KB, text/plain)
2013-10-20 13:58 UTC, dirkneukirchen
Add some writes to 0x100c14 (764 bytes, patch)
2014-09-25 12:03 UTC, Pierre Moreau

Description dirkneukirchen 2013-09-29 13:23:25 UTC
Created attachment 86798 [details]
3.12-rc2 fails to boot with NVAA

Steps to reproduce:
- Compile kernel 3.12-rc2
- Boot
- Boot hangs at a black screen (no signal)

System:
- NVAA onboard (Jetway JNC62K)
- Linux Mint 15 (Ubuntu 13.04)

No error with 3.11.0 Kernel.

I compiled kernel 3.12-rc2 once as a Debian package (dpkg) and once the vanilla way (make install). Both show the same behaviour.

Attached are a netconsole log of 3.12-rc2 and a dmesg of a "working" 3.11.0.


Example error message of 3.12-rc2:
[   22.081905] nouveau E[     PFB][0000:02:00.0] trapped write at 0x01002b75b4 on channel 0x00007ee0 [unknown] BAR/PFIFO_WRITE/IN reason: PAGE_NOT_PRESENT
Comment 1 dirkneukirchen 2013-09-29 13:24:12 UTC
Created attachment 86799 [details]
3.11.0 boots fine
Comment 2 Emil Velikov 2013-09-29 15:46:17 UTC
Your 3.12-rc2 log looks incomplete - note the timestamps.

AFAICS the only commit that should be affecting you is
commit a27e56996687e79416d69a7e6dc26f9d8fe06059
Author: Lucas Stach <dev@lynxeye.de>
Date:   Wed Aug 28 02:00:50 2013 +0200

    drm/nouveau: use MSI interrupts
    
    MSIs were only problematic on some old, broken chipsets. But now that we
    already see systems where PCI legacy interrupts are somewhat flaky, it's
    really time to move to MSIs.
    
    v2 (Ben Skeggs): blacklist BR02 boards


You should be able to toggle MSI off by appending "nouveau.config=NvMSI=0" to your kernel command line. Can you give it a try and attach the resulting dmesg?
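
For anyone verifying that the option actually reached the kernel, here is a minimal Python sketch that parses a kernel command line. It assumes nouveau.config takes a comma-separated list of Name=value pairs; the helper names and the sample command line are hypothetical and not part of nouveau or of this report.

```python
def nouveau_config_options(cmdline):
    """Collect Name=value pairs from every nouveau.config= argument.

    Assumption: multiple options are separated by commas, e.g.
    nouveau.config=NvMSI=0,Other=1 ("Other" is a made-up name).
    """
    prefix = "nouveau.config="
    options = {}
    for arg in cmdline.split():
        if not arg.startswith(prefix):
            continue
        for pair in arg[len(prefix):].split(","):
            name, sep, value = pair.partition("=")
            if sep:  # ignore fragments without an '='
                options[name] = value
    return options


def msi_disabled(cmdline):
    """True if the command line explicitly sets NvMSI=0."""
    return nouveau_config_options(cmdline).get("NvMSI") == "0"


# Hypothetical sample; on a live system read /proc/cmdline instead.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro nouveau.config=NvMSI=0"
print(msi_disabled(sample))
```

On a running system the string to pass in would come from /proc/cmdline.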
Comment 3 dirkneukirchen 2013-09-29 19:27:20 UTC
Created attachment 86806 [details]
dmesg 3.12-rc2 boots successful with nouveau.config=NvMSI=0

> note the timestamps
I noticed, but that was my first time using netconsole - maybe there was an issue with the target system (a VM), the router or something else.

> You should be able to toggle MSI off using "nouveau.config=NvMSI=0" appended to
> your kernel command line. Can you give it a try and attach the resulting dmesg

see new attachment
Comment 4 Emil Velikov 2013-10-01 17:49:47 UTC
Thanks for confirming. The author of the patch has requested information wrt MSI on nvidia boards. I'm hoping that we can get a reply soon(ish); otherwise we may have to disable MSI by default on all boards.
Comment 5 Dmitry Chichkov 2013-10-11 22:12:13 UTC
Confirmed with 3.12-rc4. 

System:
 ASUS P6T SE, i7
 Nvidia GF100 [GeForce GTX 470] (rev a3).
 Ubuntu 13.04.

Details: No error with the 3.10 kernel. With 3.12-rc4, boot hangs in text mode before unity-greeter. 3.12-rc4 boots successfully and works fine with nouveau.config=NvMSI=0 on the boot command line.


I would suggest raising the priority of this one.
Comment 6 Ben Skeggs 2013-10-14 23:25:46 UTC
(In reply to comment #5)
> Confirmed with 3.12-rc4. 
> 
> System:
>  ASUS P6T SE, i7
>  Nvidia GF100 [GeForce GTX 470] (rev a3).
>  Ubuntu 13.04.
> 
> Details: No error with the 3.10 kernel. With 3.12-rc4, boot hangs in text
> mode before unity-greeter. 3.12-rc4 boots successfully and works fine with
> nouveau.config=NvMSI=0 on the boot command line.
> 
> 
> I would suggest raising the priority of this one.

MSI will be disabled by default for 3.12 as a result of this and a few other related regressions.

To help fix this in the future, would you be able to get me a trace of the nvidia binary driver?

Instructions on how to do so are here: http://nouveau.freedesktop.org/wiki/MmioTrace/

Thanks!
Comment 7 dirkneukirchen 2013-10-15 16:48:39 UTC
Created attachment 87677 [details]
mmiotrace loading module nvidia 304

I hope the kernel version doesn't play a role in mmio tracing.

The trace was done from a fresh Ubuntu 13.04 install (on a USB stick):
Linux vm1304 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:36:13 UTC 2013 i686 athlon i686 GNU/Linux

with the nvidia blobs from the Ubuntu repos (nvidia-304).
Comment 8 dirkneukirchen 2013-10-15 17:02:58 UTC
additional mmio traces:

simple xinit with sleep:
mmiotrace_xinit.log.xz (3.1 MB)
https://mega.co.nz/#!u4RwFLjK!Yzi4UVujupmRdEIMFZtbQBOtjyoJFR1jrXheO1RUpew

starting glxgears:
mmiotrace_glxgears.log.xz (3.4 MB)
https://mega.co.nz/#!744wlQ7b!CBnbnEYfsIwOuXKFojgy3kuecVkC6U43Bye4nZkKMUk

starting nvidia-settings - since dual screen is active already just browse through options and then quit:
mmiotrace_dualsettings.log.xz (2.7 MB)
https://mega.co.nz/#!L1ZR3ApZ!XEuYymYmvLyQSQdpwQ0HaiBEipp2CBHz9y3PHNBNn6Q


glxgears posted because: https://bugs.freedesktop.org/show_bug.cgi?id=69952
Comment 9 Ben Skeggs 2013-10-15 21:39:41 UTC
Can you also show the contents of /proc/interrupts while X is running with the NVIDIA binary driver?

Thanks!
Comment 10 dirkneukirchen 2013-10-16 08:53:22 UTC
Created attachment 87714 [details]
output /proc/interrupts every 1 second with nvidia-304

The /proc/interrupts output is attached from one "test" run of

xinit nvidia-settings
(browse through some menus, then quit)

with a parallel mmiotrace, started from the recovery console.
(xinit "sleep 20" wasn't really working this time because of some shell issue.)

- Kernel was 3.8.0-19, the default provided by Ubuntu 13.04 (as above)

Do you need /proc/interrupts from a normal user desktop too?
Comment 11 dirkneukirchen 2013-10-16 08:57:57 UTC
Created attachment 87715 [details]
mmiotrace of proc/interrupts run

mmiotrace of the "xinit nvidia-settings" run from which the /proc/interrupts output was taken,
with nvidia-304 on kernel 3.8.0-19 (Ubuntu 13.04 default, 32 bit)
Comment 12 Ilia Mirkin 2013-10-16 15:04:13 UTC
The request to see /proc/interrupts was just to see if it was using MSI or not, not for the actual interrupt counts.

 20:       5187   IO-APIC-fasteoi   ehci_hcd:usb2, nvidia

Sadly it's not using MSI. (If it were, it'd say PCI-MSI-edge or something like that.) Would it be possible for you to get a version that enables MSI? I believe 325 and newer do this. If it does enable MSI, Ben could use a trace.
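
The check described above can be sketched in Python. This is only an illustration: it assumes the /proc/interrupts layout quoted in this comment (IRQ number, per-CPU counters, interrupt type, then a comma-separated handler list), and the function names are hypothetical. Newer kernels lay the columns out differently, so treat it as a starting point.

```python
def interrupt_types(proc_interrupts, driver):
    """Return the interrupt type of every numbered IRQ line that lists `driver`."""
    types = []
    for line in proc_interrupts.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # header row, or named rows such as NMI/LOC
        fields = rest.split()
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1  # skip the per-CPU interrupt counters
        if i >= len(fields):
            continue
        kind = fields[i]  # e.g. IO-APIC-fasteoi or PCI-MSI-edge
        handlers = [h.strip() for h in " ".join(fields[i + 1:]).split(",")]
        if driver in handlers:
            types.append(kind)
    return types


def uses_msi(proc_interrupts, driver):
    """True if any IRQ line for `driver` has an MSI interrupt type."""
    return any("MSI" in kind for kind in interrupt_types(proc_interrupts, driver))


# The line quoted in this comment:
line = " 20:       5187   IO-APIC-fasteoi   ehci_hcd:usb2, nvidia"
print(interrupt_types(line, "nvidia"), uses_msi(line, "nvidia"))
```

Fed the line from the attachment, the sketch reports a legacy IO-APIC interrupt rather than MSI, which matches the conclusion above.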
Comment 13 dirkneukirchen 2013-10-20 13:56:53 UTC
(In reply to comment #12)
> The request to see /proc/interrupts was just to see if it was using MSI or
> not, not for the actual interrupt counts.
> 
>  20:       5187   IO-APIC-fasteoi   ehci_hcd:usb2, nvidia
> 
> Sadly it's not using MSI. (If it were, it'd say PCI-MSI-edge or something
> like that.) Would it be possible for you to get a version that enables MSI?
> I believe 325 and newer do this. If it does enable MSI, Ben could use a
> trace.

MSI is indeed enabled in 325.15

Creating a test system (Ubuntu 13.10) took a while. My setup is:
- apply a patch to NVIDIA drivers because Kernel 3.11 is used (via google)
- disable modeset (kernel cmdline: nomodeset )
- blacklist vesafb, nouveau (modprobe.d entries and cmdline: rdblacklist=nouveau  nouveau.blacklist=1  nouveau.modeset=0)
- enable vga console (cmdline: video=vesa:off vga=normal )
- install nvidia custom driver (created by --apply-patch patch311.patch)
- reboot

two mmiotraces are attached
Comment 14 dirkneukirchen 2013-10-20 13:58:00 UTC
Created attachment 87876 [details]
mmiotrace modprobe nvidia

mmiotrace when doing modprobe nvidia
Comment 15 dirkneukirchen 2013-10-20 14:07:22 UTC
mmiotrace of xinit "sleep 10" is too big (>3000kB xz compressed)

System is a clean Ubuntu 13.10 with Nvidia 325.15 with the needed small patch


--- a/kernel/nv-linux.h
+++ b/kernel/nv-linux.h
@@ -957,7 +957,11 @@ static inline int nv_execute_on_all_cpus
 #endif
 
 #if !defined(NV_VMWARE)
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 11, 0)
+#define NV_NUM_PHYSPAGES                get_num_physpages()
+#else
 #define NV_NUM_PHYSPAGES                num_physpages
+#endif
 #define NV_GET_CURRENT_PROCESS()        current->tgid
 #define NV_IN_ATOMIC()                  in_atomic()
 #define NV_LOCAL_BH_DISABLE()           local_bh_disable()


here is the link:
mmiotrace_xinit_sleep10.log.xz (3.0 MB)
https://mega.co.nz/#!2hRgDRqb!BwPChW79v7SdYK0rvxbtT3EPmHVM300Flqfuw65Xpbs

Without disabling vesafb and enabling the VGA console (see the kernel cmdline in the comment above), there will be lockups after logging into X (there is an NVRM warning about that in dmesg).
Comment 16 Ilia Mirkin 2013-11-09 20:34:35 UTC
FTR, the trace contains:

[1] 698.683063 MMIO32 R 0x088068 0x00810005 PPCI.MSI_HEAD => { CAP_ID = MSI | NEXT_CAP_PTR = 0 | ENABLE | QMASK = 0 | QSIZE = 0 | 64BIT }
[1] 698.683072 MMIO8 W 0x088068 0x000000ff PPCI.MSI_HEAD <= { CAP_ID = 0xff | NEXT_CAP_PTR = 0 | QMASK = 0 | QSIZE = 0 }

and then later

[1] 699.766366 MMIO32 R 0x000100 0x80000000 PMC.INTR_HOST => { SOFTWARE }
[1] 699.766375 MMIO32 W 0x000100 0x00000000 PMC.INTR_HOST <= { 0 }
[1] 699.766434 MMIO8 W 0x088068 0x000000ff PPCI.MSI_HEAD <= { CAP_ID = 0xff | NEXT_CAP_PTR = 0 | QMASK = 0 | QSIZE = 0 }
[1] 699.766443 MMIO32 W 0x000140 0x00000000 PMC.INTR_EN_HOST <= { 0 }

etc. So the blob does enable MSI on NVAA.
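
To see why 0x00810005 reads as "MSI enabled", the value can be decoded by hand using the standard PCI MSI capability layout (capability ID in bits 7:0, next pointer in bits 15:8, message control in bits 31:16). The Python sketch below does that; the function and key names are hypothetical, and mapping the trace's QMASK/QSIZE labels onto the multiple-message fields is an assumption.

```python
MSI_CAP_ID = 0x05  # PCI capability ID for MSI


def decode_msi_head(value):
    """Split a 32-bit read of the MSI capability header into its fields."""
    control = (value >> 16) & 0xFFFF  # message control word
    return {
        "cap_id": value & 0xFF,
        "next_cap_ptr": (value >> 8) & 0xFF,
        "enable": bool(control & 0x0001),         # bit 0: MSI enable
        "multi_msg_capable": (control >> 1) & 7,  # bits 3:1 (assumed QMASK)
        "multi_msg_enable": (control >> 4) & 7,   # bits 6:4 (assumed QSIZE)
        "addr_64bit": bool(control & 0x0080),     # bit 7: 64-bit address capable
    }


fields = decode_msi_head(0x00810005)  # value read at 0x088068 in the trace
print(fields["cap_id"] == MSI_CAP_ID, fields["enable"], fields["addr_64bit"])
```

The message control word here is 0x0081, so only bit 0 (enable) and bit 7 (64-bit) are set, consistent with the decoded trace line.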
Comment 17 Pierre Moreau 2014-09-25 12:03:32 UTC
Created attachment 106850 [details] [review]
Add some writes to 0x100c14

Does this patch help (applying it to recent kernel code would be better)? I'm not sure it will, but it may be worth a try.
Comment 18 Pierre Moreau 2015-01-18 20:09:30 UTC
Is this still an issue using kernel 3.19-rc4?
Comment 19 Ilia Mirkin 2015-10-22 04:15:23 UTC
No retest in over a year. Marking invalid.

