Bug 101322

Summary: GM108/NV118: 0 MiB DDR3 and boot crash in gf100_ltc_oneinit_tag_ram
Product: xorg Reporter: Daniel Drake <dan>
Component: Driver/nouveau Assignee: Nouveau Project <nouveau>
Status: NEW --- QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
Attachments:
Description                                  Flags
full dmesg log                               none
dmesg with nouveau.debug=debug               none
test disabling pci link config twiddling     none

Description Daniel Drake 2017-06-06 20:30:04 UTC
Created attachment 131742 [details]
full dmesg log

On the Acer Aspire Z20-730 all-in-one PC, the nouveau driver crashes during boot with:

[    4.041108] nouveau 0000:01:00.0: pci: failed to adjust cap speed
[    4.041167] nouveau 0000:01:00.0: pci: failed to adjust lnkctl speed
[    9.633613] nouveau 0000:01:00.0: fb: 0 MiB DDR3
[   20.811768] divide error: 0000 [#1] SMP
[   20.813654] Modules linked in: hid_generic usbmouse usbkbd usbhid i915 nouveau(+) mxm_wmi i2c_algo_bit drm_kms_helper sdhci_pci syscopyarea sysfillrect sysimgblt fb_sys_fops ttm sdhci drm ahci libahci wmi i2c_hid hid video
[   20.815697] CPU: 3 PID: 200 Comm: systemd-udevd Not tainted 4.11.0-2-generic #7+dev143.9f9ecd2beos3.2.2-Endless
[   20.817684] Hardware name: Acer Aspire Z20-730/IPMAL-BR3, BIOS D01 07/07/2016
[   20.819711] task: ffff8a3070288000 task.stack: ffffa3eb4103c000
[   20.821762] RIP: 0010:gf100_ltc_oneinit_tag_ram+0xba/0x100 [nouveau]
[   20.823789] RSP: 0018:ffffa3eb4103f6b8 EFLAGS: 00010206
[   20.825773] RAX: 00001000ffefdfff RBX: ffff8a306f915000 RCX: ffff8a3075570030
[   20.827820] RDX: 0000000000000000 RSI: dead000000000200 RDI: ffff8a307fd9b700
[   20.829797] RBP: ffffa3eb4103f6d0 R08: 000000000001e980 R09: ffff8a3077003900
[   20.831825] R10: ffffa3eb40cdbda0 R11: ffff8a307fd986a4 R12: 0000000000000000
[   20.833814] R13: 0000000100005fff R14: ffff8a306fa2e400 R15: ffff8a306f914e00
[   20.835882] FS:  00007f456d052900(0000) GS:ffff8a307fd80000(0000) knlGS:0000000000000000
[   20.837874] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   20.839935] CR2: 00007fefceb1c020 CR3: 000000026fbc5000 CR4: 00000000003406e0
[   20.841918] Call Trace:
[   20.843972]  gm107_ltc_oneinit+0x7c/0x90 [nouveau]
[   20.845952]  nvkm_ltc_oneinit+0x13/0x20 [nouveau]
[   20.847991]  nvkm_subdev_init+0x50/0x210 [nouveau]
[   20.849977]  nvkm_device_init+0x151/0x270 [nouveau]
[   20.851997]  nvkm_udevice_init+0x48/0x60 [nouveau]
[   20.853944]  nvkm_object_init+0x40/0x190 [nouveau]
[   20.855924]  nvkm_ioctl_new+0x179/0x290 [nouveau]
[   20.857838]  ? nvkm_client_notify+0x30/0x30 [nouveau]
[   20.859794]  ? nvkm_udevice_rd08+0x30/0x30 [nouveau]
[   20.861674]  nvkm_ioctl+0x168/0x240 [nouveau]
[   20.863576]  ? nvif_client_init+0x42/0x110 [nouveau]
[   20.865449]  nvkm_client_ioctl+0x12/0x20 [nouveau]
[   20.867368]  nvif_object_ioctl+0x42/0x50 [nouveau]
[   20.869237]  nvif_object_init+0xc2/0x130 [nouveau]
[   20.871141]  nvif_device_init+0x12/0x30 [nouveau]
[   20.872994]  nouveau_cli_init+0x15e/0x1d0 [nouveau]
[   20.874873]  nouveau_drm_load+0x67/0x8b0 [nouveau]
[   20.876674]  ? sysfs_do_create_link_sd.isra.2+0x70/0xb0
[   20.878451]  drm_dev_register+0x148/0x1e0 [drm]
[   20.880302]  drm_get_pci_dev+0xa0/0x160 [drm]
[   20.882166]  nouveau_drm_probe+0x1d9/0x260 [nouveau]

This has been reproduced on 4.12-rc3. Please let us know how we can help with further debugging.
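
The two symptoms named in the summary (the `fb:` size report and the faulting `RIP:` symbol) can be pulled out of a saved dmesg excerpt mechanically. A minimal sketch in Python, assuming a hypothetical helper called `summarize_oops` (it is not part of nouveau or any existing tool) and assuming the log lines have the same shape as the ones quoted above:

```python
import re

# Hypothetical triage helper; the patterns only cover the line shapes
# seen in the log above and are not a general oops parser.
FB_RE = re.compile(r"nouveau \S+ fb: (\d+) MiB (\S+)")
RIP_RE = re.compile(r"RIP: [0-9a-f]+:(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")


def summarize_oops(log: str) -> dict:
    """Return reported VRAM size (MiB), memory type, faulting symbol and module."""
    summary = {"vram_mib": None, "vram_type": None, "symbol": None, "module": None}
    fb = FB_RE.search(log)
    if fb:
        summary["vram_mib"] = int(fb.group(1))
        summary["vram_type"] = fb.group(2)
    rip = RIP_RE.search(log)
    if rip:
        summary["symbol"] = rip.group(1)
        summary["module"] = rip.group(2)  # None when the symbol is built in
    return summary


if __name__ == "__main__":
    excerpt = (
        "[    9.633613] nouveau 0000:01:00.0: fb: 0 MiB DDR3\n"
        "[   20.811768] divide error: 0000 [#1] SMP\n"
        "[   20.821762] RIP: 0010:gf100_ltc_oneinit_tag_ram+0xba/0x100 [nouveau]\n"
    )
    print(summarize_oops(excerpt))
```

A reported VRAM size of 0 together with a divide error in a memory-related init path is worth flagging, but the sketch only extracts the values; it does not establish the root cause.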
Comment 1 Karol Herbst 2017-06-06 20:52:26 UTC
(In reply to Daniel Drake from comment #0)

Does this error appear on older kernels?

Can you get a dmesg from a boot with "nouveau.debug=debug"?

Thanks.
Comment 2 Daniel Drake 2017-06-07 13:10:06 UTC
Created attachment 131772 [details]
dmesg with nouveau.debug=debug

Here is the debug log. By the way, this is the same report that was posted to the nouveau list as "Kernel panic on nouveau during boot on NVIDIA NV118 (GM108)". I duplicated it here after getting no response on the list (and I think this is the more appropriate place?).

Thanks for your help so far!
Comment 3 Daniel Drake 2017-06-07 13:11:21 UTC
And yes, this problem has been reproduced on v4.8, v4.11, and v4.12-rc3. We don't know of any working kernels.
Comment 4 Karol Herbst 2017-06-07 13:33:24 UTC
Can you also upload your vbios.rom file, located at /sys/kernel/debug/dri/0/vbios.rom? And if you are up for it, install envytools and run nvapeek 101000 as root? The second step is optional, but may help us even more.
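
Before uploading, it can help to confirm that the VBIOS dump actually contains data. A minimal sketch in Python, assuming a hypothetical helper called `vbios_status` (not part of envytools or nouveau); it reads the file instead of trusting the reported size, since debugfs entries may report a size of 0 even when they have content:

```python
# Path taken from the comment above; debugfs must be mounted and the
# script typically needs to run as root to read it.
VBIOS_PATH = "/sys/kernel/debug/dri/0/vbios.rom"


def vbios_status(path: str = VBIOS_PATH) -> str:
    """Classify the VBIOS dump as 'missing', 'empty', or 'ok'."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return "missing"
    # PermissionError is deliberately not caught: it means "rerun as root".
    return "ok" if data else "empty"


if __name__ == "__main__":
    print(f"{VBIOS_PATH}: {vbios_status()}")
```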
Comment 5 Daniel Drake 2017-06-08 14:17:00 UTC
vbios.rom is empty. We will try to get envytools running now.
Comment 6 Daniel Drake 2017-06-13 15:02:26 UTC
The nvapeek 101000 output is "PCI init failure".
Comment 7 Ben Skeggs 2017-06-14 00:40:34 UTC
Created attachment 131939 [details] [review]
test disabling pci link config twiddling

Can you give this patch a try?  It looks like things are working normally up until we try to fiddle with the PCIE link configuration, and I'd like to rule this in/out as a culprit.
Comment 8 Daniel Drake 2017-08-17 08:00:09 UTC
Sorry for the slow response.
We tested the patch against 4.13-rc5 and the issue is still there.
Comment 9 Daniel Drake 2017-10-12 08:31:11 UTC
The problem is still present on 4.14-rc4.
Comment 10 Ben Skeggs 2017-10-12 20:11:26 UTC
I don't suppose you'd be able to grab a mmiotrace[1] of the proprietary driver for me?  A trace of Nouveau might also be useful.

[1] https://nouveau.freedesktop.org/wiki/MmioTrace/
Comment 11 Daniel Drake 2017-10-18 08:00:04 UTC
Sent a partial dump from the NVIDIA proprietary driver to the mmio.dumps address:

I loaded the module and then started an empty X session.
Unfortunately with tracing enabled, this results in an instant hard hang
upon starting X, before it has rendered the black screen.

However I managed to capture these messages over the network up until
the point of hang.

Captured on Linux 4.13. No external displays connected. The all-in-one PC
has one internal LCD display.
Comment 12 Ben Skeggs 2017-10-18 19:07:17 UTC
This patch should (hopefully) help with the issues faced while tracing the binary driver. It applies to the kernel shim layer source code, which you can find by extracting the installer package with --extract-only. Then run ./nvidia-installer inside the extracted directory to build the patched kernel module.

https://paste.fedoraproject.org/paste/LkiC1cJdfPGOLc~NlKWkcA
Comment 13 Daniel Drake 2017-10-20 08:33:32 UTC
That worked. Sent an updated dump.
