Summary: | [NV137/GP107] xorg-server-1.19.3 crashes when trying to enable HDMI output |
---|---|
Product: | xorg |
Component: | Driver/nouveau |
Status: | RESOLVED MOVED |
Severity: | normal |
Priority: | medium |
Version: | unspecified |
Hardware: | Other |
OS: | All |
Reporter: | Pacho Ramos <pachoramos1> |
Assignee: | Nouveau Project <nouveau> |
QA Contact: | Xorg Project Team <xorg-team> |
CC: | carlo, jan.public, paulo |
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=100228 https://bugs.freedesktop.org/show_bug.cgi?id=104621 |

Attachments:
Created attachment 131550 [details]
backtrace.log
Created attachment 131551 [details]
Xorg.0.log
Created attachment 131552 [details]
core.xz
Full core dump (compressed with xz)
Looks like a nouveau driver issue: the SetSharedPixmapBacking member of its ScreenRec is NULL.

I wonder if this is due to the fact that it's running in NoAccel mode. (Haven't actually looked at any of the code though.)

Pacho, can you test 4.12-rcN with the nvidia-supplied firmware installed (part of linux-firmware) which will allow you to have acceleration on your board?

(In reply to Ilia Mirkin from comment #5)
> I wonder if this is due to the fact that it's running in NoAccel mode.

Possibly, you might want something like https://cgit.freedesktop.org/xorg/driver/xf86-video-ati/commit/?id=b19417e2fddf4df725951aea5ad5e9558338f59e

(In reply to Ilia Mirkin from comment #5)
> I wonder if this is due to the fact that it's running in NoAccel mode.
> (Haven't actually looked at any of the code though.)
>
> Pacho, can you test 4.12-rcN with the nvidia-supplied firmware installed
> (part of linux-firmware) which will allow you to have acceleration on your
> board?

I tried already, but it's even worse because nouveau causes a kernel Oops when running with acceleration and kernel 4.12-rc2... then, I needed to pass nouveau.noaccel=1 and, even getting X started, I got the same crash (additionally, I don't know why on kernel 4.12-rc2 my touchpad stops moving... it clicks but doesn't move... but that is probably a different bug :( )

Created attachment 131572 [details]
Photo as soon as I try to start X with kernel 4.12-rc3
I have tried with 4.12-rc3... but it's even worse and as soon as I try to start X I get a kernel Oops (I have taken a photo to show it)
Created attachment 131573 [details]
dmesg output with kernel 4.12-rc3
dmesg looks to contain lots of errors with kernel 4.12-rc3. I am using nouveau 1.0.15 and linux-firmware from 20170519
Thanks
gr fails to come up, and the rest of the board ends up dead too. On the off chance you're running with a kernel setup aimed at you all of a sudden needing to boot off some exotic RAID controller, thus needing to put your modules into initrd... make sure that the updated linux-firmware is in that initrd, because the originally-released firmware for GP107 was "wrong".

I don't know how that RAID is done... this is a Dell Inspiron 15 7000 laptop that probably uses that setup to work with a small SSD as main device and a bigger HD ... but this is working nice with kernel 4.9.x from 4.11 :/

I don't use any initrd... I could rely on CONFIG_EXTRA_FIRMWARE... but I thought I didn't need that as nouveau is compiled as a module and not into the kernel :|

(In reply to Pacho Ramos from comment #11)
> I don't use any initrd... I could rely on CONFIG_EXTRA_FIRMWARE... but I
> thought I didn't need that as nouveau is compiled as a module and not into
> the kernel :|

OK, well if modules are loaded off the FS and not initrd, then you're good on that front, assuming you really do have the updated linux-firmware (April 4 or later -- https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=b14134583c2a15d4404695f72cb523daedb877ab). Lots of people use initrd without understanding how it works.

So there are two issues here...

#1: we should probably disable reverse prime if you don't have acceleration (or figure out how to make it work without accel... should just be a very slow memcpy away... not great, but better than a non-working screen)

#2: we should figure out what's going on with accel on your GP107 -- looks like stuff is just hanging (that's the "timeout" messages you see, we're waiting for some condition to become true, and it never does).

I can help with the former, and hopefully Ben Skeggs can investigate the latter.

(In reply to Ilia Mirkin from comment #12)
> (In reply to Pacho Ramos from comment #11)
> > I don't use any initrd... I could rely on CONFIG_EXTRA_FIRMWARE... but I
> > thought I didn't need that as nouveau is compiled as a module and not into
> > the kernel :|
>
> OK, well if modules are loaded off the FS and not initrd, then you're good
> on that front, assuming you really do have the updated linux-firmware (April
> 4 or later --
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=b14134583c2a15d4404695f72cb523daedb877ab).
> Lots of people use initrd without understanding how it works.

Yeah, my linux-firmware snapshot is the one up to https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=df40d15d6ad617e72ce7ea00b91d9117d92dcccc

> So there are two issues here...
>
> #1: we should probably disable reverse prime if you don't have acceleration
> (or figure out how to make it work without accel... should just be a very
> slow memcpy away... not great, but better than a non-working screen)
>
> #2: we should figure out what's going on with accel on your GP107 -- looks
> like stuff is just hanging (that's the "timeout" messages you see, we're
> waiting for some condition to become true, and it never does).
>
> I can help with the former, and hopefully Ben Skeggs can investigate the
> latter.

Great, thanks a lot :)

(In reply to Pacho Ramos from comment #7)
[...]
> (additionally, I don't know why on kernel 4.12-rc2 my touchpad stops
> moving... it clicks but doesn't move... but that is probably a different bug
> :( )

This was indeed a different bug that got fixed in rc4 :)... but nouveau still keeps failing in the same way with rc4 :(, do you need updated logs with that kernel? (they seem quite similar though)

Thanks

Looks very closely related to, if not the same as, bz#100228

Created attachment 133924 [details] [review]
[PATCH] Don't advertise any PRIME offloading capabilities without acceleration

(In reply to Michel Dänzer from comment #6)
> (In reply to Ilia Mirkin from comment #5)
> > I wonder if this is due to the fact that it's running in NoAccel mode.
>
> Possibly, you might want something like
> https://cgit.freedesktop.org/xorg/driver/xf86-video-ati/commit/?id=b19417e2fddf4df725951aea5ad5e9558338f59e

Something like this for nouveau?

(In reply to Ilia Mirkin from comment #12)
[...]
> #2: we should figure out what's going on with accel on your GP107 -- looks
> like stuff is just hanging (that's the "timeout" messages you see, we're
> waiting for some condition to become true, and it never does).
>
> I can help with the former, and hopefully Ben Skeggs can investigate the
> latter.

As a side note, I am still unable to run with "accel" enabled even with kernel 4.13.0 :/

(In reply to Carlo Caione from comment #16)
> Created attachment 133924 [details] [review]
> [PATCH] Don't advertise any PRIME offloading capabilities without
> acceleration
>
> (In reply to Michel Dänzer from comment #6)
> > (In reply to Ilia Mirkin from comment #5)
> > > I wonder if this is due to the fact that it's running in NoAccel mode.
> >
> > Possibly, you might want something like
> > https://cgit.freedesktop.org/xorg/driver/xf86-video-ati/commit/?id=b19417e2fddf4df725951aea5ad5e9558338f59e
>
> Something like this for nouveau?

ping on this patch. It's not the solution for the underlying problem but at least it's nice to have Xorg not crashing.

Any news on this?
I still need to run with noaccel with kernel 4.19.25, otherwise the system ends up hanging after showing these errors:

feb 23 17:21:08 dell-2017 kernel: ------------[ cut here ]------------
feb 23 17:21:08 dell-2017 kernel: nouveau 0000:01:00.0: timeout
feb 23 17:21:08 dell-2017 kernel: WARNING: CPU: 4 PID: 64 at drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c:207 gf100_vmm_flush_+0x149/0x190 [nouveau]
feb 23 17:21:08 dell-2017 kernel: Modules linked in: cmac ctr ccm bnep uvcvideo videobuf2_vmalloc btusb videobuf2_memops videobuf2_v4l2 btrtl videodev btbcm btintel videobuf2_common bluetooth ecdh_generic hid_m>
feb 23 17:21:08 dell-2017 kernel: acpi_pad int340x_thermal_zone int3400_thermal acpi_thermal_rel vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O)
feb 23 17:21:08 dell-2017 kernel: CPU: 4 PID: 64 Comm: kworker/4:1 Tainted: G W O 4.19.25-gentoo #1
feb 23 17:21:08 dell-2017 kernel: Hardware name: Dell Inc. Inspiron 15 7000 Gaming/065C71, BIOS 1.6.0 03/27/2018
feb 23 17:21:08 dell-2017 kernel: Workqueue: pm pm_runtime_work
feb 23 17:21:08 dell-2017 kernel: RIP: 0010:gf100_vmm_flush_+0x149/0x190 [nouveau]
feb 23 17:21:08 dell-2017 kernel: Code: 5f e9 3b ae 56 e1 48 8b 7d 10 48 8b 5f 50 48 85 db 74 46 e8 09 7b 32 e1 48 89 da 48 89 c6 48 c7 c7 a4 a8 3a a0 e8 07 63 df e0 <0f> 0b eb c2 48 8b 7d 10 48 8b 5f 50 48 85 >
feb 23 17:21:08 dell-2017 kernel: RSP: 0018:ffffc90001b53718 EFLAGS: 00010296
feb 23 17:21:08 dell-2017 kernel: RAX: 000000000000001d RBX: ffff88846cdae2d0 RCX: 0000000000000006
feb 23 17:21:08 dell-2017 kernel: RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff88846f7153f0
feb 23 17:21:08 dell-2017 kernel: RBP: ffff88846bc37800 R08: 0000000000000001 R09: 00000000000004b3
feb 23 17:21:08 dell-2017 kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff88846277e660
feb 23 17:21:08 dell-2017 kernel: R13: 0000000c6f79dd60 R14: ffff88846aa34020 R15: ffff88846aad3600
feb 23 17:21:08 dell-2017 kernel: FS: 0000000000000000(0000) GS:ffff88846f700000(0000) knlGS:0000000000000000
feb 23 17:21:08 dell-2017 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
feb 23 17:21:08 dell-2017 kernel: CR2: 00007fab22c8d4a0 CR3: 000000000200a002 CR4: 00000000003606e0
feb 23 17:21:08 dell-2017 kernel: Call Trace:
feb 23 17:21:08 dell-2017 kernel: nvkm_vmm_iter.constprop.15+0x2cf/0x7e0 [nouveau]
feb 23 17:21:08 dell-2017 kernel: ? nvkm_vmm_map+0xb8/0x3e0 [nouveau]
feb 23 17:21:08 dell-2017 kernel: nvkm_vmm_map+0x1a5/0x3e0 [nouveau]
feb 23 17:21:08 dell-2017 kernel: ? gp100_vmm_pgt_sgl+0x180/0x180 [nouveau]
feb 23 17:21:08 dell-2017 kernel: nvkm_vram_map+0x43/0x50 [nouveau]
feb 23 17:21:08 dell-2017 kernel: nvkm_uvmm_mthd+0x71e/0x850 [nouveau]
feb 23 17:21:08 dell-2017 kernel: ? lock_timer_base+0x62/0x80
feb 23 17:21:08 dell-2017 kernel: nvkm_ioctl+0x105/0x240 [nouveau]
feb 23 17:21:08 dell-2017 kernel: nvif_object_mthd+0xd3/0xf0 [nouveau]
feb 23 17:21:08 dell-2017 kernel: ? dma_fence_wait_timeout+0x30/0x30
feb 23 17:21:08 dell-2017 kernel: nvif_vmm_map+0xef/0x110 [nouveau]
feb 23 17:21:08 dell-2017 kernel: nouveau_mem_map+0x73/0xd0 [nouveau]
feb 23 17:21:08 dell-2017 kernel: nouveau_vma_map+0x2f/0x40 [nouveau]
feb 23 17:21:08 dell-2017 kernel: nouveau_bo_move_ntfy+0x6b/0xd0 [nouveau]
feb 23 17:21:08 dell-2017 kernel: ttm_bo_handle_move_mem+0x3b1/0x590 [ttm]
feb 23 17:21:08 dell-2017 kernel: ? drm_vma_offset_add+0x3c/0x60
feb 23 17:21:08 dell-2017 kernel: ttm_bo_evict+0x145/0x320 [ttm]
feb 23 17:21:08 dell-2017 kernel: ? gf119_disp_chan_uevent_fini+0x3d/0x60 [nouveau]
feb 23 17:21:08 dell-2017 kernel: ? nouveau_bo_invalidate_caches+0x10/0x10 [nouveau]
feb 23 17:21:08 dell-2017 kernel: ? drm_vma_offset_add+0x3c/0x60
feb 23 17:21:08 dell-2017 kernel: ? drm_mode_std+0x479/0x4a0
feb 23 17:21:08 dell-2017 kernel: ttm_mem_evict_first+0x18b/0x210 [ttm]
feb 23 17:21:08 dell-2017 kernel: ttm_bo_force_list_clean+0x8a/0x150 [ttm]
feb 23 17:21:08 dell-2017 kernel: ? pci_pm_runtime_resume+0xc0/0xc0
feb 23 17:21:08 dell-2017 kernel: nouveau_do_suspend+0x76/0x2a0 [nouveau]
feb 23 17:21:08 dell-2017 kernel: nouveau_pmops_runtime_suspend+0x3d/0xa0 [nouveau]
feb 23 17:21:08 dell-2017 kernel: pci_pm_runtime_suspend+0x56/0x150
feb 23 17:21:08 dell-2017 kernel: ? next_online_pgdat+0x1d/0x40
feb 23 17:21:08 dell-2017 kernel: __rpm_callback+0xb3/0x1b0
feb 23 17:21:08 dell-2017 kernel: ? pci_pm_runtime_resume+0xc0/0xc0
feb 23 17:21:08 dell-2017 kernel: rpm_callback+0x1a/0x70
feb 23 17:21:08 dell-2017 kernel: ? pci_pm_runtime_resume+0xc0/0xc0
feb 23 17:21:08 dell-2017 kernel: rpm_suspend+0x110/0x520
feb 23 17:21:08 dell-2017 kernel: ? __update_idle_core+0x1b/0xb0
feb 23 17:21:08 dell-2017 kernel: pm_runtime_work+0x5f/0xa0
feb 23 17:21:08 dell-2017 kernel: process_one_work+0x1c3/0x340
feb 23 17:21:08 dell-2017 kernel: worker_thread+0x28/0x3c0
feb 23 17:21:08 dell-2017 kernel: ? set_worker_desc+0x90/0x90
feb 23 17:21:08 dell-2017 kernel: kthread+0x109/0x120
feb 23 17:21:08 dell-2017 kernel: ? kthread_create_worker_on_cpu+0x40/0x40
feb 23 17:21:08 dell-2017 kernel: ret_from_fork+0x1f/0x40
feb 23 17:21:08 dell-2017 kernel: ---[ end trace 3e9fb3a70dfda7a7 ]---
feb 23 17:21:08 dell-2017 kernel: [TTM] Buffer eviction failed

Thanks

The same with kernel 5.0.0

(In reply to Carlo Caione from comment #18)
> (In reply to Carlo Caione from comment #16)
> > Created attachment 133924 [details] [review]
> > [PATCH] Don't advertise any PRIME offloading capabilities without
> > acceleration
> >
> > (In reply to Michel Dänzer from comment #6)
> > > (In reply to Ilia Mirkin from comment #5)
> > > > I wonder if this is due to the fact that it's running in NoAccel mode.
> > >
> > > Possibly, you might want something like
> > > https://cgit.freedesktop.org/xorg/driver/xf86-video-ati/commit/?id=b19417e2fddf4df725951aea5ad5e9558338f59e
> >
> > Something like this for nouveau?
>
> ping on this patch. It's not the solution for the underlying problem but at
> least it's nice to have Xorg not crashing.

Carlo, can you confirm that you've tested this out? My concern is that, without further investigation, it's unclear that it's OK to mess with pScrn->capabilities in ScreenInit -- the other function does it in PreInit.

I did test it. But that was more than 1 year ago and I don't have the hw anymore. So not sure what to suggest here.

(In reply to Carlo Caione from comment #22)
> I did test it. But that was more than 1 year ago and I don't have the hw
> anymore. So not sure what to suggest here.

Good enough for me. I'll give it a whirl myself too. Thanks!

Still the same with 5.2.x kernels

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/351.
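
For readers following the thread, here is a minimal C sketch of the two ideas discussed: the crash comes from calling a screen hook that is NULL when acceleration is off, and the proposed workaround is to stop advertising PRIME capabilities in that case. This is not the attached patch (its contents are not shown in this thread) and not the real Xorg API. Every `sketch_`/`SKETCH_` name below is hypothetical and only loosely modelled on `pScrn->capabilities` and the `SetSharedPixmapBacking` member mentioned above, and clearing all four bits is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits, loosely modelled on RandR provider
 * capabilities. These are NOT the real constants. */
enum {
    SKETCH_CAP_SOURCE_OUTPUT  = 1 << 0,
    SKETCH_CAP_SINK_OUTPUT    = 1 << 1,
    SKETCH_CAP_SOURCE_OFFLOAD = 1 << 2,
    SKETCH_CAP_SINK_OFFLOAD   = 1 << 3
};

#define SKETCH_CAP_PRIME_ALL \
    (SKETCH_CAP_SOURCE_OUTPUT | SKETCH_CAP_SINK_OUTPUT | \
     SKETCH_CAP_SOURCE_OFFLOAD | SKETCH_CAP_SINK_OFFLOAD)

struct sketch_screen;
typedef bool (*sketch_set_backing_fn)(struct sketch_screen *, void *handle);

/* Hypothetical stand-in for the driver's screen record. */
struct sketch_screen {
    unsigned capabilities;
    /* Stays NULL when the driver runs without acceleration. */
    sketch_set_backing_fn set_shared_pixmap_backing;
};

/* Placeholder for the accelerated implementation of the hook. */
static bool sketch_accel_set_backing(struct sketch_screen *s, void *handle)
{
    (void)s;
    return handle != NULL;
}

/* Driver side: only install the hook and keep the PRIME bits when
 * acceleration is available (assumption: all four bits are cleared). */
void sketch_screen_init(struct sketch_screen *s, bool accel, unsigned caps)
{
    assert(s != NULL);
    s->capabilities = caps;
    s->set_shared_pixmap_backing = NULL;
    if (accel)
        s->set_shared_pixmap_backing = sketch_accel_set_backing;
    else
        s->capabilities &= ~(unsigned)SKETCH_CAP_PRIME_ALL;
}

/* Caller side: refuse the request instead of calling through NULL. */
bool sketch_share_pixmap(struct sketch_screen *s, void *handle)
{
    assert(s != NULL);
    if (!(s->capabilities & SKETCH_CAP_SINK_OUTPUT) ||
        s->set_shared_pixmap_backing == NULL)
        return false;
    return s->set_shared_pixmap_backing(s, handle);
}
```

With acceleration off, `sketch_share_pixmap` returns false rather than dereferencing a NULL function pointer, which corresponds to the "nice to have Xorg not crashing" behaviour the patch aims for. It does nothing about the underlying acceleration hang.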
Created attachment 131549 [details]
dmesg output after the crash

My setup is an Optimus one running modesetting driver for intel card and nouveau 1.0.15 for the nvidia card. I am running kernel 4.11.2 (but 4.12-rc2 fails in the same way).

The problem is that each time I try to use my HDMI port relying on reverse PRIME, Xorg segfaults. I simply need to run the following:

xrandr --setprovideroutputsource nouveau modesetting (this works)
xrandr --output HDMI-1-1 --auto --above eDP-1 -> this causes the segfault