Bug 101220 - [NV137/GP107] xorg-server-1.19.3 crashes when trying to enable HDMI output
Summary: [NV137/GP107] xorg-server-1.19.3 crashes when trying to enable HDMI output
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-28 16:56 UTC by Pacho Ramos
Modified: 2019-07-31 09:51 UTC
CC List: 3 users

See Also:


Attachments
dmesg output after the crash (57.37 KB, text/plain)
2017-05-28 16:56 UTC, Pacho Ramos
backtrace.log (4.16 KB, text/plain)
2017-05-28 16:56 UTC, Pacho Ramos
Xorg.0.log (55.66 KB, text/plain)
2017-05-28 16:57 UTC, Pacho Ramos
core.xz (1.26 MB, application/x-xz)
2017-05-28 16:59 UTC, Pacho Ramos
Photo as soon as I try to start X with kernel 4.12-rc3 (6.67 MB, image/png)
2017-05-30 13:22 UTC, Pacho Ramos
dmesg output with kernel 4.12-rc3 (120.99 KB, text/plain)
2017-05-30 13:24 UTC, Pacho Ramos
[PATCH] Don't advertise any PRIME offloading capabilities without acceleration (1.03 KB, patch)
2017-09-01 10:49 UTC, Carlo Caione

Description Pacho Ramos 2017-05-28 16:56:25 UTC
Created attachment 131549 [details]
dmesg output after the crash

My setup is an Optimus one running the modesetting driver for the Intel card and nouveau 1.0.15 for the NVIDIA card. I am running kernel 4.11.2 (but 4.12-rc2 fails in the same way).

The problem is that each time I try to use my HDMI port relying on reverse PRIME, Xorg segfaults. I simply need to run the following:
xrandr --setprovideroutputsource nouveau modesetting (this works)
xrandr --output HDMI-1-1 --auto --above eDP-1 -> this causes the segfault
Comment 1 Pacho Ramos 2017-05-28 16:56:44 UTC
Created attachment 131550 [details]
backtrace.log
Comment 2 Pacho Ramos 2017-05-28 16:57:02 UTC
Created attachment 131551 [details]
Xorg.0.log
Comment 3 Pacho Ramos 2017-05-28 16:59:44 UTC
Created attachment 131552 [details]
core.xz

Full core dump (compressed with xz)
Comment 4 Michel Dänzer 2017-05-29 02:34:59 UTC
Looks like a nouveau driver issue — the SetSharedPixmapBacking member of its ScreenRec is NULL.
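
To illustrate the failure mode described here, a minimal sketch of a caller guarding an optional driver hook. DemoScreen and every demo_* name are hypothetical stand-ins for illustration only, not the real ScreenRec or any X server API; the assumption is simply that a driver without shared-pixmap support leaves the hook NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a screen record; NOT the real ScreenRec. */
typedef struct DemoScreen DemoScreen;
struct DemoScreen {
    /* Optional hook: a driver without shared-pixmap support leaves it NULL. */
    bool (*set_shared_pixmap_backing)(DemoScreen *screen, int handle);
    int last_handle;
};

/* Example hook that a capable driver might install (hypothetical). */
bool demo_set_backing(DemoScreen *screen, int handle)
{
    screen->last_handle = handle;
    return true;
}

/* Caller-side guard: report failure instead of jumping through NULL. */
bool demo_attach_shared_pixmap(DemoScreen *screen, int handle)
{
    if (screen == NULL || screen->set_shared_pixmap_backing == NULL)
        return false;
    return screen->set_shared_pixmap_backing(screen, handle);
}
```

With the hook left NULL, demo_attach_shared_pixmap() returns false rather than crashing. Whether the right place for such a check is the caller or the driver's advertised capabilities is a separate question.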
Comment 5 Ilia Mirkin 2017-05-29 21:57:04 UTC
I wonder if this is due to the fact that it's running in NoAccel mode. (Haven't actually looked at any of the code though.)

Pacho, can you test 4.12-rcN with the nvidia-supplied firmware installed (part of linux-firmware) which will allow you to have acceleration on your board?
Comment 6 Michel Dänzer 2017-05-30 01:38:31 UTC
(In reply to Ilia Mirkin from comment #5)
> I wonder if this is due to the fact that it's running in NoAccel mode.

Possibly, you might want something like

https://cgit.freedesktop.org/xorg/driver/xf86-video-ati/commit/?id=b19417e2fddf4df725951aea5ad5e9558338f59e
Comment 7 Pacho Ramos 2017-05-30 07:37:04 UTC
(In reply to Ilia Mirkin from comment #5)
> I wonder if this is due to the fact that it's running in NoAccel mode.
> (Haven't actually looked at any of the code though.)
> 
> Pacho, can you test 4.12-rcN with the nvidia-supplied firmware installed
> (part of linux-firmware) which will allow you to have acceleration on your
> board?

I tried already, but it's even worse because nouveau causes a kernel Oops when running with acceleration and kernel 4.12-rc2... then I needed to pass nouveau.noaccel=1 and, even after getting X started, I got the same crash (additionally, I don't know why on kernel 4.12-rc2 my touchpad stops moving... it clicks but doesn't move... but that is probably a different bug :( )
Comment 8 Pacho Ramos 2017-05-30 13:22:55 UTC
Created attachment 131572 [details]
Photo as soon as I try to start X with kernel 4.12-rc3

I have tried with 4.12-rc3, but it's even worse: as soon as I try to start X I get a kernel Oops (I have taken a photo to show it).
Comment 9 Pacho Ramos 2017-05-30 13:24:33 UTC
Created attachment 131573 [details]
dmesg output with kernel 4.12-rc3

dmesg appears to contain lots of errors with kernel 4.12-rc3. I am using nouveau 1.0.15 and the linux-firmware snapshot from 20170519.

Thanks
Comment 10 Ilia Mirkin 2017-05-30 13:42:17 UTC
gr fails to come up, and the rest of the board ends up dead too.

On the off chance that your kernel setup puts modules into an initrd (the kind of setup aimed at suddenly needing to boot off some exotic RAID controller), make sure that the updated linux-firmware is in that initrd, because the originally-released firmware for GP107 was "wrong".
Comment 11 Pacho Ramos 2017-05-30 14:14:09 UTC
I don't know how that RAID is done... this is a Dell Inspiron 15 7000 laptop that probably uses that setup to work with a small SSD as main device and a bigger HD... but this has been working nicely with kernels from 4.9.x to 4.11 :/

I don't use any initrd... I could rely on CONFIG_EXTRA_FIRMWARE... but I thought I didn't need that as nouveau is compiled as a module and not into the kernel :|
Comment 12 Ilia Mirkin 2017-05-30 14:28:21 UTC
(In reply to Pacho Ramos from comment #11)
> I don't use any initrd... I could rely on CONFIG_EXTRA_FIRMWARE... but I
> thought I didn't need that as nouveau is compiled as a module and not into
> the kernel :|

OK, well if modules are loaded off the FS and not initrd, then you're good on that front, assuming you really do have the updated linux-firmware (April 4 or later -- https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=b14134583c2a15d4404695f72cb523daedb877ab). Lots of people use initrd without understanding how it works.

So there are two issues here...

#1: we should probably disable reverse prime if you don't have acceleration (or figure out how to make it work without accel... should just be a very slow memcpy away... not great, but better than a non-working screen)

#2: we should figure out what's going on with accel on your GP107 -- looks like stuff is just hanging (that's the "timeout" messages you see, we're waiting for some condition to become true, and it never does).

I can help with the former, and hopefully Ben Skeggs can investigate the latter.
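
A rough sketch of what issue #1 could look like: advertise no PRIME roles at all when acceleration is unavailable, so a request to use the provider as an output sink is refused up front. The DEMO_CAP_* bits and demo_* functions are hypothetical names made up for this sketch; they are not the RandR provider flags and not the attached patch.

```c
#include <stdbool.h>

/* Hypothetical capability bits; these are NOT the real RandR flags. */
enum {
    DEMO_CAP_SOURCE_OUTPUT  = 1u << 0,
    DEMO_CAP_SINK_OUTPUT    = 1u << 1,
    DEMO_CAP_SOURCE_OFFLOAD = 1u << 2,
    DEMO_CAP_SINK_OFFLOAD   = 1u << 3
};

/* Compute the capability mask a provider would advertise. */
unsigned demo_provider_capabilities(bool has_accel, bool can_share_buffers)
{
    /* Assumption: every PRIME role in this sketch needs an accelerated
     * copy path, so a NoAccel screen advertises nothing. */
    if (!has_accel || !can_share_buffers)
        return 0;

    return DEMO_CAP_SOURCE_OUTPUT | DEMO_CAP_SINK_OUTPUT |
           DEMO_CAP_SOURCE_OFFLOAD | DEMO_CAP_SINK_OFFLOAD;
}

/* Reverse PRIME needs the provider that owns the output to act as a sink. */
bool demo_can_be_output_sink(unsigned caps)
{
    return (caps & DEMO_CAP_SINK_OUTPUT) != 0;
}
```

The alternative mentioned in #1, a slow memcpy path without acceleration, would instead keep the sink bit set and supply a software copy; that is not shown here.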
Comment 13 Pacho Ramos 2017-05-31 06:46:42 UTC
(In reply to Ilia Mirkin from comment #12)
> (In reply to Pacho Ramos from comment #11)
> > I don't use any initrd... I could rely on CONFIG_EXTRA_FIRMWARE... but I
> > thought I didn't need that as nouveau is compiled as a module and not into
> > the kernel :|
> 
> OK, well if modules are loaded off the FS and not initrd, then you're good
> on that front, assuming you really do have the updated linux-firmware (April
> 4 or later --
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/
> commit/?id=b14134583c2a15d4404695f72cb523daedb877ab). Lots of people use
> initrd without understanding how it works.

Yeah, my linux-firmware snapshot is the one up to https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=df40d15d6ad617e72ce7ea00b91d9117d92dcccc

> 
> So there are two issues here...
> 
> #1: we should probably disable reverse prime if you don't have acceleration
> (or figure out how to make it work without accel... should just be a very
> slow memcpy away... not great, but better than a non-working screen)
> 
> #2: we should figure out what's going on with accel on your GP107 -- looks
> like stuff is just hanging (that's the "timeout" messages you see, we're
> waiting for some condition to become true, and it never does).
> 
> I can help with the former, and hopefully Ben Skeggs can investigate the
> latter.

Great, thanks a lot:)
Comment 14 Pacho Ramos 2017-06-05 11:21:49 UTC
(In reply to Pacho Ramos from comment #7)
[...]
> (additionally, I don't know why on kernel 4.12-rc2 my touchpad stops
> moving... it clicks but doesn't move... but that is probably a different bug
> :( )

This was indeed a different bug that got fixed in rc4 :)... but nouveau still keeps failing in the same way with rc4 :(, do you need updated logs with that kernel? (they seem quite similar though) Thanks
Comment 15 Rhys Kidd 2017-08-31 22:28:27 UTC
Looks very closely related to, if not the same as, bz#100228
Comment 16 Carlo Caione 2017-09-01 10:49:15 UTC
Created attachment 133924 [details] [review]
[PATCH] Don't advertise any PRIME offloading capabilities without acceleration

(In reply to Michel Dänzer from comment #6)
> (In reply to Ilia Mirkin from comment #5)
> > I wonder if this is due to the fact that it's running in NoAccel mode.
> 
> Possibly, you might want something like
> 
> https://cgit.freedesktop.org/xorg/driver/xf86-video-ati/commit/
> ?id=b19417e2fddf4df725951aea5ad5e9558338f59e

Something like this for nouveau?
Comment 17 Pacho Ramos 2017-09-07 08:39:14 UTC
(In reply to Ilia Mirkin from comment #12)
[...]
> #2: we should figure out what's going on with accel on your GP107 -- looks
> like stuff is just hanging (that's the "timeout" messages you see, we're
> waiting for some condition to become true, and it never does).
> 
> I can help with the former, and hopefully Ben Skeggs can investigate the
> latter.

As a side note, I am still unable to run with "accel" enabled even with kernel 4.13.0 :/
Comment 18 Carlo Caione 2017-09-14 10:31:51 UTC
(In reply to Carlo Caione from comment #16)
> Created attachment 133924 [details] [review]
> [PATCH] Don't advertise any PRIME offloading capabilities without
> acceleration
> 
> (In reply to Michel Dänzer from comment #6)
> > (In reply to Ilia Mirkin from comment #5)
> > > I wonder if this is due to the fact that it's running in NoAccel mode.
> > 
> > Possibly, you might want something like
> > 
> > https://cgit.freedesktop.org/xorg/driver/xf86-video-ati/commit/
> > ?id=b19417e2fddf4df725951aea5ad5e9558338f59e
> 
> Something like this for nouveau?

ping on this patch. It's not the solution for the underlying problem but at least it's nice to have Xorg not crashing.
Comment 19 Pacho Ramos 2019-02-23 16:50:30 UTC
Any news on this? I still need to run with noaccel on kernel 4.19.25; otherwise the system ends up hanging after showing these errors:
feb 23 17:21:08 dell-2017 kernel: ------------[ cut here ]------------
feb 23 17:21:08 dell-2017 kernel: nouveau 0000:01:00.0: timeout
feb 23 17:21:08 dell-2017 kernel: WARNING: CPU: 4 PID: 64 at drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c:207 gf100_vmm_flush_+0x149/0x190 [nouveau]
feb 23 17:21:08 dell-2017 kernel: Modules linked in: cmac ctr ccm bnep uvcvideo videobuf2_vmalloc btusb videobuf2_memops videobuf2_v4l2 btrtl videodev btbcm btintel videobuf2_common bluetooth ecdh_generic hid_m>
feb 23 17:21:08 dell-2017 kernel:  acpi_pad int340x_thermal_zone int3400_thermal acpi_thermal_rel vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O)
feb 23 17:21:08 dell-2017 kernel: CPU: 4 PID: 64 Comm: kworker/4:1 Tainted: G        W  O      4.19.25-gentoo #1
feb 23 17:21:08 dell-2017 kernel: Hardware name: Dell Inc. Inspiron 15 7000 Gaming/065C71, BIOS 1.6.0 03/27/2018
feb 23 17:21:08 dell-2017 kernel: Workqueue: pm pm_runtime_work
feb 23 17:21:08 dell-2017 kernel: RIP: 0010:gf100_vmm_flush_+0x149/0x190 [nouveau]
feb 23 17:21:08 dell-2017 kernel: Code: 5f e9 3b ae 56 e1 48 8b 7d 10 48 8b 5f 50 48 85 db 74 46 e8 09 7b 32 e1 48 89 da 48 89 c6 48 c7 c7 a4 a8 3a a0 e8 07 63 df e0 <0f> 0b eb c2 48 8b 7d 10 48 8b 5f 50 48 85 >
feb 23 17:21:08 dell-2017 kernel: RSP: 0018:ffffc90001b53718 EFLAGS: 00010296
feb 23 17:21:08 dell-2017 kernel: RAX: 000000000000001d RBX: ffff88846cdae2d0 RCX: 0000000000000006
feb 23 17:21:08 dell-2017 kernel: RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff88846f7153f0
feb 23 17:21:08 dell-2017 kernel: RBP: ffff88846bc37800 R08: 0000000000000001 R09: 00000000000004b3
feb 23 17:21:08 dell-2017 kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff88846277e660
feb 23 17:21:08 dell-2017 kernel: R13: 0000000c6f79dd60 R14: ffff88846aa34020 R15: ffff88846aad3600
feb 23 17:21:08 dell-2017 kernel: FS:  0000000000000000(0000) GS:ffff88846f700000(0000) knlGS:0000000000000000
feb 23 17:21:08 dell-2017 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
feb 23 17:21:08 dell-2017 kernel: CR2: 00007fab22c8d4a0 CR3: 000000000200a002 CR4: 00000000003606e0
feb 23 17:21:08 dell-2017 kernel: Call Trace:
feb 23 17:21:08 dell-2017 kernel:  nvkm_vmm_iter.constprop.15+0x2cf/0x7e0 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  ? nvkm_vmm_map+0xb8/0x3e0 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  nvkm_vmm_map+0x1a5/0x3e0 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  ? gp100_vmm_pgt_sgl+0x180/0x180 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  nvkm_vram_map+0x43/0x50 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  nvkm_uvmm_mthd+0x71e/0x850 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  ? lock_timer_base+0x62/0x80
feb 23 17:21:08 dell-2017 kernel:  nvkm_ioctl+0x105/0x240 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  nvif_object_mthd+0xd3/0xf0 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  ? dma_fence_wait_timeout+0x30/0x30
feb 23 17:21:08 dell-2017 kernel:  nvif_vmm_map+0xef/0x110 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  nouveau_mem_map+0x73/0xd0 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  nouveau_vma_map+0x2f/0x40 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  nouveau_bo_move_ntfy+0x6b/0xd0 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  ttm_bo_handle_move_mem+0x3b1/0x590 [ttm]
feb 23 17:21:08 dell-2017 kernel:  ? drm_vma_offset_add+0x3c/0x60
feb 23 17:21:08 dell-2017 kernel:  ttm_bo_evict+0x145/0x320 [ttm]
feb 23 17:21:08 dell-2017 kernel:  ? gf119_disp_chan_uevent_fini+0x3d/0x60 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  ? nouveau_bo_invalidate_caches+0x10/0x10 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  ? drm_vma_offset_add+0x3c/0x60
feb 23 17:21:08 dell-2017 kernel:  ? drm_mode_std+0x479/0x4a0
feb 23 17:21:08 dell-2017 kernel:  ttm_mem_evict_first+0x18b/0x210 [ttm]
feb 23 17:21:08 dell-2017 kernel:  ttm_bo_force_list_clean+0x8a/0x150 [ttm]
feb 23 17:21:08 dell-2017 kernel:  ? pci_pm_runtime_resume+0xc0/0xc0
feb 23 17:21:08 dell-2017 kernel:  nouveau_do_suspend+0x76/0x2a0 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  nouveau_pmops_runtime_suspend+0x3d/0xa0 [nouveau]
feb 23 17:21:08 dell-2017 kernel:  pci_pm_runtime_suspend+0x56/0x150
feb 23 17:21:08 dell-2017 kernel:  ? next_online_pgdat+0x1d/0x40
feb 23 17:21:08 dell-2017 kernel:  __rpm_callback+0xb3/0x1b0
feb 23 17:21:08 dell-2017 kernel:  ? pci_pm_runtime_resume+0xc0/0xc0
feb 23 17:21:08 dell-2017 kernel:  rpm_callback+0x1a/0x70
feb 23 17:21:08 dell-2017 kernel:  ? pci_pm_runtime_resume+0xc0/0xc0
feb 23 17:21:08 dell-2017 kernel:  rpm_suspend+0x110/0x520
feb 23 17:21:08 dell-2017 kernel:  ? __update_idle_core+0x1b/0xb0
feb 23 17:21:08 dell-2017 kernel:  pm_runtime_work+0x5f/0xa0
feb 23 17:21:08 dell-2017 kernel:  process_one_work+0x1c3/0x340
feb 23 17:21:08 dell-2017 kernel:  worker_thread+0x28/0x3c0
feb 23 17:21:08 dell-2017 kernel:  ? set_worker_desc+0x90/0x90
feb 23 17:21:08 dell-2017 kernel:  kthread+0x109/0x120
feb 23 17:21:08 dell-2017 kernel:  ? kthread_create_worker_on_cpu+0x40/0x40
feb 23 17:21:08 dell-2017 kernel:  ret_from_fork+0x1f/0x40
feb 23 17:21:08 dell-2017 kernel: ---[ end trace 3e9fb3a70dfda7a7 ]---
feb 23 17:21:08 dell-2017 kernel: [TTM] Buffer eviction failed

Thanks
Comment 20 Pacho Ramos 2019-03-09 18:33:43 UTC
The same with kernel 5.0.0
Comment 21 Ilia Mirkin 2019-03-09 18:45:49 UTC
(In reply to Carlo Caione from comment #18)
> (In reply to Carlo Caione from comment #16)
> > Created attachment 133924 [details] [review]
> > [PATCH] Don't advertise any PRIME offloading capabilities without
> > acceleration
> > 
> > (In reply to Michel Dänzer from comment #6)
> > > (In reply to Ilia Mirkin from comment #5)
> > > > I wonder if this is due to the fact that it's running in NoAccel mode.
> > > 
> > > Possibly, you might want something like
> > > 
> > > https://cgit.freedesktop.org/xorg/driver/xf86-video-ati/commit/
> > > ?id=b19417e2fddf4df725951aea5ad5e9558338f59e
> > 
> > Something like this for nouveau?
> 
> ping on this patch. It's not the solution for the underlying problem but at
> least it's nice to have Xorg not crashing.

Carlo, can you confirm that you've tested this out? My concern is that, without further investigation, it's unclear that it's OK to mess with pScrn->capabilities in ScreenInit -- the other function does it in PreInit.
Comment 22 Carlo Caione 2019-03-09 18:47:52 UTC
I did test it. But that was more than 1 year ago and I don't have the hw anymore. So not sure what to suggest here.
Comment 23 Ilia Mirkin 2019-03-09 18:48:43 UTC
(In reply to Carlo Caione from comment #22)
> I did test it. But that was more than 1 year ago and I don't have the hw
> anymore. So not sure what to suggest here.

Good enough for me. I'll give it a whirl myself too. Thanks!
Comment 24 Pacho Ramos 2019-07-31 09:51:50 UTC
Still the same with 5.2.x kernels

