Bug 100228

Summary: [NV137] bus: MMIO read of 00000000 FAULT at 409800 [ TIMEOUT ]
Product: xorg
Reporter: Alexander <fake.ae>
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED MOVED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
CC: andrew, andrey+freedesktop, anton.kochkov, bazald, carlo, dirbaio, eric, fdsfgs, florens, francois.michonneau, harry_x, jan.public, krinkodot22, majzoube, markus, micheledisk, nelsonbrandao800, no111u3, nullspoon, pachoramos1, ramazanerol91, rhyskidd, viiru
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=101665
https://bugs.freedesktop.org/show_bug.cgi?id=101782
https://bugs.freedesktop.org/show_bug.cgi?id=101220
https://bugs.freedesktop.org/show_bug.cgi?id=104621
Attachments (no flags set):
- kernel logs
- lspci excerpt (kernel v4.10)
- dmesg | grep nouveau (kernel v4.12-rc5)
- NVIDIA Corporation GP107M [GeForce GTX 1050 Ti Mobile] (rev a1) vbios
- ASUS ROG NVIDIA GTX 1050 Ti vbios.rom (GP107)
- Kernel log with new firmware
- Kernel log with new firmware after some boot
- Lenovo Yoga 530-15 IKB Kernel log and additional information
- Journal logs from a boot without any modeset-altering kernel parameters.
- GP107GLM - kernel log
- GP107GLM (ThinkPad P1) - lspci
- GP107M (thinkpad x1 extreme) in hybrid graphics mode - fails
- GP107M (thinkpad x1 extreme) in discrete graphics mode - works

Description Alexander 2017-03-16 11:38:35 UTC
Created attachment 130256 [details]
kernel logs

Hi

I've recently bought a new laptop, an HP Omen 15-ax201nc.
It comes with an NVIDIA 1050 Ti video card, which is supposedly supported by the nouveau driver. But during boot I get "unknown chipset (137000a1)".
The X server loads regardless, but with limited features (the first thing I noticed is that there are no GNOME animations).

Mesa-dri-nouveau version is 11.2.2-166.1
xf86-video-nouveau version is 1.0.14-30.1 
I use openSUSE Leap 42.2

uname -a
Linux 78qwHJ4D 4.4.49-16-default #1 SMP Sun Feb 19 17:40:35 UTC 2017 (70e9954) x86_64 x86_64 x86_64 GNU/Linux

dmesg logs are attached
Comment 1 Ilia Mirkin 2017-03-16 17:12:44 UTC
This is expected. None of the software you listed supports your GPU.

With the latest released kernel (4.10), you can easily write a patch that will add support. The reason it has not been integrated is that I believe we're still waiting for a blob mmiotrace (and I don't appear to remember the password for the mmio.dumps@gmail.com account to check - others do though) to make sure that there are no odd differences in setup.

An example of such a patch is available at https://www.spinics.net/lists/dri-devel/msg132664.html .

However this won't get you acceleration - just modesetting. If you want acceleration, you also need the drm-next tree (targeted at Linux 4.12) as well as the associated firmware from nvidia.

Then you'd need a version of xf86-video-nouveau that supports GP10x - the current codebase doesn't, but it should, again, be a trivial patch, just needs testing.

You'll likely also need mesa from git, as the 17.0 release doesn't have some of the relevant logic.

[Long story short - it's not quite there yet.]
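
For anyone checking their own machine against the requirements above, here is a rough sketch of a kernel version check. It assumes GNU `sort -V` is available; the function name `kernel_at_least` is made up for illustration. 4.12 is only the release the acceleration work was targeted at, so passing the check does not by itself mean acceleration works.

```shell
# Sketch: succeed if version $1 is at least version $2.
# Relies on GNU sort's version ordering (-V); not a full kernel-version parser.
kernel_at_least() {
    # $1 = running version (e.g. from `uname -r`), $2 = minimum version
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   kernel_at_least "$(uname -r)" 4.12 && echo "kernel is new enough to try"
```

The firmware, xf86-video-nouveau and mesa requirements mentioned above still have to be checked separately.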
Comment 2 Rhys Kidd 2017-06-05 04:02:28 UTC
Created attachment 131701 [details]
lspci excerpt (kernel v4.10)

GP107M [GeForce GTX 1050 Mobile] on XPS 9560
Comment 3 Rhys Kidd 2017-06-21 03:10:49 UTC
Created attachment 132109 [details]
dmesg | grep nouveau (kernel v4.12-rc5)
Comment 4 Rhys Kidd 2017-06-21 03:16:02 UTC
I've added a dmesg log excerpt from booting this nv137 (GP107) with the following:

linux kernel v4.12-rc5
xf86-video-nouveau 1:1.0.15-1
linux-firmware 1.166

The card is recognized and boots, although there's an MMIO read fault. dboyan appears to have run into this same MMIO read fault, see here:

https://gist.github.com/dboyan/4b6999716aad089f398efe8625091b50
Comment 5 Rhys Kidd 2017-06-22 15:00:55 UTC
*** Bug 101553 has been marked as a duplicate of this bug. ***
Comment 6 Rhys Kidd 2017-06-24 14:36:30 UTC
*** Bug 101573 has been marked as a duplicate of this bug. ***
Comment 7 Rhys Kidd 2017-06-24 14:38:49 UTC
harry_x reports that:

> When starting the machine on Dell Inspiron 7000 (Kabylake, GTX 1050 Ti) with
> HDMI monitor connected (HDMI output is provided by NVIDIA card, eDP is 
> connected to internal), everything seems to work (using reverse prime that is 
> automatically setup). There is lot of tearing, but it works.
> 
> But when starting the machine without HDMI output connected (so the NVIDIA
> card has no connected output), it fails with [the timeout]:
Comment 8 13t8Pm490DD44eZ 2017-06-24 21:39:06 UTC
Created attachment 132226 [details]
NVIDIA Corporation GP107M [GeForce GTX 1050 Ti Mobile] (rev a1) vbios
Comment 9 Rhys Kidd 2017-07-20 14:01:25 UTC
*** Bug 101764 has been marked as a duplicate of this bug. ***
Comment 10 Carlo Caione 2017-07-31 07:34:22 UTC
Same as https://bugs.freedesktop.org/show_bug.cgi?id=101782 ?

Do we have any workaround / fix / WiP for this issue?
Comment 11 13t8Pm490DD44eZ 2017-07-31 07:39:38 UTC
Running with the runpm=0 parameter helped me. But that's not a real solution for a laptop anyway... So I had to resort to either running the NVIDIA binary blob (which is also a battery killer, since all rendering is done on the NVIDIA card) or not using the external display at all (so when I need battery life I reboot into a kernel without the nvidia/nouveau driver and disable the card using bbswitch) :-)
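
A minimal sketch of making the runpm=0 workaround persistent through modprobe configuration. The helper name `nouveau_option_line` and the file name in the usage comment are assumptions for illustration; any `*.conf` file under /etc/modprobe.d should be read the same way.

```shell
# Sketch: print the modprobe.d line that sets one nouveau module option.
nouveau_option_line() {
    # $1 = option name (e.g. runpm), $2 = value (e.g. 0)
    printf 'options nouveau %s=%s\n' "$1" "$2"
}

# Usage (as root; the file name is an assumption):
#   nouveau_option_line runpm 0 > /etc/modprobe.d/nouveau-runpm.conf
# The same option on the kernel command line is spelled nouveau.runpm=0.
```

If nouveau is loaded from the initramfs, the initramfs probably has to be regenerated before the option takes effect at boot.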
Comment 12 Anton Kochkov 2017-08-13 07:43:16 UTC
On the latest kernel, 4.13-rc4, the error is still present, but the error message seems a bit different:

[ 1677.216453] nouveau 0000:01:00.0: DRM: resuming object tree...
[ 1677.324861] nouveau 0000:01:00.0: DRM: resuming fence...
[ 1677.624298] nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 409800 [ TIMEOUT ]
[ 1679.623842] nouveau 0000:01:00.0: timeout
[ 1679.623851] ------------[ cut here ]------------
[ 1679.623873] WARNING: CPU: 3 PID: 6056 at drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c:1501 gf100_gr_init_ctxctl_ext+0x6a8/0x790 [nouveau]
[ 1679.623874] Modules linked in: ctr ccm 8021q garp stp llc arc4 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ath10k_pci nouveau ath10k_core i915 ath x86_pkg_temp_thermal mac80211 ttm coretemp snd_hda_intel btusb iosf_mbi dell_wmi btrtl dell_laptop sparse_keymap dell_smbios btbcm iTCO_wdt snd_hda_codec kvm_intel drm_kms_helper btintel iTCO_vendor_support dcdbas wmi_bmof mxm_wmi dell_smm_hwmon uvcvideo snd_hda_core kvm bluetooth videobuf2_vmalloc irqbypass snd_hwdep efi_pstore cfg80211 videobuf2_memops videobuf2_v4l2 crc32c_intel videobuf2_core drm ecdh_generic snd_pcm rfkill syscopyarea ghash_clmulni_intel snd_timer sysfillrect cryptd snd videodev sysimgblt fb_sys_fops soundcore efivars serio_raw pcspkr i2c_i801 wmi dell_smo8800 video efivarfs xts cbc libiscsi scsi_transport_iscsi
[ 1679.623899]  vmxnet3 virtio_net virtio_ring virtio tg3 sky2 r8169 pcnet32 mii igb ptp pps_core i2c_algo_bit i2c_core e1000 bnx2 atl1c fuse xfs nfs lockd grace sunrpc fscache jfs reiserfs btrfs ext4 jbd2 ext2 mbcache linear raid10 raid1 raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod dax firewire_core crc_itu_t sl811_hcd xhci_pci xhci_hcd usb_storage aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm aacraid sx8 hpsa cciss 3w_9xxx 3w_xxxx 3w_sas mptsas scsi_transport_sas mptfc scsi_transport_fc mptspi mptscsih mptbase imm parport sym53c8xx initio arcmsr aic7xxx aic79xx scsi_transport_spi sr_mod cdrom sg sd_mod pdc_adma sata_inic162x sata_mv ata_piix ahci libahci
[ 1679.623930]  sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_via pata_jmicron pata_marvell pata_sis pata_netcell pata_pdc202xx_old pata_atiixp pata_amd pata_ali pata_it8213 pata_pcmcia pata_serverworks pata_oldpiix pata_artop pata_it821x pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_sil680 pata_pdc2027x
[ 1679.623942] CPU: 3 PID: 6056 Comm: X Not tainted 4.13.0-rc4 #1
[ 1679.623943] Hardware name: Dell Inc. XPS 15 9560/05FFDN, BIOS 1.3.3 05/08/2017
[ 1679.623943] task: ffff880469764e80 task.stack: ffffc90000ffc000
[ 1679.623960] RIP: 0010:gf100_gr_init_ctxctl_ext+0x6a8/0x790 [nouveau]
[ 1679.623961] RSP: 0018:ffffc90000fff930 EFLAGS: 00010286
[ 1679.623962] RAX: 000000000000001d RBX: ffff88046c48c840 RCX: ffffffff81c4af48
[ 1679.623962] RDX: 0000000000000001 RSI: 0000000000000046 RDI: ffffffff81f792cc
[ 1679.623962] RBP: ffffc90000fff960 R08: 0000000000000001 R09: 00000000000003a6
[ 1679.623963] R10: ffffc90000fff7e0 R11: 00000000000003a6 R12: 0000000077366a80
[ 1679.623963] R13: ffff88046923c000 R14: ffff880468d34080 R15: 000001869a2a33c0
[ 1679.623964] FS:  00007f2f55b798c0(0000) GS:ffff88047f4c0000(0000) knlGS:0000000000000000
[ 1679.623965] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1679.623965] CR2: 000000000179d038 CR3: 0000000467d92000 CR4: 00000000003406e0
[ 1679.623965] Call Trace:
[ 1679.623982]  gf100_gr_init_ctxctl+0x1f5/0x290 [nouveau]
[ 1679.623998]  gp100_gr_init+0x757/0x790 [nouveau]
[ 1679.624013]  gf100_gr_init_+0x55/0x60 [nouveau]
[ 1679.624049]  nvkm_gr_init+0x17/0x20 [nouveau]
[ 1679.624059]  nvkm_engine_init+0x131/0x1e0 [nouveau]
[ 1679.624081]  nvkm_subdev_init+0x95/0x200 [nouveau]
[ 1679.624090]  nvkm_engine_ref+0x4f/0x70 [nouveau]
[ 1679.624099]  nvkm_ioctl_new+0xff/0x280 [nouveau]
[ 1679.624114]  ? nvkm_fifo_chan_child_del+0x90/0x90 [nouveau]
[ 1679.624129]  ? gf100_gr_dtor+0xd0/0xd0 [nouveau]
[ 1679.624138]  nvkm_ioctl+0x11b/0x260 [nouveau]
[ 1679.624153]  nvkm_client_ioctl+0x12/0x20 [nouveau]
[ 1679.624162]  nvif_object_ioctl+0x41/0x50 [nouveau]
[ 1679.624170]  nvif_object_init+0xc2/0x120 [nouveau]
[ 1679.624185]  nouveau_abi16_ioctl_grobj_alloc+0x149/0x2c0 [nouveau]
[ 1679.624200]  ? nouveau_abi16_ioctl_channel_free+0x90/0x90 [nouveau]
[ 1679.624206]  drm_ioctl_kernel+0x69/0xb0 [drm]
[ 1679.624210]  drm_ioctl+0x319/0x3f0 [drm]
[ 1679.624224]  ? nouveau_abi16_ioctl_channel_free+0x90/0x90 [nouveau]
[ 1679.624240]  nouveau_drm_ioctl+0x74/0xc0 [nouveau]
[ 1679.624242]  do_vfs_ioctl+0x94/0x5b0
[ 1679.624243]  ? handle_mm_fault+0xf3/0x210
[ 1679.624245]  ? security_file_ioctl+0x43/0x60
[ 1679.624246]  SyS_ioctl+0x79/0x90
[ 1679.624247]  entry_SYSCALL_64_fastpath+0x1c/0xac
[ 1679.624248] RIP: 0033:0x7f2f53a2bba7
[ 1679.624249] RSP: 002b:00007ffd3e2e6be8 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[ 1679.624249] RAX: ffffffffffffffda RBX: 00000000013c6d30 RCX: 00007f2f53a2bba7
[ 1679.624250] RDX: 00007ffd3e2e6c3c RSI: 00000000400c6444 RDI: 000000000000000f
[ 1679.624250] RBP: 00000000013be280 R08: 0000000000000010 R09: 00000000013bd1a0
[ 1679.624251] R10: 0000000000000058 R11: 0000000000003246 R12: 00000000013c6d30
[ 1679.624251] R13: 0000000000479590 R14: 0000000000000006 R15: 00007ffd3e2e7068
[ 1679.624252] Code: fc ff ff 48 8b 7b 10 48 8b 5f 50 48 85 db 75 04 48 8b 5f 10 e8 aa 30 eb df 48 89 da 48 89 c6 48 c7 c7 88 66 62 a1 e8 a9 d0 b3 df <0f> ff b8 f0 ff ff ff e9 6f fc ff ff 41 8b 8d a8 00 00 00 49 8b 
[ 1679.624267] ---[ end trace cea4b3e271e4236a ]---
[ 1679.624269] nouveau 0000:01:00.0: gr: init failed, -16
[ 1681.625476] nouveau 0000:01:00.0: timeout
[ 1681.625480] ------------[ cut here ]------------
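
To compare logs like the one above between machines, the fault lines can be reduced to the PCI address and register offset. This is only a sketch: the function name `nouveau_mmio_faults` is made up, and the pattern is derived from the message format seen in this log, so other kernel versions may word it differently.

```shell
# Sketch: read a kernel log on stdin and print "<pci-address> <register-offset>"
# for every nouveau MMIO fault line found in it.
nouveau_mmio_faults() {
    sed -n -E 's/.*nouveau ([0-9a-f:.]+): bus: MMIO (read|write) of [0-9a-f]+ FAULT at ([0-9a-f]+).*/\1 \3/p'
}

# Usage:
#   dmesg | nouveau_mmio_faults
```

Run against the excerpt in this comment, it would report the fault at 409800 on device 0000:01:00.0.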
Comment 13 Carlo Caione 2017-08-13 10:52:32 UTC
Same as https://bugs.freedesktop.org/show_bug.cgi?id=101764#c3
Comment 14 Rhys Kidd 2017-08-13 13:31:53 UTC
*** Bug 102192 has been marked as a duplicate of this bug. ***
Comment 15 Rhys Kidd 2017-08-17 15:47:24 UTC
*** Bug 102275 has been marked as a duplicate of this bug. ***
Comment 16 Aaron Ball 2017-10-29 00:49:58 UTC
Created attachment 135146 [details]
ASUS ROG NVIDIA GTX 1050 Ti vbios.rom (GP107)

Got mine working (finally) with imirkin's help (thanks very much!). Per harry_x's comment, I just had to modprobe nouveau with runpm=0.

ASUS ROG GL753VE with NVIDIA GTX 1050 Ti (see attached vbios rom, per rhyskidd's request on irc)

Running kernel 4.14.0-rc6
Comment 17 Aaron Ball 2017-10-29 01:09:03 UTC
Per rhyskidd's request, nvapeek output.

> nvapeek 0x101000
> 00101000: 00400080
Comment 18 Rhys Kidd 2018-01-17 04:54:13 UTC
*** Bug 104621 has been marked as a duplicate of this bug. ***
Comment 19 Rhys Kidd 2018-06-03 13:46:39 UTC
*** Bug 106793 has been marked as a duplicate of this bug. ***
Comment 20 Andrew 2018-06-04 05:17:11 UTC
Thanks Rhys.

FWIW I've tried the nouveau.runpm=0 kernel parameter on my NV138 (GT 1030) system but nouveau is still failing.

Jun  4 15:03:49 roffey10 dracut-cmdline[252]: Using kernel command line parameters: BOOT_IMAGE=/boot/vmlinuz-4.17.0-0.rc7.git2.1.vanilla.knurd.1.fc28.x86_64 root=/dev/mapper/OSVG-root ro rd.lvm.lv=OSVG/root rd.lvm.lv=OSVG/swap rd.lvm.lv=OSVG/usr rd.driver.pre=vfio-pci rhgb quiet intel_iommu=on nouveau.runpm=0
Comment 21 Rhys Kidd 2018-09-03 20:13:34 UTC
*** Bug 107818 has been marked as a duplicate of this bug. ***
Comment 22 Rhys Kidd 2018-09-04 17:21:45 UTC
So there's updated low-level firmware shipped by NVIDIA for the GP107 (and other pre-GP108 Pascal GPUs) [0]. This might resolve or improve the situation for mobile GP107 users.

   nvidia: switch GP10[2467] to newer scrubber/ACR firmware (from GP108)

   This is being done to resolve issues being seen on a number of newer
   laptop systems which aren't compatible with the older binaries.

Users will likely need to wait for their distribution to ship the updated firmware through the usual package update process, or get the firmware directly from upstream linux-firmware.git.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=85c5d90fc155d78531efa5d2b02e92aaef7e4b88
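
Before testing, it may help to confirm that the firmware files the module asks for are actually installed. The sketch below only checks that the files exist, not whether they are the newer binaries from the commit above. The function name `missing_firmware` is made up, and it assumes uncompressed firmware under /lib/firmware (some distributions ship compressed files, which this would report as missing).

```shell
# Sketch: read firmware paths (relative to the firmware root, one per line)
# from stdin and print those that do not exist under the given root.
missing_firmware() {
    fw_root=${1:-/lib/firmware}
    while IFS= read -r fw_path; do
        [ -e "$fw_root/$fw_path" ] || printf '%s\n' "$fw_path"
    done
}

# Usage (lists the GP107 files nouveau declares but that are not installed):
#   modinfo -F firmware nouveau | grep '^nvidia/gp107/' | missing_firmware
```

Empty output means every listed file was found.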
Comment 23 Boris Vinogradov 2018-09-05 04:48:08 UTC
Created attachment 141453 [details]
Kernel log with new firmware

There are no changes for me with the new firmware from git.
Comment 24 Boris Vinogradov 2018-09-05 05:05:56 UTC
(In reply to Boris Vinogradov from comment #23)
> Created attachment 141453 [details]
> Kernel log with new firmware
> 
> There are no changes for me with the new firmware from git.

Sorry, I rechecked. There are no errors in dmesg and sleep mode works.
Comment 25 Boris Vinogradov 2018-09-05 05:56:47 UTC
Created attachment 141455 [details]
Kernel log with new firmware after some boot

But on some boots the error still reproduces.
Comment 26 Boris Vinogradov 2018-09-11 14:40:29 UTC
(In reply to Boris Vinogradov from comment #25)
> Created attachment 141455 [details]
> Kernel log with new firmware after some boot
> 
> But on some boots the error still reproduces.

After many boot experiments I found a reliable way to reproduce it:

1) Install a new kernel version.
2) Reboot or cold boot.
3) The boot succeeds with a short initialization time. Sleep/suspend usually works.
4) Reboot or cold boot.
5) The error reproduces again.
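
Since the failure depends on which boot you are looking at, a per-boot verdict can make a sequence like this easier to report. This is a sketch only: `boot_verdict` is a made-up name, the patterns are the two messages seen earlier in this bug, and the usage loop assumes a persistent systemd journal.

```shell
# Sketch: read one boot's kernel log on stdin and print "fault" if it contains
# the nouveau MMIO fault or the gr init failure, otherwise "clean".
boot_verdict() {
    if grep -q -E 'nouveau .*(MMIO (read|write) of [0-9a-f]+ FAULT|gr: init failed)'; then
        echo fault
    else
        echo clean
    fi
}

# Usage (current boot and the three before it):
#   for n in 0 1 2 3; do
#       printf 'boot -%s: ' "$n"
#       journalctl -k -b "-$n" --no-pager | boot_verdict
#   done
```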
Comment 27 Loïc Yhuel 2018-10-04 12:06:13 UTC
No change with the new firmware on Dell Inspiron Gaming 7567.

Fedora 29 with manually installed new firmware (kernel-4.18.10-300.fc29.x86_64 + linux-firmware-20180913-87.git44d4fca9.fc30.noarch).

No issue when booting with nouveau.noaccel=1.
Comment 28 Mitchell Keith Bloch 2018-10-06 07:35:03 UTC
Not sure what to add to this. Running a system with a GeForce GT 1030 GV-N1030SL-2GL. It worked fine when passed through to a Windows VM, but when attempting to use it as the Linux 4.18.12 GPU, I get this error. I can confirm that nouveau.runpm=0 has no apparent effect, while nouveau.noaccel=1 eliminates the error. (startx, of course, fails.)

Running kernel 4.14.74, I'm not sure this error appears. It is as though nouveau.noaccel=1 is in effect from the beginning. Perhaps the card is "better" supported with the 4.18 series.
Comment 29 Sebastian 2018-10-15 10:06:06 UTC
Created attachment 142029 [details]
Lenovo Yoga 530-15 IKB Kernel log and additional information
Comment 30 Sebastian 2018-10-15 10:10:21 UTC
I'd vote for an importance upgrade, as the affected cards are becoming more and more widespread.
Comment 31 Florens 2018-11-12 15:34:48 UTC
I have a ThinkPad X1 Extreme, whose Nvidia card is also affected by this issue.

4.18.0-10-generic
01:00.0 VGA compatible controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Ti Mobile] (rev ff)

Is there anything I can do to help with this issue? I'm unfamiliar with nouveau development, but I do have programming experience.
Comment 32 Florens 2018-11-21 12:09:06 UTC
As Sebastian said, more and more laptops with this card are being sold. This issue is preventing me from having HDMI with power management, in other words, a functional laptop.
Is there *anything* I can do to get more traction on this issue?
Comment 33 krinkodot22 2019-05-07 03:26:18 UTC
The 1050 card has been tripping up Nouveau for me as well, ever since I got my laptop over a year ago.

Card info:

~$ lspci | grep -E 'VGA|3D'
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 630 (rev 04)
01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] (rev a1)

It's possible to use Intel integrated graphics by using the kernel parameters "nomodeset i915.modeset=1" or "nouveau.modeset=0 i915.modeset=1", which is what I've been using. The proprietary Nvidia driver works fine as well, along with the kernel parameters that RPMFusion's package adds.

Without modeset parameters, the laptop used to lock up on boot. Lately, not using modeset parameters has the laptop shut down during the boot process instead.

For what it's worth, I'll attach the journal of a failed boot that I invoked recently. The call traces might be the only important parts, but I'll just post the whole thing in case other parts are useful.
Comment 34 krinkodot22 2019-05-07 03:31:08 UTC
Created attachment 144183 [details]
Journal logs from a boot without any modeset-altering kernel parameters.
Comment 35 Markus Wanner 2019-05-27 20:54:54 UTC
Created attachment 144355 [details]
GP107GLM - kernel log
Comment 36 Markus Wanner 2019-05-27 20:55:44 UTC
Created attachment 144356 [details]
GP107GLM (ThinkPad P1) - lspci
Comment 37 Markus Wanner 2019-05-27 20:56:11 UTC
I attached kernel logs and info from a ThinkPad P1 with a GP107GLM (Quadro P2000 Mobile).

This is with a very recent RC kernel and the nouveau module compiled from git as of today (58c5ebb). drm.debug and nouveau.debug are enabled, but with runpm=0 (which does not seem to make a difference for the first MMIO read timeout).
Comment 38 Pacho Ramos 2019-06-01 14:13:10 UTC
"nouveau.modeset=0 i915.modeset=1" solves the problem for me, thanks a lot! :D
Comment 39 Pacho Ramos 2019-07-31 09:52:31 UTC
(In reply to Pacho Ramos from comment #38)
> "nouveau.modeset=0 i915.modeset=1" solves the problem for me, thanks a lot!
> :D

But, then, nouveau is not listed by:
$ xrandr --listproviders

Then, it is still not usable :(
Comment 40 Pierre Moreau 2019-08-02 20:15:25 UTC
(In reply to Pacho Ramos from comment #39)
> (In reply to Pacho Ramos from comment #38)
> > "nouveau.modeset=0 i915.modeset=1" solves the problem for me, thanks a lot!
> > :D
> 
> But, then, nouveau is not listed by:
> $ xrandr --listproviders
> 
> Then, it is still not usable :(

Setting `nouveau.modeset=0` effectively disables the Nouveau driver (i.e. it will get loaded but will do nothing), which is why it doesn’t get listed by `xrandr --listproviders`.
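
When comparing reports it helps to see exactly which of these parameters a boot used. A small sketch, with the made-up function name `gpu_cmdline_params`, that filters a kernel command line down to `nomodeset` and any `nouveau.*` or `i915.*` options:

```shell
# Sketch: read a kernel command line on stdin and print, one per line,
# the parameters that affect nouveau/i915 as discussed in this bug.
gpu_cmdline_params() {
    tr ' ' '\n' | grep -E '^(nomodeset|(nouveau|i915)\.[a-z_0-9]+=.*)$'
}

# Usage:
#   gpu_cmdline_params < /proc/cmdline
```

No output means none of these parameters were passed, so the drivers ran with their defaults.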


(In reply to Markus Wanner from comment #35)
> Created attachment 144355 [details]
> GP107GLM - kernel log

FYI, you mistakenly uploaded the lspci output twice. :-)
Comment 41 Dario Nieuwenhuis 2019-08-12 18:21:06 UTC
Created attachment 145041 [details]
GP107M (thinkpad x1 extreme) in hybrid graphics mode - fails
Comment 42 Dario Nieuwenhuis 2019-08-12 18:21:46 UTC
Created attachment 145042 [details]
GP107M (thinkpad x1 extreme) in discrete graphics mode - works
Comment 43 Dario Nieuwenhuis 2019-08-12 18:52:54 UTC
Hello,

I'm also affected by this bug.

Hardware:
- Lenovo ThinkPad x1 Extreme i7-8850H, 32gb, 4k screen
- It has hybrid graphics (nvidia optimus) with:
    - Integrated graphics: Intel UHD 630
    - Discrete graphics: NVIDIA Corporation GP107M [GeForce GTX 1050 Ti Mobile] (rev a1)
- DMI: LENOVO 20MFCTO1WW/20MFCTO1WW, BIOS N2EET41W (1.23 ) 07/03/2019
- UEFI firmware is up to date.

Software:
- Arch Linux
- Both Linux 5.2.8.arch1-1 and self-built 4d352dbd (latest on branch linux-5.3 on https://github.com/skeggsb/linux/ at the time of writing)
- mesa 19.1.4-1
- linux-firmware 20190717.bf13a71-1
- I'm trying to use wayland, not xorg (to get support for different DPI on different monitors). (xorg doesn't work either anyway)

Now, here is the interesting thing: in the BIOS you can select between hybrid and discrete graphics:

Hybrid graphics:
- Both the intel and the nvidia are enabled (both show up in lspci)
- Laptop screen is wired to the intel
- HDMI, USB C is wired to the nvidia
- The laptop screen is usable with the intel driver.
- nouveau doesn't work due to this bug -> there's no way to get the hdmi/usbc working :(
- Great battery life when shutting down the nvidia with bbswitch.

BUT, with discrete graphics:
- Only nvidia is enabled (intel doesn't even show up in lspci)
- Laptop screen is wired to the nvidia
- HDMI, USB C is wired to the nvidia
- Nouveau works perfectly!! Both laptop screen and externals.
- BUT battery life is terrible (because you can't turn off the nvidia on the go)

Conclusion: this bug is somehow related to nvidia optimus. It ONLY occurs in hybrid graphics mode.
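
For other reporters, a quick way to state which of the two modes a machine is in, based on the observation above that the Intel GPU disappears from lspci in discrete mode. This is a heuristic sketch: `gpu_mode` is a made-up name, and it only looks at the vendor strings in the VGA/3D controller lines.

```shell
# Sketch: read `lspci` output on stdin and guess the graphics mode from
# which vendors appear among the VGA/3D controllers.
gpu_mode() {
    awk '
        /VGA compatible controller|3D controller/ {
            if ($0 ~ /Intel/)  intel++
            if ($0 ~ /NVIDIA/) nvidia++
        }
        END {
            if (intel && nvidia) print "hybrid"
            else if (nvidia)     print "discrete"
            else if (intel)      print "integrated"
            else                 print "unknown"
        }'
}

# Usage:
#   lspci | gpu_mode
```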

Sad thing is I'm forced to choose between bad battery life, or inability to use external monitors, which makes this laptop very unusable :(

I've attached logs for both the discrete graphics mode (everything works) and hybrid graphics mode (nouveau crashes with this bug).

What can I test next that would help fix this? I'd REALLY love to help.
Comment 44 Andrey Melentyev 2019-11-02 15:40:04 UTC
Got some time to test the runpm_fixes branch by Karol Herbst: https://github.com/karolherbst/linux/commits/runpm_fixes

Applying the two latest commits bbb0b9a16c86fc54fe296df73000da3fba4e91b6 and 749a9c843f646ceac39e14601b64e5bbf202a47c from this branch to kernel version 5.3.8 allows me to use nouveau on a ThinkPad X1 Extreme Gen 1 laptop with GP107M [GeForce GTX 1050 Ti Mobile] (rev a1).

What works

- Offloading with DRI_PRIME
- Reverse PRIME: HDMI port wired to the NVIDIA GPU works
- Power management (see note below)
- Suspend and resume

I was testing under X11 with modesetting DDX for both iGPU and dGPU. Haven't tried Wayland or any sort of video decoding acceleration.

Power management

After loading the nouveau module, the idle dGPU doesn't power down, even though power control is set to "auto" for both the GPU and its audio device via sysfs. This can be worked around by setting the power control to "on" and then back to "auto":

echo -n "on" >/sys/bus/pci/devices/0000:01:00.1/power/control && echo -n "auto" >/sys/bus/pci/devices/0000:01:00.1/power/control

Once the idle audio device and GPU itself are powered down correctly, I get power usage similar (same?) to when bbswitch is used.
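
The same workaround as a small sketch with the device directory as a parameter, plus a check of the runtime PM state afterwards. The function names `runpm_kick` and `runpm_status` are made up; the `power/control` and `power/runtime_status` files are the standard PCI runtime PM sysfs attributes.

```shell
# Sketch: toggle runtime PM control from "on" back to "auto" for one device.
# $1 = device directory, e.g. /sys/bus/pci/devices/0000:01:00.1
runpm_kick() {
    pm_ctl="$1/power/control"
    printf 'on' > "$pm_ctl" && printf 'auto' > "$pm_ctl"
}

# Sketch: print the device's current runtime PM state (e.g. active, suspended).
runpm_status() {
    cat "$1/power/runtime_status"
}

# Usage (as root):
#   runpm_kick /sys/bus/pci/devices/0000:01:00.1
#   sleep 10; runpm_status /sys/bus/pci/devices/0000:01:00.0
```

How long the device takes to reach "suspended" after the toggle depends on the autosuspend delay, so the sleep in the usage example is a guess.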

Not sure if https://bugzilla.kernel.org/show_bug.cgi?id=156341 is relevant here.

Thanks Karol and everyone involved, hope to see it mainlined eventually.
Comment 45 Sergey Yanovich 2019-11-16 14:12:23 UTC
(In reply to Loïc Yhuel from comment #27)
> No change with the new firmwares on Dell Inspiron Gaming 7567.

> No issue when booting with nouveau.noaccel=1.

The issue affects Dell G5 15 5587 as well. Ubuntu 19.04 is nearly unusable on it with default settings. "nouveau.noaccel=1" helps.
Comment 46 Martin Peres 2019-12-04 09:25:26 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/332.