105173 – [MCP79][Regression] Unhandled NULL pointer dereference in nvkm_object_unmap since kernel 4.15


Status:	RESOLVED FIXED

Alias:	None

Product:	xorg
Classification:	Unclassified
Component:	Driver/nouveau
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	Nouveau Project
QA Contact:	Xorg Project Team

Reported:	2018-02-20 09:48 UTC by Nick Lee
Modified:	2018-03-18 19:19 UTC
CC List:	1 user

See Also:	105319

Attachments
Screenshot immediately after the launch supertuxkart (928.43 KB, image/png) 2018-02-20 09:48 UTC, Nick Lee
dmesg log wayland after launch supertuxkart (69.08 KB, text/plain) 2018-02-20 09:50 UTC, Nick Lee
xorg session dmesg log after launch supertuxkart (189.34 KB, text/plain) 2018-02-20 09:52 UTC, Nick Lee
Proposed patch (1.00 KB, patch) 2018-03-01 18:21 UTC, Pierre Moreau

Description Nick Lee 2018-02-20 09:48:28 UTC

Created attachment 137459 [details]
Screenshot immediately after the launch supertuxkart

Description of problem:

With kernel >= 4.15 I get artefacts and freezes.

Version-Release number of selected component (if applicable):

4.15.3-300.fc27.x86_64 and newer

How reproducible:

always

Steps to Reproduce:
1. Boot with kernel 4.15.
2. Launch an OpenGL app (supertuxkart).
3. Artefacts appear, then the system freezes.

Actual results:

artefacts and freezes

Additional info:

Kernels older than 4.15 do not have this problem.

03:00.0 VGA compatible controller: NVIDIA Corporation C79 [GeForce 9300 / nForce 730i] (rev b1)
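
The affected range stated above (4.15 and newer bad, older kernels fine) can be checked against a release string as in the following minimal Python sketch. The function names `kernel_version` and `in_reported_range` are hypothetical, and the 4.15 boundary is only what this report claims, not a confirmed bisection result.

```python
import re


def kernel_version(release):
    """Return (major, minor) from a release string such as
    '4.15.3-300.fc27.x86_64' or 'kernel-4.15.4-default'."""
    match = re.search(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def in_reported_range(release):
    # Assumption taken from this report only: 4.15 and newer misbehave.
    return kernel_version(release) >= (4, 15)
```

On a running system the string could come from `platform.release()`, for example `in_reported_range(platform.release())`.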

Comment 1 Nick Lee 2018-02-20 09:50:44 UTC

Created attachment 137460 [details]
dmesg log wayland after launch supertuxkart

Comment 2 Nick Lee 2018-02-20 09:52:01 UTC

Created attachment 137461 [details]
xorg session dmesg log after launch supertuxkart

Comment 3 Pierre Moreau 2018-02-20 10:07:26 UTC

Thanks for the report. I’ll try to reproduce the issue on my laptop and if that works, bisect the kernel to figure out which change introduced the issue.

Looking at the logs, it seems like there is some out-of-memory error

> [   56.900580] nouveau 0000:03:00.0: imem: OOM: 0004b000 00000000 -28

followed by a NULL pointer dereference when trying to unmap an object

> [   56.900593] BUG: unable to handle kernel NULL pointer dereference at           (null)
> [   56.900747] IP: nvkm_object_unmap+0x5/0x20 [nouveau]
> [   56.900754] PGD 0 P4D 0 
> [   56.900761] Oops: 0000 [#1] SMP PTI
> [   56.900767] Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc snd_hda_codec_hdmi xfs libcrc32c snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq coretemp snd_seq_device snd_pcm wmi_bmof shpchp snd_timer snd soundcore nv_tco i2c_nforce2 acpi_cpufreq binfmt_misc nouveau i2c_algo_bit
> [   56.900857]  mxm_wmi drm_kms_helper ttm drm serio_raw forcedeth video wmi
> [   56.900870] CPU: 1 PID: 2856 Comm: supertuxkart Not tainted 4.16.0-0.rc2.git0.1.fc28.x86_64 #1
> [   56.900876] Hardware name: NVIDIA MCP7A/MCP7A, BIOS 6.00 PG 04/22/2009
> [   56.900910] RIP: 0010:nvkm_object_unmap+0x5/0x20 [nouveau]
> [   56.900916] RSP: 0018:ffffae3c4188bca0 EFLAGS: 00010282
> [   56.900922] RAX: ffffffffc0592400 RBX: ffff9c81cb2cf198 RCX: 0000000000000018
> [   56.900928] RDX: ffffffffc04ac1b0 RSI: ffff9c81cb2cf1b8 RDI: 0000000000000000
> [   56.900934] RBP: ffff9c81cb2cf188 R08: 00000000000250c0 R09: ffffffffc04a9b63
> [   56.900941] R10: ffffd07b4280a8c0 R11: ffffffff959711ed R12: ffff9c81cb2cf1b8
> [   56.900947] R13: 0000000d4fb70488 R14: ffff9c8200180020 R15: 0000000000000006
> [   56.900955] FS:  00007fad6be37840(0000) GS:ffff9c822fc80000(0000) knlGS:0000000000000000
> [   56.900961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   56.900967] CR2: 0000000000000000 CR3: 0000000086db2000 CR4: 00000000000406e0
> [   56.900974] Call Trace:
> [   56.901012]  nvkm_object_dtor+0x96/0x100 [nouveau]
> [   56.901046]  ? nvkm_object_del+0x24/0xa0 [nouveau]
> [   56.901075]  ? nvkm_ioctl_new+0x1ee/0x220 [nouveau]
> [   56.901116]  ? nvkm_fifo_chan_dtor+0xf0/0xf0 [nouveau]
> [   56.901148]  ? nvkm_object_new_+0x60/0x60 [nouveau]
> [   56.901180]  ? nvkm_ioctl+0xd8/0x170 [nouveau]
> [   56.901222]  ? usif_ioctl+0x6b1/0x730 [nouveau]
> [   56.901262]  ? nouveau_drm_ioctl+0xad/0xc0 [nouveau]
> [   56.901271]  ? do_vfs_ioctl+0xa4/0x610
> [   56.901277]  ? SyS_ioctl+0x74/0x80
> [   56.901285]  ? do_syscall_64+0x74/0x180
> [   56.901295]  ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [   56.901301] Code: ff c3 0f 1f 40 00 66 66 66 66 90 48 8b 07 48 8b 40 28 48 85 c0 74 05 e9 0a 76 75 d4 b8 ed ff ff ff c3 0f 1f 40 00 66 66 66 66 90 <48> 8b 07 48 8b 40 30 48 85 c0 74 05 e9 ea 75 75 d4 b8 ed ff ff 
> [   56.901373] RIP: nvkm_object_unmap+0x5/0x20 [nouveau] RSP: ffffae3c4188bca0
> [   56.901380] CR2: 0000000000000000
> [   56.910903] ---[ end trace bde3a9a90b3fc089 ]---
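
Three fields in the oops support the NULL-pointer reading: the faulting symbol in the RIP line, the RDI register (the first integer argument under the x86-64 System V calling convention), and CR2 (the faulting address). Here RDI and CR2 are both zero, and the marked bytes `<48> 8b 07` appear to decode to a load through RDI, which would fit an unmap call on a NULL object. The Python sketch below pulls those fields out of oops text. The name `summarize_oops` is hypothetical, and the regular expressions assume the oops layout shown in this log.

```python
import re


def summarize_oops(lines):
    """Collect the first faulting symbol, RDI and CR2 found in oops lines."""
    summary = {}
    for line in lines:
        # Matches both 'RIP: 0010:sym+0x5/0x20' and 'RIP: sym+0x5/0x20'.
        m = re.search(r"RIP: (?:\d+:)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+", line)
        if m and "symbol" not in summary:
            summary["symbol"] = m.group(1)
        m = re.search(r"\bRDI: ([0-9a-f]{16})", line)
        if m and "rdi" not in summary:
            summary["rdi"] = int(m.group(1), 16)
        m = re.search(r"\bCR2: ([0-9a-f]{16})", line)
        if m and "cr2" not in summary:
            summary["cr2"] = int(m.group(1), 16)
    return summary
```

Only the first match of each field is kept, so a later user-space register dump in the same log does not overwrite the kernel-side values.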

Comment 4 Nick Lee 2018-02-21 12:37:44 UTC

(In reply to Pierre Moreau from comment #3)
> I’ll try to reproduce the issue on my laptop

I tried to reproduce this bug with openSUSE-Tumbleweed-GNOME-Live-x86_64-Snapshot20180219-Media.iso:
kernel-4.15.4-default
session x11
not reproducible.

But it is reproducible with Fedora-Workstation-Live-x86_64-Rawhide-20180219.n.0.iso

Did you reproduce it?

Comment 5 Pierre Moreau 2018-02-21 17:31:59 UTC

(In reply to Nick Lee from comment #4)
> (In reply to Pierre Moreau from comment #3)
> > I’ll try to reproduce the issue on my laptop
> 
> I tried to reproduce this bug with
> openSUSE-Tumbleweed-GNOME-Live-x86_64-Snapshot20180219-Media.iso:
> kernel-4.15.4-default
> session x11
> not reproducible.
> 
> But it is reproducible with Fedora-Workstation-Live-x86_64-Rawhide-20180219.n.0.iso
> 
> Did you reproduce it?

I tried running supertuxkart on my MCP79 (9400M), got some artefacts, but it did not freeze; the only error I got was “nouveau 0000:03:00.0: fifo: CACHE_ERROR - ch 1 [DRM] subc 0 mthd 0060 data 80000002”.

Which version of Mesa, xf86-video-nouveau (or modesetting) did you have in those tests?

For reference, I have:
Linux: 4.15.4
Mesa: 17.3.5
xf86-video-nouveau: 1.0.15
xorg-server: some 1.19.6 snapshot

Comment 6 Nick Lee 2018-02-21 19:33:51 UTC

(In reply to Pierre Moreau from comment #5)

> I tried running supertuxkart on my MCP79 (9400M), got some artefacts, but it
> did not freeze; the only error I got was “nouveau 0000:03:00.0: fifo:
> CACHE_ERROR - ch 1 [DRM] subc 0 mthd 0060 data 80000002”.

I got the same artefacts after launching gtk4-demo and after launching disk-usage-analyzer.

> Which version of Mesa, xf86-video-nouveau (or modesetting) did you have in
> those tests?

Currently on Fedora:
kernel-4.16.0-0.rc2.git0.1
mesa-17.3.5-1
xorg-x11-drv-nouveau-1.0.15-3
xorg-x11-server-Xorg-1.19.6-5
libwayland-server-1.14.0-2

Comment 7 Nick Lee 2018-02-21 20:36:45 UTC

Suse:

xorg-x11-driver-video-7.6_1-17.1.x86_64
xorg-x11-server-1.19.6-3.1.x86_64
Mesa-18.0.0-187.1.x86_64
kernel-4.15.4-1-default

Comment 8 Nick Lee 2018-02-22 10:19:43 UTC

Also tried:

Vanilla kernel 4.15.4-300.vanilla.knurd.1.fc27.x86_64
mesa-17.3.5
wayland session
reproducible

Fedora-Workstation-Live-x86_64-Rawhide-20180220.n.0.iso
wayland session
mesa-18.0.0-0.1.rc4.fc28.x86_64
Got artefacts with dmesg output:

[ 1035.437016] nouveau 0000:03:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[ 1035.437020] nouveau 0000:03:00.0: swiotlb: coherent allocation failed, size=2097152
[ 1035.437023] CPU: 0 PID: 1842 Comm: gnome-shell Not tainted 4.16.0-0.rc2.git0.1.fc28.x86_64 #1
[ 1035.437024] Hardware name: NVIDIA MCP7A/MCP7A, BIOS 6.00 PG 04/22/2009
[ 1035.437025] Call Trace:
[ 1035.437036]  dump_stack+0x5c/0x85
[ 1035.437040]  swiotlb_alloc_coherent+0x1c3/0x1e0
[ 1035.437052]  ttm_dma_pool_get_pages+0x21a/0x620 [ttm]
[ 1035.437057]  ttm_dma_populate+0xdd/0x390 [ttm]
[ 1035.437062]  ttm_tt_bind+0x2e/0x60 [ttm]
[ 1035.437067]  ttm_bo_handle_move_mem+0x4cf/0x550 [ttm]
[ 1035.437073]  ttm_bo_validate+0x119/0x130 [ttm]
[ 1035.437104]  ? drm_get_edid_switcheroo+0x16/0x40 [drm]
[ 1035.437109]  ttm_bo_init_reserved+0x334/0x380 [ttm]
[ 1035.437114]  ? ttm_bo_init+0x62/0xd0 [ttm]
[ 1035.437190]  ? nouveau_bo_invalidate_caches+0x10/0x10 [nouveau]
[ 1035.437226]  ? nouveau_bo_new+0x401/0x580 [nouveau]
[ 1035.437262]  ? nouveau_bo_invalidate_caches+0x10/0x10 [nouveau]
[ 1035.437298]  ? nouveau_gem_new+0x120/0x120 [nouveau]
[ 1035.437334]  ? nouveau_gem_new+0x5d/0x120 [nouveau]
[ 1035.437370]  ? nouveau_gem_ioctl_new+0x53/0xe0 [nouveau]
[ 1035.437381]  ? drm_ioctl_kernel+0x5b/0xb0 [drm]
[ 1035.437392]  ? drm_ioctl+0x1c4/0x380 [drm]
[ 1035.437428]  ? nouveau_gem_new+0x120/0x120 [nouveau]
[ 1035.437431]  ? eventfd_write+0x94/0x2a0
[ 1035.437467]  ? nouveau_drm_ioctl+0x65/0xc0 [nouveau]
[ 1035.437470]  ? do_vfs_ioctl+0xa4/0x610
[ 1035.437471]  ? SyS_ioctl+0x74/0x80
[ 1035.437475]  ? do_syscall_64+0x74/0x180
[ 1035.437478]  ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Comment 9 Sérgio M. Basto 2018-02-28 00:04:03 UTC

My NVIDIA card becomes completely unresponsive, with either the nouveau or the nvidia driver, when booting kernel 4.15.x. Last kernel tested: 4.15.4


01:00.0 VGA compatible controller: NVIDIA Corporation G98M [GeForce 9300M GS] (rev a1)

[   29.876760] nouveau 0000:01:00.0: DRM: EVO timeout

Comment 10 Pierre Moreau 2018-02-28 21:54:55 UTC

(In reply to Nick Lee from comment #8)
> Also tried:
> 
> Vanilla kernel 4.15.4-300.vanilla.knurd.1.fc27.x86_64
> mesa-17.3.5
> wayland session
> reproducible

What exactly is reproducible? The artefacts I would assume, but which error exactly? The NULL pointer dereference, or the “trapped read at 0080000000 on channel 1 [0fbb0000 DRM] engine 00 [PGRAPH] client 03 [DISPATCH] subclient 04 [M2M_IN] reason 00000006 [NULL_DMAOBJ]” one?
I did manage to reproduce the second one with Wayland, but still no luck with the first one.
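
Since several distinct errors show up in the dmesg excerpts in this thread, it can help to label which ones a given log contains before comparing results. The Python sketch below does that with patterns copied from those excerpts. The labels and the function names `classify` and `error_kinds` are made up for illustration and are not nouveau terminology.

```python
import re

# Patterns copied from the dmesg excerpts in this thread; labels are hypothetical.
PATTERNS = [
    ("null_deref", re.compile(r"BUG: unable to handle kernel NULL pointer dereference")),
    ("trapped_read", re.compile(r"fb: trapped read at [0-9a-f]+")),
    ("swiotlb_full", re.compile(r"swiotlb buffer is full")),
    ("imem_oom", re.compile(r"imem: OOM:")),
]


def classify(line):
    """Return the label of the first pattern matching the line, or None."""
    for label, pattern in PATTERNS:
        if pattern.search(line):
            return label
    return None


def error_kinds(lines):
    """Return the set of error labels seen in an iterable of dmesg lines."""
    return {label for label in map(classify, lines) if label is not None}
```

Running `error_kinds` over a saved `dmesg` output answers "which error exactly?" for that boot without reading the whole log.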

(In reply to Nick Lee from comment #8)
> Fedora-Workstation-Live-x86_64-Rawhide-20180220.n.0.iso
> wayland session
> mesa-18.0.0-0.1.rc4.fc28.x86_64
> Got artefacts with dmesg output:
> 
> [ 1035.437016] nouveau 0000:03:00.0: swiotlb buffer is full (sz: 2097152
> bytes)
> [ 1035.437020] nouveau 0000:03:00.0: swiotlb: coherent allocation failed,
> size=2097152
> [ 1035.437023] CPU: 0 PID: 1842 Comm: gnome-shell Not tainted
> 4.16.0-0.rc2.git0.1.fc28.x86_64 #1
> [ 1035.437024] Hardware name: NVIDIA MCP7A/MCP7A, BIOS 6.00 PG 04/22/2009
> [ 1035.437025] Call Trace:
> [ 1035.437036]  dump_stack+0x5c/0x85
> [ 1035.437040]  swiotlb_alloc_coherent+0x1c3/0x1e0
> [ 1035.437052]  ttm_dma_pool_get_pages+0x21a/0x620 [ttm]
> [ 1035.437057]  ttm_dma_populate+0xdd/0x390 [ttm]
> [ 1035.437062]  ttm_tt_bind+0x2e/0x60 [ttm]
> [ 1035.437067]  ttm_bo_handle_move_mem+0x4cf/0x550 [ttm]
> [ 1035.437073]  ttm_bo_validate+0x119/0x130 [ttm]
> [ 1035.437104]  ? drm_get_edid_switcheroo+0x16/0x40 [drm]
> [ 1035.437109]  ttm_bo_init_reserved+0x334/0x380 [ttm]
> [ 1035.437114]  ? ttm_bo_init+0x62/0xd0 [ttm]
> [ 1035.437190]  ? nouveau_bo_invalidate_caches+0x10/0x10 [nouveau]
> [ 1035.437226]  ? nouveau_bo_new+0x401/0x580 [nouveau]
> [ 1035.437262]  ? nouveau_bo_invalidate_caches+0x10/0x10 [nouveau]
> [ 1035.437298]  ? nouveau_gem_new+0x120/0x120 [nouveau]
> [ 1035.437334]  ? nouveau_gem_new+0x5d/0x120 [nouveau]
> [ 1035.437370]  ? nouveau_gem_ioctl_new+0x53/0xe0 [nouveau]
> [ 1035.437381]  ? drm_ioctl_kernel+0x5b/0xb0 [drm]
> [ 1035.437392]  ? drm_ioctl+0x1c4/0x380 [drm]
> [ 1035.437428]  ? nouveau_gem_new+0x120/0x120 [nouveau]
> [ 1035.437431]  ? eventfd_write+0x94/0x2a0
> [ 1035.437467]  ? nouveau_drm_ioctl+0x65/0xc0 [nouveau]
> [ 1035.437470]  ? do_vfs_ioctl+0xa4/0x610
> [ 1035.437471]  ? SyS_ioctl+0x74/0x80
> [ 1035.437475]  ? do_syscall_64+0x74/0x180
> [ 1035.437478]  ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2

I’ll try the iso. Did you get this error with supertuxkart as well, or simply by launching the session?


(In reply to Sérgio M. Basto from comment #9)
> My NVIDIA card becomes completely unresponsive, with either the nouveau or the
> nvidia driver, when booting kernel 4.15.x. Last kernel tested: 4.15.4
> 
> 
> 01:00.0 VGA compatible controller: NVIDIA Corporation G98M [GeForce 9300M
> GS] (rev a1)
> 
> [   29.876760] nouveau 0000:01:00.0: DRM: EVO timeout

It might be a different issue (especially if you get a freeze even when using the NVIDIA driver). Please open a separate bug report; it’s always easier to merge two bug reports than to split one in two.

Comment 11 Nick Lee 2018-03-01 08:20:02 UTC

(In reply to Pierre Moreau from comment #10)

> What exactly is reproducible? The artefacts I would assume, but which error
> exactly? The NULL pointer dereference, or the “trapped read at 0080000000 on
> channel 1 [0fbb0000 DRM] engine 00 [PGRAPH] client 03 [DISPATCH] subclient
> 04 [M2M_IN] reason 00000006 [NULL_DMAOBJ]” one?
> I did manage to reproduce the second one with Wayland, but still no luck
> with the first one.

This error:
[  372.954548] nouveau 0000:03:00.0: imem: OOM: 00100000 00001000 -28
[  372.954702] nouveau 0000:03:00.0: gr: TRAP_M2MF 00000002 [IN]
[  372.954712] nouveau 0000:03:00.0: gr: TRAP_M2MF 00320951 c0001fc0 00000000 04000430
[  372.954722] nouveau 0000:03:00.0: gr: 00200000 [] ch 1 [000fbb0000 DRM] subc 4 class 5039 mthd 0100 data 00000000
[  372.954752] nouveau 0000:03:00.0: fb: trapped read at 00c0000000 on channel 1 [0fbb0000 DRM] engine 00 [PGRAPH] client 03 [DISPATCH] subclient 04 [M2M_IN] reason 00000000 [PT_NOT_PRESENT]

kernel-4.15.6-300.vanilla.knurd.1.fc27.x86_64
mesa-17.3.6

Screenshot: https://s9.postimg.org/hajle1skf/Screenshot_from_2018-03-01_10-12-28.png

> (In reply to Nick Lee from comment #8)
> > Fedora-Workstation-Live-x86_64-Rawhide-20180220.n.0.iso
> > wayland session
> > 
> > [ 1035.437016] nouveau 0000:03:00.0: swiotlb buffer is full (sz: 2097152
> 
> I’ll try the iso. Did you get this error with supertuxkart as well, or
> simply by launching the session?
> 

by launching the session

Comment 12 Nick Lee 2018-03-01 13:32:26 UTC

> The NULL pointer dereference, or the “trapped read at 0080000000 on channel 1 
> [0fbb0000 DRM] engine 00 [PGRAPH] client 03 [DISPATCH] subclient 04 [M2M_IN] 
> reason 00000006 [NULL_DMAOBJ]” one?

"NULL pointer dereference" AND "trapped read" after launching supertuxkart

kernel-4.16.0-0.rc3.git2.1.vanilla.knurd.1.fc27.x86_64
mesa-17.3.6
wayland session

[   63.992917] nouveau 0000:03:00.0: imem: OOM: 0004b000 00000000 -28
[   63.992930] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[   63.993014] IP: nvkm_object_unmap+0x5/0x20 [nouveau]
[   63.993020] PGD 0 P4D 0 
[   63.993027] Oops: 0000 [#1] SMP PTI
[   63.993034] Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_hdmi sunrpc xfs libcrc32c snd_hda_codec_realtek snd_hda_codec_generic coretemp snd_hda_intel snd_hda_codec wmi_bmof pcspkr snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer shpchp snd nv_tco soundcore i2c_nforce2 acpi_cpufreq binfmt_misc nouveau
[   63.993122]  mxm_wmi i2c_algo_bit drm_kms_helper ttm drm serio_raw forcedeth video wmi
[   63.993144] CPU: 0 PID: 2867 Comm: supertuxkart Not tainted 4.16.0-0.rc3.git2.1.vanilla.knurd.1.fc27.x86_64 #1
[   63.993153] Hardware name: NVIDIA MCP7A/MCP7A, BIOS 6.00 PG 04/22/2009
[   63.993182] RIP: 0010:nvkm_object_unmap+0x5/0x20 [nouveau]
[   63.993188] RSP: 0018:ffffad338456fc98 EFLAGS: 00010282
[   63.993194] RAX: ffffffffc036d400 RBX: ffff94b4cdf513d8 RCX: 0000000000000018
[   63.993201] RDX: ffffffffc028a9e0 RSI: ffff94b4cdf513f8 RDI: 0000000000000000
[   63.993207] RBP: ffff94b4cdf513c8 R08: 00000000000250c0 R09: ffffffffc0287ca3
[   63.993213] R10: fffff9754294c340 R11: ffffffffaa9440cd R12: ffff94b4cdf513f8
[   63.993219] R13: 0000000ecba0cfdc R14: ffff94b55c8e7020 R15: 0000000000000020
[   63.993226] FS:  00007f77ac70d840(0000) GS:ffff94b56fc00000(0000) knlGS:0000000000000000
[   63.993233] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   63.993238] CR2: 0000000000000000 CR3: 000000006d418000 CR4: 00000000000406f0
[   63.993244] Call Trace:
[   63.993276]  nvkm_object_dtor+0x9a/0x160 [nouveau]
[   63.993304]  nvkm_object_del+0x24/0xa0 [nouveau]
[   63.993331]  nvkm_ioctl_new+0x260/0x2b0 [nouveau]
[   63.993371]  ? nvkm_fifo_chan_dtor+0x100/0x100 [nouveau]
[   63.993398]  ? nvkm_object_new_+0x60/0x60 [nouveau]
[   63.993425]  nvkm_ioctl+0x10a/0x240 [nouveau]
[   63.993464]  usif_ioctl+0x62e/0x740 [nouveau]
[   63.993504]  nouveau_drm_ioctl+0xad/0xc0 [nouveau]
[   63.993514]  do_vfs_ioctl+0xa4/0x620
[   63.993521]  SyS_ioctl+0x74/0x80
[   63.993529]  do_syscall_64+0x74/0x180
[   63.993536]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   63.993543] RIP: 0033:0x7f77a89bf8e7
[   63.993547] RSP: 002b:00007ffc62fbfd28 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   63.993554] RAX: ffffffffffffffda RBX: 0000000000000038 RCX: 00007f77a89bf8e7
[   63.993561] RDX: 000055a3912a7d70 RSI: 00000000c0386447 RDI: 0000000000000007
[   63.993566] RBP: 000055a3912a7d70 R08: 000055a39129f910 R09: 00007f77a8a14708
[   63.993572] R10: ffffffffffffff90 R11: 0000000000000246 R12: 00000000c0386447
[   63.993579] R13: 0000000000000007 R14: 000055a3912a7da8 R15: 0000000000000000
[   63.993585] Code: ff c3 0f 1f 40 00 66 66 66 66 90 48 8b 07 48 8b 40 28 48 85 c0 74 05 e9 6a 8f 97 e9 b8 ed ff ff ff c3 0f 1f 40 00 66 66 66 66 90 <48> 8b 07 48 8b 40 30 48 85 c0 74 05 e9 4a 8f 97 e9 b8 ed ff ff 
[   63.993651] RIP: nvkm_object_unmap+0x5/0x20 [nouveau] RSP: ffffad338456fc98
[   63.993657] CR2: 0000000000000000
[   63.997842] ---[ end trace a49568284ce09eb6 ]---
[   79.659127] nouveau 0000:03:00.0: imem: OOM: 00100000 00001000 -28
[   79.659723] nouveau 0000:03:00.0: gr: TRAP_M2MF 00000002 [IN]
[   79.659729] nouveau 0000:03:00.0: gr: TRAP_M2MF 00320951 206f1fc0 00000000 04000430
[   79.659733] nouveau 0000:03:00.0: gr: 00200000 [] ch 1 [000fbb0000 DRM] subc 4 class 5039 mthd 0100 data 00000000
[   79.659746] nouveau 0000:03:00.0: fb: trapped read at 00206f0000 on channel 1 [0fbb0000 DRM] engine 00 [PGRAPH] client 03 [DISPATCH] subclient 04 [M2M_IN] reason 00000002 [PAGE_NOT_PRESENT]

Comment 13 Pierre Moreau 2018-03-01 18:21:20 UTC

Created attachment 137730 [details] [review]
Proposed patch

Okay, thank you for the confirmation. So, I am still failing to reproduce the NULL pointer dereference, but I can reproduce the other errors, so that’s a start.
Could you please try the attached patch (from https://github.com/skeggsb/nouveau/pull/1)? It fixes the errors on my end at least.

@Sergio You might want to give that patch a try as well.

Comment 14 Nick Lee 2018-03-01 23:04:58 UTC

(In reply to Pierre Moreau from comment #13)
> Created attachment 137730 [details] [review]
> Proposed patch

> Could you please try the attached patch (from
> https://github.com/skeggsb/nouveau/pull/1)? It fixes the errors on my end at
> least.

Yes, it works! I applied the patch to kernel-4.16.0-0.rc3.git2.1. After launching supertuxkart I see only:

[   22.471837] fuse init (API version 7.26)
[   52.346467] nouveau 0000:03:00.0: imem: OOM: 00100000 00001000 -28
[   52.346516] nouveau 0000:03:00.0: imem: OOM: 00100000 00001000 -28
[  108.372556] perf: interrupt took too long (2530 > 2500), lowering kernel.perf_event_max_sample_rate to 79000


No artefacts.
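
The `imem: OOM` lines that remain after the patch end in -28, which on Linux is the negated errno value ENOSPC. The Python sketch below decodes such a line. Treating the two hexadecimal fields as an allocation size and an alignment is an assumption about the message format, not something stated in this thread, and `decode_imem_oom` is a hypothetical helper name.

```python
import errno
import re

OOM_RE = re.compile(r"imem: OOM: ([0-9a-f]{8}) ([0-9a-f]{8}) (-?\d+)")


def decode_imem_oom(line):
    """Decode one 'imem: OOM' dmesg line, or return None if it does not match."""
    m = OOM_RE.search(line)
    if m is None:
        return None
    code = int(m.group(3))
    return {
        "size": int(m.group(1), 16),   # assumption: first field is a size in bytes
        "align": int(m.group(2), 16),  # assumption: second field is an alignment
        "errno": errno.errorcode.get(-code, "unknown"),
    }
```

Under that assumption, the lines above would describe a failed request of 0x100000 bytes with 0x1000 alignment.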

Comment 15 Nick Lee 2018-03-02 07:42:24 UTC

Today I got this (without the OOM error):

[ 1868.631494] nouveau 0000:03:00.0: gr: magic set 0:
[ 1868.631502] nouveau 0000:03:00.0: gr: 	00408604: 98084805
[ 1868.631507] nouveau 0000:03:00.0: gr: 	00408608: 00383d66
[ 1868.631511] nouveau 0000:03:00.0: gr: 	0040860c: 40000430
[ 1868.631515] nouveau 0000:03:00.0: gr: 	00408610: 3bd00002
[ 1868.631520] nouveau 0000:03:00.0: gr: TRAP_TEXTURE - TP0: 00000009 [ LINEAR_MISMATCH]
[ 1868.631527] nouveau 0000:03:00.0: gr: 00200000 [] ch 24 [000db13000 Xwayland[2070]] subc 3 class 8397 mthd 0f04 data 00000000

I don't know at what moment it happened or how to reproduce it. The graphics system works without issues (no artefacts etc.). I got the same message earlier on kernels < 4.15.

Comment 16 Pierre Moreau 2018-03-17 15:10:23 UTC

FYI, the attached patch has been submitted along with fixes to DRM. It doesn’t look like it has landed yet, but it might be part of 4.16-rc6.

It’s great that it helped you as well.
You could open a separate bug report for that TRAP_TEXTURE error, though it might be difficult to check whether a patch fixes it if you do not know how to reproduce it.

Thank you for your replies and testing.

Comment 17 Sérgio M. Basto 2018-03-17 15:15:39 UTC

I tested https://bugs.freedesktop.org/attachment.cgi?id=137730 on a kernel-4.15
but it didn't fix my boot problem.

Comment 18 Maris Nartiss 2018-03-17 15:39:30 UTC

(In reply to Sérgio M. Basto from comment #17)
> I tested https://bugs.freedesktop.org/attachment.cgi?id=137730 on a
> kernel-4.15
> but it didn't fix my boot problem.

Then, please, open a different bug and provide full kernel (dmesg) output. Or even better – as you can reproduce the bug, do a git bisect yourself. (It took me one week to bisect the issue fixed by proposed patch.)
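
For readers unfamiliar with bisecting: the idea is a binary search between a known-good and a known-bad revision, so the number of kernels to build and test grows only logarithmically with the number of commits in between. The Python sketch below illustrates the search on a simplified linear history. The name `first_bad` and the `is_bad` callback are hypothetical, and real `git bisect` also handles merges, which this sketch does not.

```python
def first_bad(commits, is_bad):
    """Return (first bad commit, number of tests run).

    Assumes a linear history ordered oldest to newest, with commits[0]
    known good and commits[-1] known bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi], tests
```

With about a thousand commits between the good and bad revisions, at most ten builds are needed, each of which means compiling, booting, and exercising the kernel.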

Comment 19 Sérgio M. Basto 2018-03-18 00:29:58 UTC

(In reply to Maris Nartiss from comment #18)
> (In reply to Sérgio M. Basto from comment #17)
> > I tested https://bugs.freedesktop.org/attachment.cgi?id=137730 on a
> > kernel-4.15
> > but it didn't fix my boot problem.
> 
> Then, please, open a different bug and provide full kernel (dmesg) output.
> Or even better – as you can reproduce the bug, do a git bisect yourself. (It
> took me one week to bisect the issue fixed by proposed patch.)

My case is bug #105319; please let me know if you need more information.

Comment 20 Nick Lee 2018-03-18 19:19:27 UTC

(In reply to Pierre Moreau from comment #16)


Pierre, Maris, thank you!
