Bug 111234

Summary: amdgpu bug: kernel NULL pointer dereference during video playback
Product: DRI
Reporter: Michael J Evans <mjevans1983>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: major
Priority: medium
CC: jamespharvey20, nicholas.kazlauskas
Version: XOrg git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:

Description Michael J Evans 2019-07-27 18:12:56 UTC
Over the past month I've experienced frequent (more than once per week) video output freezes while watching hardware-accelerated video from Twitch (transcoded by Twitch, played in mpv with profile=gpu-hq).

For three of these I collected dmesg output by logging in from a different PC.  I also observed that the Num Lock key was unresponsive, but I suspect that is just because the video driver issue caused KDE Plasma to either crash or lock up waiting on the kernel.

Interestingly, across two kernel versions and all three traces I collected, the NULL pointer dereference hits the same address (00000000000002b4).

[    4.615743] [drm] initializing kernel modesetting (TONGA 0x1002:0x6939 0x148C:0x2349 0x00).
[    4.615754] [drm] register mmio base: 0xF7E00000
[    4.615755] [drm] register mmio size: 262144
[    4.616024] ATOM BIOS: 113-C7660101_106
[    5.072492] [drm] Initialized amdgpu 3.32.0 20150101 for 0000:01:00.0 on minor 0



[ 4953.465909] BUG: kernel NULL pointer dereference, address: 00000000000002b4
[ 4953.465918] #PF: supervisor read access in kernel mode
[ 4953.465922] #PF: error_code(0x0000) - not-present page
[ 4953.465925] PGD 0 P4D 0 
[ 4953.465932] Oops: 0000 [#1] PREEMPT SMP PTI
[ 4953.465939] CPU: 5 PID: 14373 Comm: kworker/u16:4 Tainted: G        W  OE     5.2.1-arch1-1-ARCH #1
[ 4953.465967] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 4953.466127] RIP: 0010:dc_stream_log+0x9/0xb0 [amdgpu]
[ 4953.466133] Code: 4c 89 e2 48 8b 07 48 8b 40 50 e8 22 ae 7e f6 b8 01 00 00 00 5b 5d 41 5c 41 5d 41 5e c3 0f 1f 40 00 0f 1f 44 00 00 53 48 89 f3 <8b> 83 b4 02 00 00 48 89 da 8b 8b 10 01 00 00 44 8b 8b 18 01 00 00
[ 4953.466137] RSP: 0018:ffffaed60f84faf0 EFLAGS: 00210202
[ 4953.466142] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
[ 4953.466145] RDX: ffffffffc149a710 RSI: 0000000000000000 RDI: ffff8aa32a5d9000
[ 4953.466148] RBP: ffff8a9d80030000 R08: ffff8a9d80030000 R09: 0000000000000000
[ 4953.466152] R10: ffff8a9d80030000 R11: ffff8aa33eb692a4 R12: 0000000000000001
[ 4953.466155] R13: ffffaed60f84fd58 R14: ffff8aa32140cff0 R15: 0000000000000000
[ 4953.466159] FS:  0000000000000000(0000) GS:ffff8aa33eb40000(0000) knlGS:0000000000000000
[ 4953.466163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4953.466166] CR2: 00000000000002b4 CR3: 00000007b5212002 CR4: 00000000001606e0
[ 4953.466170] Call Trace:
[ 4953.466332]  dc_commit_state+0xa1/0x5a0 [amdgpu]
[ 4953.466446]  ? amdgpu_bo_unpin+0xce/0xe0 [amdgpu]
[ 4953.466491]  ? drm_calc_timestamping_constants+0xe0/0x140 [drm]
[ 4953.466675]  amdgpu_dm_atomic_commit_tail+0xc64/0x1a20 [amdgpu]
[ 4953.466689]  ? __switch_to_asm+0x40/0x70
[ 4953.466694]  ? __switch_to_asm+0x34/0x70
[ 4953.466699]  ? __switch_to_asm+0x40/0x70
[ 4953.466704]  ? __switch_to_asm+0x40/0x70
[ 4953.466709]  ? __switch_to_asm+0x40/0x70
[ 4953.466713]  ? __switch_to_asm+0x40/0x70
[ 4953.466718]  ? __switch_to_asm+0x34/0x70
[ 4953.466723]  ? __switch_to_asm+0x40/0x70
[ 4953.466728]  ? __switch_to_asm+0x40/0x70
[ 4953.466737]  ? __switch_to_xtra+0x1b6/0x610
[ 4953.466742]  ? __switch_to_asm+0x34/0x70
[ 4953.466747]  ? __switch_to_asm+0x40/0x70
[ 4953.466751]  ? __switch_to_asm+0x34/0x70
[ 4953.466780]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
[ 4953.466798]  commit_tail+0x3c/0x70 [drm_kms_helper]
[ 4953.466806]  process_one_work+0x1d1/0x3e0
[ 4953.466812]  worker_thread+0x4a/0x3d0
[ 4953.466821]  kthread+0xfd/0x130
[ 4953.466826]  ? process_one_work+0x3e0/0x3e0
[ 4953.466832]  ? kthread_park+0x90/0x90
[ 4953.466838]  ret_from_fork+0x35/0x40
[ 4953.466845] Modules linked in: arc4 md4 cfg80211 8021q garp mrp bridge stp llc nct6775 hwmon_vid ext4 crc16 mbcache jbd2 intel_rapl amdgpu x86_pkg_temp_thermal intel_powerclamp coretemp input_leds joydev kvm_intel mousedev hid_steam kvm irqbypass amd_iommu_v2 gpu_sched ttm ofpart crct10dif_pclmul cmdlinepart crc32_pclmul drm_kms_helper intel_spi_platform snd_hda_codec_realtek ghash_clmulni_intel intel_spi snd_hda_codec_generic eeepc_wmi snd_hda_codec_hdmi ledtrig_audio drm spi_nor asus_wmi sparse_keymap iTCO_wdt hid_generic mei_hdcp mtd snd_hda_intel ppdev iTCO_vendor_support rfkill aesni_intel snd_usb_audio wmi_bmof mxm_wmi snd_hda_codec snd_virtuoso aes_x86_64 agpgart igb crypto_simd snd_oxygen_lib mei_me syscopyarea snd_mpu401_uart snd_usbmidi_lib sysfillrect snd_hda_core cryptd glue_helper sysimgblt media intel_cstate i2c_algo_bit usbhid snd_rawmidi snd_hwdep parport_pc intel_uncore pcspkr i2c_i801 mei lpc_ich dca fb_sys_fops ie31200_edac hid intel_rapl_perf snd_pcm parport evdev wmi
[ 4953.466916]  pcc_cpufreq mac_hid sch_fq tcp_htcp snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_timer snd soundcore cuse nfsd auth_rpcgss nfsv4 nfsv3 nfs_acl nfsv2 nfs lockd grace sunrpc nls_utf8 cifs ccm dns_resolver fscache loop fuse sg crypto_user vfs_monitor(OE) ip_tables x_tables dm_mod btrfs libcrc32c crc32c_generic xor raid6_pq raid1 md_mod sd_mod ahci libahci libata xhci_pci scsi_mod crc32c_intel ehci_pci firewire_ohci xhci_hcd ehci_hcd firewire_core crc_itu_t
[ 4953.466966] CR2: 00000000000002b4
[ 4953.466972] ---[ end trace 24c4d6a2e775c61e ]---
[ 4953.467120] RIP: 0010:dc_stream_log+0x9/0xb0 [amdgpu]
[ 4953.467126] Code: 4c 89 e2 48 8b 07 48 8b 40 50 e8 22 ae 7e f6 b8 01 00 00 00 5b 5d 41 5c 41 5d 41 5e c3 0f 1f 40 00 0f 1f 44 00 00 53 48 89 f3 <8b> 83 b4 02 00 00 48 89 da 8b 8b 10 01 00 00 44 8b 8b 18 01 00 00
[ 4953.467130] RSP: 0018:ffffaed60f84faf0 EFLAGS: 00210202
[ 4953.467134] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
[ 4953.467137] RDX: ffffffffc149a710 RSI: 0000000000000000 RDI: ffff8aa32a5d9000
[ 4953.467141] RBP: ffff8a9d80030000 R08: ffff8a9d80030000 R09: 0000000000000000
[ 4953.467144] R10: ffff8a9d80030000 R11: ffff8aa33eb692a4 R12: 0000000000000001
[ 4953.467147] R13: ffffaed60f84fd58 R14: ffff8aa32140cff0 R15: 0000000000000000
[ 4953.467151] FS:  0000000000000000(0000) GS:ffff8aa33eb40000(0000) knlGS:0000000000000000
[ 4953.467154] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4953.467158] CR2: 00000000000002b4 CR3: 00000007b5212002 CR4: 00000000001606e0
[ 5148.503504] audit: type=1006 audit(1563216150.929:94): pid=20550 uid=0 old-auid=4294967295 auid=1000 tty=(none) old-ses=4294967295 ses=5 res=1


[464226.751918] BUG: kernel NULL pointer dereference, address: 00000000000002b4
[464226.751929] #PF: supervisor read access in kernel mode
[464226.751933] #PF: error_code(0x0000) - not-present page
[464226.751936] PGD 0 P4D 0 
[464226.751944] Oops: 0000 [#1] PREEMPT SMP PTI
[464226.751951] CPU: 6 PID: 20885 Comm: kworker/u16:7 Tainted: G           OE     5.2.1-arch1-1-ARCH #1
[464226.751981] Workqueue: events_unbound commit_work [drm_kms_helper]
[464226.752158] RIP: 0010:dc_stream_log+0x9/0xb0 [amdgpu]
[464226.752165] Code: 4c 89 e2 48 8b 07 48 8b 40 50 e8 22 fe 73 d4 b8 01 00 00 00 5b 5d 41 5c 41 5d 41 5e c3 0f 1f 40 00 0f 1f 44 00 00 53 48 89 f3 <8b> 83 b4 02 00 00 48 89 da 8b 8b 10 01 00 00 44 8b 8b 18 01 00 00
[464226.752170] RSP: 0018:ffffad91cd9f3af0 EFLAGS: 00010202
[464226.752174] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
[464226.752178] RDX: ffffffffc1345710 RSI: 0000000000000000 RDI: ffff928467e8c000
[464226.752182] RBP: ffff927f19680000 R08: ffff927f19680000 R09: 0000000000000000
[464226.752185] R10: ffff927f19680000 R11: 0000000000000000 R12: 0000000000000001
[464226.752188] R13: ffffad91cd9f3d58 R14: ffff92846149cff0 R15: 0000000000000000
[464226.752193] FS:  0000000000000000(0000) GS:ffff92847eb80000(0000) knlGS:0000000000000000
[464226.752197] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[464226.752201] CR2: 00000000000002b4 CR3: 00000007b8fdc002 CR4: 00000000001606e0
[464226.752204] Call Trace:
[464226.752384]  dc_commit_state+0xa1/0x5a0 [amdgpu]
[464226.752507]  ? amdgpu_bo_unpin+0xce/0xe0 [amdgpu]
[464226.752551]  ? drm_calc_timestamping_constants+0xe0/0x140 [drm]
[464226.752733]  amdgpu_dm_atomic_commit_tail+0xc64/0x1a20 [amdgpu]
[464226.752745]  ? __switch_to_asm+0x40/0x70
[464226.752751]  ? __switch_to_asm+0x34/0x70
[464226.752756]  ? __switch_to_asm+0x40/0x70
[464226.752762]  ? __switch_to_asm+0x40/0x70
[464226.752767]  ? __switch_to_asm+0x40/0x70
[464226.752772]  ? __switch_to_asm+0x40/0x70
[464226.752778]  ? __switch_to_asm+0x34/0x70
[464226.752783]  ? __switch_to_asm+0x40/0x70
[464226.752788]  ? __switch_to_asm+0x40/0x70
[464226.752797]  ? __switch_to_xtra+0x1b6/0x610
[464226.752802]  ? __switch_to_asm+0x34/0x70
[464226.752807]  ? __switch_to_asm+0x40/0x70
[464226.752812]  ? __switch_to_asm+0x34/0x70
[464226.752843]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
[464226.752862]  commit_tail+0x3c/0x70 [drm_kms_helper]
[464226.752871]  process_one_work+0x1d1/0x3e0
[464226.752878]  worker_thread+0x4a/0x3d0
[464226.752887]  kthread+0xfd/0x130
[464226.752893]  ? process_one_work+0x3e0/0x3e0
[464226.752899]  ? kthread_park+0x90/0x90
[464226.752905]  ret_from_fork+0x35/0x40
[464226.752913] Modules linked in: sctp arc4 md4 cfg80211 8021q garp mrp bridge stp llc nct6775 hwmon_vid ext4 crc16 mbcache jbd2 intel_rapl amdgpu hid_steam joydev mousedev input_leds x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm dm_mod irqbypass amd_iommu_v2 gpu_sched ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ofpart ghash_clmulni_intel cmdlinepart drm intel_spi_platform intel_spi spi_nor eeepc_wmi hid_generic mei_hdcp asus_wmi snd_hda_codec_realtek iTCO_wdt snd_hda_codec_generic sparse_keymap iTCO_vendor_support ledtrig_audio mtd snd_hda_codec_hdmi wmi_bmof ppdev rfkill mxm_wmi snd_usb_audio snd_hda_intel aesni_intel snd_virtuoso agpgart snd_hda_codec snd_oxygen_lib igb aes_x86_64 crypto_simd cryptd snd_hda_core snd_usbmidi_lib glue_helper syscopyarea intel_cstate snd_mpu401_uart sysfillrect i2c_algo_bit intel_uncore media snd_hwdep snd_rawmidi mei_me sysimgblt usbhid intel_rapl_perf pcspkr i2c_i801 lpc_ich snd_pcm mei fb_sys_fops dca ie31200_edac hid parport_pc evdev
[464226.752991]  parport mac_hid pcc_cpufreq wmi sch_fq tcp_htcp snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_timer snd soundcore cuse nfsd auth_rpcgss nfsv4 nfsv3 nfs_acl nfsv2 nfs lockd grace sunrpc nls_utf8 cifs ccm dns_resolver fscache loop fuse sg crypto_user vfs_monitor(OE) ip_tables x_tables btrfs libcrc32c crc32c_generic xor raid6_pq raid1 md_mod sd_mod ahci libahci libata xhci_pci firewire_ohci crc32c_intel scsi_mod xhci_hcd ehci_pci firewire_core ehci_hcd crc_itu_t
[464226.753047] CR2: 00000000000002b4
[464226.753053] ---[ end trace a22ae414b68cae32 ]---
[464226.753218] RIP: 0010:dc_stream_log+0x9/0xb0 [amdgpu]
[464226.753225] Code: 4c 89 e2 48 8b 07 48 8b 40 50 e8 22 fe 73 d4 b8 01 00 00 00 5b 5d 41 5c 41 5d 41 5e c3 0f 1f 40 00 0f 1f 44 00 00 53 48 89 f3 <8b> 83 b4 02 00 00 48 89 da 8b 8b 10 01 00 00 44 8b 8b 18 01 00 00
[464226.753229] RSP: 0018:ffffad91cd9f3af0 EFLAGS: 00010202
[464226.753233] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
[464226.753237] RDX: ffffffffc1345710 RSI: 0000000000000000 RDI: ffff928467e8c000
[464226.753241] RBP: ffff927f19680000 R08: ffff927f19680000 R09: 0000000000000000
[464226.753244] R10: ffff927f19680000 R11: 0000000000000000 R12: 0000000000000001
[464226.753248] R13: ffffad91cd9f3d58 R14: ffff92846149cff0 R15: 0000000000000000
[464226.753252] FS:  0000000000000000(0000) GS:ffff92847eb80000(0000) knlGS:0000000000000000
[464226.753256] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[464226.753260] CR2: 00000000000002b4 CR3: 00000007b8fdc002 CR4: 00000000001606e0


[89446.140229] BUG: kernel NULL pointer dereference, address: 00000000000002b4
[89446.140238] #PF: supervisor read access in kernel mode
[89446.140242] #PF: error_code(0x0000) - not-present page
[89446.140245] PGD 0 P4D 0 
[89446.140252] Oops: 0000 [#1] PREEMPT SMP PTI
[89446.140258] CPU: 5 PID: 18007 Comm: kworker/u16:0 Tainted: G           OE     5.2.2-arch1-1-ARCH #1
[89446.140284] Workqueue: events_unbound commit_work [drm_kms_helper]
[89446.140489] RIP: 0010:dc_stream_log+0x6/0xb0 [amdgpu]
[89446.140498] Code: 04 00 00 49 8b bc 02 80 02 00 00 48 8b 07 48 8b 40 50 e8 1d b5 e5 ec b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 53 <8b> 86 b4 02 00 00 48 89 f3 48 89 f2 8b 8e 10 01 00 00 bf 04 00 00
[89446.140503] RSP: 0018:ffffb8dc4c743af0 EFLAGS: 00010202
[89446.140510] RAX: 0000000000000000 RBX: ffff9dea30ea6000 RCX: 0000000000000002
[89446.140515] RDX: ffffffffc162a710 RSI: 0000000000000000 RDI: ffff9dea30ea6000
[89446.140520] RBP: ffff9de9007a0000 R08: ffff9de9007a0000 R09: 0000000000000000
[89446.140525] R10: ffff9de9007a0000 R11: 0000000000000018 R12: 0000000000000001
[89446.140531] R13: ffffb8dc4c743d58 R14: ffff9dea207ecff0 R15: 0000000000000000
[89446.140537] FS:  0000000000000000(0000) GS:ffff9dea3eb40000(0000) knlGS:0000000000000000
[89446.140542] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[89446.140547] CR2: 00000000000002b4 CR3: 000000075c0c2004 CR4: 00000000001606e0
[89446.140551] Call Trace:
[89446.140772]  dc_commit_state+0x9a/0x5a0 [amdgpu]
[89446.140968]  ? dm_plane_helper_cleanup_fb+0xa3/0x120 [amdgpu]
[89446.141175]  amdgpu_dm_atomic_commit_tail+0xc5d/0x1a10 [amdgpu]
[89446.141191]  ? __switch_to_asm+0x40/0x70
[89446.141199]  ? __switch_to_asm+0x34/0x70
[89446.141206]  ? __switch_to_asm+0x40/0x70
[89446.141212]  ? __switch_to_asm+0x40/0x70
[89446.141219]  ? __switch_to_asm+0x40/0x70
[89446.141225]  ? __switch_to_asm+0x40/0x70
[89446.141231]  ? __switch_to_asm+0x34/0x70
[89446.141237]  ? __switch_to_asm+0x40/0x70
[89446.141244]  ? __switch_to_asm+0x40/0x70
[89446.141250]  ? __switch_to_asm+0x34/0x70
[89446.141260]  ? __switch_to_xtra+0x1b4/0x610
[89446.141266]  ? __switch_to_asm+0x34/0x70
[89446.141273]  ? __switch_to_asm+0x40/0x70
[89446.141279]  ? __switch_to_asm+0x34/0x70
[89446.141320]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
[89446.141340]  commit_tail+0x3c/0x70 [drm_kms_helper]
[89446.141347]  process_one_work+0x1d1/0x3e0
[89446.141353]  worker_thread+0x4a/0x3d0
[89446.141361]  kthread+0xfb/0x130
[89446.141365]  ? process_one_work+0x3e0/0x3e0
[89446.141371]  ? kthread_park+0x90/0x90
[89446.141377]  ret_from_fork+0x35/0x40
[89446.141384] Modules linked in: arc4 md4 cfg80211 8021q garp mrp bridge stp llc nct6775 hwmon_vid ext4 crc16 mbcache jbd2 joydev mousedev input_leds amdgpu intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ofpart cmdlinepart intel_spi_platform intel_spi eeepc_wmi spi_nor crct10dif_pclmul asus_wmi sparse_keymap iTCO_wdt crc32_pclmul mtd mei_hdcp ppdev iTCO_vendor_support rfkill wmi_bmof mxm_wmi amd_iommu_v2 ghash_clmulni_intel gpu_sched ttm drm_kms_helper snd_hda_codec_realtek snd_usb_audio snd_hda_codec_generic snd_usbmidi_lib drm aesni_intel ledtrig_audio snd_virtuoso snd_hda_codec_hdmi aes_x86_64 snd_oxygen_lib crypto_simd snd_hda_intel cryptd media glue_helper snd_hda_codec snd_mpu401_uart intel_cstate agpgart syscopyarea intel_uncore igb snd_hda_core hid_steam intel_rapl_perf snd_rawmidi snd_hwdep sysfillrect i2c_algo_bit sysimgblt mei_me dca lpc_ich pcspkr snd_pcm fb_sys_fops i2c_i801 mei ie31200_edac parport_pc parport evdev mac_hid wmi pcc_cpufreq
[89446.141456]  sch_fq tcp_htcp snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_timer snd soundcore cuse nfsd auth_rpcgss nfsv4 nfsv3 nfs_acl nfsv2 nfs lockd grace sunrpc nls_utf8 cifs ccm dns_resolver fscache loop fuse sg crypto_user vfs_monitor(OE) ip_tables x_tables hid_generic usbhid hid dm_mod btrfs libcrc32c crc32c_generic xor raid6_pq raid1 md_mod sd_mod ahci libahci libata xhci_pci firewire_ohci crc32c_intel scsi_mod xhci_hcd ehci_pci firewire_core ehci_hcd crc_itu_t
[89446.141524] CR2: 00000000000002b4
[89446.141531] ---[ end trace a31be47676a3f1f7 ]---
[89446.141679] RIP: 0010:dc_stream_log+0x6/0xb0 [amdgpu]
[89446.141686] Code: 04 00 00 49 8b bc 02 80 02 00 00 48 8b 07 48 8b 40 50 e8 1d b5 e5 ec b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 53 <8b> 86 b4 02 00 00 48 89 f3 48 89 f2 8b 8e 10 01 00 00 bf 04 00 00
[89446.141690] RSP: 0018:ffffb8dc4c743af0 EFLAGS: 00010202
[89446.141694] RAX: 0000000000000000 RBX: ffff9dea30ea6000 RCX: 0000000000000002
[89446.141697] RDX: ffffffffc162a710 RSI: 0000000000000000 RDI: ffff9dea30ea6000
[89446.141701] RBP: ffff9de9007a0000 R08: ffff9de9007a0000 R09: 0000000000000000
[89446.141704] R10: ffff9de9007a0000 R11: 0000000000000018 R12: 0000000000000001
[89446.141707] R13: ffffb8dc4c743d58 R14: ffff9dea207ecff0 R15: 0000000000000000
[89446.141711] FS:  0000000000000000(0000) GS:ffff9dea3eb40000(0000) knlGS:0000000000000000
[89446.141715] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[89446.141718] CR2: 00000000000002b4 CR3: 000000075c0c2004 CR4: 00000000001606e0
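
A quick cross-check of the three traces above, offered as a minimal sketch rather than a definitive analysis. The bytes at the `<8b>` marker decode to a 32-bit load from a base register plus 0x2b4. The base register (RBX in the 5.2.1 traces, RSI in the 5.2.2 trace) is zero in the register dumps, which would explain why CR2 is always 00000000000002b4. RSI carries the second integer argument under the x86-64 System V calling convention, so the second argument to dc_stream_log was presumably NULL. The helper names below are hypothetical, and the decoder handles only this one instruction form (opcode 8b, ModRM mod=10, no SIB byte, no REX prefix).

```python
import re

# Register names indexed by the 3-bit ModRM r/m field (REX extensions not handled).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def fault_summary(oops_text):
    """Return (fault_address, rip_symbol) from an oops excerpt."""
    addr = re.search(r"NULL pointer dereference, address: ([0-9a-f]+)", oops_text)
    rip = re.search(r"RIP: \w+:(\S+)", oops_text)
    return int(addr.group(1), 16), rip.group(1)


def decode_faulting_load(code_line):
    """Decode the instruction at the <..> marker if it is `mov r32, [base+disp32]`."""
    hexbytes = code_line.split("Code:", 1)[1].split()
    start = next(i for i, b in enumerate(hexbytes) if b.startswith("<"))
    raw = [int(b.strip("<>"), 16) for b in hexbytes[start:start + 6]]
    opcode, modrm = raw[0], raw[1]
    mod, rm = modrm >> 6, modrm & 7
    # Only accept opcode 8b with a 32-bit displacement and no SIB byte.
    if opcode != 0x8B or mod != 0b10 or rm == 0b100:
        raise ValueError("not a plain mov r32, [base+disp32]")
    disp = int.from_bytes(bytes(raw[2:6]), "little")
    return REG64[rm], disp


# Shortened excerpt of the first trace (timestamps and leading Code bytes omitted).
SAMPLE = """\
BUG: kernel NULL pointer dereference, address: 00000000000002b4
RIP: 0010:dc_stream_log+0x9/0xb0 [amdgpu]
Code: 53 48 89 f3 <8b> 83 b4 02 00 00 48 89 da
"""

print(fault_summary(SAMPLE))
print(decode_faulting_load(SAMPLE))
```

With base register 0 and displacement 0x2b4, the effective address is 0x2b4, matching CR2 in all three traces.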


I've been updating to the latest packages after most of these crashes, currently:

GL_VERSION: 4.5 (Compatibility Profile) Mesa 19.1.3
GL_RENDERER: AMD Radeon R9 200 Series (TONGA, DRM 3.32.0, 5.2.3-arch1-1-ARCH, LLVM 8.0.1)

-

Please let me know if there's additional information that could help and/or if there is a better location to file this bug.
Comment 1 Michel Dänzer 2019-07-29 08:53:59 UTC
Please attach the full output of dmesg.
Comment 2 Nicholas Kazlauskas 2019-07-29 12:27:16 UTC
This is almost certainly a duplicate of:

https://bugzilla.kernel.org/show_bug.cgi?id=204181

I think something in KDE Plasma changed recently in how multi-monitor support is handled and how commits are sequenced. It's not that there's been a regression per se in amdgpu, but rather that userspace changes have exposed a bug that has been there for a while.

A fix is in development.
Comment 3 Michael J Evans 2019-07-30 02:36:04 UTC
After looking at the linked bug and your description of my environment (dual screen and yes KDE / Plasma as a compositor potentially exposing whatever corner case this is in the kernel driver):

I agree this is likely a duplicate of that issue.

The above dmesg contained the only relevant data post-boot, though if you'd like sections showing specific drivers initializing, or other memory data, I can provide a more hand-picked selection of data for the next crash.

After this evening's crash I've updated to the latest Arch Linux kernel and packages.

The kernel is now booting with a 'cmdline' including: amdgpu.gpu_recovery=1 log_buf_len=64M drm.debug=84 debug
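
For readers unfamiliar with the drm.debug value: it is a bitmask of DRM debug categories. The sketch below decodes it, assuming the value is read as decimal and assuming the category bits match the kernel's drm_print.h for this kernel generation (the table is written from memory and should be checked against the kernel source). The function name is hypothetical.

```python
# Category bits as recalled from the kernel's drm_print.h (an assumption to verify).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
    0x100: "DP",
}


def decode_drm_debug(value):
    """Return the names of the debug categories enabled by a drm.debug value."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if value & bit]


print(decode_drm_debug(84))  # ['KMS', 'ATOMIC', 'STATE']
```

Under those assumptions, 84 = 0x54 enables the modesetting, atomic commit, and state-dump categories, which are the ones that cover the commit path seen in the traces.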

I've kept my desktop environment the same in the hope that I might collect a useful crash context with the added data.
Comment 4 jamespharvey20 2019-09-20 00:34:30 UTC
Just ran into this for the first time.  I've had pretty consistent problems with the potentially related bug that Nicholas Kazlauskas mentioned at https://bugzilla.kernel.org/show_bug.cgi?id=204181

It could certainly be the same bug, but it definitely has a different backtrace.

Also running KDE, also with multiple (5) monitors.

Was on linux 5.2.10, mesa 19.1.6, plasma 5.16.4.

==========

05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] (rev c1) (prog-if 00 [VGA controller])
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] RX Vega64
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 76
        NUMA node: 0
        Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Region 2: Memory at d0000000 (64-bit, prefetchable) [size=2M]
        Region 4: I/O ports at 8000 [size=256]
        Region 5: Memory at dfb00000 (32-bit, non-prefetchable) [size=512K]
        Expansion ROM at 000c0000 [disabled] [size=128K]

==========

[drm] amdgpu kernel modesetting enabled.
fb0: switching to amdgpudrmfb from EFI VGA
[drm] initializing kernel modesetting (VEGA10 0x1002:0x687F 0x1002:0x0B36 0xC1).
[drm] register mmio base: 0xDFB00000
[drm] register mmio size: 524288
[drm] add ip block number 0 <soc15_common>
[drm] add ip block number 1 <gmc_v9_0>
[drm] add ip block number 2 <vega10_ih>
[drm] add ip block number 3 <psp>
[drm] add ip block number 4 <gfx_v9_0>
[drm] add ip block number 5 <sdma_v4_0>
[drm] add ip block number 6 <powerplay>
[drm] add ip block number 7 <dm>
[drm] add ip block number 8 <uvd_v7_0>
[drm] add ip block number 9 <vce_v4_0>
[drm] UVD(0) is enabled in VM mode
[drm] UVD(0) ENC is enabled in VM mode
[drm] VCE enabled in VM mode
[drm] RAS INFO: ras initialized successfully, hardware ability[0] ras_mask[0]
[drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[drm] Detected VRAM RAM=8176M, BAR=256M
[drm] RAM width 2048bits HBM
[drm] amdgpu: 8176M of VRAM memory ready
[drm] amdgpu: 8176M of GTT memory ready.
[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
[drm] use_doorbell being set to: [true]
[drm] use_doorbell being set to: [true]
[drm] Found UVD firmware Version: 65.29 Family ID: 17
[drm] PSP loading UVD firmware
[drm] Found VCE firmware Version: 57.4 Binary ID: 4
[drm] PSP loading VCE firmware
[drm] reserve 0x400000 from 0xf401000000 for PSP TMR SIZE
[drm] Display Core initialized with v3.2.27!
[drm] DM_MST: Differing MST start on aconnector: (____ptrval____) [id: 59]
[drm] DM_MST: Differing MST start on aconnector: (____ptrval____) [id: 62]
[drm] DM_MST: Differing MST start on aconnector: (____ptrval____) [id: 65]
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] Driver supports precise vblank timestamp query.
[drm] UVD and UVD ENC initialized successfully.
[drm] VCE initialized successfully.
[drm] Cannot find any crtc or sizes
[drm] ECC is not present.
[drm] SRAM ECC is not present.
[drm] Initialized amdgpu 3.32.0 20150101 for 0000:05:00.0 on minor 0
[drm] amdgpu_dm_irq_schedule_work FAILED src 8
[drm] DM_MST: added connector: (____ptrval____) [id: 70] [master: (____ptrval____)]
[drm] DM_MST: added connector: (____ptrval____) [id: 74] [master: (____ptrval____)]
[drm] DM_MST: added connector: (____ptrval____) [id: 78] [master: (____ptrval____)]
[drm] fb mappable at 0xC1400000
[drm] vram apper at 0xC0000000
[drm] size 14745600
[drm] fb depth is 24
[drm]    pitch is 10240
fbcon: amdgpudrmfb (fb0) is primary device
[drm] DM_MST: added connector: (____ptrval____) [id: 85] [master: (____ptrval____)]
amdgpu 0000:05:00.0: fb0: amdgpudrmfb frame buffer device
[drm] DM_MST: added connector: (____ptrval____) [id: 89] [master: (____ptrval____)]
[drm] DM_MST: added connector: (____ptrval____) [id: 93] [master: (____ptrval____)]
[drm] DM_MST: added connector: (____ptrval____) [id: 97] [master: (____ptrval____)]
[drm] DM_MST: added connector: (____ptrval____) [id: 107] [master: (____ptrval____)]
[drm] DM_MST: added connector: (____ptrval____) [id: 113] [master: (____ptrval____)]
[drm] DM_MST: added connector: (____ptrval____) [id: 117] [master: (____ptrval____)]

..........

BUG: kernel NULL pointer dereference, address: 00000000000002b4
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 2678499 Comm: kworker/u65:9 Tainted: G        W  OE     5.2.11-arch1-1-ARCH #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EP2C602, BIOS P1.90 04/12/2018
Workqueue: events_unbound commit_work [drm_kms_helper]
RIP: 0010:dc_stream_log+0x6/0xb0 [amdgpu]
Code: 04 00 00 49 8b bc 02 b0 02 00 00 48 8b 07 48 8b 40 50 e8 2d 3b 14 c9 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 66 66 66 66 90 53 <8b> 86 b4 02 00 00 48 89 f3 48 89 f2 8b 8e 10 01 00 00 bf 04 00 00
RSP: 0018:ffffa0c6a365faf8 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000005
RDX: ffffffffc0547710 RSI: 0000000000000000 RDI: ffff9a4bd62e5000
RBP: ffff9a4483080000 R08: ffff9a4483080000 R09: 0000000000000000
R10: ffff9a4483080000 R11: 0000000000000018 R12: ffff9a4bd62e5000
R13: ffffa0c6a365fd58 R14: ffff9a4bd67dcff0 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff9a4bdf840000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002b4 CR3: 0000000ffbe76005 CR4: 00000000000626e0
Call Trace:
 dc_commit_state+0x99/0x580 [amdgpu]
 amdgpu_dm_atomic_commit_tail+0xc5d/0x19a0 [amdgpu]
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? _raw_spin_unlock_irq+0x1d/0x30
 ? finish_task_switch+0x84/0x2d0
 ? preempt_schedule_common+0x32/0x80
 ? commit_tail+0x3c/0x70 [drm_kms_helper]
 commit_tail+0x3c/0x70 [drm_kms_helper]
 process_one_work+0x1d1/0x3e0
 worker_thread+0x4a/0x3d0
 kthread+0xfb/0x130
 ? process_one_work+0x3e0/0x3e0
 ? kthread_park+0x80/0x80
 ret_from_fork+0x35/0x40
Modules linked in: netlink_diag uas usb_storage xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ebtable_filter ebtables ip6_tables iptable_filter tun rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod msr ib_srpt nct6775 hwmon_vid target_core_mod ib_srp scsi_transport_srp rpcrdma sunrpc rdma_ucm ib_iser rdma_cm ib_umad iw_cm
ib_ipoib libiscsi scsi_transport_iscsi ib_cm mlx4_ib ib_uverbs mlx4_en ib_core nls_iso8859_1 nls_cp437 vfat fat ipmi_ssif intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio kvm snd_hda_codec_hdmi snd_hda_intel bridge snd_hda_codec snd_hda_core stp irqbypass llc intel_cstate snd_hwdep snd_pcm iTCO_wdt iTCO_vendor_support snd_timer mousedev input_leds joydev pcspkr intel_uncore ipmi_si mlx4_core mei_me e1000e snd ioatdma
 ipmi_devintf i2c_i801 pcc_cpufreq intel_rapl_perf lpc_ich mei soundcore dca wmi ipmi_msghandler evdev mac_hid vmmon(OE) vmw_vmci vboxnetflt(OE) vboxnetadp(OE) vboxpci(OE) vboxdrv(OE) sg crypto_user ip_tables x_tables btrfs xor raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32c_generic hid_generic usbhid hid sr_mod cdrom sd_mod dm_crypt dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel isci aesni_intel libsas ahci aes_x86_64 crypto_simd scsi_transport_sas libahci cryptd glue_helper libata ehci_pci scsi_mod ehci_hcd amdgpu gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart
CR2: 00000000000002b4
---[ end trace 2c71b3abb2d778a6 ]---
RIP: 0010:dc_stream_log+0x6/0xb0 [amdgpu]
Code: 04 00 00 49 8b bc 02 b0 02 00 00 48 8b 07 48 8b 40 50 e8 2d 3b 14 c9 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 66 66 66 66 90 53 <8b> 86 b4 02 00 00 48 89 f3 48 89 f2 8b 8e 10 01 00 00 bf 04 00 00
RSP: 0018:ffffa0c6a365faf8 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000005
RDX: ffffffffc0547710 RSI: 0000000000000000 RDI: ffff9a4bd62e5000
RBP: ffff9a4483080000 R08: ffff9a4483080000 R09: 0000000000000000
R10: ffff9a4483080000 R11: 0000000000000018 R12: ffff9a4bd62e5000
R13: ffffa0c6a365fd58 R14: ffff9a4bd67dcff0 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff9a4bdf840000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002b4 CR3: 0000000ffbe76005 CR4: 00000000000626e0
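
When comparing backtraces between reports, it can help to keep only the entries without a leading "?", since the kernel uses that marker for stack entries it could not confirm as part of the call chain. The following is a minimal sketch with a hypothetical helper name and shortened excerpts of the traces in this report. Applied to the full traces, the same filtering gives the same chain for the description and for this comment (dc_commit_state, amdgpu_dm_atomic_commit_tail, commit_tail, process_one_work, worker_thread, kthread, ret_from_fork), differing only in offsets.

```python
import re


def reliable_frames(trace_lines):
    """Return function names of call-trace entries not prefixed with '?'."""
    frames = []
    for line in trace_lines:
        # Drop an optional "[  123.456789]" dmesg timestamp.
        text = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line).strip()
        if not text or text.startswith("?"):
            continue
        # "dc_commit_state+0x99/0x580 [amdgpu]" -> "dc_commit_state"
        frames.append(text.split("+", 1)[0])
    return frames


# Shortened excerpts: the first trace in the description and the trace in comment 4.
DESCRIPTION_TRACE = [
    "[ 4953.466332]  dc_commit_state+0xa1/0x5a0 [amdgpu]",
    "[ 4953.466446]  ? amdgpu_bo_unpin+0xce/0xe0 [amdgpu]",
    "[ 4953.466675]  amdgpu_dm_atomic_commit_tail+0xc64/0x1a20 [amdgpu]",
    "[ 4953.466689]  ? __switch_to_asm+0x40/0x70",
    "[ 4953.466798]  commit_tail+0x3c/0x70 [drm_kms_helper]",
]
COMMENT4_TRACE = [
    " dc_commit_state+0x99/0x580 [amdgpu]",
    " amdgpu_dm_atomic_commit_tail+0xc5d/0x19a0 [amdgpu]",
    " ? __switch_to_asm+0x34/0x70",
    " commit_tail+0x3c/0x70 [drm_kms_helper]",
]

print(reliable_frames(DESCRIPTION_TRACE))
print(reliable_frames(DESCRIPTION_TRACE) == reliable_frames(COMMENT4_TRACE))
```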
Comment 5 Martin Peres 2019-11-19 09:37:42 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/881.
