Bug 104611 - [fiji, polaris10] BUG: unable to handle kernel NULL pointer dereference when waking up displays with amdgpu.dc=1
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: All All
Importance: medium major
Assignee: Default DRI bug account
Reported: 2018-01-12 22:38 UTC by Vedran Miletić
Modified: 2019-11-19 08:28 UTC



Attachments
dmesg (65.53 KB, text/plain)
2018-01-12 22:38 UTC, Vedran Miletić
dmesg with rx580 amdgpu.dc_log=1 (68.44 KB, text/plain)
2018-01-17 15:38 UTC, Vedran Miletić
dmesg with fiji amdgpu.dc_log=1 (62.21 KB, text/plain)
2018-07-12 14:14 UTC, Vedran Miletić
Nullptr deref after powercycling MST display (112.03 KB, text/plain)
2019-04-23 11:16 UTC, Kevin Hamacher

Description Vedran Miletić 2018-01-12 22:38:03 UTC
Created attachment 136703
dmesg

Using Fedora rawhide (28) with kernel 4.15.0-0.rc7.git4.1.fc28.x86_64 on

01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Fiji HDMI/DP Audio [Radeon R9 Nano / FURY/FURY X] [1002:aae8]

If I let displays go to sleep and then wake them up, I get:

[ 1464.218367] BUG: unable to handle kernel NULL pointer dereference at 0000000000000208
[ 1464.218407] IP: create_stream_for_sink+0x1cc/0x3c0 [amdgpu]
[ 1464.218409] PGD 80000003c89ae067 P4D 80000003c89ae067 PUD 404d9d067 PMD 0 
[ 1464.218414] Oops: 0000 [#1] SMP PTI
[ 1464.218415] Modules linked in: fuse rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp joydev kvm irqbypass crct10dif_pclmul snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hda_core intel_cstate eeepc_wmi asus_wmi iTCO_wdt mei_wdt iTCO_vendor_support sparse_keymap snd_seq snd_hwdep rfkill intel_uncore snd_seq_device wmi_bmof ppdev snd_pcm intel_rapl_perf snd_timer parport_pc parport shpchp snd i2c_i801 soundcore mei_me video mei lpc_ich wmi auth_rpcgss binfmt_misc sunrpc amdkfd amd_iommu_v2 amdgpu chash i2c_algo_bit drm_kms_helper ttm crc32c_intel drm r8169 mii hid_cherry
[ 1464.218465] CPU: 7 PID: 1474 Comm: Xorg Not tainted 4.15.0-0.rc7.git4.1.fc28.x86_64 #1
[ 1464.218467] Hardware name: Transtec AG    /B85M-E, BIOS 3505 11/28/2016
[ 1464.218490] RIP: 0010:create_stream_for_sink+0x1cc/0x3c0 [amdgpu]
[ 1464.218491] RSP: 0018:ffffadcac9c7b900 EFLAGS: 00010286
[ 1464.218493] RAX: 0000000000000870 RBX: ffff9063805b0000 RCX: 0000000000000000
[ 1464.218494] RDX: 0000000000000013 RSI: 0000000000000002 RDI: ffff906337f5b08c
[ 1464.218495] RBP: ffffadcac9c7ba40 R08: 0000000000000000 R09: 0000000000000f00
[ 1464.218497] R10: ffff906337f5b000 R11: 0000000000000870 R12: ffff906337f5b000
[ 1464.218498] R13: 0000000000000000 R14: ffff906345723e80 R15: ffff906337f5d000
[ 1464.218499] FS:  00007f3de9b77a80(0000) GS:ffff90638d400000(0000) knlGS:0000000000000000
[ 1464.218500] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1464.218502] CR2: 0000000000000208 CR3: 0000000405d86001 CR4: 00000000001606e0
[ 1464.218503] Call Trace:
[ 1464.218508]  ? __lock_is_held+0x65/0xb0
[ 1464.218534]  dm_update_crtcs_state+0xef/0x3a0 [amdgpu]
[ 1464.218555]  ? dm_update_crtcs_state+0xef/0x3a0 [amdgpu]
[ 1464.218577]  ? dc_resource_state_copy_construct+0xcc/0x110 [amdgpu]
[ 1464.218598]  amdgpu_dm_atomic_check+0x210/0x480 [amdgpu]
[ 1464.218611]  drm_atomic_check_only+0x387/0x560 [drm]
[ 1464.218618]  ? drm_connector_list_iter_end+0x5a/0x70 [drm]
[ 1464.218624]  drm_atomic_commit+0x18/0x50 [drm]
[ 1464.218630]  drm_atomic_connector_commit_dpms+0xef/0x100 [drm]
[ 1464.218637]  set_property_atomic+0xce/0x150 [drm]
[ 1464.218646]  drm_mode_obj_set_property_ioctl+0xf9/0x1b0 [drm]
[ 1464.218652]  ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[ 1464.218657]  drm_mode_connector_property_set_ioctl+0x3f/0x60 [drm]
[ 1464.218663]  drm_ioctl_kernel+0x5d/0xb0 [drm]
[ 1464.218669]  drm_ioctl+0x31b/0x3d0 [drm]
[ 1464.218674]  ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[ 1464.218678]  ? trace_hardirqs_on_caller+0xf4/0x190
[ 1464.218679]  ? trace_hardirqs_on+0xd/0x10
[ 1464.218694]  amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[ 1464.218697]  do_vfs_ioctl+0xa6/0x6c0
[ 1464.218699]  ? __fget+0x124/0x210
[ 1464.218702]  SyS_ioctl+0x79/0x90
[ 1464.218706]  entry_SYSCALL_64_fastpath+0x25/0x9c
[ 1464.218707] RIP: 0033:0x7f3de6e72877
[ 1464.218708] RSP: 002b:00007fffabff2c98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1464.218710] RAX: ffffffffffffffda RBX: 0000000000000200 RCX: 00007f3de6e72877
[ 1464.218712] RDX: 00007fffabff2cd0 RSI: 00000000c01064ab RDI: 000000000000000c
[ 1464.218713] RBP: 0000000000000040 R08: 0000000001fd7390 R09: 0000000000000018
[ 1464.218714] R10: 0000000000000004 R11: 0000000000000246 R12: 0000000001bf1810
[ 1464.218715] R13: 0000000000000000 R14: 0000000001bf1a00 R15: 0000000001bf0990
[ 1464.218719] Code: da 4c 89 e7 e8 56 fc ff ff 4c 89 f6 4c 89 ef 4c 89 e2 e8 28 f0 ff ff 4c 8b ab c0 04 00 00 49 8d bc 24 8c 00 00 00 ba 13 00 00 00 <41> 0f b7 85 08 02 00 00 49 8d b5 12 02 00 00 41 89 84 24 a0 00 
[ 1464.218780] RIP: create_stream_for_sink+0x1cc/0x3c0 [amdgpu] RSP: ffffadcac9c7b900
[ 1464.218781] CR2: 0000000000000208
[ 1464.218783] ---[ end trace 0e302a5838408694 ]---

Without DC this works fine.
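
A note on reading the oops above: the "Code:" line marks the faulting instruction with <41>, and the bytes 41 0f b7 85 08 02 00 00 encode a 16-bit zero-extending load from offset 0x208 relative to R13. R13 is 0 in the register dump, which matches CR2 = 0x208. The Python sketch below checks that arithmetic. It is a minimal illustration, not a disassembler: faulting_bytes and decode_movzx_disp32 are hypothetical helper names, and the decoder handles only this one encoding (REX prefix, opcode 0F B7, register base plus 32-bit displacement, no SIB byte). It does not tell us which structure member was NULL.

```python
# Excerpt of the "Code:" line from the oops above (leading bytes omitted).
CODE_LINE = "Code: ba 13 00 00 00 <41> 0f b7 85 08 02 00 00 49 8d b5 12 02 00 00"
R13 = 0x0000000000000000  # value of R13 in the register dump
CR2 = 0x0000000000000208  # faulting address reported by the kernel


def faulting_bytes(code_line: str) -> bytes:
    """Return the bytes from the <xx> marker onward in an oops 'Code:' line."""
    tokens = code_line.split("Code:", 1)[1].split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_movzx_disp32(insn: bytes) -> tuple[int, int]:
    """Decode only 'REX 0F B7 /r' with mod=10 and no SIB byte.

    Returns (base register number, signed 32-bit displacement).
    Any other encoding raises ValueError.
    """
    if len(insn) < 8:
        raise ValueError("need at least 8 bytes")
    rex, opcode, modrm = insn[0], insn[1:3], insn[3]
    if rex & 0xF0 != 0x40 or opcode != b"\x0f\xb7":
        raise ValueError("not a REX-prefixed movzx r32, r/m16")
    mod, rm = modrm >> 6, modrm & 0b111
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only the base+disp32 form without SIB is handled")
    base = rm | ((rex & 0x1) << 3)  # REX.B extends the base register number
    disp = int.from_bytes(insn[4:8], "little", signed=True)
    return base, disp


if __name__ == "__main__":
    base, disp = decode_movzx_disp32(faulting_bytes(CODE_LINE))
    print(f"load from r{base} + {disp:#x}; with R13 = 0 that is {R13 + disp:#x}")
```

The same check applies to the later traces in this report, since they show the same marked bytes, R13 = 0, and CR2 = 0x208.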
Comment 1 Harry Wentland 2018-01-15 16:02:21 UTC
Does this reproduce consistently or intermittently?
Comment 2 Roman Li 2018-01-16 21:55:07 UTC
I cannot reproduce it on v4.15-rc7 (git#1545dec46db3)
Vedran, can you provide more info on your setup:
- display and connector types
- window manager
- dmesg with amdgpu.dc_log=1
Comment 3 Vedran Miletić 2018-01-17 15:27:24 UTC
(In reply to Harry Wentland from comment #1)
> Does this reproduce consistently or intermittently?

No, but I just got it on an RX 580:

[ 3582.972912] BUG: unable to handle kernel NULL pointer dereference at 0000000000000208
[ 3582.972982] IP: create_stream_for_sink+0x1cc/0x3c0 [amdgpu]
[ 3582.972985] PGD 80000003b3e7b067 P4D 80000003b3e7b067 PUD 3d0c0a067 PMD 0 
[ 3582.972992] Oops: 0000 [#1] SMP PTI
[ 3582.972995] Modules linked in: fuse rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic kvm snd_hda_codec_hdmi irqbypass joydev snd_hda_intel snd_hda_codec crct10dif_pclmul crc32_pclmul snd_hda_core eeepc_wmi ghash_clmulni_intel intel_cstate snd_hwdep intel_uncore snd_seq snd_seq_device asus_wmi sparse_keymap snd_pcm mei_wdt intel_rapl_perf rfkill iTCO_wdt iTCO_vendor_support snd_timer snd wmi_bmof ppdev soundcore mei_me wmi lpc_ich mei i2c_i801 parport_pc parport video shpchp binfmt_misc auth_rpcgss sunrpc amdkfd amd_iommu_v2 amdgpu chash i2c_algo_bit drm_kms_helper ttm crc32c_intel drm r8169 mii hid_cherry
[ 3582.973075] CPU: 3 PID: 4419 Comm: Xorg Not tainted 4.15.0-0.rc7.git4.1.fc28.x86_64 #1
[ 3582.973077] Hardware name: Transtec AG    /B85M-E, BIOS 3507 07/21/2017
[ 3582.973130] RIP: 0010:create_stream_for_sink+0x1cc/0x3c0 [amdgpu]
[ 3582.973132] RSP: 0018:ffffaa8402503900 EFLAGS: 00010286
[ 3582.973136] RAX: 0000000000000870 RBX: ffff8d4f472aa000 RCX: 0000000000000000
[ 3582.973138] RDX: 0000000000000013 RSI: 0000000000000002 RDI: ffff8d4f0373f48c
[ 3582.973140] RBP: ffffaa8402503a40 R08: 0000000000000000 R09: 0000000000000f00
[ 3582.973142] R10: ffff8d4f0373f400 R11: 0000000000000870 R12: ffff8d4f0373f400
[ 3582.973143] R13: 0000000000000000 R14: ffff8d4f0c16fe80 R15: ffff8d4f03739800
[ 3582.973146] FS:  00007efef37f6a80(0000) GS:ffff8d4f4cc00000(0000) knlGS:0000000000000000
[ 3582.973148] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3582.973150] CR2: 0000000000000208 CR3: 00000003cc198001 CR4: 00000000001606e0
[ 3582.973152] Call Trace:
[ 3582.973159]  ? __lock_is_held+0x65/0xb0
[ 3582.973215]  dm_update_crtcs_state+0xef/0x3a0 [amdgpu]
[ 3582.973262]  ? dm_update_crtcs_state+0xef/0x3a0 [amdgpu]
[ 3582.973310]  ? dc_resource_state_copy_construct+0xcc/0x110 [amdgpu]
[ 3582.973356]  amdgpu_dm_atomic_check+0x210/0x480 [amdgpu]
[ 3582.973377]  drm_atomic_check_only+0x387/0x560 [drm]
[ 3582.973389]  ? drm_connector_list_iter_end+0x5a/0x70 [drm]
[ 3582.973402]  drm_atomic_commit+0x18/0x50 [drm]
[ 3582.973413]  drm_atomic_connector_commit_dpms+0xef/0x100 [drm]
[ 3582.973425]  set_property_atomic+0xce/0x150 [drm]
[ 3582.973440]  drm_mode_obj_set_property_ioctl+0xf9/0x1b0 [drm]
[ 3582.973451]  ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[ 3582.973461]  drm_mode_connector_property_set_ioctl+0x3f/0x60 [drm]
[ 3582.973472]  drm_ioctl_kernel+0x5d/0xb0 [drm]
[ 3582.973483]  drm_ioctl+0x31b/0x3d0 [drm]
[ 3582.973492]  ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[ 3582.973499]  ? trace_hardirqs_on_caller+0xf4/0x190
[ 3582.973502]  ? trace_hardirqs_on+0xd/0x10
[ 3582.973533]  amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[ 3582.973538]  do_vfs_ioctl+0xa6/0x6c0
[ 3582.973544]  ? __fget+0x124/0x210
[ 3582.973548]  SyS_ioctl+0x79/0x90
[ 3582.973554]  entry_SYSCALL_64_fastpath+0x25/0x9c
[ 3582.973557] RIP: 0033:0x7efef0af1877
[ 3582.973559] RSP: 002b:00007ffd548f2768 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 3582.973562] RAX: ffffffffffffffda RBX: 0000000001763a10 RCX: 00007efef0af1877
[ 3582.973564] RDX: 00007ffd548f27a0 RSI: 00000000c01064ab RDI: 000000000000000c
[ 3582.973566] RBP: 0000000001a07140 R08: 00000000021a4bd0 R09: 0000000000000020
[ 3582.973568] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000847360
[ 3582.973570] R13: 0000000000000001 R14: 0000000000000004 R15: 00007ffd548f31c0
[ 3582.973576] Code: da 4c 89 e7 e8 56 fc ff ff 4c 89 f6 4c 89 ef 4c 89 e2 e8 28 f0 ff ff 4c 8b ab c0 04 00 00 49 8d bc 24 8c 00 00 00 ba 13 00 00 00 <41> 0f b7 85 08 02 00 00 49 8d b5 12 02 00 00 41 89 84 24 a0 00 
[ 3582.973687] RIP: create_stream_for_sink+0x1cc/0x3c0 [amdgpu] RSP: ffffaa8402503900
[ 3582.973689] CR2: 0000000000000208
[ 3582.973711] ---[ end trace db6ecfa1f0babe6e ]---

(In reply to Roman Li from comment #2)
> I cannot reproduce it on v4.15-rc7 (git#1545dec46db3)
> Vedran, can you provide more info on your setup:
> - display and connector types
> - window manager
> - dmesg with amdgpu.dc_log=1

2x Dell P2715Q connected over DP.
GNOME 3 on Fedora 27.
I will provide dmesg.
Comment 4 Vedran Miletić 2018-01-17 15:38:02 UTC
Created attachment 136808
dmesg with rx580 amdgpu.dc_log=1
Comment 5 Roman Li 2018-01-18 00:07:55 UTC
So what was the repro rate for the issue? 
Can you try to reproduce on the latest rawhide 4.15.0-0.rc8.git0.1.fc28.x86_64 and/or collect dmesg with amdgpu.dc_log=1 for the bad case? Thank you.
Comment 6 Vedran Miletić 2018-01-18 08:38:59 UTC
(In reply to Roman Li from comment #5)
> So what was the repro rate for the issue? 
> Can you try to reproduce on the latest rawhide
> 4.15.0-0.rc8.git0.1.fc28.x86_64 and/or collect dmesg with amdgpu.dc_log=1
> for the bad case? Thank you.

I can do that over the weekend.
Comment 7 Harry Wentland 2018-01-29 16:20:46 UTC
Did you have a chance to capture a repro dmesg with dc_log=1?
Comment 8 Vedran Miletić 2018-07-12 14:11:21 UTC
This still occurs on 4.17, despite bug 106194 suggesting otherwise:

[ 7670.095885] BUG: unable to handle kernel NULL pointer dereference at 0000000000000208
[ 7670.095899] PGD 80000003fcee8067 P4D 80000003fcee8067 PUD 3fcee6067 PMD 0
[ 7670.095913] Oops: 0000 [#1] SMP PTI
[ 7670.095917] Modules linked in: fuse rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm eeepc_wmi irqbypass asus_wmi joydev crct10dif_pclmul crc32_pclmul sparse_keymap ghash_clmulni_intel intel_cstate rfkill iTCO_wdt intel_uncore wmi_bmof mei_wdt iTCO_vendor_support mxm_wmi mei_me ppdev mei intel_rapl_perf wmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi video i2c_i801 snd_hda_intel acpi_pad snd_hda_codec snd_hda_core snd_hwdep parport_pc parport shpchp snd_seq snd_seq_device snd_pcm snd_timer snd soundcore binfmt_misc auth_rpcgss sunrpc amdkfd amd_iommu_v2 amdgpu chash i2c_algo_bit gpu_sched drm_kms_helper ttm r8169 drm crc32c_intel mii hid_cherry                                                            
[ 7670.096014] CPU: 7 PID: 934 Comm: Xorg Not tainted 4.17.4-200.fc28.x86_64 #1
[ 7670.096018] Hardware name: Transtec AG   /B150M-C, BIOS 3601 12/12/2017
[ 7670.096169] RIP: 0010:create_stream_for_sink+0x2cd/0x650 [amdgpu]
[ 7670.096174] RSP: 0018:ffffabfbc20e79d0 EFLAGS: 00010246
[ 7670.096179] RAX: 0000000000000000 RBX: ffff9d9b7ded8000 RCX: 0000000000000000
[ 7670.096184] RDX: 0000000000000013 RSI: ffffffffc0757400 RDI: ffff9d9b7ded80ac
[ 7670.096188] RBP: ffffabfbc20e7b10 R08: 0000000000000f00 R09: 0000000000000870
[ 7670.096192] R10: ffff9d9b7ded8000 R11: 0000000000000f00 R12: ffff9d9ba32b1000
[ 7670.096197] R13: 0000000000000000 R14: ffff9d98c0a0c200 R15: ffff9d99495d8000
[ 7670.096202] FS:  00007fa0d0017ac0(0000) GS:ffff9d9bb5dc0000(0000) knlGS:0000000000000000
[ 7670.096207] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7670.096211] CR2: 0000000000000208 CR3: 00000003fcf16003 CR4: 00000000003606e0
[ 7670.096216] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7670.096220] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7670.096223] Call Trace:
[ 7670.096370]  dm_update_crtcs_state+0x26c/0x4d0 [amdgpu]
[ 7670.096510]  amdgpu_dm_atomic_check+0x1b1/0x3b0 [amdgpu]
[ 7670.096552]  drm_atomic_check_only+0x360/0x4f0 [drm]
[ 7670.096587]  drm_atomic_commit+0x13/0x50 [drm]
[ 7670.096619]  drm_atomic_connector_commit_dpms+0xdb/0x100 [drm]
[ 7670.096652]  drm_mode_obj_set_property_ioctl+0x178/0x280 [drm]
[ 7670.096686]  ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[ 7670.096715]  drm_mode_connector_property_set_ioctl+0x39/0x60 [drm]
[ 7670.096744]  drm_ioctl_kernel+0x5b/0xb0 [drm]
[ 7670.096772]  drm_ioctl+0x1b3/0x370 [drm]
[ 7670.096801]  ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[ 7670.096898]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 7670.096910]  do_vfs_ioctl+0xa4/0x610
[ 7670.096918]  ksys_ioctl+0x60/0x90
[ 7670.096925]  __x64_sys_ioctl+0x16/0x20
[ 7670.096934]  do_syscall_64+0x5b/0x160
[ 7670.096944]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7670.096951] RIP: 0033:0x7fa0cd2dbe17
[ 7670.096955] RSP: 002b:00007ffc439d2b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 7670.096962] RAX: ffffffffffffffda RBX: 00000000022374b0 RCX: 00007fa0cd2dbe17
[ 7670.096966] RDX: 00007ffc439d2b70 RSI: 00000000c01064ab RDI: 000000000000000c
[ 7670.096970] RBP: 00007ffc439d2b70 R08: 0000000000000000 R09: 0000000002236100
[ 7670.096974] R10: 0000000000000018 R11: 0000000000000246 R12: 00000000c01064ab
[ 7670.096978] R13: 000000000000000c R14: 0000000002237940 R15: 000000000223b001
[ 7670.096983] Code: 49 c7 87 28 60 00 00 00 00 00 00 4c 89 bb 98 01 00 00 e8 c7 f9 ff ff 4d 8b ac 24 50 04 00 00 ba 13 00 00 00 48 8d bb ac 00 00 00 <41> 0f b7 85 08 02 00 00 49 8d b5 12 02 00 00 89 83 c0 00 00 00                                                         
[ 7670.097204] RIP: create_stream_for_sink+0x2cd/0x650 [amdgpu] RSP: ffffabfbc20e79d0
[ 7670.097208] CR2: 0000000000000208
[ 7670.097214] ---[ end trace bfc39023ec02bc67 ]---

(In reply to Harry Wentland from comment #7)
> Did you have a chance to capture a repro dmesg with dc_log=1?

Doesn't seem to change anything in the output, but I'll attach it.
Comment 9 Vedran Miletić 2018-07-12 14:14:14 UTC
Created attachment 140595
dmesg with fiji amdgpu.dc_log=1
Comment 10 Kevin Hamacher 2019-04-23 11:16:28 UTC
Created attachment 144079
Nullptr deref after powercycling MST display
Comment 11 Kevin Hamacher 2019-04-23 11:17:17 UTC
I'm also affected by this.
Setup:
 GPUs: RX 580 (used), R9 Fury
 Screens: 3x U2415 connected via DP, two of them via MST; one HDMI output to a disabled screen, one HDMI output to an Oculus Rift VR headset
 Linux x7 5.0.7-arch1-1-ARCH #1 SMP PREEMPT Mon Apr 8 10:37:08 UTC 2019 x86_64 GNU/Linux
 Window manager: i3 4.16.1

The system crashes when I disable and then re-enable the secondary MST screen.
Under some conditions the screen is instead not detected properly (without a crash), and power-cycling the primary MST screen then crashes the system.
(Primary MST screen: the one plugged into the GPU; secondary: the one plugged into the primary screen.)


dmesg: https://bugs.freedesktop.org/attachment.cgi?id=144079&action=edit (booted with dc_log=1, but it seems the parameter has been removed?)
[   27.292118] amdgpu: unknown parameter 'dc_log' ignored

At timestamp 51.6 I disabled the secondary MST screen; the crash a couple of seconds later happened when I turned it back on.
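
On the "unknown parameter" message above: a loaded module's exposed parameters appear as files under /sys/module/<module>/parameters/, so it is possible to check whether a given kernel's amdgpu still has dc_log before passing it on the command line. A minimal Python sketch, assuming sysfs is mounted at /sys and the module is loaded. module_has_param is a hypothetical helper name, and parameters declared without sysfs permissions do not show up there, so a missing file is a hint rather than proof.

```python
from pathlib import Path


def module_has_param(module: str, param: str,
                     sysfs_root: Path = Path("/sys/module")) -> bool:
    """Return True if <sysfs_root>/<module>/parameters/<param> exists."""
    return (sysfs_root / module / "parameters" / param).is_file()


if __name__ == "__main__":
    # Compare a parameter named in this report's title with the one
    # the 5.0.7 kernel rejected.
    for name in ("dc", "dc_log"):
        print(name, module_has_param("amdgpu", name))
```

Running modinfo -p amdgpu lists the parameters the module declares and works even when the module is not loaded.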
Comment 12 Martin Peres 2019-11-19 08:28:49 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the migrated bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/294.

