Bug 105619 - Kernel DC oops on dce81_create_resource_pool with kernel 4.15
Summary: Kernel DC oops on dce81_create_resource_pool with kernel 4.15
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2018-03-20 09:43 UTC by freedesktop
Modified: 2019-11-19 08:32 UTC

Attachments
kernel 4.15 oops (86.19 KB, text/x-log)
2018-03-20 09:43 UTC, freedesktop
Xorg log when using the 4.15 kernel (70.80 KB, text/x-log)
2018-03-21 09:41 UTC, freedesktop
Xorg log when running on kernel 4.13 (ubuntu) (50.80 KB, text/x-log)
2018-03-21 09:48 UTC, freedesktop
kernel-4.16.2 dmesg (72.94 KB, text/plain)
2018-04-12 17:11 UTC, Michael Lange
kernel-4.16.2 Xorg.0.log (1.21 MB, text/x-log)
2018-04-12 17:13 UTC, Michael Lange
EL7 ROCM 1.8 dmesg (72.63 KB, text/x-log)
2018-05-06 17:19 UTC, freedesktop
EL7 ROCM 1.8 Xorg.0.log (22.27 KB, text/x-log)
2018-05-06 17:20 UTC, freedesktop

Description freedesktop 2018-03-20 09:43:23 UTC
Created attachment 138216 [details]
kernel 4.15 oops

Hi,

I tried to experiment with HSA on recent upstream kernels [1][2], but instead I seem to have run into some DC wiring(?) problems with Kaveri:

Mar 18 15:05:48 z kernel: [    2.015153] [drm:resource_construct [amdgpu]] *ERROR* DC: unexpected audio fuse!
Mar 18 15:05:48 z kernel: [    2.015232] WARNING: CPU: 2 PID: 173 at /home/kernel/COD/linux/drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_resource.c:190 resource_construct+0x2aa/0x310 [amdgpu]
Mar 18 15:05:48 z kernel: [    2.015236] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) chash radeon crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_algo_bit ttm pcbc drm_kms_helper syscopyarea sysfillrect sysimgblt aesni_intel fb_sys_fops aes_x86_64 crypto_simd r8169 ahci glue_helper cryptd libahci drm mii wmi video
Mar 18 15:05:48 z kernel: [    2.015256] CPU: 2 PID: 173 Comm: systemd-udevd Not tainted 4.15.10-041510-generic #201803152130
Mar 18 15:05:48 z kernel: [    2.015259] Hardware name: System manufacturer System Product Name/A88XM-A, BIOS 3001 03/09/2016
Mar 18 15:05:48 z kernel: [    2.015303] RIP: 0010:resource_construct+0x2aa/0x310 [amdgpu]
Mar 18 15:05:48 z kernel: [    2.015305] RSP: 0018:ffffa95d41eef718 EFLAGS: 00010282
Mar 18 15:05:48 z kernel: [    2.015307] RAX: 0000000000000000 RBX: ffff9098c1b2d300 RCX: ffffffffb2062808
Mar 18 15:05:48 z kernel: [    2.015310] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246
Mar 18 15:05:48 z kernel: [    2.015312] RBP: ffffa95d41eef778 R08: 0000000000000000 R09: 000000000000032a
Mar 18 15:05:48 z kernel: [    2.015314] R10: ffff9098defd5ef8 R11: 0720072007200720 R12: 0000000000000007
Mar 18 15:05:48 z kernel: [    2.015317] R13: ffffffffc06be800 R14: ffff9098c27eac20 R15: ffff9098c27eac00
Mar 18 15:05:48 z kernel: [    2.015319] FS:  00007ff020a1b8c0(0000) GS:ffff9098ded00000(0000) knlGS:0000000000000000
Mar 18 15:05:48 z kernel: [    2.015322] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 18 15:05:48 z kernel: [    2.015324] CR2: 00007ff020a01cc8 CR3: 0000000401d10000 CR4: 00000000000406e0
Mar 18 15:05:48 z kernel: [    2.015327] Call Trace:
Mar 18 15:05:48 z kernel: [    2.015372]  dce81_create_resource_pool+0x419/0x4b0 [amdgpu]
Mar 18 15:05:48 z kernel: [    2.015416]  dc_create_resource_pool+0xd9/0x180 [amdgpu]
Mar 18 15:05:48 z kernel: [    2.015421]  ? _cond_resched+0x19/0x40
Mar 18 15:05:48 z kernel: [    2.015424]  ? __kmalloc+0x1e7/0x220
Mar 18 15:05:48 z kernel: [    2.015468]  ? dal_gpio_service_create+0xa1/0x120 [amdgpu]
Mar 18 15:05:48 z kernel: [    2.015511]  dc_create+0x244/0x6c0 [amdgpu]
Mar 18 15:05:48 z kernel: [    2.015556]  dm_hw_init+0xf2/0x2a0 [amdgpu]
Mar 18 15:05:48 z kernel: [    2.015588]  amdgpu_device_init+0xd23/0x1620 [amdgpu]
Mar 18 15:05:48 z kernel: [    2.015592]  ? kmalloc_order+0x18/0x40
Mar 18 15:05:48 z kernel: [    2.015595]  ? kmalloc_order_trace+0x24/0xb0
Mar 18 15:05:48 z kernel: [    2.015627]  amdgpu_driver_load_kms+0x8b/0x2e0 [amdgpu]
Mar 18 15:05:48 z kernel: [    2.015648]  drm_dev_register+0x149/0x1d0 [drm]
Mar 18 15:05:48 z kernel: [    2.015681]  amdgpu_pci_probe+0x113/0x150 [amdgpu]
Mar 18 15:05:48 z kernel: [    2.015712]  local_pci_probe+0x47/0xa0
Mar 18 15:05:48 z kernel: [    2.015715]  pci_device_probe+0x145/0x1b0
Mar 18 15:05:48 z kernel: [    2.015720]  driver_probe_device+0x31e/0x490
Mar 18 15:05:48 z kernel: [    2.015723]  __driver_attach+0xa7/0xf0
Mar 18 15:05:48 z kernel: [    2.015727]  ? driver_probe_device+0x490/0x490
Mar 18 15:05:48 z kernel: [    2.015730]  bus_for_each_dev+0x70/0xc0
Mar 18 15:05:48 z kernel: [    2.015733]  driver_attach+0x1e/0x20
Mar 18 15:05:48 z kernel: [    2.015736]  bus_add_driver+0x1c7/0x270
Mar 18 15:05:48 z kernel: [    2.015739]  ? 0xffffffffc0752000
Mar 18 15:05:48 z kernel: [    2.015742]  driver_register+0x60/0xe0
Mar 18 15:05:48 z kernel: [    2.015745]  ? 0xffffffffc0752000
Mar 18 15:05:48 z kernel: [    2.015749]  __pci_register_driver+0x5a/0x60
Mar 18 15:05:48 z kernel: [    2.015790]  amdgpu_init+0x96/0xa9 [amdgpu]
Mar 18 15:05:48 z kernel: [    2.015795]  do_one_initcall+0x52/0x191
Mar 18 15:05:48 z kernel: [    2.015798]  ? __vunmap+0x81/0xb0
Mar 18 15:05:48 z kernel: [    2.015801]  ? _cond_resched+0x19/0x40
Mar 18 15:05:48 z kernel: [    2.015803]  ? kmem_cache_alloc_trace+0xa6/0x1b0
Mar 18 15:05:48 z kernel: [    2.015807]  ? do_init_module+0x27/0x209
Mar 18 15:05:48 z kernel: [    2.015811]  do_init_module+0x5f/0x209
Mar 18 15:05:48 z kernel: [    2.015814]  load_module+0x18ea/0x1ee0
Mar 18 15:05:48 z kernel: [    2.015819]  ? ima_post_read_file+0x96/0xa0
Mar 18 15:05:48 z kernel: [    2.015823]  SYSC_finit_module+0xfc/0x120
Mar 18 15:05:48 z kernel: [    2.015826]  ? SYSC_finit_module+0xfc/0x120
Mar 18 15:05:48 z kernel: [    2.015830]  SyS_finit_module+0xe/0x10
Mar 18 15:05:48 z kernel: [    2.015833]  do_syscall_64+0x73/0x130
Mar 18 15:05:48 z kernel: [    2.015838]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Mar 18 15:05:48 z kernel: [    2.015840] RIP: 0033:0x7ff01f8714d9
Mar 18 15:05:48 z kernel: [    2.015842] RSP: 002b:00007fff8fd68a68 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Mar 18 15:05:48 z kernel: [    2.015846] RAX: ffffffffffffffda RBX: 0000557008501e50 RCX: 00007ff01f8714d9
Mar 18 15:05:48 z kernel: [    2.015849] RDX: 0000000000000000 RSI: 0000557008502190 RDI: 0000000000000015
Mar 18 15:05:48 z kernel: [    2.015851] RBP: 0000557008502190 R08: 0000000000000000 R09: 000000000000001a
Mar 18 15:05:48 z kernel: [    2.015854] R10: 0000000000000015 R11: 0000000000000246 R12: 0000000000000000
Mar 18 15:05:48 z kernel: [    2.015857] R13: 000055700850f150 R14: 0000000000020000 R15: 0000000000000000
Mar 18 15:05:48 z kernel: [    2.015860] Code: 3d e4 1a f1 48 8b 4d a0 48 89 81 68 5f 00 00 b8 01 00 00 00 eb b7 48 c7 c2 69 8d 6e c0 31 f6 48 c7 c7 85 8d 6e c0 e8 36 45 b7 ff <0f> 0b 45 85 e4 44 89 65 b0 0f 95 c0 e9 18 fe ff ff 45 85 e4 44 
Mar 18 15:05:48 z kernel: [    2.015891] ---[ end trace ab3652e9b02f9b75 ]---

The primary display seems to work, but the secondary display no longer seems to be connected. I did not experience the same problem on the Ubuntu 4.13 kernel with DC enabled (using the same boot options). I may have seen some DC output wiring bugs (attempts to connect to the wrong components) and related patches for Kaveri somewhere, but I could not find them again. Any suggestions on how to fix it? The kernel logs are attached.

Hardware: ASUS A88XM-A + A10-7850K

Software: Ubuntu 16.04

The 4.15 kernel comes from the following location (installed using ukuu):
http://kernel.ubuntu.com/~kernel-ppa/mainline/

The 4.13 kernel comes from the Ubuntu package.

[1] https://lists.freedesktop.org/archives/amd-gfx/2017-September/013611.html
[2] https://patchwork.freedesktop.org/patch/196060/

Thanks for your help
Comment 1 Harry Wentland 2018-03-20 13:55:34 UTC
That's only a warning because the audio fuse is unexpected. Should be nothing to worry about.
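For readers who want to check this against their own logs: the trace above is introduced by a "WARNING: CPU: ... at file:line" line, which is what the kernel prints for a WARN, whereas a real oops is normally introduced by "Oops:" or "BUG:". A minimal sketch of that distinction follows; the function name `classify` and the marker list are illustrative assumptions, not part of any kernel tooling, and the sample lines are shortened from the log in the description.

```python
import re

# Markers the kernel prints for each class of event (first match wins).
# This list is an illustrative assumption, not an exhaustive one.
MARKERS = [
    ("oops", re.compile(r"\b(Oops:|BUG:|Kernel panic)")),
    ("warning", re.compile(r"\bWARNING: CPU: \d+ PID: \d+ at \S+")),
]


def classify(lines):
    """Return (kind, line) pairs for every line that matches a marker."""
    found = []
    for line in lines:
        for kind, pattern in MARKERS:
            if pattern.search(line):
                found.append((kind, line.strip()))
                break
    return found


# Lines taken from the report, with the syslog prefix removed.
sample = [
    "[    2.015153] [drm:resource_construct [amdgpu]] *ERROR* DC: unexpected audio fuse!",
    "[    2.015232] WARNING: CPU: 2 PID: 173 at /home/kernel/COD/linux/drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_resource.c:190 resource_construct+0x2aa/0x310 [amdgpu]",
    "[    2.015891] ---[ end trace ab3652e9b02f9b75 ]---",
]

events = classify(sample)
kinds = [kind for kind, _ in events]
print(kinds)
```

This only classifies the log message; it says nothing about whether the missing display is related to the warning.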
Comment 2 freedesktop 2018-03-20 15:30:45 UTC
Thanks for your help! It may only be a warning about the audio fuse, but the loss of functionality is real: no secondary display is detected.

The same problem (same oops and no secondary display) exists with the 4.13 kernel + ROCK/ROCM DKMS (https://github.com/RadeonOpenCompute/ROCm).
Comment 3 Alex Deucher 2018-03-20 15:46:45 UTC
What display connectors are actually on your board?  Can you attach your xorg log?
Comment 4 freedesktop 2018-03-21 09:41:18 UTC
Created attachment 138236 [details]
Xorg log when using the 4.15 kernel

Thank you for your help! Sorry for the delay. The attached Xorg log is from the 4.15 kernel.

The motherboard is an ASUS A88XM-A. Its product page has a picture of the connectors (the direct URL may or may not work):

https://www.asus.com/Motherboards/A88XMA/

Direct link:
https://www.asus.com/websites/global/products/zDRJFj2HfjDphY4C/line.jpg

I have a monitor (AL2623W) connected to DVI-D and a projector (usually turned off) connected to the VGA connector.
Comment 5 freedesktop 2018-03-21 09:48:36 UTC
Created attachment 138237 [details]
Xorg log when running on kernel 4.13 (ubuntu)

Here is the Xorg log when running on kernel 4.13. 

It has the following lines in the log where the outputs are enumerated:

[    26.851] (II) AMDGPU(0): glamor detected, initialising EGL layer.
[    26.851] (II) AMDGPU(0): KMS Pageflipping: enabled
[    26.899] (II) AMDGPU(0): Output DVI-D-0 has no monitor section
[    26.900] (II) AMDGPU(0): Output HDMI-A-0 has no monitor section
[    26.961] (II) AMDGPU(0): Output VGA-0 has no monitor section
[    26.995] (II) AMDGPU(0): EDID for output DVI-D-0

When running on kernel 4.15, the output enumeration log entries look like this:

[    23.921] (II) AMDGPU(0): glamor detected, initialising EGL layer.
[    23.921] (II) AMDGPU(0): KMS Pageflipping: enabled
[    23.921] (II) AMDGPU(0): Output DVI-D-0 has no monitor section
[    23.921] (II) AMDGPU(0): Output HDMI-A-0 has no monitor section
[    23.923] (II) AMDGPU(0): EDID for output DVI-D-0
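The difference between the two excerpts can also be extracted mechanically. The sketch below is only an illustration: it assumes the "has no monitor section" lines are a usable proxy for the enumerated outputs (they appear only for outputs without a Monitor section in the X configuration), and the helper name `enumerated_outputs` is hypothetical.

```python
import re

# Matches lines such as "(II) AMDGPU(0): Output VGA-0 has no monitor section".
OUTPUT_RE = re.compile(r"Output (\S+) has no monitor section")


def enumerated_outputs(log_lines):
    """Collect output names from 'has no monitor section' lines, in order."""
    outputs = []
    for line in log_lines:
        match = OUTPUT_RE.search(line)
        if match and match.group(1) not in outputs:
            outputs.append(match.group(1))
    return outputs


# Excerpts copied from the two Xorg logs quoted above.
log_413 = [
    "[    26.899] (II) AMDGPU(0): Output DVI-D-0 has no monitor section",
    "[    26.900] (II) AMDGPU(0): Output HDMI-A-0 has no monitor section",
    "[    26.961] (II) AMDGPU(0): Output VGA-0 has no monitor section",
]
log_415 = [
    "[    23.921] (II) AMDGPU(0): Output DVI-D-0 has no monitor section",
    "[    23.921] (II) AMDGPU(0): Output HDMI-A-0 has no monitor section",
]

seen_on_415 = enumerated_outputs(log_415)
missing = [name for name in enumerated_outputs(log_413) if name not in seen_on_415]
print(missing)  # ['VGA-0']
```

With these excerpts, the only output that disappears between 4.13 and 4.15 is VGA-0, which matches the connector the projector is attached to.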


Thanks for your help!
Comment 6 Michael Lange 2018-04-12 17:08:58 UTC
I am running into the same problem, with the same error in dmesg.
I have two monitors (AOG AG272FCX), one connected via HDMI to the discrete GPU (HD 8800) and the second one connected via HDMI to the integrated GPU (KAVERI).
On Windows 10 both work as expected, but on Linux only the first monitor (connected to the HD 8800) works.

Mainboard: Gigabyte G1.Sniper A88X-CF
CPU: A10-7850K

Linux: 4.16.2

The dmesg and Xorg.0.log are attached.
Comment 7 Michael Lange 2018-04-12 17:11:26 UTC
Created attachment 138811 [details]
kernel-4.16.2  dmesg
Comment 8 Michael Lange 2018-04-12 17:13:44 UTC
Created attachment 138812 [details]
kernel-4.16.2  Xorg.0.log
Comment 9 freedesktop 2018-05-06 17:19:36 UTC
Created attachment 139396 [details]
EL7 ROCM 1.8 dmesg

The ROCM 1.8 beta amdgpu DKMS module produces the same error message with Kaveri.
Comment 10 freedesktop 2018-05-06 17:20:14 UTC
Created attachment 139397 [details]
EL7 ROCM 1.8 Xorg.0.log
Comment 11 freedesktop 2018-05-06 17:34:23 UTC
None of the display outputs (DVI, VGA) work on EL7 + ROCM 1.8 beta. At least on 4.15 the DVI output worked. It seems that this issue is not going to be fixed; instead, DC support will simply no longer be the default for Kaveri:

Don't default to DC support for Kaveri and older:
https://lists.freedesktop.org/archives/amd-gfx/2018-May/022021.html
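To see whether DC is requested on a given system, the amdgpu module's `dc` parameter can be read back. The sketch below assumes the parameter is exposed at /sys/module/amdgpu/parameters/dc and that its values mean 1 = enable, 0 = disable, -1 = auto; neither detail is stated in this report, so verify them against your kernel. The helper names are hypothetical.

```python
from pathlib import Path

# Assumed meaning of the amdgpu "dc" module parameter values.
DC_MEANINGS = {-1: "auto (driver decides)", 0: "disabled", 1: "enabled"}


def parse_dc_value(raw):
    """Turn the raw text of the parameter file into (value, meaning)."""
    value = int(raw.strip())
    return value, DC_MEANINGS.get(value, "unknown")


def read_dc_setting(param_path="/sys/module/amdgpu/parameters/dc"):
    """Return (value, meaning), or None if the parameter file is absent.

    The default path is an assumption; it exists only while the amdgpu
    module is loaded.
    """
    path = Path(param_path)
    if not path.is_file():
        return None
    return parse_dc_value(path.read_text())


print(parse_dc_value("0\n"))  # (0, 'disabled')
print(read_dc_setting())      # None when amdgpu is not loaded
```

Note that "auto" reports what was requested, not what the driver ended up doing for a particular ASIC, so the dmesg output remains the authoritative source.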
Comment 12 Martin Peres 2019-11-19 08:32:29 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/324.

