Summary: smu7_populate_single_firmware_entry fails to load powerplay firmware.
Product: DRI
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: RESOLVED MOVED
Severity: normal
Priority: medium
Reporter: José Pekkarinen <koalinux>
Assignee: Default DRI bug account <dri-devel>
CC: mpagano, RedGreenBlueDiamond, rene.linder, taijian
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=105760
Description
José Pekkarinen
2018-01-30 08:32:11 UTC
I believe the issue actually shows up earlier. These are the lines before the error:

    Mar 4 22:00:16 bee kernel: [ 35.741939] amdgpu: [powerplay] Failed to notify smc display settings!
    Mar 4 22:00:16 bee laptop-mode[9581]: Failed to re-set power saving mode for wireless card
    Mar 4 22:00:21 bee kernel: [ 40.847050] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs aborting
    Mar 4 22:00:21 bee kernel: [ 40.847078] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing 7424 (len 272, WS 0, PS 4) @ 0x746D
    Mar 4 22:00:21 bee kernel: [ 40.847102] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing 606A (len 70, WS 0, PS 8) @ 0x6090
    Mar 4 22:00:21 bee kernel: [ 40.847117] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu asic init failed
    Mar 4 22:00:21 bee kernel: [ 41.200534] amdgpu 0000:01:00.0: Wait for MC idle timedout !
    Mar 4 22:00:22 bee kernel: [ 41.553970] amdgpu 0000:01:00.0: Wait for MC idle timedout !
    Mar 4 22:00:22 bee kernel: [ 41.563727] [drm] PCIE GART of 256M enabled (table at 0x000000F400040000).
    Mar 4 22:00:22 bee kernel: [ 41.566759] amdgpu: [powerplay] smu not running, upload firmware again
    ...

I believe the error is that the following function tries to notify the SMC about a display even when the VGA device is not tied to any display:

    static int smu7_notify_smc_display(struct pp_hwmgr *hwmgr)
    {
        struct smu7_hwmgr *data = (struct smu7_hwmgr *)(hwmgr->backend);

        if (hwmgr->feature_mask & PP_VBI_TIME_SUPPORT_MASK)
            smum_send_msg_to_smc_with_parameter(hwmgr,
                    (PPSMC_Msg)PPSMC_MSG_SetVBITimeout, data->frame_time_x2);

        return (smum_send_msg_to_smc(hwmgr, (PPSMC_Msg)PPSMC_HasDisplay) == 0) ? 0 : -EINVAL;
    }

Is there any way to check from hwmgr whether a display is attached?

Created attachment 137911 [details]
Initialization output on SMU firmware load.
It seems that autodetection of the firmware loading method fails to pick
a proper default. Setting the firmware load type to SMU in /etc/modprobe.d
fixes the issue for me.
$ DRI_PRIME=1 glxinfo|grep OpenGL
OpenGL vendor string: X.Org
OpenGL renderer string: AMD Radeon (TM) R7 M360 (ICELAND / DRM 3.23.0 / 4.15.7+, LLVM 5.0.1)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.0.0-rc4
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 18.0.0-rc4
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 18.0.0-rc4
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
OpenGL ES profile extensions:
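On the question at the end of the first comment (whether hwmgr can tell that no display is attached): one option would be to send the SMC a "no display" message instead of `PPSMC_HasDisplay` when the display count is zero. The sketch below is a self-contained userspace mock of that idea, not driver code. `struct mock_hwmgr`, its `num_display` field, the message enum and `mock_send_msg_to_smc()` are hypothetical stand-ins. Whether the real `pp_hwmgr` exposes a reliable display count during resume, and whether the SMU7 message set has a matching `PPSMC_NoDisplay` message, are assumptions to verify against the driver source.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SMU message IDs; values are illustrative only. */
enum mock_smc_msg {
	MOCK_MSG_NO_DISPLAY  = 1,
	MOCK_MSG_HAS_DISPLAY = 2,
};

/* Hypothetical, heavily reduced stand-in for struct pp_hwmgr. */
struct mock_hwmgr {
	uint32_t num_display;       /* assumed: number of active displays */
	enum mock_smc_msg last_msg; /* records what was "sent", for inspection */
	int smc_result;             /* value the mock SMC returns (0 = success) */
};

/* Mock of smum_send_msg_to_smc(): records the message, returns the canned result. */
static int mock_send_msg_to_smc(struct mock_hwmgr *hwmgr, enum mock_smc_msg msg)
{
	hwmgr->last_msg = msg;
	return hwmgr->smc_result;
}

/*
 * Sketch of a display-aware variant of smu7_notify_smc_display():
 * only announce a display to the SMC when at least one is active.
 */
static int mock_notify_smc_display(struct mock_hwmgr *hwmgr)
{
	enum mock_smc_msg msg = hwmgr->num_display > 0 ?
		MOCK_MSG_HAS_DISPLAY : MOCK_MSG_NO_DISPLAY;

	return mock_send_msg_to_smc(hwmgr, msg) == 0 ? 0 : -EINVAL;
}
```

The return convention (0 on success, -EINVAL otherwise) is kept identical to the quoted function, so callers would not need to change.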
What kernel are you using? I don't see any way the firmware loading type could be set incorrectly for any chip unless a module parameter was provided. Please attach your full dmesg output. What module parameters are you using?

Created attachment 137923 [details]
Right boot initialization
Unfortunately, it seems it booted properly only once; today it doesn't boot
properly anymore. Here is what I could recover from that boot. The "+" in the
kernel version comes from the following patch, which is unrelated to this bug
and deals with a different issue that stalls my boot:
diff --git a/block/blk-core.c b/block/blk-core.c
index 82b92adf3477..f05714bea9ad 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3650,10 +3650,6 @@ EXPORT_SYMBOL(blk_finish_plug);
*/
void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
{
- /* not support for RQF_PM and ->rpm_status in blk-mq yet */
- if (q->mq_ops)
- return;
-
q->dev = dev;
q->rpm_status = RPM_ACTIVE;
pm_runtime_set_autosuspend_delay(q->dev, -1);
Please don't hesitate to ask for any other output you may need.
Thanks!
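The maintainer's argument above (the load type can only end up wrong if a module parameter overrides it) can be modelled as below. This is a simplified sketch under stated assumptions, not the driver's code. It assumes amdgpu's `fw_load_type` parameter defaults to -1 (auto). It also assumes that, for VI-family chips such as ICELAND, the resolver (believed to be `amdgpu_ucode_get_load_type()`) picks direct loading only for an explicit 0 and SMU loading otherwise. PSP loading and most ASIC families are left out, and all names are hypothetical.

```c
/* Hypothetical mirror of the driver's firmware load types (PSP omitted). */
enum mock_fw_load_type {
	MOCK_FW_LOAD_DIRECT,
	MOCK_FW_LOAD_SMU,
};

/* Hypothetical subset of ASIC families, just enough for the example. */
enum mock_asic_family {
	MOCK_ASIC_CIK, /* older parts: assumed to always use direct loading */
	MOCK_ASIC_VI,  /* ICELAND/TOPAZ, TONGA, FIJI, POLARIS, ... */
};

/*
 * Resolve the module parameter into a load type.
 * fw_load_type: -1 = auto (assumed default), 0 = direct, 1 = SMU.
 * Assumption: on VI parts only an explicit 0 forces direct loading,
 * so "auto" already resolves to SMU loading.
 */
static enum mock_fw_load_type
mock_resolve_load_type(enum mock_asic_family family, int fw_load_type)
{
	switch (family) {
	case MOCK_ASIC_VI:
		return fw_load_type == 0 ? MOCK_FW_LOAD_DIRECT : MOCK_FW_LOAD_SMU;
	case MOCK_ASIC_CIK:
	default:
		return MOCK_FW_LOAD_DIRECT;
	}
}
```

If these assumptions hold, forcing SMU loading through /etc/modprobe.d should change nothing on ICELAND unless some other configuration was passing 0. That is why the active module parameters matter here.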
Created attachment 138467 [details]
backtrace trying drm-4.17

As stated in bug 105760, drm-4.17 didn't make any difference for me. However, setting acpi_backlight=intel_backlight, as mentioned in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1656649, allowed me to boot once without any backtraces using a stable kernel:

$ DRI_PRIME=1 glxinfo
...
OpenGL vendor string: X.Org
OpenGL renderer string: AMD Radeon (TM) R7 M360 (ICELAND / DRM 3.23.0 / 4.15.12+, LLVM 6.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.0.0
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
...

I'll reboot several times to see whether the workaround always works, and let you know.
Thanks!
José.

So far no luck. Using acpi_backlight and removing laptop mode from my service initialisation seems to improve the chances of booting without issues, but eventually the bug turns up.

Possibly related to: https://bugzilla.kernel.org/show_bug.cgi?id=199693

I'm afraid I reproduced this again with kernel 4.17.2, which includes the following commit:

commit 9ca5a2ae4259e7aec8efb0db0f6ec721a6854c54
Merge: bee797529d7c c62ec4610c40
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu May 24 08:49:56 2018 -0700

    Merge tag 'pm-4.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

    Pull power management fix from Rafael Wysocki:

     "Fix a regression from the 4.15 cycle that caused the system suspend and resume overhead to increase on many systems and triggered more serious problems on some of them (Rafael Wysocki)"

    * tag 'pm-4.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
      PM / core: Fix direct_complete handling for devices with no callbacks

I'll attach the boot process with the traceback soon.

Created attachment 140225 [details]
boot process on 4.17.2.
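The `atombios stuck in loop for more than 5secs aborting` error in the first comment's log is raised from `atom_op_jump()`. Below is a simplified model of such a watchdog. It assumes the check works by timing repeated jumps to the same target inside a command table; that is our reading, not something this report confirms. The struct and function names are hypothetical, and time is passed in as milliseconds so the logic can be exercised without a real clock.

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_ATOM_TIMEOUT_MS 5000u /* the "5secs" from the log message */

/* Hypothetical, reduced AtomBIOS interpreter context. */
struct mock_atom_ctx {
	int last_jump;         /* target of the previous taken jump, -1 if none */
	uint64_t last_jump_ms; /* time at which that target was first jumped to */
	bool abort;            /* set once the watchdog fires */
};

/*
 * Called for every taken jump. Jumping to the same target again and again
 * (a polling loop in the table) for longer than the timeout aborts execution;
 * jumping somewhere new restarts the clock. Assumes now_ms never goes backwards.
 */
static void mock_atom_jump(struct mock_atom_ctx *ctx, int target, uint64_t now_ms)
{
	if (ctx->last_jump == target) {
		if (now_ms - ctx->last_jump_ms > MOCK_ATOM_TIMEOUT_MS)
			ctx->abort = true;
	} else {
		ctx->last_jump = target;
		ctx->last_jump_ms = now_ms;
	}
}
```

Under this model, a table that keeps polling hardware that never becomes ready spins on one jump target until the timeout fires. That would be consistent with the `amdgpu asic init failed` line that follows in the log.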
Created attachment 140293 [details]
dri config to work it around.
I spent some more time experimenting with this and found that if DRM is
built into the kernel, as in this config extract, and the amdgpu module is
still blacklisted on the kernel command line:
BOOT_IMAGE=/kernel-genkernel-x86_64-4.17.2+ root=/dev/mapper/bee-root ro dolvm crypt_root=/dev/sda3 real_root=/dev/bee/root resume=/dev/bee/swap iommu=pt intel_iommu=on kvm-intel.nested=1 apparmor=1 zswap.enabled=1 security=apparmor crashkernel=256M modprobe.blacklist=amdgpu
The kernel boots and ignores my evil purposes:
# dmesg|grep amdgpu
[ 0.000000] Command line: BOOT_IMAGE=/kernel-genkernel-x86_64-4.17.2+ root=/dev/mapper/bee-root ro dolvm crypt_root=/dev/sda3 real_root=/dev/bee/root resume=/dev/bee/swap iommu=pt intel_iommu=on kvm-intel.nested=1 apparmor=1 zswap.enabled=1 security=apparmor crashkernel=256M modprobe.blacklist=amdgpu
[ 0.000000] Kernel command line: BOOT_IMAGE=/kernel-genkernel-x86_64-4.17.2+ root=/dev/mapper/bee-root ro dolvm crypt_root=/dev/sda3 real_root=/dev/bee/root resume=/dev/bee/swap iommu=pt intel_iommu=on kvm-intel.nested=1 apparmor=1 zswap.enabled=1 security=apparmor crashkernel=256M modprobe.blacklist=amdgpu
[ 0.755913] [drm] amdgpu kernel modesetting enabled.
[ 0.756304] amdgpu 0000:01:00.0: enabling device (0400 -> 0403)
[ 0.772792] amdgpu 0000:01:00.0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[ 0.773338] amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[ 0.777463] [drm] amdgpu: 2048M of VRAM memory ready
[ 0.778060] [drm] amdgpu: 3072M of GTT memory ready.
[ 0.798170] amdgpu: [powerplay] can't get the mac of 5
[ 0.809732] [drm] Initialized amdgpu 3.25.0 20150101 for 0000:01:00.0 on minor 0
[ 6.112759] amdgpu: [powerplay] VI should always have 2 performance levels
[ 6.144432] amdgpu 0000:01:00.0: GPU pci config reset
[ 22.024297] amdgpu: [powerplay] can't get the mac of 5
resulting in a properly booted machine that can use the AMD GPU:
$ DRI_PRIME=1 glxinfo |grep OpenGL
OpenGL vendor string: X.Org
OpenGL renderer string: AMD Radeon (TM) R7 M360 (ICELAND, DRM 3.25.0, 4.17.2+, LLVM 6.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.1.2
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.1 Mesa 18.1.2
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 18.1.2
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
OpenGL ES profile extensions:
It seems to be a weird workaround, and I don't know whether it rings any
bells for you, but hey, I hope it helps.
José.
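On why `modprobe.blacklist=amdgpu` should not matter for a built-in driver: as far as we understand, the kernel does not act on `modprobe.*` options itself. Userspace module tools (kmod) read them from the kernel command line, and they only influence whether modprobe loads a module. That fits the dmesg output above, where the built-in amdgpu still initializes. The helper below is a hypothetical, simplified sketch of that parsing step, not kmod's implementation. It checks whether a command line blacklists a given module.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: does `cmdline` contain "modprobe.blacklist=<list>"
 * with `module` as one of the comma-separated entries?
 * Simplifications: parameters are split on spaces only, there is no quoting
 * support, and module names are compared literally.
 */
static bool cmdline_blacklists(const char *cmdline, const char *module)
{
	static const char key[] = "modprobe.blacklist=";
	const size_t key_len = sizeof(key) - 1;
	const size_t mod_len = strlen(module);
	const char *p = cmdline;

	while (*p) {
		/* Skip separating spaces, then find the end of this parameter. */
		while (*p == ' ')
			p++;
		const char *end = p + strcspn(p, " ");

		if ((size_t)(end - p) > key_len && strncmp(p, key, key_len) == 0) {
			/* Walk the comma-separated module list. */
			const char *item = p + key_len;
			while (item < end) {
				const char *comma = memchr(item, ',', (size_t)(end - item));
				const char *item_end = comma ? comma : end;

				if ((size_t)(item_end - item) == mod_len &&
				    strncmp(item, module, mod_len) == 0)
					return true;
				item = item_end + 1;
			}
		}
		p = end;
	}
	return false;
}
```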
Oh, and yes: if I don't blacklist the module, the trace happens. That really blows my mind, since it's built in and modprobe shouldn't affect it.
José.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/300.