Created attachment 135050 [details]
dmesg (broken boot)

I am seeing messages like the following during startup:
```
[drm:amdgpu_get_bios [amdgpu]] *ERROR* ACPI VFCT table present but broken (too short #2)
AMD-Vi: Event logged [IO_PAGE_FAULT device=00:00.0 domain=0x0000 address=0x00000000fffc0000 flags=0x0070]
[drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)
amdgpu 0000:01:00.0: Fatal error during GPU init
[TTM] Memory type 2 has not been initialized
```

The first message always appears, while the others are not as easily reproducible. During a boot like this, the second card (AMD Radeon RX 560) fails to come up and is not available to the system.

After a "regular" startup, `dmesg -l err` shows the following messages:
```
[    3.056584] [drm:amdgpu_get_bios [amdgpu]] *ERROR* ACPI VFCT table present but broken (too short #2)
[    6.592964] ACPI Error: [AFN7] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[    6.593020] ACPI Error: Method parse/execution failed \_SB.PCI0.VGA.LCD._BCM, AE_NOT_FOUND (20170531/psparse-550)
[    6.593062] ACPI Error: Evaluating _BCM failed (20170531/video-364)
[    6.593207] ACPI Error: [AFN7] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[    6.593243] ACPI Error: Method parse/execution failed \_SB.PCI0.PB21.VGA.LCD._BCM, AE_NOT_FOUND (20170531/psparse-550)
[    6.593286] ACPI Error: Evaluating _BCM failed (20170531/video-364)
[    6.628143] snd_hda_intel 0000:01:00.1: control 3:0:0:ELD:0 is already present
[    6.631508] snd_hda_intel 0000:01:00.1: control 3:0:0:ELD:0 is already present
[    6.637737] snd_hda_intel 0000:01:00.1: control 3:0:0:ELD:0 is already present
```

Other weird behaviour I notice:
* Hangs of the entire system when I start Steam using `env DRI_PRIME=1 steam` (nothing reacts to commands anymore, including mouse clicks, the power button and the num-lock key, and the mouse cursor moves very sluggishly)
* Crashes of KWin when using Alt+Tab (see below)
* The firmware and GRUB (and Linux, initially) display at 1024x768, while the monitor's native resolution is 2560x1080. After the Linux kernel takes over, the monitor switches back to the native resolution.
* Sometimes the system fails to boot entirely and gets stuck after the "*ERROR* ACPI VFCT table present but broken" error message

I hope someone can guide me in gathering more information about this and, ideally, in getting additional output or a backtrace from the kernel.

Please find the full output of dmesg, lshw, lspci and glxinfo attached. Output taken after a "broken" boot, with the AMD Radeon RX 560 not coming up, is suffixed with "broken-boot", while output taken from a system that came up more completely is suffixed with "regular-boot".

I run Gentoo Linux with the following software:
* Linux 4.13.8
* Mesa 17.2.3
* LLVM 5.0.0

I have two graphics cards plugged in:
* AMD Radeon R7 / AMD A10-7800
* AMD Radeon RX 560

The monitor is connected via DisplayPort to the first card (R7).

If more information would be helpful, please tell me how to obtain it and I will try.

See-Also: https://bugs.freedesktop.org/show_bug.cgi?id=103234
Created attachment 135051 [details] dmesg (regular boot)
Created attachment 135052 [details] lshw (broken boot)
Created attachment 135053 [details] lshw (regular boot)
Created attachment 135054 [details] lspci (broken boot)
Created attachment 135055 [details] lspci (regular boot)
Created attachment 135056 [details] glxinfo (broken boot)
Created attachment 135057 [details] glxinfo (regular boot)
Created attachment 135058 [details] glxinfo (regular boot, DRI_PRIME=1)
(In reply to Dennis Schridde from comment #0)
> During a boot like this, the second card (AMD Radeon RX 560) fails to come up
> and is not available to the system.

That's actually because of:

[drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 test failed (scratch(0xC040)=0xCAFEDEAD)
[drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -22
amdgpu 0000:01:00.0: amdgpu_init failed

the other messages are probably mostly harmless / not directly related to this problem.

> * Hangs of the entire system when I start Steam using `env DRI_PRIME=1
> steam` (nothing reacts to commands anymore, including mouse clicks, the power
> button and the num-lock key, and the mouse cursor moves very sluggishly)

That's probably related to the above.

> * The firmware and GRUB (and Linux, initially) display at 1024x768, while
> the monitor's native resolution is 2560x1080.

That's a motherboard firmware / video card ROM issue, nothing to do with the Linux kernel / drivers.
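For anyone comparing the attached "broken-boot" and "regular-boot" logs, the amdgpu `*ERROR*` lines quoted above can be pulled out of a saved dmesg file with a small filter. This is only a minimal sketch and not something posted in the original report: the function name `find_amdgpu_errors` and the regular expression are illustrative assumptions, and the pattern only covers the `[drm:<function> [amdgpu]] *ERROR* ...` line shape seen in this thread.

```python
import re

# Matches kernel log lines of the shape seen in this report, e.g.
#   [    3.056584] [drm:amdgpu_get_bios [amdgpu]] *ERROR* ACPI VFCT table ...
# The leading timestamp is optional because some quoted excerpts omit it.
# Assumption: this pattern is illustrative, not an official log format spec.
DRM_ERROR = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s*)?"
    r"\[drm:(?P<func>\w+) \[amdgpu\]\] \*ERROR\* (?P<msg>.*)$"
)


def find_amdgpu_errors(log_text):
    """Return (function, message) pairs for each amdgpu *ERROR* line.

    Lines without the "[drm:... [amdgpu]] *ERROR*" prefix, such as
    "amdgpu 0000:01:00.0: amdgpu_init failed", are deliberately skipped.
    """
    found = []
    for line in log_text.splitlines():
        match = DRM_ERROR.match(line.strip())
        if match:
            found.append((match.group("func"), match.group("msg")))
    return found


if __name__ == "__main__":
    # Sample lines copied from the messages quoted in this thread.
    sample = (
        "[drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 "
        "test failed (scratch(0xC040)=0xCAFEDEAD)\n"
        "[drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block "
        "<gfx_v8_0> failed -22\n"
        "amdgpu 0000:01:00.0: amdgpu_init failed\n"
    )
    for func, msg in find_amdgpu_errors(sample):
        print(f"{func}: {msg}")
```

Running it over both attached dmesg files and comparing the results should show which errors, such as the ring test failure, appear only in the broken boot.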
(In reply to Michel Dänzer from comment #9)
> (In reply to Dennis Schridde from comment #0)
> > During a boot like this, the second card (AMD Radeon RX 560) fails to come up
> > and is not available to the system.
>
> That's actually because of:
>
> [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 test failed
> (scratch(0xC040)=0xCAFEDEAD)
> [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0>
> failed -22
> amdgpu 0000:01:00.0: amdgpu_init failed
>
> the other messages are probably mostly harmless / not directly related to
> this problem.
>
> > * Hangs of the entire system when I start Steam using `env DRI_PRIME=1
> > steam` (nothing reacts to commands anymore, including mouse clicks, the power
> > button and the num-lock key, and the mouse cursor moves very sluggishly)
>
> That's probably related to the above.

Clarification / more information: The RX 560 goes missing for a few boots *after* the full system hang. That is, first I run Steam with DRI_PRIME=1 and click around for a bit until the system hangs (sometimes waiting alone seems to be enough, though). Then I hard-reset the system, and once Linux has started the RX 560 is missing. I reboot, the RX 560 is still missing, ... (loop for a few iterations) ..., and finally the RX 560 is back and we have a "regular boot".

> > * The firmware and GRUB (and Linux, initially) display at 1024x768, while
> > the monitor's native resolution is 2560x1080.
>
> That's a motherboard firmware / video card ROM issue, nothing to do with the
> Linux kernel / drivers.

One more bit of information (though probably still unrelated): The resolution is correct when I do not plug in the RX 560, or when I connect the display to the RX 560 directly (and set up the mainboard firmware to use the dGPU as primary video adapter).
One more thing: It appears as if the first "cold" boot usually fails: the kernel hangs after "[drm:amdgpu_get_bios [amdgpu]] *ERROR* ACPI VFCT table present but broken (too short #2)" and the blinking cursor freezes. A "cold" boot here means booting the system after it was powered off. The next boot (after hard-resetting the machine) usually succeeds. I write "usually" because sometimes the first boot already succeeds, and sometimes it takes two hard resets to bring up the machine successfully, but I cannot yet make out a pattern. I will add the verbose and debug kernel command line arguments to hopefully get some more information the next time it happens.
P.S. If you have ANY hint on how to gather more information about this issue, I would be most grateful. Maybe there is some way to make the kernel dump stack traces somewhere, or to make the hardware itself dump information someplace?
Created attachment 135165 [details] Linux 4.13.10-gentoo config
Created attachment 135171 [details]
photo of screen when kernel hangs

Please find attached a photo of the screen contents, with "verbose debug" in the kernel command line, at the time when the kernel hangs.

If you have a tip on how to get the full log in text form, even though the kernel hangs, that would be great and might give us additional information on what is actually happening.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/249.