Bug 92974 - Fiji Nano long boot up and long X startup with amdgpu-powerplay enabled
Summary: Fiji Nano long boot up and long X startup with amdgpu-powerplay enabled
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: Other, OS: All
Importance: medium normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-17 02:11 UTC by charlie
Modified: 2019-11-19 08:06 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
Output of dmesg after startup. (22.00 KB, text/plain), 2015-11-17 02:11 UTC, charlie
dmesg Friday Dec. 4, 2015 (30.64 KB, text/plain), 2015-12-05 02:45 UTC, charlie
Kernel config Dec. 4, 2015 (88.04 KB, text/plain), 2015-12-05 02:49 UTC, charlie
disable pcie dpm (580 bytes, patch), 2015-12-21 21:12 UTC, Alex Deucher
disable pcie gen3 switching (641 bytes, patch), 2015-12-21 21:24 UTC, Alex Deucher

Description charlie 2015-11-17 02:11:51 UTC
Created attachment 119727
Output of dmesg after startup.

With AMD Powerplay enabled in the kernel (cloned from http://cgit.freedesktop.org/~agd5f/linux/?h=amdgpu-powerplay), boot text appears as usual until modesetting tries to switch to the monitor's native resolution. At that point the monitor backlight turns off for about 5 seconds, then comes back on and stays on for approximately 2 minutes displaying nothing, as if the computer had locked up; no other boot activity occurs. The Alt+PrintScreen+E/I/U/R (Magic SysRq) key combinations still work during this time. If left alone, the system finishes booting normally after about 2 minutes. dmesg shows these new messages, which I have not noticed when AMD Powerplay is disabled: "Failed to send Previous Message." and ..."[drm] ib test on ring 12 succeeded".

Running "startx" at the console also causes an apparent system freeze of about 2 minutes, after which X starts normally.

To work around the long boot and long X startup, the kernel option "Device Drivers/Graphics support/Direct Rendering Manager/Enable legacy fbdev support for your modesetting driver" must be set to "n". With that setting there is no visible console after the machine boots (the screen stays black), so I have to log in blind. X can also be started blind and then displays normally with no startup delay. Alternatively, once legacy fbdev has been disabled in the kernel options, KDM or a similar display manager can autologin and boot directly into X.

With AMD Powerplay disabled in the kernel there are no startup issues.

Toggling, adding, or removing these kernel boot options in "lilo.conf" has no effect on the bug: "amdgpu.enable_scheduler=0 ; radeon.modeset=1 ; radeon.hw_i2c=1 ; radeon.disp_priority=2 ; radeon.fastfb=1 ; radeon.backlight=1 ; radeon.pcie_gen2=-1 ; radeon.hard_reset=0"

These patches have no effect on the bug (the behavior is the same with or without them applied):
http://people.freedesktop.org/~agd5f/0001-radeonsi-fix-fiji-raster-config.patch
http://people.freedesktop.org/~agd5f/0001-drm-amdgpu-update-Fiji-s-tiling-mode-table.patch

Hardware: Fury Nano
git: http://cgit.freedesktop.org/~agd5f/linux/commit/?h=amdgpu-powerplay&id=256ca780500bc445752763f32b246dc9ee396b62

Sidenote: hang check timer is disabled in the kernel.
Comment 1 Michel Dänzer 2015-11-19 02:55:37 UTC
(In reply to charlie from comment #0)
> With AMD Powerplay disabled in the kernel there are no startup issues.

Do you mean CONFIG_DRM_AMD_POWERPLAY=n in .config or amdgpu.powerplay=0 on the kernel command line?


> Toggling or including/excluding these kernel boot, "lilo.conf" options have
> no effect on the bug: "amdgpu.enable_scheduler=0 ; radeon.modeset=1 ;
> radeon.hw_i2c=1 ; radeon.disp_priority=2 ; radeon.fastfb=1 ;
> radeon.backlight=1 ; radeon.pcie_gen2=-1 ; radeon.hard_reset=0"

radeon.* parameters don't have any effect with the amdgpu driver.
Comment 2 charlie 2015-11-21 03:58:43 UTC
Sorry for the delay in response.  My computer was down.

I mean that with CONFIG_DRM_AMD_POWERPLAY=n in the kernel ".config" file, the computer starts up normally and X also starts normally, with no 2-minute delay for either.

With CONFIG_DRM_AMD_POWERPLAY=y in the kernel .config file, there are 2-minute delays at boot and again after typing "startx".

However, with CONFIG_DRM_AMD_POWERPLAY=y in the kernel, this delay can be bypassed by setting "Device Drivers/Graphics support/Direct Rendering Manager/Enable legacy fbdev support for your modesetting driver" to "n". I don't know the ".config" symbol name for that option; I set it using "make menuconfig". Bypassing the bug this way leaves a blank screen, but I can still log in blind and start X, which displays normally. Alternatively, I can let KDM or SDDM auto-login for me.
Comment 3 Michel Dänzer 2015-11-24 03:42:20 UTC
What if you enable building the PowerPlay code with CONFIG_DRM_AMD_POWERPLAY=y but disable it at runtime with amdgpu.powerplay=0 on the kernel command line? Does the problem occur then or not?
Comment 4 charlie 2015-11-24 22:39:53 UTC
With these options there is no ~2 minute kernel boot or startx delay bug:

CONFIG_DRM_AMD_POWERPLAY=y in kernel ".config"

"Device Drivers/Graphics support/Direct Rendering Manager/Enable legacy fbdev support for your modesetting driver" set to "y" in the kernel "make menuconfig".

"amdgpu.powerplay=0" on the LILO append line.
Comment 5 charlie 2015-12-04 08:58:35 UTC
I recently tried out kernel code up to this version: http://cgit.freedesktop.org/~agd5f/linux/commit/?h=amdgpu-powerplay&id=a7b21a9055f2576df9ff69d52811a047b9399e36

There has been no change; the bug remains.
Comment 6 charlie 2015-12-05 02:45:57 UTC
Created attachment 120362
dmesg Friday Dec. 4, 2015
Comment 7 charlie 2015-12-05 02:47:18 UTC
I recently compiled this kernel version: http://cgit.freedesktop.org/~agd5f/linux/commit/?h=amdgpu-powerplay&id=ab72939ad61cc7c22b03946cac94153a1fa23e43

The bug is unchanged. I'll attach my kernel ".config" as well.
Comment 8 charlie 2015-12-05 02:49:11 UTC
Created attachment 120363
Kernel config Dec. 4, 2015
Comment 9 charlie 2015-12-21 00:12:20 UTC
This commit:

"drm/amdgpu/powerplay/fiji: query supported pcie info from cgs (v2)"

http://cgit.freedesktop.org/~agd5f/linux/commit/?h=amdgpu-powerplay&id=64797db892500bc02f5f15524aca3be97368b7f9

and every commit after it exhibit the long boot bug.

______________________

This commit:

"drm/amdgpu/powerplay/tonga: query supported pcie info from cgs (v2)"

http://cgit.freedesktop.org/~agd5f/linux/commit/?h=amdgpu-powerplay&id=211e86eee365c63028c2f25978b964caa9ad622e

does not have the long boot bug on Fiji Nano.

______________________

Starting from the current head of the branch at "amd/powerplay: don't enable ucode fan control if vbios has no fan table", I ran "git reset --hard HEAD~49" (among other resets) until I eventually reached a build that boots normally.
Comment 10 Alex Deucher 2015-12-21 21:12:10 UTC
Created attachment 120642
disable pcie dpm

Does this patch help?
Comment 11 Alex Deucher 2015-12-21 21:24:02 UTC
Created attachment 120643
disable pcie gen3 switching

Please try this patch independent of the previous one.
Comment 12 charlie 2015-12-21 22:44:55 UTC
Both patches ("disable_gen3.diff" and "fiji_disable_pcie_dpm.diff") work when applied independently of each other. The kernel boots normally and X starts normally.

These commands were issued before each *.diff was applied and compiled:
"git clean -dxf ; git fetch --all ; git reset --hard origin/amdgpu-powerplay"
Comment 13 charlie 2016-06-16 03:50:26 UTC
I'm now using "drm-next-4.8-wip" (from https://cgit.freedesktop.org/~agd5f/linux/). I still require "fiji_disable_pcie_dpm.diff" to overcome the bug. I can't remember whether "disable_gen3.diff" no longer applies cleanly or applies but does not work once the kernel is compiled. In any case, "disable_gen3.diff" is no longer effective.

Is this a bios issue?  If so, then I can upgrade to the latest bios for my motherboard to see if the bug persists without patching the kernel.
Comment 14 Alex Deucher 2016-06-16 16:19:37 UTC
(In reply to charlie from comment #13)
> I'm now using "drm-next-4.8-wip" (from
> https://cgit.freedesktop.org/~agd5f/linux/).  I still require
> "fiji_disable_pcie_dpm.diff" to overcome the bug.  I can't remember if
> "disable_gen3.diff" no longer patches cleanly or does not work once the
> kernel is compiled.  In any case, "disable_gen3.diff" is no longer effective.
> 
> Is this a bios issue?  If so, then I can upgrade to the latest bios for my
> motherboard to see if the bug persists without patching the kernel.

It could be.  Does a new bios help?  We've seen similar issues with certain boards internally.  What CPU/motherboard is this?  If your board has options for configuring the default pcie gen or generic pcie performance options, does changing any of them help?
Comment 15 Alex Deucher 2016-06-16 16:24:06 UTC
(In reply to charlie from comment #13)
> I'm now using "drm-next-4.8-wip" (from
> https://cgit.freedesktop.org/~agd5f/linux/).  I still require
> "fiji_disable_pcie_dpm.diff" to overcome the bug.  I can't remember if
> "disable_gen3.diff" no longer patches cleanly or does not work once the
> kernel is compiled.  In any case, "disable_gen3.diff" is no longer effective.
>

On newer kernels you can configure the supported pcie gen modes via a module option. For example, append:
amdgpu.pcie_gen_cap=0x00030003
to the kernel command line in grub to limit the bus and the card to pcie gen2.
Comment 16 charlie 2016-06-17 06:57:51 UTC
Motherboard: Asus A88X-PRO
APU: AMD A10-7850K (monitor only receiving R9 Nano output)

I will update the bios and see if there are any "new pcie gen or generic pcie performance options".

I'll also report back on using "amdgpu.pcie_gen_cap=0x00030003" in LILO, even though "PCIe 3.0" is printed on the motherboard.
Comment 17 charlie 2016-06-18 11:03:50 UTC
The BIOS was updated from 0801 to 2603, and the machine booted normally with an unpatched kernel. When I restored my previous overclocking settings, the long boot bug returned. While testing a few BIOS parameters, I found that adjusting "NB Configuration--PCIEX16_1" (for example forcing auto(PCIEX16_2), X16, or X8) has no effect on the bug.

"APU Frequency" (the base clock) is the only setting I found that affects the long boot/startx bug. Any base clock value greater than 100 MHz causes the bug. Underclocking the CPU, NB, and RAM has no effect when the base clock is set to 101; the bug remains.

This behavior did not occur before the commit identified earlier in this thread, even though the base clock setting stayed the same from before I first filed this bug until my testing today.

With "amdgpu.pcie_gen_cap=0x00030003" applied through LILO, this machine runs stably at a 104 MHz base clock with system RAM near 2500 MHz, without the long boot/startx bug.
Comment 18 Martin Peres 2019-11-19 08:06:59 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/59.

