Bug 100666 - amdgpu coolers never stopping on Linux
Status: RESOLVED MOVED
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: All Linux (All)
Importance: high major
Assignee: Default DRI bug account
Reported: 2017-04-12 18:21 UTC by Denis Denisov
Modified: 2019-11-19 08:15 UTC
Description Denis Denisov 2017-04-12 18:21:00 UTC
* Ubuntu 17.04, Kernel-4.10
* Gigabyte Radeon™ RX 480 WINDFORCE 8G rev1.0
* http://www.gigabyte.com/Graphics-Card/GV-RX480WF2-8GD-rev-10
* https://gist.github.com/anonymous/2e8964de6e8bf37d3a3b52dc7d213078

1. On Windows 10 and macOS the coolers can be parked at 0 rpm.
   With amdgpu, "pwm1_enable" works only in mode 1 (manual), and, interestingly, the rotation speed then stays static.

   How hard can it be to implement?

   An initial implementation with support for throttling the cooler already exists. What is missing is reading the VBIOS settings from EFI so that the cooler stops automatically at idle.
   With the EFI VBIOS, the macOS and Windows drivers detect that the coolers support a silent 0% fan mode when the card is idle, but the Linux amdgpu driver does not.

   A lazy workaround is to write 0 to /sys/class/drm/card0/device/hwmon/hwmon1/pwm1, but under load that risks burning out the card.
   nv supports cooler speed modes through pwm1_enable:
    * 0=NONE - card controls PWM?
    * 1=MANUAL
    * 2=AUTO - OS controls PWM?

2. When the coolers are parked at 0 rpm, the gauge (lm-sensors) still shows them rotating, although the physical state (LED/fan) is stopped
    # echo 0 | tee /sys/class/drm/card0/device/hwmon/hwmon1/pwm1
    # sensors
    amdgpu-pci-0100
    Adapter: PCI adapter
    fan1:         875 RPM
    temp1:        +38.0°C  (crit =  +0.0°C, hyst =  +0.0°C)

    fan1: 875 RPM ?

$ cat /sys/class/drm/card0/device/power_dpm_force_performance_level
auto

$ cat /sys/class/drm/card0/device/power_dpm_state
performance

$ cat /sys/class/drm/card0/device/hwmon/hwmon1/pwm1
81
$ cat /sys/class/drm/card0/device/hwmon/hwmon1/pwm1_min
0
$ cat /sys/class/drm/card0/device/hwmon/hwmon1/pwm1_max
255
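
For scale, the pwm1 reading above can be expressed as a percentage of pwm1_max. This is only a sketch of the arithmetic; the helper name pwm_to_percent is hypothetical and not part of any driver or tool.

```shell
# Convert a raw hwmon PWM value to a rounded percentage of pwm1_max.
# $1 = raw pwm value, $2 = pwm1_max (defaults to 255).
# The body runs in a subshell so the variables stay local.
pwm_to_percent() (
    pwm=$1
    max=${2:-255}
    # Integer arithmetic; adding max/2 rounds to the nearest percent.
    echo $(( (pwm * 100 + max / 2) / max ))
)

pwm_to_percent 81    # the pwm1 value read above; prints 32
```

So a pwm1 of 81 on the 0-255 scale corresponds to a duty cycle of roughly 32%.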


$ echo 0 | sudo tee /sys/class/drm/card0/device/hwmon/hwmon1/pwm1_enable
$ cat /sys/class/drm/card0/device/hwmon/hwmon1/pwm1_enable
1

$ echo 0 | sudo tee /sys/class/drm/card0/device/hwmon/hwmon1/pwm1_enable
$ cat /sys/class/drm/card0/device/hwmon/hwmon1/pwm1_enable
1

$ echo 2 | sudo tee /sys/class/drm/card0/device/hwmon/hwmon1/pwm1_enable
$ cat /sys/class/drm/card0/device/hwmon/hwmon1/pwm1_enable
1
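
The write-then-read checks above can be wrapped in a small helper that reports when the driver does not keep the requested mode. This is a sketch under assumptions: set_fan_mode is a hypothetical name, and the hwmon directory is passed in rather than hardcoded.

```shell
# Write a fan control mode to pwm1_enable and confirm the driver kept it.
# $1 = hwmon directory, $2 = requested mode (0, 1 or 2).
# Succeeds only if the value read back matches the request.
# The body runs in a subshell, so "exit" only ends the function.
set_fan_mode() (
    file="$1/pwm1_enable"
    want=$2
    # The write can fail (for example when not root); the read-back decides.
    { echo "$want" > "$file"; } 2>/dev/null
    got=$(cat "$file" 2>/dev/null)
    if [ "$got" = "$want" ]; then
        echo "pwm1_enable=$got"
    else
        echo "requested $want but pwm1_enable reads '$got'" >&2
        exit 1
    fi
)

# Example (as root):
#   set_fan_mode /sys/class/drm/card0/device/hwmon/hwmon1 2
```

On the kernel shown here, the transcript suggests such a call would report a mismatch for modes 0 and 2, since pwm1_enable keeps reading 1.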

$ lsb_release -dcr
Description: Ubuntu 17.04
Release: 17.04
Codename: zesty

$ uname -rmv
4.10.0-19-generic #21-Ubuntu SMP Thu Apr 6 17:04:57 UTC 2017 x86_64

$ DRI_PRIME=1 glxinfo |grep string
server glx vendor string: SGI
server glx version string: 1.4
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD POLARIS10 (DRM 3.9.0 /
4.10.0-19-generic, LLVM 4.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.0.3
OpenGL core profile shading language version string: 4.50
OpenGL version string: 3.0 Mesa 17.0.3
OpenGL shading language version string: 1.30
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 17.0.3
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

$ LC_ALL=C dmesg -Tx | grep -E "drm|radeon"
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] Initialized
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] amdgpu kernel
modesetting enabled.
kern  :info  : [Tue Apr 11 00:03:04 2017] fb: switching to amdgpudrmfb
from EFI VGA
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] initializing kernel
modesetting (POLARIS10 0x1002:0x67DF 0x1458:0x22DF 0xC7).
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] register mmio base: 0xEFE00000
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] register mmio size: 262144
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] doorbell mmio base: 0xE0000000
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] doorbell mmio size: 2097152
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] probing gen 2 caps for
device 8086:1901 = 261ad03/e
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] probing mlw for device
8086:1901 = 261ad03
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] UVD is enabled in VM mode
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] VCE enabled in VM mode
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] GPU post is not needed
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] Detected VRAM
RAM=8192M, BAR=256M
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] RAM width 256bits GDDR5
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] amdgpu: 8192M of VRAM
memory ready
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] amdgpu: 8192M of GTT
memory ready.
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] GART: num cpu pages
2097152, num gpu pages 2097152
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] PCIE GART of 8192M
enabled (table at 0x0000000000040000).
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] Supports vblank
timestamp caching Rev 2 (21.10.2013).
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] Driver supports
precise vblank timestamp query.
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] amdgpu: irq initialized.
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] AMDGPU Display Connectors
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] Connector 0:
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   DP-1
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   HPD6
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   DDC: 0x4868 0x4868
0x4869 0x4869 0x486a 0x486a 0x486b 0x486b
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   Encoders:
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]     DFP1: INTERNAL_UNIPHY2
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] Connector 1:
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   DP-2
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   HPD4
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   DDC: 0x4870 0x4870
0x4871 0x4871 0x4872 0x4872 0x4873 0x4873
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   Encoders:
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]     DFP2: INTERNAL_UNIPHY2
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] Connector 2:
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   DP-3
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   HPD1
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   DDC: 0x486c 0x486c
0x486d 0x486d 0x486e 0x486e 0x486f 0x486f
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   Encoders:
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]     DFP3: INTERNAL_UNIPHY1
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] Connector 3:
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   HDMI-A-1
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   HPD5
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   DDC: 0x4874 0x4874
0x4875 0x4875 0x4876 0x4876 0x4877 0x4877
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   Encoders:
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]     DFP4: INTERNAL_UNIPHY1
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] Connector 4:
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   DVI-D-1
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   HPD3
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   DDC: 0x487c 0x487c
0x487d 0x487d 0x487e 0x487e 0x487f 0x487f
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]   Encoders:
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm]     DFP5: INTERNAL_UNIPHY
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] Found UVD firmware
Version: 1.79 Family ID: 16
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] Found VCE firmware
Version: 52.4 Binary ID: 3
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] ring test on 0
succeeded in 15 usecs
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] ring test on 1
succeeded in 28 usecs
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] ring test on 2
succeeded in 28 usecs
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] ring test on 3
succeeded in 13 usecs
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] ring test on 4
succeeded in 13 usecs
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] ring test on 5
succeeded in 13 usecs
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] ring test on 6
succeeded in 14 usecs
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] ring test on 7
succeeded in 13 usecs
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] ring test on 8
succeeded in 13 usecs
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] ring test on 9
succeeded in 6 usecs
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] ring test on 10
succeeded in 6 usecs
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] ring test on 11
succeeded in 1 usecs
kern  :info  : [Tue Apr 11 00:03:04 2017] [drm] UVD initialized successfully.
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] ring test on 12
succeeded in 10 usecs
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] ring test on 13
succeeded in 5 usecs
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] VCE initialized successfully.
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] fb mappable at 0xD136F000
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] vram apper at 0xD0000000
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] size 8294400
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] fb depth is 24
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm]    pitch is 7680
kern  :info  : [Tue Apr 11 00:03:05 2017] fbcon: amdgpudrmfb (fb0) is
primary device
kern  :info  : [Tue Apr 11 00:03:05 2017] amdgpu 0000:01:00.0: fb0:
amdgpudrmfb frame buffer device
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] ib test on ring 0 succeeded
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] ib test on ring 1 succeeded
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] ib test on ring 2 succeeded
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] ib test on ring 3 succeeded
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] ib test on ring 4 succeeded
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] ib test on ring 5 succeeded
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] ib test on ring 6 succeeded
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] ib test on ring 7 succeeded
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] ib test on ring 8 succeeded
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] ib test on ring 9 succeeded
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] ib test on ring 10 succeeded
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] ib test on ring 11 succeeded
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] ib test on ring 12 succeeded
kern  :info  : [Tue Apr 11 00:03:05 2017] [drm] Initialized amdgpu
3.9.0 20150101 for 0000:01:00.0 on minor 0
Comment 1 Sergey Kochneff 2017-08-06 16:37:53 UTC
You should try kernel 4.12. There is some progress with pwm1_enable: now we have not only "1" which is manual control but "0" (full speed) and "2" (this should be FW control, although it doesn’t look so for me: fans are still spinning and pwm1 reads 0 or 122-124 randomly). Still have to use userspace control (mode "1").
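
The three pwm1_enable values described in this comment can be summarized in a small lookup. This is only a sketch of the semantics as reported here for kernel 4.12; describe_fan_mode is a hypothetical helper name.

```shell
# Translate a pwm1_enable value into the meaning reported in this thread.
# $1 = value read from pwm1_enable.
describe_fan_mode() {
    case "$1" in
        0) echo "full speed" ;;
        1) echo "manual (userspace sets pwm1)" ;;
        2) echo "automatic (firmware control)" ;;
        *) echo "unknown mode: $1" ;;
    esac
}

# Example:
#   describe_fan_mode "$(cat /sys/class/drm/card0/device/hwmon/hwmon1/pwm1_enable)"
```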
Comment 2 Denis Denisov 2017-08-06 16:57:52 UTC
(In reply to Sergey Kochneff from comment #1)
> You should try kernel 4.12. There is some progress with pwm1_enable: now we
> have not only "1" which is manual control but "0" (full speed) and "2" (this
> should be FW control, although it doesn’t look so for me: fans are still
> spinning and pwm1 reads 0 or 122-124 randomly). Still have to use userspace
> control (mode "1").

I am using 4.13.0-041300rc3-generic, but I still see no difference between "pwm1_enable=1" and "pwm1_enable=2".
4.13.0-041300rc3-generic defaults to "pwm1_enable=2". I still need to run a daemon that monitors temperature and load and adjusts the fan speed accordingly.

$ lsb_release -drc
Description:	Ubuntu 17.04
Release:	17.04
Codename:	zesty

$ uname -rso
Linux 4.13.0-041300rc3-generic GNU/Linux
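
A userspace daemon of the kind mentioned above boils down to mapping temperature to a PWM duty and writing it periodically. The following is a minimal sketch, not the daemon actually used here: temp_to_pwm and fan_step are hypothetical names, the thresholds are illustrative assumptions rather than vendor values, and temp1_input is assumed to report millidegrees Celsius as is usual for hwmon.

```shell
# Map a GPU temperature to a PWM duty value (0-255).
# $1 = temperature in millidegrees Celsius, as read from temp1_input.
temp_to_pwm() (
    t=$(( $1 / 1000 ))
    idle=45     # at or below this, park the fan (pwm 0)
    full=85     # at or above this, run at full speed (pwm 255)
    floor=60    # lowest duty used once the fan is spinning
    if [ "$t" -le "$idle" ]; then
        echo 0
    elif [ "$t" -ge "$full" ]; then
        echo 255
    else
        # Linear ramp from floor to 255 between the two thresholds.
        echo $(( floor + (t - idle) * (255 - floor) / (full - idle) ))
    fi
)

# One step of a control loop: read the temperature, write the duty.
# $1 = hwmon directory; needs root and pwm1_enable=1 to have any effect.
fan_step() {
    temp=$(cat "$1/temp1_input") || return 1
    temp_to_pwm "$temp" > "$1/pwm1"
}

# Example loop (as root):
#   while sleep 5; do fan_step /sys/class/drm/card0/device/hwmon/hwmon1; done
```

Parking the fan at 0 is only safe if the loop keeps running; as noted in the description, a fan left at pwm1=0 under load risks overheating the card.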
Comment 3 Lucas Riutzel 2017-09-08 04:31:16 UTC
I believe this is related. Might be a separate issue though.

I have an Asus RX550. With the amdgpu driver my fan runs at what I believe is 100% all the time, though fan speed reporting doesn't work.

I can change the pwm1_enable setting between 1 and 2, but it makes no difference to the fan.

$ cat /sys/class/drm/card0/device/hwmon/hwmon0/pwm1
cat: pwm1: No such device

Attempting to set pwm1 results in no change.

$ sensors
amdgpu-pci-2400
Adapter: PCI adapter
fan1:             N/A
temp1:        +45.0°C  (crit =  +0.0°C, hyst =  +0.0°C)


$ lsb_release -drc
Description:	Arch Linux
Release:	rolling
Codename:	n/a

$ uname -rso
Linux 4.12.10-1-ARCH GNU/Linux
Comment 4 Denis Denisov 2017-09-08 04:52:48 UTC
$ ls /sys/class/drm/card*/device/hwmon/hwmon*/pwm*
Comment 5 Lucas Riutzel 2017-09-08 04:59:07 UTC
$ ls /sys/class/drm/card*/device/hwmon/hwmon*/pwm*

/sys/class/drm/card0/device/hwmon/hwmon0/pwm1
/sys/class/drm/card0/device/hwmon/hwmon0/pwm1_enable
/sys/class/drm/card0/device/hwmon/hwmon0/pwm1_max
/sys/class/drm/card0/device/hwmon/hwmon0/pwm1_min
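
Since the hwmon index differs between systems (hwmon1 in the original report, hwmon0 here), scripts can look the directory up instead of hardcoding it. A minimal sketch, with find_pwm_hwmon as a hypothetical helper name:

```shell
# Print the first hwmon directory under a DRM device that exposes pwm1.
# $1 = device directory, e.g. /sys/class/drm/card0/device
find_pwm_hwmon() {
    for dir in "$1"/hwmon/hwmon*; do
        if [ -e "$dir/pwm1" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1    # no hwmon directory with pwm1 was found
}

# Example:
#   hwmon=$(find_pwm_hwmon /sys/class/drm/card0/device) && cat "$hwmon/pwm1_enable"
```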
Comment 6 Dimitrios Liappis 2017-10-22 10:04:27 UTC
Lucas, the Asus RX-550 (and apparently cards from all manufacturers for this chipset) doesn't have any fan control. See my comment here: https://bugs.freedesktop.org/show_bug.cgi?id=97556#c7
Comment 7 Lucas Riutzel 2017-10-23 18:01:04 UTC
Dimitrios,

Good to know. Looks like my short term solution became a long term one.

I unplugged the builtin fan and ziptied on another fan and connected it to a motherboard fan header. I might come back and extend the original fan header to reach the motherboard.
Comment 8 Luke McKee 2018-02-28 01:29:22 UTC
Yeah I documented the workaround too:
https://forum-en.msi.com/index.php?topic=298468.0

The root cause may be how amdgpu handles not being able to read its powerplay settings: motherboard BIOSes (old AMI) don't set up MMIO BARs properly, and Intel submitted a patch to enforce restrictions on memory address regions in UEFI.
Comment 9 Alex Deucher 2018-02-28 02:08:06 UTC
(In reply to Luke McKee from comment #8)
> Yeah I documented the workaround too:
> https://forum-en.msi.com/index.php?topic=298468.0
>

Please stop posting this on every bug report.
Comment 10 Alex Deucher 2018-02-28 02:10:24 UTC
(In reply to Alex Deucher from comment #9)
> (In reply to Luke McKee from comment #8)
> > Yeah I documented the workaround too:
> > https://forum-en.msi.com/index.php?topic=298468.0
> >
> 
> Please stop posting this on every bug report.  That page is confusing and not likely related to any of these.
Comment 11 Luke McKee 2018-02-28 02:36:43 UTC
In this case it was on topic. The link explains how to use fancontrol script from lm_sensors to work around fan control issues. I saw on another ticket when I first posted here that dc=1 fixed the fancontrol issues. Finally I got dc=1 working and still it doesn't resolve the dpm fancontrol issues on my platform.

https://github.com/kobalicek/amdtweak
as root
# ./amdtweak  --card 0 --verbose --extract-bios /tmp/amdbios.bin
fails. The sysfs shows that the powerplay tables are not proper too.

[ 4969.713277] resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000c3fff window]
[ 4969.713283] caller pci_map_rom+0x66/0xf0 mapping multiple BARs
[ 4969.713289] amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff

If it can't read its powerplay table because it can't read the bios, maybe that's why there are all these problems.


 (In reply to Alex Deucher from comment #9)
> 
> Please stop posting this on every bug report.

https://bugs.freedesktop.org/show_bug.cgi?id=100666#c0
Also, when the users above on this ticket grepped their dmesg, it wouldn't have output any powerplay messages, because they grepped for radeon instead of amdgpu.

[   10.124232] amdgpu: [powerplay] 
                failed to send message 309 ret is 254 
[   10.124248] amdgpu: [powerplay] 
                failed to send pre message 14e ret is 254 

Maybe Denis could confirm or deny if this is in his dmesg?
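
A filter along these lines would catch the powerplay lines as well as the drm ones. It is only a sketch; filter_gpu_log is a hypothetical helper name, and it reads from stdin so it works on live dmesg output or a saved log.

```shell
# Keep only kernel log lines that mention drm, amdgpu or powerplay.
# Reads log text on stdin.
filter_gpu_log() {
    grep -E 'drm|amdgpu|powerplay'
}

# Example:
#   LC_ALL=C dmesg -Tx | filter_gpu_log
```

The second line of each powerplay message above carries no prefix, so it would not match on its own; adding grep's -A 1 option prints that continuation line too.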
Comment 12 Luke McKee 2018-02-28 03:00:40 UTC
(In reply to Alex Deucher from comment #10)

> > Please stop posting this on every bug report.  That page is confusing and not likely related to any of these.

You obviously know about this sir.
https://bugs.freedesktop.org/attachment.cgi?id=135739
https://bugs.freedesktop.org/show_bug.cgi?id=98798

A new Intel patch has caused a reversion to the behaviour in this old ticket.
It's using pci_info, not dev_info, now.
Comment 13 Alex Deucher 2018-02-28 03:07:11 UTC
(In reply to Luke McKee from comment #11)
> In this case it was on topic. The link explains how to use fancontrol script
> from lm_sensors to work around fan control issues. I saw on another ticket
> when I first posted here that dc=1 fixed the fancontrol issues. Finally I
> got dc=1 working and still it doesn't resolve the dpm fancontrol issues on
> my platform.

dc and powerplay are largely independent.  It's generally not likely that one will affect the other.  

> 
> https://github.com/kobalicek/amdtweak
> as root
> # ./amdtweak  --card 0 --verbose --extract-bios /tmp/amdbios.bin
> fails. The sysfs shows that the powerplay tables are not proper too.
> 

I'm not familiar with that tool or how it goes about attempting to fetch the vbios.  The driver uses several mechanisms to fetch it depending on the platform.  It's possible that tool does something weird to fetch the vbios and it's possible that tool incorrectly interprets some of the vbios tables.

> [ 4969.713277] resource sanity check: requesting [mem
> 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem
> 0x000c0000-0x000c3fff window]
> [ 4969.713283] caller pci_map_rom+0x66/0xf0 mapping multiple BARs
> [ 4969.713289] amdgpu 0000:01:00.0: Invalid PCI ROM header signature:
> expecting 0xaa55, got 0xffff

This last message is from the pci subsystem and is harmless.  If the driver were not able to load the vbios, it would fail to load.

> 
> If it can't read it's powerplay table because it can't read the bios maybe
> that's why there is all these problems.

The driver is able to load the vbios image just fine.  If it wasn't able to, or if there was a major problem with one of the tables, the driver would fail to load.

> 
> 
>  (In reply to Alex Deucher from comment #9)
> > 
> > Please stop posting this on every bug report.
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=100666#c0
> Also the users above on this ticket above here when they grepped their dmesg
> wouldn't have output any powerplay messages because they grepped radeon
> instead of amdgpu
> 
> [   10.124232] amdgpu: [powerplay] 
>                 failed to send message 309 ret is 254 
> [   10.124248] amdgpu: [powerplay] 
>                 failed to send pre message 14e ret is 254 
> 

There are lots of reasons an smu message might fail.  Just because you see an smu message failure does not mean you are seeing the same issue as someone else.  It's like a GPU hang.  There are lots of potential root causes.
Comment 14 Luke McKee 2018-02-28 04:06:28 UTC
Alex thanks for your help.

How it gets the rom is probably the same as this shell script using the pci method listed on that github link in the last comment.

# To read ROM you first need to write `1` to it, then read it, and then write
# `0` to it as described in the documentation. The reason is that the content
# is not provided by default, by writing `1` to it you are telling the driver
# to make it accessible.
CARD_ID=0
CARD_ROM="/sys/class/drm/card${CARD_ID}/device/rom"
FILE_ROM="amdgpu-rom.bin"

echo 1 > $CARD_ROM
cat $CARD_ROM > $FILE_ROM
echo 0 > $CARD_ROM
echo "Saved as ${FILE_ROM}"

--
output:
cat: /sys/class/drm/card0/device/rom: Input/output error
[Not] Saved as amdgpu-rom.bin
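
The script above prints "Saved as" even when the read fails. A variant that checks the result could look like the sketch below; dump_rom is a hypothetical name, and the enable/read/disable sequence is the same one the script uses.

```shell
# Dump a PCI ROM through sysfs and report failure honestly.
# $1 = sysfs rom file, $2 = output file. Needs root on real hardware.
# The body runs in a subshell, so "exit" only ends the function.
dump_rom() (
    rom=$1
    out=$2
    echo 1 > "$rom" || exit 1    # enable ROM reads
    if cat "$rom" > "$out"; then
        status=0
        echo "Saved as $out"
    else
        status=1
        rm -f "$out"             # do not leave a partial file behind
        echo "Failed to read $rom" >&2
    fi
    echo 0 > "$rom"              # disable ROM reads again
    exit "$status"
)

# Example (as root):
#   dump_rom /sys/class/drm/card0/device/rom amdgpu-rom.bin
```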

Are there any other user-space accessible methods to extract / write the rom in Linux? For now I am only focusing on comparing the pp table to other roms, not modifying it.
Perhaps the powerplay problem is an atom-bios issue. If it's still broken in this 4.16-rc1 version I'm trying out now, I'll open a ticket.

For your reference this is the ticket that claims powerplay dpm is fixed in newer kernels / dc=1 in 4.15.
https://bugs.freedesktop.org/show_bug.cgi?id=100443#c37
Comment 15 Alex Deucher 2018-02-28 04:21:34 UTC
(In reply to Luke McKee from comment #14)
> Alex thanks for your help.
> 
> How it gets the rom is probably the same as this shell script using the pci
> method listed on that github link in the last comment.
> 
> # To read ROM you first need to write `1` to it, then read it, and then write
> # `0` to it as described in the documentation. The reason is that the content
> # is not provided by default, by writing `1` to it you are telling the driver
> # to make it accessible.
> CARD_ID=0
> CARD_ROM="/sys/class/drm/card${CARD_ID}/device/rom"
> FILE_ROM="amdgpu-rom.bin"
> 
> echo 1 > $CARD_ROM
> cat $CARD_ROM > $FILE_ROM
> echo 0 > $CARD_ROM
> echo "Saved as ${FILE_ROM}"
> 
> --
> output:
> cat: /sys/class/drm/card0/device/rom: Input/output error
> [Not] Saved as amdgpu-rom.bin

That should generally work for desktop discrete cards.  You need to be root however.

> 
> Is there any other user-space accessible methods to extract / write the rom
> in Linux? Now only focusing on comparing the pp table to other roms not
> modifying it.

You can read the amdgpu_vbios file in debugfs.  That will dump the copy of the vbios that the driver is using.
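
A sketch of dumping that file follows. The path is an assumption: debugfs is taken to be mounted at /sys/kernel/debug, with the file under dri/0 for the first DRM device. dump_vbios is a hypothetical helper name, and root is required.

```shell
# Copy the driver's copy of the VBIOS out of debugfs.
# $1 = source file (assumed default below), $2 = output file.
# The body runs in a subshell, so "exit" only ends the function.
dump_vbios() (
    src=${1:-/sys/kernel/debug/dri/0/amdgpu_vbios}
    out=${2:-amdgpu-vbios.bin}
    [ -r "$src" ] || { echo "cannot read $src" >&2; exit 1; }
    cat "$src" > "$out" && echo "Saved as $out"
)

# Example (as root):
#   dump_vbios
```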

> Maybe the powerplay is an atom-bios issue perhaps. If it's still broken in
> this 4.16-rc1 version I'm trying out now I'll open a ticket.
> 
> For your reference this is the ticket that claims powerplay dpm is fixed in
> newer kernels / dc=1 in 4.15.
> https://bugs.freedesktop.org/show_bug.cgi?id=100443#c37

There's no confirmation that specifically enabling dc fixed it.

Anyway, we are cluttering up this bug with potentially unrelated information.  Please file a new bug for your issue and we can discuss it there.
Comment 16 Martin Peres 2019-11-19 08:15:10 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/152.

