Bug 97403 - AMDGPU/Iceland Strange warnings on drm-next-4.9-wip
Summary: AMDGPU/Iceland Strange warnings on drm-next-4.9-wip
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2016-08-18 22:10 UTC by Armin K
Modified: 2019-11-19 08:09 UTC



Attachments
tool (1.47 MB, application/x-executable)
2016-11-14 09:44 UTC, rezhu
fix patch (1.73 KB, patch)
2016-11-15 09:09 UTC, rezhu

Description Armin K 2016-08-18 22:10:44 UTC
The drm-next-4.9-wip branch should have proper DPM support for Iceland. However, it doesn't seem to work: I can't manually force "high" performance by issuing:

# echo high > /sys/class/drm/card1/device/power_dpm_force_performance_level 
bash: echo: write error: Invalid argument
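The same write can be scripted so that the current level is checked first and the errno is reported by name. A minimal sketch, assuming the card1 path from this report and that this file accepts only "auto", "low" and "high" on 4.9-era amdgpu (an assumption; the accepted set varies between kernel versions). It also assumes, from our reading of amdgpu's hybrid-graphics (PX) handling and not from anything confirmed in this report, that a read-back of "off" (as in the pp_* dump below) means the dGPU is runtime-suspended, in which case writes are rejected with EINVAL. The helper names are hypothetical.

```python
import errno

# Assumption: levels accepted by this sysfs file on 4.9-era amdgpu.
ASSUMED_LEVELS = {"auto", "low", "high"}

def read_level(path):
    """Return the current forced performance level, e.g. "auto" or "off"."""
    with open(path) as f:
        return f.read().strip()

def force_level(path, level):
    """Try to write `level`; return None on success, else a short reason."""
    if level not in ASSUMED_LEVELS:
        raise ValueError(f"unexpected level {level!r}")
    # Assumption: "off" means the PX dGPU is runtime-suspended, in which
    # case the driver rejects writes, so don't bother trying.
    if read_level(path) == "off":
        return "GPU is powered down"
    try:
        # The write(2) error (e.g. EINVAL) may only surface when Python
        # flushes the buffer on close, which still happens inside this try.
        with open(path, "w") as f:
            f.write(level + "\n")
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e))
    return None
```

Run as root with `force_level("/sys/class/drm/card1/device/power_dpm_force_performance_level", "high")`; a result of `"EINVAL"` corresponds to the `Invalid argument` error shown above.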

dmesg contains one odd message:

[   20.998755] VI should always have 2 performance levels

Full dmesg output related to amdgpu:

[   15.084442] [drm] amdgpu kernel modesetting enabled.
[   15.084452] vga_switcheroo: detected switching method \_SB_.PCI0.GFX0.ATPX handle
[   15.084530] ATPX version 1, functions 0x00000003
[   15.084576] ATPX Hybrid Graphics
[   15.391373] CRAT table not found
[   15.391375] Finished initializing topology ret=0
[   15.391388] kfd kfd: Initialized module
[   15.391676] amdgpu 0000:01:00.0: enabling device (0006 -> 0007)
[   15.391869] [drm] initializing kernel modesetting (TOPAZ 0x1002:0x6900 0x103C:0x811C 0x83).
[   15.391879] [drm] register mmio base: 0xE2000000
[   15.391879] [drm] register mmio size: 262144
[   15.391883] [drm] doorbell mmio base: 0xE0000000
[   15.391884] [drm] doorbell mmio size: 2097152
[   15.391891] [drm] probing gen 2 caps for device 8086:9d10 = 1724843/e
[   15.391892] [drm] probing mlw for device 8086:9d10 = 1724843
[   15.391903] vga_switcheroo: enabled
[   15.395046] ATOM BIOS: HP/Quanta
[   15.395060] [drm] GPU not posted. posting now...
[   15.398467] [drm] Changing default dispclk from 0Mhz to 600Mhz
[   15.450307] iwlwifi 0000:03:00.0: loaded firmware version 22.361476.0 op_mode iwlmvm
[   15.549634] [TTM] Zone  kernel: Available graphics memory: 4027938 kiB
[   15.549636] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[   15.549637] [TTM] Initializing pool allocator
[   15.549640] [TTM] Initializing DMA pool allocator
[   15.549655] amdgpu 0000:01:00.0: VRAM: 2048M 0x0000000000000000 - 0x000000007FFFFFFF (2048M used)
[   15.549657] amdgpu 0000:01:00.0: GTT: 3933M 0x0000000080000000 - 0x0000000175D887FF
[   15.549658] [drm] Detected VRAM RAM=2048M, BAR=256M
[   15.549659] [drm] RAM width 64bits DDR3
[   15.549669] [drm] amdgpu: 2048M of VRAM memory ready
[   15.549669] [drm] amdgpu: 3933M of GTT memory ready.
[   15.549681] [drm] GART: num cpu pages 1006984, num gpu pages 1006984
[   15.550554] [drm] PCIE GART of 3933M enabled (table at 0x0000000000040000).
[   15.550580] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   15.550580] [drm] Driver supports precise vblank timestamp query.
[   15.550614] amdgpu 0000:01:00.0: amdgpu: using MSI.
[   15.550639] [drm] amdgpu: irq initialized.
[   15.559808] amdgpu: powerplay initialized
[   15.686674] amdgpu 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000010, cpu addr 0xffff880231108010
[   15.686788] amdgpu 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000020, cpu addr 0xffff880231108020
[   15.686831] amdgpu 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000030, cpu addr 0xffff880231108030
[   15.686863] amdgpu 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000040, cpu addr 0xffff880231108040
[   15.686881] amdgpu 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000050, cpu addr 0xffff880231108050
[   15.686897] amdgpu 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000080000060, cpu addr 0xffff880231108060
[   15.686915] amdgpu 0000:01:00.0: fence driver on ring 6 use gpu addr 0x0000000080000070, cpu addr 0xffff880231108070
[   15.686930] amdgpu 0000:01:00.0: fence driver on ring 7 use gpu addr 0x0000000080000080, cpu addr 0xffff880231108080
[   15.686947] amdgpu 0000:01:00.0: fence driver on ring 8 use gpu addr 0x0000000080000090, cpu addr 0xffff880231108090
[   15.773983] amdgpu 0000:01:00.0: fence driver on ring 9 use gpu addr 0x00000000800000a0, cpu addr 0xffff8802311080a0
[   15.774017] amdgpu 0000:01:00.0: fence driver on ring 10 use gpu addr 0x00000000800000b0, cpu addr 0xffff8802311080b0
[   15.835172] [drm] ring test on 0 succeeded in 10 usecs
[   15.835361] [drm] ring test on 1 succeeded in 14 usecs
[   15.835383] [drm] ring test on 2 succeeded in 11 usecs
[   15.835389] [drm] ring test on 3 succeeded in 2 usecs
[   15.835392] [drm] ring test on 4 succeeded in 1 usecs
[   15.835397] [drm] ring test on 5 succeeded in 2 usecs
[   15.835403] [drm] ring test on 6 succeeded in 2 usecs
[   15.835408] [drm] ring test on 7 succeeded in 2 usecs
[   15.835411] [drm] ring test on 8 succeeded in 1 usecs
[   15.835445] [drm] ring test on 9 succeeded in 4 usecs
[   15.835449] [drm] ring test on 10 succeeded in 3 usecs
[   15.835649] [drm] ib test on ring 0 succeeded
[   15.835829] [drm] ib test on ring 1 succeeded
[   15.835939] [drm] ib test on ring 2 succeeded
[   15.836043] [drm] ib test on ring 3 succeeded
[   15.836153] [drm] ib test on ring 4 succeeded
[   15.836256] [drm] ib test on ring 5 succeeded
[   15.836274] [drm] ib test on ring 6 succeeded
[   15.836291] [drm] ib test on ring 7 succeeded
[   15.836307] [drm] ib test on ring 8 succeeded
[   15.836321] [drm] ib test on ring 9 succeeded
[   15.836333] [drm] ib test on ring 10 succeeded
[   15.838598] [drm] Initialized amdgpu 3.3.0 20150101 for 0000:01:00.0 on minor 1
[   20.998755] VI should always have 2 performance levels

Contents of various pp_* files from /sys/class/drm/card1/device:

# cat power_dpm_force_performance_level 
off

# cat power_dpm_state 
performance

# cat pp_cur_state 
0

# cat pp_dpm_mclk 
0: 300Mhz 
1: 600Mhz 
2: 1000Mhz

# cat pp_dpm_pcie
0: 2.5GB, x8 
1: 2.5GB, x8 
2: 8.0GB, x16 
3: 8.0GB, x16 
4: 8.0GB, x16 
5: 8.0GB, x16

# cat pp_dpm_sclk
0: 300Mhz 
1: 551Mhz 
2: 678Mhz 
3: 754Mhz 
4: 810Mhz 
5: 867Mhz 
6: 943Mhz 
7: 1021Mhz

# cat pp_force_state
(no output; the file is empty)

# cat pp_mclk_od
0

# cat pp_num_states
states: 3
0 boot
1 performance
2 battery

# cat pp_sclk_od
0
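To compare the pp_* listings above across kernels or runs, a small parser helps. A minimal sketch, assuming the line formats shown above (`N: value` for the pp_dpm_* files, `states: N` followed by `index name` for pp_num_states). amdgpu usually marks the active DPM level with a trailing `*`, but none appears in this output, so the parser treats the marker as optional. The function names are hypothetical.

```python
import re

# Matches lines like "2: 678Mhz " or "0: 2.5GB, x8", with an optional
# trailing "*" marking the active level.
_LEVEL_RE = re.compile(r"^\s*(\d+):\s*(.*?)\s*(\*)?\s*$")

def parse_dpm_levels(text):
    """Parse pp_dpm_sclk / pp_dpm_mclk / pp_dpm_pcie output.

    Returns a list of (index, value, is_active) tuples.
    """
    levels = []
    for line in text.splitlines():
        m = _LEVEL_RE.match(line)
        if m:
            levels.append((int(m.group(1)), m.group(2), m.group(3) == "*"))
    return levels

def to_mhz(value):
    """Convert a clock value such as "1021Mhz" to an int number of MHz."""
    m = re.fullmatch(r"(\d+)\s*[Mm][Hh]z", value)
    if not m:
        raise ValueError(f"not a clock value: {value!r}")
    return int(m.group(1))

def parse_num_states(text):
    """Parse pp_num_states output into (count, {index: name})."""
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    count = int(lines[0].split(":", 1)[1])
    states = {}
    for line in lines[1:]:
        idx, name = line.split(None, 1)
        states[int(idx)] = name
    return count, states
```

For example, feeding it the pp_dpm_sclk dump above yields eight levels from 300 to 1021 MHz, none flagged as active.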
Comment 1 Armin K 2016-08-18 22:15:31 UTC
Correction: DPM does seem to work, as indicated by the performance increase in The Talos Principle (17 FPS on 4.8 vs. 44 FPS on 4.9), but rendering is broken.

Still, the message about VI performance levels is a bit confusing. I assume HP's ATOM BIOS defines different performance levels. It would be nice if that were fixed.
Comment 2 rezhu 2016-11-14 09:44:16 UTC
Created attachment 127964
tool

Save the ATOM BIOS with:
atiflash -s 0 file.name
Comment 3 rezhu 2016-11-14 09:45:46 UTC
Hi Armin K,

Could you attach HP's ATOM BIOS so I can test on my end?
Comment 4 Armin K 2016-11-14 15:31:09 UTC
It says "Adapter not found."
Comment 5 rezhu 2016-11-15 05:23:13 UTC
Sorry, maybe the adapter number is not 0 on your machine.
Can you try the command:
./atiflash -ai

On my end, the result is:
/home# ./atiflash -ai
Adapter  0    (BN=01, DN=00, PCIID=69011002, SSID=01341002)
Asic Family        :  Iceland        
 
If you can get the adapter number,
then try to save the ATOM BIOS with:
./atiflash -s number file.name
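Picking the adapter index for `-s` can be scripted from the `-ai` listing. A minimal sketch, assuming the output format shown in this thread (one `Adapter N (key=value, ...)` line per card); only the `-ai` and `-s` usage quoted above is assumed, no other atiflash options. The PCIID and SSID fields appear to hold the device ID followed by the vendor ID: the reporter's later `PCIID=69001002, SSID=811C103C` matches the `0x1002:0x6900 0x103C:0x811C` IDs in the dmesg output. The helper names are hypothetical.

```python
import re

# Matches adapter header lines such as:
# "Adapter  0    (BN=01, DN=00, PCIID=69001002, SSID=811C103C)"
_ADAPTER_RE = re.compile(r"^\s*Adapter\s+(\d+)\s+\((.*)\)\s*$")

def parse_adapters(text):
    """Return a list of dicts, one per adapter listed by `atiflash -ai`."""
    adapters = []
    for line in text.splitlines():
        m = _ADAPTER_RE.match(line)
        if not m:
            continue
        fields = dict(kv.strip().split("=", 1) for kv in m.group(2).split(","))
        fields["index"] = int(m.group(1))
        adapters.append(fields)
    return adapters

def split_id(value):
    """Split an 8-hex-digit ID into (vendor, device).

    The high 16 bits appear to be the device and the low 16 bits the
    vendor, e.g. 69001002 -> ("1002", "6900").
    """
    return value[4:].lower(), value[:4].lower()

def save_rom_argv(adapter, filename):
    """Build the `atiflash -s <adapter> <file>` command line used above."""
    return ["./atiflash", "-s", str(adapter["index"]), filename]
```

The resulting argument list can be passed to `subprocess.run` as root once the right adapter has been identified.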
Comment 6 rezhu 2016-11-15 09:09:35 UTC
Created attachment 127977
fix patch

The attached patch is for the warning message.
Please help verify it.

Thanks.
Comment 7 rezhu 2016-11-15 09:12:56 UTC
While running The Talos Principle, can you dump the PM info with this command:

cat /sys/kernel/debug/dri/64/amdgpu_pm_info
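A single `cat` captures one instant; while the game is running it can help to take several snapshots. A minimal sketch, assuming root access to debugfs and the `dri/64` path from the comment above (the minor number may differ on other systems). The function name is hypothetical.

```python
import time

def sample_pm_info(path, count, interval_s):
    """Read a debugfs file `count` times, `interval_s` seconds apart.

    Returns a list of (monotonic_timestamp, contents) tuples.
    """
    samples = []
    for i in range(count):
        with open(path) as f:
            samples.append((time.monotonic(), f.read()))
        # Sleep between reads, but not after the last one.
        if i + 1 < count:
            time.sleep(interval_s)
    return samples
```

For example, `sample_pm_info("/sys/kernel/debug/dri/64/amdgpu_pm_info", 10, 1.0)` collects ten one-second snapshots.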
Comment 8 Armin K 2016-11-15 16:53:30 UTC
(In reply to rezhu from comment #5)
> sorry, maybe the adaptor number is not 0 in your machine.
> can you just try the command:
> ./atiflash -ai
> 
> on my end: the result is 
> /home# ./atiflash -ai
> Adapter  0    (BN=01, DN=00, PCIID=69011002, SSID=01341002)
> Asic Family        :  Iceland        
>  
> if you can get the adapter number,
> then try to save the atom bios by
> ./atiflash -s number file.name

Hm, I'm currently on 4.8.6; does that matter? 4.9 is unusable right now, see bug 98417.
Comment 9 Armin K 2016-11-15 16:56:45 UTC
(In reply to rezhu from comment #5)
> sorry, maybe the adaptor number is not 0 in your machine.
> can you just try the command:
> ./atiflash -ai
> 
> on my end: the result is 
> /home# ./atiflash -ai
> Adapter  0    (BN=01, DN=00, PCIID=69011002, SSID=01341002)
> Asic Family        :  Iceland        
>  
> if you can get the adapter number,
> then try to save the atom bios by
> ./atiflash -s number file.name

OK, now I feel stupid. Your command was run as root, and I wasn't doing that. However, there's another problem now:

# ./atiflash -ai
Adapter  0    (BN=01, DN=00, PCIID=69001002, SSID=811C103C)
    Asic Family        :  Iceland        
    Flash Type         :  R600 SPI    (64 KB)
    No VBIOS

# ./atiflash -s 0 hpatombios.bin
Failed to read ROM
Comment 10 Armin K 2016-11-25 13:48:30 UTC
[  139.811192] cant't get the mac of 5 
[  139.819526] [drm] ring test on 0 succeeded in 10 usecs
[  139.819729] [drm] ring test on 1 succeeded in 13 usecs
[  139.819749] [drm] ring test on 2 succeeded in 10 usecs
[  139.819757] [drm] ring test on 3 succeeded in 3 usecs
[  139.819760] [drm] ring test on 4 succeeded in 1 usecs
[  139.819765] [drm] ring test on 5 succeeded in 2 usecs
[  139.819769] [drm] ring test on 6 succeeded in 1 usecs
[  139.819774] [drm] ring test on 7 succeeded in 2 usecs
[  139.819779] [drm] ring test on 8 succeeded in 2 usecs
[  139.819812] [drm] ring test on 9 succeeded in 4 usecs
[  139.819816] [drm] ring test on 10 succeeded in 3 usecs
[  146.992737] VI should always have 2 performance levels
[  147.448605] amdgpu 0000:01:00.0: GPU pci config reset
[  151.265505] [drm] PCIE GART of 3931M enabled (table at 0x0000000000040000).

In addition to original report, two more "strange" warnings now appear:

[  139.811192] cant't get the mac of 5 
[  147.448605] amdgpu 0000:01:00.0: GPU pci config reset

I'm not sure whether the second one is a warning or just (intended) information.

I have gotten amdgpu to work on 4.9 by building it into the kernel, along with all the firmware. atiflash still doesn't work. glxgears with vblank_mode=0 renders incorrectly and hangs the system, requiring a hard reboot. When synced to vblank, it doesn't hang.
Comment 11 Armin K 2016-11-25 13:49:24 UTC
Forgot to add: the messages from the previous comment appear every time the GPU powers back up at runtime, as well as at boot before it powers down.
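To check that these messages recur on every runtime power-up, one can count them in a saved `dmesg` log. A minimal sketch, assuming the `[ seconds.micro] message` timestamp format shown above; the message strings are copied verbatim from this report, including the driver's `cant't` typo. The function names are hypothetical.

```python
import re

# Messages reported in the previous comment, copied verbatim.
PATTERNS = {
    "mac": "cant't get the mac of",
    "levels": "VI should always have 2 performance levels",
    "reset": "GPU pci config reset",
}

# "[  139.811192] message" -> ("139.811192", "message")
_TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

def find_messages(dmesg_text):
    """Return (timestamp, key) pairs for each matching dmesg line, in order."""
    hits = []
    for line in dmesg_text.splitlines():
        m = _TS_RE.match(line)
        if not m:
            continue
        for key, needle in PATTERNS.items():
            if needle in m.group(2):
                hits.append((float(m.group(1)), key))
    return hits

def count_messages(dmesg_text):
    """Return {key: occurrences} for the messages above."""
    counts = {}
    for _, key in find_messages(dmesg_text):
        counts[key] = counts.get(key, 0) + 1
    return counts
```

If each power-up prints each message once, the three counts in a `dmesg > dmesg.txt` capture should be equal and track the number of GPU power-ups.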
Comment 12 Martin Peres 2019-11-19 08:09:36 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/91.

