Bug 105515

Summary: hw_init of IP block <uvd_v6_0> failed
Product: DRI
Reporter: Edward Kigwana <edwardwwgk>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal    
Priority: medium    
Version: DRI git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description    Flags
dmesg          none
dmesg          none
dmesg          none

Description Edward Kigwana 2018-03-15 03:17:23 UTC
Card is an MSI RX550 AERO IX OC. I have never gotten it to work with the drm driver. I can configure EFI for PEG, and as long as I blacklist amdgpu I get the basic EFI framebuffer. As soon as I install the module the screen goes black. I can reboot using the keyboard. With IGD it is the same story, although there I can remove the module and keep trying. The parameter settings are not well documented, though. E.g., which IP blocks can I disable? As for the powerplay messages, what is message 154 or even 134?

<M> AMD GPU
[*]   Enable amdgpu support for SI parts
[*]   Enable amdgpu support for CIK parts
[*]   Always enable userptr write support
[ ]   Allow GART access through debugfs
ACP (Audio CoProcessor) Configuration  --->
[*] Enable AMD Audio CoProcessor IP support
    Display Engine Configuration  --->
        [ ] AMD DC - Enable new display engine
        [*] DC support for Polaris and older ASICs
    AMD Library routines  --->
    [ ] Closed hash table performance statistics
    [ ] Closed hash table self test
<M> HSA kernel driver for AMD GPU devices
----------------------------------------------------------------
[    2.540547] fb: switching to inteldrmfb from EFI VGA
[    2.540564] Console: switching to colour dummy device 80x25
[    2.540613] [drm] Replacing VGA console driver
[    2.541103] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    2.541104] [drm] Driver supports precise vblank timestamp query.
[    2.541476] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[    2.541754] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[    2.551488] [drm] amdgpu kernel modesetting enabled.
[    2.552796] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[    2.552796] AMD IOMMUv2 functionality not available on this system
[    2.553582] nct6775: Found NCT6793D or compatible chip at 0x4e:0xa20
[    2.558510] CRAT table not found
[    2.558511] Virtual CRAT table created for CPU
[    2.558511] Parsing CRAT table with 1 nodes
[    2.558512] Creating topology SYSFS entries
[    2.558519] Topology: Add CPU node
[    2.558519] Finished initializing topology
[    2.558554] kfd kfd: Initialized module
[    2.558695] amdgpu 0000:01:00.0: enabling device (0000 -> 0003)
[    2.558819] [drm] initializing kernel modesetting (POLARIS12 0x1002:0x699F 0x1462:0x8A90 0xC7).
[    2.558826] [drm] register mmio base: 0xDF300000
[    2.558826] [drm] register mmio size: 262144
[    2.558833] [drm] probing gen 2 caps for device 8086:1901 = 261ad03/e
[    2.558834] [drm] probing mlw for device 8086:1901 = 261ad03
[    2.558843] [drm] UVD is enabled in VM mode
[    2.558843] [drm] UVD ENC is enabled in VM mode
[    2.558845] [drm] VCE enabled in VM mode
[    2.819863] [drm] failed to retrieve link info, disabling eDP
[    3.461721] ATOM BIOS: 113-d09002-h01
[    3.461747] [drm] GPU posting now...
[    3.488389] scsi 4:0:0:0: Direct-Access     SanDisk  SanDisk Cruzer   8.02 PQ: 0 ANSI: 0 CCS
[    3.488457] sd 4:0:0:0: Attached scsi generic sg0 type 0
[    3.488720] sd 4:0:0:0: [sda] 31854591 512-byte logical blocks: (16.3 GB/15.2 GiB)
[    3.488829] sd 4:0:0:0: [sda] Write Protect is off
[    3.488831] sd 4:0:0:0: [sda] Mode Sense: 45 00 00 08
[    3.488947] sd 4:0:0:0: [sda] No Caching mode page found
[    3.488948] sd 4:0:0:0: [sda] Assuming drive cache: write through
[    3.489856] random: crng init done
[    3.491902]  sda: sda1
[    3.492574] sd 4:0:0:0: [sda] Attached SCSI removable disk
[    3.578449] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[    3.578457] amdgpu 0000:01:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    3.578458] amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    3.578464] [drm] Detected VRAM RAM=4096M, BAR=256M
[    3.578465] [drm] RAM width 128bits GDDR5
[    3.578551] [TTM] Zone  kernel: Available graphics memory: 8119498 kiB
[    3.578551] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[    3.578551] [TTM] Initializing pool allocator
[    3.578554] [TTM] Initializing DMA pool allocator
[    3.578565] [drm] amdgpu: 4096M of VRAM memory ready
[    3.578566] [drm] amdgpu: 4096M of GTT memory ready.
[    3.578610] [drm] GART: num cpu pages 65536, num gpu pages 65536
[    3.578664] [drm] PCIE GART of 256M enabled (table at 0x000000F400040000).
[    3.578696] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    3.578697] [drm] Driver supports precise vblank timestamp query.
[    3.578839] [drm] AMDGPU Display Connectors
[    3.578839] [drm] Connector 0:
[    3.578840] [drm]   DP-1
[    3.578840] [drm]   HPD5
[    3.578841] [drm]   DDC: 0x4868 0x4868 0x4869 0x4869 0x486a 0x486a 0x486b 0x486b
[    3.578841] [drm]   Encoders:
[    3.578841] [drm]     DFP1: INTERNAL_UNIPHY1
[    3.578841] [drm] Connector 1:
[    3.578842] [drm]   HDMI-A-3
[    3.578842] [drm]   HPD3
[    3.578843] [drm]   DDC: 0x4874 0x4874 0x4875 0x4875 0x4876 0x4876 0x4877 0x4877
[    3.578843] [drm]   Encoders:
[    3.578843] [drm]     DFP2: INTERNAL_UNIPHY1
[    3.578843] [drm] Connector 2:
[    3.578843] [drm]   DVI-D-1
[    3.578844] [drm]   HPD4
[    3.578844] [drm]   DDC: 0x4878 0x4878 0x4879 0x4879 0x487a 0x487a 0x487b 0x487b
[    3.578844] [drm]   Encoders:
[    3.578845] [drm]     DFP3: INTERNAL_UNIPHY
[    3.580932] [drm] Chained IB support enabled!
[    3.582311] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[    3.582970] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[    3.610378] EXT4-fs (nvme0n1p3): re-mounted. Opts: (null)
[    3.661366] Adding 1048572k swap on /dev/nvme0n1p4.  Priority:-2 extents:1 across:1048572k SSFS
[    3.685639] EXT4-fs (nvme0n1p5): mounted filesystem with ordered data mode. Opts: (null)
[    3.688379] EXT4-fs (nvme0n1p6): mounted filesystem with ordered data mode. Opts: (null)
[    3.690692] EXT4-fs (nvme0n1p7): mounted filesystem with ordered data mode. Opts: (null)
[    3.696429] EXT4-fs (nvme1n1p1): mounted filesystem with ordered data mode. Opts: (null)
[    4.063178] amdgpu: [powerplay]
                failed to send message 154 ret is 0
[    4.279642] [drm:uvd_v6_0_enc_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 13 test failed
[    4.279657] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <uvd_v6_0> failed -110
[    4.279658] amdgpu 0000:01:00.0: amdgpu_device_ip_init failed
[    4.642250] amdgpu: [powerplay]
                failed to send pre message 133 ret is 0
[    4.642834] amdgpu: [powerplay] DPM is not running right now, no need to disable DPM!
[    4.734787] amdgpu 0000:01:00.0: 000000004bfd450f unpin not necessary
[    4.734790] amdgpu 0000:01:00.0: 000000005674b436 unpin not necessary
[    4.734919] [TTM] Finalizing pool allocator
[    4.734921] [TTM] Finalizing DMA pool allocator
[    4.734959] [TTM] Zone  kernel: Used memory at exit: 0 kiB
[    4.734960] [TTM] Zone   dma32: Used memory at exit: 0 kiB
[    4.734961] [drm] amdgpu: ttm finalized
[    4.734963] amdgpu 0000:01:00.0: Fatal error during GPU init
[    4.734963] [drm] amdgpu: finishing device.
[    4.735208] amdgpu: probe of 0000:01:00.0 failed with error -110
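A note for readers following the log: the -110 in the hw_init and probe failure lines above is a negated Linux errno value, and 110 is ETIMEDOUT, which is consistent with the UVD encode ring test timing out. The following is a minimal sketch, not part of the original report, that pulls such codes out of dmesg text. The function name `failed_codes` and the small errno table are illustrative assumptions; only the sample log lines come from the report.

```python
import re

# Small excerpt of Linux errno values (asm-generic); illustrative, not exhaustive.
LINUX_ERRNO = {12: "ENOMEM", 16: "EBUSY", 22: "EINVAL", 110: "ETIMEDOUT"}

# Matches kernel messages ending in "failed -N" or "failed with error -N".
FAILED_RE = re.compile(r"failed(?: with error)? -(\d+)\s*$")


def failed_codes(dmesg_text):
    """Return (errno_name, line) pairs for lines that end in a negative error code."""
    results = []
    for line in dmesg_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            code = int(match.group(1))
            results.append((LINUX_ERRNO.get(code, f"errno {code}"), line.strip()))
    return results


SAMPLE = """\
[    4.279657] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <uvd_v6_0> failed -110
[    4.279658] amdgpu 0000:01:00.0: amdgpu_device_ip_init failed
[    4.735208] amdgpu: probe of 0000:01:00.0 failed with error -110
"""

for name, line in failed_codes(SAMPLE):
    print(name, "<-", line)
```

Only the first and third sample lines carry a numeric code, so the middle line is skipped. The same function could be fed the output of `dmesg` to list every failure that reports an errno.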
Comment 1 Edward Kigwana 2018-03-15 03:18:06 UTC
Linux version 4.16.0-rc5 (root@i7-tower) (gcc version 7.3.0 (Gentoo 7.3.0 p1.0)) #5 SMP Thu Mar 15 02:57:39 UTC 2018
Comment 2 Edward Kigwana 2018-03-15 06:56:53 UTC
After lots of wrangling, I got it to work with the following module options:

options amdgpu si_support=0 cik_support=0 msi=1 exp_hw_support=0 ppfeaturemask=0 dpm=0 powerplay=0

As a side note, adding si support results in a guaranteed panic / lockup, so some code is getting called when it should not. cik support has no impact. I had to add the last two options to finally get into X, so something is clearly still busted.
Comment 3 Alex Deucher 2018-03-15 12:46:48 UTC
Please attach your full dmesg output.
Comment 4 Edward Kigwana 2018-03-16 05:17:04 UTC
Created attachment 138150 [details]
dmesg

Full dmesg output
Comment 5 Edward Kigwana 2018-03-16 05:24:00 UTC
With options amdgpu dpm=0 I can at least get something. If I do not provide any arguments, the system locks up instantly, so without a serial console it is going to be a beast to capture that.

If you need it, I can try the options that follow. They seemed to work, in that they did not cause a lockup, but I am afraid they really just masked the core issue.

options msi=1 exp_hw_support=0 ppfeaturemask=0
Comment 6 Alex Deucher 2018-03-16 18:58:57 UTC
(In reply to Edward Kigwana from comment #5)
> options msi=1 exp_hw_support=0 ppfeaturemask=0

The msi and exp_hw_support shouldn't have any effect in your case and ppfeaturemask is pretty much irrelevant if dpm is disabled.
Comment 7 Alex Deucher 2018-03-16 19:02:57 UTC
Does disabling the Intel gfx chip or booting with the AMD card as the primary help?  Can you also try booting with amdgpu.dc=1?
Comment 8 Edward Kigwana 2018-03-17 02:59:44 UTC
(In reply to Alex Deucher from comment #7)
> Does disabling the Intel gfx chip or booting with the AMD card as the
> primary help?  Can you also try booting with amdgpu.dc=1?

options amdgpu dpm=0 dc=1 dc_log=1

Still boots and the AMD GPU comes up. I have the AMD GPU set as the first card.
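
The two spellings used in this exchange set the same module parameter: a modprobe.d line such as `options amdgpu dc=1` and the kernel command-line form `amdgpu.dc=1`. Below is a small sketch of the conversion, for anyone testing from the boot loader rather than a modprobe configuration file. The helper name `to_cmdline` is hypothetical and not part of any tool.

```python
def to_cmdline(options_line):
    """Turn 'options <module> k=v ...' into '<module>.k=v ...' boot parameters."""
    fields = options_line.split()
    if len(fields) < 3 or fields[0] != "options":
        raise ValueError("expected: options <module> <param>=<value> ...")
    module, params = fields[1], fields[2:]
    # Each modprobe parameter becomes "<module>.<param>=<value>" on the command line.
    return " ".join(f"{module}.{param}" for param in params)


print(to_cmdline("options amdgpu dpm=0 dc=1 dc_log=1"))
# prints: amdgpu.dpm=0 amdgpu.dc=1 amdgpu.dc_log=1
```

The helper rejects a line that omits the module name, since there would be nothing to prefix the parameters with.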
Comment 9 Edward Kigwana 2018-03-17 03:10:30 UTC
Created attachment 138164 [details]
dmesg

This is with options amdgpu dpm=0 dc=1 dc_log=1

[drm:amdgpu_device_ip_set_powergating_state [amdgpu]] *ERROR* set_powergating_state of IP block <amdgpu_powerplay> failed 52428
Comment 10 Edward Kigwana 2018-03-28 01:09:58 UTC
Not sure what commit fixed it for me but with amd-staging-drm-next 525b7b1e13c3214f04c9ab4d72c88f55a7bd4288 I no longer have this issue.


[    2.285918] [drm] initializing kernel modesetting (POLARIS12 0x1002:0x699F 0x1462:0x8A90 0xC7).
[    2.285925] [drm] register mmio base: 0xDF300000
[    2.285925] [drm] register mmio size: 262144
[    2.285929] [drm] probing gen 2 caps for device 8086:1901 = 261ad03/e
[    2.285930] [drm] probing mlw for device 8086:1901 = 261ad03
[    2.285931] [drm] add ip block number 0 <vi_common>
[    2.285932] [drm] add ip block number 1 <gmc_v8_0>
[    2.285933] [drm] add ip block number 2 <tonga_ih>
[    2.285933] [drm] add ip block number 3 <powerplay>
[    2.285934] [drm] add ip block number 4 <dce_v11_0>
[    2.285935] [drm] add ip block number 5 <gfx_v8_0>
[    2.285935] [drm] add ip block number 6 <sdma_v3_0>
[    2.285936] [drm] add ip block number 7 <uvd_v6_0>
[    2.285936] [drm] add ip block number 8 <vce_v3_0>
[    2.285942] [drm] UVD is enabled in VM mode
[    2.285943] [drm] UVD ENC is enabled in VM mode
[    2.285944] [drm] VCE enabled in VM mode
...

As before, enabling dpm results in an instant lockup that I can't even begin to debug, since the kernel locks up and I don't have a serial console.
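
On the earlier question of which IP blocks can be disabled: amdgpu has an `ip_block_mask` module parameter, a bitmask with all blocks enabled by default. The sketch below assumes that bit N of the mask corresponds to the index printed in the "add ip block number N" lines above. That mapping and the helper names are assumptions to verify against the running kernel, and masking out core blocks will most likely leave the GPU unusable.

```python
import re

# Matches lines such as "[drm] add ip block number 7 <uvd_v6_0>".
BLOCK_RE = re.compile(r"add ip block number (\d+) <(\w+)>")


def parse_ip_blocks(dmesg_text):
    """Map IP block name -> index from 'add ip block number N <name>' lines."""
    return {name: int(num) for num, name in BLOCK_RE.findall(dmesg_text)}


def mask_without(blocks, *disabled):
    """Return a 32-bit mask with every bit set except those of the named blocks."""
    mask = 0xFFFFFFFF
    for name in disabled:
        mask &= ~(1 << blocks[name])
    return mask & 0xFFFFFFFF


SAMPLE = """\
[    2.285936] [drm] add ip block number 7 <uvd_v6_0>
[    2.285936] [drm] add ip block number 8 <vce_v3_0>
"""

blocks = parse_ip_blocks(SAMPLE)
print(f"amdgpu.ip_block_mask={mask_without(blocks, 'uvd_v6_0'):#x}")
```

Under the stated assumption, clearing bit 7 would skip uvd_v6_0 only; whether the rest of the driver then initializes cleanly on this card is not something the thread confirms.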
Comment 11 Edward Kigwana 2018-03-28 01:11:25 UTC
Created attachment 138387 [details]
dmesg

Shows successful load of amdgpu driver for polaris 12 card.
