Bug 108778

Summary: [R9 390] amdgpu: Fatal error during GPU init 4.20-rc2
Product: DRI
Reporter: Garth Theisen <garththeisen>
Component: DRM/AMDgpu
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED DUPLICATE
QA Contact:
Severity: major
Priority: high
CC: erhard_f
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description                    Flags
dmesg output                   none
dmesg (4.19.2, non-working)    none
dmesg (4.18.19, working)       none

Description Garth Theisen 2018-11-17 16:33:12 UTC
Created attachment 142497 [details]
dmesg output

GPU init fails on an R9 390 with amdgpu, using:
 - 4.20-rc2
 - libdrm-git
 - amdgpu-git
 - mesa-git
 - linux-firmware-git ...
 with DC=1

Note that on my hardware 4.19.x also fails to init; 4.18 is usable.

[    3.463736] ATOM BIOS: 113-GRENADA_PRO_C671_D5_8GB_HY_W81
[    3.463819] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[    3.476417] amdgpu 0000:01:00.0: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[    3.476422] amdgpu 0000:01:00.0: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
[    3.476439] [drm] Detected VRAM RAM=8192M, BAR=256M
[    3.476441] [drm] RAM width 512bits GDDR5
[    3.476559] [TTM] Zone  kernel: Available graphics memory: 8202016 kiB
[    3.476562] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[    3.476564] [TTM] Initializing pool allocator
[    3.476574] [TTM] Initializing DMA pool allocator
[    3.476692] [drm] amdgpu: 8192M of VRAM memory ready
[    3.476696] [drm] amdgpu: 8192M of GTT memory ready.
[    3.476753] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    3.478162] [drm] PCIE GART of 1024M enabled (table at 0x000000F4007E9000).
[    3.484296] [drm] Found UVD firmware Version: 1.64 Family ID: 9
[    3.485706] [drm] Found VCE firmware Version: 50.10 Binary ID: 2
[    3.485880] [drm] enabling PCIE gen 3 link speeds, disable with amdgpu.pcie_gen2=0
[    3.517143] random: alsactl: uninitialized urandom read (4 bytes read)
[    3.517146] random: alsactl: uninitialized urandom read (4 bytes read)
[    3.768776] EXT4-fs (sda3): re-mounted. Opts: (null)
[    3.869740] random: ln: uninitialized urandom read (6 bytes read)
[    3.940417] Adding 4200992k swap on /dev/sda2.  Priority:-2 extents:1 across:4200992k SS
[    4.008721] EXT4-fs (sda1): mounting ext2 file system using the ext4 subsystem
[    4.012836] EXT4-fs (sda1): mounted filesystem without journal. Opts: (null)
[    4.020803] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
[    4.612629] urandom_read: 3 callbacks suppressed
[    4.612631] random: dbus-daemon: uninitialized urandom read (12 bytes read)
[    4.640561] random: dbus-daemon: uninitialized urandom read (12 bytes read)
[    4.718632] [2759]: Watching system buttons on /dev/input/event1 (Power Button)
[    4.718741] [2759]: Watching system buttons on /dev/input/event0 (Power Button)
[    4.929195] random: crng init done
[    5.026217] ip (2903) used greatest stack depth: 12064 bytes left
[    5.308183] amdgpu: [powerplay] 
                failed to send message 136 ret is 0
[    5.718431] amdgpu: [powerplay] 
                failed to send message 53 ret is 0
[    6.059350] RTL8211E Gigabit Ethernet r8169-300:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
[    6.130924] amdgpu: [powerplay] 
                failed to send message 169 ret is 0
[    6.251781] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[    6.252150] ip (3332) used greatest stack depth: 10712 bytes left
[    6.544459] amdgpu: [powerplay] 
                failed to send message 185 ret is 0
[    6.956332] amdgpu: [powerplay] 
                failed to send message 187 ret is 0
[    7.368232] amdgpu: [powerplay] 
                failed to send message 188 ret is 0
[    7.368486] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[    7.368503] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[    7.368520] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[    7.377726] [drm] Display Core initialized with v3.1.68!
[    7.406619] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    7.406621] [drm] Driver supports precise vblank timestamp query.
[    8.442227] [drm:uvd_v4_2_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[    8.512170] r8169 0000:03:00.0 eth0: Link is Up - 1Gbps/Full - flow control off
[    8.512180] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    9.462721] [drm:uvd_v4_2_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   10.483198] [drm:uvd_v4_2_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   11.503676] [drm:uvd_v4_2_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   12.524153] [drm:uvd_v4_2_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   13.544631] [drm:uvd_v4_2_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   14.565113] [drm:uvd_v4_2_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   15.585593] [drm:uvd_v4_2_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   16.606083] [drm:uvd_v4_2_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   17.626567] [drm:uvd_v4_2_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   17.646688] [drm:uvd_v4_2_start [amdgpu]] *ERROR* UVD not responding, giving up!!!
[   17.646803] [drm:amdgpu_device_ip_set_powergating_state [amdgpu]] *ERROR* set_powergating_state of IP block <uvd_v4_2> failed -1
[   18.136785] amdgpu: [powerplay] 
                failed to send message 12d ret is 0
[   18.626654] amdgpu: [powerplay] 
                failed to send message 154 ret is 0
[   18.902403] [drm:uvd_v4_2_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 11 test failed (0xCAFEDEAD)
[   18.902506] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <uvd_v4_2> failed -22
[   18.902509] amdgpu 0000:01:00.0: amdgpu_device_ip_init failed
[   18.902513] amdgpu 0000:01:00.0: Fatal error during GPU init
[   18.902524] [drm] amdgpu: finishing device.
[   19.363103] amdgpu: [powerplay] 
                failed to send message 133 ret is 0
[   19.363109] amdgpu: [powerplay] VI should always have 2 performance levels
[   19.823536] amdgpu: [powerplay] 
                failed to send message 148 ret is 0
[   20.283856] amdgpu: [powerplay] 
                failed to send message 145 ret is 0
[   20.744201] amdgpu: [powerplay] 
                failed to send message 146 ret is 0
[   21.204321] amdgpu: [powerplay] 
                failed to send message 16a ret is 0
[   21.664412] amdgpu: [powerplay] 
                failed to send message 186 ret is 0
[   22.124528] amdgpu: [powerplay] 
                failed to send message 54 ret is 0
[   22.584785] amdgpu: [powerplay] 
                failed to send message 13d ret is 0
[   23.045082] amdgpu: [powerplay] 
                failed to send message 14f ret is 0
[   23.505278] amdgpu: [powerplay] 
                failed to send message 151 ret is 0
[   23.965423] amdgpu: [powerplay] 
                failed to send message 135 ret is 0
[   24.425701] amdgpu: [powerplay] 
                failed to send message 190 ret is 0
[   24.885861] amdgpu: [powerplay] 
                failed to send message 63 ret is 0
[   25.346129] amdgpu: [powerplay] 
                failed to send message 84 ret is 0
[   25.346177] BUG: unable to handle kernel NULL pointer dereference at 0000000000000d71
[   25.346183] PGD 0 P4D 0 
[   25.346188] Oops: 0002 [#1] SMP NOPTI
[   25.346191] CPU: 0 PID: 1377 Comm: kworker/0:5 Not tainted 4.20.0-rc2 #1
[   25.346195] Hardware name: System manufacturer System Product Name/M5A88-M, BIOS 1702    05/01/2013
[   25.346248] Workqueue: events amdgpu_uvd_idle_work_handler [amdgpu]
[   25.346303] RIP: 0010:smu7_powergate_uvd+0xe/0x90 [amdgpu]
[   25.346306] Code: 72 0d 00 00 00 e8 62 ff ff ff 48 89 df e8 9a ff ff ff 31 c0 5b c3 66 0f 1f 44 00 00 53 48 8b 87 a0 01 00 00 40 84 f6 48 89 fb <40> 88 b0 71 0d 00 00 75 41 e8 34 ff ff ff 48 8b 3b ba 01 00 00 00
[   25.346311] RSP: 0018:ffffc90001b33e50 EFLAGS: 00010202
[   25.346314] RAX: 0000000000000000 RBX: ffff88842321cc00 RCX: 0000000100000000
[   25.346317] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88842321cc00
[   25.346319] RBP: ffff88842321cc18 R08: 000073746e657665 R09: 8080808080808080
[   25.346322] R10: 0000000000000000 R11: fefefefefefefeff R12: ffff88841c673cd0
[   25.346324] R13: 0000000000000000 R14: ffff888426a20480 R15: 0000000000000000
[   25.346327] FS:  0000000000000000(0000) GS:ffff888426a00000(0000) knlGS:0000000000000000
[   25.346331] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   25.346333] CR2: 0000000000000d71 CR3: 0000000002422000 CR4: 00000000000406f0
[   25.346336] Call Trace:
[   25.346392]  pp_set_powergating_by_smu+0x68/0x220 [amdgpu]
[   25.346439]  amdgpu_dpm_enable_uvd+0x51/0x60 [amdgpu]
[   25.346445]  process_one_work+0x1e0/0x410
[   25.346448]  worker_thread+0x28/0x3c0
[   25.346452]  ? process_one_work+0x410/0x410
[   25.346455]  kthread+0x10e/0x130
[   25.346458]  ? kthread_park+0x80/0x80
[   25.346461]  ret_from_fork+0x22/0x40
[   25.346464] Modules linked in: amdgpu(+) mfd_core chash i2c_algo_bit gpu_sched fam15h_power k10temp drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea ttm drm drm_panel_orientation_quirks asus_atk0110 acpi_cpufreq
[   25.346477] CR2: 0000000000000d71
[   25.346480] ---[ end trace d53992dc38b48592 ]---
[   25.346533] RIP: 0010:smu7_powergate_uvd+0xe/0x90 [amdgpu]
[   25.346536] Code: 72 0d 00 00 00 e8 62 ff ff ff 48 89 df e8 9a ff ff ff 31 c0 5b c3 66 0f 1f 44 00 00 53 48 8b 87 a0 01 00 00 40 84 f6 48 89 fb <40> 88 b0 71 0d 00 00 75 41 e8 34 ff ff ff 48 8b 3b ba 01 00 00 00
[   25.346541] RSP: 0018:ffffc90001b33e50 EFLAGS: 00010202
[   25.346543] RAX: 0000000000000000 RBX: ffff88842321cc00 RCX: 0000000100000000
[   25.346546] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88842321cc00
[   25.346549] RBP: ffff88842321cc18 R08: 000073746e657665 R09: 8080808080808080
[   25.346551] R10: 0000000000000000 R11: fefefefefefefeff R12: ffff88841c673cd0
[   25.346554] R13: 0000000000000000 R14: ffff888426a20480 R15: 0000000000000000
[   25.346557] FS:  0000000000000000(0000) GS:ffff888426a00000(0000) knlGS:0000000000000000
[   25.346560] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   25.346562] CR2: 0000000000000d71 CR3: 0000000002422000 CR4: 00000000000406f0
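
The oops above can be cross-checked by hand: the bytes at the `<40>` marker in the Code line look like a byte store of the form `mov %sil, disp32(%rax)`, and RAX is 0, so the faulting address should be exactly the displacement. The following is a minimal sketch of that arithmetic, not a general disassembler. It assumes the faulting instruction really is the `40 88 b0` encoding seen here, and the helper names `faulting_bytes` and `store_displacement` are made up for illustration.

```python
# Values copied from the oops above.
CODE = ("Code: 72 0d 00 00 00 e8 62 ff ff ff 48 89 df e8 9a ff ff ff 31 c0 5b c3 "
        "66 0f 1f 44 00 00 53 48 8b 87 a0 01 00 00 40 84 f6 48 89 fb "
        "<40> 88 b0 71 0d 00 00 75 41 e8 34 ff ff ff 48 8b 3b ba 01 00 00 00")
RAX = 0x0000000000000000
CR2 = 0x0000000000000d71


def faulting_bytes(code_line):
    """Return the bytes from the <xx> marker (the faulting instruction) onward."""
    tokens = code_line.split()[1:]  # drop the "Code:" label
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:])


def store_displacement(insn):
    """Decode the disp32 of `mov %sil, disp32(%rax)` (40 88 b0 + 4 bytes).

    Hand-rolled for this single encoding only (an assumption about the
    instruction); any other byte pattern raises ValueError.
    """
    if insn[:3] != bytes([0x40, 0x88, 0xB0]):
        raise ValueError("not the expected REX + MOV r/m8,r8 + ModRM 0xb0 form")
    return int.from_bytes(insn[3:7], "little")


disp = store_displacement(faulting_bytes(CODE))
# Base register plus displacement should equal the faulting address in CR2.
print(hex(disp), hex(RAX + disp), RAX + disp == CR2)
```

If the assumption holds, this is a write through a NULL base pointer at field offset 0xd71, which matches the "NULL pointer dereference at 0000000000000d71" line and the write-fault error code in "Oops: 0002".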
Comment 1 erhard_f 2018-11-18 16:39:59 UTC
Created attachment 142506 [details]
dmesg (4.19.2, non-working)

Same here on kernel 4.19.2 with an R9 390. The other 2 cards in my system (R9 Nano) are working fine.

inxi -G
Graphics:  Card-1: Advanced Micro Devices [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series]
           Card-2: Advanced Micro Devices [AMD/ATI] Hawaii PRO [Radeon R9 290/390]
           Card-3: Advanced Micro Devices [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series]
           Display Server: X.Org 1.20.0
           drivers: ati,amdgpu (unloaded: modesetting,radeon)
           Resolution: 1280x800@60.00hz
           OpenGL: renderer: llvmpipe (LLVM 6.0, 128 bits)
           version: 3.3 Mesa 18.2.4

This is a regression, as the R9 390 works fine on kernel 4.18.19.
Comment 2 erhard_f 2018-11-18 16:40:32 UTC
Created attachment 142507 [details]
dmesg (4.18.19, working)
Comment 3 Alex Deucher 2018-11-19 14:47:24 UTC
Possibly the same issue as bug 108704.  Does the patch there help?
Comment 4 Garth Theisen 2018-11-19 18:19:29 UTC
After applying the patch, 4.19.2 boots without the init failure.

There seem to be some additional issues in the dmesg log after the patch ... but these might also occur on the 4.18.x series.

[    3.705820] amdgpu: [powerplay] Failed to retrieve minimum clocks.
[    3.705821] amdgpu: [powerplay] Error in phm_get_clock_info 
[    3.705922] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[    3.705932] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[    3.705943] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[    3.720386] [drm] Display Core initialized with v3.1.59!
[    3.748551] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    3.748552] [drm] Driver supports precise vblank timestamp query.
[    3.786502] [drm] UVD initialized successfully.
[    3.907523] [drm] VCE initialized successfully.
[    3.911228] [drm] fb mappable at 0xD0BD0000
[    3.911230] [drm] vram apper at 0xD0000000
[    3.911232] [drm] size 8294400
[    3.911233] [drm] fb depth is 24
[    3.911234] [drm]    pitch is 7680
[    3.911401] fbcon: amdgpudrmfb (fb0) is primary device
[    3.937471] Console: switching to colour frame buffer device 240x67
[    3.942828] amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device
[    3.949554] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:01:00.0 on minor 0
Comment 5 Garth Theisen 2018-11-19 18:31:47 UTC
Reviewing my 4.18.18 log, the dce110_link_encoder_construct message is present, but 'amdgpu: [powerplay] Failed to retrieve minimum clocks.' is not.

[    3.314952] [drm] amdgpu: 8192M of VRAM memory ready
[    3.314954] [drm] amdgpu: 8192M of GTT memory ready.
[    3.314967] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    3.315973] [drm] PCIE GART of 1024M enabled (table at 0x000000F4007E9000).
[    3.317330] [drm] Internal thermal controller with fan control
[    3.343843] random: alsactl: uninitialized urandom read (4 bytes read)
[    3.343850] random: alsactl: uninitialized urandom read (4 bytes read)
[    3.347055] [drm] Invalid PCC GPIO: 13!
[    3.347057] [drm] amdgpu: dpm initialized
[    3.356305] [drm] Found UVD firmware Version: 1.64 Family ID: 9
[    3.357433] [drm] Found VCE firmware Version: 50.10 Binary ID: 2
[    3.357629] [drm] PCIE gen 2 link speeds already enabled
[    3.365448] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[    3.365462] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[    3.365475] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[    3.377133] [drm] Display Core initialized with v3.1.44!
[    3.403803] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    3.403805] [drm] Driver supports precise vblank timestamp query.
[    3.442064] [drm] UVD initialized successfully.
[    3.570052] [drm] VCE initialized successfully.
[    3.572268] [drm] fb mappable at 0xD0BD0000
[    3.572269] [drm] vram apper at 0xD0000000
[    3.572270] [drm] size 8294400
[    3.572271] [drm] fb depth is 24
[    3.572271] [drm]    pitch is 7680
[    3.572350] fbcon: amdgpudrmfb (fb0) is primary device
[    3.576367] [drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 148500
[    3.593179] Console: switching to colour frame buffer device 240x67
[    3.599360] amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device
[    3.607152] [drm] Initialized amdgpu 3.26.0 20150101 for 0000:01:00.0 on minor 0
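
A comparison like the one above can also be done mechanically by stripping the timestamps and diffing the sets of messages. The sketch below is one possible way to do that, not a tool used in this report. The function names are hypothetical, and the two inputs are short excerpts of the logs quoted in this bug rather than the full attachments.

```python
import re

# Leading "[    3.705820] " style timestamp printed by dmesg.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def messages(log_text):
    """Return the set of distinct messages with timestamps removed."""
    return {TIMESTAMP.sub("", line).strip()
            for line in log_text.splitlines() if line.strip()}


def only_in_first(first, second):
    """Messages that appear in `first` but not in `second`, sorted."""
    return sorted(messages(first) - messages(second))


# Excerpts from the patched 4.19.2 log and the working 4.18.18 log.
patched_4_19 = """\
[    3.705820] amdgpu: [powerplay] Failed to retrieve minimum clocks.
[    3.705821] amdgpu: [powerplay] Error in phm_get_clock_info
[    3.705922] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[    3.786502] [drm] UVD initialized successfully.
"""
working_4_18 = """\
[    3.365448] [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[    3.442064] [drm] UVD initialized successfully.
"""

for msg in only_in_first(patched_4_19, working_4_18):
    print(msg)
```

On these excerpts only the two powerplay lines are reported as new. On full logs, lines that embed version numbers or addresses (for example the "Display Core initialized with v..." line) would also show up as differences and need to be read by eye.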

(In reply to Garth Theisen from comment #4)
> After applying the patch 4.19.2 boots without the Init failure.
> 
> There seems to be some additional issues present in the dmesg log after the
> patch ... but this might be occurring for the 4.18.x series also.
> 
> [    3.705820] amdgpu: [powerplay] Failed to retrieve minimum clocks.
> [    3.705821] amdgpu: [powerplay] Error in phm_get_clock_info 
> [    3.705922] [drm] dce110_link_encoder_construct: Failed to get
> encoder_cap_info from VBIOS with error code 4!
> [    3.705932] [drm] dce110_link_encoder_construct: Failed to get
> encoder_cap_info from VBIOS with error code 4!
> [    3.705943] [drm] dce110_link_encoder_construct: Failed to get
> encoder_cap_info from VBIOS with error code 4!
> [    3.720386] [drm] Display Core initialized with v3.1.59!
> [    3.748551] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [    3.748552] [drm] Driver supports precise vblank timestamp query.
> [    3.786502] [drm] UVD initialized successfully.
> [    3.907523] [drm] VCE initialized successfully.
> [    3.911228] [drm] fb mappable at 0xD0BD0000
> [    3.911230] [drm] vram apper at 0xD0000000
> [    3.911232] [drm] size 8294400
> [    3.911233] [drm] fb depth is 24
> [    3.911234] [drm]    pitch is 7680
> [    3.911401] fbcon: amdgpudrmfb (fb0) is primary device
> [    3.937471] Console: switching to colour frame buffer device 240x67
> [    3.942828] amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device
> [    3.949554] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:01:00.0 on
> minor 0
Comment 6 Alex Deucher 2018-11-19 19:12:33 UTC
(In reply to Garth Theisen from comment #5)
> Reviewing my 4.18.18 log, the dce110_link_encoder_construct is present, but
> not 'amdgpu: [powerplay] Failed to retrieve minimum clocks.'.

Those are harmless.
Comment 7 Alex Deucher 2018-11-19 19:12:53 UTC

*** This bug has been marked as a duplicate of bug 108704 ***
