Bug 109206 - Kernel 4.20 amdgpu fails to load firmware on Ryzen 2500U
Status: RESOLVED FIXED
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
 
Reported: 2019-01-01 17:20 UTC by Gavin A.
Modified: 2019-11-16 12:27 UTC

Attachments
Kernel 4.20 log showing firmware loading crash (174.75 KB, text/plain)
2019-01-01 17:25 UTC, Gavin A.
List of files in /lib/firmware/amdgpu (3.96 KB, text/plain)
2019-01-04 21:59 UTC, Gavin A.
Raven dmcu firmware (23.84 KB, application/macbinary)
2019-01-04 22:00 UTC, Gavin A.
Kernel log on 5.0.0-rc2 showing firmware loading crash (191.95 KB, text/plain)
2019-01-14 14:17 UTC, Nicola Orlando
logged when modprobe amdgpu is run (7.47 KB, text/plain)
2019-03-07 06:08 UTC, Adrian Garay
Journalctl output for kernel 5.0.5-200.fc29.x86_64 (17.28 KB, text/plain)
2019-04-08 13:18 UTC, Talha Khan
Kernel log 5.1.3 showing amdgpu drm crash (78.16 KB, text/plain)
2019-05-22 08:42 UTC, Ondrej Lang
attachment-8612-0.html (494 bytes, text/html)
2019-07-14 20:22 UTC, Michael Eagle
Journal boot from hp envy x360 cp0xxx (220.95 KB, text/plain)
2019-11-14 07:38 UTC, Luya Tshimbalanga

Description Gavin A. 2019-01-01 17:20:57 UTC
On kernel 4.20.0-arch1, the amdgpu PSP fails to load firmware, crashing the system during boot.  Booting with modprobe.blacklist=amdgpu allowed the system to boot, without dm.  The system is an HP Envy x360 laptop with a Ryzen 5 2500U.

Jan 01 08:36:42 mimisbrunnr kernel: Linux agpgart interface v0.103
Jan 01 08:36:42 mimisbrunnr kernel: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
Jan 01 08:36:42 mimisbrunnr kernel: [drm] amdgpu kernel modesetting enabled.
Jan 01 08:36:42 mimisbrunnr kernel: Parsing CRAT table with 1 nodes
Jan 01 08:36:42 mimisbrunnr kernel: Ignoring ACPI CRAT on non-APU system
Jan 01 08:36:42 mimisbrunnr kernel: Virtual CRAT table created for CPU
Jan 01 08:36:42 mimisbrunnr kernel: Parsing CRAT table with 1 nodes
Jan 01 08:36:42 mimisbrunnr kernel: Creating topology SYSFS entries
Jan 01 08:36:42 mimisbrunnr kernel: Topology: Add CPU node
Jan 01 08:36:42 mimisbrunnr kernel: Finished initializing topology
Jan 01 08:36:42 mimisbrunnr kernel: checking generic (e0000000 7f0000) vs hw (e0000000 10000000)
Jan 01 08:36:42 mimisbrunnr kernel: fb0: switching to amdgpudrmfb from EFI VGA
Jan 01 08:36:42 mimisbrunnr kernel: Console: switching to colour dummy device 80x25
Jan 01 08:36:42 mimisbrunnr kernel: amdgpu 0000:03:00.0: enabling device (0006 -> 0007)
Jan 01 08:36:42 mimisbrunnr kernel: [drm] initializing kernel modesetting (RAVEN 0x1002:0x15DD 0x103C:0x83C6 0xC4).
Jan 01 08:36:42 mimisbrunnr kernel: [drm] register mmio base: 0xFE000000
Jan 01 08:36:42 mimisbrunnr kernel: [drm] register mmio size: 524288
Jan 01 08:36:42 mimisbrunnr kernel: [drm] add ip block number 0 <soc15_common>
Jan 01 08:36:42 mimisbrunnr kernel: [drm] add ip block number 1 <gmc_v9_0>
Jan 01 08:36:42 mimisbrunnr kernel: [drm] add ip block number 2 <vega10_ih>
Jan 01 08:36:42 mimisbrunnr kernel: [drm] add ip block number 3 <psp>
Jan 01 08:36:42 mimisbrunnr kernel: [drm] add ip block number 4 <gfx_v9_0>
Jan 01 08:36:42 mimisbrunnr kernel: [drm] add ip block number 5 <sdma_v4_0>
Jan 01 08:36:42 mimisbrunnr kernel: [drm] add ip block number 6 <powerplay>
Jan 01 08:36:42 mimisbrunnr kernel: [drm] add ip block number 7 <dm>
Jan 01 08:36:42 mimisbrunnr kernel: [drm] add ip block number 8 <vcn_v1_0>
Jan 01 08:36:42 mimisbrunnr kernel: [drm] VCN decode is enabled in VM mode
Jan 01 08:36:42 mimisbrunnr kernel: [drm] VCN encode is enabled in VM mode
Jan 01 08:36:42 mimisbrunnr kernel: [drm] VCN jpeg decode is enabled in VM mode
Jan 01 08:36:42 mimisbrunnr kernel: ATOM BIOS: SWBRT25890.001
Jan 01 08:36:42 mimisbrunnr kernel: [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
Jan 01 08:36:42 mimisbrunnr kernel: amdgpu 0000:03:00.0: VRAM: 256M 0x000000F400000000 - 0x000000F40FFFFFFF (256M used)
Jan 01 08:36:42 mimisbrunnr kernel: amdgpu 0000:03:00.0: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
Jan 01 08:36:42 mimisbrunnr kernel: amdgpu 0000:03:00.0: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
Jan 01 08:36:42 mimisbrunnr kernel: [drm] Detected VRAM RAM=256M, BAR=256M
Jan 01 08:36:42 mimisbrunnr kernel: [drm] RAM width 128bits UNKNOWN
Jan 01 08:36:42 mimisbrunnr kernel: [TTM] Zone kernel: Available graphics memory: 3964154 kiB
Jan 01 08:36:42 mimisbrunnr kernel: [TTM] Zone dma32: Available graphics memory: 2097152 kiB
Jan 01 08:36:42 mimisbrunnr kernel: [TTM] Initializing pool allocator
Jan 01 08:36:42 mimisbrunnr kernel: [TTM] Initializing DMA pool allocator
Jan 01 08:36:42 mimisbrunnr kernel: [drm] amdgpu: 256M of VRAM memory ready
Jan 01 08:36:42 mimisbrunnr kernel: [drm] amdgpu: 3072M of GTT memory ready.
Jan 01 08:36:42 mimisbrunnr kernel: [drm] GART: num cpu pages 262144, num gpu pages 262144
Jan 01 08:36:42 mimisbrunnr kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F4007E9000).
Jan 01 08:36:42 mimisbrunnr kernel: [drm] use_doorbell being set to: [true]
Jan 01 08:36:42 mimisbrunnr kernel: [drm] Found VCN firmware Version: 1.73 Family ID: 18
Jan 01 08:36:42 mimisbrunnr kernel: [drm] PSP loading VCN firmware
Jan 01 08:36:42 mimisbrunnr kernel: [drm] reserve 0x400000 from 0xf400b00000 for PSP TMR SIZE
Jan 01 08:36:43 mimisbrunnr kernel: [drm:psp_cmd_submit_buf [amdgpu]] *ERROR* failed loading with status (-65530) and ucode id (19)
Jan 01 08:36:43 mimisbrunnr kernel: [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
Jan 01 08:36:43 mimisbrunnr kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
Jan 01 08:36:43 mimisbrunnr kernel: amdgpu 0000:03:00.0: amdgpu_device_ip_init failed
Jan 01 08:36:43 mimisbrunnr kernel: amdgpu 0000:03:00.0: Fatal error during GPU init
Jan 01 08:36:43 mimisbrunnr kernel: [drm] amdgpu: finishing device.
Jan 01 08:36:43 mimisbrunnr kernel: amdgpu 0000:03:00.0: 00000000c149df08 unpin not necessary
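
A minimal sketch for pulling the PSP failure out of a saved kernel log, assuming the log is available on standard input (for example from `journalctl -k -b`). The function name `psp_failures` is hypothetical; the pattern matches the "failed loading with status (...) and ucode id (...)" message shown in the log above.

```sh
# Print "status=<n> ucode_id=<n>" for every PSP firmware load failure
# found in the kernel log read from stdin. Other lines are ignored.
psp_failures() {
    sed -n 's/.*failed loading with status (\(-\{0,1\}[0-9]*\)) and ucode id (\([0-9]*\)).*/status=\1 ucode_id=\2/p'
}
```

Possible usage: `journalctl -k -b | psp_failures`, or `psp_failures < kernel.log` for a saved file.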
Comment 1 Gavin A. 2019-01-01 17:25:26 UTC
Created attachment 142935 [details]
Kernel 4.20 log showing firmware loading crash
Comment 2 David Francis 2019-01-04 15:08:35 UTC
Could you give me a list of all files in /lib/firmware/amdgpu/?
And if it contains a file called raven_dmcu.bin, please attach that file.
Comment 3 Gavin A. 2019-01-04 21:59:06 UTC
Created attachment 142978 [details]
List of files in /lib/firmware/amdgpu
Comment 4 Gavin A. 2019-01-04 22:00:00 UTC
Created attachment 142979 [details]
Raven dmcu firmware
Comment 5 Nicola Orlando 2019-01-14 14:17:00 UTC
Created attachment 143105 [details]
Kernel log on 5.0.0-rc2 showing firmware loading crash

I'm getting the exact same issue on 5.0.0 rc2; kernel log attached. Both the list of files in /lib/firmware/amdgpu/ and the raven_dmcu.bin are identical to the ones already attached.
Comment 6 Chris 2019-01-29 13:41:15 UTC
Still having the same bug in kernel 5.0-rc4, and my raven_dmcu.bin also matches the previously attached bin file (sha256 a45972418d1c078afbd7884ffd58784954220759bc0d5464ce60165ffe1775bd).

However, in one of my tests I got it to work! The time it worked I got lucky: I had somehow misplaced raven_dmcu.bin, so when I rebooted into kernel 5.0-rc4 and checked /lib/firmware/amdgpu/raven_dmcu.bin, it wasn't there. As soon as I put raven_dmcu.bin back in /lib/firmware/amdgpu/ I got this bug again. So the workaround for now, until they fix it, is to delete raven_dmcu.bin from /lib/firmware/amdgpu. The laptop seems to work just fine. I don't know what that firmware is for, but it doesn't look like a required file for my HP x360 AMD Ryzen 2500U laptop.
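
To confirm that a local raven_dmcu.bin is the same file as the one discussed here, its SHA-256 can be compared with the digest quoted in this comment. A small sketch, assuming `sha256sum` from coreutils is installed; `sha256_matches` is a hypothetical helper name.

```sh
# Succeed if the SHA-256 digest of file $1 equals the hex digest $2.
# A missing or unreadable file yields an empty digest and therefore fails.
sha256_matches() {
    actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}
```

Possible usage: `sha256_matches /lib/firmware/amdgpu/raven_dmcu.bin a45972418d1c078afbd7884ffd58784954220759bc0d5464ce60165ffe1775bd && echo "same file"`.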
Comment 7 Gavin A. 2019-01-31 21:35:49 UTC
Can confirm that removing raven_dmcu.bin allows amdgpu to load.  Note that kernel 4.20 also requires iommu=pt kernel option.
Comment 8 edocod 2019-02-02 11:47:05 UTC
I encounter the same bug on my HP Envy x360, Ryzen 2500U.
Deleting raven_dmcu.bin and setting iommu=pt on kernel 4.20.5 still doesn't solve the bug.
Comment 9 Gavin A. 2019-02-02 16:32:59 UTC
Raven_dmcu.bin can get packed in the initramfs Image if amdgpu is loaded at boot.  Check the initramfs by running “lsinitcpio /boot/name of initramfs.img”.  If you see raven_dmcu.bin in the image from the previous command, remove raven_dmcu.bin from /lib/firmware then run “mkinitcpio -p linux”.
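
The check described above can be sketched as a small filter over the initramfs file listing. This assumes the listing arrives on standard input; `initramfs_has_dmcu` is a hypothetical name, and the image path in the usage line is an assumption that depends on the distribution.

```sh
# Read an initramfs file listing on stdin and succeed if the Raven DMCU
# firmware is packed into the image. The pattern is not anchored at the
# end, so a compressed copy of the file would also match.
initramfs_has_dmcu() {
    grep -q 'amdgpu/raven_dmcu\.bin'
}
```

Possible usage on Arch: `lsinitcpio /boot/initramfs-linux.img | initramfs_has_dmcu && echo "still in the image"`. Other distributions ship their own listing tools (`lsinitrd` with dracut, `lsinitramfs` with initramfs-tools).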
Comment 10 cd 2019-02-08 15:21:20 UTC
I also encounter this bug on my HP Envy x360 Ryzen 5 2500U.

What worked for me as a workaround was to first upgrade the kernel to 4.20.7, then remove "raven_dmcu.bin" and finally run "mkinitcpio -p linux".
Comment 11 Matrix 2019-02-13 01:48:09 UTC
This is the exact same issue I had and the above solution for removing /lib/firmware/amdgpu/raven_dmcu.bin worked for me. 

For Ubuntu Users on 4.20 you'll need to also run 

sudo update-initramfs -u

to update the bootimage. Thanks Chris.
Comment 12 Adrian Garay 2019-02-19 05:38:15 UTC
(In reply to matrix8967 from comment #11)
> This is the exact same issue I had and the above solution for removing
> /lib/firmware/amdgpu/raven_dmcu.bin worked for me. 
> 
> For Ubuntu Users on 4.20 you'll need to also run 
> 
> sudo update-initramfs -u
> 
> to update the bootimage. Thanks Chris.

Removing this file and updating my initramfs allows my 2500u HP x360 to actually boot to the desktop.  Unfortunately, the moment you try to load any game or even the Steam client, the laptop will hard lock.

The desktop is otherwise functional.
Comment 13 Zdenek 2019-02-19 11:26:19 UTC
I get a hard lock when starting LibreOffice after this workaround. Nothing interesting can be found in the logs.
Comment 14 Harry Wentland 2019-02-19 19:27:31 UTC
I'd recommend updating the System BIOS.

Early BIOSes on HP Envy x360 (and possibly other Raven laptops) had trouble loading the DMCU FW.
Comment 15 Harry Wentland 2019-02-19 20:36:50 UTC
Can you see if this patch fixes it: https://patchwork.freedesktop.org/patch/277181/
Comment 16 Matrix 2019-02-20 05:06:25 UTC
Is it possible to upgrade the BIOS from Linux? I think it's just a Windows-only BIOS upgrade from HP, right? Will it work in FreeDOS?
Comment 17 Harry Wentland 2019-02-20 21:07:25 UTC
I don't have that laptop but this link mentions a method to "Update the BIOS when Windows does not start": https://support.hp.com/ca-en/document/c00042629

I'd be more curious to know if the patch I posted fixes it.
Comment 18 JerryD 2019-02-22 17:33:48 UTC
(In reply to Adrian Garay from comment #12)
> (In reply to matrix8967 from comment #11)
> > This is the exact same issue I had and the above solution for removing
> > /lib/firmware/amdgpu/raven_dmcu.bin worked for me. 
> > 
> > For Ubuntu Users on 4.20 you'll need to also run 
> > 
> > sudo update-initramfs -u
> > 
> > to update the bootimage. Thanks Chris.
> 
> Removing this file and updating my initramfs allows my 2500u HP x360 to
> actually boot to the desktop.  Unfortunately, the moment you try to load any
> game or even the Steam client, the laptop will hard lock.
> 
> The desktop is otherwise functional.

Possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=1562530
Comment 19 Tim Carr 2019-02-22 21:27:37 UTC
(In reply to Harry Wentland from comment #15)
> Can you see if this patch fixes it:
> https://patchwork.freedesktop.org/patch/277181/


Wasn't able to get that patch to apply cleanly to either 4.20 or 5.0rc2. I did attempt to manually edit the function in the amdgpu_psp.c file according to the patch, but now have in my log:

"[drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:2! type 0 expected 3" 

I'm assuming I attempted to apply the patch to the wrong revision. Behavior for me is somewhat changed, before the patch I got about 4 lines of scrambled pixels about 4/5ths of the way down my screen on anything above kernel 4.19, with the patch applied to 5.0rc2 those lines appear for a couple of seconds and then the screen goes and remains totally blank.

I can attach the full log if you want, but considering I wasn't able to directly apply the patch and instead manually modified the "psp_cmd_submit_buf" function to warn instead of error and not return -EINVAL as described in the patch, I'd rather not clutter this with potentially bad log data. Should that patch be applied to drm-next instead of what is in the mainline kernel?
Comment 20 Gavin A. 2019-03-03 20:22:19 UTC
I changed my kernel to linux-amd-staging-drm-next-git, which already has the patch applied.  The system boots, X starts fine, and I am able to run an intensive 3D app without any performance or rendering issues; no special kernel command line is required.  I get the same error message as in comment #19, but no “scrambled pixels.”

NB For users with HP Envy x360 15m-bq1**** model laptops, the latest BIOS is F.20 which is not listed in the downloads section of HP support site, but rather under Advisories, https://support.hp.com/us-en/product/hp-envy-15m-bq100-x360-convertible-pc/16851057/model/18967057/document/c06219875 .

NB2 For system stability, I added kernel option idle=nomwait which seems to have made the laptop far less crashy, YMMV.
Comment 21 Tim Carr 2019-03-05 19:24:23 UTC
I can confirm that the amd-staging-drm-next kernel also works for my system.
Comment 22 Adrian Garay 2019-03-07 06:08:54 UTC
Created attachment 143560 [details]
logged when modprobe amdgpu is run
Comment 23 Adrian Garay 2019-03-07 06:12:36 UTC
I am on firmware F.20 and the above does not work for me.  I still see the scrambled line of garbage at boot, but instead of crashing my screen just goes blank. 

If I add modprobe.blacklist=amdgpu to the kernel I can boot, where I then SSH in and record the log file while running modprobe amdgpu from the laptop (attached.)
Comment 24 Adrian Garay 2019-03-07 06:18:58 UTC
Since I didn't clearly specify above, my result was with amd-staging-drm-next, but the behavior is identical to the current 5.0 release.  When the screen goes blank the machine stops responding, but it appears systemd still periodically logs work such as cron jobs.  Laptop has to be hard powered down by holding the power switch.
Comment 25 Nicola Orlando 2019-03-07 11:07:07 UTC
(In reply to Harry Wentland from comment #17)
> I don't have that laptop but this link mentions a method to "Update the BIOS
> when Windows does not start": https://support.hp.com/ca-en/document/c00042629
> 
> I'd be more curious to know if the patch I posted fixes it.

I tried the linux-amd-staging-drm-next kernel (which should already have the patch if the other comments are to be trusted). I have the same error as comment #19, but the screen goes completely blank while the system stays responsive. I can even start X and run glxgears in it, although the screen still stays black and I can't see anything. It runs at 60 FPS for the first ~5 seconds, then drops to a steady 1 FPS.
Comment 26 Tim Carr 2019-03-12 04:41:57 UTC
Edit to my above statement about the amd-staging-drm-next kernel working: either I had forgotten to return the raven_dmcu.bin file to its appropriate folder when I tested it, or upgrading my BIOS to F.20 messed it up again. I'm now back at the point I was before (same as comment #24). If Magic SysRq is enabled on the system, it can be used to reboot the system safely rather than doing a hard power-down.
Comment 27 JerryD 2019-03-12 15:20:57 UTC
I am following 7 serious bug reports on kernel, redhat, freedesktop bugzillas.
This one is by far the most critical when one can not boot the machine. I am happy to try to test things. I am not set up to compile a kernel, but I can try release candidates. If anyone sees something that might work, please let me know.

model name	: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx

Command line: BOOT_IMAGE=/vmlinuz-4.19.15-300.fc29.x86_64 root=/dev/mapper/fedora-root ro resume=/dev/mapper/fedora-swap rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet LANG=en_US.UTF-8 idle=nomwait iommu=pt processor.max_cstate=1

DMI: HP HP ENVY x360 Convertible 15-bq1xx/83C6, BIOS F.20 12/25/2018
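
Several comments in this thread depend on kernel parameters such as idle=nomwait and iommu=pt. A minimal sketch for checking whether a parameter is present on a kernel command line, assuming a POSIX shell; `cmdline_has` is a hypothetical helper name.

```sh
# Succeed if kernel parameter $1 appears as a whole word in command line $2.
# When $2 is omitted, the running kernel's /proc/cmdline is used instead.
cmdline_has() {
    line=${2-$(cat /proc/cmdline)}
    case " $line " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Possible usage: `cmdline_has idle=nomwait || echo "idle=nomwait is not set"`.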
Comment 28 Talha Khan 2019-04-04 14:13:44 UTC
I'm not sure I want to delete raven_dmcu.bin on my system; would renaming or moving it have the same effect? I have the same system as JerryD and am running Fedora 29 KDE.
Comment 29 Talha Khan 2019-04-04 18:27:07 UTC
Also, it seems that for the HP Envy x360 laptop, the latest BIOS is back to F.10; F.20 is listed in "previous versions".

https://support.hp.com/us-en/drivers/selfservice/hp-envy-15-bq100-x360-convertible-pc/16851053

Unfortunately there doesn't seem to be a way to downgrade.
Comment 30 Chris 2019-04-04 18:35:13 UTC
(In reply to Talha Khan from comment #29)
> Also, it seems that for the HP Envy x360 laptop, the latest BIOS is back to
> F.10; F.20 is listed in "previous versions".
> 
> https://support.hp.com/us-en/drivers/selfservice/hp-envy-15-bq100-x360-
> convertible-pc/16851053
> 
> Unfortunately there doesn't seem to be a way to downgrade.

Yes, renaming should work. I didn't want to delete it either, so I just backed up the file and deleted the original. F.20 is the latest; their site is just wonky. Although I hate that you can't downgrade, I partially agree with the decision, since it includes security updates you shouldn't roll back. They take that very seriously and I can objectively appreciate that.
Comment 31 Talha Khan 2019-04-08 13:18:19 UTC
I moved the raven_dmcu.bin file to another directory, but unfortunately I am still unable to boot any kernel newer than 4.20. For me at least, I get a black screen with a horizontal line of orange pixels near the bottom. I will attach the journalctl output from my last boot into kernel 5.0.5.

There was a bug filed in Red Hat bugzilla related to this:

https://bugzilla.redhat.com/show_bug.cgi?id=1668647
Comment 32 Talha Khan 2019-04-08 13:18:57 UTC
Created attachment 143893 [details]
Journalctl output for kernel 5.0.5-200.fc29.x86_64
Comment 33 Alex Deucher 2019-04-08 18:52:13 UTC
(In reply to Talha Khan from comment #31)
> I moved the raven_dmcu.bin file to another directory, but unfortunately I am
> still unable to boot any kernel newer 4.20. For me at least, I get a black
> screen with a horizontal line near the bottom of orange pixels. I will
> attach the journalctl output from my last boot into kernel 5.0.5.

Make sure you update your initrd if you move the file, otherwise, the driver will pick it up from the initrd at load time.
Comment 34 Talha Khan 2019-04-09 17:45:47 UTC
(In reply to Alex Deucher from comment #33)
> (In reply to Talha Khan from comment #31)
> > I moved the raven_dmcu.bin file to another directory, but unfortunately I am
> > still unable to boot any kernel newer 4.20. For me at least, I get a black
> > screen with a horizontal line near the bottom of orange pixels. I will
> > attach the journalctl output from my last boot into kernel 5.0.5.
> 
> Make sure you update your initrd if you move the file, otherwise, the driver
> will pick it up from the initrd at load time.

Thanks Alex.

This time, after I moved the raven_dmcu.bin file, I ran the following as root to update the initramfs image:

dracut --kver 5.0.6-200.fc29.x86_64 --force

This time:
1. The black screen with the horizontal line of pixels appeared, but for a split second, and then things started working like normal.
2. I was able to log into KDE Plasma like normal.

I did a quick suspend/resume and that worked also (although there was a split second of scrambled pixels).

Does this mean that there's something wrong with the raven_dmcu.bin file?
Comment 35 JerryD 2019-04-14 02:29:26 UTC
Well, I just ran Fedora updates which brought kernel to 5.0.7-200.fc29 and there was also an update to mesa-dri-drivers.x86_64 18.3.6-1.fc29. My laptop failed to boot with or without the raven_dmcu.bin file after this update.

I then also noted that the 5.0.6 kernel will boot with or without the raven_dmcu.bin file.  I am leaving that firmware file in place and using the 5.0.6 kernel at the moment.

I suspect we have a "whack-a-mole" bug, also affectionately known as a Heisenbug. Or at least more than one problem overlapping another. Hard to say whether the problem is in the kernel or somewhere else. For sure, I cannot boot 5.0.7.
Comment 36 Talha Khan 2019-04-15 13:57:16 UTC
(In reply to JerryD from comment #35)
> Well, I just ran Fedora updates which brought kernel to 5.0.7-200.fc29 and
> there was also an update to mesa-dri-drivers.x86_64 18.3.6-1.fc29. My laptop
> failed to boot with or without the raven_dmcu.bin file after this update.
> 
> I then also noted that the 5.06 kernel will boot with or without the
> raven_dmcu.bin file.  I am leaving that firmware file in place and using
> 5.06 kernel at the moment.
> 
> I suspect we have a "whack-a-mole" bug, also affectionately known as a
> Heisenbug. Or at least more than one problem overlapping another. Hard to
> say whether the problem is in the kernel or somewhere else. For sure, I can
> not boot the 5.07.

I also updated my system to kernel 5.0.7-200.fc29 and mesa-dri-drivers.x86_64 version 18.3.6-1.fc29, and here are the results for me:
original initramfs image with raven_dmcu.bin file: Unable to boot, same issue as before.
updated initramfs image without raven_dmcu.bin file: able to boot fine.

It's unfortunate that the issue is so hard to reproduce.
Comment 37 Talha Khan 2019-04-25 15:55:47 UTC
The issue occurs (and workaround works) on kernels 5.0.8 and 5.0.9.
Comment 38 Adrian Garay 2019-04-25 19:10:42 UTC
(In reply to Alex Deucher from comment #33)
> (In reply to Talha Khan from comment #31)
> > I moved the raven_dmcu.bin file to another directory, but unfortunately I am
> > still unable to boot any kernel newer 4.20. For me at least, I get a black
> > screen with a horizontal line near the bottom of orange pixels. I will
> > attach the journalctl output from my last boot into kernel 5.0.5.
> 
> Make sure you update your initrd if you move the file, otherwise, the driver
> will pick it up from the initrd at load time.

Alex, is there a known issue with raven_dmcu.bin and this chipset or is it specific to this laptop?  

No distro with a recent kernel is installable on my laptop unless I nomodeset the kernel and remove this firmware file from the initramfs.

Thanks.
Comment 39 Alex Deucher 2019-04-25 23:01:20 UTC
(In reply to Adrian Garay from comment #38)
> (In reply to Alex Deucher from comment #33)
> > (In reply to Talha Khan from comment #31)
> > > I moved the raven_dmcu.bin file to another directory, but unfortunately I am
> > > still unable to boot any kernel newer 4.20. For me at least, I get a black
> > > screen with a horizontal line near the bottom of orange pixels. I will
> > > attach the journalctl output from my last boot into kernel 5.0.5.
> > 
> > Make sure you update your initrd if you move the file, otherwise, the driver
> > will pick it up from the initrd at load time.
> 
> Alex, is there a known issue with raven_dmcu.bin and this chipset or is it
> specific to this laptop?  

I think it's specific to the sbios version.
Comment 40 Jay Fitzpatrick 2019-05-01 21:05:43 UTC
Issue still present on Fedora30 with Kernel 5.0.9-301.fc30.x86_64

Workaround 

sudo mv /usr/lib/firmware/amdgpu/raven_dmcu.bin /home/XXX/
sudo dracut -f --kver 5.0.9-301.fc30.x86_64


Tested and working on 

HP ENVY x360 Convertible 15-bq1xx/83C6, BIOS F.17 03/29/2018
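
The first step of the workaround above can be wrapped so that it is safe to repeat after every linux-firmware update. This is only a sketch: `stash_dmcu` and the backup directory are hypothetical names, and the initramfs still has to be rebuilt afterwards with the distribution's own tool.

```sh
# Move raven_dmcu.bin out of firmware directory $1 into backup directory $2.
# Succeeds without doing anything if the file is already gone.
stash_dmcu() {
    fw_dir=$1
    backup_dir=$2
    [ -e "$fw_dir/raven_dmcu.bin" ] || return 0
    mkdir -p "$backup_dir" && mv "$fw_dir/raven_dmcu.bin" "$backup_dir/"
}
```

Possible usage as root on Fedora: `stash_dmcu /usr/lib/firmware/amdgpu /root/fw-backup && dracut -f --kver "$(uname -r)"`. On Arch the rebuild step mentioned in this thread is `mkinitcpio -p linux`, and on Ubuntu it is `update-initramfs -u`.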
Comment 41 Talha Khan 2019-05-02 03:17:03 UTC
Thanks for the update Jay. Did you upgrade or do a fresh install?
Comment 42 Jay Fitzpatrick 2019-05-02 06:36:39 UTC
This was an upgrade from Fedora 29 with the above workaround applied to a v5 kernel.

Booting from the KDE spin of Fedora 30 would only work with limited graphics (nomodeset) as the default boot hung.

I do not have access to that machine at the moment but will pull any required logs later.
Comment 43 Talha Khan 2019-05-03 20:20:49 UTC
I updated my Fedora KDE spin system from Fedora 29 to Fedora 30 and had the same experience as Jay's.
Comment 44 Ondrej Lang 2019-05-22 08:42:01 UTC
Created attachment 144316 [details]
Kernel log 5.1.3 showing amdgpu drm crash

Experiencing the same problem. On boot with any kernel > 4.20, the graphics are not initialized: a few scrambled lines appear at the bottom of the screen and then the screen goes blank. The system does actually boot: I entered my credentials blind on the black screen and ran "cat dmesg > dmesg.txt", and when I rebooted with kernel 4.19 the file was there (I have attached it to this thread). The relevant portion of the dmesg log is:

[    5.133929] [drm] REG_WAIT timeout 1us * 100000 tries - mpc1_assert_idle_mpcc line:103
[    5.134034] WARNING: CPU: 2 PID: 367 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:277 generic_reg_wait.cold.0+0x29/0x30 [amdgpu]
...
[    5.134925]  drm_dev_register+0x111/0x150 [drm]
[    5.135335] [drm] Display Core initialized with v3.2.17!
[    5.642999] [drm:hwss_edp_wait_for_hpd_ready [amdgpu]] *ERROR* hwss_edp_wait_for_hpd_ready: wait timed out!
[    6.142512] [drm:hwss_edp_wait_for_hpd_ready [amdgpu]] *ERROR* hwss_edp_wait_for_hpd_ready: wait timed out!
[    6.143119] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    6.143121] [drm] Driver supports precise vblank timestamp query.
[    6.158613] [drm] VCN decode and encode initialized successfully(under SPG Mode).
[    6.161822] [drm] Cannot find any crtc or sizes
[    6.188933] [drm] Initialized amdgpu 3.30.0 20150101 for 0000:04:00.0 on minor 0

I can also confirm that as a workaround, removing/moving file /lib/firmware/amdgpu/raven_dmcu.bin and regenerating the initramfs ("mkinitcpio -p linux" on Arch linux) allows the 5.1.3 kernel to boot normally (I can also start an X session and everything seems to be fine) and there are no drm errors in dmesg anymore.

System:
HP ENVY x360 Convertible 15-bq1xx/83C6, BIOS F.20 12/25/2018
Kernel: 5.1.3-arch1-1-ARCH
Grub kernel parameters: amd_iommu=on iommu=pt idle=nomwait

If you need anything else let me know
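
A quick way to compare a boot with and without the workaround is to count the drm error lines in the kernel log. A minimal sketch, assuming the log is piped in on standard input; `drm_error_count` is a hypothetical name, and it only counts lines tagged "*ERROR*" by the drm subsystem, not warnings such as the REG_WAIT timeout.

```sh
# Print the number of lines on stdin that carry a "[drm:...] *ERROR*" tag.
# grep exits non-zero when the count is 0, so the status is masked.
drm_error_count() {
    grep -c '\[drm:.*\*ERROR\*' || :
}
```

Possible usage: `dmesg | drm_error_count`.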
Comment 45 Ondrej Lang 2019-05-27 16:43:56 UTC
I just came across this article, which seems to suggest a fix for the issue mentioned in this thread is coming in a future linux-firmware update:

https://www.phoronix.com/scan.php?page=news_item&px=AMD-Raven1-Skip-The-DMCU

It seems a patch has already been proposed to the kernel tree, so hopefully this will fix the problem with some laptop models with the Raven Ridge 1 CPUs.

Patch url:

https://lists.freedesktop.org/archives/amd-gfx/2019-May/034307.html
Comment 46 Talha Khan 2019-06-06 15:13:17 UTC
Thanks for the update Ondrej. It seems that the patch may be in place. Since I've updated my Fedora 30 system's kernel to 5.1.6, I haven't had to run dracut to rebuild my initramfs. The raven_dmcu.bin file remains renamed as raven_dmcu.bin.old. Previously, I've had to move/rename the file and run dracut every time a kernel update came out.
Comment 47 Ondrej Lang 2019-06-07 09:15:53 UTC
The patch is not part of the kernel package itself; it is part of the linux-firmware package, and there has been no new release of linux-firmware since I tested this.

I think the reason you did not have to rebuild the initramfs is that there was no linux-firmware update. Such an update would put the raven_dmcu.bin file back in the /lib/firmware/amdgpu/ folder and also in the initramfs, so you would need to rename/move it again and run dracut to rebuild the initramfs.

Once a new version of linux-firmware comes out (with the patch) I will re-test and report results here.
Comment 48 Alex Deucher 2019-06-07 13:52:32 UTC
The patch is for the kernel.
Comment 49 Talha Khan 2019-06-11 16:56:49 UTC
I updated my kernel to 5.1.7 and did not need to run dracut to rebuild my initramfs. I was able to boot just fine.
Comment 50 Ondrej Lang 2019-06-12 09:47:24 UTC
I tested this yesterday with kernel 5.1.8 and if the file raven_dmcu.bin is present in the /lib/firmware/amdgpu/ folder when you are updating the kernel (or manually rebuilding the initramfs), the computer will boot with a blank screen next time.

There are 2 pieces to this. The linux-firmware package provides the binary files (i.e. the raven_dmcu.bin) so every time this package gets updated, you should rename/move the file and rebuild the initramfs. The linux-firmware updates are not as frequent as the kernel updates so if you do the workaround, you might go through several kernel updates without issues, but once linux-firmware updates, you have to repeat the workaround...

All the patch in the kernel will do is to ignore the raven_dmcu.bin file automatically (for raven 1 cpus) when building the initramfs so you don't have to rename/move it every time linux-firmware updates.
Comment 51 moreginger 2019-06-15 11:57:36 UTC
Also cannot boot without nomodeset or modprobe.blacklist=amdgpu. With the latter, I get a black screen if I run `modprobe amdgpu`.

I have Ryzen 2400G.

The trick with moving raven_dmcu.bin (as per below) didn't help.

```
mv /lib/firmware/amdgpu/raven_dmcu.bin ~/
update-initramfs -u
```

On Ubuntu/Linux 5.1.0.
Comment 52 Talha Khan 2019-07-02 13:41:52 UTC
(In reply to Ondrej Lang from comment #50)
> I tested this yesterday with kernel 5.1.8 and if the file raven_dmcu.bin is
> present in the /lib/firmware/amdgpu/ folder when you are updating the kernel
> (or manually rebuilding the initramfs), the computer will boot with a blank
> screen next time.
> 
> There are 2 pieces to this. The linux-firmware package provides the binary
> files (i.e. the raven_dmcu.bin) so every time this package gets updated, you
> should rename/move the file and rebuild the initramfs. The linux-firmware
> updates are not as frequent as the kernel updates so if you do the
> workaround, you might go through several kernel updates without issues, but
> once linux-firmware updates, you have to repeat the workaround...
> 
> All the patch in the kernel will do is to ignore the raven_dmcu.bin file
> automatically (for raven 1 cpus) when building the initramfs so you don't
> have to rename/move it every time linux-firmware updates.

You're right, I had updated to 5.1.15 and it seemed the firmware files were updated as well. I then had to perform the workaround in order to boot into the system.
Comment 53 Ondrej Lang 2019-07-11 09:52:55 UTC
According to the linux kernel 5.2 changelog (https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.2), the fix for the DMCU firmware issue on raven1 platform is included in that release.

I went ahead and tested this and can confirm that I was able to boot without a blank screen into my machine with kernel 5.2 without needing to use the workaround.

I tested with:
1.) re-installed latest linux-firmware package
2.) installed kernel 5.2
3.) re-generated the initramfs
4.) booted into linux using kernel 5.2 and had no blank screen, dmesg output is clean with no errors for amdgpu

Tested on:
HP HP ENVY x360 Convertible 15-bq1xx/83C6, BIOS F.21 04/29/2019

I guess if someone else can confirm my findings, maybe on different raven1 hardware, this ticket can be closed.
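
Since the fix is reported here for kernel 5.2, a script could decide whether the raven_dmcu.bin workaround is still needed by comparing the running kernel release with 5.2. A minimal sketch, assuming GNU `sort -V` is available; `kernel_at_least` is a hypothetical helper name, and release-candidate suffixes such as rc7 are not taken into account.

```sh
# Succeed if kernel release $1 (e.g. the output of `uname -r`) is at least
# version $2. Anything after the first "-" in $1 is dropped before comparing.
kernel_at_least() {
    have=${1%%-*}
    want=$2
    lowest=$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}
```

Possible usage: `kernel_at_least "$(uname -r)" 5.2 || echo "workaround probably still needed"`.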
Comment 54 Joe Coutcher 2019-07-12 03:39:44 UTC
Ondrej - I'm on a fresh install of Ubuntu 19.04 with no workarounds applied, using a similar setup to yours (HP Envy x360 15m-bq121dx.)  I installed kernel 5.2 RC7 (since the AMD64 build of 5.2 final on kernel.ubuntu.com is broken), and updated to the latest linux-firmware package available on the disco feed (1.178.2).  I should also note I'm on HP BIOS firmware version 21.  While the system boots to the desktop environment, there's tons of garbage, and when using Firefox, screen writes are occurring on random parts of the screen.  Also, I attempted running Basemark Web 3.0 in Firefox, and can consistently lock up the machine.  For reference, the kernel version is 5.2.0-050200rc7-lowlatency.
Comment 55 Ondrej Lang 2019-07-12 07:49:05 UTC
(In reply to Joe Coutcher from comment #54)
> Ondrej - I'm on a fresh install of Ubuntu 19.04 with no workarounds applied,
> using a similar setup to yours (HP Envy x360 15m-bq121dx.)  I installed
> kernel 5.2 RC7 (since the AMD64 build of 5.2 final on kernel.ubuntu.com is
> broken), and updated to the latest linux-firmware package available on the
> disco feed (1.178.2).  I should also note I'm on HP BIOS firmware version
> 21.  While the system boots to the desktop environment, there's tons of
> garbage, and when using Firefox, screen writes are occuring on random parts
> of the screen.  Also, I attempted running Basemark Web 3.0 in Firefox, and
> can consistently lock up the machine.  For reference, the kernel version is
> 5.2.0-050200rc7-lowlatency.

Hi Joe,

I'm quite sure your issue is not related to this ticket. The problem in this bug report is quite specific and is related to the raven_dmcu.bin firmware. It has a specific symptom where the screen is not initialized during boot (stays blank / black) so I think you need to report your problem somewhere else. Also, it would be good if you can check the kernel log after crash and see what error messages you have and then google for that specific message to find if someone else already created a bug report for it.

I have been running kernel 5.2 since yesterday and had no issues whatsoever. I also just ran the Basemark Web 3.0 benchmark and had no issues.

As for your lockups, I know that the AMD APU had problems with random lockups in earlier kernels (if I remember correctly it was related to the C-state changes of the CPU), I myself had the problem and for me the fix was to add "idle=nomwait" to my kernel parameters. That fixed the random lockups for me. Now I don't know if this issue has already been addressed, last time I tried without the parameter was kernel 5.0 I think and still had lockups, so this might not be related to your specific problem, but as I said, best course of action for you is to inspect the kernel log after a crash, check the error message and then search for a bug report with that error and report your findings there.
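
To confirm that a parameter such as "idle=nomwait" actually reached the running kernel, something like the following can be used. It is a minimal sketch; the function name has_kernel_param is hypothetical, and it only checks for an exact whole-word match on the command line.

```shell
# Sketch: succeed if the given parameter appears as a whole word in a
# kernel command line. The second argument defaults to the command line
# of the running kernel. The function name is hypothetical.
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use:
#   has_kernel_param idle=nomwait && echo "idle=nomwait is active"
```

Because the match is exact, a mistyped parameter name is reported as missing rather than silently accepted.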
Comment 56 Jay Fitzpatrick 2019-07-12 13:10:41 UTC
(In reply to Ondrej Lang from comment #53)
> According to the linux kernel 5.2 changelog
> (https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.2), the fix for
> the DMCU firmware issue on raven1 platform is included in that release.
> 
> I went ahead and tested this and can confirm that I was able to boot without
> a blank screen into my machine with kernel 5.2 without needing to use the
> workaround.
> 
> I tested with:
> 1.) re-installed latest linux-firmware package
> 2.) installed kernel 5.2
> 3.) re-generated the initramfs
> 4.) booted into linux using kernel 5.2 and had no blank screen, dmesg output
> is clean with no erros for amdgpu
> 
> Tested on:
> HP HP ENVY x360 Convertible 15-bq1xx/83C6, BIOS F.21 04/29/2019
> 
> I guess if someone else can confirm my findings, maybe on different raven1
> hardware, this ticket can be closed.


Hi Ondrej

While I have not been able to test the 5.2 kernel on my Fedora system I have installed the 5.3 kernel from rawhide and am seeing the same results:

[root@envy ~]# cp /home/XXX/raven_dmcu.bin /usr/lib/firmware/amdgpu/
[root@envy ~]# dracut -f --kver 5.3.0-0.rc0.git2.2.fc31.x86_64
[root@envy ~]# reboot

Tested on HP ENVY x360 Convertible 15-bq1xx/83C6, BIOS F.20 12/25/2018
Kernel version 5.3.0-0.rc0.git2.2.fc31.x86_64 

Installing rawhide kernel on Fedora without debug enabled:
sudo dnf config-manager --add-repo=http://alt.fedoraproject.org/pub/alt/rawhide-kernel-nodebug/fedora-rawhide-kernel-nodebug.repo
sudo yum upgrade
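
After copying the firmware back and rebuilding with dracut, it may be worth confirming that raven_dmcu.bin really ended up in the initramfs before drawing conclusions from the boot. A minimal sketch, assuming a dracut-based system where lsinitrd is available; the function name listing_has_dmcu is hypothetical.

```shell
# Sketch: read an initramfs file listing on stdin and succeed if it
# contains the Raven DMCU firmware. The function name is hypothetical.
listing_has_dmcu() {
    grep -q 'amdgpu/raven_dmcu\.bin'
}

# Typical use on a dracut-based system (lsinitrd ships with dracut):
#   lsinitrd | listing_has_dmcu && echo "DMCU firmware is in the initramfs"
```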
Comment 57 Joe Coutcher 2019-07-12 15:24:30 UTC
(In reply to Ondrej Lang from comment #55)
> Hi Joe,
> 
> I'm quite sure your issue is not related to this ticket. The problem in this
> bug report is quite specific and is related to the raven_dmcu.bin firmware.
> It has a specific symptom where the screen is not initialized during boot
> (stays blank / black) so I think you need to report your problem somewhere
> else. Also, it would be good if you can check the kernel log after crash and
> see what error messages you have and then google for that specific message
> to find if someone else already created a bug report for it.
> 
> I have been running kernel 5.2 since yesterday and had no issues whatsoever.
> I also just run the Basemark Web 3.0 benchmark and had no issues.
> 
> As for your lockups, I know that the AMD APU had problems with random
> lockups in earlier kernels (if I remember correctly it was related to the
> C-state changes of the CPU), I myself had the problem and for me the fix was
> to add "idle=nomwait" to my kernel parameters. That fixed the random lockups
> for me. Now I don't know if this issue has already been addressed, last time
> I tried without the parameter was kernel 5.0 I think and still had lockups,
> so this might not be related to your specific problem, but as I said, best
> course of action for you is to inspect the kernel log after a crash, check
> the error message and then search for a bug report with that error and
> report your findings there.

Sorry...I was using the Lynx browser to type out the reply last night and didn't include all the details.  :-)

My report was related to this issue.  On every distro I've tried (Ubuntu 19.04, Fedora 30, OpenSuSE Tumbleweed), whenever it tries to initialize the firmware, the system goes to a black screen.  The only way for me to get around it is to boot with nomodeset, remove raven_dmcu.bin, update my initrd, and reboot.
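
For anyone repeating this workaround, the "remove raven_dmcu.bin" step can be sketched as below. The function name stash_dmcu_firmware and the backup directory are hypothetical choices; the firmware path is the one mentioned in this report and may differ between distributions. The file is moved rather than deleted so it can be restored later.

```shell
# Sketch: move raven_dmcu.bin out of the firmware directory so it is not
# picked up when the initramfs is rebuilt. Function name and backup
# location are hypothetical. Must run as root on a real system.
stash_dmcu_firmware() {
    fw_dir=${1:-/lib/firmware/amdgpu}
    backup_dir=${2:-/root/amdgpu-firmware-backup}
    fw="$fw_dir/raven_dmcu.bin"
    if [ ! -e "$fw" ]; then
        echo "nothing to do: $fw not found" >&2
        return 0
    fi
    mkdir -p "$backup_dir" && mv "$fw" "$backup_dir/"
}

# Afterwards, rebuild the initramfs with the distribution's own tool,
# for example:
#   dracut -f              (Fedora)
#   update-initramfs -u    (Debian/Ubuntu)
#   mkinitcpio -P          (Arch)
```

As noted earlier in the thread, the file comes back whenever linux-firmware is updated, so the step has to be repeated after each update on kernels without the fix.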

My tests were to provide a baseline: fresh install, no workarounds applied, with kernel 5.2rc7 and the latest linux-firmware package (AFAIK.)  Under those conditions, raven_dmcu.bin loads and I can get to the GUI, but I see garbage on the screen.

When I have some time this weekend, I'll do some more testing/sift through the logs/try building 5.2 final and see if anything jumps out at me.  I'll also retry my tests with idle=nomwait.
Comment 58 Joe Coutcher 2019-07-12 23:09:48 UTC
Update - After adding idle=nomwait and iommu=pt to my kernel parameters, I'm no longer seeing garbage on the display, and the system has been stable for the past hour on 5.2rc7.
Comment 59 Jay Fitzpatrick 2019-07-13 09:41:12 UTC
(In reply to Jay Fitzpatrick from comment #56)
> (In reply to Ondrej Lang from comment #53)
> > According to the linux kernel 5.2 changelog
> > (https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.2), the fix for
> > the DMCU firmware issue on raven1 platform is included in that release.
> > 
> > I went ahead and tested this and can confirm that I was able to boot without
> > a blank screen into my machine with kernel 5.2 without needing to use the
> > workaround.
> > 
> > I tested with:
> > 1.) re-installed latest linux-firmware package
> > 2.) installed kernel 5.2
> > 3.) re-generated the initramfs
> > 4.) booted into linux using kernel 5.2 and had no blank screen, dmesg output
> > is clean with no erros for amdgpu
> > 
> > Tested on:
> > HP HP ENVY x360 Convertible 15-bq1xx/83C6, BIOS F.21 04/29/2019
> > 
> > I guess if someone else can confirm my findings, maybe on different raven1
> > hardware, this ticket can be closed.
> 
> 
> Hi Ondrej
> 
> While I have not been able to test the 5.2 kernel on my Fedora system I have
> installed the 5.3 kernel from rawhide and am seeing the same results:
> 
> [root@envy ~]# cp /home/XXX/raven_dmcu.bin /usr/lib/firmware/amdgpu/
> [root@envy ~]# dracut -f --kver 5.3.0-0.rc0.git2.2.fc31.x86_64
> [root@envy ~]# reboot
> 
> Tested on HP ENVY x360 Convertible 15-bq1xx/83C6, BIOS F.20 12/25/2018
> Kernel version 5.3.0-0.rc0.git2.2.fc31.x86_64 
> 
> Installing rawhide kernel on Fedora without debug enabled:
> sudo dnf config-manager
> --add-repo=http://alt.fedoraproject.org/pub/alt/rawhide-kernel-nodebug/
> fedora-rawhide-kernel-nodebug.repo
> sudo yum upgrade

--Update--

While kernels 5.3.0-0.rc0.git2.2.fc31.x86_64 and 5.3.0-0.rc0.git2.4.fc31.x86_64 seemed to be pretty stable when it came to booting the system / touchscreen working etc, there was a massive amount of video tearing (in Chrome / Konsole) within my KDE session, enough to force me to roll back to 5.1.16-300.fc30.x86_64

Jay
Comment 60 Michael Eagle 2019-07-14 20:22:30 UTC
Created attachment 144787 [details]
attachment-8612-0.html

I am seeing reports from machines with old BIOS versions, such as F.19.
I have a 15-cp0001na
https://support.hp.com/ie-en/drivers/selfservice/hp-envy-15-cp0000-x360-convertible-pc/20270303/model/23086446
Latest available is F.42 Rev.A
I am wondering whether, by any chance, a newer BIOS is available for the other models as well.
Comment 61 Talha Khan 2019-07-18 14:10:57 UTC
(In reply to Michael Eagle from comment #60)
> Created attachment 144787 [details]
> attachment-8612-0.html
> 
> I am seeing reports with old BIOS, such as F.19.
> I have a 15-cp0001na
> https://support.hp.com/ie-en/drivers/selfservice/hp-envy-15-cp0000-x360-
> convertible-pc/20270303/model/23086446
> Latest available is F.42 Rev.A
> I am wondering if by any chance would be a match to other models also.

The latest BIOS for my HP Envy x360 15-bq100 is F.21:
https://support.hp.com/us-en/drivers/selfservice/swdetails/hp-envy-15-bq100-x360-convertible-pc/16851053/model/18706859/swItemId/ob-232955-1?sku=1ZA02AV

My current kernel is 5.1.17-300 and so far there haven't been any boot issues.
Comment 62 Chris 2019-07-18 18:17:34 UTC
Just wanted to say that on Kernel 5.2rc7, BIOS F.21 I'm able to boot with raven_dmcu.bin in /lib/firmware/amdgpu. I also updated raven_dmcu.bin from the linux firmware git. I'm not sure which part helped, or whether it was all of them together.
Comment 63 Talha Khan 2019-07-29 14:02:25 UTC
I updated my kernel to 5.1.19, and the firmware files were updated, but I wasn't able to boot. I had to boot back into 5.1.16 and perform the workaround before being able to boot into 5.1.19.
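
Comment 53 reports the fix as part of kernel 5.2, so a 5.1.x kernel would presumably still need the workaround. A minimal sketch for checking a kernel version against that threshold, assuming GNU sort with -V is available and assuming the fix has not been backported to the kernel in question; the function name kernel_at_least is hypothetical.

```shell
# Sketch: succeed if kernel version $1 is at least version $2, compared
# with GNU sort -V. Anything after the first "-" (distro suffix, rc tag)
# is dropped, so release candidates count as the final release.
# The function name is hypothetical.
kernel_at_least() {
    have=${1%%-*}
    want=$2
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Typical use:
#   kernel_at_least "$(uname -r)" 5.2 || echo "workaround probably still needed"
```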
Comment 64 Luya Tshimbalanga 2019-11-14 07:38:23 UTC
Created attachment 145952 [details]
Journal boot from hp envy x360 cp0xxx

The bug that caused the Raven Ridge firmware to fail is back on kernel 5.3.11. Even the latest git version is affected.
Comment 65 Luya Tshimbalanga 2019-11-14 07:41:51 UTC
Changed the title to reflect the impact on Raven Ridge
Comment 66 Alex Deucher 2019-11-14 13:58:26 UTC
The dmcu firmware is no longer loaded on affected raven systems.
Comment 67 Alex Deucher 2019-11-14 14:51:13 UTC
(In reply to Luya Tshimbalanga from comment #65)
> Change the title reflecting the impact on Raven Ridge

This is completely unrelated to your issue.  Please don't appropriate other bugs for unrelated issues.
Comment 68 Luya Tshimbalanga 2019-11-14 16:58:57 UTC
(In reply to Alex Deucher from comment #67)
> (In reply to Luya Tshimbalanga from comment #65)
> > Change the title reflecting the impact on Raven Ridge
> 
> This is completely unrelated to your issue.  Please don't appropriate other
> bugs for unrelated issues.

Sorry about that.
Comment 69 Jay Fitzpatrick 2019-11-16 12:27:38 UTC
Upgraded to 5.3.11-300.fc31.x86_64 this morning with linux-firmware-20191022-103.fc31.noarch and no evidence of regression

Jay

