After returning from a two-week vacation, I saw that some new and interesting features had made it into https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next, which I had used before (at commit 94097b0f7f1bfa54b3b1f8b0d74bbd271a0564e4), so I updated my kernel to the current-as-of-today version (at commit 43dd6fde5df450938568885249b836eb376e2ad6), but found that X11 would no longer start with the new version.
More specifically, the symptom is this: booting to the console works fine.
When I invoke "X" (manually), the console remains visible, and the Xorg.0.log output indefinitely pauses after the messages
[ 36.622] (EE) AMDGPU(0): Failed to allocate scanout buffer memory
[ 36.623] (EE) AMDGPU(0): Failed to allocate scanout buffer memory
[ 36.623] (EE) AMDGPU(0): failed to set mode: Invalid argument
have been emitted. (These messages are not present with the older, working kernel version.)
At about the same time, the following dmesg output is emitted:
[ 36.405078] ------------[ cut here ]------------
[ 36.405090] WARNING: CPU: 5 PID: 758 at drivers/gpu/drm/drm_mode_object.c:294 drm_object_property_get_value+0x22/0x30 [drm]
[ 36.405090] Modules linked in: ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp xt_owner xt_mark iptable_nat cmac nf_conntrack_ipv4 cpufreq_ondemand nf_defrag_ipv4 nf_nat_ipv4 nf_nat msr bnep nf_conntrack iptable_mangle iptable_filter nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_realtek snd_hda_codec_generic btusb btrtl btbcm btintel snd_hda_codec_hdmi bluetooth igb snd_hda_intel snd_hda_codec ptp ecdh_generic pps_core rfkill snd_hda_core dca crc16 snd_hwdep snd_pcm edac_mce_amd snd_timer kvm_amd snd soundcore sp5100_tco kvm tpm_tis tpm_tis_core input_leds evdev i2c_piix4 shpchp irqbypass pcspkr led_class tpm button 8250_dw acpi_cpufreq sch_fq_codel usbip_host usbip_core sg exfat(O) it87(O) hwmon_vid ip_tables x_tables algif_skcipher af_alg sd_mod uas usb_storage serio_raw atkbd
[ 36.405113] libps2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ccp rng_core ahci libahci xhci_pci xhci_hcd libata usbcore scsi_mod usb_common i8042 serio amdgpu i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xfs libcrc32c crc32c_generic crc32c_intel dm_crypt dm_mod dax nvme nvme_core i2c_dev
[ 36.405125] CPU: 5 PID: 758 Comm: Xorg Tainted: G W O 4.13.0-rc5-amd+ #6
[ 36.405125] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 0810 08/01/2017
[ 36.405126] task: ffff8807fa35a940 task.stack: ffffc90008be0000
[ 36.405134] RIP: 0010:drm_object_property_get_value+0x22/0x30 [drm]
[ 36.405135] RSP: 0018:ffffc90008be3bc8 EFLAGS: 00010282
[ 36.405136] RAX: ffffffffa04b9340 RBX: ffff8807f5ac0000 RCX: 0000000000000000
[ 36.405136] RDX: ffffc90008be3be8 RSI: ffff8807fa3f4880 RDI: ffff8807f6b84028
[ 36.405137] RBP: ffffc90008be3bc8 R08: ffff8807fa3f6520 R09: ffff8807ed303c00
[ 36.405137] R10: 0000000000000040 R11: 0000000000000000 R12: ffff8807f6b84000
[ 36.405138] R13: 00000000ffffffea R14: ffff8807faf17980 R15: ffff8807f6b84028
[ 36.405138] FS: 00007f42ac7fc940(0000) GS:ffff88081ed40000(0000) knlGS:0000000000000000
[ 36.405139] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 36.405140] CR2: 000000706e122018 CR3: 00000007f790f000 CR4: 00000000003406e0
[ 36.405140] Call Trace:
[ 36.405175] amdgpu_dm_connector_atomic_set_property+0x10a/0x180 [amdgpu]
[ 36.405184] drm_atomic_set_property+0x186/0x4a0 [drm]
[ 36.405191] drm_mode_obj_set_property_ioctl+0x12d/0x280 [drm]
[ 36.405199] ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[ 36.405206] drm_mode_connector_property_set_ioctl+0x3f/0x60 [drm]
[ 36.405212] drm_ioctl_kernel+0x5d/0xb0 [drm]
[ 36.405219] drm_ioctl+0x32a/0x400 [drm]
[ 36.405226] ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[ 36.405229] ? lru_cache_add_active_or_unevictable+0x36/0xb0
[ 36.405249] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[ 36.405251] do_vfs_ioctl+0xa5/0x600
[ 36.405253] ? handle_mm_fault+0xd8/0x230
[ 36.405254] SyS_ioctl+0x79/0x90
[ 36.405256] entry_SYSCALL_64_fastpath+0x13/0x94
[ 36.405257] RIP: 0033:0x7f42aa0c30c7
[ 36.405258] RSP: 002b:00007ffe62874da8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 36.405259] RAX: ffffffffffffffda RBX: 00007f42aa387aa0 RCX: 00007f42aa0c30c7
[ 36.405259] RDX: 00007ffe62874de0 RSI: 00000000c01064ab RDI: 000000000000000b
[ 36.405260] RBP: 00007f42aa387af8 R08: 000000706e121b80 R09: 0000000000000001
[ 36.405260] R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000020
[ 36.405260] R13: 0000000000000004 R14: 00007f42aa387af8 R15: 0000000000000000
[ 36.405261] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 46 60 55 48 89 e5 48 8b 80 70 03 00 00 48 83 78 20 00 75 07 e8 60 ff ff ff 5d c3 <0f> ff e8 57 ff ff ff 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 55 48
[ 36.405277] ---[ end trace c128c94b0c5a469b ]---
[ 36.532674] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* Atomic check failed with err: -22
(However, scary dmesg output like this, with a call trace at amdgpu_dm_connector_atomic_set_property, also occurs with "working" kernel versions, unlike the X11 messages cited above. Only the "*ERROR* Atomic check failed with err: -22" line is specific to the "new, broken" kernel.)
At this point, the console is still visible, and if I use "Alt+F2" or similar to switch to another virtual console, I can work with that console. If I switch back to the virtual console that I started X from, the X server exits (without ever having displayed anything).
Since this symptom was 100% reproducible with the new kernel, I started a "git bisect" on the amd-staging-drm-next kernel, which led to the following result:
ebbf7337e2daacacef3e01114e6be68a2a4f11b4 is the first bad commit
Author: Charlene Liu <firstname.lastname@example.org>
Date: Tue Aug 22 20:15:28 2017 -0400
drm/amd/display: Block 6Ghz timing if SBIOS set HDMI_6G_en to 0
Signed-off-by: Charlene Liu <email@example.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Alex Deucher <firstname.lastname@example.org>
:040000 040000 0f221431fffb401f50d49e9dab16ca2d93bb6388 51f57c497d9d1d7263847e246e9aa794032e9112 M drivers
From looking at https://cgit.freedesktop.org/~agd5f/linux/commit/?h=amd-staging-drm-next&id=ebbf7337e2daacacef3e01114e6be68a2a4f11b4 I cannot see how this patch could prevent my X from starting (using HDMI, 3840x2160), but it is 100% reproducible: X11 starts with the tree as of just before this commit, but not with this commit included. I tried multiple reboot cycles to verify this.
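For readers unfamiliar with the technique, the bisection described above is a binary search over the commit range between a known-good and a known-bad revision. The following is only a minimal sketch of that idea, not what "git bisect" does internally; the function name `first_bad_commit`, the commit labels and the `is_bad` callback are hypothetical, and it assumes that every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear history.

    Assumptions (hypothetical model, not git's own algorithm):
    - commits are ordered oldest to newest,
    - commits[0] is known good and commits[-1] is known bad,
    - once a commit is bad, all later commits are bad too.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the first bad commit is at mid or earlier
        else:
            good = mid  # the first bad commit is after mid
    return commits[bad]


if __name__ == "__main__":
    # Hypothetical history of 16 commits in which "c11" introduced the regression.
    history = [f"c{i}" for i in range(16)]
    print(first_bad_commit(history, lambda c: int(c[1:]) >= 11))
```

Each call to `is_bad` corresponds to one build-and-boot test, so the number of reboots grows only logarithmically with the number of commits in the range.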
Please attach your xorg log, dmesg output, and xorg conf if you are using one.
Created attachment 134299 [details]
Xorg.0.log written while symptoms of Bug 102820 occur
Created attachment 134300 [details]
dmesg (filtered by "grep -i -w -e drm -e amdgpu")
I attached the Xorg.0.log as requested. The "dmesg" output as a whole would be somewhat hard to edit for potentially sensitive content, so I filtered it through "grep -i -w -e drm -e amdgpu" and hope that is good enough.
I do not use an xorg.conf file, but I do have some minor customization in /etc/X11/xorg.conf.d/* files that were supplied by the Arch Linux distribution:
DisplaySize 508 285
HorizSync 20.0 - 150.0
VertRefresh 23.96 - 90.0
# 3840x2160p at 24Hz 16:9
ModeLine "3840x2160@24" 297.000 3840 5116 5204 5500 2160 2168 2178 2250 +hsync +vsync
# 3840x2160p at 25Hz 16:9
ModeLine "3840x2160@25" 297.000 3840 4896 4984 5280 2160 2168 2178 2250 +hsync +vsync
# 3840x2160p at 30Hz 16:9
ModeLine "3840x2160@30" 297.000 3840 4016 4104 4400 2160 2168 2178 2250 +hsync +vsync
# 3840x2160p at 50Hz 16:9
ModeLine "3840x2160@50" 594.000 3840 4896 4984 5280 2160 2168 2178 2250 +hsync +vsync
# 3840x2160p at 60Hz 16:9
ModeLine "3840x2160@60" 594.000 3840 4016 4104 4400 2160 2168 2178 2250 +hsync +vsync
Option "Monitor-HDMI-A-0" "Monitor0"
Option "TearFree" "On" # [<bool>]
Viewport 0 0
(The once manually added ModeLines are not currently in use.)
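As a cross-check of the ModeLines above, the refresh rate of each mode is the pixel clock divided by the total pixels per frame (htotal times vtotal). The sketch below recomputes it and flags the modes whose pixel clock exceeds 340 MHz, the maximum TMDS clock of HDMI 1.4. Treating that value as the boundary for "6G" timings is an assumption made here for illustration; the driver's actual check may differ. The names `refresh_hz` and `needs_6g` are hypothetical.

```python
# (name, pixel clock in MHz, htotal, vtotal), copied from the ModeLines above.
MODELINES = [
    ("3840x2160@24", 297.000, 5500, 2250),
    ("3840x2160@25", 297.000, 5280, 2250),
    ("3840x2160@30", 297.000, 4400, 2250),
    ("3840x2160@50", 594.000, 5280, 2250),
    ("3840x2160@60", 594.000, 4400, 2250),
]

# Assumption: modes above the HDMI 1.4 maximum TMDS clock need HDMI 2.0 signalling.
HDMI_1_4_MAX_CLOCK_MHZ = 340.0


def refresh_hz(clock_mhz: float, htotal: int, vtotal: int) -> float:
    """Refresh rate in Hz: pixels per second divided by pixels per frame."""
    return clock_mhz * 1_000_000 / (htotal * vtotal)


def needs_6g(clock_mhz: float) -> bool:
    """True if the pixel clock is above the assumed HDMI 1.4 limit."""
    return clock_mhz > HDMI_1_4_MAX_CLOCK_MHZ


if __name__ == "__main__":
    for name, clock, htotal, vtotal in MODELINES:
        rate = refresh_hz(clock, htotal, vtotal)
        print(f"{name}: {rate:.2f} Hz, needs 6G: {needs_6g(clock)}")
```

Under this assumption only the 50 Hz and 60 Hz modes (594 MHz) fall into the range the commit blocks, while the 24, 25 and 30 Hz modes (297 MHz) stay below it.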
Two additional remarks:
- I meanwhile verified that X runs fine when I compile the current amd-staging-drm-next with only commit ebbf7337e2daacacef3e01114e6be68a2a4f11b4 reverted.
- The kernel Call Trace starting with "amdgpu_dm_connector_atomic_set_property+0x10a/0x180 [amdgpu]" is probably an unrelated issue, as it also occurs without commit ebbf7337e2daacacef3e01114e6be68a2a4f11b4.
Just as an update: This very bug still occurs with https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next as of today, and it is still fixed by reverting commit ebbf7337e2daacacef3e01114e6be68a2a4f11b4.
Could somebody comment on what this commit is good for, given that it seems to only prevent X11 from running with certain 4k HDMI displays?
I think it blocks modes that require 6Ghz timings if the platform isn't validated for them. Does everything work correctly if you remove the modes from your monitor section?
(In reply to Alex Deucher from comment #7)
> I think it blocks modes that require 6Ghz timings if the platform isn't
> validated for them.
What does "platform isn't validated for 6Ghz timings" mean?
Both the RX 460 and the 4k TV I use are officially advertised as supporting 4k @ 60Hz, and indeed they work just fine in that mode if commit ebbf7337e2daacacef3e01114e6be68a2a4f11b4 is not part of the kernel.
> Does everything work correctly if you remove the modes
> from your monitor section?
Removing the (not really used) ModeLine statements from the X11 config does in fact change the symptoms when commit ebbf7337e2daacacef3e01114e6be68a2a4f11b4 is present: X11 will start then, but only to use what xrandr says to be a "3840x2160 @ 30Hz" mode, but which the display picks up as some not-really-filling-the-whole-screen signal, with black borders on both sides (so an image is shown but compressed horizontally - in the usually active "just scan" mode and also even if the display is manually forced to a 16:9 aspect ratio).
(In reply to dwagner from comment #8)
> (In reply to Alex Deucher from comment #7)
> > I think it blocks modes that require 6Ghz timings if the platform isn't
> > validated for them.
> What does "platform isn't validated for 6Ghz timings" mean?
> Both the RX 460 and the 4k TV I use are officially advertised as supporting
> 4k @ 60Hz, and indeed they work just fine in that mode if commit
> ebbf7337e2daacacef3e01114e6be68a2a4f11b4 is not part of the kernel.
Harry or Jordan can verify, but I think it means the board maker did not validate the HDMI connector for 6Ghz timings. The asic may support it, but the board has to be validated to make sure the physical connector supports it and the traces are not too long, etc. Some boards only support 4k@60 over DP.
> > Does everything work correctly if you remove the modes
> > from your monitor section?
> Removing the (not really used) ModeLine statements from the X11 config does
> in fact change the symptoms when commit
> ebbf7337e2daacacef3e01114e6be68a2a4f11b4 is present: X11 will start then,
> but only to use what xrandr says to be a "3840x2160 @ 30Hz" mode, but which
> the display picks up as some not-really-filling-the-whole-screen signal,
> with black borders on both sides (so an image is shown but compressed
> horizontally - in the usually active "just scan" mode and also even if the
> display is manually forced to a 16:9 aspect ratio).
The driver drops the higher bandwidth modes for HDMI connectors if the board does not support them.
Alex is correct about the intention of this change. I've never played around with the modeline before so don't fully understand the impact of having that in the xorg.conf. It sounds like that's forcing certain modes which we then can't support due to the commit you mention.
We have an open issue where we don't correctly filter out modes if we can't support them for whatever reason. Fixing that might help you but we don't have anyone looking at it yet.
As for the behavior you're seeing with the modelines removed, I don't fully understand what you're seeing. Mind posting a picture? It sounds like we should be driving the monitor in 4k30 at that point but seems like something goes wrong there, from your description.
(In reply to Alex Deucher from comment #9)
> Harry or Jordan can verify, but I think it means the board maker did not
> validate the HDMI connector for 6Ghz timings. The asic may support it, but
> the board has to be validated to make sure the physical connector supports
> it and the traces are not too long, etc. Some boards only support 4k@60
> over DP.
This is the manufacturers page advertising my graphics board:
It couldn't be more affirmative regarding support of HDMI 2.0b and high refresh rates for 4k modes...
(And practical experience during the last months also tells me: Yes, HDMI 4k @ 60Hz output is stable - even when using a 4m length HDMI cable.)
> The driver drops the higher bandwidth modes for HDMI connectors if the board
> does not support them.
I have looked if there are any firmware upgrades for this card or any hints from others regarding lack of 4k 60 Hz support, but found neither. (I only found unofficial firmware that switches on shader units the manufacturer keeps dormant, but I am not using that.)
Note that the trailing dash in the above link is part of the link; without it the page is not found: http://www.xfxforce.com/en-us/products/amd-radeon-rx-400-series/rx-460-4gb-heatsink-rx-460p4hfg5-
(In reply to dwagner from comment #11)
> This is the manufacturers page advertising my graphics board:
> It couldn't be more affirmative regarding support of HDMI 2.0b and high
> refresh rates for 4k modes...
It's pretty vague unfortunately regarding HDMI:
"Latest Display Connections
Ready for the latest displays
Radeon™ GPUs with the Polaris architecture support HDMI® 2.0b and DisplayPort™ 1.3 for compatibility with a new generation of monitors that would make any gamer excited:
• 1080p @ 240Hz • 1440p @ 240Hz • 4K @ 120Hz • 1440p ultra-wide @ 190Hz"
It does not explicitly say 4K@60 on HDMI. The high refresh rates may only be available on DP. HDMI 2.0b does not imply 4K@60.
> (And practical experience during the last months also tells me: Yes, HDMI 4k
> @ 60Hz output is stable - even when using a 4m length HDMI cable.
Your board may work, but others might not depending on the board, cable, monitor, etc.
> > The driver drops the higher bandwidth modes for HDMI connectors if the board
> > does not support them.
> I have looked if there are any firmware upgrades for this card or any hints
> from others regarding lack of 4k 60 Hz support, but found neither. (I only
> found unofficial firmware that switches on shader units the manufacturer
> keeps dormant, but I am not using that.)
The display features that are validated are stored in the vbios and the vbios is updated by the board vendor based on what features they validated on the board.
(In reply to Alex Deucher from comment #13)
> It's pretty vague unfortunately regarding HDMI:
> It does not explicitly say 4K@60 on HDMI. The high refresh rates may only
> be available on DP.
Well then, if you suspect XFX to be sneaky bitches, I should refer you to the page of the actual reseller that I bought my XFX RX 460 from:
which clearly states: "The 2.0b HDMI port carries a 4K/UHD resolution signal at 60 Hz and permits the sending and receiving of encrypted signals using the HDCP 2.2 protocol (4K streaming, 4K Blu-Rays)."
And that corresponds well to the many reports of Windows users you find on the Internet that confirm XFX RX460 cards do in fact drive HDMI displays at 4k 60Hz also under Windows - or should we assume the Windows drivers to be broken and all those users just being "lucky"?
> HDMI 2.0b does not imply 4K@60.
Support of 4k @ 60Hz was the first and foremost feature advertised by the HDMI licensors as being the main reason for introducing HDMI 2.0! - Here's their press release from back then:
and a clear statement on HDMI 2.0b supporting 4k@60Hz:
> Your board may work, but others might not depending on the board, cable,
> monitor, etc.
Yes, the GPU board cannot guarantee anything with regards to cabling or monitors - but how is that a reason for a driver software to keep users from even trying to use this fundamentally important, reseller-promised, broadly-reported-to-be-working-under-Windows feature?
Should audio drivers discard LFE channels because some sound card vendor cannot validate that the user connected a capable sub-woofer to his amplifier...?
(In reply to Harry Wentland from comment #10)
> As for the behavior you're seeing with the modelines removed, I don't fully
> understand what you're seeing. Mind posting a picture? It sounds like we
> should be driving the monitor in 4k30 at that point but seems like something
> goes wrong there, from your description.
The display is driven in 3840x2160 @ 30Hz with modelines removed and commit present - but the picture fills only ~80% in the middle of the screen horizontally (100% vertically).
If I use "xrandr --output HDMI-A-0 --mode 3840x2160 --rate 24" to switch to 24Hz, then the picture fills the whole screen.
Without the commit (still without Modelines), when I use "xrandr --output HDMI-A-0 --mode 3840x2160 --rate 30" to voluntarily only use 30Hz, then the picture fills the whole screen. So with and without the commit, different parameters seem to be used to output 3840x2160 @ 30Hz.
(If really required, I can shoot a photo later, but it doesn't really show anything remarkable except for the two black bars to the left and the right of the screen.)
That's interesting. No picture needed anymore. I get it now.
This is really weird behavior. Do you have the actual TV model by any chance? If I get a chance I'd love to see if I can find something similar in the office and repro it.
As for 4k60 support, you're right that that's usually entailed by HDMI 2.0 but like Alex said HDMI 2.0 doesn't necessarily imply 4k60. In your case it looks like our Video BIOS doesn't report that 4k60 (i.e. 6GB) is validated. I'll try to find out more.
(In reply to Harry Wentland from comment #16)
> This is really weird behavior. Do you have the actual TV model by any
> chance?
It's an LG 55EG9609 TV. A link to a manual: https://www.lg.com/de/lgecs.downloadFile.ldwf?DOC_ID=20150135519057&what=MANUAL&fromSystem=LG.COM&fileId=IMgqHFIlfEO4t7Hfb0BBA&ORIGINAL_NAME_b1_a1=4_MFL68823613_06_151020.pdf
(And of course, the GPU is connected to one of the two HDMI 2.0 ports where "HDMI ULTRA HD Deep Colour" is possible and switched on - in LG's lingo that is what enables the higher clocked modes.)
Created attachment 137476 [details] [review]
drm/amd/display: Default HDMI6G support to true. Log VBIOS table error.
Can you see if this helps? Our Windows driver definitely checks the HDMI6G flag from VBIOS but it will default to allow 6G on HDMI if the VBIOS check fails.
This patch is porting the same behavior in the hopes that it will help with your issue.
If this patch works a dmesg log with the amdgpu.dc_log=1 option on the kernel would help us understand the root cause a bit better.
If it doesn't work it's back to the drawing board for me.
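The behaviour change described in this comment can be modelled roughly as follows. This is only an illustrative sketch in Python, not the driver code (which is C): the function name `hdmi_6g_allowed` and its parameters are hypothetical, and the assumption that a failed VBIOS lookup previously left 6G disabled is inferred from the description above rather than taken from the patch itself.

```python
from typing import Optional


def hdmi_6g_allowed(vbios_flag: Optional[bool], default_on_failure: bool) -> bool:
    """Decide whether 6G HDMI timings are allowed.

    vbios_flag is the HDMI6G flag read from the VBIOS, or None if the
    lookup failed. default_on_failure is what to assume in that case
    (hypothetical parameter used to contrast old and new behaviour).
    """
    if vbios_flag is None:
        # Lookup failed: fall back to the default instead of trusting a flag
        # that was never read.
        return default_on_failure
    return vbios_flag


if __name__ == "__main__":
    # Assumed old behaviour: a failed lookup leaves 6G disabled.
    print(hdmi_6g_allowed(None, default_on_failure=False))
    # Behaviour described for the Windows driver and this patch: default to allowed.
    print(hdmi_6g_allowed(None, default_on_failure=True))
    # A flag that was read successfully is honoured either way.
    print(hdmi_6g_allowed(False, default_on_failure=True))
```

In this model the default only matters when the lookup fails; a VBIOS that explicitly reports the flag as 0 would still block the 6G modes.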
(In reply to Harry Wentland from comment #19)
> Created attachment 137476 [details] [review]
> drm/amd/display: Default HDMI6G support to true. Log VBIOS table error.
> Can you see if this helps? Our Windows driver definitely checks the HDMI6G
> flag from VBIOS but it will default to allow 6G on HDMI if the VBIOS check
> fails.
> This patch is porting the same behavior in the hopes that it will help with
> your issue.
I had to change "ctx->logger" into "enc110->base.ctx->logger" to make your patch compile (applied on today's head of amd-staging-drm-next).
Yes, that patch changes the behaviour for the better: HDMI 2.0 modes, especially 4k@60Hz, work fine with this patch applied on my system. I tried multiple reboots, and the result was consistent.
> If this patch works a dmesg log with the amdgpu.dc_log=1 option on the
> kernel would help us understand the root cause a bit better.
I did enable amdgpu.dc_log=1 on the kernel command line, but there is no "Failed to get encoder_cap_info from VBIOS..." message visible in dmesg, which makes me wonder how the new code path differs from the old one. (Attaching dmesg output below.)
Created attachment 137487 [details]
dmesg output after Harry's recent patch for the "6G" check was applied
Thanks for fixing and testing the patch. I'll get it reviewed and merged.
It looks like the dc_log=1 didn't take. I'd expect a lot more spam from DC if it took. It should be fine in any 4.15 RC but there might still be a bugfix for it that didn't make it into 4.15. I don't remember. amd-staging-drm-next should be good with the log option.
Either way, looks like the VBIOS info isn't what we expect on some boards.
Marking as resolved, as the fix has been in mainline for a while now. If this is still an issue, feel free to reopen.