Created attachment 137859
xorg-server-18.104.22.1681, xf86-video-ati-18.0.0 (radeon) on Gentoo:
X manages to start after a couple of errors, but without drm/compositor.
Downgrading to xorg-server-1.19.5 solves the problem.
Created attachment 137860
Kernel 4.15.7, mesa 18.0.0_rc4, libdrm 2.4.91
Please attach the Xorg log file corresponding to a failure.
Created attachment 137861
The Xorg log file shows a crash 1734 seconds after boot. Does anything appear in dmesg around the same time?
Can you bisect xserver to find the first commit this happens with? Note that you may need to rebuild the xf86-video-ati and xf86-input-libinput drivers after moving to a different xserver Git snapshot.
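For anyone unfamiliar with bisecting: git bisect is a binary search over the commits between a known-good xserver (here 1.19.5) and a known-bad one (1.20), so the number of rebuild-and-test rounds grows only with the logarithm of the number of commits. The toy model below is a sketch of that search, not of git itself; the commit names and the is_bad predicate are hypothetical stand-ins for "rebuild xserver plus drivers, start X, check for the corruption".

```python
from typing import Callable, List, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Preconditions (the same ones git bisect relies on): commits[0] is known
    good, commits[-1] is known bad, and once a commit is bad, every later
    commit is bad too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # In a real bisect this step means: rebuild xserver and drivers, test.
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical history: 1000 snapshots, breakage introduced at index 613.
    history = [f"c{i}" for i in range(1000)]
    tested: List[str] = []

    def is_bad(commit: str) -> bool:
        tested.append(commit)
        return int(commit[1:]) >= 613

    print(first_bad_commit(history, is_bad), "after", len(tested), "builds")
```

With 1000 hypothetical snapshots the search needs at most 10 test builds, which is why a bisect is feasible even with only intermittent access to the affected machine.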
Presumably you can avoid the problem with
Option "AccelMethod" "EXA"
Does the problem also occur with the modesetting driver instead of radeon?
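Both workarounds mentioned here can be expressed as a minimal xorg.conf "Device" section, for example in a file under /etc/X11/xorg.conf.d/. The sketch below renders either variant; the Identifier value and any file name you save it under are arbitrary placeholders, and only the Driver names and the AccelMethod option come from this thread.

```python
from typing import Optional


def device_section(driver: str, accel_method: Optional[str] = None,
                   identifier: str = "Card0") -> str:
    """Render a minimal xorg.conf "Device" section.

    The identifier is an arbitrary placeholder; it only has to be unique
    within the configuration.
    """
    lines = [
        'Section "Device"',
        f'    Identifier "{identifier}"',
        f'    Driver "{driver}"',
    ]
    if accel_method is not None:
        # radeon driver option selecting the 2D acceleration architecture
        lines.append(f'    Option "AccelMethod" "{accel_method}"')
    lines.append("EndSection")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Workaround 1: keep the radeon driver but force EXA acceleration.
    print(device_section("radeon", accel_method="EXA"))
    # Workaround 2: use the modesetting driver instead of radeon.
    print(device_section("modesetting"))
```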
Created attachment 137934
/var/log/messages during X crash
I can't seem to reproduce the crash at the moment, so I am posting a snippet of the syslog from around the time the crash probably occurred.
Both EXA and the modesetting driver do indeed avoid the problem; compositing and framebuffers work fine with either.
It will take me some more time before I can try a bisect though, as I don't have permanent access to this box.
Created attachment 138054
Screenshot while GDM starts
I have a similar situation using xorg-server-22.214.171.1241, xf86-video-ati-18.0.0 (radeon) on Gentoo.
The screen is blurred for a few seconds right after GDM starts. I have attached a screenshot.
[ +0.652607] radeon 0000:01:00.0: evergreen_cs_track_validate_texture:855 texture bo too small (layer size 4325376, offset 0, max layer 1, depth 1, bo size 4096) (1408 768)
[ +0.000032] [drm:radeon_cs_ioctl [radeon]] *ERROR* Invalid command stream !
Mar 12 17:16:04 /usr/libexec/gdm-x-session: radeon: The kernel rejected CS, see dmesg for more information (-22).
Mar 12 17:16:04 kernel: radeon 0000:01:00.0: evergreen_cs_track_validate_texture:855 texture bo too small (layer size 4325376, offset 0, max layer 1, depth 1, bo size 4096) (1408 768)
Mar 12 17:16:04 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Invalid command stream !
Downgrading to xorg-server-1.19.5 solves the problem, too.
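The numbers in the rejected command stream above are self-consistent: 1408 × 768 × 4 bytes = 4325376 bytes, so the GPU was asked to sample a 1408×768 texture at 4 bytes per pixel from a buffer object of only 4096 bytes (a single page). The sketch below extracts and checks these fields; it assumes the layer size is simply width × height × bytes per pixel (ignoring any tiling padding), and the field layout is read off this one message rather than from kernel documentation.

```python
import re

# Kernel message quoted above, with the timestamp prefix stripped.
TEXTURE_MSG = ("radeon 0000:01:00.0: evergreen_cs_track_validate_texture:855 "
               "texture bo too small (layer size 4325376, offset 0, max layer 1, "
               "depth 1, bo size 4096) (1408 768)")


def texture_shortfall(msg: str) -> dict:
    """Pull the numbers out of an evergreen texture-validation message."""
    m = re.search(r"layer size (\d+).*?bo size (\d+)\) \((\d+) (\d+)\)", msg)
    if m is None:
        raise ValueError("not an evergreen_cs_track_validate_texture message")
    layer, bo, width, height = map(int, m.groups())
    # Assumption: layer size = width * height * bytes-per-pixel, no padding.
    bpp, rest = divmod(layer, width * height)
    return {"layer_size": layer, "bo_size": bo, "width": width,
            "height": height, "bytes_per_pixel": bpp, "remainder": rest,
            "missing_bytes": layer - bo}


if __name__ == "__main__":
    print(texture_shortfall(TEXTURE_MSG))
```

A zero remainder means the layer size is an exact multiple of the pixel count, which supports reading it as a 4-byte-per-pixel image; the 4096-byte BO is roughly a thousand times too small for that.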
I opened a discussion about a similar error here. Or maybe this is the same error?
That discussion includes screenshots of the error and some log files.
(In reply to chris0033547 from comment #10)
> I opened a discussion about a similar error here. Or maybe this is the same
Looks like it's the same.
I think I know what the issue is, and have been slowly working towards a solution, but I don't know when it'll be ready for testing. Meanwhile, the options to avoid this issue with Xorg 1.20 are using EXA or the modesetting driver.
*** Bug 106797 has been marked as a duplicate of this bug. ***
(In reply to Michel Dänzer from comment #12)
> *** Bug 106797 has been marked as a duplicate of this bug. ***
Submitter of this duplicate bug (since it seemed different in some ways).
For the record, I don't see any issues until after the SDDM greeter, once KDE starts. From then on it consistently garbles the screens under Mesa 18.0.4-1; under earlier versions it just makes it impossible to bring up the other monitors, but the monitors on the first GPU are usable.
(In reply to Ian Kidd from comment #13)
> Submitter of this duplicate bug (since it seemed different in some ways).
Does your issue happen with the Xorg modesetting driver as well? If yes, it's probably a different issue after all; otherwise it's probably the same root cause.
Daniel said everything is working in the downstream bug report (https://bugs.gentoo.org/649736#c7)
(In reply to Matt Turner from comment #15)
> Daniel said everything is working in the downstream bug report
He must be getting lucky; I'm also not running into the issue (fundamentally a BO lifetime mismatch between the Xorg driver and Mesa) on my work laptop, but it's still there.
I'm also watching this thread.
I have an HD 5570 / Redwood PRO running three displays with Arch Linux. Upgrading xorg-server from 1.19.6-2 to 1.20.0-7 and xf86-video-ati from 1:7.10.0-1 to 1:18.0.1-2 causes display corruption with scan-out in the wrong sequence, a disappearing cursor, failure to configure all three displays, and gigabytes of repetitive warnings in the system log, along the lines of:
kernel: radeon 0000:05:00.0: evergreen_cs_track_validate_cb:481 cb bo too small (layer size 15728640, offset 0, max layer 1, bo size 5242880, slice 61439)
kernel: radeon 0000:05:00.0: evergreen_cs_track_validate_cb:485 problematic surf: (3840 1024) (4 4 1 1 2 1024 2)
kernel: radeon 0000:05:00.0: evergreen_packet3_check:1948 invalid cmd stream 666
kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Invalid command stream !
kernel: [drm:radeon_cs_parser_relocs [radeon]] *ERROR* gem object lookup failed 0xa
kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -2!
kernel: radeon 0000:05:00.0: No GEM object associated to handle 0x0000000A, can't create framebuffer
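The colour-buffer (cb) messages can be cross-checked the same way. The surface is 3840×1024 and 3840 × 1024 × 4 = 15728640 bytes, but the BO is 5242880 = 1280 × 1024 × 4 bytes, i.e. one third of that width. Reading this as "a buffer sized for a single 1280×1024 display bound to a surface spanning three of them" is my inference from 3840 = 3 × 1280; the reporter did not state the per-monitor resolution. Treating "slice" as the number of 8×8-pixel tiles minus one is likewise an assumption, though it matches exactly (61440 × 64 = 3840 × 1024).

```python
import re

# Kernel messages quoted above, with the prefixes stripped.
CB_MSG = ("evergreen_cs_track_validate_cb:481 cb bo too small (layer size "
          "15728640, offset 0, max layer 1, bo size 5242880, slice 61439)")
SURF_MSG = ("evergreen_cs_track_validate_cb:485 problematic surf: "
            "(3840 1024) (4 4 1 1 2 1024 2)")


def cb_shortfall(cb_msg: str, surf_msg: str) -> dict:
    """Cross-check the colour-buffer numbers from the two kernel lines."""
    cb = re.search(r"layer size (\d+).*?bo size (\d+), slice (\d+)", cb_msg)
    surf = re.search(r"surf: \((\d+) (\d+)\)", surf_msg)
    if cb is None or surf is None:
        raise ValueError("unexpected message format")
    layer, bo, slice_max = map(int, cb.groups())
    width, height = map(int, surf.groups())
    bpp = layer // (width * height)
    return {
        "bytes_per_pixel": bpp,
        # Assumption: "slice" counts 8x8-pixel tiles, minus one.
        "pixels_from_slice": (slice_max + 1) * 64,
        "surface_pixels": width * height,
        # Width the BO could cover at the same height and pixel size.
        "bo_width_at_same_height": bo // (height * bpp),
    }


if __name__ == "__main__":
    print(cb_shortfall(CB_MSG, SURF_MSG))
```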
Sometimes the display can be made clear by switching to a virtual terminal and back to X, but generally, the display corruption will vary each time the X server is restarted, suggesting that the hardware configuration is not being set consistently or properly. Downgrading mesa does not resolve the problem, and the problem tracks with the X server version.
The Xorg log showed:
[ 82831.532] (WW) RADEON(0): get vblank counter failed: Invalid argument
[ 82831.532] (WW) RADEON(0): flip queue failed: Invalid argument
[ 82831.532] (WW) RADEON(0): Page flip failed: Invalid argument
[ 82831.532] (EE) RADEON(0): present flip failed
but then, blindly trying
<device screen="0" driver="dri2">
<option name="vblank_mode" value="0" />
seems to resolve these errors, which makes no sense to me, since the driver is supposed to default to DRI3.
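The two drirc lines quoted above are only a fragment. In a complete per-user ~/.drirc, Mesa expects them inside a `<driconf>` root element, with the option placed under an `<application>` element; the layout below, including the application name "Default", is the conventional one as far as I know, so check it against the drirc documentation for your Mesa version. The sketch builds the file with Python's standard xml.etree.ElementTree so that the result is always well-formed; a vblank_mode value of 0 means never synchronize to vertical blank.

```python
import xml.etree.ElementTree as ET


def drirc_vblank_off(screen: int = 0, driver: str = "dri2") -> str:
    """Return a complete drirc document disabling vblank sync.

    The device attributes mirror the fragment quoted in the comment above;
    the <driconf>/<application name="Default"> wrapping is the usual drirc
    layout (assumption: verify against your Mesa version's documentation).
    """
    root = ET.Element("driconf")
    device = ET.SubElement(root, "device", screen=str(screen), driver=driver)
    app = ET.SubElement(device, "application", name="Default")
    ET.SubElement(app, "option", name="vblank_mode", value="0")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(drirc_vblank_off())
```

The returned string can be saved as ~/.drirc; generating it rather than hand-editing avoids the unclosed-element mistakes that are easy to make when copying fragments like the one above.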
For reference, the Openbox window manager, commonly used with LXQt, cannot be made to display correctly, KWin works sometimes, and the Wayland compositors do not seem to be affected.
I also have another machine with an HD 4350 / RV710 running a single display which does _not_ show any of these problems at all.
Is this the same issue? https://bugs.freedesktop.org/show_bug.cgi?id=106913
An odd thing I've noticed, which may share the same underlying cause: since the upgrade from xorg-server 1.19 to 1.20, which is when the "Invalid command stream" problem appeared, I see different kinds of display corruption on successive restarts of the X server. For instance, sometimes the cursor is normal but the rest of the display is corrupted, while other times only the cursor is corrupted and the rest of the display is normal. In the session I currently have running, the cursor is corrupted over the desktop but normal over any application window, and, interestingly, there are no "Invalid command stream" errors in the log.
Additionally, I upgraded Arch Linux on an old laptop with NVIDIA "GeForce 7150M / nForce 630M" hardware and started seeing similarly inconsistent results when restarting the X server, for instance after trying kwin_wayland and Xwayland and then restarting the X11 server with nouveau. So this display corruption does not seem to be tied to the radeon driver alone.
I get the impression that the "Invalid command stream" errors and these screen corruption errors are connected.
Also, I was reading at https://en.wikipedia.org/wiki/Direct_Rendering_Manager
"In 2014 Matt Roper (Intel) developed the universal planes (or unified planes) concept by which framebuffers (primary planes), overlays (secondary planes) and cursors (cursor planes) are all treated as a single type of object with an unified API. Universal planes support provides a more consistent DRM API with fewer, more generic ioctls."
Naively interpreted, it looks as if the display corruption may be moving between the primary planes and the cursor planes.
Michel, if you think these two issues (the "Invalid command stream" errors and the varying display corruption across X server restarts) are related, please let me know and I will avoid filing a separate bug report for the display corruption. I don't know whether it is related to the hardware driver, to DRM/rendering, or to KMS.
This patch series should fix it: https://patchwork.freedesktop.org/series/45979/
(In reply to Michel Dänzer from comment #20)
> This patch series should fix it:
Thank you, works for me! (I'm not the OP, though.) I've been using it over the weekend, everything has been fine, and I haven't seen any more cursor or SDDM corruption.
Tested-by: Konstantin Kharlamov <email@example.com>
Thanks for the report and for testing the patches, fixed in Git master:
Author: Michel Dänzer <firstname.lastname@example.org>
Date: Fri Jun 29 17:57:03 2018 +0200
glamor: Use GBM for BO allocation when possible
Another test from me:
I created a package for Arch Linux using commit 3c4c0213c11d623cba7adbc28dde652694f2f758, and it seems that this error is fixed!
*** Bug 106913 has been marked as a duplicate of this bug. ***
Sorry, I don't seem to have been on the CC list. I just tested this on Sunday, July 22.
I am running with the mkkot Arch Linux package, xf86-video-ati-git, 1:18.0.1.r12.g3c4c0213-1, 22 Jul 2018. I have the HD 5570 / Redwood PRO running three displays.
Starting lxqt with openbox, the original "Invalid command stream" errors are not seen, but I still see:
kernel: radeon 0000:05:00.0: No GEM object associated to handle 0x0000000C, can't create framebuffer
kernel: radeon 0000:05:00.0: No GEM object associated to handle 0x00000400, can't create framebuffer
The first display receives sync, but the screen is black with no cursor, and the second and third displays receive nothing and go into suspend.
Repeatedly stopping and restarting LXQt always seems to produce the same result. Previously, the screens would sometimes light up with various kinds of display corruption, in addition to the black-screen result described; now I just get black screens.
I can run weston and kwin_wayland, but the regular X11 server is unusable.
(In reply to James from comment #25)
> kernel: radeon 0000:05:00.0: No GEM object associated to handle 0x0000000C,
> can't create framebuffer
That's bug 107297, a regression due to the fix for this bug, which is now fixed in upstream Git master as well.
Yes! That fixed it. Michael, thank you very much for your work.
Oops - I apologize for misspelling your name. Thank you Michel.
*** Bug 107528 has been marked as a duplicate of this bug. ***
*** Bug 107819 has been marked as a duplicate of this bug. ***
*** Bug 107876 has been marked as a duplicate of this bug. ***