Bug 105381 - [RV710][DRM] Invalid command stream
Summary: [RV710][DRM] Invalid command stream
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: xf86-video-ati maintainers
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Duplicates: 106797 106913 107528 107819 107876
Depends on:
Blocks:
 
Reported: 2018-03-07 13:56 UTC by Daniel
Modified: 2018-09-10 09:46 UTC

See Also:


Attachments
/var/log/messages (2.01 KB, text/plain)
2018-03-07 13:56 UTC, Daniel
dmesg (2.26 KB, text/plain)
2018-03-07 13:57 UTC, Daniel
Xorg.0.log (49.34 KB, text/plain)
2018-03-07 14:14 UTC, Daniel
/var/log/messages during X crash (3.20 KB, text/plain)
2018-03-09 12:18 UTC, Daniel
sceenshot while gdm start (394.42 KB, image/jpeg)
2018-03-13 03:02 UTC, Petrus

Description Daniel 2018-03-07 13:56:43 UTC
Created attachment 137859 [details]
/var/log/messages

xorg-server-1.19.99.901, xf86-video-ati-18.0.0 (radeon) on Gentoo:

X manages to start after a couple of errors, but without drm/compositor.

Downgrading to xorg-server-1.19.5 solves the problem.
Comment 1 Daniel 2018-03-07 13:57:05 UTC
Created attachment 137860 [details]
dmesg
Comment 2 Daniel 2018-03-07 14:02:06 UTC
Kernel 4.15.7, mesa 18.0.0_rc4, libdrm 2.4.91
Comment 3 Michel Dänzer 2018-03-07 14:07:20 UTC
Please attach the Xorg log file corresponding to a failure.
Comment 4 Daniel 2018-03-07 14:14:23 UTC
Created attachment 137861 [details]
Xorg.0.log
Comment 5 Michel Dänzer 2018-03-07 14:25:53 UTC
The Xorg log file shows a crash 1734 seconds after boot. Does anything appear in dmesg around the same time?

Can you bisect xserver to find the first commit this happens with? Note that you may need to rebuild the xf86-video-ati and xf86-input-libinput drivers after moving to a different xserver Git snapshot.

Presumably you can avoid the problem with

 Option "AccelMethod" "EXA"

?

Does the problem also occur with the modesetting driver instead of radeon?
Comment 6 Daniel 2018-03-09 12:18:01 UTC
Created attachment 137934 [details]
/var/log/messages during X crash
Comment 7 Daniel 2018-03-09 12:55:45 UTC
I can't seem to reproduce the crash atm, so I am posting a snippet of the syslog of the time this crash probably appeared.

Both EXA and the modesetting driver do indeed avoid the problem; the compositor and framebuffers work fine then.

It will take me some more time before I can try a bisect though, as I don't have permanent access to this box.
Comment 8 Petrus 2018-03-13 03:02:42 UTC
Created attachment 138054 [details]
sceenshot while gdm start

I have a similar situation using xorg-server-1.19.99.901, xf86-video-ati-18.0.0 (radeon) on Gentoo.

The screen was blurred for a few seconds just after gdm started. A screenshot is attached.

dmesg:
[  +0.652607] radeon 0000:01:00.0: evergreen_cs_track_validate_texture:855 texture bo too small (layer size 4325376, offset 0, max layer 1, depth 1, bo size 4096) (1408 768)
[  +0.000032] [drm:radeon_cs_ioctl [radeon]] *ERROR* Invalid command stream !

/var/log/messages:
Mar 12 17:16:04  /usr/libexec/gdm-x-session[2158]: radeon: The kernel rejected CS, see dmesg for more information (-22).
Mar 12 17:16:04  kernel: radeon 0000:01:00.0: evergreen_cs_track_validate_texture:855 texture bo too small (layer size 4325376, offset 0, max layer 1, depth 1, bo size 4096) (1408 768)
Mar 12 17:16:04  kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Invalid command stream !
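
The numbers in the kernel message above can be checked by hand. The following is a minimal sketch, assuming that the trailing "(1408 768)" is the surface width and height and that the surface uses 4 bytes per pixel; neither is stated in the log, and the function name is hypothetical. Under those assumptions the product matches the reported layer size, which is far larger than the 4096-byte BO.

```python
# Assumption: "(1408 768)" in the kernel message is width x height,
# and the surface is 32 bits (4 bytes) per pixel.
def layer_size_bytes(width, height, bytes_per_pixel=4):
    """Bytes needed for one layer of an uncompressed surface."""
    return width * height * bytes_per_pixel

needed = layer_size_bytes(1408, 768)  # dimensions from the log line
bo_size = 4096                        # "bo size" from the log line

# The kernel rejects the command stream because the BO cannot hold the layer.
print(needed, needed > bo_size)
```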
Comment 9 Petrus 2018-03-13 03:07:33 UTC
Downgrading to xorg-server-1.19.5 solves the problem, too.
Comment 10 chris0033547 2018-05-23 21:19:39 UTC
Hi,

I opened a discussion about a similar error here. Or maybe this is the same error?

https://bbs.archlinux.org/viewtopic.php?id=237291

That discussion includes screenshots of the error and some log files.

Best Regards.
Comment 11 Michel Dänzer 2018-05-24 08:41:53 UTC
(In reply to chris0033547 from comment #10)
> I opened a discussion about a similar error here. Or maybe this is the same
> error?
> 
> https://bbs.archlinux.org/viewtopic.php?id=237291

Looks like it's the same.


I think I know what the issue is, and have been slowly working towards a solution, but I don't know when it'll be ready for testing. Meanwhile, the options to avoid this issue with Xorg 1.20 are using EXA or the modesetting driver.
Comment 12 Michel Dänzer 2018-06-04 09:55:17 UTC
*** Bug 106797 has been marked as a duplicate of this bug. ***
Comment 13 Ian Kidd 2018-06-04 14:15:48 UTC
(In reply to Michel Dänzer from comment #12)
> *** Bug 106797 has been marked as a duplicate of this bug. ***

Submitter of this duplicate bug (since it seemed different in some ways).

For the record, I don't see any issues until after the SDDM greeter, once KDE starts. Then it consistently garbles the screens under Mesa 18.0.4-1; under earlier versions it just makes bringing up the other monitors impossible, but the first GPU's monitors are usable.
Comment 14 Michel Dänzer 2018-06-04 14:24:55 UTC
(In reply to Ian Kidd from comment #13)
> Submitter of this duplicate bug (since it seemed different in some ways).

Does your issue happen with the Xorg modesetting driver as well? If yes, it's probably a different issue after all; otherwise it's probably the same root cause.
Comment 15 Matt Turner 2018-06-08 15:29:37 UTC
Daniel said everything is working in the downstream bug report (https://bugs.gentoo.org/649736#c7)
Comment 16 Michel Dänzer 2018-06-08 15:35:09 UTC
(In reply to Matt Turner from comment #15)
> Daniel said everything is working in the downstream bug report

He must be getting lucky; I'm also not running into the issue (fundamentally a BO lifetime mismatch between the Xorg driver and Mesa) on my work laptop, but it's still there.
Comment 17 James 2018-06-12 23:05:15 UTC
I'm also watching this thread.

I have an HD 5570 / Redwood PRO running three displays with Arch Linux. Upgrading xorg-server from 1.19.6-2 to 1.20.0-7 and xf86-video-ati from 1:7.10.0-1 to 1:18.0.1-2 causes display corruption with scan-out in the wrong sequence, the cursor disappearing, failure to configure all three displays, and gigabytes of repetitive warnings in the system log, along the lines of:

 kernel: radeon 0000:05:00.0: evergreen_cs_track_validate_cb:481 cb[0] bo too small (layer size 15728640, offset 0, max layer 1, bo size 5242880, slice 61439)
 kernel: radeon 0000:05:00.0: evergreen_cs_track_validate_cb:485 problematic surf: (3840 1024) (4 4 1 1 2 1024 2)
 kernel: radeon 0000:05:00.0: evergreen_packet3_check:1948 invalid cmd stream 666
 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Invalid command stream !

and

 kernel: [drm:radeon_cs_parser_relocs [radeon]] *ERROR* gem object lookup failed 0xa
 kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -2!
 kernel: radeon 0000:05:00.0: No GEM object associated to handle 0x0000000A, can't create framebuffer
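
When the log grows to gigabytes, it can help to pull just the sizes out of the "bo too small" lines. The following is a rough sketch, not part of any radeon tooling: the function names and the regular expression are assumptions based only on the message formats quoted in this report.

```python
import re

# Assumption: pattern derived from the two message variants quoted in this
# report ("cb[0] bo too small ..." and "texture bo too small ...").
BO_TOO_SMALL = re.compile(r"bo too small \(layer size (\d+),.*?bo size (\d+)")

def parse_bo_too_small(line):
    """Return (layer_size, bo_size) for a matching line, else None."""
    m = BO_TOO_SMALL.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

def summarize(lines):
    """Collect (layer_size, bo_size, shortfall) for every matching line."""
    results = []
    for line in lines:
        parsed = parse_bo_too_small(line)
        if parsed is not None:
            layer, bo = parsed
            results.append((layer, bo, layer - bo))
    return results

sample = [
    "kernel: radeon 0000:05:00.0: evergreen_cs_track_validate_cb:481 cb[0] bo too small "
    "(layer size 15728640, offset 0, max layer 1, bo size 5242880, slice 61439)",
    "kernel: [drm:radeon_cs_ioctl [radeon]] *ERROR* Invalid command stream !",
]

for layer, bo, shortfall in summarize(sample):
    print(f"needs {layer} bytes, BO has {bo}, short by {shortfall}")
```

For the line quoted above, the required layer size is exactly three times the BO size. Whether that factor is related to the three-display setup is not established in this report.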

Sometimes the display can be made clear by switching to a virtual terminal and back to X, but generally, the display corruption will vary each time the X server is restarted, suggesting that the hardware configuration is not being set consistently or properly.  Downgrading mesa does not resolve the problem, and the problem tracks with the X server version.

The Xorg log showed:

[ 82831.532] (WW) RADEON(0): get vblank counter failed: Invalid argument
[ 82831.532] (WW) RADEON(0): flip queue failed: Invalid argument
[ 82831.532] (WW) RADEON(0): Page flip failed: Invalid argument
[ 82831.532] (EE) RADEON(0): present flip failed

but then, blindly trying

~/.drirc

<driconf>
    <device screen="0" driver="dri2">
        <application name="Default">
            <option name="vblank_mode" value="0" />
        </application>
    </device>
</driconf>

seems to resolve these errors, which makes no sense to me, since the driver is supposed to default to dri3.

For reference, the openbox window manager, commonly used with lxqt, cannot be made to display correctly, but kwin will work sometimes, and the wayland compositors do not seem to be affected.

I also have another machine with an HD 4350 / RV710 running a single display which does _not_ show any of these problems at all.
Comment 18 mkkot 2018-06-14 21:56:40 UTC
Hello,
Is this the same issue? https://bugs.freedesktop.org/show_bug.cgi?id=106913
Comment 19 James 2018-06-23 19:23:03 UTC
An odd thing I've noticed, which may reflect the same underlying issue: since the upgrade from xorg-server 1.19 to 1.20, at the same time the "Invalid command stream" problem emerged, I see differing kinds of display corruption with successive restarts of the X server. For instance, sometimes the cursor is normal but the remainder of the display is corrupted; other times, only the cursor is corrupted and the remainder of the display is normal. In particular, in the session I currently have running, the cursor is corrupted on the desktop but normal over any application window, and interestingly there are none of the "Invalid command stream" errors in the log.

Additionally, I upgraded Arch Linux on an old laptop with NVidia "GeForce 7150M / nForce 630M" hardware, and started seeing similarly varying results when restarting the X server, for instance after trying kwin_wayland and Xwayland and then restarting the X11 server running with nouveau. So this display corruption does not seem to be tied to just the radeon driver.

I am given the impression that there is a connection between the "Invalid command stream" errors and these screen corruption errors.

Also, I was reading at https://en.wikipedia.org/wiki/Direct_Rendering_Manager
"In 2014 Matt Roper (Intel) developed the universal planes (or unified planes) concept by which framebuffers (primary planes), overlays (secondary planes) and cursors (cursor planes) are all treated as a single type of object with an unified API.[151] Universal planes support provides a more consistent DRM API with fewer, more generic ioctls.[33]"

In that sense, it appears that the display corruption may be moving between "primary planes" and "cursor planes", as a naive interpretation.

Michel, if you think that these two issues, the "Invalid command stream" errors and the differing forms of display corruption with X server restarts, are related, please let me know, and I will avoid filing a separate bug report for the display corruption, though I don't know if this is hardware driver related, DRM/render related, or KMS related.
Comment 20 Michel Dänzer 2018-07-05 12:23:04 UTC
This patch series should fix it: https://patchwork.freedesktop.org/series/45979/
Comment 21 Hi-Angel 2018-07-08 22:17:13 UTC
(In reply to Michel Dänzer from comment #20)
> This patch series should fix it:
> https://patchwork.freedesktop.org/series/45979/

Thank you, works for me! (I'm not the OP, though.) I've been using it over the weekend, everything has been fine, and I haven't seen cursor or sddm corruption anymore.

Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Comment 22 Michel Dänzer 2018-07-10 15:26:40 UTC
Thanks for the report and for testing the patches, fixed in Git master:

commit 3c4c0213c11d623cba7adbc28dde652694f2f758
Author: Michel Dänzer <michel.daenzer@amd.com>
Date:   Fri Jun 29 17:57:03 2018 +0200

    glamor: Use GBM for BO allocation when possible
Comment 23 mkkot 2018-07-22 11:22:06 UTC
Another test from me:
I created a package for Archlinux using commit 3c4c0213c11d623cba7adbc28dde652694f2f758 and it seems that this error is fixed!
https://bugs.archlinux.org/task/58874
Comment 24 mkkot 2018-07-22 11:23:24 UTC
*** Bug 106913 has been marked as a duplicate of this bug. ***
Comment 25 James 2018-07-22 17:02:18 UTC
Sorry - I don't seem to have been on the CC list.  I just tested this Sunday, July 22.

I am running with the mkkot Arch Linux package, xf86-video-ati-git, 1:18.0.1.r12.g3c4c0213-1, 22 Jul 2018.  I have the HD 5570 / Redwood PRO running three displays.

Starting lxqt with openbox, the original "Invalid command stream" errors are not seen, but I still see:

 kernel: radeon 0000:05:00.0: No GEM object associated to handle 0x0000000C, can't create framebuffer
 kernel: radeon 0000:05:00.0: No GEM object associated to handle 0x00000400, can't create framebuffer

The 1st display will receive sync, but the screen is black with no cursor, and the 2nd and 3rd displays receive nothing and go to suspend.

Repeatedly stopping and re-starting lxqt seems to always produce the same result, where before, sometimes the screens would light up with various kinds of display corruption, in addition to the black screen result described. Now, just the black screens.

I can run weston and kwin_wayland, but the regular X11 server is unusable.
Comment 26 Michel Dänzer 2018-07-23 06:56:50 UTC
(In reply to James from comment #25)
>  kernel: radeon 0000:05:00.0: No GEM object associated to handle 0x0000000C,
> can't create framebuffer

That's bug 107297, a regression due to the fix for this bug, which is now fixed in upstream Git master as well.
Comment 27 James 2018-07-23 14:27:42 UTC
Yes!  That fixed it.  Michael, thank you very much for your work.
Comment 28 James 2018-07-23 14:33:06 UTC
Oops - I apologize for misspelling your name.  Thank you Michel.
Comment 29 Michel Dänzer 2018-08-15 15:50:05 UTC
*** Bug 107528 has been marked as a duplicate of this bug. ***
Comment 30 Michel Dänzer 2018-09-04 07:38:13 UTC
*** Bug 107819 has been marked as a duplicate of this bug. ***
Comment 31 Michel Dänzer 2018-09-10 09:46:32 UTC
*** Bug 107876 has been marked as a duplicate of this bug. ***

