Bug 25884 - xine-ui's on-screen display triggers KMS errors with r100 and rv280
Summary: xine-ui's on-screen display triggers KMS errors with r100 and rv280
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/Radeon (show other bugs)
Version: unspecified
Hardware: x86 (IA32) Linux (All)
: medium normal
Assignee: xf86-video-ati maintainers
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-01-04 03:45 UTC by Chris Rankin
Modified: 2010-06-08 13:10 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
possible fix (12.25 KB, patch)
2010-03-19 09:10 UTC, Alex Deucher
no flags Details | Splinter Review
revised patch - much better, but no cigar yet (11.81 KB, patch)
2010-03-20 14:53 UTC, Chris Rankin
no flags Details | Splinter Review

Description Chris Rankin 2010-01-04 03:45:02 UTC
This
Comment 1 Chris Rankin 2010-01-04 03:48:22 UTC
Play a video using xine, and then right-click on the video window to bing up the context menu. Select "Stream / Information (OSD)". KMS badness happens.

On r100, the dmesg log fills us with messages like:
[drm:r100_cs_packet_parse] *ERROR* Packet (0:0:99) end after CS buffer (34) !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:r100_cs_packet_parse] *ERROR* Packet (0:0:99) end after CS buffer (90) !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:r100_cs_packet_parse] *ERROR* Packet (0:0:99) end after CS buffer (34) !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !

On rv280, the Xserver stops responding entirely.

Note that this bug is similar to this RedHat bug for R300:
https://bugzilla.redhat.com/show_bug.cgi?id=540133
Comment 3 Chris Rankin 2010-01-13 14:50:57 UTC
(In reply to comment #2)
> Can you try mesa from git master or the 7.7 branch?  Dave fixed some issues
> there:

Both these fixes seem to be one-liners in a much stabler part of Mesa, and so I patched them both by hand into Fedora's SRPM and then installed the new mesa-libGL and mesa-dri-drivers package on the PC with the rv100. And it made absolutely no difference whatsoever. Mind you, the comment for the radeon_cs_legacy.c patch suggests that it is for the NON-KMS path, so maybe that's no surprise? (This bug is about the KMS path.)
Comment 4 Chris Rankin 2010-01-23 12:01:40 UTC
This bug is still present with the following configuration:

Stock Linux kernel 2.6.32.5

And Fedora 12 packages:
mesa-dri-drivers-7.7-2.fc12.i686
mesa-libGL-7.7-2.fc12.i686
libdrm-2.4.17-1.fc12.i686
xorg-x11-drv-ati-6.13.0-0.20.20091221git4b05c47ac.fc12.i686

However, the dmesg errors have now changed to an endless stream of:

[drm:r100_cs_packet_parse] *ERROR* Unknown packet type 1 at 0 !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !

This bug is also present with my rv280, and my Radeon 9550 (although the Xserver doesn't stop responding with this card - the video just temporarily freezes instead), but curiously not at all with my HD4890.
Comment 5 Chris Rankin 2010-03-19 05:20:24 UTC
This bug is still present with vanilla Linux 2.6.33.1.
Comment 6 Pauli 2010-03-19 06:54:28 UTC
xine is using clip regions to render the text top of video. The number of clip regions is too high for single command stream which causes trouble.

Maybe better implementation would be using VBO for vertex data so we could handle large number of clip regions better. (and possible only emit them when there is change)

In any case it would be better option to render the text on top of video would be providing way of rendering some image on top of video frame.

Comment 7 Chris Rankin 2010-03-19 07:09:37 UTC
(In reply to comment #6)
> Maybe better implementation would be using VBO for vertex data so we could
> handle large number of clip regions better. (and possible only emit them when
> there is change)

Hmm, this bug exists on lowly Radeon 7000 cards as well; do they support VBOs?

> The number of clip regions is too high for single command stream which causes
> trouble.

Does the maximum size of a single command stream increase in subsequent Radeon iterations? Because on rv100 and rv280, the X server must be restarted, but the X server recovers with my RV350 and handles the stream fine with my RV790.

(BTW, how does a "stream" have a "maximum size" anyway? By mere definition of a "stream", I'd expect it to "start" and then chatter away until it met an "end" marker. So I suspect it doesn't work like that really ... ;-).)
Comment 8 Michel Dänzer 2010-03-19 07:11:33 UTC
Pauli, please elaborate. How exactly are 'too many cliprects' causing the symptoms described in this report?
Comment 9 Pauli 2010-03-19 07:20:08 UTC
> --- Comment #8 from Michel Dänzer <michel@daenzer.net>  2010-03-19 07:11:33 PST ---
> Pauli, please elaborate. How exactly are 'too many cliprects' causing the
> symptoms described in this report?

Because r200 textured video is rendering the clipping regions using
vertices embed into command stream. So end result is that vertex data
takes more space than single CS array can hold. For xine case that I
tested it took about 28k dwords.

Single buffer can hold only 16k dwords.
Comment 10 Michel Dänzer 2010-03-19 07:23:54 UTC
On Fri, 2010-03-19 at 16:20 +0200, Pauli Nieminen wrote: 
> > --- Comment #8 from Michel Dänzer <michel@daenzer.net>  2010-03-19 07:11:33 PST ---
> > Pauli, please elaborate. How exactly are 'too many cliprects' causing the
> > symptoms described in this report?
> 
> Because r200 textured video is rendering the clipping regions using
> vertices embed into command stream. So end result is that vertex data
> takes more space than single CS array can hold. For xine case that I
> tested it took about 28k dwords.
> 
> Single buffer can hold only 16k dwords.

So the textured video code needs to be fixed to use smaller draw packets
in several CS. Even when using VBOs limits like this need to be dealt
with somehow.


Comment 11 Chris Rankin 2010-03-19 07:25:37 UTC
For reference, with vanilla 2.6.33.1 and my RV350, my dmesg log now fills up
with messages like this:

[drm:r100_cs_track_check] *ERROR* [drm] No buffer for z buffer !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !

But the X server still recovers when the on-screen display disappears, and then
normal playback resumes.
Comment 12 Chris Rankin 2010-03-19 07:28:28 UTC
(In reply to comment #10)
> So the textured video code needs to be fixed to use smaller draw packets
> in several CS. Even when using VBOs limits like this need to be dealt
> with somehow.

I have a minor shed-load of different Radeons to test any changes with, BTW ;-).

Comment 13 Alex Deucher 2010-03-19 09:10:19 UTC
Created attachment 34246 [details] [review]
possible fix

Does this patch help?
Comment 14 Chris Rankin 2010-03-19 16:19:44 UTC
(In reply to comment #13)
> Does this patch help?

I'll test wth the rv100 and rv280 tomorrow, but the results with the RV350 at least are unchanged:

[drm:r100_cs_track_check] *ERROR* [drm] No buffer for z buffer !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
Comment 15 Chris Rankin 2010-03-20 04:19:26 UTC
(In reply to comment #13)
> Does this patch help?

I've just tested with my rv280, and:
- xine-ui's on-screen display appears correctly in the window
- the dmesg log does not fill up with errors
- the X server does not need restarting afterwards.

However, the video window is now completely black too :-)!
Comment 16 Chris Rankin 2010-03-20 08:17:31 UTC
(In reply to comment #13)
> Does this patch help?

Same results with my rv100 as with my rv280 - xine-ui's on-screen display now behaves correctly, but the movie windows is now completely black too.

Comment 17 Alex Deucher 2010-03-20 11:22:24 UTC
(In reply to comment #16)
> (In reply to comment #13)
> > Does this patch help?
> 
> Same results with my rv100 as with my rv280 - xine-ui's on-screen display now
> behaves correctly, but the movie windows is now completely black too.
> 

Anything in dmesg?
Comment 18 Chris Rankin 2010-03-20 11:50:34 UTC
(In reply to comment #17)
> Anything in dmesg?

No, dmesg is "clean" for both rv100 and rv280.
Comment 19 Chris Rankin 2010-03-20 11:57:40 UTC
That's an odd for-loop:

+    if (radeon_cs_space_remaining(pScrn) > (nBox * 3 * vtx_count + 5)) {
+	    loops = 1;
+	    nBox_loop = nBox;
+    } else {
+	    loops = nBox;
+	    nBox_loop = 1;
+    }
+
+    for (i = 0; i > loops; i++) {

Surely this loop can never execute...?
Comment 20 Alex Deucher 2010-03-20 12:04:18 UTC
(In reply to comment #19)
> That's an odd for-loop:
> 
> +    if (radeon_cs_space_remaining(pScrn) > (nBox * 3 * vtx_count + 5)) {
> +           loops = 1;
> +           nBox_loop = nBox;
> +    } else {
> +           loops = nBox;
> +           nBox_loop = 1;
> +    }
> +
> +    for (i = 0; i > loops; i++) {
> 
> Surely this loop can never execute...?
> 

Whoops, yeah, reverse the logic on those.
Comment 21 Chris Rankin 2010-03-20 12:47:15 UTC
(In reply to comment #20)
> Whoops, yeah, reverse the logic on those.

OK, I've just tested my rv100 again and the video is now back in the window :-). But activating the on-screen display now crashes the X server instead. The Xorg.0.log file confirms the crash is in radeon_drv.so, but then we probably guessed that..

There's nothing in the dmesg log except:
Process 3650(Xorg) has RLIMIT_CORE set to 0
Aborting core
Comment 22 Chris Rankin 2010-03-20 13:02:42 UTC
    if (radeon_cs_space_remaining(pScrn) > (nBox * 3 * vtx_count + 5)) {
            loops = 1;
            nBox_loop = nBox;
    } else {
            loops = nBox;
            nBox_loop = 1;
    }

    for (i = 0; i < loops; i++) {
        ...

        while (nBox_loop--) {
            ...

Does this mean that nBox_loop will be zero when the for-loop executes for the *second* time? Which means that the while-loop isn't going to stop again until it wraps that 32 bit integer round.


Comment 23 Chris Rankin 2010-03-20 13:26:28 UTC
I added the following line directly beneath the while-loop:

    nBox_loop = 1;

The idea is that this line won't have any effect unless the for-loop executes more than once. And now the X server doesn't crash, and the on-screen display appears :-)!

But it's still not right. In fact, it is now misbehaving more like my RV350 misbehaves instead. The PC with the rv100 is running vanilla 2.6.32.10, and the dmesg log is now full of messages like this:

[drm:r100_cs_track_check] *ERROR* [drm] No buffer for color buffer 0 !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !

And as with the RV350, the video window is freezing while the on-screen-display is visible, apart from a small band at the very top.
Comment 24 Chris Rankin 2010-03-20 14:43:54 UTC
The rv280's X server is also surviving xine-ui's on-screen display now with the revised patch, but it looks like the R200's command parser still has an issue with IMMDs. (Whatever they are...)

[drm:r100_cs_track_check] *ERROR* IMMD draw 14932 dwors but needs 18 dwords
[drm:r100_cs_track_check] *ERROR* VAP_VF_CNTL.NUM_VERTICES 3, VTX_SIZE 6
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:r100_cs_track_check] *ERROR* [drm] No buffer for z buffer !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:r100_cs_track_check] *ERROR* [drm] No buffer for z buffer !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
[drm:r100_cs_packet_parse] *ERROR* Packet (0:3:14932) end after CS buffer (9306) !
[drm:radeon_cs_ioctl] *ERROR* Invalid command stream !

The "No buffer for z buffer" error is consistent with the RV350's results. (The rv280 is in a PC running vanilla 2.6.33.1.)
Comment 25 Chris Rankin 2010-03-20 14:53:50 UTC
Created attachment 34269 [details] [review]
revised patch - much better, but no cigar yet

I've fixed some bugs in the original patch, and now the rv100 and rv280 misbehave more like the RV350. (The patch also exposes another bug in the R200's command parser.) But the X server no longer needs to be restarted after activating xine-ui's on-screen display, which is something.
Comment 26 Alex Deucher 2010-03-20 16:16:09 UTC
(In reply to comment #24)
> The rv280's X server is also surviving xine-ui's on-screen display now with the
> revised patch, but it looks like the R200's command parser still has an issue
> with IMMDs. (Whatever they are...)
> 
> [drm:r100_cs_track_check] *ERROR* IMMD draw 14932 dwors but needs 18 dwords
> [drm:r100_cs_track_check] *ERROR* VAP_VF_CNTL.NUM_VERTICES 3, VTX_SIZE 6
> [drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
> [drm:r100_cs_track_check] *ERROR* [drm] No buffer for z buffer !
> [drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
> [drm:r100_cs_track_check] *ERROR* [drm] No buffer for z buffer !
> [drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
> [drm:r100_cs_packet_parse] *ERROR* Packet (0:3:14932) end after CS buffer
> (9306) !
> [drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
> 
> The "No buffer for z buffer" error is consistent with the RV350's results. (The
> rv280 is in a PC running vanilla 2.6.33.1.)
> 

The problem is you need to re-emit all the previous 3D state (default state and state setup in the textured video functions) if you cross over into a new command buffer.  I'm not sure of a clean way to handle this without some sort of state tracking.  We should probably track the state like the mesa driver does and mark it as dirty when we flush so it can be re-emitted in the next command buffer.
Comment 27 Chris Rankin 2010-03-20 17:01:22 UTC
(In reply to comment #26)
> The problem is you need to re-emit all the previous 3D state (default state and
> state setup in the textured video functions) if you cross over into a new
> command buffer.

So does this mean that R300+ was already splitting command buffers, like the patch does for R100 and R200?

Note that my RV790 has no problems with xine-ui's on-screen display. Is this because it already correctly splits across command buffers, or because its default buffer size is big enough not to need splitting in the first place?
Comment 28 Alex Deucher 2010-03-21 09:06:47 UTC
(In reply to comment #27)
> (In reply to comment #26)
> > The problem is you need to re-emit all the previous 3D state (default state and
> > state setup in the textured video functions) if you cross over into a new
> > command buffer.
> 
> So does this mean that R300+ was already splitting command buffers, like the
> patch does for R100 and R200?

R3xx->R5xx only emits one set of verts per draw command since it doesn't have a native rect primitive but rather uses clipped triangles, so it has to update the clipping info between draws.

> 
> Note that my RV790 has no problems with xine-ui's on-screen display. Is this
> because it already correctly splits across command buffers, or because its
> default buffer size is big enough not to need splitting in the first place?
> 

R6xx+ is less of an issue since it uses vertex buffers rather than embedding verts in the command stream (r6xx+ doesn't support inline verts like older asics).

None of the ddx accel code (EXA, Xv) tracks 3D state which would be needed to solve this properly.
Comment 29 Chris Rankin 2010-03-21 10:21:55 UTC
(In reply to comment #28)
> None of the ddx accel code (EXA, Xv) tracks 3D state which would be needed to
> solve this properly.

Interestingly, I've also noticed that sometimes the "struct radeon_renderbuffer" is either NULL or incorrect in Mesa R300's emit_zb_offset() function. (This is with KMS enabled.) Could it just be a coincidence that both textured video and Mesa are complaining about bad/missing z buffers now?

Comment 30 Alex Deucher 2010-03-21 10:31:19 UTC
(In reply to comment #29)
> (In reply to comment #28)
> > None of the ddx accel code (EXA, Xv) tracks 3D state which would be needed to
> > solve this properly.
> 
> Interestingly, I've also noticed that sometimes the "struct
> radeon_renderbuffer" is either NULL or incorrect in Mesa R300's
> emit_zb_offset() function. (This is with KMS enabled.) Could it just be a
> coincidence that both textured video and Mesa are complaining about bad/missing
> z buffers now?
> 

They are unrelated.  The command check just makes sure the zbuffer is valid.  In the mesa case there's a bug in either mesa or the 3d driver that causes the depth buffer pointer to be incorrect.  This results in a command buffer with an invalid z buffer which the CS checker flags.  In the ddx, when we flush in the Xv drawing code, the new command buffer is missing the 3D state that was set earlier in the function, so the CS checker flags it as well.  two different bugs that result in the same drm error.
Comment 31 Alex Deucher 2010-04-16 17:54:42 UTC
Fixed in git master. commmits:
c1136f94b80346065893f8a43c0fbf60c8c9b057 - 6c07816025f70e522986614c29c300ea13e5d932
Comment 32 Chris Rankin 2010-04-17 14:28:54 UTC
(In reply to comment #31)
> Fixed in git master. commmits:
> c1136f94b80346065893f8a43c0fbf60c8c9b057 -
> 6c07816025f70e522986614c29c300ea13e5d932

I have now tested this:
- The RV350 is fine.
- The rv280 now shows the on-screen display correctly and there are no messages written to the dmesg log. So I would agree that this is fixed, except that the video frame rate while the on-screen display is active seems to drop considerably.
- The r100 also shows the on-screen display correctly without filling the dmesg log, but the video picture now "ghosts" horribly like an old TV set with a poor signal. This "ghosting" now happens regardless of whether the on-screen display is active or not. The CPU usage for video playback is also a lot higher.
Comment 33 Alex Deucher 2010-04-18 10:51:01 UTC
The CPU usage will be higher because the state and huge numbers of vertexes will have to be send to the drm.  The real fix is probably to either use vertex buffers for Xv or fix xine to not use clipping for the OSD.  I'll take a look at the r1xx issues.
Comment 34 Chris Rankin 2010-06-08 13:10:16 UTC
(In reply to comment #33)
> I'll take a look at the r1xx issues.

The rv100 issue now looks fixed with the following commit:

commit 426114b4a99d37b394efe3336968bb0ab9b6e9ae
Author: Dave Airlie <airlied@redhat.com>
Date:   Tue Jun 8 11:34:35 2010 +1000

    xv: fix Xv on M6/RV100 under KMS.
    
    pRADEONEnt->HasCRTC2 wasn't setup under KMS.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.