Bug 28438 - [bisected] incorrect character in gnome-terminal under compiz
[bisected] incorrect character in gnome-terminal under compiz
Status: VERIFIED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
unspecified
All Linux (All)
: high normal
Assigned To: Kristian Høgsberg
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-06-08 02:30 UTC by Yi Sun
Modified: 2010-11-23 23:24 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
incorrect character in gnome-terminal under compiz (273.08 KB, image/x-png)
2010-07-22 01:29 UTC, Yi Sun
Details
the modified patch (1.25 KB, text/plain)
2010-07-27 01:57 UTC, Yi Sun
Details
Submit batch buffers from flush callback chain (2.98 KB, patch)
2010-07-29 15:41 UTC, Kristian Høgsberg
Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Yi Sun 2010-06-08 02:30:37 UTC
System Environment:
--------------------------
Arch:           x86-64
Platform:       G45

Libdrm:       (master)2.4.20-19-g73a42a645201a85ce2fe4fc77754df67e5097fc9
Xserver:      (master)xorg-server-1.8.0-309-g643cb6e87c10ab554c03ada81930001a8ebcc909
Xf86_video_intel:   (master)2.11.0-151-g6db1e5231b7a0e79611f771d4efea686f7849e04

Bug detailed description:
-------------------------
The characters typed in console are wrong with the compiz. Just like bug #27421. But it is correct without compiz. 
The issue comes out more frequently when typing at the bottom of the console.
By bisect,we found that the commit a86e4852f463c289eba019a4d231fbd6cae82f27 is  good. The following commit  c4775a27e3aaa2006b98f225387499b79bc609ef can't be built successfully and the commit 0bf1ddd60a3660974ad54e410533eeb481949364 fixed its built issue so it is the first bad commit which can compiled and tested.

Reproduce steps:
----------------
1.start gnome
2.open a terminal
3.typ character
Comment 1 Yi Sun 2010-06-13 02:11:26 UTC
More mesa commit information are as following:

(master)commit a86e4852f463c289eba019a4d231fbd6cae82f27
Author: Kristian Høgsberg <krh@bitplanet.net>
Date:   Mon May 10 16:18:55 2010 -0400

    intel: Only flush fake front buffer on API level glFlush()

    Without this patch, any old intel_flush() call will cause a round trip to
    the server and do a copy from fake to real front.  We only actually
    guarantee that frontbuffer results show up when glFlush() ia called, so
    move the flushing to intel_glFlush().

    We also need to flush fake to front before getting new buffers, but
    we just handle that manually.


(master)commit 0bf1ddd60a3660974ad54e410533eeb481949364
Author: Vinson Lee <vlee@vmware.com>
Date:   Mon May 10 15:41:22 2010 -0700

    i915: Drop intelFlush().

    This was missed in commit c4775a27e3aaa2006b98f225387499b79bc609ef.
    Fixes i915 build.
Comment 2 Gordon Jin 2010-06-21 19:14:49 UTC
Kristian, can you reproduce?
Comment 3 Yi Sun 2010-06-25 01:49:56 UTC
The issue also exists on the 7.8 branch. With bisect, we found the culprit commit is: 7c50d29f7ced3d60e52ee0146d982b49ea421de2 
And the one following it is the first good commit:
c642f3941ba2ab68135037e1fcb1c29dcb820de7 

More commit information is as following:

commit 7c50d29f7ced3d60e52ee0146d982b49ea421de2
Author: Kristian Høgsberg <krh@bitplanet.net>
Date:   Fri Jan 8 12:35:47 2010 -0500

    intel/DRI2: add DRI2flushExtension support with invalidate hook

    Needed to support the SwapBuffers code properly.

    Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>

commit c642f3941ba2ab68135037e1fcb1c29dcb820de7
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Mon Jan 4 16:26:17 2010 -0500

    xdemos/glsync: handle no sync method better

    Print out count, finish rendering, etc.  Makes the -sn option more useful.
Comment 4 Chris Wilson 2010-07-04 09:25:29 UTC
Although the bisect seems to imply a regression in mesa, I am curious if the flushing fixes for i965 in 2.12 were perhaps the underlying bug?
Comment 5 Kristian Høgsberg 2010-07-19 05:05:17 UTC
I don't actually see anything with xterm, but I can get gnome-terminal to skip rendering in some cases.  Are you using xterm or gnome-terminal?  Can you attach a screenshot of the bad rendering?  Are you running compiz in direct rendering mode and does it happen if you run it in indirect mode?
Comment 6 Yi Sun 2010-07-22 01:29:00 UTC
Created attachment 37297 [details]
incorrect character in gnome-terminal  under compiz
Comment 7 Yi Sun 2010-07-22 01:43:16 UTC
we are using gnome-terminal to reproduce the issue. 
And this issue only turns out in direct rendering mode. When we ran compiz in indirect rendering mode(compiz --indirect-rendering), the bug disappeared.
Comment 8 Eric Anholt 2010-07-26 18:55:39 UTC
I have a patch at:

http://cgit.freedesktop.org/~anholt/xf86-video-intel/log/?h=compositing-fix

that may fix the problem -- if someone can reproduce the problem can test with with compositing-fix~1 and compositing-fix, that would be nice.
Comment 9 Yi Sun 2010-07-27 01:56:39 UTC
(In reply to comment #8)
> I have a patch at:
> http://cgit.freedesktop.org/~anholt/xf86-video-intel/log/?h=compositing-fix
> that may fix the problem -- if someone can reproduce the problem can test with
> with compositing-fix~1 and compositing-fix, that would be nice.

We modified the patch a little, and then build 2D driver with the new patch.
The bug doesn't come out again.
Comment 10 Yi Sun 2010-07-27 01:57:34 UTC
Created attachment 37411 [details]
the modified patch
Comment 11 Kristian Høgsberg 2010-07-27 06:23:02 UTC
(In reply to comment #8)
> I have a patch at:
> 
> http://cgit.freedesktop.org/~anholt/xf86-video-intel/log/?h=compositing-fix
> 
> that may fix the problem -- if someone can reproduce the problem can test with
> with compositing-fix~1 and compositing-fix, that would be nice.

I don't think this is the patch we're looking for.  We used to have a getbuffers round trip for each glXBindTexImage, but with the invalidate events we can cache the bo name in the client and avoid that.  The round trip was probably what kept things in sync, and tfp probably broke when we stopped doing that.  But it has to work without it.

The way tfp works is that the server submits rendering to against a drawable and sends out the damage events.  The server does write the event to the client output buffer before the driver emits the rendering to the batch buffer, but the client output buffer isn't actually written to the client until after the block handler, where we submit the batch buffer.

So the ordering on the server side seems to be OK, except in the case where the client output buffer overflows and we flush it without submitting the batch buffer.  This sounds like it could be the problem, but it isn't since the corruption happens even without the overflow condition triggering.
Comment 12 Kristian Høgsberg 2010-07-27 09:15:38 UTC
(In reply to comment #11)
> (In reply to comment #8)
> > I have a patch at:
> > 
> > http://cgit.freedesktop.org/~anholt/xf86-video-intel/log/?h=compositing-fix
> > 
> > that may fix the problem -- if someone can reproduce the problem can test with
> > with compositing-fix~1 and compositing-fix, that would be nice.
> 
> I don't think this is the patch we're looking for.  We used to have a
> getbuffers round trip for each glXBindTexImage, but with the invalidate events
> we can cache the bo name in the client and avoid that.  The round trip was
> probably what kept things in sync, and tfp probably broke when we stopped doing
> that.  But it has to work without it.
> 
> The way tfp works is that the server submits rendering to against a drawable
> and sends out the damage events.  The server does write the event to the client
> output buffer before the driver emits the rendering to the batch buffer, but
> the client output buffer isn't actually written to the client until after the
> block handler, where we submit the batch buffer.
> 
> So the ordering on the server side seems to be OK, except in the case where the
> client output buffer overflows and we flush it without submitting the batch
> buffer.  This sounds like it could be the problem, but it isn't since the
> corruption happens even without the overflow condition triggering.

Indeed, I just bisected the problem myself and the offending commit is:

commit f829e76d8835382b8a52224dfbb2556360e41ffc
Author: Kristian Høgsberg <krh@bitplanet.net>
Date:   Tue May 11 10:32:40 2010 -0400

    intel: Don't update renderbuffers in intelSetTexBuffer2 if we have invalidat

which is the commit where I removed the getbuffers roundtrip on glXBindTexImage.
Comment 13 Kristian Høgsberg 2010-07-29 15:41:12 UTC
Created attachment 37447 [details] [review]
Submit batch buffers from flush callback chain 

This is the fix for the root cause.  It's a patch against xf86-video-intel.  Sent to intel-gfx for review.
Comment 14 Kristian Høgsberg 2010-07-31 19:21:14 UTC
Patch committed to xf86-video-intel, closing bug.


commit 69d65f9184006eac790efcff78a0e425160e95aa
Author: Kristian Høgsberg <krh@bitplanet.net>
Date:   Thu Jul 29 18:31:48 2010 -0400

    Submit batch buffers from flush callback chain
    
    There are a few cases where the server will flush client output buffers
    but our block handler only catches the most common (before going into select
    If the server flushes client buffers before we submit our batch buffer,
    the client may receive a damage event for rendering that hasn't happened yet
    
    Instead, we can hook into the flush callback chain, which the server will
    invoke just before flushing output.  This lets us submit batch buffers
    before sending out events, preserving ordering.
    
    Fixes 28438: [bisected] incorrect character in gnome-terminal under compiz
    https://bugs.freedesktop.org/show_bug.cgi?id=28438
    
    Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Comment 15 Yi Sun 2010-11-23 23:24:58 UTC
Already fixed.