Bug 47418 - [sna] bizarre corruption in upload buffers
Summary: [sna] bizarre corruption in upload buffers
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Chris Wilson
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Duplicates: 48619
Depends on:
Blocks:
 
Reported: 2012-03-16 09:53 UTC by Magnus Kessler
Modified: 2012-05-24 03:39 UTC (History)
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
Xorg log (40.73 KB, text/plain)
2012-03-16 11:30 UTC, Magnus Kessler
Xorg log with crash backtrace (42.64 KB, text/plain)
2012-03-16 11:31 UTC, Magnus Kessler
Align the row[index] (1.95 KB, patch)
2012-03-30 12:49 UTC, Chris Wilson
Rendering artefacts in gimp menu (35.38 KB, image/png)
2012-03-31 02:22 UTC, Magnus Kessler

Description Magnus Kessler 2012-03-16 09:53:46 UTC
The xorg server crashes reproducibly within the intel driver.

To reproduce, use firefox 11 and go to any twitter page, such as
http://twitter.com/#!/therealcwilson

Program received signal SIGSEGV, Segmentation fault.
0x00007f1211bec837 in inplace_row (width=86, row=0x7fff17f0fbb0 "\377", 
    active=0x7fff17f0f7c0)
    at ../xf86-video-intel-9999/src/sna/sna_trapezoids.c:1503
1503    ../xf86-video-intel-9999/src/sna/sna_trapezoids.c: No such file or directory.
        in ../xf86-video-intel-9999/src/sna/sna_trapezoids.c
(gdb) bt
#0  0x00007f1211bec837 in inplace_row (width=86, row=0x7fff17f0fbb0 "\377", 
    active=0x7fff17f0f7c0)
    at ../xf86-video-intel-9999/src/sna/sna_trapezoids.c:1503
#1  tor_inplace (converter=0x7fff17f0eea0, buf=0x7fff17f0fbb0 "\377", 
    scratch=<optimized out>, mono=<optimized out>)
    at ../xf86-video-intel-9999/src/sna/sna_trapezoids.c:1729
#2  0x00007f1211bef9b4 in trapezoid_mask_converter (op=3 '\003', 
    src=<optimized out>, dst=<optimized out>, maskFormat=<optimized out>, 
    src_x=<optimized out>, src_y=<optimized out>, ntrap=16, traps=0x2e4b50c)
    at ../xf86-video-intel-9999/src/sna/sna_trapezoids.c:3547
#3  0x00007f1211bf616d in sna_composite_trapezoids (op=<optimized out>, 
    src=<optimized out>, dst=0x2e38b40, maskFormat=0x194f368, 
    xSrc=<optimized out>, ySrc=<optimized out>, ntrap=16, traps=0x2e4b50c)
    at ../xf86-video-intel-9999/src/sna/sna_trapezoids.c:4484
#4  0x00000000004fb921 in ProcRenderTrapezoids (client=0x2a4ee70)
    at /usr/src/debug/x11-base/xorg-server-1.12.0/xorg-server-1.12.0/render/render.c:777
#5  0x0000000000437a91 in Dispatch ()
    at /usr/src/debug/x11-base/xorg-server-1.12.0/xorg-server-1.12.0/dix/dispatch.c:439
#6  0x00000000004264ca in main (argc=<optimized out>, argv=0x7fff17f0ff48, 
    envp=<optimized out>)
    at /usr/src/debug/x11-base/xorg-server-1.12.0/xorg-server-1.12.0/dix/main.c:287
Comment 1 Chris Wilson 2012-03-16 10:15:12 UTC
Doesn't trigger the issue here. Can you please tell me which version of the driver you are currently using?
Comment 2 Chris Wilson 2012-03-16 11:14:07 UTC
Xorg.log would be useful for the identification of your system, and valgrind would help identify the bug and/or recompiling with no optimisation and printing the locals.
Comment 3 Magnus Kessler 2012-03-16 11:30:03 UTC
The crash happens on xorg-server 1.12 (compiled from source on gentoo), with commit 63c0d10faee3c7cca050505c2e81c416119e57e9 of the xf86-video-intel driver. 3D acceleration is provided by mesa (git-c079574).

This is a 64-bit kernel (3.2.11 + tuxonice patches).

The desktop environment is KDE, with KWin compositing enabled. Firefox-11 (also compiled from source) seems to trigger this bug on many different sites.
Comment 4 Magnus Kessler 2012-03-16 11:30:39 UTC
Created attachment 58574
Xorg log
Comment 5 Magnus Kessler 2012-03-16 11:31:33 UTC
Created attachment 58575
Xorg log with crash backtrace
Comment 6 Chris Wilson 2012-03-16 15:19:40 UTC
I've tried this on gen2-6 with firefox 11/12 and with cairo-1.10.2/cairo-1.11.4, and still have not reproduced your crash. Can you please just start X under valgrind and launch firefox? And perhaps provide a disassembly of your inplace_row()?
Comment 7 Magnus Kessler 2012-03-17 01:29:16 UTC
This crash may very well be compiler optimization related. I cannot reproduce it with these flags: 

CFLAGS="-ggdb -O0 -march=core2 -pipe"
CXXFLAGS=${CFLAGS}

These flags exhibit the crash behaviour on gcc-4.6.2:

CFLAGS="-ggdb -O2 -march=core2 -mssse3 -msse4.1 -mno-sse4.2 -fno-builtin-memcmp -funit-at-a-time -pipe -ftree-vectorize -floop-interchange -floop-strip-mine -floop-block -pipe"
CXXFLAGS="${CFLAGS}"

I have been able to bisect the code and the first bad commit appears to be

http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=fba49e1bb8e5b6b0e3ceace2dbddb5796ece954e

sna/traps: Fix off-by-one for filling vertical segments in tor_inplace

If the last solid portion was exactly 4 pixels wide, we would miss filling in the mask.
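The commit message does not include the diff, so the following is only a hypothetical illustration of how a fill loop unrolled in 4-pixel chunks can drop a final chunk that is exactly 4 pixels wide. The names fill_buggy and fill_fixed are invented here and are not taken from sna_trapezoids.c.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical buggy fill (len assumed >= 0): the unrolled loop only runs
 * while MORE than four pixels remain, but the tail assumes fewer than four
 * are left, so a final chunk of exactly four pixels is never written. */
static void fill_buggy(uint8_t *row, int len, uint8_t v)
{
    while (len > 4) {             /* off by one: should be len >= 4 */
        memset(row, v, 4);
        row += 4;
        len -= 4;
    }
    if (len < 4)                  /* tail only handles 0..3 pixels */
        memset(row, v, (size_t)len);
}

/* Fixed fill (len assumed >= 0): consume whole 4-pixel chunks, then the
 * remaining 0..3 pixel tail. */
static void fill_fixed(uint8_t *row, int len, uint8_t v)
{
    while (len >= 4) {
        memset(row, v, 4);
        row += 4;
        len -= 4;
    }
    memset(row, v, (size_t)len);  /* len is now in 0..3 */
}
```

With len == 4 the buggy version writes nothing, and with len == 8 it writes only the first four pixels; changing the loop condition to len >= 4 fixes both cases.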
Comment 8 Chris Wilson 2012-03-17 02:27:33 UTC
Can you try:

commit e31d9dacafe060dc86de801114b475fdd0142eb6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Mar 17 09:21:00 2012 +0000

    sna/traps: Align indices for unrolled memset in row_inplace()
    
    The compiler presumes that the uint64_t write is naturally aligned and
    so may emit code that crashes with an unaligned move. To work around
    this, make sure the write is so aligned.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=47418
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
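The commit message above does not include the diff. A minimal sketch of the general technique it describes, a head/body/tail fill whose head loop aligns the index before the unrolled 8-byte stores, might look like the following. fill_span_index_aligned is a hypothetical name, not the driver's function.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of an unrolled fill whose head loop aligns the INDEX:
 * fill row[x0..x1) with v (0 <= x0 assumed), using 8-byte stores for the
 * middle of the span. &row[x] is 8-byte aligned only if row itself is,
 * which is the assumption comment 11 later found to be false. */
static void fill_span_index_aligned(uint8_t *row, int x0, int x1, uint8_t v)
{
    const uint64_t v64 = 0x0101010101010101ULL * v; /* v in every byte */
    int x = x0;

    /* Head: single-byte stores until x is a multiple of 8. */
    while (x < x1 && (x & 7))
        row[x++] = v;

    /* Body: whole 8-byte words. The driver stores through a uint64_t
     * pointer, which lets the compiler assume natural alignment; memcpy is
     * used here to stay within C's aliasing rules. */
    while (x1 - x >= 8) {
        memcpy(&row[x], &v64, sizeof v64);
        x += 8;
    }

    /* Tail: the remaining 0..7 bytes. */
    while (x < x1)
        row[x++] = v;
}
```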
Comment 9 Magnus Kessler 2012-03-17 04:36:50 UTC
With commit e31d9dacafe060dc86de801114b475fdd0142eb6 I can no longer reproduce the crash. Many thanks!
Comment 10 Magnus Kessler 2012-03-30 10:58:24 UTC
something is still not right in inplace_row (sna_trapezoids.c)

With the latest version I get the following crash when starting gimp:

Program received signal SIGSEGV, Segmentation fault.
0x00007fe86258e98f in inplace_row (width=25, 
    row=0x2137a0c "\377\377\377\377\377\377\377\377", active=0x7fffe3ff39a0)
    at ../xf86-video-intel-9999/src/sna/sna_trapezoids.c:1515
1515    ../xf86-video-intel-9999/src/sna/sna_trapezoids.c: No such file or directory.
        in ../xf86-video-intel-9999/src/sna/sna_trapezoids.c
(gdb) bt
#0  0x00007fe86258e98f in inplace_row (width=25, 
    row=0x2137a0c "\377\377\377\377\377\377\377\377", active=0x7fffe3ff39a0)
    at ../xf86-video-intel-9999/src/sna/sna_trapezoids.c:1515
#1  tor_inplace (converter=0x7fffe3ff3180, buf=0x0, scratch=<optimized out>, 
    mono=<optimized out>)
    at ../xf86-video-intel-9999/src/sna/sna_trapezoids.c:1752
#2  0x00007fe862593f12 in trapezoid_span_fallback (op=3 '\003', 
    src=<optimized out>, dst=<optimized out>, maskFormat=<optimized out>, 
    src_x=<optimized out>, src_y=<optimized out>, ntrap=9, traps=0x20bb654)
    at ../xf86-video-intel-9999/src/sna/sna_trapezoids.c:4377
#3  0x00007fe86259815d in sna_composite_trapezoids (op=<optimized out>, 
    src=<optimized out>, dst=0x1c36e60, maskFormat=0x19973c8, 
    xSrc=<optimized out>, ySrc=<optimized out>, ntrap=9, traps=0x20bb654)
    at ../xf86-video-intel-9999/src/sna/sna_trapezoids.c:4570
#4  0x00000000004fb921 in ProcRenderTrapezoids (client=0x1c38de0)
    at /usr/src/debug/x11-base/xorg-server-1.12.0-r1/xorg-server-1.12.0/render/render.c:777
#5  0x0000000000437a91 in Dispatch ()
    at /usr/src/debug/x11-base/xorg-server-1.12.0-r1/xorg-server-1.12.0/dix/dispatch.c:439
#6  0x00000000004264ca in main (argc=<optimized out>, argv=0x7fffe3ff40d8, 
    envp=<optimized out>)
    at /usr/src/debug/x11-base/xorg-server-1.12.0-r1/xorg-server-1.12.0/dix/main.c:287
Comment 11 Chris Wilson 2012-03-30 11:08:19 UTC
Just to check, this is still with -O3 etc?

I think the culprit this time is that the row itself is not 8-byte aligned going into the function: row=0x2137a0c.

Let me change the alignment preamble to take that into account.
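The diagnosis can be checked with a little address arithmetic. In the backtrace from comment 10, row=0x2137a0c sits 4 bytes past an 8-byte boundary (0x0c mod 8 = 4), so an index that is a multiple of 8 still gives a misaligned address. The sketch below compares a head length computed from the index alone with one computed from the actual address; head_by_index and head_by_address are hypothetical helper names, not driver code.

```c
#include <stddef.h>
#include <stdint.h>

/* Single-byte writes needed before index x is a multiple of 8.
 * Sufficient only if row itself is 8-byte aligned. */
static size_t head_by_index(size_t x)
{
    return (8 - (x & 7)) & 7;
}

/* Single-byte writes needed before the address &row[x] is 8-byte aligned,
 * regardless of how row itself is aligned. */
static size_t head_by_address(const uint8_t *row, size_t x)
{
    return (size_t)((8 - ((uintptr_t)(row + x) & 7)) & 7);
}
```

For a row pointer like 0x2137a0c and x = 8, head_by_index() returns 0 while head_by_address() returns 4; the two only agree when row is itself 8-byte aligned.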
Comment 12 Chris Wilson 2012-03-30 11:13:36 UTC
Can you try...?

commit ee075ced844350785685a0f93f88f1dc310bcc73
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Mar 30 19:09:30 2012 +0100

    sna/traps: Align the pointer not the indices
    
    Magnus found that inplace_row was still crashing on his setup when it
    tried to perform an 8-byte aligned write to an unaligned pointer. This
    time it looks like the row pointer itself was not 8-byte aligned, so
    instead of assuming that and fixing up the indices, ensure that the
    (index+row) results in an 8-byte aligned value.
    
    Reported-by: Magnus Kessler <Magnus.Kessler@gmx.net>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47418
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 13 Magnus Kessler 2012-03-30 12:11:08 UTC
No luck. The crash still happens at line 1515 (i.e. right after the changes made in this commit). And yes, this is still with the optimizations mentioned previously.
Comment 14 Chris Wilson 2012-03-30 12:23:11 UTC
Hmm, can you attach gdb and paste another backtrace along with a disassembly and as many locals as gdb can find?
Comment 15 Chris Wilson 2012-03-30 12:49:12 UTC
Created attachment 59289
Align the row[index]

I've proven myself an imbecile too often tonight, so let's test this patch first. :(
Comment 16 Magnus Kessler 2012-03-30 13:56:46 UTC
This patch fixes gimp crashing on startup. For that, it's a

Tested-by: Magnus Kessler <Magnus.Kessler@gmx.net>

However, the drop-down menus in gimp show severe rendering artefacts, both in their text and icons. It looks mostly like every second row and column is missing there, but sometimes multiple rows or columns are left blank. The menu entries get their normal look back once the mouse moves over them. I'm not sure if this is a consequence of your fix, or a completely different issue. Other GTK+ applications (notably Firefox) look OK.
Comment 17 Chris Wilson 2012-03-30 14:05:43 UTC
Thanks.

commit 6f2814db6f7b89e94e54b8d73c7e176ab7d1c469
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Mar 30 20:45:55 2012 +0100

    sna/traps: Align the pointer+index
    
    It's the location of the pixels within the row that matter for
    alignment!
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47418
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: Magnus Kessler <Magnus.Kessler@gmx.net>
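As a sketch only (fill_span is a hypothetical name, and this is not the code from commit 6f2814db), a fill that aligns the pointer+index might look like the following: the head length is derived from the address row + x0 rather than from x0, so the 8-byte body stores are aligned whatever the alignment of row itself.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: fill row[x0..x1) with v, aligning the write pointer
 * (row + index) rather than the index, so the 8-byte body stores are
 * aligned even when row itself is not. */
static void fill_span(uint8_t *row, int x0, int x1, uint8_t v)
{
    uint8_t *p = row + x0;          /* write pointer: row plus index */
    uint8_t *end = row + x1;
    const uint64_t v64 = 0x0101010101010101ULL * v; /* v in every byte */

    /* Head: single-byte stores until p is 8-byte aligned (or span ends). */
    while (p < end && ((uintptr_t)p & 7))
        *p++ = v;

    /* Body: 8-byte stores, each now landing on an 8-byte boundary.
     * memcpy keeps the sketch within C's aliasing rules. */
    while (end - p >= 8) {
        memcpy(p, &v64, sizeof v64);
        p += 8;
    }

    /* Tail: the remaining 0..7 bytes. */
    while (p < end)
        *p++ = v;
}
```

Driving all three loops from the single write pointer keeps the alignment test and the stores on the same address, so the index and the pointer cannot drift apart.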

I had a quick look at the gimp, poking various menus and drop-down lists, nothing struck me out of the ordinary.

First, can you try running without optimisations and see if it still occurs? Are you able to capture it in a screenshot (just so I can be sure I was poking in the right places!)?
Comment 18 Magnus Kessler 2012-03-31 02:22:59 UTC
Created attachment 59303
Rendering artefacts in gimp menu

The rendering artefacts appear with or without optimization. They come in a variety of patterns, some of which appear in the attached screenshot. I have observed horizontal, vertical, and even diagonal stripes of missing pixels.
Comment 19 Chris Wilson 2012-03-31 02:41:58 UTC
That's definitely a different level of corruption. And the damage is persistent (until re-rendered), so I'd guess data loss on upload. Can you grab a whole-screen screenshot so I can try to work out which areas are most affected (and so any pattern behind the corruption)?

Also, can you compile with --enable-debug=full and attach the full Xorg.0.log for a gimp session (the last 1 MiB will probably do)? And if you could compile the vmap branch of http://cgit.freedesktop.org/~ickle/linux-2.6/, and xf86-video-intel with --enable-sna --enable-vmap, that would help answer one question. (I'll see if I can reproduce this on a stock kernel as well.)
Comment 20 Chris Wilson 2012-03-31 04:07:51 UTC
I went back to a stock ubuntu kernel (3.2.0) without vmap on 965gm (which should be close enough to your gm45) and nothing unusual happened. Can I ask you to dig a little deeper and see if you can find the trigger? Try a different WM, different themes and different kernels.
Comment 21 Chris Wilson 2012-04-09 07:51:45 UTC
Can you please retest with

commit 7f0bede3e7e3f92a637d1c886304b16afc0e34f2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Apr 9 10:48:08 2012 +0100

    sna/traps: Use a temporary variable for the write pointer

Since I have not reproduced the corruption you've seen, I can't be certain this is related, though it may be if the corruption only started after tor_inplace() was introduced.
Comment 22 Magnus Kessler 2012-04-09 14:26:14 UTC
(In reply to comment #21)
> Can you please retest with
> 
> commit 7f0bede3e7e3f92a637d1c886304b16afc0e34f2
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Apr 9 10:48:08 2012 +0100
> 
>     sna/traps: Use a temporary variable for the write pointer

No, this commit doesn't make any difference regarding the corruption I'm seeing.
Comment 23 Chris Wilson 2012-04-12 12:42:33 UTC
*** Bug 48619 has been marked as a duplicate of this bug. ***
Comment 24 Chris Wilson 2012-04-12 12:45:01 UTC
Looks like the common element here is KDE? If you try a different WM (and DE) does the issue persist?
Comment 25 Chris Wilson 2012-04-20 08:25:39 UTC
As an aside, having chased yet another bug due to an interaction with gcc, can either of you confirm if this bug is still present if you recompile the xserver and the xf86-video-intel (maybe even pixman) with -O0?
Comment 26 Magnus Kessler 2012-04-22 00:31:26 UTC
The problems with font rendering do indeed appear to be related to the use of compositing in KDE's kwin window manager. If I use no window manager, or even turn compositing off in kwin, the font corruption no longer appears.

The optimization level in gcc makes no difference, and xorg-server, xf86-video-intel and pixman compiled with -O0 have the same issue.
Comment 27 Chris Wilson 2012-04-29 04:48:46 UTC
Hmm, can either of you reproduce with ./configure --enable-sna --enable-debug=full and attach the resulting Xorg.0.log (it will be huge)?
Comment 28 Magnus Kessler 2012-05-02 03:06:57 UTC
Gentoo bug https://bugs.gentoo.org/show_bug.cgi?id=409593 suggests that the font corruption is due to some changes in cairo after 1.11.2. And indeed, after downgrading cairo to 1.11.2, I no longer observe the problem, even with compositing enabled in kwin.

The gentoo bug points to https://bugs.freedesktop.org/show_bug.cgi?id=47266#c142, which claims to have bisected to cairo commit af9fbd176b145f042408ef5391eef2a51d7531f8 ("Introduce a new compositor architecture")
Comment 29 Chris Wilson 2012-05-02 03:19:23 UTC
(In reply to comment #28)
> Gentoo bug https://bugs.gentoo.org/show_bug.cgi?id=409593 suggests that the
> font corruption is due to some changes in cairo after 1.11.2. And indeed, after
> downgrading cairo to 1.11.2, I no longer observe the problem, even with
> compositing enabled in kwin.

They are not bugs in cairo, but do suggest which upload path is going wrong.
Comment 30 Chris Wilson 2012-05-23 13:58:45 UTC
I made some tweaks to the upload buffers and idle detection which I feel at least touch the implicated code paths here, so I'd appreciate it if you could give me a status update on whether this bug still occurs. Thanks.
Comment 31 Magnus Kessler 2012-05-24 03:15:08 UTC
All menus in gimp-2.6.x now render correctly with current versions of xf86-video-intel, libdrm and mesa.
Comment 32 Chris Wilson 2012-05-24 03:39:34 UTC
There are a number of standout commits in the interval between tests. However, I'm going to tentatively take this as finally fixed. Thanks for the bug report and all the testing! Keep your eyes peeled for further issues...

