Bug 73108

Summary: crash in _sna_pixmap_move_to_cpu in 2.99.906
Product: xorg
Reporter: Michael Meeks <michael.meeks>
Component: Driver/intel
Assignee: Chris Wilson <chris>
Status: RESOLVED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: chris
Version: unspecified
Hardware: Other
OS: All
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=71482
Whiteboard:
i915 platform:
i915 features:
Attachments:
  Xorg log - same stack trace pretty much. (flags: none)

Description Michael Meeks 2013-12-28 23:43:30 UTC
Running LibreOffice to render something or other; sadly this really screwed up the middle of a 30-minute profiling run in callgrind against a deadline ... [ wow I hate Xorg bugs in production ]. I got:

Program received signal SIGSEGV, Segmentation fault.
__memset_sse2 () at ../sysdeps/i386/i686/multiarch/memset-sse2.S:298
298     ../sysdeps/i386/i686/multiarch/memset-sse2.S: No such file or directory.
(gdb) bt
#0  __memset_sse2 () at ../sysdeps/i386/i686/multiarch/memset-sse2.S:298
#1  0xb6bf0a44 in memset (__len=<optimized out>, __ch=<optimized out>,
    __dest=<optimized out>) at /usr/include/bits/string3.h:84
#2  _sna_pixmap_move_to_cpu (pixmap=pixmap@entry=0x8c0aec8,
    flags=flags@entry=3) at sna_accel.c:2110
#3  0xb6bf3b81 in sna_drawable_move_region_to_cpu (drawable=0x8c0aec8,
    region=region@entry=0xbfb39ba8, flags=flags@entry=3) at sna_accel.c:2479
#4  0xb6c4c987 in trapezoid_span_inplace__x8r8g8b8 (op=<optimized out>,
    dst=dst@entry=0x8bef1b0, src=src@entry=0x8ca6150, src_x=src_x@entry=45,
    src_y=src_y@entry=6, maskFormat=maskFormat@entry=0x85c2208,
    flags=flags@entry=2, ntrap=ntrap@entry=16, traps=traps@entry=0x8d4191c)
    at sna_trapezoids_precise.c:2689
#5  0xb6c4ed05 in precise_trapezoid_span_inplace (sna=sna@entry=0xb5b08000,
    op=op@entry=3 '\003', src=src@entry=0x8ca6150, dst=dst@entry=0x8bef1b0,
    maskFormat=maskFormat@entry=0x85c2208, flags=flags@entry=2,
    src_x=src_x@entry=45, src_y=src_y@entry=6, ntrap=ntrap@entry=16,
    traps=traps@entry=0x8d4191c, fallback=fallback@entry=false)
    at sna_trapezoids_precise.c:2926
#6  0xb6c31019 in trapezoid_span_inplace (fallback=false, traps=0x8d4191c,
    ntrap=16, src_y=6, src_x=45, flags=2, maskFormat=0x85c2208, dst=0x8bef1b0,
    src=0x8ca6150, op=3 '\003', sna=0xb5b08000) at sna_trapezoids.h:153
#7  sna_composite_trapezoids (op=3 '\003', src=0x8ca6150, dst=0x8bef1b0,
    maskFormat=0x85c2208, xSrc=45, ySrc=6, ntrap=16, traps=0x8d4191c)
    at sna_trapezoids.c:669
#8  0x0815771e in CompositeTrapezoids (op=3 '\003', pSrc=0x8ca6150,
    pDst=0x8bef1b0, maskFormat=0x85c2208, xSrc=45, ySrc=6, ntrap=16,
    traps=traps@entry=0x8d4191c) at picture.c:1640
#9  0x0815c82b in ProcRenderTrapezoids (client=0x8b81178) at render.c:759
#10 0x08157b7d in ProcRenderDispatch (client=0x8b81178) at render.c:1989
#11 0x0807eecd in Dispatch () at dispatch.c:432
#12 0x0806cf6a in main (argc=12, argv=0xbfb3c464, envp=0xbfb3c498)
    at main.c:298
(gdb) l
293     in ../sysdeps/i386/i686/multiarch/memset-sse2.S
(gdb) up
#1  0xb6bf0a44 in memset (__len=<optimized out>, __ch=<optimized out>,
    __dest=<optimized out>) at /usr/include/bits/string3.h:84
warning: Source file is more recent than executable.
84        return __builtin___memset_chk (__dest, __ch, __len, __bos0 (__dest));
(gdb) l
79            && (!__builtin_constant_p (__ch) || __ch != 0))
80          {
81            __warn_memset_zero_len ();
82            return __dest;
83          }
84        return __builtin___memset_chk (__dest, __ch, __len, __bos0 (__dest));
85      }
86
87      #ifdef __USE_BSD
88      __fortify_function void
(gdb) p __dest
$1 = <optimized out>
(gdb) up
#2  _sna_pixmap_move_to_cpu (pixmap=pixmap@entry=0x8c0aec8,
    flags=flags@entry=3) at sna_accel.c:2110
2110                            memset(pixmap->devPrivate.ptr, priv->clear_color,
(gdb) l
2105                    }
2106
2107                    if (priv->clear_color == 0 ||
2108                        pixmap->drawable.bitsPerPixel == 8 ||
2109                        priv->clear_color == (1 << pixmap->drawable.depth) - 1) {
2110                            memset(pixmap->devPrivate.ptr, priv->clear_color,
2111                                   pixmap->devKind * pixmap->drawable.height);
2112                    } else {
2113                            pixman_fill(pixmap->devPrivate.ptr,
2114                                        pixmap->devKind/sizeof(uint32_t),
(gdb) p pixmap
$2 = (struct _Pixmap *) 0x8c0aec8
(gdb) p *pixmap
$3 = {drawable = {type = 1 '\001', class = 0 '\000', depth = 32 ' ',
    bitsPerPixel = 32 ' ', id = 67111130, x = 0, y = 0, width = 60,
    height = 60, pScreen = 0x85cb738, serialNumber = 761839},
  devPrivates = 0x8c0aefc, refcnt = 3, devKind = 240, devPrivate = {
    ptr = 0xb4517000, val = -1269731328, uval = 3025235968,
    fptr = 0xb4517000}, screen_x = 0, screen_y = 0, usage_hint = 0,
  master_pixmap = 0x8dde2c0}
(gdb) p pixmap->devKind
$4 = 240
(gdb) p pixmap->drawable.height
$5 = 60
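
For reference, the length handed to memset in frame #2 follows from the values printed above: devKind (the stride in bytes) times height, i.e. 240 * 60. Below is a minimal C sketch of that arithmetic. It uses a hypothetical `pixmap_dims` struct and helper names (not the real `struct _Pixmap` or any SNA function), and only checks that the printed dimensions are self-consistent, so the length itself looks sane:

```c
#include <stddef.h>

/* Hypothetical stand-in for the handful of pixmap fields printed in gdb. */
struct pixmap_dims {
    int dev_kind;        /* stride in bytes (pixmap->devKind) */
    int width;           /* pixmap->drawable.width */
    int height;          /* pixmap->drawable.height */
    int bits_per_pixel;  /* pixmap->drawable.bitsPerPixel */
};

/* Number of bytes the clear would touch: stride * rows. */
static size_t fill_bytes(const struct pixmap_dims *p)
{
    return (size_t)p->dev_kind * (size_t)p->height;
}

/* A stride must be at least wide enough to hold one row of pixels. */
static int stride_is_consistent(const struct pixmap_dims *p)
{
    return p->dev_kind >= p->width * (p->bits_per_pixel / 8);
}
```

With the values from this session (devKind = 240, width = 60, height = 60, bitsPerPixel = 32), `fill_bytes` gives 14400 and the stride check passes, since 60 pixels * 4 bytes is exactly 240.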

this is the openSUSE 13.1 package with this recent changelog:

* Sun Dec 01 2013 hrvoje.senjan@gmail.com
- U_sna-Add-the-missing-braces-around-the-conditional-bl.patch:
  fixes regression from 2.99.906 release (fdo#71605, bnc#853085)

* Sat Nov 30 2013 hrvoje.senjan@gmail.com
- U_sna_correct_handling_of_cropped_images.patch:
  Fix X crashes triggered by wrong handling of cropped
  XvImages (bnc#852531)

* Wed Nov 27 2013 tiwai@suse.de
- U_sna-Process-Damage-relative-to-dst-pDrawable-not-its.patch:
  Fix corrupted output with Emacs and others (bnc#852620)

* Thu Nov 14 2013 hrvoje.senjan@gmail.com
- Update to 3.0 prerelease 2.99.906
  + Fix damage handling when rendering to a partially damaged GPU
    surface. Regression in 2.99.905 (fdo#70527)
  + Use asprintf() instead of sprintf()
    Regression in 2.99.905 (fdo#70835), (bnc#847762)
  + Improve accounting for fence overallocation on older gen2/3, and
    improve the tiling mechanism to fit into the same aperture
    constraints (fdo#70924)
  + Add an extra GPU flush on Sandybridge to fix some rare font
    corruption
  + Rasterise lines through all clip boxes
    (fdo#70802)
  + Fix regression from stricter handling of failures to move a
    GC to the GPU. Regression in 2.99.905. (fdo#71415), (bnc#847941)
  + Fix various fail along the memcpy_xor paths, including
    inadequate error handling and integer overflow (fdo#70527)
  + Fix outside-of-target stipple uploads (lp#1247785)
  + Fix clip detection for long glyphs
    Incomplete bug fix (causing a regression) in 2.99.905
    (fdo#70527)
  + Fix VSync for the render engine (Xv) on Haswell (fdo#70527)
Comment 1 Michael Meeks 2013-12-28 23:45:15 UTC
Since I suffered bug#71482 on this hardware before, I suspect that this may well be related to the deep joy experienced there :-) but just a quick guess. Reproduces worryingly frequently.
Comment 2 Chris Wilson 2013-12-30 10:09:04 UTC
That pixmap and stacktrace look consistent; printing *priv, *priv->cpu_bo and *priv->gpu_bo may help. It would also be useful to build with --enable-debug, just to check for the obvious signs of insanity leading up to that point.

Is this triggered by any libreoffice activity or is there a more precise recipe for reproduction?
Comment 3 Chris Wilson 2013-12-30 10:42:14 UTC
And for reference I've just pushed .907, but nothing strikes me as being a fix - still worth checking out. Can you also please attach your Xorg.0.log just in case there is any peculiarity in it?
Comment 4 Michael Meeks 2014-01-01 11:40:49 UTC
Interestingly, I just updated to:

rpm -q --changelog xorg-x11-server | head
* Mon Dec 16 2013 msrb@suse.com
- u_exa-only-draw-valid-trapezoids.patch
  * Fix possible x server crash using invalid trapezoids.
    (bnc#853846 CVE-2013-6424)

And I don't see the crash anymore - quite possibly it was the CVE patch that fixed this => marking fixed for now =) thanks !
Comment 5 Michael Meeks 2014-01-15 14:09:30 UTC
My bad - the bug is still there; just got it again - 'priv' is sadly optimized out - sorry about that; and I'd lost this bug when I hit it.

I attach the Xorg log too. It is triggered for me by using youtube & the flash player on openSUSE 13.1 - play a few videos, switch to the next one, seek a bit: bang [ and I'd expect that to happen in the video stream myself not this code-path ;-].
Comment 6 Michael Meeks 2014-01-15 14:10:02 UTC
Created attachment 92144 [details]
Xorg log - same stack trace pretty much.
Comment 7 Chris Wilson 2014-01-16 10:41:09 UTC
Hmm, see also bug 73351. What we have there is pixmap->devPrivate.ptr != priv->ptr. If gdb allows you, can you also please print *pixmap; p *priv; p *priv->cpu_bo; p *priv->gpu_bo?
Comment 8 Chris Wilson 2014-01-17 09:30:05 UTC
Presuming it is bug 73351, it should be fixed by

commit 5f3ee21a307a4ff4db189bd53e58a70ec01ee6bc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jan 17 08:40:34 2014 +0000

    sna: Nullify pixmap->devPrivate.ptr after promoting CPU bo to GPU
    
    When we convert a CPU bo into a GPU bo, we need to remove any dangling
    shadow pointers we use for devPrivate.ptr. Whilst the bo remains alive
    these are incoherent, but if we ever replace the GPU bo (for example to
    change tiling for DRI2) then the dangling pointer becomes invalid and
    will explode on next use.
    
    Reported-by: Mike Aury <mike.auty@gmail.com>
    Reported-by:  Marti Raudsepp <marti@juffo.org>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73351
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

The concern is that you have a different stacktrace which may indicate another issue. In any case the extra assertions added for #73351 should also help here.

Please try testing with xf86-video-intel.git and see if that resolves the issue or gives us more debug information.
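
The failure mode described in that commit message can be sketched in C roughly as follows. All names here (`sketch_bo`, `sketch_priv`, `sketch_pixmap`, `promote_cpu_bo_to_gpu`, `shadow_is_coherent`) are hypothetical stand-ins and not the actual xf86-video-intel code. The only point illustrated is that the shadow pointer is cleared at promotion time, so a later replacement of the GPU bo cannot leave it dangling:

```c
#include <stddef.h>

/* Hypothetical buffer object: just a CPU-visible mapping. */
struct sketch_bo {
    void *map;
};

/* Hypothetical per-pixmap driver state. */
struct sketch_priv {
    struct sketch_bo *cpu_bo;
    struct sketch_bo *gpu_bo;
    void *ptr;               /* driver's own record of the CPU mapping */
};

/* Hypothetical pixmap with its shadow pointer (devPrivate.ptr). */
struct sketch_pixmap {
    void *dev_private_ptr;
    struct sketch_priv *priv;
};

/*
 * Promote the CPU bo to be the GPU bo. Both the driver's pointer and the
 * pixmap's shadow pointer are reset to NULL, so nothing keeps referring to
 * a mapping that may disappear if the GPU bo is replaced later.
 */
static void promote_cpu_bo_to_gpu(struct sketch_pixmap *pixmap)
{
    struct sketch_priv *priv = pixmap->priv;

    priv->gpu_bo = priv->cpu_bo;
    priv->cpu_bo = NULL;
    priv->ptr = NULL;
    pixmap->dev_private_ptr = NULL;
}

/* The invariant violated in bug 73351: the two pointers must agree. */
static int shadow_is_coherent(const struct sketch_pixmap *pixmap)
{
    return pixmap->dev_private_ptr == pixmap->priv->ptr;
}
```

In this sketch a caller would have to re-map the pixmap before the next CPU access, rather than reusing a stale pointer and faulting inside memset.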
Comment 9 Michael Meeks 2014-01-29 20:00:00 UTC
Tried to apply the patch you pointed out, but it conflicted with SUSE 13.1's package; so gave up and built / installed git from hash:

commit 2425f03432de9bedeeda14ddbc5742cf7ce22874
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jan 28 19:17:14 2014 +0000

    sna: Check for a hang after a spurious error return from set-domain-ioctl

I've been doing 'dangerous' stuff like watching things on youtube, and using the browser generally on a loaded machine for a couple of days, and (so far) nothing has happened: but perhaps I've trained myself not to do risky stuff ;-)

Either way - latest git looks -much- better. I also seem to have lost the image corruption on scroll I was enjoying before (just loading firefox with a page of images would give some random corruption as I scrolled - whereby the image would be filled with vertical stripes).

Thanks =)
