Bug 71415

Summary: Xorg SIGSEGV in sna_do_copy() for 2.99.905
Product: xorg
Component: Driver/intel
Status: RESOLVED FIXED
Severity: major
Priority: medium
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Reporter: Gustavo Rubio <gus>
Assignee: Chris Wilson <chris>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: fabio.coatti, gus, michael.meeks
Whiteboard:
i915 platform:
i915 features:
Attachments (description, flags):
 - Xorg GDB output (flags: none)
 - dmesg output of thinkpad Edge E530 (flags: none)
 - lspci -vv output of thinkpad Edge E530 (flags: none)
 - dmesg output of acer Aspire S5-390 (flags: none)
 - lspci -vv output of acer Aspire S5-390 (flags: none)

Description Gustavo Rubio 2013-11-09 02:51:34 UTC
Created attachment 88916 [details]
Xorg GDB output

The crash happens when using the empathy chat application on OpenSuSE 13.1 RC2. I filed a bug in Novell's Bugzilla too, but after seeing the backtrace I got from GDB I thought I should report this upstream.

I can reproduce this issue in two environments:

 - A thinkpad laptop with an Intel 3000 video chip
 - An acer aspire laptop with an Intel 4000 video chip

Steps to reproduce:

 1. Open empathy
 2. Scroll in any chat window that has the scroll decoration and some messages
 3. Xorg crashes and gets me back to GDM

I think the two machines have different chipsets (one Sandy Bridge, one Ivy Bridge), so I'm guessing this has nothing to do with the BIOS, as Stefan suggested on Novell's Bugzilla. Besides, there were no BIOS updates available, at least for the Aspire.

That being said, I tried to debug Xorg over SSH. I'm attaching both machines' dmesg, lspci -vv specs, uname output and the GDB stack trace I could get, which is the same for both machines. GDB told me that some debuginfo symbols were not installed. If you need any more info, please let me know.
Comment 1 Gustavo Rubio 2013-11-09 02:52:51 UTC
Created attachment 88917 [details]
dmesg output of thinkpad Edge E530
Comment 2 Gustavo Rubio 2013-11-09 02:53:20 UTC
Created attachment 88918 [details]
lspci -vv output of thinkpad Edge E530
Comment 3 Gustavo Rubio 2013-11-09 02:55:48 UTC
Created attachment 88919 [details]
dmesg output of acer Aspire S5-390
Comment 4 Gustavo Rubio 2013-11-09 02:56:19 UTC
Created attachment 88920 [details]
lspci -vv output of acer Aspire S5-390
Comment 5 Chris Wilson 2013-11-09 11:07:40 UTC
Can you please do a 'p *gc' in frame 0 of the sigsegv?
Comment 6 Gustavo Rubio 2013-11-10 00:27:32 UTC
(In reply to comment #5)
> Can you please do a 'p *gc' in frame 0 of the sigsegv?

Hello Chris,

I'm not an expert on GDB; I hope I've got the right info.

This is the value of '*gc':

$2 = {pScreen = 0x24f0ce0, depth = 24 '\030', alu = 3 '\003', lineWidth = 0, dashOffset = 0, numInDashList = 2, dash = 0x808728 <DefaultDash> "\004\004",
  lineStyle = 0, capStyle = 1, joinStyle = 0, fillStyle = 0, fillRule = 0, arcMode = 1, subWindowMode = 0, graphicsExposures = 1, clientClipType = 0, miTranslate = 1,
  tileIsPixel = 1, fExpose = 1, freeCompClip = 1, scratch_inuse = 0, unused = 0, planemask = 18446744073709551615, fgPixel = 0, bgPixel = 1, tile = {pixmap = 0x0,
    pixel = 0}, stipple = 0x2565f10, patOrg = {x = 0, y = 0}, font = 0x264ad40, clipOrg = {x = 0, y = 0}, clientClip = 0x0, stateChanges = 0, serialNumber = 8340,
  funcs = 0x0, ops = 0x7f7786edbea0 <sna_gc_ops>, devPrivates = 0x2b3cb50, pRotatedPixmap = 0x0, pCompositeClip = 0x0}

Let me know if you need more info. Thanks.
Comment 7 Chris Wilson 2013-11-10 08:45:41 UTC
Ah, that's what I guessed.

Can you please compile xf86-video-intel.git with --enable-debug (or apply
diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
index 497e0d1..388e075 100644
--- a/src/sna/sna_accel.c
+++ b/src/sna/sna_accel.c
@@ -14947,6 +14947,8 @@ sna_validate_gc(GCPtr gc, unsigned long changes, DrawablePtr drawable)
            (gc->clientClipType != CT_NONE && (changes & (GCClipXOrigin | GCClipYOrigin))))
                miComputeCompositeClip(gc, drawable);
 
+       assert(gc->pCompositeClip);
+
        sna_gc(gc)->changes |= changes;
 }
and ./configure --enable-debug $*) and run under gdb. Please report where it then fails.
Comment 8 Chris Wilson 2013-11-11 09:27:47 UTC
*** Bug 71396 has been marked as a duplicate of this bug. ***
Comment 9 Chris Wilson 2013-11-11 10:30:41 UTC
*** Bug 71482 has been marked as a duplicate of this bug. ***
Comment 10 Michael Meeks 2013-11-11 11:28:58 UTC
I attached gdb before starting LibreOffice; and got to:

(gdb) b sna_accel.c:14950
Breakpoint 2 at 0xb6beada9: file sna_accel.c, line 14950.
(gdb) c

Breakpoint 2, sna_validate_gc (gc=0x96c0158, changes=917504, drawable=0x969ec50) at sna_accel.c:14950
14950		sna_gc(gc)->changes |= changes;
(gdb) condition 2 gc->pCompositeClip == 0
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0xb6bee678 in sna_do_copy (src=src@entry=0x96b5ec8, dst=dst@entry=0x9c72868, gc=gc@entry=0xa016e70, sx=sx@entry=0, sy=sy@entry=0, 
    width=width@entry=300, height=height@entry=213, dx=dx@entry=5, dy=dy@entry=-115, copy=0xb6c08fe0 <sna_copy_boxes>, 
    bitPlane=bitPlane@entry=0, closure=closure@entry=0x0) at sna_accel.c:6158
6158		if (gc->pCompositeClip->data)
(gdb) p *gc
$1 = {pScreen = 0x9346738, depth = 24 '\030', alu = 3 '\003', lineWidth = 0, dashOffset = 0, numInDashList = 2, 
  dash = 0x8262bc8 <DefaultDash> "\004\004", lineStyle = 0, capStyle = 1, joinStyle = 0, fillStyle = 0, fillRule = 0, arcMode = 1, 
  subWindowMode = 0, graphicsExposures = 0, clientClipType = 0, miTranslate = 1, tileIsPixel = 1, fExpose = 1, freeCompClip = 1, 
  scratch_inuse = 0, unused = 0, planemask = 4294967295, fgPixel = 0, bgPixel = 1, tile = {pixmap = 0x0, pixel = 0}, stipple = 0x9383990, 
  patOrg = {x = 0, y = 0}, font = 0x9384340, clipOrg = {x = 0, y = 0}, clientClip = 0x0, stateChanges = 0, serialNumber = 2802969, funcs = 0x0, 
  ops = 0xb6d1aa20 <sna_gc_ops>, devPrivates = 0xa016ec4, pRotatedPixmap = 0x0, pCompositeClip = 0x0}
(gdb) bt
#0  0xb6bee678 in sna_do_copy (src=src@entry=0x96b5ec8, dst=dst@entry=0x9c72868, gc=gc@entry=0xa016e70, sx=sx@entry=0, sy=sy@entry=0, 
    width=width@entry=300, height=height@entry=213, dx=dx@entry=5, dy=dy@entry=-115, copy=0xb6c08fe0 <sna_copy_boxes>, 
    bitPlane=bitPlane@entry=0, closure=closure@entry=0x0) at sna_accel.c:6158
#1  0xb6bee9f6 in sna_copy_area (src=0x96b5ec8, dst=0x9c72868, gc=0xa016e70, src_x=0, src_y=0, width=300, height=213, dst_x=5, dst_y=-115)
    at sna_accel.c:6247
#2  0x0807ac7f in ProcCopyArea (client=0xa01c390) at dispatch.c:1626
#3  0x0807eecd in Dispatch () at dispatch.c:432
#4  0x0806cf6a in main (argc=12, argv=0xbfb15404, envp=0xbfb15438) at main.c:298

Of course gdb lieth through its teeth, but (presumably) if this is correct, it points to sna_validate_gc not getting called for this gc (somehow) (?)
Comment 11 Michael Meeks 2013-11-11 11:34:02 UTC
Added the assert back (but also my horrible work-around to not crash). I've done all the things that made it crash before and not hit the assert at all - so, gdb-aside, I would assume that it's not going through sna_validate_gc for this crashing path; or the gc is corrupted later =) HTH.
Comment 12 Chris Wilson 2013-11-11 11:45:40 UTC
Hmm gc->funcs is also 0 which is bad and also seemingly impossible. Ok, this is starting to look clearly like sna_gc_move_to_gpu() is being called to restore a sna_gc_move_to_cpu() that never occurred.

I think the relevant fix is

commit d41f847c75c3bce85fda6e7508995b45679944e8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Nov 2 21:01:26 2013 +0000

    sna: Jump to the right escape target when bypassing a self-copy
    
    Another fix for
    
    commit e3f15cbf39696edae9f716bdcfbb7032ec7d7e3f [2.99.905]
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Tue Oct 22 15:19:15 2013 +0100
    
        sna: Move gc back to GPU after failure to move it to CPU
Comment 13 Gustavo Rubio 2013-11-12 03:43:04 UTC
I'm sorry for the late reply, been busy on other stuff.

Chris: do you still need me to debug with the patch applied? I see Mr. Meeks's info might have been useful already.

Just FYI, you may already know this, but I switched from sna to uxa and the problem disappears on both machines. This is the AccelMethod option in the driver config.
Comment 14 Chris Wilson 2013-11-12 08:24:57 UTC
I just need someone to confirm that the bug is indeed fixed in xf86-video-intel.git
Comment 15 Chris Wilson 2013-11-13 10:21:29 UTC
Assuming that I have indeed identified the issue correctly and the fix is complete...
Comment 16 Michael Meeks 2013-11-13 12:07:19 UTC
Sorry - I would test, but my X server now doesn't crash (with my hack-around), and thus I've been running/working for several days. Also, finding the patch in git and extracting it is a bit of a PITA etc.
Comment 17 Gustavo Rubio 2013-11-15 20:54:59 UTC
I can confirm the fix, at least it is working for me. 906 no longer crashes with empathy. I can try with LibreOffice too, though.

OpenSuSE 13.1, Intel 4000.

Thanks!
