Bug 99358

Summary: Xorg crashes with SIGSEGV in sna_set_cursor_position()
Product: xorg
Reporter: Igor Mammedov <qwerty0987654321>
Component: Driver/intel
Assignee: Chris Wilson <chris>
Status: RESOLVED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: medium
CC: amg1127, andyrtr, ian.frost, jdschwa, knetl.j, miles, peter.hutterer, pgn674, rishi.is, robert, tsujan2000, waltercool, zelial
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=109003
Whiteboard:
i915 platform:
i915 features:
Attachments (description, flags):
Xorg log, none
Take the input lock for RecolorCursor, none
Take input lock for xf86TransprentCursor, none
Take input lock for CheckHWCursor, none
Take input lock for CheckHWCursor, none

Description Igor Mammedov 2017-01-11 10:25:21 UTC
Created attachment 128887 [details]
Xorg log

The crash happens at random; it can take anywhere from half an hour to 2 days to occur.
It seems to happen while the cursor is moving.

I've used xorg-x11-drv-intel from the latest git at commit 028c946df08, but the crash happens anyway.

Here is the crash backtrace:
Process 1715 (Xorg) of user 16585 dumped core.          
                Stack trace of thread 1728:
                #0  0x00007fdd4e5f0d54 sna_set_cursor_position (intel_drv.so)
                #1  0x00000000004bbea2 xf86MoveCursor (Xorg)
                #2  0x0000000000585eb3 miPointerMoveNoEvent (Xorg)
                #3  0x0000000000586cb4 miPointerSetPosition (Xorg)
                #4  0x000000000044d64e positionSprite.part.7 (Xorg)
                #5  0x000000000044de53 fill_pointer_events (Xorg)
                #6  0x000000000044f6df GetPointerEvents (Xorg)
                #7  0x000000000044fc90 QueuePointerEvents (Xorg)
                #8  0x00007fdd4c101cb5 xf86libinput_handle_motion (libinput_drv.so)
                #9  0x00007fdd4c102880 xf86libinput_read_input (libinput_drv.so)
                #10 0x000000000059cb1c InputReady (Xorg)
                #11 0x000000000059f181 ospoll_wait (Xorg)
                #12 0x000000000059c976 InputThreadDoWork (Xorg)
                #13 0x00007fdd530ac6ca start_thread (libpthread.so.0)
                #14 0x00007fdd52de6f7f __clone (libc.so.6)
                
                Stack trace of thread 1715:
                #0  0x00007fdd530b538d __lll_lock_wait (libpthread.so.0)
                #1  0x00007fdd530aeeca pthread_mutex_lock (libpthread.so.0)
                #2  0x000000000059c860 input_lock (Xorg)
                #3  0x00000000004bc386 xf86SetCursor (Xorg)
                #4  0x00000000004babf5 xf86CursorSetCursor (Xorg)
                #5  0x000000000058654b miPointerUpdateSprite (Xorg)
                #6  0x000000000058679a miPointerDisplayCursor (Xorg)
                #7  0x00000000004c9511 CursorDisplayCursor (Xorg)
                #8  0x0000000000518700 AnimCurDisplayCursor (Xorg)
                #9  0x000000000043fe48 ChangeToCursor (Xorg)
                #10 0x0000000000441287 WindowHasNewCursor (Xorg)
                #11 0x000000000046a948 ChangeWindowDeviceCursor (Xorg)
                #12 0x0000000000531dc6 ProcXIChangeCursor (Xorg)
                #13 0x0000000000437055 Dispatch (Xorg)
                #14 0x000000000043afd8 dix_main (Xorg)
                #15 0x00007fdd52cff401 __libc_start_main (libc.so.6)
                #16 0x0000000000424cfa _start (Xorg)
                
                Stack trace of thread 1722:
                #0  0x00007fdd530b2460 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007fdd4e634539 __run__ (intel_drv.so)
                #2  0x00007fdd530ac6ca start_thread (libpthread.so.0)
                #3  0x00007fdd52de6f7f __clone (libc.so.6)

and gdb output:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  sna_set_cursor_position (scrn=<optimized out>, x=734, y=196) at sna_display.c:6332
6332				int xhot = sna->cursor.ref->bits->xhot;
[Current thread is 1 (Thread 0x7fdd49af3700 (LWP 1728))]
(gdb) bt
#0  0x00007fdd4e5f0d54 in sna_set_cursor_position (scrn=<optimized out>, x=734, y=196) at sna_display.c:6332
#1  0x00000000004bbea2 in xf86MoveCursor ()
#2  0x0000000000585eb3 in miPointerMoveNoEvent ()
#3  0x0000000000586cb4 in miPointerSetPosition ()
#4  0x000000000044d64e in positionSprite.part.7 ()
#5  0x000000000044de53 in fill_pointer_events ()
#6  0x000000000044f6df in GetPointerEvents ()
#7  0x000000000044fc90 in QueuePointerEvents ()
#8  0x00007fdd4c101cb5 in xf86libinput_handle_motion (pInfo=<optimized out>, pInfo=<optimized out>, event=
    0x7fdd44008b40) at xf86libinput.c:1254
#9  0x00007fdd4c101cb5 in xf86libinput_handle_event (event=event@entry=0x7fdd44008b40) at xf86libinput.c:1910
#10 0x00007fdd4c102880 in xf86libinput_read_input (pInfo=<optimized out>) at xf86libinput.c:1995
#11 0x000000000059cb1c in InputReady ()
#12 0x000000000059f181 in ospoll_wait ()
#13 0x000000000059c976 in InputThreadDoWork ()
#14 0x00007fdd530ac6ca in start_thread () at /lib64/libpthread.so.0
#15 0x00007fdd52de6f7f in clone () at /lib64/libc.so.6

(gdb) p sna->cursor
$1 = {cursors = 0x1cc6b80, info = 0x1712d60, ref = 0x1d9c310, serial = 5871, fg = 4294967295, bg = 4278190080, 
  size = 64, disable = false, active = true, last_x = 734, last_y = 196, max_size = 256, use_gtt = true, 
  num_stash = 0, stash = 0x1bd3310, scratch = 0x7fdd55411010}
(gdb) p sna->cursor.ref
$2 = (CursorPtr) 0x1d9c310
(gdb) p sna->cursor.ref->bits
$3 = (CursorBitsPtr) 0x1d9c348
(gdb) p sna->cursor.ref->bits->xhot
$4 = 4
(gdb) info locals
xhot = <optimized out>
yhot = <optimized out>
v = {v = {3.6462044663083995e-321, 2.6894028653599915e-317, 1.0000000000000444}}
hot = {v = {6.9459898994898221e-310, 2147483647, 6.9459898995133397e-310}}
crtc = 0x170a7b0
sna_crtc = 0x170a5b0
cursor = 0x1cc6bc0
arg = {flags = 0, crtc_id = 45, x = -2266, y = -601, width = 29351552, height = 0, handle = 0}
xf86_config = 0x1707af0
sna = 0x7fdd55453000
sigio = 0
c = 2


The same issue is tracked in Fedora BZ https://bugzilla.redhat.com/show_bug.cgi?id=1384486.

According to the above BZ, the issue is mainly seen with docked Lenovo ThinkPads in multi-display setups, but there is a report [comment 50] where it is seen on a desktop.

xorg-x11-server-Xorg-1.19.0-3.fc25.x86_64
xorg-x11-drv-libinput-0.23.0-2.fc25.x86_64

The Xorg log is attached.
Comment 1 Brian J. Murrell 2017-01-13 16:58:28 UTC
So to clarify, I think "docked"/not-docked is a red herring.

What seems to be the common factor in the downstream bug report at https://bugzilla.redhat.com/show_bug.cgi?id=1384486 is having a rotated screen.

I'm not sure whether having multiple screens has been determined to be a common factor, but I know it's one I share with several others.
Comment 2 Chris Wilson 2017-01-16 22:23:13 UTC
SIGSEGV on what appears to be a valid pointer (at least gdb thinks it is). Most notable about the locals is the fg/bg which appear set, suggesting this is not an ARGB cursor but a bitmap. And there seems to be a missed lock in xfree86 for XRecolorCursor.
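
To illustrate the kind of race a missing lock allows, here is a minimal sketch in C. It is not the attached patch and does not use the real Xorg types: `demo_cursor`, `demo_bits`, `demo_input_mutex` and all `demo_*` functions are hypothetical stand-ins. One thread plays the input thread reading the hotspot, the other plays the main thread replacing the cursor. Both take the same mutex, so the reader can never dereference a cursor that the writer has already freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the server's cursor structures. */
struct demo_bits   { int xhot, yhot; };
struct demo_cursor { struct demo_bits *bits; };

static pthread_mutex_t demo_input_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct demo_cursor *demo_current;  /* shared by both threads */

enum { DEMO_ITERATIONS = 20000 };

/* Input-thread side: dereference cursor->bits only while holding the lock. */
static int demo_read_xhot(void)
{
    int xhot = 0;

    pthread_mutex_lock(&demo_input_mutex);
    if (demo_current != NULL) {
        assert(demo_current->bits != NULL);
        xhot = demo_current->bits->xhot;
    }
    pthread_mutex_unlock(&demo_input_mutex);
    return xhot;
}

/* Main-thread side: publish a new cursor under the same lock, then free
 * the old one. Returns 0 on success, -1 on allocation failure. */
static int demo_replace_cursor(int xhot)
{
    struct demo_cursor *fresh = malloc(sizeof(*fresh));
    struct demo_cursor *old;

    if (fresh == NULL)
        return -1;
    fresh->bits = malloc(sizeof(*fresh->bits));
    if (fresh->bits == NULL) {
        free(fresh);
        return -1;
    }
    fresh->bits->xhot = xhot;
    fresh->bits->yhot = 0;

    pthread_mutex_lock(&demo_input_mutex);
    old = demo_current;
    demo_current = fresh;
    pthread_mutex_unlock(&demo_input_mutex);

    /* Safe to free here: readers never keep the pointer past the unlock. */
    if (old != NULL) {
        free(old->bits);
        free(old);
    }
    return 0;
}

/* Reader loop: counts any hotspot value the writer never stored. */
static void *demo_reader(void *arg)
{
    int *bad = arg;

    for (int i = 0; i < DEMO_ITERATIONS; i++) {
        int xhot = demo_read_xhot();
        if (xhot != 0 && xhot != 4 && xhot != 7)
            (*bad)++;
    }
    return NULL;
}

/* Runs reader and writer concurrently; returns 0 if every read was valid. */
int demo_run(void)
{
    pthread_t tid;
    int bad = 0, failed = 0;

    if (pthread_create(&tid, NULL, demo_reader, &bad) != 0)
        return -1;
    for (int i = 0; i < DEMO_ITERATIONS && !failed; i++)
        failed = demo_replace_cursor((i % 2) ? 4 : 7);
    pthread_join(tid, NULL);

    if (demo_current != NULL) {
        free(demo_current->bits);
        free(demo_current);
        demo_current = NULL;
    }
    return (failed || bad) ? -1 : 0;
}
```

Calling `demo_run()` from a `main` (built with `-pthread`) exercises both paths. If the lock were dropped from `demo_replace_cursor()`, the reader could touch freed memory, which is the general shape of failure suspected here, although the actual server code paths differ.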
Comment 3 Chris Wilson 2017-01-16 22:23:39 UTC
Created attachment 128990 [details] [review]
Take the input lock for RecolorCursor
Comment 4 Chris Wilson 2017-01-16 22:38:25 UTC
Created attachment 128991 [details] [review]
Take input lock for xf86TransprentCursor

Another path missing a lock.
Comment 5 Igor Mammedov 2017-01-20 09:28:21 UTC
With the patches from comments 3 and 4 applied, it managed not to crash for ~2 days,
but it did crash in the end.

I've split the line where it crashes to find the offending pointer, so here it goes:

       Message: Process 1565 (Xorg) of user 16585 dumped core.
                
                Stack trace of thread 1577:
                #0  0x00007f28b2fce188 sna_set_cursor_position (intel_drv.so)
                #1  0x00000000004bc462 xf86MoveCursor (Xorg)
                #2  0x0000000000586063 miPointerMoveNoEvent (Xorg)
                #3  0x0000000000586e64 miPointerSetPosition (Xorg)
                #4  0x000000000044d6ae positionSprite (Xorg)
                #5  0x000000000044deb3 positionSprite (Xorg)
                #6  0x000000000044f75f GetPointerEvents (Xorg)
                #7  0x000000000044fd10 QueuePointerEvents (Xorg)
                #8  0x00007f28b0d10cb5 xf86libinput_handle_motion (libinput_drv.so)
                #9  0x00007f28b0d11880 xf86libinput_read_input (libinput_drv.so)
                #10 0x000000000059ccec InputReady (Xorg)
                #11 0x000000000059f351 ospoll_wait (Xorg)
                #12 0x000000000059cb46 InputThreadDoWork (Xorg)
                #13 0x00007f28b78706ca start_thread (libpthread.so.0)
                #14 0x00007f28b75aaf7f __clone (libc.so.6)
                
                Stack trace of thread 1565:
                #0  0x00007f28b787938d __lll_lock_wait (libpthread.so.0)
                #1  0x00007f28b7872eca pthread_mutex_lock (libpthread.so.0)
                #2  0x000000000059ca30 input_lock (Xorg)
                #3  0x00000000004bc246 xf86SetCursor (Xorg)
                #4  0x00000000004bacd5 xf86CursorSetCursor (Xorg)
                #5  0x00000000005866fb miPointerUpdateSprite (Xorg)
                #6  0x000000000058694a miPointerDisplayCursor (Xorg)
                #7  0x00000000004c9601 CursorDisplayCursor (Xorg)
                #8  0x0000000000518830 AnimCurDisplayCursor (Xorg)
                #9  0x000000000043fea8 ChangeToCursor (Xorg)
                #10 0x00000000004412e7 WindowHasNewCursor (Xorg)
                #11 0x000000000046a9c8 ChangeWindowDeviceCursor (Xorg)
                #12 0x0000000000531f76 ProcXIChangeCursor (Xorg)
                #13 0x00000000004370b5 Dispatch (Xorg)
                #14 0x000000000043b038 dix_main (Xorg)
                #15 0x00007f28b74c3401 __libc_start_main (libc.so.6)
                #16 0x0000000000424d1a _start (Xorg)
                
                Stack trace of thread 1566:
                #0  0x00007f28b7876460 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007f28b300b769 __run__ (intel_drv.so)
                #2  0x00007f28b78706ca start_thread (libpthread.so.0)
                #3  0x00007f28b75aaf7f __clone (libc.so.6)

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f28b2fce188 in sna_set_cursor_position (scrn=0x1a2b700, x=119, y=523) at sna_display.c:6333
6333	                        CursorBitsPtr bits = ref->bits;

(gdb) l
6331			if (crtc->transform_in_use) {
6332	                        CursorPtr ref = sna->cursor.ref;
6333	                        CursorBitsPtr bits = ref->bits;
6334				int xhot = bits->xhot;
6335				int yhot = sna->cursor.ref->bits->yhot;
6336				struct pict_f_vector v, hot;

(gdb) p sna->cursor.ref
$1 = (CursorPtr) 0x2478ef0
(gdb) p *sna->cursor.ref
$2 = {bits = 0x2478f28, foreRed = 0, foreGreen = 0, foreBlue = 0, backRed = 65535, backGreen = 65535, 
  backBlue = 65535, refcnt = 4, devPrivates = 0x2478f20, id = 20973559, serialNumber = 1368, name = 0}

(gdb) p sna->cursor
$3 = {cursors = 0x1eba540, info = 0x1a37c80, ref = 0x2478ef0, serial = 47981, fg = 4278190080, bg = 4294967295, 
  size = 64, disable = false, active = true, last_x = 119, last_y = 523, max_size = 256, use_gtt = true, 
  num_stash = 0, stash = 0x1e6f980, scratch = 0x7f28b99ac010}
Comment 6 Chris Wilson 2017-01-20 09:52:36 UTC
Created attachment 129063 [details] [review]
Take input lock for CheckHWCursor

Missed a rather important one where we update the hw cursor.
Comment 7 Chris Wilson 2017-01-20 09:55:44 UTC
Created attachment 129064 [details] [review]
Take input lock for CheckHWCursor

s/break/goto unlock/
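
The patch itself is not reproduced in this thread, so the following is only a sketch of the general pattern that a `break` to `goto unlock` change usually expresses: once a lock is held, every early exit has to pass through the unlock. All names (`demo_input_lock`, `demo_input_unlock`, `demo_check_hw_cursor`, `lock_depth`) are hypothetical and the nested-loop shape is an assumption, not the real xf86 code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static int lock_depth;  /* bookkeeping so callers can check the lock is balanced */

static void demo_input_lock(void)
{
    pthread_mutex_lock(&demo_mutex);
    lock_depth++;
}

static void demo_input_unlock(void)
{
    assert(lock_depth > 0);
    lock_depth--;
    pthread_mutex_unlock(&demo_mutex);
}

/* Returns true only if every (screen, crtc) entry in the flat `ok` array
 * allows a hardware cursor. */
bool demo_check_hw_cursor(const bool *ok, size_t screens, size_t crtcs)
{
    bool use_hw = true;

    demo_input_lock();
    for (size_t s = 0; s < screens; s++) {
        for (size_t c = 0; c < crtcs; c++) {
            if (!ok[s * crtcs + c]) {
                use_hw = false;
                /* A plain `break` would only leave the inner loop;
                 * the goto leaves both and still reaches the unlock. */
                goto unlock;
            }
        }
    }
unlock:
    demo_input_unlock();
    return use_hw;
}
```

With a single exit label, the lock is released on both the success path and the early-failure path, so `lock_depth` is back to zero after every call.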
Comment 8 Chris Wilson 2017-01-21 10:39:24 UTC
*** Bug 99431 has been marked as a duplicate of this bug. ***
Comment 9 Alex Fiestas 2017-01-22 00:46:20 UTC
I'm hitting the same issue. In my case the transformation is a scale rather than a rotation, but I guess what matters is having a transformation at all.

The issue is not reproducible with xorg-server 1.18.4.
Comment 10 Alex Fiestas 2017-01-22 00:56:11 UTC
The crash seems to be gone after all three patches are applied.

Before the patches, I could crash Xorg every time within 10 seconds of moving the mouse in the top-left corner of my screen.

After the patches, I have not been able to crash it.

Thanks Chris!
Comment 11 Igor Mammedov 2017-01-27 11:44:05 UTC
(In reply to Chris Wilson from comment #7)
> Created attachment 129064 [details] [review]
> Take input lock for CheckHWCursor
> 
> s/break/goto unlock/

It hasn't crashed for a week with this and the previous 2 patches.
Taking into account the other positive feedback, I'd consider the issue fixed.
Comment 12 ian.frost 2017-01-27 21:23:53 UTC
Hi,
Excuse the dumb question:
Where do I get these patches from?
Comment 13 Peter Hutterer 2017-02-01 03:01:35 UTC
fwiw, these patches were in a scratch build here and it looks like they fix the crashes: https://bugzilla.redhat.com/show_bug.cgi?id=1384486#c77
Comment 14 languitar 2017-02-07 10:14:12 UTC
I am experiencing this as well with a rotated screen. Would be nice if the patches could get applied soon.
Comment 15 Peter Hutterer 2017-02-07 23:53:02 UTC
commits 7198a6d4e74f684cb383b3e0f70dd2bae405e6e7, 
cfddd919cce4178baba07959e5e862d02e166522 and
3eb964e25243056dd998f52d3b00171b71c89189
Comment 16 Chris Wilson 2017-02-10 20:03:39 UTC
*** Bug 99548 has been marked as a duplicate of this bug. ***
Comment 17 Chris Wilson 2017-02-10 22:21:55 UTC
*** Bug 99084 has been marked as a duplicate of this bug. ***
Comment 18 Tsu Jan 2017-08-15 18:27:03 UTC
The crash still happens with xorg-server-1.19.3-2 and with the same backtrace (miPointerSetPosition), although maybe less frequently. I have Intel with modesetting under Manjaro.

The crash is random and I haven't found a way to reproduce it but it may be related to fast cursor movements.
Comment 19 Tsu Jan 2017-08-15 18:56:28 UTC
On second thought, this may be another bug:

Stack trace of thread 9284:
#0  0x0000000000583652 n/a (Xorg)
#1  0x0000000000584504 miPointerSetPosition (Xorg)
#2  0x000000000044cfde n/a (Xorg)
#3  0x000000000044d7e3 n/a (Xorg)
#4  0x000000000044f08f GetPointerEvents (Xorg)
#5  0x000000000044f640 QueuePointerEvents (Xorg)
#6  0x00007f979de361e5 n/a (libinput_drv.so)
#7  0x00007f979de36d60 n/a (libinput_drv.so)
#8  0x000000000059a4bc n/a (Xorg)
#9  0x000000000059cbb1 n/a (Xorg)
#10 0x000000000059a316 n/a (Xorg)
#11 0x00007f97a5957049 start_thread (libpthread.so.0)
#12 0x00007f97a5697f0f __clone (libc.so.6)

Stack trace of thread 9282:
#0  0x00007f97a596054c __lll_lock_wait (libpthread.so.0)
#1  0x00007f97a5959976 pthread_mutex_lock (libpthread.so.0)
#2  0x000000000059a200 input_lock (Xorg)
#3  0x0000000000595752 TimerSet (Xorg)
#4  0x000000000054b4c2 AccessXFilterReleaseEvent (Xorg)
#5  0x000000000057ac39 n/a (Xorg)
#6  0x00000000004d4675 n/a (Xorg)
#7  0x00000000004369e5 n/a (Xorg)
#8  0x000000000043a968 n/a (Xorg)
#9  0x00007f97a55ca4ca __libc_start_main (libc.so.6)
#10 0x000000000042464a _start (Xorg)
