Bug 23100 - XTest-related crash in master
Summary: XTest-related crash in master
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Input/other (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Peter Hutterer
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-08-02 21:01 UTC by Tom Jaeger
Modified: 2009-08-12 21:40 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
test case (407 bytes, text/x-csrc)
2009-08-02 21:01 UTC, Tom Jaeger
no flags Details
0001-Xi-get-device-changed-data-from-DeviceChangedEvents-.patch (3.65 KB, patch)
2009-08-02 23:50 UTC, Peter Hutterer
no flags Details | Splinter Review
new test case (858 bytes, text/x-csrc)
2009-08-07 13:41 UTC, Tom Jaeger
no flags Details
0001-Xext-allocate-a-separate-event-list-for-XTest-events.patch (2.90 KB, patch)
2009-08-09 17:10 UTC, Peter Hutterer
no flags Details | Splinter Review

Description Tom Jaeger 2009-08-02 21:01:07 UTC
Created attachment 28277 [details]
test case

I've finally found a way to reproduce the crash I've been seeing for a while: Run the attached program, which just moves the pointer in a circle using XTestFakeMotionEvent.  The X server crashes as soon as I use my wacom pen (the trackpoint seems to work fine).
Comment 1 Tom Jaeger 2009-08-02 21:06:05 UTC
(One possible) Backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7fa86d0 (LWP 25856)]
0x081725cb in ListKeyInfo (dev=0xcc26230, info=0xce595f8) at ../../Xi/xiquerydevice.c:293
293         XkbDescPtr xkb = dev->key->xkbInfo->desc;
(gdb) bt
#0  0x081725cb in ListKeyInfo (dev=0xcc26230, info=0xce595f8) at ../../Xi/xiquerydevice.c:293
#1  0x0816af0c in XISendDeviceChangedEvent (device=0xcc26230, master=0xcbfc540, dce=0xa1dd270) at ../../Xi/exevents.c:732
#2  0x0816b05f in ChangeMasterDeviceClasses (device=0xcc26230, dce=0xa1dd270) at ../../Xi/exevents.c:768
#3  0x0816b0ef in UpdateDeviceState (device=0xcc26230, event=0xa1dd270) at ../../Xi/exevents.c:800
#4  0x0816bbc5 in ProcessOtherEvent (ev=0xa1dd270, device=0xcc26230) at ../../Xi/exevents.c:1044
#5  0x0817bac0 in ProcessPointerEvent (ev=0xa1dd270, mouse=0xcc26230) at ../../xkb/xkbAccessX.c:733
#6  0x080a9445 in mieqProcessDeviceEvent (dev=0xcc26230, event=0xa1dd270, screen=0x0) at ../../mi/mieq.c:396
#7  0x08136ebe in ProcXTestFakeInput (client=0xd3f2e80) at ../../Xext/xtest.c:429
#8  0x08136fee in ProcXTestDispatch (client=0xd3f2e80) at ../../Xext/xtest.c:464
#9  0x0807a207 in Dispatch () at ../../dix/dispatch.c:426
#10 0x080670f0 in main (argc=10, argv=0xbfd50fb4, envp=0xbfd50fe0) at ../../dix/main.c:282

dev->key is NULL here.
Comment 2 Peter Hutterer 2009-08-02 22:42:48 UTC
verified, thanks for the test program.

Caused by a race condition where the device changes before the device changed event is processed internally.
Comment 3 Peter Hutterer 2009-08-02 23:50:14 UTC
Created attachment 28278 [details] [review]
0001-Xi-get-device-changed-data-from-DeviceChangedEvents-.patch

needs a bit more cleanup and there are two values we don't have in the internal DCE. But it fixes the crash for me and it's the right approach.

If you could test that, much appreciated.
Comment 4 Tom Jaeger 2009-08-03 07:26:38 UTC
(In reply to comment #3)
> Created an attachment (id=28278) [details]
> 0001-Xi-get-device-changed-data-from-DeviceChangedEvents-.patch
> 
> needs a bit more cleanup and there are two values we don't have in the internal
> DCE. But it fixes the crash for me and it's the right approach.
> 
> If you could test that, much appreciated.
> 

Thanks, this improves the situation somewhat.  The server won't crash right away now, but only after a few seconds of playing with the pen.  Backtrace below:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb7f9c6d0 (LWP 4733)]
0x00f5e422 in __kernel_vsyscall ()
(gdb) bt
#0  0x00f5e422 in __kernel_vsyscall ()
#1  0x002396d0 in raise () from /lib/tls/i686/cmov/libc.so.6
#2  0x0023b098 in abort () from /lib/tls/i686/cmov/libc.so.6
#3  0x0027724d in ?? () from /lib/tls/i686/cmov/libc.so.6
#4  0x0027d604 in ?? () from /lib/tls/i686/cmov/libc.so.6
#5  0x0027f5b6 in free () from /lib/tls/i686/cmov/libc.so.6
#6  0x080b2198 in Xfree (ptr=0xc4ff7b8) at ../../os/utils.c:1159
#7  0x0816bb03 in ProcessRawEvent (ev=0x9520270, device=0xbf69230)
    at ../../Xi/exevents.c:1030
#8  0x0816bbba in ProcessOtherEvent (ev=0x9520270, device=0xbf69230)
    at ../../Xi/exevents.c:1061
#9  0x0817bd44 in ProcessPointerEvent (ev=0x9520270, mouse=0xbf69230)
    at ../../xkb/xkbAccessX.c:733
#10 0x080a9445 in mieqProcessDeviceEvent (dev=0xbf69230, event=0x9520270,
    screen=0x0) at ../../mi/mieq.c:396
#11 0x08136ebe in ProcXTestFakeInput (client=0xc915498)
    at ../../Xext/xtest.c:429
#12 0x08136fee in ProcXTestDispatch (client=0xc915498)
    at ../../Xext/xtest.c:464
#13 0x0807a207 in Dispatch () at ../../dix/dispatch.c:426
#14 0x080670f0 in main (argc=10, argv=0xbfd9d434, envp=0xbfd9d460)
    at ../../dix/main.c:282
Comment 5 Peter Hutterer 2009-08-05 23:35:47 UTC
this is definitely a different crash but I haven't figured out what the reason for it is. valgrind complains in eventToRawEvent.

the best trigger I found so far is to twist the pen, but that doesn't always work either.
Comment 6 Peter Hutterer 2009-08-06 22:33:06 UTC
I've spent the last two days trying to figure out the cause and failed.
What I've found so far:
in eventToRawEvent, count_bits returns the number of bits set in the event. Based on that memory is allocated and later filled.

When the memory is filled, the BitIsOn condition later the number of bits set exceeds what was reported before and we get a memory overrun.
usually, if bits 1 and 2 are set BitIsOn also reports for bits 33 to 35. So some pointer is out and overwrites memory.

The problem though is that it is extremely hard to reproduce. I had a run yesterday where a twist of the pen would reproduce it more or less reliably, today I've had it run for ages and not see anything. This includes runs with a wacom virtual test device, no success.
If you manage to script a testcase that triggers this reliably, this would be much appreciated.

That aside, I'll push the patch since it addresses one issue.
Comment 7 Tom Jaeger 2009-08-07 13:41:45 UTC
Created attachment 28431 [details]
new test case
Comment 8 Tom Jaeger 2009-08-07 13:52:47 UTC
I can reproduce the issue reliably now:  Set a breakpoint at xtest.c:428, then use the new test program (which fakes device events instead of core events, you might need to change the name of the device), and continue execution after the breakpoint has been hit.

Here's a backtrace of where the memory gets overwritten:

#0  0x00410be7 in memset () from /lib/tls/i686/cmov/libc.so.6
#1  0x0810ad8c in init_event (dev=0xc95fc58, event=0x9de7978, ms=54232152) at ../../dix/getevents.c:132
#2  0x0810c960 in GetKeyboardValuatorEvents (events=0x9de7258, pDev=0xc95fc58, type=3, key_code=36, first_valuator=0, num_valuators=0, valuators=0x0)
    at ../../dix/getevents.c:919
#3  0x0810c756 in GetKeyboardEvents (events=0x9de7250, pDev=0xc95fc58, type=3, key_code=36) at ../../dix/getevents.c:846
#4  0x080d1db6 in xf86PostKeyEventP (device=0xc95fc58, key_code=36, is_down=0, is_absolute=0, first_valuator=0, num_valuators=0, valuators=0x0)
    at ../../../../hw/xfree86/common/xf86Xinput.c:950
#5  0x080d1e41 in xf86PostKeyboardEvent (device=0xc95fc58, key_code=36, is_down=0) at ../../../../hw/xfree86/common/xf86Xinput.c:964
#6  0x0058959d in ?? () from /usr/lib/xorg/modules/input/evdev_drv.so
#7  0x080c7f0b in xf86SigioReadInput (fd=12, closure=0xc95ea38) at ../../../../hw/xfree86/common/xf86Events.c:304
#8  0x081ad2c0 in xf86SIGIO (sig=29) at ../../../../../hw/xfree86/os-support/linux/../shared/sigio.c:118
#9  <signal handler called>
#10 SmartScheduleTimer (sig=14) at ../../os/utils.c:1237
#11 <signal handler called>
#12 ProcXTestFakeInput (client=0xd1403a8) at ../../Xext/xtest.c:428
#13 0x08130a0f in ProcXTestDispatch (client=0xd1403a8) at ../../Xext/xtest.c:464
#14 0x08079a4b in Dispatch () at ../../dix/dispatch.c:426
#15 0x080670be in main (argc=8, argv=0xbfaa8844, envp=0xbfaa8868) at ../../dix/main.c:282

So the issue here is that processing of an XTest-generated event is interrupted in a signal handler by an event coming from a real device.
Comment 9 Peter Hutterer 2009-08-09 17:10:10 UTC
Created attachment 28460 [details] [review]
0001-Xext-allocate-a-separate-event-list-for-XTest-events.patch

Thank you! found and fixed, problem is caused by XTest using the same event list as the SIGIO handling code, thus overwriting the event in some cases.

Please give this patch a try before I push it.
Comment 10 Tom Jaeger 2009-08-09 20:57:42 UTC
(In reply to comment #9)
> Created an attachment (id=28460) [details]
> 0001-Xext-allocate-a-separate-event-list-for-XTest-events.patch
> 
> Thank you! found and fixed, problem is caused by XTest using the same event
> list as the SIGIO handling code, thus overwriting the event in some cases.
> 
> Please give this patch a try before I push it.
> 

While this patch probably alleviates the symptoms, I feel like it doesn't address the underlying issue that client request handling and input event handling operate on the same data in a not "thread"-safe way, but the former might be interrupted by the latter at any time.  I'll see if I can come up with a scenario where this is a problem, and I still have another unexplained crash that I need to investigate which might have the same cause.
Comment 11 Peter Hutterer 2009-08-09 21:05:45 UTC
signal handling does the following:
- signal interrupts
- GetPointerEvents & friends
- stack events onto event queue
- finish signal handler, resume

Then, in the main loop it's basically
- process client requests
- process events

XTestFakeInput does
- GetPointerEvents & friends
- process events

so even though xtest may get interrupted, the events generated during this interrupt won't be processed until the XTest request has been finished (and the fake events fully processed).
this bug is caused because xtest shared some data with the signal handler, now that it's separated the SIGIO shouldn't interfer at all.
Comment 12 Tom Jaeger 2009-08-09 21:19:46 UTC
> so even though xtest may get interrupted, the events generated during this
> interrupt won't be processed until the XTest request has been finished (and the
> fake events fully processed).
> this bug is caused because xtest shared some data with the signal handler, now
> that it's separated the SIGIO shouldn't interfer at all.
> 

Thanks for the explanation, I'm testing this patch right now.  I'll let you know if there's any problems I encounter.
Comment 13 Peter Hutterer 2009-08-12 21:40:26 UTC
both patches pushed. Please let me know if there's any other issues with this.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.