Created attachment 129794 [details] Reproducer helper Description: Xorg 1.19.x crashes in intel driver with an abort in some random cases. Steps to reproduce: 1. Build and run the attached test program 2. Hover the stylus of a wacom tablet to cause the map/unmaps of the window (the test app hides and show a window after n X events) Actual result: After some time, Xorg crashes with a segfault: (EE) Segmentation fault at address 0x80 #0 0x00000000071b491f in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58 #1 0x00000000071b651a in __GI_abort () at abort.c:89 #2 0x00000000005a15ce in OsAbort () at utils.c:1355 #3 0x00000000005a7203 in AbortServer () at log.c:877 #4 0x00000000005a7fed in FatalError (f=f@entry=0x5d56f0 "Caught signal %d (%s). Server aborting\n") at log.c:1015 #5 0x000000000059e86e in OsSigHandler (signo=11, sip=<optimized out>, unused=<optimized out>) at osinit.c:154 #6 0x0000000006f725c0 in <signal handler called> () at /lib64/libpthread.so.0 Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0x80: #7 0x000000000ab8bade in __kgem_busy (handle=#8 0x000000000ab8bade in kgem_retire__requests_ring (ring=<optimized out>, kgem=<optimized out>) at kgem.c:3226 #9 0x000000000ab8bade in kgem_retire__requests (kgem=0x9841000) at kgem.c:3260 #10 0x000000000ab8bade in kgem_retire (kgem=0x9841000) at kgem.c:3276 #11 0x000000000abc2b45 in sna_accel_block (sna=0x9841000, tv=tv@entry=0xfff0007a8) at sna_accel.c:18312 #12 0x000000000abe10ad in sna_block_handler (data=<optimized out>, _timeout=0xfff000814) at sna_driver.c:761 #13 0x000000000043b9de in BlockHandler (pTimeout=pTimeout@entry=0xfff000814) at dixutils.c:388 #14 0x0000000000598361 in WaitForSomething (are_ready=<optimized out>) at WaitFor.c:219 #15 0x0000000000436dca in Dispatch () at dispatch.c:422 #16 0x000000000043b018 in dix_main (argc=15, argv=0xfff0009f8, envp=<optimized out>) at main.c:287 #17 0x000000000719f401 in __libc_start_main (main= 0x424cc0 <main>, argc=15, argv=0xfff0009f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0xfff0009e8) at ../csu/libc-start.c:289 #18 0x0000000000424cfa in _start ()
Created attachment 129795 [details] valgrind log
Created attachment 129796 [details] (gdb) t a a bt f
I wonder if this could be another (missing?) input_lock() issue similar to bug 99358 but here I don't see much pointing toward the cursor or input code neither in the valgrind logs nor the backtrace.
Hmm, the valgrind log should be less noisy if the ddx is compiled with --enable-debug. It should mention successfully detecting vg in the compile and when using the new intel_drv.so under valgrind. As I understand the test, it creates a new client, changes the pointer, waits for the cursor to move, then exits and repeats. To clarify, the XDefinteCursor() step is critical for reproducing? The forking should be irrelevant, the client id should be recycled so that too should not be an issue. Suggests that perhaps the test is just to rapidly change the cursor whilst processing pointer events. Or it is nothing to do with the pointer, and is the window map/unmapping. Neither are particularly rare paths :| Didn't spot anything in the vg log to suggest otherwise, my first suspicion is the noise is masking a real out-of-bounds write.
It's not a reproducer I forged specifically for this issue, just an "old" reproducer I had for another bug of mine, but the issue is happens in real life and occurs even outside of valgrind (as found in the downstream bug I linked). I'll rebuild the intel driver with debug enabled, see what it gives.
Created attachment 129797 [details] valgrind log Right, I suspect the intel driver is not the culprit but probably a victim of another memory corruption that occurs earlier. I just had a crash with a complete different backtrace that has nothing to do with the intel driver: #0 0x00000000071b491f in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58 #1 0x00000000071b651a in __GI_abort () at abort.c:89 #2 0x00000000005a15ce in OsAbort () at utils.c:1355 #3 0x00000000005a7203 in AbortServer () at log.c:877 #4 0x00000000005a7fed in FatalError (f=f@entry=0x5d56f0 "Caught signal %d (%s). Server aborting\n") at log.c:1015 #5 0x000000000059e86e in OsSigHandler (signo=11, sip=<optimized out>, unused=<optimized out>) at osinit.c:154 #6 0x0000000006f725c0 in <signal handler called> () at /lib64/libpthread.so.0 #7 0x000000000059e1ec in WriteToClient (who=who@entry=0x972d060, count=count@entry=32, __buf=__buf@entry=0xfff000810) at io.c:705 #8 0x0000000000445749 in ProcGetInputFocus (client=0x972d060) at events.c:4861 #9 0x0000000000437095 in Dispatch () at dispatch.c:479 #10 0x000000000043b018 in dix_main (argc=15, argv=0xfff0009f8, envp=<optimized out>) at main.c:287 #11 0x000000000719f401 in __libc_start_main (main= 0x424cc0 <main>, argc=15, argv=0xfff0009f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0xfff0009e8) at ../csu/libc-start.c:289
Moving to xorg server
Looks like bug 99164.
Created attachment 129820 [details] anothe rvalgrind logs Interesting point here is the crash occurs in the InputThread. The corresponding backtrace from core file is even more instructing: #0 0x0000000006f8b91f in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58 #1 0x0000000006f8d51a in __GI_abort () at abort.c:89 #2 0x00000000005a156e in OsAbort () at utils.c:1355 #3 0x00000000005a7163 in AbortServer () at log.c:877 #4 0x00000000005a7f4d in FatalError (f=f@entry=0xaaf8ecf "%s:%d assertion '%s' failed\n") at log.c:1015 #5 0x000000000a974e36 in _kgem_submit (kgem=kgem@entry=0x9617000) at kgem.c:4134 #6 0x000000000a9b9f64 in kgem_submit (kgem=0x9617000) at kgem.h:382 #7 0x000000000a9b9f64 in sna_accel_flush (sna=sna@entry=0x850800 <FlushCallback>) at sna_accel.c:17375 #8 0x000000000a9b9fec in sna_flush_callback (list=<optimized out>, user_data=0x850800 <FlushCallback>, call_data=<optimized out>) at sna_accel.c:17399 #9 0x000000000043c3e4 in _CallCallbacks (pcbl=0x36dc6b30, pcbl@entry=0x850800 <FlushCallback>, call_data=0x20, call_data@entry=0xb0bc150) at dixutils.c:737 #10 0x000000000059dbd3 in CallCallbacks (call_data=0xb0bc150, pcbl=0x850800 <FlushCallback>) at ../include/callback.h:83 #11 0x000000000059dbd3 in FlushClient (who=who@entry=0xb0bc150, oc=oc@entry=0x9552800, __extraBuf=__extraBuf@entry=0x11027a60, extraCount=extraCount@entry=32) at io.c:809 #12 0x000000000059e13f in WriteToClient (who=who@entry=0xb0bc150, count=count@entry=32, __buf=__buf@entry=0x11027a60) at io.c:763 #13 0x0000000000442572 in WriteEventsToClient (pClient=pClient@entry=0xb0bc150, count=<optimized out>, count@entry=1, events=events@entry=0x11027a60) at events.c:6000 #14 0x0000000000442710 in TryClientEvents (client=0xb0bc150, dev=<optimized out>, pEvents=0x11027a60, count=1, mask=<optimized out>, filter=<optimized out>, grab=0x0) at events.c:2021 #15 0x0000000000445e7a in DeliverEventToInputClients (dev=dev@entry=0xfc85870, inputclients=0xd5726f0, win=win@entry=0xb2fed40, events=events@entry=0x11027a60, count=count@entry=1, filter=filter@entry=16, grab=0x0, client_return=0x11027998, mask_return=0x11027994) at events.c:2170 #16 0x0000000000446167 in DeliverEventToWindowMask (mask_return=0x11027994, client_return=0x11027998, grab=0x0, filter=16, count=1, events=0x11027a60, win=0xb2fed40, dev=0xfc85870) at events.c:2213 #17 0x0000000000446167 in DeliverEventsToWindow (pDev=pDev@entry=0xfc85870, pWin=pWin@entry=0xb2fed40, pEvents=pEvents@entry=0x11027a60, count=count@entry=1, filter=filter@entry=16, grab=grab@entry=0x0) at events.c:2277 #18 0x00000000005283b9 in SendEventToAllWindows (dev=dev@entry=0xfc85870, mask=16, ev=ev@entry=0x11027a60, count=count@entry=1) at exevents.c:3045 #19 0x0000000000533d1f in send_property_event (dev=dev@entry=0xfc85870, property=<optimized out>, what=<optimized out>) at xiproperty.c:208 #20 0x00000000005346d8 in XIChangeDeviceProperty (dev=0xfc85870, property=<optimized out>, type=<optimized out>, format=<optimized out>, mode=<optimized out>, len=<optimized out>, value=0x11027b50, sendevent=1) at xiproperty.c:802 #21 0x000000000fe1d039 in wcmUpdateSerial () at /usr/lib64/xorg/modules/input/wacom_drv.so #22 0x000000000fe131c2 in wcmSendEvents () at /usr/lib64/xorg/modules/input/wacom_drv.so #23 0x000000000fe14764 in wcmEvent () at /usr/lib64/xorg/modules/input/wacom_drv.so #24 0x000000000fe1a476 in usbParse () at /usr/lib64/xorg/modules/input/wacom_drv.so #25 0x000000000fe11feb in wcmReadPacket () at /usr/lib64/xorg/modules/input/wacom_drv.so #26 0x000000000fe12226 in wcmDevReadInput () at /usr/lib64/xorg/modules/input/wacom_drv.so #27 0x000000000059cc8c in InputReady (fd=32, xevents=1, data=0xfc94e90) at inputthread.c:173 #28 0x000000000059f271 in ospoll_wait (ospoll=0xd1172f0, timeout=timeout@entry=-1) at ospoll.c:412 #29 0x000000000059cae6 in InputThreadDoWork (arg=<optimized out>) at inputthread.c:360 #30 0x0000000006d3f6ca in start_thread (arg=0x11029700) at pthread_create.c:333 #31 0x000000000705df7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
ok, that's a bug in the wacom driver. The driver should not write events from within the input thread. I'll get that fixed
fix for the wacom driver: https://sourceforge.net/p/linuxwacom/mailman/message/35683033/ patch to help identify where we post stuff from within the input thread: https://patchwork.freedesktop.org/patch/140413/
commit f65d97362813ea89d18e5049ad87bc468ec30f85 Author: Peter Hutterer <peter.hutterer@who-t.net> Date: Thu Feb 23 09:47:27 2017 +1000 Revert to Update properties with a timer func and in the server: commit 1b12249fd6d4d3b566043d556503f3f625b9b4ba Author: Peter Hutterer <peter.hutterer@who-t.net> Date: Thu Feb 23 10:21:49 2017 +1000 os: log a bug whenever WriteToClient is called from the input thread
*** Bug 99164 has been marked as a duplicate of this bug. ***
*** Bug 100033 has been marked as a duplicate of this bug. ***
Since this patch using a Wacom tablet will causes constant freezes. Everything freezes for about a second every 1-5 seconds while the stylus is moving over the tablet. The tablet is now completely unusable!
Addition: Removing xf86-input-wacom stops the freezing, but it also removes the ability to have custom tablet and stylus configuration, which can be essential. Is this something the patch submitted to linuxwacom is supposed to fix? As if not, then this ticket needs to be reopened.
(In reply to Storm Engineer from comment #15) > Since this patch using a Wacom tablet will causes constant freezes. > Everything freezes for about a second every 1-5 seconds while the stylus is > moving over the tablet. > > The tablet is now completely unusable! Ther eare two patches, one for the Xorg server to log the issue and avoid the crash, and one to actualyl fix the bug in xf86-input-wacom Are you using the version of xf86-input-wacom which actually has the fix?
(In reply to Olivier Fourdan from comment #17) > Are you using the version of xf86-input-wacom which actually has the fix? Note, you can easily check this by looking at the Xorg logs, if you see tons of messages like: (EE) BUG: triggered 'if (in_input_thread())' (EE) BUG: io.c:658 in WriteToClient() (EE) ******** WriteToClient called from input thread ********* Followed by a backtrace, then you are using an affected version of xf86-input-wacom which does not contain the actual fix for this bug.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.