Summary: | infinite loop when lid events are repeated - list item points to itself | ||
---|---|---|---|
Product: | Wayland | Reporter: | Alban Browaeys <prahal> |
Component: | libinput | Assignee: | Wayland bug list <wayland-bugs> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | major | ||
Priority: | medium | CC: | peter.hutterer, prahal |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
lid close-open evemu record
debounce lid events debug: trigger a core dump on list corruption debug: hang cpu - session without patch : action sequence as directory name - all events at once debug: hang or assert - patched the version of the patch I applied for session1 - changed to apply against 1.8 patch adapted to branch 1.8 - works fine |
Created attachment 134868 [details] [review] debounce lid events The patch remove the event before adding it anew. It applies against 1.8.3 which is the version I tested against (debian sid). This fixes the issue that the same event is inserted twice into the list. That we then ends up with a list item that points to itself. I also safeguards the removal of this event listener by this keyboard_is_monitored flag. The flag tells we already attached this keyboard listener so that even if the state is_closed is not switched yet we avoid duplicates. I've tried to reproduce this with a test case, but I can't quite figure out what the trigger is. It indicates libinput getting out of sync somewhere with the state of the device, but that's a bug that needs to be fixed where it happens. A simple boolean guard isn't going to be enough here. Can you try to figure out the exact sequence of events that needs to happen for this to occur please? I think I am seeing the same thing with Ubuntu 17.10, libinput 1.8.2-2. The patch applies cleanly, I will see if it helps. I was able to trigger the crash pretty easily with the old library still in use by opening and closing the lid a few times in quick succession and then trying to type an update to this bug. After restarting I was not immediately able to trigger it, but instead brought gnome-shell down with an assertion (g_malloc failed to allocate 1.8x10^19 bytes). Working down the stack a few frames it looked like it was trying to use an invalid structure, presumably in memory which had already been freed. Which at least raises the possibility that gnome-shell has some memory use bug which triggers on suspend and resume and that libinput was collateral damage. If it is of interest, I am continuing investigation on Launchpad bug 1724259[1]. [1] https://bugs.launchpad.net/ubuntu/+source/gnome-shell/+bug/1724259 Note that the more applications (windows) running, the easier it is to trigger this. Sorry taking me a while to sort out how to setup a test/dev environment in a VM. About the crash I think it is not directly related to this issue. The current issue might prevent an user to reach the crash in gnome-shell. In the crash I get gnome-shell hangs (100%cpu , libinput is in a busy processing LID event list). But no crash. With the fix I get a crash at random times but pretty much nailed down : shell_gtk_embed_window_created_cb is called with its embed argument freed and dereference it. Sadly I cannot seem to sort out how to reproduce (even though it kills my session every 2 days). I see that with coredumpctl and gdb. But I believe the current issue is higher priority. When libinput loop I cannot even switch to a VT to kill it. Only way out is to ssh to the box and mess with gdb. Created attachment 135092 [details] [review] debug: trigger a core dump on list corruption I send this hack to get a coredump / backtrace instead of busy loop to help other debug this issue. I will investigate anew as time passed and I somehow lost track of the in and outs of this fix. The core of the debug with this debug patch I get these journald entries and the adhoc coredumpctl backtrace. Note that 0x5647705b6c20 is added 0x564770541940 twice in a raw. sept. 27 02:32:55 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device init event listener - 0x564770541940 which has prev (nil) and next (nil) sept. 27 02:32:55 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device PRE add event listener - to 0x564770545110 sept. 27 02:32:55 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device PRE add event listener - 0x5647705a72e0 which has prev (nil) and next (nil) sept. 27 02:32:55 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device POST add event listener - 0x5647705a72e0 which has prev 0x564770545110 and next 0x564770545110 sept. 27 02:32:55 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device PRE add event listener - to 0x5647705b6c20 sept. 27 02:32:55 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device PRE add event listener - 0x5647705a71a8 which has prev (nil) and next (nil) sept. 27 02:32:55 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device POST add event listener - 0x5647705a71a8 which has prev 0x5647705b6c20 and next 0x5647705b6c20 sept. 27 02:32:55 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device PRE add event listener - to 0x56477059c930 sept. 27 02:32:55 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device PRE add event listener - 0x5647705a70f8 which has prev (nil) and next (nil) sept. 27 02:32:55 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device POST add event listener - 0x5647705a70f8 which has prev 0x56477059c930 and next 0x56477059c930 sept. 27 02:33:14 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device PRE add event listener - to 0x5647705b6c20 sept. 27 02:33:14 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device PRE add event listener - 0x564770541940 which has prev 0x564770541940 and next 0x564770541940 sept. 27 02:33:14 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device POST add event listener - 0x564770541940 which has prev 0x5647705b6c20 and next 0x5647705a71a8 sept. 27 02:33:18 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device PRE add event listener - to 0x5647705b6c20 sept. 27 02:33:18 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device PRE add event listener - 0x564770541940 which has prev 0x5647705b6c20 and next 0x5647705a71a8 sept. 27 02:33:18 cyclope org.gnome.Shell.desktop[14480]: LIBINPUT : device POST add event listener - 0x564770541940 which has prev 0x564770541940 and next 0x564770541940 PID: 14799 (gnome-shell) UID: 1000 (prahal) GID: 1000 (prahal) Signal: 6 (ABRT) Timestamp: Wed 2017-09-27 02:33:18 CEST (1min 1s ago) Command Line: /usr/bin/gnome-shell Executable: /usr/bin/gnome-shell Control Group: /user.slice/user-1000.slice/session-396.scope Unit: session-396.scope Slice: user-1000.slice Session: 396 Owner UID: 1000 (prahal) Boot ID: c29df59057834e2bb77ae569ae135cfb Machine ID: 2b483dcbcecb6729df407c5b5382b0d1 Hostname: cyclope Storage: /var/lib/systemd/coredump/core.gnome-shell.1000.c29df59057834e2bb77ae569ae135cfb.14799.1506472398000000.lz4 Message: Process 14799 (gnome-shell) of user 1000 dumped core. Stack trace of thread 14799: #0 0x00007fd6b1220fff __GI_raise (libc.so.6) #1 0x00007fd6b122242a __GI_abort (libc.so.6) #2 0x00007fd6a859cea4 create_dump (libinput.so.10) #3 0x00007fd6a85a626b lid_switch_process_switch (libinput.so.10) #4 0x00007fd6a85a10a9 evdev_process_event (libinput.so.10) #5 0x00007fd6a859cd2f libinput_dispatch (libinput.so.10) #6 0x00007fd6b22feeac dispatch_libinput (libmutter-clutter-1.0.so) #7 0x00007fd6b17f2f67 g_main_dispatch (libglib-2.0.so.0) #8 0x00007fd6b17f31a0 g_main_context_iterate (libglib-2.0.so.0) #9 0x00007fd6b17f34b2 g_main_loop_run (libglib-2.0.so.0) #10 0x00007fd6b2fa6a8c meta_run (libmutter.so.0) #11 0x000056476f0224b7 main (gnome-shell) #12 0x00007fd6b120e2e1 __libc_start_main (libc.so.6) #13 0x000056476f0225fa _start (gnome-shell) Thread 1 (Thread 0x7fd6b47a9a80 (LWP 14799)): #0 0x00007fd6b1220fff in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 #1 0x00007fd6b122242a in __GI_abort () at abort.c:89 #2 0x00007fd6a859cea4 in create_dump () at libinput.c:2013 #3 0x00007fd6a859cea4 in libinput_device_add_event_listener (device=<optimized out>, listener=0x564770541940, notify_func=<optimized out>, notify_func_data=<optimized out>) at libinput.c:2047 #4 0x00007fd6a85a626b in lid_switch_process_switch (e=<optimized out>, e=<optimized out>, time=105062899623, device=0x5647705450f0, dispatch=0x5647705418f0) at evdev-lid.c:135 #5 0x00007fd6a85a626b in lid_switch_process (evdev_dispatch=0x5647705418f0, device=0x5647705450f0, event=<optimized out>, time=105062899623) at evdev-lid.c:155 #6 0x00007fd6a85a10a9 in evdev_process_event (e=0x7ffd18541870, device=0x5647705450f0) at evdev.c:1904 #7 0x00007fd6a85a10a9 in evdev_device_dispatch_one (ev=0x7ffd18541870, device=0x5647705450f0) at evdev.c:1912 #8 0x00007fd6a85a10a9 in evdev_device_dispatch (data=0x5647705450f0) at evdev.c:1971 #9 0x00007fd6a859cd2f in libinput_dispatch (libinput=0x56477052bdf0) at libinput.c:1998 #10 0x00007fd6b22feeac in dispatch_libinput (manager_evdev=0x56477052a8c0 [ClutterDeviceManagerEvdev]) at evdev/clutter-device-manager-evdev.c:788 #11 0x00007fd6b22feeac in clutter_event_dispatch (g_source=g_source@entry=0x5647705d7de0, callback=<optimized out>, user_data=<optimized out>) at evdev/clutter-device-manager-evdev.c:810 #12 0x00007fd6b17f2f67 in g_main_dispatch (context=0x56477035bc00) at ../../../../glib/gmain.c:3148 #13 0x00007fd6b17f2f67 in g_main_context_dispatch (context=context@entry=0x56477035bc00) at ../../../../glib/gmain.c:3813 #14 0x00007fd6b17f31a0 in g_main_context_iterate (context=0x56477035bc00, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../../glib/gmain.c:3886 #15 0x00007fd6b17f34b2 in g_main_loop_run (loop=0x5647706054b0) at ../../../../glib/gmain.c:4082 #16 0x00007fd6b2fa6a8c in meta_run () at core/main.c:572 #17 0x000056476f0224b7 in main (argc=<optimized out>, argv=<optimized out>) at main.c:471 Ok, got it. Sequence is: close lid, press key, open lid, close lid. i.e. the open-lid event has to come in after the keyboard event. Patch is here: https://lists.freedesktop.org/archives/wayland-devel/2017-October/035562.html Created attachment 135130 [details]
debug: hang cpu - session without patch : action sequence as directory name - all events at once
03-lidcloseopenclose-suspend-lidopen_NOK included has the hanging backtrace.
The events are grabbed with this command in current working directory:
parallel -j $(ls -1 /dev/input/event* | wc -l ) sudo evemu-record {} {/}.emu ::: /dev/input/event*
This is of interest as session1 with the patch was not successful. I send the session 1 in a followup comment.
Created attachment 135131 [details]
debug: hang or assert - patched
only the third test has exact actions in directory name. The two others I do not know when suspend completed so is not encoded in their names.
02 is the assert in libinput list_remove
03 is the hang for the issue at stake
Created attachment 135132 [details] [review] the version of the patch I applied for session1 - changed to apply against 1.8 Comment on attachment 135132 [details] [review] the version of the patch I applied for session1 - changed to apply against 1.8 >Once the lid is closed, the keyboard event listener is set up to open the lid >for us on keyboard events. With the right sequence, we can trigger the >listener to be added to the list multiple times, triggering an assert in the >list test code (or an infinite loop in the 1.8 branch). > >Conditions: >* SW_LID value 1 - sets up the keyboard listener >* keyboard event - sets lid_is_closed to false >* SW_LID value 0 - is ignored because we're already open >* SW_LID value 1 - sets up the keyboard listener again > >https://bugs.freedesktop.org/show_bug.cgi?id=103298 > >Signed-off-by: Peter Hutterer <peter.hutterer at who-t.net> >--- > src/evdev-fallback.c | 13 ++++++------- > test/test-switch.c | 40 +++++++++++++++++++++------------------- > 2 files changed, 27 insertions(+), 26 deletions(-) > >--- a/test/test-lid.c >+++ b/test/test-lid.c >@@ -426,26 +426,27 @@ > > keyboard = litest_add_device(li, LITEST_KEYBOARD); > >- litest_lid_action(sw, LIBINPUT_SWITCH_STATE_ON); >- litest_drain_events(li); >- >- litest_event(keyboard, EV_KEY, KEY_A, 1); >- litest_event(keyboard, EV_SYN, SYN_REPORT, 0); >- litest_event(keyboard, EV_KEY, KEY_A, 0); >- litest_event(keyboard, EV_SYN, SYN_REPORT, 0); >- libinput_dispatch(li); >- >- event = libinput_get_event(li); >- litest_is_switch_event(event, >- LIBINPUT_SWITCH_LID, >- LIBINPUT_SWITCH_STATE_OFF); >- >- litest_assert_only_typed_events(li, LIBINPUT_EVENT_KEYBOARD_KEY); >- >- litest_lid_action(sw, LIBINPUT_SWITCH_STATE_OFF); >- litest_assert_empty_queue(li); >- >- libinput_event_destroy(event); >+ for (int i = 0; i < 3; i++) { >+ litest_lid_action(sw, LIBINPUT_SWITCH_STATE_ON); >+ litest_drain_events(li); >+ >+ litest_event(keyboard, EV_KEY, KEY_A, 1); >+ litest_event(keyboard, EV_SYN, SYN_REPORT, 0); >+ litest_event(keyboard, EV_KEY, KEY_A, 0); >+ litest_event(keyboard, EV_SYN, SYN_REPORT, 0); >+ libinput_dispatch(li); >+ >+ event = libinput_get_event(li); >+ litest_is_switch_event(event, >+ LIBINPUT_SWITCH_LID, >+ LIBINPUT_SWITCH_STATE_OFF); >+ libinput_event_destroy(event); >+ >+ litest_assert_only_typed_events(li, LIBINPUT_EVENT_KEYBOARD_KEY); >+ >+ litest_lid_action(sw, LIBINPUT_SWITCH_STATE_OFF); >+ litest_assert_empty_queue(li); >+ } > litest_delete_device(keyboard); > } > END_TEST >--- a/src/evdev-lid.c >+++ b/src/evdev-lid.c >@@ -104,17 +104,16 @@ > if (!dispatch->keyboard.keyboard) > return; > >+ libinput_device_remove_event_listener(&dispatch->keyboard.listener); >+ > if (is_closed) { > libinput_device_add_event_listener( > &dispatch->keyboard.keyboard->base, > &dispatch->keyboard.listener, > lid_switch_keyboard_event, > dispatch); >- } else { >- libinput_device_remove_event_listener( >- &dispatch->keyboard.listener); >- libinput_device_init_event_listener( >- &dispatch->keyboard.listener); >+ >+ libinput_device_init_event_listener(&dispatch->keyboard.listener); > } > } > >@@ -130,11 +129,12 @@ > case SW_LID: > is_closed = !!e->value; > >- if (dispatch->lid_is_closed == is_closed) >- return; > lid_switch_toggle_keyboard_listener(dispatch, > is_closed); > >+ if (dispatch->lid_is_closed == is_closed) >+ return; >+ > dispatch->lid_is_closed = is_closed; > > lid_switch_notify_toggle(dispatch, device, time); sorry I did not saw the else in hunk 2 of the patch against master when porting it to 1.8 branch. I retry (session1 data is thus moot, only session2 unpatched is of interest) This explains the assert. I still test against 1.8. Created attachment 135149 [details] [review] patch adapted to branch 1.8 - works fine I confirm with this "correct" version of the patch the 100% cpu hang is gone. Good job ! commit 41a70bbe924c4867283bfdec67cd76012744afcb Author: Peter Hutterer <peter.hutterer@who-t.net> Date: Fri Oct 27 15:03:52 2017 +1000 fallback: fix lid switch event listener being initialized twice I'll get a new 1.8 release out asap, thanks for testing! |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.
Created attachment 134867 [details] lid close-open evemu record steps: when closing and opening lid do so more than once before suspend and resume complete then gnome-shell consume 100% cpu and no in - libinput 1.8.3-1 from debian sid - wayland gnome-shell 3.26.1-2 x86_64 - /dev/input/event5: Lid Switch udevadm info /sys/class/input/event5 P: /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/input/input8/event5 N: input/event5 E: DEVNAME=/dev/input/event5 E: DEVPATH=/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/input/input8/event5 E: ID_INPUT=1 E: ID_INPUT_SWITCH=1 E: ID_PATH=acpi-PNP0C0D:00 E: ID_PATH_TAG=acpi-PNP0C0D_00 E: LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=reliable E: LIBINPUT_DEVICE_GROUP=19/0/5/0:PNP0C0D/button E: MAJOR=13 E: MINOR=69 E: SUBSYSTEM=input E: TAGS=:power-switch: E: USEC_INITIALIZED=39835474 gdb backtrace: (gdb) bt #0 0x00007fb99722f06e in lid_switch_keyboard_event (time=<optimized out>, event=<optimized out>, data=0x560b543f10f0) at evdev-lid.c:98 #1 0x00007fb997225bfd in post_device_event (device=device@entry=0x560b5e87a800, time=time@entry=698204515849, type=type@entry=LIBINPUT_EVENT_KEYBOARD_KEY, event=0x560b5cf64720) at libinput.c:2113 #2 0x00007fb997226f3f in keyboard_notify_key (device=device@entry=0x560b5e87a800, time=time@entry=698204515849, key=<optimized out>, state=<optimized out>) at libinput.c:2212 #3 0x00007fb997229987 in fallback_keyboard_notify_key (device=device@entry=0x560b5e87a800, time=time@entry=698204515849, key=<optimized out>, state=<optimized out>, dispatch=<optimized out>) at evdev.c:169 #4 0x00007fb99722c4de in fallback_process_key (e=0x7ffde9a12f70, e=0x7ffde9a12f70, time=698204515849, device=0x560b5e87a800, dispatch=<optimized out>) at evdev.c:857 #5 0x00007fb99722c4de in fallback_process (evdev_dispatch=<optimized out>, device=0x560b5e87a800, event=0x7ffde9a12f70, time=698204515849) at evdev.c:1168 #6 0x00007fb997229f19 in evdev_process_event (e=0x7ffde9a12f70, device=0x560b5e87a800) at evdev.c:1904 #7 0x00007fb997229f19 in evdev_device_dispatch_one (ev=0x7ffde9a12f70, device=0x560b5e87a800) at evdev.c:1912 #8 0x00007fb997229f19 in evdev_device_dispatch (data=0x560b5e87a800) at evdev.c:1971 #9 0x00007fb997225cef in libinput_dispatch (libinput=0x560b54202aa0) at libinput.c:1996 #10 0x00007fb9a15f2e9c in () at /usr/lib/x86_64-linux-gnu/mutter/libmutter-clutter-1.so #11 0x00007fb9a2976f67 in g_main_dispatch (context=0x560b541f70d0) at ../../../../glib/gmain.c:3148 #12 0x00007fb9a2976f67 in g_main_context_dispatch (context=context@entry=0x560b541f70d0) at ../../../../glib/gmain.c:3813 #13 0x00007fb9a29771a0 in g_main_context_iterate (context=0x560b541f70d0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../../glib/gmain.c:3886 #14 0x00007fb9a29774b2 in g_main_loop_run (loop=0x560b544af720) at ../../../../glib/gmain.c:4082 #15 0x00007fb9a0eac93c in meta_run () at /usr/lib/x86_64-linux-gnu/libmutter-1.so.0 #16 0x0000560b52aed287 in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.c:462 (gdb) up #1 0x00007fb997225bfd in post_device_event (device=device@entry=0x560b5e87a800, time=time@entry=698204515849, type=type@entry=LIBINPUT_EVENT_KEYBOARD_KEY, event=0x560b5cf64720) at libinput.c:2113 2113 listener->notify_func(time, event, listener->notify_func_data); (gdb) p &device->event_listeners $12 = (struct list *) 0x560b5e87a820 (gdb) p device->event_listeners $13 = {prev = 0x560b54459218, next = 0x560b543f1140} (gdb) p device->event_listeners->next $14 = (struct list *) 0x560b543f1140 (gdb) p *device->event_listeners->next $15 = {prev = 0x560b543f1140, next = 0x560b543f1140} (gdb) p device->event_listeners->prev $16 = (struct list *) 0x560b54459218 (gdb) p *device->event_listeners->prev $17 = {prev = 0x560b543f1140, next = 0x560b5e87a820} Workaround : (gdb) p device->event_listeners->next->next = 0x560b54459218 $11 = (struct list *) 0x560b54459218 (gdb) c