When doing a parallel "make check" I get occasional failures in test-monitor (< 10%). I suspect there's a race condition.
The log is not very informative, but indicates that the failure was in the "become monitor" test: /monitor/become: Connection :1.0 (uid=1000 pid=12890 comm="/srv/tmp/smcv/build/dbus/clang/test/.libs/lt-test-") became a monitor. Monitoring connection :1.0 closed. Unfortunately, SIGALRM doesn't provoke a core dump.
Created attachment 113788 [details] [review] test_init: convert SIGALRM into SIGABRT so we leave a core dump on timeout --- This does not do anything to fix the bug, but I hope it will help to diagnose it.
Comment on attachment 113788 [details] [review] test_init: convert SIGALRM into SIGABRT so we leave a core dump on timeout Review of attachment 113788 [details] [review]: ----------------------------------------------------------------- ++
Comment on attachment 113788 [details] [review] test_init: convert SIGALRM into SIGABRT so we leave a core dump on timeout Applied, thanks. Hopefully I'll see the race condition again soon...
Program terminated with signal SIGABRT, Aborted. #0 0x00007f75076c7107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 0x00007f75076c7107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x00007f75076c84e8 in __GI_abort () at abort.c:89 #2 0x00000000004108c9 in wrap_abort (signal=<optimized out>) at /home/smcv/src/dbus/test/test-utils-glib.c:400 #3 <signal handler called> #4 0x00007f7507778603 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81 #5 0x0000000000411ffa in socket_set_epoll_poll (set=0x103c750, revents=0x7fff7240a470, max_events=64, timeout_ms=-1) at /home/smcv/src/dbus/dbus/dbus-socket-set-epoll.c:300 #6 0x00000000004119b2 in _dbus_socket_set_poll (timeout_ms=<optimized out>, max_events=64, revents=0x7fff7240a470, self=<optimized out>) at /home/smcv/src/dbus/dbus/dbus-socket-set.h:110 #7 _dbus_loop_iterate (loop=0x103c700, block=1) at /home/smcv/src/dbus/dbus/dbus-mainloop.c:660 #8 0x000000000040b221 in test_become_monitor (f=0x4, context=0x7fff7240a0c0) at /home/smcv/src/dbus/test/monitor.c:599 #9 0x00007f7507aaa433 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #10 0x00007f7507aaa602 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #11 0x00007f7507aaa96b in g_test_run_suite () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #12 0x00007f7507aaa9a1 in g_test_run () ---Type <return> to continue, or q <return> to quit--- from /lib/x86_64-linux-gnu/libglib-2.0.so.0 #13 0x0000000000403dc8 in main (argc=1, argv=0x7fff7240aa68) at /home/smcv/src/dbus/test/monitor.c:1495
(gdb) p *(Fixture *) fixture $2 = {config = 0x6188b0 <side_effects_config>, ctx = 0x103c700, e = { name = 0x0, message = 0x0, dummy1 = 1, dummy2 = 0, dummy3 = 0, dummy4 = 0, dummy5 = 0, padding1 = 0x0}, ge = 0x0, address = 0x103ca00 "unix:abstract=/tmp/dbus-WOYDXeSKk0,guid=b54458064351b942d55afa5454ec92b8", daemon_pid = 21855, monitor = 0x103d770, sender = 0x103fcc0, recipient = 0x1041bf0, monitored = {head = 0x103b440, tail = 0x103b4a0, length = 4}, monitor_name = 0x103cfd0 ":1.0", sender_name = 0x103d410 ":1.1", recipient_name = 0x103f960 ":1.2", systemd = 0x0, systemd_name = 0x0, systemd_message = 0x0, activated = 0x0, activated_name = 0x0, activated_message = 0x0} The fact that f->monitored is non-empty is the key thing here I think.
Created attachment 113796 [details] [review] monitor test: don't block in main context if we already have messages Functions like become_monitor() sometimes iterate the main context, which could leave us with unprocessed messages in f->monitored. We need to drain that queue of unprocessed messages (setting flags accordingly, which might meet the loop's exit condition or cause a break) before we are willing to block in the main context again.
Comment on attachment 113796 [details] [review] monitor test: don't block in main context if we already have messages Review of attachment 113796 [details] [review]: ----------------------------------------------------------------- ++
Fixed in git for 1.9.14 (I hope - I got a failure within 10 -j4 runs without the patch, and 20 consecutive successful runs with it)
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.