Bug 89222 - intermittent test failure in test-monitor
Summary: intermittent test failure in test-monitor
Status: RESOLVED FIXED
Alias: None
Product: dbus
Classification: Unclassified
Component: core (show other bugs)
Version: git master
Hardware: Other All
: medium normal
Assignee: D-Bus Maintainers
QA Contact: D-Bus Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-19 10:39 UTC by Simon McVittie
Modified: 2015-02-24 17:00 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
test_init: convert SIGALRM into SIGABRT so we leave a core dump on timeout (1.08 KB, patch)
2015-02-24 11:17 UTC, Simon McVittie
Details | Splinter Review
monitor test: don't block in main context if we already have messages (1.84 KB, patch)
2015-02-24 15:27 UTC, Simon McVittie
Details | Splinter Review

Description Simon McVittie 2015-02-19 10:39:01 UTC
When doing a parallel "make check" I get occasional failures in test-monitor (< 10%). I suspect there's a race condition.
Comment 1 Simon McVittie 2015-02-20 15:51:32 UTC
The log is not very informative, but indicates that the failure was in the "become monitor" test:

/monitor/become: Connection :1.0 (uid=1000 pid=12890 comm="/srv/tmp/smcv/build/dbus/clang/test/.libs/lt-test-") became a monitor.
Monitoring connection :1.0 closed.

Unfortunately, SIGALRM doesn't provoke a core dump.
Comment 2 Simon McVittie 2015-02-24 11:17:23 UTC
Created attachment 113788 [details] [review]
test_init: convert SIGALRM into SIGABRT so we leave a core  dump on timeout

---

This does not do anything to fix the bug, but I hope it will help to diagnose it.
Comment 3 Philip Withnall 2015-02-24 14:19:34 UTC
Comment on attachment 113788 [details] [review]
test_init: convert SIGALRM into SIGABRT so we leave a core  dump on timeout

Review of attachment 113788 [details] [review]:
-----------------------------------------------------------------

++
Comment 4 Simon McVittie 2015-02-24 14:46:54 UTC
Comment on attachment 113788 [details] [review]
test_init: convert SIGALRM into SIGABRT so we leave a core  dump on timeout

Applied, thanks. Hopefully I'll see the race condition again soon...
Comment 5 Simon McVittie 2015-02-24 15:15:22 UTC
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f75076c7107 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007f75076c7107 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f75076c84e8 in __GI_abort () at abort.c:89
#2  0x00000000004108c9 in wrap_abort (signal=<optimized out>)
    at /home/smcv/src/dbus/test/test-utils-glib.c:400
#3  <signal handler called>
#4  0x00007f7507778603 in __epoll_wait_nocancel ()
    at ../sysdeps/unix/syscall-template.S:81
#5  0x0000000000411ffa in socket_set_epoll_poll (set=0x103c750, 
    revents=0x7fff7240a470, max_events=64, timeout_ms=-1)
    at /home/smcv/src/dbus/dbus/dbus-socket-set-epoll.c:300
#6  0x00000000004119b2 in _dbus_socket_set_poll (timeout_ms=<optimized out>, 
    max_events=64, revents=0x7fff7240a470, self=<optimized out>)
    at /home/smcv/src/dbus/dbus/dbus-socket-set.h:110
#7  _dbus_loop_iterate (loop=0x103c700, block=1)
    at /home/smcv/src/dbus/dbus/dbus-mainloop.c:660
#8  0x000000000040b221 in test_become_monitor (f=0x4, context=0x7fff7240a0c0)
    at /home/smcv/src/dbus/test/monitor.c:599
#9  0x00007f7507aaa433 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007f7507aaa602 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00007f7507aaa96b in g_test_run_suite ()
   from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#12 0x00007f7507aaa9a1 in g_test_run ()
---Type <return> to continue, or q <return> to quit---
   from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#13 0x0000000000403dc8 in main (argc=1, argv=0x7fff7240aa68)
    at /home/smcv/src/dbus/test/monitor.c:1495
Comment 6 Simon McVittie 2015-02-24 15:25:32 UTC
(gdb) p *(Fixture *) fixture
$2 = {config = 0x6188b0 <side_effects_config>, ctx = 0x103c700, e = {
    name = 0x0, message = 0x0, dummy1 = 1, dummy2 = 0, dummy3 = 0, dummy4 = 0, 
    dummy5 = 0, padding1 = 0x0}, ge = 0x0, 
  address = 0x103ca00 "unix:abstract=/tmp/dbus-WOYDXeSKk0,guid=b54458064351b942d55afa5454ec92b8", daemon_pid = 21855, monitor = 0x103d770, sender = 0x103fcc0, 
  recipient = 0x1041bf0, monitored = {head = 0x103b440, tail = 0x103b4a0, 
    length = 4}, monitor_name = 0x103cfd0 ":1.0", 
  sender_name = 0x103d410 ":1.1", recipient_name = 0x103f960 ":1.2", 
  systemd = 0x0, systemd_name = 0x0, systemd_message = 0x0, activated = 0x0, 
  activated_name = 0x0, activated_message = 0x0}

The fact that f->monitored is non-empty is the key thing here I think.
Comment 7 Simon McVittie 2015-02-24 15:27:30 UTC
Created attachment 113796 [details] [review]
monitor test: don't block in main context if we already have  messages

Functions like become_monitor() sometimes iterate the main context,
which could leave us with unprocessed messages in f->monitored.
We need to drain that queue of unprocessed messages (setting flags
accordingly, which might meet the loop's exit condition or cause
a break) before we are willing to block in the main context again.
Comment 8 Philip Withnall 2015-02-24 16:58:28 UTC
Comment on attachment 113796 [details] [review]
monitor test: don't block in main context if we already have  messages

Review of attachment 113796 [details] [review]:
-----------------------------------------------------------------

++
Comment 9 Simon McVittie 2015-02-24 17:00:56 UTC
Fixed in git for 1.9.14 (I hope - I got a failure within 10 -j4 runs without the patch, and 20 consecutive successful runs with it)


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.