|Summary:||Improve test timeout behaviour|
|Product:||dbus||Reporter:||Simon McVittie <smcv>|
|Component:||core||Assignee:||Simon McVittie <smcv>|
|Status:||RESOLVED FIXED||QA Contact:||D-Bus Maintainers <dbus>|
|i915 platform:||i915 features:|
test-utils-glib: Try to emit TAP diagnostics before aborting
tests: In slower tests, make the timeout per-test-case
Description Simon McVittie 2017-09-27 12:46:48 UTC
dbus intermittently or consistently fails build-time tests on a couple of the non-mainstream architectures in Debian. I suspect this might be because some tests spam messages or refcount operations to such an extent that on these architectures, the arbitrary 60 second timeout is hit. We can avoid this, while still making sure a stuck test can only make us wait 1 minute for a result, by applying the arbitrary timeout to individual test-cases within the problematic tests. We should also improve the diagnostics.
Comment 1 Simon McVittie 2017-09-27 12:47:31 UTC
Created attachment 134501 [details] [review] test-utils-glib: Try to emit TAP diagnostics before aborting We can't use normal I/O in a signal handler, so resort to write(). "Bail out!" is a special syntactic token in TAP. If I artifically force the tests to time out by reducing timeouts and increasing the number of operations, I get results like this: ERROR: test-sd-activation - Bail out! Test timed out (GLib main loop timeout callback reached) ERROR: test-refs - Bail out! Test timed out (SIGALRM received) which is a lot easier to understand than "Not enough tests run" or "nonzero exit status". The differing output is because test-sd-activation iterates the main loop, whereas test-refs just blocks (it is joining a series of worker threads, each of which is spamming refcount operations).
Comment 2 Simon McVittie 2017-09-27 12:48:10 UTC
Created attachment 134502 [details] [review] tests: In slower tests, make the timeout per-test-case Some test-cases in the dbus-daemon and relay tests spam the bus with thousands of messages, which can take 25 seconds on slower CPUs like MIPS. Similarly, the refs test spams millions of refcount operations, which it appears might take more than a minute on PA-RISC (HPPA). To get an idea of how close we are to having a problem on other architectures, log a message and start a timer when we reset the timeout in setup(), and log the elapsed time when we reach teardown().
Comment 3 Philip Withnall 2017-09-27 13:38:11 UTC
Comment on attachment 134501 [details] [review] test-utils-glib: Try to emit TAP diagnostics before aborting Review of attachment 134501 [details] [review]: ----------------------------------------------------------------- ++ with a nitpick ::: test/test-utils-glib.c @@ +474,5 @@ > + /* We might be halfway through writing out something else, so force this > + * onto its own line */ > + const char message  = "\nBail out! Test timed out (SIGALRM received)\n"; > + > + if (write (1, message, sizeof (message) - 1) < (ssize_t) sizeof (message) - 1) Nitpick: I’d slightly prefer STDOUT_FILENO instead of `1`, although you’d have to do the #ifndef STDOUT_FILENO\n#define STDOUT_FILENO 1\n#endif dance for Windows — so you might disagree.
Comment 4 Philip Withnall 2017-09-27 13:40:48 UTC
Comment on attachment 134502 [details] [review] tests: In slower tests, make the timeout per-test-case Review of attachment 134502 [details] [review]: ----------------------------------------------------------------- Looks legit.
Comment 5 Simon McVittie 2017-09-27 13:44:21 UTC
(In reply to Philip Withnall from comment #3) > Nitpick: I’d slightly prefer STDOUT_FILENO instead of `1`, although you’d > have to do the #ifndef STDOUT_FILENO\n#define STDOUT_FILENO 1\n#endif dance > for Windows — so you might disagree. Actually this is #ifdef G_OS_UNIX already (it's an async signal handler), so maybe we can rely on the POSIX-specified STDOUT_FILENO, at least until someone tells us that their version of HP/UX or Minix or something doesn't define it.
Comment 6 Simon McVittie 2017-09-27 13:48:21 UTC
Changed from 1 to STDOUT_FILENO, and pushed to master for 1.11.20. I think I'm going to leave 1.10 alone for now, since I want to release a 1.12 new-stable branch soon anyway.