Bug 103009 - Improve test timeout behaviour
Summary: Improve test timeout behaviour
Status: RESOLVED FIXED
Alias: None
Product: dbus
Classification: Unclassified
Component: core (show other bugs)
Version: 1.10
Hardware: Other All
: medium normal
Assignee: Simon McVittie
QA Contact: D-Bus Maintainers
URL:
Whiteboard: review+
Keywords: patch
Depends on:
Blocks:
 
Reported: 2017-09-27 12:46 UTC by Simon McVittie
Modified: 2017-09-27 13:48 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
test-utils-glib: Try to emit TAP diagnostics before aborting (2.22 KB, patch)
2017-09-27 12:47 UTC, Simon McVittie
Details | Splinter Review
tests: In slower tests, make the timeout per-test-case (4.08 KB, patch)
2017-09-27 12:48 UTC, Simon McVittie
Details | Splinter Review

Description Simon McVittie 2017-09-27 12:46:48 UTC
dbus intermittently or consistently fails build-time tests on a couple of the non-mainstream architectures in Debian. I suspect this might be because some tests spam messages or refcount operations to such an extent that on these architectures, the arbitrary 60 second timeout is hit.

We can avoid this, while still making sure a stuck test can only make us wait 1 minute for a result, by applying the arbitrary timeout to individual test-cases within the problematic tests.

We should also improve the diagnostics.
Comment 1 Simon McVittie 2017-09-27 12:47:31 UTC
Created attachment 134501 [details] [review]
test-utils-glib: Try to emit TAP diagnostics before  aborting

We can't use normal I/O in a signal handler, so resort to write().

"Bail out!" is a special syntactic token in TAP. If I artifically force
the tests to time out by reducing timeouts and increasing the number of
operations, I get results like this:

ERROR: test-sd-activation - Bail out! Test timed out (GLib main loop timeout callback reached)
ERROR: test-refs - Bail out! Test timed out (SIGALRM received)

which is a lot easier to understand than "Not enough tests run" or
"nonzero exit status". The differing output is because test-sd-activation
iterates the main loop, whereas test-refs just blocks (it is joining a
series of worker threads, each of which is spamming refcount operations).
Comment 2 Simon McVittie 2017-09-27 12:48:10 UTC
Created attachment 134502 [details] [review]
tests: In slower tests, make the timeout per-test-case

Some test-cases in the dbus-daemon and relay tests spam the bus with
thousands of messages, which can take 25 seconds on slower CPUs like
MIPS. Similarly, the refs test spams millions of refcount operations,
which it appears might take more than a minute on PA-RISC (HPPA).
                                                                       
To get an idea of how close we are to having a problem on other
architectures, log a message and start a timer when we reset the
timeout in setup(), and log the elapsed time when we reach teardown().
Comment 3 Philip Withnall 2017-09-27 13:38:11 UTC
Comment on attachment 134501 [details] [review]
test-utils-glib: Try to emit TAP diagnostics before  aborting

Review of attachment 134501 [details] [review]:
-----------------------------------------------------------------

++ with a nitpick

::: test/test-utils-glib.c
@@ +474,5 @@
> +  /* We might be halfway through writing out something else, so force this
> +   * onto its own line */
> +  const char message [] = "\nBail out! Test timed out (SIGALRM received)\n";
> +
> +  if (write (1, message, sizeof (message) - 1) < (ssize_t) sizeof (message) - 1)

Nitpick: I’d slightly prefer STDOUT_FILENO instead of `1`, although you’d have to do the #ifndef STDOUT_FILENO\n#define STDOUT_FILENO 1\n#endif dance for Windows — so you might disagree.
Comment 4 Philip Withnall 2017-09-27 13:40:48 UTC
Comment on attachment 134502 [details] [review]
tests: In slower tests, make the timeout per-test-case

Review of attachment 134502 [details] [review]:
-----------------------------------------------------------------

Looks legit.
Comment 5 Simon McVittie 2017-09-27 13:44:21 UTC
(In reply to Philip Withnall from comment #3)
> Nitpick: I’d slightly prefer STDOUT_FILENO instead of `1`, although you’d
> have to do the #ifndef STDOUT_FILENO\n#define STDOUT_FILENO 1\n#endif dance
> for Windows — so you might disagree.

Actually this is #ifdef G_OS_UNIX already (it's an async signal handler), so maybe we can rely on the POSIX-specified STDOUT_FILENO, at least until someone tells us that their version of HP/UX or Minix or something doesn't define it.
Comment 6 Simon McVittie 2017-09-27 13:48:21 UTC
Changed from 1 to STDOUT_FILENO, and pushed to master for 1.11.20.

I think I'm going to leave 1.10 alone for now, since I want to release a 1.12 new-stable branch soon anyway.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.