Summary: | Unit tests are racy/buggy | ||
---|---|---|---|
Product: | Telepathy | Reporter: | Olli Salli <ollisal> |
Component: | tp-qt | Assignee: | Olli Salli <ollisal> |
Status: | RESOLVED FIXED | QA Contact: | Telepathy bugs list <telepathy-bugs> |
Severity: | major | ||
Priority: | high | CC: | andrunko |
Version: | git master | ||
Hardware: | All | ||
OS: | All | ||
URL: | http://git.collabora.co.uk/?p=user/oggis/telepathy-qt4.git;a=shortlog;h=refs/heads/fix-racy-tests | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Bug Depends on: | |||
Bug Blocks: | 25126 |
Description
Olli Salli
2010-08-20 06:40:35 UTC
The branch fixes races identified thus far, and also adds a script to repeat tests for a given number of times, saving the logs for the failed runs. This is useful for detecting intermittent test failures which stem from race conditions. Once this is merged, it probably should added to the release procedure to do tools/repeat-tests.sh check 100 and tools/repeat-tests.sh valgrind-check 10 (or more, but it takes a lot of time) to identify potential racy tests. The valgrind variant is useful because the tests run a lot slower under valgrind, so some types of race conditions become much more common due to the altered timings. The fixes bring about some graciously overlapping rules-of-thumb for implementing tests: - NEVER assume that only one event happens when you call processEvents() unless further events can't happen without additional requests (otherwise, if the test executes slower than the service, two service events may happen, which then get processed with one processEvents() call and any checks depending on those events after that call are then unpredictable) - NEVER implement a while (state != desired) {processEvents()} style loop if the state can go to desired and back to something else without making further requests (otherwise Murphy says it will go to desired during a processEvents() call, and then again to something undesired during that same call, and never go back to desired afterwards - so we'll just busy-loop infinitely...) - NEVER assume that you can reach checking that an event has happened (such as you asking somebody for their presence) before another simulated event overwriting it (such as them accepting) happens - even more generally, NEVER assume that you can reach starting to wait for a simulated network event to happen before it already happened (so always check for it to not have already happened before starting to wait) Fix merged to master. Unit tests run reliably now. This was verified with repeat-tests.sh "make check" 1000 and repeat-tests.sh "make check-valgrind" 100, both of which completed without a single failure, which is impressive considering the branch lowers delay between simulated network events from 1000ms to just 1ms. This has the consequence of speeding up the tests immensely. If you, however, notice that running unit tests now occasionally fails / hangs for you, please reopen the bug with the log from the failed run. This sort of fixing can never be 100% - if the races are sufficiently rare to not have happened in the finite amount of test runs I've done (which is several tens of thousands though), I haven't been able to fix them. For example one of the races fixed only happened three times in 10000 test runs. Will be in 0.3.8. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.