Summary: | [IVB][BAT] gem_sync/basic-store-all FAIL on CI | ||||||
---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | Jani Saarinen <jani.saarinen> | ||||
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||
Status: | CLOSED WORKSFORME | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||
Severity: | normal | ||||||
Priority: | medium | CC: | intel-gfx-bugs | ||||
Version: | DRI git | ||||||
Hardware: | Other | ||||||
OS: | All | ||||||
Whiteboard: | |||||||
i915 platform: | IVB | i915 features: | GEM/Other | ||||
Description
Jani Saarinen
2017-02-14 19:54:23 UTC
The test is doing its job and hitting the error it is hunting for. Pretty much the only way to prevent it is by increasing the delay between receiving the interrupt and checking the seqno. At the moment that delay is defined by an uncached MMIO read, but maybe we should try setting SyncFlush and polling until it clears?

Do we have an equivalent machine in farm2? Could you set it running gem_sync (the full set) and see how reproducible the missed interrupt is?

Not exactly, but almost (4770s).

Created attachment 129620
Use sync flush polling for the irq seqno barrier

Back to something super heavyweight.

Wrong HW. There is the same IVB 3770 there.

There's an IVB-3770 on farm2, a Dell OptiPlex (vs. an HP Pro on farm1). It doesn't show any gem_sync failures in the last 100 runs.

Chris, can you try this on trybot? As it is, it's hard to know whether this helps.

The failure is rare enough that I don't expect trybot to give a clear indication of whether it is sufficient to prevent the missed interrupt. As it happens, SyncFlush does not work well with hanging batches, so I'm trying a different approach.

commit 8998567b51141f79309d1267640c919dfd23d3a4
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Feb 17 15:13:02 2017 +0000

    drm/i915: Defer declaration of missed-interrupt until the waiter is asleep

and earlier commits should indirectly help, and I expect it to reduce the frequency of false positives. Marking as closed until we see it again.

(In reply to Chris Wilson from comment #9)
> commit 8998567b51141f79309d1267640c919dfd23d3a4
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri Feb 17 15:13:02 2017 +0000
>
>     drm/i915: Defer declaration of missed-interrupt until the waiter is asleep
>
> and earlier commits should indirectly help, and I expect it to reduce the frequency of
> false positives. Marking as closed until we see it again.

OK, will archive the temporary blacklist.
However, for very intermittent failures like this, it would be nice if we could land a patch that improves the debuggability of the issue, so that the next time we see it, the CI system gives us meaningful information about the bug and helps us improve our code.
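The thread touches on two mechanisms: a barrier between taking the interrupt and reading the seqno (where SyncFlush polling was abandoned because a hanging batch never lets the flush complete), and the fix of only declaring a missed interrupt once the waiter is asleep. The userspace sketch below models both under stated assumptions; it is a hypothetical illustration, not the i915 code. `seqno_passed()` follows the same wrap-safe signed-difference idiom as i915's `i915_seqno_passed()`, while `struct sim_reg`, `SIM_FLUSH_BIT`, `sim_reg_read()`, `poll_until_clear()`, `struct waiter` and `should_declare_missed_irq()` are invented names standing in for real registers and driver state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe seqno comparison: true if seq1 has reached or passed seq2.
 * Same signed-difference idiom as i915_seqno_passed(); relies on the
 * usual two's-complement conversion of the 32-bit difference. */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/* Hypothetical bit; NOT the real SyncFlush register layout. */
#define SIM_FLUSH_BIT 0x1u

/* Hypothetical userspace stand-in for a flush register: the bit stays
 * set for busy_polls reads, or forever if the batch is hung. */
struct sim_reg {
	unsigned busy_polls; /* reads remaining before the bit clears */
	bool hung;           /* a hanging batch never clears the bit */
};

static uint32_t sim_reg_read(struct sim_reg *r)
{
	if (r->hung)
		return SIM_FLUSH_BIT;
	if (r->busy_polls > 0) {
		r->busy_polls--;
		return SIM_FLUSH_BIT;
	}
	return 0;
}

/* Bounded "poll until clear": without the max_polls bound, a hung
 * batch would keep the caller spinning forever. */
static bool poll_until_clear(struct sim_reg *r, uint32_t mask,
			     unsigned max_polls)
{
	assert(r);
	for (unsigned i = 0; i < max_polls; i++)
		if ((sim_reg_read(r) & mask) == 0)
			return true;
	return false; /* gave up: the caller must handle the timeout */
}

/* Hypothetical waiter model; field names are illustrative only. */
struct waiter {
	uint32_t target; /* seqno this waiter needs */
	bool asleep;     /* has the waiter actually gone to sleep? */
};

/* Only blame a missed interrupt once the waiter is asleep: a waiter
 * that is still awake may simply not have rechecked the seqno yet. */
static bool should_declare_missed_irq(const struct waiter *w,
				      uint32_t hw_seqno, bool irq_seen)
{
	if (!w->asleep || irq_seen)
		return false;
	/* Work completed, waiter asleep, but no interrupt observed. */
	return seqno_passed(hw_seqno, w->target);
}
```

In a real driver, `sim_reg_read()` would be an MMIO read and the bound would be time-based rather than a poll count; the count just keeps the sketch deterministic. Deferring the verdict until the waiter sleeps trades a slightly later detection for fewer false positives, which matches the intent stated in the commit title.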