Bug 107936 - [CI][DRMTIP] igt@gem_exec_parse@basic-allocation - timeout
Summary: [CI][DRMTIP] igt@gem_exec_parse@basic-allocation - timeout
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Duplicates: 105555 110255
Depends on:
Blocks:
 
Reported: 2018-09-14 12:50 UTC by Martin Peres
Modified: 2019-09-30 12:22 UTC (History)
CC List: 2 users

See Also:
i915 platform: BYT, HSW
i915 features: GEM/Other


Description Martin Peres 2018-09-14 12:50:34 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_110/fi-byt-clapper/igt@gem_exec_parse@basic-allocation.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_110/fi-hsw-peppy/igt@gem_exec_parse@basic-allocation.html

The test seems to run for too long without producing any output, so the new runner kills it.
Comment 1 Tvrtko Ursulin 2019-02-11 10:39:16 UTC
These links return 404 by now, but when I look at https://intel-gfx-ci.01.org/tree/drm-tip/igt@gem_exec_parse@basic-allocation.html I see the test consistently passing on HSW in around 45-70 seconds (I haven't checked the execution time of every run).
Comment 2 Francesco Balestrieri 2019-02-12 09:54:39 UTC
According to CI this is happening frequently. Latest log on HSW:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_214/fi-hsw-peppy/igt@gem_exec_parse@basic-allocation.html
Comment 3 Tvrtko Ursulin 2019-02-12 10:08:05 UTC
Why is the test history page (the link I put in comment 1) all green then? I go to CI -> drm-tip -> shards all -> find the test in the list and click on it to get its history. What am I doing wrong there?

Also, Chris seems to have noticed the mailing list activity on this bug and has sent a patch proposing a time cap for the test, so the issue might get resolved quickly.
Comment 4 Francesco Balestrieri 2019-02-12 18:51:00 UTC
Mmm, good question. The machine fi-hsw-peppy doesn't appear in that page, not sure why. Martin?
Comment 5 Lakshmi 2019-02-13 09:24:48 UTC
(In reply to Francesco Balestrieri from comment #4)
> Mmm, good question. The machine fi-hsw-peppy doesn't appear in that page,
> not sure why. Martin?

Only shard machines are shown here: https://intel-gfx-ci.01.org/tree/drm-tip/igt@gem_exec_parse@basic-allocation.html
Comment 6 Chris Wilson 2019-02-15 15:37:34 UTC
*** Bug 105555 has been marked as a duplicate of this bug. ***
Comment 7 Chris Wilson 2019-02-27 13:23:16 UTC
Hmm, I have

commit b7120a04360ddbd8166657187599e2a0a3b1f12e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Sep 15 17:03:22 2018 +0100

    drm/i915: Recover batch pool caches from shrinker
    
    Discard all of our batch pools under mempressure to make their pages
    available to the shrinker. We will quickly reacquire them when necessary
    for more GPU relocations or for the command parser.
    
    v2: Init the lists for mock_engine
    v3: Return a strong ref from i915_gem_batch_pool_get() and convert it
    into an active reference to protect ourselves against all allocations
    while the object is in play.
    v4: Couple shadow batch to active request early.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107936
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Matthew Auld <matthew.william.auld@gmail.com>

with an interesting tagline. Memory says that it was blowing up... But it will shrink the batch pool caches and so speed up basic-allocation.
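
For readers unfamiliar with the idea, here is a minimal sketch of a size-bucketed buffer cache that can be emptied on demand, which is roughly what the commit message above describes. It is an illustration only, not the i915 implementation: the class name BatchPool, the power-of-two bucketing, the 4096-byte minimum, and the shrink() method are all assumptions made for the example.

```python
from collections import defaultdict


class BatchPool:
    """Toy cache of reusable buffers (hypothetical, not i915 code)."""

    def __init__(self):
        # Free buffers, keyed by their rounded-up size in bytes.
        self._buckets = defaultdict(list)
        # Number of times we had to allocate a fresh buffer.
        self.allocations = 0

    @staticmethod
    def _bucket_size(size):
        # Assumption: round up to the next power of two, minimum 4096.
        n = 4096
        while n < size:
            n *= 2
        return n

    def get(self, size):
        """Return a cached buffer of a suitable size, or allocate one."""
        bucket = self._bucket_size(size)
        free = self._buckets[bucket]
        if free:
            return free.pop()
        self.allocations += 1
        return bytearray(bucket)

    def put(self, buf):
        """Hand a buffer back to the cache for later reuse."""
        self._buckets[len(buf)].append(buf)

    def shrink(self):
        """Drop every cached buffer, as a shrinker would under memory
        pressure, and report how many bytes were released."""
        freed = sum(len(b) for bufs in self._buckets.values() for b in bufs)
        self._buckets.clear()
        return freed
```

After shrink() the next get() simply allocates again, which mirrors the "we will quickly reacquire them when necessary" remark in the commit message.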
Comment 8 Chris Wilson 2019-02-27 14:37:05 UTC
But that is presupposing that it's mempressure; limiting my ivb to under 2G (like hsw-peppy) only doubles the runtime (kswapd is barely being invoked). The test predicts it needs 1G, and it doesn't seem far off.

Fwiw, the patch does do as it claims, though: after patching, under the same conditions, the runtime is the same as when it has sufficient memory.
Comment 9 Chris Wilson 2019-03-06 12:18:10 UTC
Test clamped to

commit a382aeec489a187591677644cc3b98e34322b474 (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Feb 11 14:29:29 2019 +0000

    i915/gem_exec_parse: Switch to a fixed timeout for basic-allocations
    
    basic-allocations was written to demonstrate a flaw in our continual
    reallocation of cmdparser shadow bo, largely fixed by keeping a small
    cache of bo of different lengths (to speed up the search for the correct
    sized bo). We only care enough to exercise the slowdown by submitting
    lots of execbufs, and can see the effect of bo caching on the rate, so
    replace the fixed number of iterations with a timeout and count how many
    batches we could submit instead.
    
    Similarly, we now do not need to wait for all of our queue to complete
    as we can tell the kernel to drop the queue instead.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=107936
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


Must remember to finish off the shrinker patches.
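
The change described in the commit message above, replacing a fixed iteration count with a timeout and counting how many batches fit, can be sketched as follows. This is a hedged illustration in Python rather than the actual IGT C code: count_submissions, submit, and the injectable clock parameter are hypothetical names introduced for the example.

```python
import time
from typing import Callable


def count_submissions(submit: Callable[[], None],
                      timeout_s: float,
                      clock: Callable[[], float] = time.monotonic) -> int:
    """Call submit() repeatedly until timeout_s has elapsed.

    Returns the number of completed calls, so the runtime is bounded by
    the timeout and the count becomes the measure of throughput.
    """
    deadline = clock() + timeout_s
    count = 0
    while clock() < deadline:
        submit()
        count += 1
    return count
```

With this shape a slow machine reports a lower count instead of running for longer, which is the property that matters for a CI runner with its own timeout. The clock argument exists only so the loop can be exercised with a fake clock.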
Comment 10 Martin Peres 2019-03-06 18:24:30 UTC
(In reply to Chris Wilson from comment #9)
> Test clamped to
> 
> commit a382aeec489a187591677644cc3b98e34322b474 (HEAD, upstream/master)
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Feb 11 14:29:29 2019 +0000
> 
>     i915/gem_exec_parse: Switch to a fixed timeout for basic-allocations
>     
>     basic-allocations was written to demonstrate a flaw in our continual
>     reallocation of cmdparser shadow bo, largely fixed by keeping a small
>     cache of bo of different lengths (to speed up the search for the correct
>     sized bo). We only care enough to exercise the slowdown by submitting
>     lots of execbufs, and can see the effect of bo caching on the rate, so
>     replace the fixed number of iterations with a timeout and count how many
>     batches we could submit instead.
>     
>     Similarly, we now do not need to wait for all of our queue to complete
>     as we can tell the kernel to drop the queue instead.
>     
>     References: https://bugs.freedesktop.org/show_bug.cgi?id=107936
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>     Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> 
> Must remember to finish off the shrinker patches.

This still does not fix all the failures: the test fails on pretty much every single drmtip run on the Chromebooks:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_237/fi-hsw-peppy/igt@gem_exec_parse@basic-allocation.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_237/fi-byt-clapper/igt@gem_exec_parse@basic-allocation.html
Comment 11 CI Bug Log 2019-03-08 15:13:51 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW: igt@gem_exec_parse@basic-allocation - timeout -}
{+ BYT HSW BSW: igt@gem_exec_parse@basic-allocation / igt@gem_exec_big - timeout +}

 No new failures caught with the new filter
Comment 12 Chris Wilson 2019-03-26 10:24:58 UTC
commit abffc52b0ec74c8498f2197760199a54e29c8a6a (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Feb 12 18:40:37 2019 +0000

    i915/gem_exec_big: Add a single shot test
    
    CI complains that the exhaustive test of trying every size up to the
    limit is too slow, so add a simple test that tries to submit one
    extreme batch buffer and check all the relocations land.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105555
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Comment 13 Francesco Balestrieri 2019-04-02 11:34:56 UTC
*** Bug 110255 has been marked as a duplicate of this bug. ***
Comment 14 CI Bug Log 2019-09-30 12:20:23 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.
Comment 15 Lakshmi 2019-09-30 12:22:28 UTC
Closing and archiving the issue. The reproduction rate was 100% until drmtip 247, and there have been no new occurrences since.

