Bug 110882 - [CI][SHARDS] igt@gem_mmap_gtt@forked-* - timeout - extremely slow on ICL
Summary: [CI][SHARDS] igt@gem_mmap_gtt@forked-* - timeout - extremely slow on ICL
Status: RESOLVED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: high normal
Assignee: Mika Kuoppala
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-10 12:59 UTC by Martin Peres
Modified: 2019-09-18 07:19 UTC

See Also:
i915 platform: ICL
i915 features: GEM/Other


Description Martin Peres 2019-06-10 12:59:00 UTC
On ICL, the following subtests of igt@gem_mmap_gtt together take almost 80 minutes to execute, which is equivalent to ~25% of the normal CI time. On other platforms, each subtest takes 1-20 seconds.

Test                                           Machine       Min (s)     Avg (s)     Max (s)
igt@gem_mmap_gtt@forked-big-copy-odd           shard-iclb     803.247     955.443    1084.676
igt@gem_mmap_gtt@forked-big-copy-xy            shard-iclb    1029.645    1033.683    1037.721
igt@gem_mmap_gtt@forked-big-copy               shard-iclb    1037.463    1037.463    1037.463
igt@gem_mmap_gtt@forked-medium-copy-xy         shard-iclb     651.568     665.734     681.662
igt@gem_mmap_gtt@forked-medium-copy            shard-iclb     544.721     557.768     564.785
igt@gem_mmap_gtt@forked-basic-small-copy-xy    shard-iclb     319.319     344.974     373.88

You can see here that this is very ICL-specific: https://intel-gfx-ci.01.org/tree/drm-tip/shards-all.html?testfilter=igt@gem_mmap_gtt
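
As a quick sanity check of the "almost 80 minutes" figure, the per-subtest averages from the table above can be summed. This is only a sketch: it assumes each listed subtest runs once per CI run and that the average column is representative. The variable names are illustrative.

```python
# Average runtimes in seconds, copied from the "Avg (s)" column above.
avg_seconds = {
    "forked-big-copy-odd": 955.443,
    "forked-big-copy-xy": 1033.683,
    "forked-big-copy": 1037.463,
    "forked-medium-copy-xy": 665.734,
    "forked-medium-copy": 557.768,
    "forked-basic-small-copy-xy": 344.974,
}

# Assumption: one execution of each subtest per CI run.
total_seconds = sum(avg_seconds.values())
total_minutes = total_seconds / 60

print(f"total: {total_seconds:.3f} s = {total_minutes:.1f} min")
```

Under that assumption the six subtests add up to a bit under 77 minutes, which is consistent with the estimate in the description.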
Comment 1 Chris Wilson 2019-06-10 13:22:23 UTC
One interesting comparison, small-copy:

       single      forked
glk:    2.15s       2.89s
icl:    2.50s     281.08s

Quite clearly it simply explodes with concurrent use.
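
To put a number on "explodes", the forked/single ratio can be computed per platform from the four timings above. A minimal sketch, with illustrative names:

```python
# small-copy timings in seconds, taken from the comparison above.
timings = {
    "glk": {"single": 2.15, "forked": 2.89},
    "icl": {"single": 2.50, "forked": 281.08},
}

# Slowdown factor of the forked variant relative to the single-process one.
slowdown = {
    platform: t["forked"] / t["single"] for platform, t in timings.items()
}

for platform, factor in slowdown.items():
    print(f"{platform}: forked is {factor:.1f}x slower than single")
```

On these figures the forked variant costs about 1.3x on glk and about 112x on icl.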
Comment 3 Chris Wilson 2019-06-10 13:41:29 UTC
For the sake of consistency, the next iclb1 result was:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6092/shard-iclb1/igt@gem_mmap_gtt@forked-big-copy.html

-> +1032s (timed out)
Comment 4 Tomi Sarvela 2019-06-11 06:48:54 UTC
On May 14 (when CI_DRM_6078 wasn't run on shard-iclb) the BIOSes were upgraded for the shard, so there is a high probability that this change caused the issue.

The new BIOS was WW18; the shards are now running a newer one. BIOSes older than WW18 can't be tested on the current CPUs, so to reproduce the issue we need to find an older CPU, or a full host whose BIOS can be downgraded.
Comment 5 Chris Wilson 2019-06-11 09:44:26 UTC
Temporary band-aid:

commit 6cb3d4a9457cdfb993ebb2a086a4844b85c49ee2 (upstream/master, origin/master, origin/HEAD)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jun 10 13:52:02 2019 +0100

    i915/gem_mmap_gtt: Disregard forked subtests on ICL for reasons
    
    Nothing to see here, please move along.
    
    The short story seems to be that a BIOS update made concurrent GTT
    access a few orders of magnitude slower, severely hampering CI. Where
    the fault actually lies is unknown, and how to circumvent it, unknown.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=110882
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Martin Peres <martin.peres@linux.intel.com>
    Acked-by: Daniel Vetter <daniel@ffwll.ch>
Comment 6 Martin Peres 2019-06-11 11:37:11 UTC
(In reply to Chris Wilson from comment #5)
Thanks, it got rid of most of the issues. We still have the following tests being executed though:

Test                                  Machine     Min (s)  Avg (s)  Max (s)
igt@gem_mmap_gtt@forked-big-copy      shard-iclb  803.781   937.90  1063.49
igt@gem_mmap_gtt@forked-big-copy-odd  shard-iclb  974.263  1002.33  1044.71
Comment 7 Martin Peres 2019-06-11 12:17:05 UTC
(In reply to Martin Peres from comment #6)
My bad, we have not received the new results yet! Sorry for the noise!
Comment 8 CI Bug Log 2019-06-12 12:09:57 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@gem_mmap_gtt@forked-* - fail - Failed assertion: !(intel_gen(devid) >= 11 && ncpus > 1)
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3136/shard-iclb1/igt@gem_mmap_gtt@forked-big-copy-xy.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3136/shard-iclb1/igt@gem_mmap_gtt@forked-big-copy-odd.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3136/shard-iclb4/igt@gem_mmap_gtt@forked-basic-small-copy-odd.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3136/shard-iclb4/igt@gem_mmap_gtt@forked-medium-copy.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3136/shard-iclb4/igt@gem_mmap_gtt@forked-medium-copy-odd.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3136/shard-iclb4/igt@gem_mmap_gtt@forked-basic-small-copy.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3136/shard-iclb4/igt@gem_mmap_gtt@forked-big-copy.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3136/shard-iclb5/igt@gem_mmap_gtt@forked-medium-copy-xy.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3136/shard-iclb8/igt@gem_mmap_gtt@forked-basic-small-copy-xy.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5051/shard-iclb3/igt@gem_mmap_gtt@forked-big-copy.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5051/shard-iclb5/igt@gem_mmap_gtt@forked-medium-copy.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5051/shard-iclb5/igt@gem_mmap_gtt@forked-basic-small-copy.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5051/shard-iclb5/igt@gem_mmap_gtt@forked-medium-copy-xy.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5051/shard-iclb7/igt@gem_mmap_gtt@forked-big-copy-odd.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5051/shard-iclb7/igt@gem_mmap_gtt@forked-big-copy-xy.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5051/shard-iclb8/igt@gem_mmap_gtt@forked-basic-small-copy-odd.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5051/shard-iclb8/igt@gem_mmap_gtt@forked-medium-copy-odd.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5051/shard-iclb8/igt@gem_mmap_gtt@forked-basic-small-copy-xy.html
Comment 9 Francesco Balestrieri 2019-06-13 14:20:00 UTC
Are these failures from old runs? AFAIK the tests have been removed.
Comment 10 Francesco Balestrieri 2019-06-13 14:26:53 UTC
Actually no. We want to keep tracking this as an open issue, but we also don't want to clog CI with slow tests, so the choice was to make the tests fail deliberately on ICL.
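
For readers unfamiliar with the filter text, the failed assertion `!(intel_gen(devid) >= 11 && ncpus > 1)` can be modelled as below. This is a Python sketch of the condition only, not the IGT source. The function name and the CPU count of 8 are hypothetical; the generation numbers are the usual ones (GLK gen 9, ICL gen 11, TGL gen 12).

```python
def forked_subtests_blocked(gen: int, ncpus: int) -> bool:
    """Return True when the assertion quoted in the CI filter would fail.

    Models the inner condition of !(intel_gen(devid) >= 11 && ncpus > 1).
    """
    return gen >= 11 and ncpus > 1


# Graphics generation per platform.
platform_gen = {"glk": 9, "icl": 11, "tgl": 12}

# Hypothetical 8-CPU machine for each platform.
blocked = {
    name: forked_subtests_blocked(gen, ncpus=8)
    for name, gen in platform_gen.items()
}
print(blocked)
```

Because the check is on gen >= 11 rather than on ICL specifically, any later generation with more than one CPU trips it as well.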
Comment 11 Mika Kuoppala 2019-06-25 13:18:47 UTC
Investigation of the exponential slowdown on multicore access is ongoing.
Comment 12 CI Bug Log 2019-09-09 06:30:56 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.
Comment 13 CI Bug Log 2019-09-09 06:31:50 UTC
The CI Bug Log issue associated to this bug has been restored.

All the previous filters are now active.
Comment 14 Martin Peres 2019-09-09 06:35:40 UTC
Sorry for the noise! I archived the wrong bug.

Anyway, this skip is also hit on TGL. Can you check whether the slowdown is reproducible there, and adjust the condition if it is not?
Comment 15 CI Bug Log 2019-09-11 07:31:14 UTC
A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@gem_mmap_gtt@forked-* - fail - Failed assertion: !(intel_gen(devid) >= 11 && ncpus > 1) -}
{+ ICL TGL: igt@gem_mmap_gtt@forked-* - fail - Failed assertion: !(intel_gen(devid) >= 11 && ncpus > 1) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_365/fi-tgl-u/igt@gem_mmap_gtt@forked-basic-small-copy-odd.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_365/fi-tgl-u/igt@gem_mmap_gtt@forked-big-copy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_mmap_gtt@forked-medium-copy-xy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_365/fi-tgl-u/igt@gem_mmap_gtt@forked-medium-copy-xy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_mmap_gtt@forked-medium-copy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_365/fi-tgl-u/igt@gem_mmap_gtt@forked-basic-small-copy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_365/fi-tgl-u/igt@gem_mmap_gtt@forked-basic-small-copy-xy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_365/fi-tgl-u/igt@gem_mmap_gtt@forked-medium-copy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_mmap_gtt@forked-basic-small-copy-odd.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_362/fi-tgl-u/igt@gem_mmap_gtt@forked-big-copy-xy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_363/fi-tgl-u/igt@gem_mmap_gtt@forked-basic-small-copy-odd.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_363/fi-tgl-u/igt@gem_mmap_gtt@forked-big-copy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_363/fi-tgl-u/igt@gem_mmap_gtt@forked-big-copy-xy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_363/fi-tgl-u/igt@gem_mmap_gtt@forked-basic-small-copy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_363/fi-tgl-u/igt@gem_mmap_gtt@forked-basic-small-copy-xy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_363/fi-tgl-u/igt@gem_mmap_gtt@forked-medium-copy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_364/fi-tgl-u/igt@gem_mmap_gtt@forked-basic-small-copy-odd.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_364/fi-tgl-u/igt@gem_mmap_gtt@forked-big-copy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_364/fi-tgl-u/igt@gem_mmap_gtt@forked-medium-copy-odd.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_364/fi-tgl-u/igt@gem_mmap_gtt@forked-big-copy-xy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_364/fi-tgl-u/igt@gem_mmap_gtt@forked-basic-small-copy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_364/fi-tgl-u/igt@gem_mmap_gtt@forked-big-copy-odd.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_364/fi-tgl-u/igt@gem_mmap_gtt@forked-basic-small-copy-xy.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_364/fi-tgl-u/igt@gem_mmap_gtt@forked-medium-copy.html
Comment 16 Chris Wilson 2019-09-17 11:24:26 UTC
Early tgl results:

basic-small-copy: SUCCESS (1,671s)
forked-basic-small-copy: SUCCESS (37,568s)

Not great, but not as bad as icl (might just be a difference in memdebug options?)

medium-copy: SUCCESS (3,307s)
forked-medium-copy: SUCCESS (76,614s)
forked-medium-copy-XY: SUCCESS (203,251s)
forked-medium-copy-odd: SUCCESS (204,265s)
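
A rough way to compare these with the icl numbers is to compute the forked/single ratio for the pairs listed above. The sketch below ASSUMES the comma in the durations is a decimal separator (so "1,671s" means 1.671 seconds); the thread does not state this explicitly. Helper names are illustrative.

```python
import re


def parse_seconds(result: str) -> float:
    """Extract the duration from a line such as 'SUCCESS (1,671s)'.

    ASSUMPTION: the comma is a decimal separator, not a thousands separator.
    """
    match = re.search(r"\(([\d,]+)s\)", result)
    if match is None:
        raise ValueError(f"no duration found in {result!r}")
    return float(match.group(1).replace(",", "."))


# Results copied from the comment above.
tgl = {
    "basic-small-copy": "SUCCESS (1,671s)",
    "forked-basic-small-copy": "SUCCESS (37,568s)",
    "medium-copy": "SUCCESS (3,307s)",
    "forked-medium-copy": "SUCCESS (76,614s)",
}


def forked_ratio(name: str) -> float:
    """Slowdown of the forked variant relative to its single-process test."""
    return parse_seconds(tgl["forked-" + name]) / parse_seconds(tgl[name])


for name in ("basic-small-copy", "medium-copy"):
    print(f"{name}: forked is {forked_ratio(name):.1f}x slower")
```

Under that reading the forked variants are roughly 22-23x slower on tgl.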
Comment 17 Chris Wilson 2019-09-18 07:19:50 UTC
Moved to the attic:

commit 0e9510b83502af3e230870df2d66d4f68918d3a4 (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 17 14:01:27 2019 +0100

    i915/gem_mmap_gtt: Replace forked-mmapped tests with a lighter variant
    
    Introduce a new 2-process fork test that is bound to a single cpu to
    exercise contention during pagefaults. This is a much lighter variant of
    the all-cpus test intended to be viable even on the legendary frozen
    lakes of molasses.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=110882
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Cc: Martin Peres <martin.peres@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
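
The general shape of a "two processes bound to a single cpu" test can be illustrated as follows. This is a loose, Linux-only sketch and not the IGT test: it uses an anonymous shared mapping as a stand-in for a GTT mmap, and the function name, page count and byte patterns are all hypothetical.

```python
import mmap
import os

PAGE = mmap.PAGESIZE


def two_process_touch(pages: int = 64) -> bool:
    """Pin to one CPU, fork once, and let both processes fault the same pages."""
    original = os.sched_getaffinity(0)
    # Bind to a single allowed CPU; the forked child inherits this affinity.
    os.sched_setaffinity(0, {min(original)})
    # Anonymous mapping, shared with the child (stand-in for a GTT mmap).
    buf = mmap.mmap(-1, pages * PAGE)
    try:
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                # Child writes the second byte of every page.
                for i in range(pages):
                    buf[i * PAGE + 1] = 0xC5
                status = 0
            finally:
                os._exit(status)
        # Parent writes the first byte of every page.
        for i in range(pages):
            buf[i * PAGE] = 0xA5
        _, wait_status = os.waitpid(pid, 0)
        child_ok = os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
        # Both processes' writes must be visible through the shared mapping.
        data_ok = all(
            buf[i * PAGE] == 0xA5 and buf[i * PAGE + 1] == 0xC5
            for i in range(pages)
        )
        return child_ok and data_ok
    finally:
        buf.close()
        os.sched_setaffinity(0, original)
```

Calling `two_process_touch()` returns True when the child exits cleanly and both sets of writes are visible. Binding both processes to one CPU keeps the amount of concurrent access bounded regardless of how many cores the machine has, which is the point of the lighter variant.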

