Bug 107307 - [CI][BAT] Memory performance issue
Summary: [CI][BAT] Memory performance issue
Status: RESOLVED WONTFIX
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-20 11:15 UTC by Martin Peres
Modified: 2019-07-27 13:00 UTC
CC List: 1 user

See Also:
i915 platform: BSW/CHT, BYT, CFL
i915 features: GEM/Other


Attachments

Description Martin Peres 2018-07-20 11:15:26 UTC
On two separate machines, we got a performance issue that we never caught before. This is likely the result of some background activity while running the tests.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4409_1/fi-cfl-8109u/igt@gem_mmap_gtt@basic-wc.html

(gem_mmap_gtt:2981) CRITICAL: Test assertion failure function test_wc, file ../tests/gem_mmap_gtt.c:282:
(gem_mmap_gtt:2981) CRITICAL: Failed assertion: gtt_writes > 2*gtt_reads
(gem_mmap_gtt:2981) CRITICAL: Write-Combined writes are expected to be much faster than reads: read=171.86MiB/s, write=337.03MiB/s
Subtest basic-wc failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4370_120/fi-bsw-n3050/igt@gem_mmap_gtt@basic-wc.html

(gem_mmap_gtt:3044) CRITICAL: Test assertion failure function test_wc, file ../tests/gem_mmap_gtt.c:286:
(gem_mmap_gtt:3044) CRITICAL: Failed assertion: gtt_writes > cpu_writes/2
(gem_mmap_gtt:3044) CRITICAL: Write-Combined writes are expected to be roughly equivalent to WB writes: WC (gtt)=665.21MiB/s, WB (cpu)=1352.17MiB/s
Subtest basic-wc failed.
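
The two failed assertions can be checked by hand against the throughput figures in the logs above. The following is a minimal sketch in Python rather than the C of the actual test; the function and variable names are illustrative and are not taken from gem_mmap_gtt.c. Only the two inequalities and the MiB/s values come from the logs.

```python
# Sketch of the two ratio checks quoted in the failure logs.
# Names are hypothetical; only the inequalities and the numbers are from the logs.

def wc_writes_beat_reads(gtt_writes, gtt_reads):
    """Mirrors the logged assertion: gtt_writes > 2*gtt_reads."""
    return gtt_writes > 2 * gtt_reads

def wc_writes_near_wb(gtt_writes, cpu_writes):
    """Mirrors the logged assertion: gtt_writes > cpu_writes/2."""
    return gtt_writes > cpu_writes / 2

# fi-cfl-8109u: read=171.86MiB/s, write=337.03MiB/s
cfl_ok = wc_writes_beat_reads(gtt_writes=337.03, gtt_reads=171.86)
cfl_shortfall = 2 * 171.86 - 337.03   # MiB/s missing to reach the threshold

# fi-bsw-n3050: WC (gtt)=665.21MiB/s, WB (cpu)=1352.17MiB/s
bsw_ok = wc_writes_near_wb(gtt_writes=665.21, cpu_writes=1352.17)
bsw_shortfall = 1352.17 / 2 - 665.21  # MiB/s missing to reach the threshold

print(f"CFL passes: {cfl_ok}, short by {cfl_shortfall:.2f} MiB/s")
print(f"BSW passes: {bsw_ok}, short by {bsw_shortfall:.2f} MiB/s")
```

Both runs miss their threshold by only a few MiB/s, which is consistent with a timing disturbance rather than a large regression.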
Comment 1 Tomi Sarvela 2018-08-13 07:37:15 UTC
The DUTs have had cron enabled for maintenance duties, and this has probably been the extra background activity.

The cron jobs are now disabled, and periodic anacron is used instead.
Comment 2 Jani Saarinen 2018-08-17 07:47:42 UTC
Based on the last comment, this was resolved in CI, not in i915.
Please re-open if it is still an issue.
Comment 3 Lakshmi 2018-08-23 11:35:55 UTC
This bug was noticed a month ago. Closing this bug.
Comment 4 Lakshmi 2018-08-28 06:40:05 UTC
This issue has occurred only twice, with a frequency of 40 rounds of CI_DRM execution. To make sure that it is really fixed, we are not closing this defect yet. So far it has not been seen for ~300 rounds. We will keep this defect open for a few more rounds and then close it.

This doesn't mean it needs a fix.
Comment 5 Lakshmi 2018-09-25 07:54:57 UTC
This issue was last seen 499 CI_DRM rounds ago.
Closing this issue as Resolved/Fixed. Re-open if this issue persists.
Comment 6 Martin Peres 2018-11-19 16:43:39 UTC
And funnily enough, it came back in the next repeat run on the same BSW and on fi-byt-n2820:

Starting subtest: basic-wc
(gem_mmap_gtt:2536) CRITICAL: Test assertion failure function test_wc, file ../tests/i915/gem_mmap_gtt.c:286:
(gem_mmap_gtt:2536) CRITICAL: Failed assertion: gtt_writes > cpu_writes/2
(gem_mmap_gtt:2536) CRITICAL: Write-Combined writes are expected to be roughly equivalent to WB writes: WC (gtt)=966.15MiB/s, WB (cpu)=2058.96MiB/s
Subtest basic-wc failed.
Comment 7 Francesco Balestrieri 2019-03-19 07:40:46 UTC
Platform = All is a bit of an overstatement; so far this has happened on BSW, BYT and CFL. Updating accordingly.
Comment 8 CI Bug Log 2019-04-23 13:26:36 UTC
A CI Bug Log filter associated to this bug has been updated:

{- All machines: igt@gem_mmap_gtt@basic-wc - fail - Failed assertion: gtt_writes > 2*gtt_reads -}
{+ BWR CFL: igt@gem_mmap_gtt@basic-wc - fail - Failed assertion: gtt_writes > 2*gtt_reads +}

 No new failures caught with the new filter
Comment 9 CI Bug Log 2019-04-23 13:27:36 UTC
A CI Bug Log filter associated to this bug has been updated:

{- All machines: igt@gem_mmap_gtt@basic-wc - fail - Failed assertion: gtt_writes > cpu_writes/2 -}
{+ BYT BSW ICL: igt@gem_mmap_gtt@basic-wc - fail - Failed assertion: gtt_writes > cpu_writes/2 +}

 No new failures caught with the new filter
Comment 10 Francesco Balestrieri 2019-07-23 08:19:32 UTC
This had a sudden burst on BYT just today, see e.g.:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6535/fi-byt-clapper/igt@gem_mmap_gtt@basic-wc.html

Did something change that made it suddenly worse?
Comment 11 Chris Wilson 2019-07-23 09:49:08 UTC
The secret is that the test was always meant to fail on Baytrail. It was an oddity that it didn't fail in CI, but it looks like the kernel got quicker for WB (in particular) and now we are able to see the snafu consistently. The machines identified in the original report are outliers where it is more likely that the scheduler threw off the timings.
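
To illustrate why faster WB writes alone can turn a pass into a failure, the assertion gtt_writes > cpu_writes/2 can be rearranged into a ceiling on the WB throughput for a given WC throughput. This is a rough sketch using the figures from Comment 6; the helper name and the slower WB value of 1900 MiB/s are hypothetical and chosen only for illustration.

```python
# Sketch: the logged check gtt_writes > cpu_writes/2 is equivalent to
# cpu_writes < 2*gtt_writes, so the WC throughput sets a ceiling on WB.
# The helper name and the 1900.0 figure below are hypothetical.

def wb_ceiling(gtt_writes):
    """WB throughput must stay strictly below this for the check to pass."""
    return 2 * gtt_writes

wc = 966.15           # WC (gtt) MiB/s, from Comment 6
wb_reported = 2058.96 # WB (cpu) MiB/s, from Comment 6
wb_slower = 1900.0    # hypothetical slower WB path, for comparison only

ceiling = wb_ceiling(wc)
print(f"WB ceiling for WC={wc}: {ceiling:.2f} MiB/s")
print(f"reported WB passes: {wb_reported < ceiling}")
print(f"hypothetical slower WB passes: {wb_slower < ceiling}")
```

With the WC throughput unchanged, the check passes against the slower hypothetical WB figure and fails against the reported one, so a speed-up on the WB side is enough to expose the failure.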
Comment 12 Chris Wilson 2019-07-27 13:00:09 UTC
To cancel the hijacking, drop the bug as the test is removed from BAT. If it occurs again on sparse runs, be sure to separate out byt for its known HW deficiencies.

