Bug 101060 - [BAT][BWR] MI_STORE_DWORD bogons
Summary: [BAT][BWR] MI_STORE_DWORD bogons
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: lowest critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Duplicates: 101061
Depends on:
Blocks:
 
Reported: 2017-05-16 13:54 UTC by Martin Peres
Modified: 2018-07-02 09:41 UTC
CC List: 1 user

See Also:
i915 platform: I965G
i915 features: GEM/Other


Attachments

Description Martin Peres 2017-05-16 13:54:10 UTC
The following tests have spurious failed assertions:
 - igt@prime_busy@basic-before-default: https://intel-gfx-ci.01.org/CI/CI_DRM_2619/fi-bwr-2160/igt@prime_busy@basic-before-default.html
 - igt@prime_vgem@basic-busy-default: https://intel-gfx-ci.01.org/CI/CI_DRM_2619/fi-bwr-2160/igt@prime_vgem@basic-busy-default.html
 - igt@prime_vgem@basic-fence-wait-default: https://intel-gfx-ci.01.org/CI/CI_DRM_2619/fi-bwr-2160/igt@prime_vgem@basic-fence-wait-default.html
 - igt@prime_vgem@basic-sync-default: https://intel-gfx-ci.01.org/CI/CI_DRM_2619/fi-bwr-2160/igt@prime_vgem@basic-sync-default.html
 - igt@prime_busy@basic-wait-after-default: https://intel-gfx-ci.01.org/CI/CI_DRM_2619/fi-bwr-2160/igt@prime_busy@basic-wait-after-default.html
 - igt@prime_busy@basic-wait-before-default: https://intel-gfx-ci.01.org/CI/CI_DRM_2619/fi-bwr-2160/igt@prime_busy@basic-wait-before-default.html
 - igt@prime_vgem@basic-wait-default: https://intel-gfx-ci.01.org/CI/CI_DRM_2619/fi-bwr-2160/igt@prime_vgem@basic-wait-default.html

Here is an example of such an assertion:

(prime_vgem:3699) CRITICAL: Test assertion failure function test_wait, file prime_vgem.c:405:
(prime_vgem:3699) CRITICAL: Failed assertion: ptr[i] == i
(prime_vgem:3699) CRITICAL: error: 0 != 0x1
Subtest basic-wait-default failed.
**** DEBUG ****
(prime_vgem:3699) ioctl-wrappers-DEBUG: Test requirement passed: gem_has_ring(fd, ring)
(prime_vgem:3699) DEBUG: Test requirement passed: !(gen == 6 && e->exec_id == I915_EXEC_BSD)
(prime_vgem:3699) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(prime_vgem:3699) DEBUG: Test requirement passed: __gem_execbuf(i915, &execbuf) == 0
(prime_vgem:3699) CRITICAL: Test assertion failure function test_wait, file prime_vgem.c:405:
(prime_vgem:3699) CRITICAL: Failed assertion: ptr[i] == i
(prime_vgem:3699) CRITICAL: error: 0 != 0x1
****  END  ****
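
The failing check reads back a buffer that the GPU was expected to fill and compares each dword with its index. A minimal CPU-only sketch of that comparison, assuming a plain array stands in for the mapped buffer; `first_mismatch` is a hypothetical helper for illustration, not an IGT function:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of IGT): scan a buffer that was expected
 * to hold ptr[i] == i and return the index of the first dword that does
 * not match, or -1 if every dword is correct.
 */
long first_mismatch(const uint32_t *ptr, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (ptr[i] != (uint32_t)i)
			return (long)i;
	}
	return -1;
}
```

If the buffer is still zeroed because the stores never landed, index 0 matches trivially and the first mismatch is at index 1, which is consistent with the `0 != 0x1` reported in the log.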
Comment 1 Chris Wilson 2017-05-16 13:58:19 UTC
The common pattern is that MI_STORE_DWORD is failing. (It is used in many tests to write to memory from the GPU, as it is common to all gens and engines.) I have a Crestline locally that I'm bringing up to see if I can reproduce; however, there is a very real possibility that it is Broadwater-specific.
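
For readers unfamiliar with the command, here is a rough sketch of how a gen4 batch that stores one dword might be laid out. The opcode values, the length field, and the meaning of bit 22 are written from memory of the IGT headers and are assumptions to verify against the hardware documentation; `emit_store_dword_gen4` and `STORE_ADDR_SPACE_BIT` are hypothetical names. Nothing is submitted to hardware here, the function only fills an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encodings (recalled from IGT headers, not verified here). */
#define MI_STORE_DWORD_IMM    ((0x20u << 23) | 2u)  /* opcode plus length field */
#define MI_BATCH_BUFFER_END   (0x0au << 23)
#define STORE_ADDR_SPACE_BIT  (1u << 22)            /* hypothetical name for bit 22 */

/*
 * Hypothetical helper: write a gen4-style "store immediate dword" followed
 * by a batch terminator into batch[]. Returns the number of dwords written.
 * In a real test the address dword would be patched by a relocation.
 */
size_t emit_store_dword_gen4(uint32_t *batch, size_t len,
			     uint32_t gpu_addr, uint32_t value)
{
	size_t i = 0;

	assert(len >= 5);                 /* room for 4 command dwords + end */
	batch[i++] = MI_STORE_DWORD_IMM | STORE_ADDR_SPACE_BIT;
	batch[i++] = 0;                   /* reserved dword, assumed zero */
	batch[i++] = gpu_addr;            /* destination address */
	batch[i++] = value;               /* immediate value to store */
	batch[i++] = MI_BATCH_BUFFER_END;
	return i;
}
```

A test would then execute the batch and read the destination back from the CPU; the failures in this bug are the case where that readback does not see the stored value.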
Comment 2 Chris Wilson 2017-05-16 14:20:58 UTC
*** Bug 101061 has been marked as a duplicate of this bug. ***
Comment 3 Jani Saarinen 2017-05-16 15:59:17 UTC
More tests added to the same bug:
igt@gem_ringfill@basic-default-hang
igt@gem_exec_suspend@basic-s4-devices
igt@gem_exec_suspend@basic
igt@gem_ringfill@basic-default-interruptible
igt@gem_exec_suspend@basic-s3
igt@gem_exec_flush@basic-batch-kernel-default-uc
igt@gem_ringfill@basic-default-forked
Comment 4 Chris Wilson 2017-05-16 18:43:25 UTC
Seems my crestline/i965gm is ok.

vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Core(TM)2 Duo CPU     T7300  @ 2.00GHz
stepping	: 10
microcode	: 0x92

Running through the full tests individually with no failures (yet).
Comment 5 Chris Wilson 2017-05-16 19:44:31 UTC
What's the hw config for fi-bwr-2160? Is it using more than 4G of memory?
Comment 6 Chris Wilson 2017-05-16 20:34:11 UTC
Pushed commit 3a264fc85fed4c56ffef4958e6dca883cac3e1f5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue May 16 21:31:59 2017 +0100

    igt/gem_exec_store: Add welcome screen
    
    Print out a little bit of device information on startup to help diagnose
    errors.

Can you please attach the output of gem_exec_store from the unhappy machine?
Comment 8 Jani Saarinen 2017-05-17 06:05:36 UTC
Memory on fi-bwr-2160, from cat /proc/meminfo:

MemTotal:         980224 kB
MemFree:          259152 kB
MemAvailable:     543292 kB
Buffers:          115988 kB
Cached:           293840 kB
SwapCached:            0 kB
Active:           320528 kB
Inactive:         224984 kB
Active(anon):      68416 kB
Inactive(anon):   120088 kB
Active(file):     252112 kB
Inactive(file):   104896 kB
Unevictable:           0 kB
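
To answer the "more than 4G" question mechanically, the MemTotal line can be parsed and compared against 4 GiB. A minimal sketch, assuming the text has the usual /proc/meminfo layout; `meminfo_total_kb` and `exceeds_4g` are hypothetical helpers:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract MemTotal (in kB) from /proc/meminfo-style
 * text. Returns -1 if the field is missing or malformed.
 */
long long meminfo_total_kb(const char *text)
{
	const char *line = strstr(text, "MemTotal:");
	long long kb;

	if (!line || sscanf(line, "MemTotal: %lld kB", &kb) != 1)
		return -1;
	return kb;
}

/* True if the total exceeds 4 GiB (4 * 1024 * 1024 kB). */
bool exceeds_4g(long long total_kb)
{
	return total_kb > 4LL * 1024 * 1024;
}
```

For the values above, MemTotal is 980224 kB, which is under 1 GiB, so this machine is well below the 4G mark.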
Comment 9 Chris Wilson 2017-05-17 07:43:03 UTC
(In reply to Jani Saarinen from comment #7)
> Similar issues seen on fi-gdg-551
> http://intel-gfx-ci.01.org/CI/CI_DRM_2614/fi-gdg-551/igt@gem_mmap_gtt@basic-small-copy-xy.html
> and
> http://intel-gfx-ci.01.org/CI/CI_DRM_2614/fi-gdg-551/igt@gem_mmap_gtt@basic-small-copy.html
> 
> Chris, are these same or new bug needed?

They are a different class of bug. A different sort of scary.
Comment 10 Martin Peres 2017-05-17 07:47:06 UTC
(In reply to Chris Wilson from comment #9)
> (In reply to Jani Saarinen from comment #7)
> > Similar issues seen on fi-gdg-551
> > http://intel-gfx-ci.01.org/CI/CI_DRM_2614/fi-gdg-551/igt@gem_mmap_gtt@basic-small-copy-xy.html
> > and
> > http://intel-gfx-ci.01.org/CI/CI_DRM_2614/fi-gdg-551/igt@gem_mmap_gtt@basic-small-copy.html
> > 
> > Chris, are these same or new bug needed?
> 
> They are a different class of bug. A different sort of scary.

Thanks, making a bug for it now.

What about the following two failure modes (only showing two instances)?
 - https://intel-gfx-ci.01.org/CI/CI_DRM_2619/fi-bwr-2160/igt@gem_exec_flush@basic-batch-kernel-default-uc.html
 - https://intel-gfx-ci.01.org/CI/CI_DRM_2619/fi-bwr-2160/igt@gem_exec_suspend@basic.html
Comment 11 Chris Wilson 2017-05-17 07:49:57 UTC
(In reply to Martin Peres from comment #10) 
> What about the following two failure modes (only showing two instances)?
>  - https://intel-gfx-ci.01.org/CI/CI_DRM_2619/fi-bwr-2160/igt@gem_exec_flush@basic-batch-kernel-default-uc.html
>  - https://intel-gfx-ci.01.org/CI/CI_DRM_2619/fi-bwr-2160/igt@gem_exec_suspend@basic.html

For the moment, MI_STORE_DWORD is the chief suspect for those.
Comment 12 Martin Peres 2017-05-18 13:38:06 UTC
Also seen on igt@gem_exec_flush@basic-batch-kernel-default-wb: https://intel-gfx-ci.01.org/CI/CI_DRM_2626/fi-bwr-2160/igt@gem_exec_flush@basic-batch-kernel-default-wb.html
Comment 13 Martin Peres 2017-05-18 13:39:01 UTC
Also seen on igt@gem_exec_parallel@basic: https://intel-gfx-ci.01.org/CI/CI_DRM_2626/fi-bwr-2160/igt@gem_exec_parallel@basic.html
Comment 14 Chris Wilson 2017-05-18 14:53:19 UTC
I've told igt to skip any test relying on MI_STORE_DWORD on Broadwater (and reduced the reliance where trivial). It does mean that we have a large gap in coverage on brw, but until we can get gem_exec_store passing, the results in BAT aren't helpful.
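
The skip described here amounts to a capability check that tests consult before relying on the store. A simplified sketch, not the actual IGT change: `struct device_desc`, `enum engine_id`, and `can_use_store_dword` are hypothetical, and only the two conditions visible in this report (the gen6 BSD requirement from the debug log in the description, and Broadwater) are modelled.

```c
#include <stdbool.h>

/* Hypothetical, simplified device description; IGT's real one differs. */
struct device_desc {
	int gen;
	bool is_broadwater;
};

/* Hypothetical engine identifiers for illustration only. */
enum engine_id { ENGINE_DEFAULT, ENGINE_RENDER, ENGINE_BSD };

/*
 * Return true if a test may rely on MI_STORE_DWORD for this device and
 * engine. A test would skip (not fail) when this returns false.
 */
bool can_use_store_dword(const struct device_desc *dev, enum engine_id engine)
{
	if (dev->gen == 6 && engine == ENGINE_BSD)
		return false;	/* matches the requirement seen in the debug log */
	if (dev->is_broadwater)
		return false;	/* stores are unreliable here, per this bug */
	return true;
}
```

Skipping rather than failing keeps the BAT results readable on Broadwater, at the cost of the coverage gap noted above.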
Comment 15 Martin Peres 2017-05-18 15:21:10 UTC
(In reply to Chris Wilson from comment #14)
> I've told igt to skip any test relying on MI_STORE_DWORD on Broadwater (and
> reduced the reliance where trivial). It does mean that we have a large gap
> in coverage on brw, but until we can get gem_exec_store passing, the results
> in BAT aren't helpful.

Ah, that explains why we suddenly got all green or grey. I think this approach makes sense, especially since we have a low priority for this platform. At least, we'll catch additional regressions!

Let's keep this bug open then, but I will get rid of the blacklisting of tests on BWR. If this becomes stable enough, we'll add the bwr machines to pre-merge.
Comment 16 Elizabeth 2017-08-24 21:08:03 UTC
(In reply to Chris Wilson from comment #14)
> I've told igt to skip any test relying on MI_STORE_DWORD on Broadwater (and
> reduced the reliance where trivial). It does mean that we have a large gap
> in coverage on brw, but until we can get gem_exec_store passing, the results
> in BAT aren't helpful.
Hello, is there any update on this? Thanks.
Comment 17 Martin Peres 2018-05-23 23:11:47 UTC
Downgrading the priority to lowest, since applications have obviously been fine with this behaviour since forever.
Comment 18 Francesco Balestrieri 2018-07-02 09:41:35 UTC
I'm going to close this since there doesn't seem to be any impact.