Bug 100587 - [CI]gem_mmap_gtt/coherency failing assertion cpu[x] == i
Summary: [CI]gem_mmap_gtt/coherency failing assertion cpu[x] == i
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Duplicates: 100598 103079 104002 104250 104372
Depends on:
Blocks:
 
Reported: 2017-04-05 16:44 UTC by Elio
Modified: 2018-09-07 17:55 UTC (History)
CC List: 5 users

See Also:
i915 platform: BSW/CHT, BXT, CNL, GLK
i915 features: GEM/Other


Attachments
dmesg log (110.66 KB, text/plain)
2017-04-05 16:44 UTC, Elio
output (1.34 KB, text/plain)
2017-07-11 19:59 UTC, Armando Antonio

Description Elio 2017-04-05 16:44:32 UTC
Created attachment 130703
dmesg log

The following test in igt is failing: gem_mmap_gtt@coherency

Steps to reproduce:

1. Install igt tools
2. Execute gem_mmap_gtt@coherency

Showing error message without dmesg error:

IGT-Version: 1.18-g56741ce (x86_64) (Linux: 4.11.0-rc5-drm-tip-qa-ww14-commit-5bc82ec+ x86_64)
(gem_mmap_gtt:1533) CRITICAL: Test assertion failure function test_coherency, file gem_mmap_gtt.c:336:
(gem_mmap_gtt:1533) CRITICAL: Failed assertion: cpu[x] == i
(gem_mmap_gtt:1533) CRITICAL: error: 0 != 64
Stack trace:
  #0 [__igt_fail_assert+0x101]
  #1 [__real_main773+0x13b1]
  #2 [<unknown>+0x13b1]
  #3 [<unknown>+0x13b1]
  #4 [<unknown>+0x13b1]
Subtest coherency failed.
**** DEBUG ****
(gem_mmap_gtt:1533) DEBUG: Test requirement passed: igt_setup_clflush()
(gem_mmap_gtt:1533) CRITICAL: Test assertion failure function test_coherency, file gem_mmap_gtt.c:336:
(gem_mmap_gtt:1533) CRITICAL: Failed assertion: cpu[x] == i
(gem_mmap_gtt:1533) CRITICAL: error: 0 != 64
****  END  ****
Subtest coherency: FAIL (0.016s)



Configuration:

Kernel version:
4.11.0-rc5   commit-5bc82ec

Component         : drm
	url       : http://cgit.freedesktop.org/mesa/drm
	tag       : libdrm-2.4.76-16-g6312017
	commit    : 6312017
	author    : Emil Velikov <emil.l.velikov@gmail.com>
	age       : Mon Apr 3 18:01:49 2017 +0100 7 hours ago
	comment   : configure.ac: bring back pthread-stubs check


Component         : mesa
	url       : http://cgit.freedesktop.org/mesa/mesa
	tag       : 17.0-branchpoint-2397-g405ef7b
	commit    : 405ef7b
	author    : Jason Ekstrand <jason@jlekstrand.net>
	age       : Mon Apr 3 16:58:35 2017 -0700 30 minutes ago
	comment   : intel/vec4: Add some fall through comments



Component         : xf86-video-intel
	url       : http://cgit.freedesktop.org/xorg/driver/xf86-video-intel
	tag       : 2.99.917-770-gcb6ba2d
	commit    : cb6ba2d
	author    : Chris Wilson <chris@chris-wilson.co.uk>
	age       : Sat Mar 25 01:21:46 2017 +0000 10 days ago

Component         : libva
	url       : http://cgit.freedesktop.org/libva
	tag       : libva-1.7.3.pre1-85-gefc164d
	commit    : efc164d
	author    : Xiang Haihao <haihao.xiang@intel.com>
	age       : Tue Mar 7 23:42:43 2017 +0800 4 weeks ago
	comment   : Bump libva to 1.8.1.pre1 for development


Component         : intel-driver
	url       : http://cgit.freedesktop.org/vaapi/intel-driver
	tag       : 1.7.3-359-g437cbe0
	commit    : 437cbe0
	author    : Víctor Manuel Jáquez Leal <vjaquez@igalia.com>
	age       : Wed Mar 29 08:29:46 2017 +0800 6 days ago
	comment   : gen8: accept P010 as valid format



Component         : cairo
	url       : http://cgit.freedesktop.org/cairo
	tag       : 1.15.4-11-gcffa452
	commit    : cffa452
	author    : Debarshi Ray <debarshir@freedesktop.org>
	age       : Wed Mar 15 20:26:22 2017 -0700 3 weeks ago
	comment   : doc: Clarify when the device scale is inherited and when 
Component         : xserver
	url       : http://cgit.freedesktop.org/xorg/xserver
	tag       : xorg-server-1.19.0-184-ge4d0757
	commit    : e4d0757
	author    : Adam Jackson <ajax@redhat.com>
	age       : Thu Mar 30 11:32:02 2017 -0400 4 days ago
	comment   : xfree86: Remove driver entity hooks and private



Component         : macros
	url       : https://cgit.freedesktop.org/xorg/util/macros
	tag       : util-macros-1.19.1-2-g39f07f7
	commit    : 39f07f7
	author    : Emil Velikov <emil.veliko@collabora.com>
	age       : Mon Feb 20 10:16:40 2017 +1000 6 weeks ago
	comment   : Rework INSTALL_CMD to touch/echo >&2 only as needed



Component         : intel-gpu-tools
	url       : https://cgit.freedesktop.org/xorg/app/intel-gpu-tools
	tag       : intel-gpu-tools-1.18-56-g56741ce
	commit    : 56741ce
	author    : Chris Wilson <chris@chris-wilson.co.uk>
	age       : Mon Apr 3 19:19:42 2017 +0100 6 hours ago
	comment   : tests/gem_media_fill: Fixup typo



Component         : piglit
	url       : https://cgit.freedesktop.org/piglit
	tag       : piglit-v1
	commit    : 3d1cbd9
	author    : Vinson Lee <vlee@freedesktop.org>
	age       : Mon Apr 3 15:46:33 2017 -0700 2 hours ago
	comment   : glslparsertest: Add test case for FDO bug #100438.
Comment 1 Chris Wilson 2017-04-05 16:54:58 UTC
Expected. The test demonstrates that writes posted via the GTT become visible later than writes made by accessing the physical page directly.
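
The delay described here can be pictured with a toy model. This is only a sketch under stated assumptions, not the igt test and not a description of the hardware: struct toy_bo, the one-entry posted-write buffer, and every toy_* function are hypothetical names invented for illustration. A write through the "GTT" path is held in the buffer until something drains it, while a read through the "CPU" path looks only at the backing storage.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_WORDS 64

/* Hypothetical model: backing storage plus a one-entry posted-write
 * buffer standing in for an internal buffer on the GTT write path. */
struct toy_bo {
    uint32_t backing[TOY_WORDS]; /* what the direct (CPU) path reads */
    size_t pending_index;        /* slot targeted by the buffered write */
    uint32_t pending_value;      /* value waiting in the buffer */
    int has_pending;             /* non-zero while a write is buffered */
};

/* Push the buffered write, if any, out to backing storage. */
static void toy_drain(struct toy_bo *bo)
{
    if (bo->has_pending) {
        bo->backing[bo->pending_index] = bo->pending_value;
        bo->has_pending = 0;
    }
}

/* Write via the modelled GTT path: the new write displaces the older
 * buffered one, but is not itself visible in backing storage yet. */
static void toy_gtt_write(struct toy_bo *bo, size_t x, uint32_t value)
{
    toy_drain(bo);
    bo->pending_index = x;
    bo->pending_value = value;
    bo->has_pending = 1;
}

/* Read via the modelled CPU path: sees backing storage only. */
static uint32_t toy_cpu_read(const struct toy_bo *bo, size_t x)
{
    return bo->backing[x];
}

/* Write i through the GTT path and read it back through the CPU path.
 * Returns the first i whose read-back differs, or -1 if all match. */
static int toy_first_mismatch(int drain_before_read)
{
    struct toy_bo bo = {0};

    for (uint32_t i = 0; i < TOY_WORDS; i++) {
        toy_gtt_write(&bo, i, i);
        if (drain_before_read)
            toy_drain(&bo);
        if (toy_cpu_read(&bo, i) != i)
            return (int)i;
    }
    return -1;
}
```

In this model toy_first_mismatch(0) returns 1, because the first non-zero value is still sitting in the buffer when it is read back, and toy_first_mismatch(1) returns -1. The real failure reports different values (0 != 64); the model only illustrates the mechanism.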
Comment 2 Elio 2017-04-05 18:48:24 UTC
is this the same case for gem_mmap_gtt@swap* as well?
Comment 3 Chris Wilson 2017-04-05 18:54:15 UTC
(In reply to Elio from comment #2)
> is this the same case for gem_mmap_gtt@swap* as well?

Is a completely different class of bug, it should fail exactly like #100585.
Comment 4 maria guadalupe 2017-04-05 21:06:12 UTC
(In reply to Chris Wilson from comment #3)
> (In reply to Elio from comment #2)
> > is this the same case for gem_mmap_gtt@swap* as well?
> 
> Is a completely different class of bug, it should fail exactly like #100585.

This issue is also happening on GLK. Is this expected?

output
==============================
./gem_mmap_gtt --r coherency --debug
IGT-Version: 1.18-g56741ce (x86_64) (Linux: 4.11.0-rc4-drm-tip-qa-ww13-commit-5c7479a+ x86_64)
(gem_mmap_gtt:1736) drmtest-DEBUG: Test requirement passed: !(fd<0)
(gem_mmap_gtt:1736) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(gem_mmap_gtt:1736) igt-core-DEBUG: Starting subtest: coherency
(gem_mmap_gtt:1736) DEBUG: Test requirement passed: igt_setup_clflush()
(gem_mmap_gtt:1736) CRITICAL: Test assertion failure function test_coherency, file gem_mmap_gtt.c:336:
(gem_mmap_gtt:1736) CRITICAL: Failed assertion: cpu[x] == i
(gem_mmap_gtt:1736) CRITICAL: error: 0 != 64
Stack trace:
  #0 [__igt_fail_assert+0x101]
  #1 [__real_main773+0x13b1]
  #2 [<unknown>+0x13b1]
  #3 [<unknown>+0x13b1]
  #4 [<unknown>+0x13b1]
Subtest coherency failed.
**** DEBUG ****
(gem_mmap_gtt:1736) DEBUG: Test requirement passed: igt_setup_clflush()
(gem_mmap_gtt:1736) CRITICAL: Test assertion failure function test_coherency, file gem_mmap_gtt.c:336:
(gem_mmap_gtt:1736) CRITICAL: Failed assertion: cpu[x] == i
(gem_mmap_gtt:1736) CRITICAL: error: 0 != 64
****  END  ****
Subtest coherency: FAIL (0.020s)
Comment 5 Chris Wilson 2017-04-05 21:20:47 UTC
(In reply to maria guadalupe from comment #4)
> (In reply to Chris Wilson from comment #3)
> > (In reply to Elio from comment #2)
> > > is this the same case for gem_mmap_gtt@swap* as well?
> > 
> > Is a completely different class of bug, it should fail exactly like #100585.
> 
> This issue is also happening on GLK. Is this expected?

It seems to be a feature of the Atom design since Baytrail.
Comment 6 Chris Wilson 2017-04-05 22:11:14 UTC
*** Bug 100598 has been marked as a duplicate of this bug. ***
Comment 7 Elizabeth 2017-06-22 19:56:25 UTC
Is there any update in this bug? If so, please could you share it? Thank you.
Comment 8 Chris Wilson 2017-06-22 20:02:51 UTC
(In reply to elizabethx.de.la.torre.mena from comment #7)
> Is there any update in this bug? If so, please could you share it? Thank you.

The hw is failing as expected. The raison d'etre of this bug is to demonstrate the issue in hw.
Comment 9 Armando Antonio 2017-07-11 19:58:44 UTC
The following test fails on BSW with the latest configuration:

====================================================
Test list
====================================================
igt@gem_mmap_gtt@coherency


====================================================
Graphic Stack
====================================================
Component: drm
    tag: libdrm-2.4.81-24-g3095cc8
    commit: 3095cc8eaba1aa87ad38c04ae2b1eabe30f7e16c
Component: cairo
    tag: 1.15.6-2-g57b4050
    commit: 57b40507dda3f58dfc8635548d606b86dc7bcf51
Component: intel-gpu-tools
    tag: intel-gpu-tools-1.19-57-g6fcc8e8
    commit: 6fcc8e8b247661c7950b998e0b95141ffbd6b833
Component: piglit
    tag: piglit-v1
    commit: c8f4fd9eeb298a2ef0855927f22634f794ef3eff

======================================
             Hardware
======================================
platform                   : Braswell
motherboard model          : 10G9000NUS
motherboard id             : BRASWELL
form factor                : Desktop
manufacturer               : LENOVO
cpu family                 : Pentium
cpu family id              : 6
cpu information            : Intel(R) Pentium(R) CPU  N3700  @ 1.60GHz
gpu card                   : Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Integrated Graphics Controller (rev 21) (prog-if 00 [VGA controller])
memory ram                 : 7.68 GB
max memory ram             : 8 GB
cpu thread                 : 4
cpu core                   : 4
cpu model                  : 76
cpu stepping               : 3
socket                     : Socket BGA1155
signature                  : Type 0, Family 6, Model 76, Stepping 3
hard drive                 : 476GiB (512GB)
current cd clock frequency : 266667 kHz
maximum cd clock frequency : 320000 kHz
displays connected         : DP-1 DP-3
Comment 10 Armando Antonio 2017-07-11 19:59:15 UTC
Created attachment 132620
output
Comment 11 Elizabeth 2017-09-14 19:57:41 UTC
(In reply to Chris Wilson from comment #8)
> (In reply to elizabethx.de.la.torre.mena from comment #7)
> > Is there any update in this bug? If so, please could you share it? Thank you.
> 
> The hw is failing as expected. The raison d'etre of this bug is to
> demonstrate the issue in hw.
So this should remain open until HW is changed?
Comment 12 Hector Velazquez 2017-09-27 19:33:33 UTC
This test is still failing on GLK QA

Tests List:

igt@gem_mmap_gtt@coherency


====================================================
Output
====================================================

. . .
**** DEBUG ****
(gem_mmap_gtt:5993) DEBUG: Test requirement passed: igt_setup_clflush()
(gem_mmap_gtt:5993) CRITICAL: Test assertion failure function test_coherency, file gem_mmap_gtt.c:335:
(gem_mmap_gtt:5993) CRITICAL: Failed assertion: cpu[x] == i
(gem_mmap_gtt:5993) CRITICAL: error: 0 != 64
(gem_mmap_gtt:5993) igt-core-INFO: Stack trace:
(gem_mmap_gtt:5993) igt-core-INFO:   #0 [__igt_fail_assert+0x101]
(gem_mmap_gtt:5993) igt-core-INFO:   #1 [__real_main791+0x1410]
(gem_mmap_gtt:5993) igt-core-INFO:   #2 [<unknown>+0x1410]
(gem_mmap_gtt:5993) igt-core-INFO:   #3 [<unknown>+0x1410]
****  END  ****
. . .



This is my configuration:

======================================
        Graphic stack
======================================
Component: drm
    tag: libdrm-2.4.81-56-g7c71188
    commit: 7c71188610b4ceba0339c2bc884320bcb749adee

Component: cairo
    tag: 1.15.6-42-gdccbed7
    commit: dccbed7d78d32bd3b912e8810379451dd94e6a1f

Component: intel-gpu-tools
    tag: intel-gpu-tools-1.19-332-g0a91a5e
    commit: 0a91a5e9624d41d23b79e2540eda111cb56d42d9

Component: piglit
    tag: piglit-v1
    commit: 95e2f51a28b6cf7ff77d84e1234121c98f10ef64
	
======================================
             Software
======================================
kernel version              : 4.14.0-rc2-drm-tip-ww39-commit-d76cbbc+
hostname                    : GLK-2-GLKRVP1DDR405
architecture                : x86_64
os version                  : Ubuntu 16.10
os codename                 : yakkety
kernel driver               : i915
bios revision               : 62.30
bios release date           : 08/22/2017
ksc                         : 1.41
hardware acceleration       : disabled
swap partition              : enabled on (/dev/sda3)

======================================
        Graphic drivers
======================================
grep: /opt/X11R7/var/log/Xorg.0.log: No such file or directory
libdrm                      : 2.4.83
cairo                       : 1.15.9
intel-gpu-tools (tag)       : intel-gpu-tools-1.19-332-g0a91a5e
intel-gpu-tools (commit)    : 0a91a5e

======================================
             Hardware
======================================
. . .

======================================
             Firmware
======================================
dmc fw loaded             : yes
dmc version               : 1.4
guc fw loaded             : SUCCESS
guc version wanted        : 10.56
guc version found         : 10.56
huc fw loaded             : yes

======================================
             kernel parameters
======================================
quiet drm.debug=0xe pci=pcie_bus_safe i915.alpha_support=1 i915.enable_guc_loading=2 i915.enable_guc_submission=2 intel_iommu=igfx_off auto panic=1 nmi_watchdog=panic resume=/dev/sda3 fastboot
Comment 13 Marta Löfstedt 2017-10-09 10:15:19 UTC
In bug 103079 Chris Wilson states that it is the same issue as this one, so from a CI perspective I will use this bug.

At least from CI_DRM_3118 on APL-shards:

(prime_vgem:2592) CRITICAL: Test assertion failure function test_gtt_interleaved, file prime_vgem.c:273:
(prime_vgem:2592) CRITICAL: Failed assertion: gtt[1024*i] == ~i
(prime_vgem:2592) CRITICAL: error: 0 != -1
Subtest coherency-gtt failed.


https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3171/shard-apl5/igt@prime_vgem@coherency-gtt.html
Comment 14 Marta Löfstedt 2017-10-18 10:37:47 UTC
CI_DRM_3253 GLK-shards fail:

(prime_vgem:2599) CRITICAL: Test assertion failure function test_gtt_interleaved, file prime_vgem.c:273:
(prime_vgem:2599) CRITICAL: Failed assertion: gtt[1024*i] == ~i
(prime_vgem:2599) CRITICAL: error: 0 != -1
Subtest coherency-gtt failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3253/shard-glkb1/igt@prime_vgem@coherency-gtt.html
Comment 15 Martin Peres 2017-10-19 10:35:46 UTC
Increasing the priority as it affects latest platforms.
Comment 16 Martin Peres 2017-10-19 10:36:48 UTC
*** Bug 103079 has been marked as a duplicate of this bug. ***
Comment 17 Daniel Vetter 2017-11-08 10:14:39 UTC
Since it's a hw issue we can't work around, marking as wontfix.

Note: We need to make sure we don't add more machines to this one, before the case is reviewed by developers.
Comment 18 Chris Wilson 2017-12-14 07:53:55 UTC
*** Bug 104250 has been marked as a duplicate of this bug. ***
Comment 19 Chris Wilson 2017-12-14 07:54:15 UTC
*** Bug 104002 has been marked as a duplicate of this bug. ***
Comment 20 Marta Löfstedt 2018-03-12 12:00:15 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3903/fi-cnl-drrs/igt@gem_mmap_gtt@coherency.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3903/fi-cnl-y3/igt@gem_mmap_gtt@coherency.html

(gem_mmap_gtt:1546) CRITICAL: Test assertion failure function test_coherency, file gem_mmap_gtt.c:335:
(gem_mmap_gtt:1546) CRITICAL: Failed assertion: cpu[x] == i
(gem_mmap_gtt:1546) CRITICAL: error: 0 != 64
Comment 25 Francesco Balestrieri 2018-07-02 10:22:57 UTC
*** Bug 104372 has been marked as a duplicate of this bug. ***
Comment 26 Chris Wilson 2018-07-20 16:40:43 UTC
kernel commit 900ccf30f9e112b508a61b228bf014e3bea14bc4 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jul 20 11:19:10 2018 +0100

    drm/i915: Only force GGTT coherency w/a on required chipsets
    
    Not all chipsets have an internal buffer delaying the visibility of
    writes via the GGTT being visible by other physical paths, but we use a
    very heavy workaround for all. We only need to apply that workarounds to
    the chipsets we know suffer from the delay and the resulting coherency
    issue.
    
    Similarly, the same inconsistent coherency fouls up our ABI promise that
    a write into a mmap_gtt is immediately visible to others. Since the HW
    has made that a lie, let userspace know when that contract is broken.
    (Not that userspace would want to use mmap_gtt on those chipsets for
    other performance reasons...)
    
    Testcase: igt/drv_selftest/live_coherency
    Testcase: igt/gem_mmap_gtt/coherency
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100587
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180720101910.11153-1-chris@chris-wilson.co.uk

igt commit 65cdccdc7bcbb791d791aeeeecb784a382110a3c (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jul 20 09:02:26 2018 +0100

    igt/gem_mmap_gtt: Check for known incoherency before testing
    
    We test map_gtt coherency (whether or not a write via the mmap_gtt is
    immediately visible in the backing storage to a read via mmap_cpu) but
    we know that several platforms are inherently incorrect and require some
    form of hammer to workaround internal delays. These platforms break our
    ABI guarantees and so we report the change in ABI via a driver getparam.
    
    If we know the platform doesn't meet the ABI guarantee, skip the test.
    If it is meant to work, test!
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100587
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
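
The skip logic described in the igt commit message above might look roughly like the following. This is a minimal sketch under stated assumptions, not the code from the commit: the enum and both function names are hypothetical, and the actual query to the driver (a getparam ioctl, believed to be exposed as I915_PARAM_MMAP_GTT_COHERENT in the i915 uapi header; treat that name as an assumption here) is replaced by two plain arguments so the decision can be read in isolation. Defaulting to "coherent" when the query fails is also an assumption, chosen so that kernels without the parameter keep running the test.

```c
#include <stdbool.h>

/* Hypothetical outcome of the pre-test check. */
enum test_action { TEST_RUN, TEST_SKIP };

/* query_ok: whether the driver answered the coherency query.
 * reported: the value the driver reported (non-zero means coherent).
 * If the driver did not answer, assume the ABI guarantee holds. */
static bool assume_gtt_coherent(bool query_ok, int reported)
{
    int val = 1; /* assumed default: mmap_gtt writes are coherent */

    if (query_ok)
        val = reported;
    return val != 0;
}

/* Skip on platforms known not to meet the guarantee, test otherwise. */
static enum test_action coherency_test_action(bool query_ok, int reported)
{
    return assume_gtt_coherent(query_ok, reported) ? TEST_RUN : TEST_SKIP;
}
```

With this shape, a platform that reports 0 is skipped, a platform that reports 1 is tested, and a kernel that does not know the parameter is still tested.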
Comment 27 Chris Wilson 2018-08-07 08:37:53 UTC
commit 21eb1850fa0bd0a9b729bf3708da78888433027f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 1 11:47:21 2018 +0100

    drm/i95: Mark GGTT as incoherent for gen10+
    
    The evidence suggests that we need to start treating writes via GGTT as
    incoherent for gen10+, that is that they are internally buffered and not
    immediately visible via a read along a different physical path.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107398
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107400
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107435
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180801104721.4030-1-chris@chris-wilson.co.uk
Comment 28 Martin Peres 2018-09-04 07:53:12 UTC
The following platforms are also incoherent:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_101/fi-blb-e6850/igt@prime_vgem@coherency-gtt.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_101/fi-pnv-d510/igt@prime_vgem@coherency-gtt.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_101/fi-elk-e7500/igt@prime_vgem@coherency-gtt.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_101/fi-bwr-2160/igt@prime_vgem@coherency-gtt.html

(prime_vgem:1363) CRITICAL: Test assertion failure function test_gtt_interleaved, file ../tests/prime_vgem.c:314:
(prime_vgem:1363) CRITICAL: Failed assertion: gtt[1024*i] == ~i
(prime_vgem:1363) CRITICAL: error: 0 != -1
Subtest coherency-gtt failed.

The other platforms have indeed been fixed/silenced.
Comment 29 Chris Wilson 2018-09-04 08:01:54 UTC
Different test; hosts that don't show the coherency issue in the specific test for it => different bug.
Comment 30 Chris Wilson 2018-09-04 09:26:39 UTC
(In reply to Chris Wilson from comment #29)
> Different test; hosts that don't show the coherency issue in the specific
> test for it => different bug.

For example, our previous issue was that the indirect write via the GGTT was being buffered, but in this case it's the WC writes that aren't immediately visible. The first half of the loop (writing into GTT, reading via WC) works without any sync.

Argh!
Comment 31 Chris Wilson 2018-09-04 09:59:21 UTC
To be more precise the problem on my i915gm is that I get a WB vgem mmap; so obviously it is not being flushed to system memory immediately.
Comment 32 Martin Peres 2018-09-07 17:55:07 UTC
(In reply to Chris Wilson from comment #29)
> Different test; hosts that don't show the coherency issue in the specific
> test for it => different bug.

Moved to https://bugs.freedesktop.org/show_bug.cgi?id=107862.

Thanks for your explanation!

