Bug 108722 - gem_exec_flush stress tests fail on APL(Celeron) Platform
Status: CLOSED NOTABUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2018-11-13 10:01 UTC by Ren Chenglei
Modified: 2018-12-28 08:47 UTC



Attachments
Subcases fail on APL(Celeron) platform (7.29 KB, text/plain)
2018-11-13 10:02 UTC, Ren Chenglei
Subcases pass on KBL platform (2.90 KB, text/plain)
2018-11-13 10:03 UTC, Ren Chenglei

Description Ren Chenglei 2018-11-13 10:01:26 UTC
Several gem_exec_flush subtests fail on the APL platform (Intel(R) Celeron(R) CPU) but pass on the KBL platform.

Steps to reproduce:
===================
./gem_partial_pwrite_pread
cel_apl:/data/igt # ./gem_exec_flush --run-subtest uc-ro-before-default
IGT-Version: 1.23-@VCS_TAG@ (x86_64) (Linux: 4.19.0-rc7-quilt-2e5dc0ac-geb7f67b87cf6 x86_64)
Has LLC? no
Starting subtest: uc-ro-before-default
(gem_exec_flush:6730) CRITICAL: Test assertion failure function run, file vendor/intel/intel-gpu-tools/tests/gem_exec_flush.c:321:
(gem_exec_flush:6730) CRITICAL: Failed assertion: map[i] == i ^ 0xffffffff
(gem_exec_flush:6730) CRITICAL: error: 0x256 != 0xfffffda9
child 0 failed with exit status 99
Subtest uc-ro-before-default failed.
No log.
Subtest uc-ro-before-default: FAIL (5.240s)
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
99|cel_apl:/data/igt # ./gem_exec_flush --run-subtest uc-ro-before-bsd
IGT-Version: 1.23-@VCS_TAG@ (x86_64) (Linux: 4.19.0-rc7-quilt-2e5dc0ac-geb7f67b87cf6 x86_64)
Has LLC? no
Starting subtest: uc-ro-before-bsd
(gem_exec_flush:6736) CRITICAL: Test assertion failure function run, file vendor/intel/intel-gpu-tools/tests/gem_exec_flush.c:323:
(gem_exec_flush:6736) CRITICAL: Failed assertion: map[i] == i
(gem_exec_flush:6736) CRITICAL: error: 0xabcdabcd != 0x332
child 1 failed with exit status 99
Subtest uc-ro-before-bsd failed.
No log.
Subtest uc-ro-before-bsd: FAIL (0.020s)
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
cel_apl:/data/igt # ./gem_exec_flush --run-subtest uc-rw-before-bsd
IGT-Version: 1.23-@VCS_TAG@ (x86_64) (Linux: 4.19.0-rc7-quilt-2e5dc0ac-geb7f67b87cf6 x86_64)
Has LLC? no
Starting subtest: uc-rw-before-bsd
(gem_exec_flush:6753) CRITICAL: Test assertion failure function run, file vendor/intel/intel-gpu-tools/tests/gem_exec_flush.c:323:
(gem_exec_flush:6753) CRITICAL: Failed assertion: map[i] == i
(gem_exec_flush:6753) CRITICAL: error: 0xdeadbeef != 0x27b
child 3 failed with exit status 99
Subtest uc-rw-before-bsd failed.
No log.
Subtest uc-rw-before-bsd: FAIL (7.018s)
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
99|cel_apl:/data/igt # ./gem_exec_flush --run-subtest uc-ro-before-blt
IGT-Version: 1.23-@VCS_TAG@ (x86_64) (Linux: 4.19.0-rc7-quilt-2e5dc0ac-geb7f67b87cf6 x86_64)
Has LLC? no
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
Starting subtest: uc-ro-before-blt
(gem_exec_flush:6758) CRITICAL: Test assertion failure function run, file vendor/intel/intel-gpu-tools/tests/gem_exec_flush.c:321:
(gem_exec_flush:6758) CRITICAL: Failed assertion: map[i] == i ^ 0xffffffff
(gem_exec_flush:6758) CRITICAL: error: 0x376 != 0xfffffc89
child 3 failed with exit status 99
Subtest uc-ro-before-blt failed.
No log.
Subtest uc-ro-before-blt: FAIL (0.665s)
99|cel_apl:/data/igt #  ./gem_exec_flush --run-subtest uc-rw-before-blt
IGT-Version: 1.23-@VCS_TAG@ (x86_64) (Linux: 4.19.0-rc7-quilt-2e5dc0ac-geb7f67b87cf6 x86_64)
Has LLC? no
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
Starting subtest: uc-rw-before-blt
(gem_exec_flush:6764) CRITICAL: Test assertion failure function run, file vendor/intel/intel-gpu-tools/tests/gem_exec_flush.c:321:
(gem_exec_flush:6764) CRITICAL: Failed assertion: map[i] == i ^ 0xffffffff
(gem_exec_flush:6764) CRITICAL: error: 0xdeadbeef != 0xfffffd20
child 3 failed with exit status 99
Subtest uc-rw-before-blt failed.
No log.
Subtest uc-rw-before-blt: FAIL (4.265s)
cel_apl:/data/igt # ./gem_exec_flush --run-subtest uc-ro-before-vebox
IGT-Version: 1.23-@VCS_TAG@ (x86_64) (Linux: 4.19.0-rc7-quilt-2e5dc0ac-geb7f67b87cf6 x86_64)
Has LLC? no
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
Starting subtest: uc-ro-before-vebox
(gem_exec_flush:6771) CRITICAL: Test assertion failure function run, file vendor/intel/intel-gpu-tools/tests/gem_exec_flush.c:323:
(gem_exec_flush:6771) CRITICAL: Failed assertion: map[i] == i
(gem_exec_flush:6771) CRITICAL: error: 0xabcdabcd != 0x7
child 0 failed with exit status 99
Subtest uc-ro-before-vebox failed.
No log.
Subtest uc-ro-before-vebox: FAIL (0.045s)
99|cel_apl:/data/igt # ./gem_exec_flush --run-subtest uc-rw-before-vebox
IGT-Version: 1.23-@VCS_TAG@ (x86_64) (Linux: 4.19.0-rc7-quilt-2e5dc0ac-geb7f67b87cf6 x86_64)
Has LLC? no
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
Starting subtest: uc-rw-before-vebox
(gem_exec_flush:6776) CRITICAL: Test assertion failure function run, file vendor/intel/intel-gpu-tools/tests/gem_exec_flush.c:323:
(gem_exec_flush:6776) CRITICAL: Failed assertion: map[i] == i
(gem_exec_flush:6776) CRITICAL: error: 0xdeadbeef != 0x1b6
child 0 failed with exit status 99
Subtest uc-rw-before-vebox failed.
No log.
Subtest uc-rw-before-vebox: FAIL (6.946s)
99|cel_apl:/data/igt # ./gem_exec_flush --run-subtest uc-rw-before-vebox-interruptible
IGT-Version: 1.23-@VCS_TAG@ (x86_64) (Linux: 4.19.0-rc7-quilt-2e5dc0ac-geb7f67b87cf6 x86_64)
Has LLC? no
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
Test requirement not met in function gem_require_ring, file vendor/intel/intel-gpu-tools/lib/ioctl_wrappers.c:1487:
Test requirement: gem_has_ring(fd, ring)
Starting subtest: uc-rw-before-vebox-interruptible
Child[0]: 1282955 cycles
Child[2]: 1598944 cycles
Child[1]: 1101032 cycles
Child[3]: 1599530 cycles
Subtest uc-rw-before-vebox-interruptible: SUCCESS (134.222s)
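
As a cross-check on the logs above (not part of the original report), the two values printed on each "error:" line for the gem_exec_flush.c:321 assertion can be compared directly. The following is a minimal C sketch; the helper name is_stale_index is hypothetical and is not taken from IGT.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of IGT: returns 1 when 'expected' is the
 * bitwise complement of 'seen', i.e. when the value on the left of the
 * "error:" line is the plain index whose inverted form was expected.
 */
static int is_stale_index(uint32_t seen, uint32_t expected)
{
    return (seen ^ 0xffffffffu) == expected;
}
```

For 0x256 vs 0xfffffda9 and for 0x376 vs 0xfffffc89 the helper returns 1: the expected value is the complement of the value that was read, which would be consistent with the CPU seeing the contents from before the update. For 0xdeadbeef vs 0xfffffd20 it returns 0, so the value read there is not the plain index.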
Comment 1 Ren Chenglei 2018-11-13 10:02:38 UTC
Created attachment 142452 [details]
Subcases fail on APL(Celeron) platform
Comment 2 Ren Chenglei 2018-11-13 10:03:15 UTC
Created attachment 142453 [details]
Subcases pass on KBL platform
Comment 3 Ren Chenglei 2018-11-13 10:07:16 UTC
We tried the same image (same kernel, cairo, drm, IGT and so on) on an APL NUC and a KBL NUC; the issue reproduces only on the APL NUC.
We work on Android and built the IGT tools from commit "9a8da36e708f"; the kernel version is Linux localhost 4.19.0-rc7-quilt-2e5dc0ac-geb7f67b87cf6.
Comment 4 Chris Wilson 2018-11-13 10:58:43 UTC
Behold speculative fetches.
Comment 5 Ren Chenglei 2018-11-13 11:08:23 UTC
(In reply to Chris Wilson from comment #4)
> Behold speculative fetches.

Hi Wilson, we have a project on the APL platform, and the failure does occur there. Could you help review this issue? Similar to bug 92845, it also reproduces only on the APL platform.
Comment 6 Tapani Pälli 2018-11-13 11:15:58 UTC
FYI, the failures mentioned in comment #5 happen with the SkQP test suite (when using the Vulkan backend). The same tests pass on Kabylake.
Comment 7 Jani Saarinen 2018-11-13 11:18:28 UTC
These tests are also on the blacklist: https://cgit.freedesktop.org/drm/igt-gpu-tools/tree/tests/intel-ci/blacklist.txt

Probably for a reason.
Comment 8 Tapani Pälli 2018-11-13 11:24:10 UTC
(In reply to Jani Saarinen from comment #7)
> These tests are also on the blacklist:
> https://cgit.freedesktop.org/drm/igt-gpu-tools/tree/tests/intel-ci/blacklist.txt
> 
> Probably for a reason.

Most likely the reason is that these tests take hours to complete? As said, they pass on KBL; is there a good reason why they fail on APL?
Comment 9 Ren Chenglei 2018-11-13 11:26:55 UTC
Thanks all for the quick response. This issue blocks our milestone, so it is critical on our side. Let me move this back to open; please help take a look at this issue.
Comment 10 Tapani Pälli 2018-11-13 11:30:11 UTC
Thanks for reopening this. We've spent days trying to figure out the issue in our Vulkan driver and it seems to relate to clflush. There have been issues with clflush in the past (issue #92845), so we would like to get a better response than "Behold speculative fetches.". If there is a known issue with clflush on APL, please let us know.
Comment 11 Chris Wilson 2018-11-13 11:40:11 UTC
The before- tests are meant to fail. Userspace is leaving the caches polluted and not performing any domain changes (no clflushes) prior to access. Their only purpose was to prove that we cannot apply this optimisation in the kernel and userspace must also be wary of not falling into the same trap.
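
To illustrate the kind of userspace cache flush the comment above refers to, here is a minimal C sketch for x86 that flushes every cacheline covering a CPU mapping before the CPU reads it. It is not taken from IGT or the kernel: the function name flush_range and the 64-byte CACHELINE constant are assumptions made for the example, while _mm_clflush and _mm_mfence are the standard SSE2 intrinsics from <emmintrin.h>.

```c
#include <emmintrin.h> /* _mm_clflush, _mm_mfence (SSE2) */
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64 /* assumed cacheline size in bytes */

/*
 * Hypothetical helper: flush every cacheline that overlaps
 * [addr, addr + len) and return how many lines were flushed.
 */
static size_t flush_range(const void *addr, size_t len)
{
    /* Round the start down to a cacheline boundary. */
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines = 0;

    _mm_mfence(); /* order earlier loads/stores before the flushes */
    for (; p < end; p += CACHELINE) {
        _mm_clflush((const void *)p);
        lines++;
    }
    _mm_mfence(); /* make sure the flushes complete before later reads */

    return lines;
}
```

A caller would invoke something like flush_range(map, size) on the mapped buffer after the GPU has written it and before the CPU reads it, so that the read cannot be served from a stale cacheline.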
Comment 12 Tapani Pälli 2018-11-13 11:48:22 UTC
(In reply to Chris Wilson from comment #11)
> The before- tests are meant to fail. Userspace is leaving the caches
> polluted and not performing any domain changes (no clflushes) prior to
> access. Their only purpose was to prove that we cannot apply this
> optimisation in the kernel and userspace must also be wary of not falling
> into the same trap.

Thanks for the explanation. Uhh .. this means we are likely missing a flush somewhere within the anv driver, or the client is not properly calling vkFlushMappedMemoryRanges / vkInvalidateMappedMemoryRanges.
Comment 13 Yang 2018-11-13 13:17:20 UTC
Hi Chris:

So you mean this is only an issue in the CTS test case itself?
That is, the CTS test case doesn't call the Vulkan mapped-memory flush APIs correctly?


Thanks.

