Bug 97967 - glsl/tests/cache-test regression
Summary: glsl/tests/cache-test regression
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Mesa core (show other bugs)
Version: 13.0
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
URL:
Whiteboard:
Keywords: bisected, regression
Depends on:
Blocks: 99517
  Show dependency treegraph
 
Reported: 2016-09-29 01:30 UTC by Vinson Lee
Modified: 2017-02-10 12:52 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Some debug printfs (2.00 KB, patch)
2017-02-06 10:51 UTC, Emil Velikov
Details | Splinter Review
Possible fix (831 bytes, patch)
2017-02-08 04:12 UTC, Timothy Arceri
Details | Splinter Review
Possible fix (1.26 KB, patch)
2017-02-08 04:19 UTC, Timothy Arceri
Details | Splinter Review
Possible fix (2.54 KB, patch)
2017-02-08 04:52 UTC, Timothy Arceri
Details | Splinter Review

Description Vinson Lee 2016-09-29 01:30:25 UTC
mesa: 8c60bcb4c317026e017a8ecffe303fd4e7f0db33 (master 12.1.0-devel)

make check fails on CentOS 7.2.

====================================================
   Mesa 12.1.0-devel: src/compiler/test-suite.log
====================================================

# TOTAL: 10
# PASS:  9
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: glsl/tests/cache-test
===========================

Mesa warning: Failed to create ./cache-test-tmp/xdg-cache-home for shader cache (No such file or directory)---disabling.

Mesa warning: Failed to create ./cache-test-tmp/mesa-glsl-cache-dir for shader cache (No such file or directory)---disabling.

Error: Test 'cache_put eviction with MAX_SIZE=1K' failed: Expected=1, Actual=2
Error: Test 'eviction after overflow with MAX_SIZE=1M' failed: Expected=2, Actual=3


87ab26b2ab35a29d446ae66f1795d40c184c0739 is the first bad commit
commit 87ab26b2ab35a29d446ae66f1795d40c184c0739
Author: Timothy Arceri <timothy.arceri@collabora.com>
Date:   Wed Sep 28 08:55:02 2016 +1000

    glsl: Add initial functions to implement an on-disk cache
    
    This code provides for an on-disk cache of objects. Objects are stored
    and retrieved via names that are arbitrary 20-byte sequences,
    (intended to be SHA-1 hashes of something identifying for the
    content). The directory used for the cache can be specified by means
    of environment variables in the following priority order:
    
        $MESA_GLSL_CACHE_DIR
        $XDG_CACHE_HOME/mesa
        <user-home-directory>/.cache/mesa
    
    By default the cache will be limited to a maximum size of 1GB. The
    environment variable:
    
        $MESA_GLSL_CACHE_MAX_SIZE
    
    can be set (at the time of GL context creation) to choose some other
    size. This variable is a number that can optionally be followed by
    'K', 'M', or 'G' to select a size in kilobytes, megabytes, or
    gigabytes. By default, an unadorned value will be interpreted as
    gigabytes.
    
    The cache will be entirely disabled at runtime if the variable
    MESA_GLSL_CACHE_DISABLE is set at the time of GL context creation.
    
    Many thanks to Kristian Høgsberg <krh@bitplanet.net> for the initial
    implementation of code that led to this patch. In particular, the idea
    of using an mmapped file, (indexed by a portion of the SHA-1), for the
    efficent implementation of cache_has_key was entirely his
    idea. Kristian also provided some very helpful advice in discussions
    regarding various race conditions to be avoided in this code.
    
    Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
    Reviewed-by: Eric Anholt <eric@anholt.net>

:100644 100644 b9e6000a3d56cae090469657a86e222e13836215 c702b5399949f8972535d7db4f523b404c0836ec M      configure.ac
:040000 040000 becff3daff0cdfc5d5e83ef57d1a4ea72d705932 fed451616b0589d077aeb27223fc8310b8473a95 M      src
bisect run success
Comment 1 Timothy Arceri 2016-10-31 13:34:37 UTC
There is a problem creating the folders I'm not sure why. Do you have some kind of odd user setup?

Either way I don't think this should be a blocked for 13.0 because nobody should be building with this enabled as it still doesn't do anything.
Comment 2 Tapani Pälli 2016-11-01 05:52:07 UTC
maybe it does not implement '-p' for the mkdir? My original cache set did this, if wanted some code can be taken from here:

https://cgit.freedesktop.org/~tpalli/mesa/log/?h=automatic_cache_mgmt

following patches:

mesa: add _mesa_mkdir helper to imports.h
mesa/program: add disk cache functionality

makes parent directories if needed and is more windows friendly (AFAIK apart from mkdir and mmap it looks like cache code should work on windows).
Comment 3 Emil Velikov 2016-11-03 15:51:12 UTC
The "Failed to create foo" messages are expected and deliberate.

The actual issue is the next lines:

Error: Test 'cache_put eviction with MAX_SIZE=1K' failed: Expected=1, Actual=2
Error: Test 'eviction after overflow with MAX_SIZE=1M' failed: Expected=2, Actual=3

Currently we have 7 (seven) different codepaths for the SHA implementation (as provided by 8 libraries). My local work to resolve that ran into similar issues. That said the following come to mind:
 - Buggy sha implementation - play with --with-sha1 + libc|libmd|libnettle|libgcrypt|libcrypto|libsha1|CommonCrypto|CryptoAPI
 - Concurrency issues - does 5c73ecaac487eba36e15f22be2e9396c4a0ffe46 help ?
 - Mesa bugs ?
Comment 4 Emil Velikov 2017-01-16 14:59:17 UTC
Fwiw I've ran into identical issue whist importing a SHA1 implementation.

The issue here is buggy SHA1 implementation. As such if you print the result of _mesa_sha1_compute and match that across sha1 implementation you'll see it first hand.

Thus for this bug the cache_test.c comment "For this test, we force this signature to land in the same directory as the original blob first written to the cache," is not true, leading to the interesting experience.

Fwiw I've imported OpenBSD's implementation [1]. It works great, saves us a bit of code, configure script and toggles.

[1] https://patchwork.freedesktop.org/patch/133113/
Comment 5 Emil Velikov 2017-01-19 13:26:20 UTC
Should no longer be an issue, since

commit d1efa09d342bff3e5def2978a0bef748d74f9c82
Author: Emil Velikov <emil.velikov@collabora.com>
Date:   Fri Jan 13 16:51:31 2017 +0000

    util: import sha1 implementation from OpenBSD
Comment 6 Vinson Lee 2017-01-23 07:09:54 UTC
mesa: 3eada948a05d34fabfcfac69d6c33708cb1cf740 (master 17.1.0-devel)


====================================================
   Mesa 17.1.0-devel: src/compiler/test-suite.log
====================================================

# TOTAL: 10
# PASS:  9
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: glsl/tests/cache-test
===========================

Failed to create ./cache-test-tmp/xdg-cache-home for shader cache (No such file or directory)---disabling.
Failed to create ./cache-test-tmp/mesa-glsl-cache-dir for shader cache (No such file or directory)---disabling.
Error: Test 'disk_cache_put eviction with MAX_SIZE=1K' failed: Expected=1, Actual=2
Error: Test 'eviction after overflow with MAX_SIZE=1M' failed: Expected=2, Actual=3
Comment 7 Timothy Arceri 2017-01-23 07:21:11 UTC
The real issue is these MAX_SIZE limits are far too small. Arguably we should handle it anyway or maybe apply a minimum size.
Comment 8 Tapani Pälli 2017-01-23 12:25:06 UTC
(In reply to Timothy Arceri from comment #7)
> The real issue is these MAX_SIZE limits are far too small. Arguably we
> should handle it anyway or maybe apply a minimum size.

hmm let me share my theory .. MAX_SIZE limit set by this test is small intentionally to be able to force eviction to happen. There is some intent in the test to evict exact wanted entry BUT for some reason picking that exact entry on CentOS fails. My theory is that in CentOS environment compiler generates such code that our integrated sha1 implementation will produce different results than on system where test was generated.
Comment 9 Emil Velikov 2017-01-24 13:26:12 UTC
To check the theory in comment 8 just add a couple of printfs in test_put_and_get() and/or get_cache_file().

Vinson can you give this a try - it will save us all a fair bit of guesswork/time.
Comment 10 Vinson Lee 2017-01-26 04:43:36 UTC
Can someone provide a patch with the print statements wanted to test?
Comment 11 Timothy Arceri 2017-02-05 21:21:47 UTC
FRom the gentoo bug:

"This only happens with ABI_X86=32, the test passes for native abi."
Comment 12 Emil Velikov 2017-02-06 10:51:10 UTC
Created attachment 129357 [details] [review]
Some debug printfs

Adding a reference to the Gentoo bug and the patch in question.
Once executed the output should be as below. I've intentionally left the whole sha1 as a complete reference point.


Failed to create ./cache-test-tmp/xdg-cache-home for shader cache (No such file or directory)---disabling.
Failed to create ./cache-test-tmp/mesa-glsl-cache-dir for shader cache (No such file or directory)---disabling.
_mesa_sha1_compute ca db d9 c9 9c d7 e1 b4 e0 a5 6f cf 25 8e 83 b5 9f 74 5a 56
disk_cache_put: get_cache_file() returned "./cache-test-tmp/mesa-glsl-cache-dir/ca/dbd9c99cd7e1b4e0a56fcf258e83b59f745a56".
NOTE the directory name wrt sha1[0].

_mesa_sha1_compute 4d 78 cc 97 5b 41 97 6b b e3 58 2c 90 bf bd 6 26 a f4 d8
disk_cache_put: get_cache_file() returned "./cache-test-tmp/mesa-glsl-cache-dir/4d/78cc975b41976b0be3582c90bfbd06260af4d8".
NOTE the directory name wrt sha1[0].

_mesa_sha1_compute 60 ca cb f3 d7 2e 1e 78 34 20 3d a6 8 3 7b 1b f8 3b 40 e8
test_put_and_get: Adjusting the first byte of one_KB_key[] the to 0xca.
This is to ensure that the correct folder is picked "ca/" for the eviction that follows.
disk_cache_put: get_cache_file() returned "./cache-test-tmp/mesa-glsl-cache-dir/ca/cacbf3d72e1e7834203da608037b1bf83b40e8".
NOTE the directory name wrt sha1[0].

disk_cache_put: get_cache_file() returned "./cache-test-tmp/mesa-glsl-cache-dir/ca/dbd9c99cd7e1b4e0a56fcf258e83b59f745a56".
NOTE the directory name wrt sha1[0].

disk_cache_put: get_cache_file() returned "./cache-test-tmp/mesa-glsl-cache-dir/4d/78cc975b41976b0be3582c90bfbd06260af4d8".
NOTE the directory name wrt sha1[0].

_mesa_sha1_compute 3b 71 f4 3f f3 f 4b 15 b5 cd 85 dd 9e 95 eb c7 e8 4e b5 a3
disk_cache_put: get_cache_file() returned "./cache-test-tmp/mesa-glsl-cache-dir/ca/71f43ff30f4b15b5cd85dd9e95ebc7e84eb5a3".
NOTE the directory name wrt sha1[0].
Comment 13 Emil Velikov 2017-02-06 10:53:07 UTC
Another attempt to add the Gentoo bug report.
Comment 14 Emil Velikov 2017-02-06 10:57:40 UTC
Semi-random thoughts which came to mind. Feel free to pursue if interested - I'm sidetracked with something else atm. 
 - endianess issue ?
 - fs is missing/buggy/etc. flock and friends ?
 - sizeof(void *) related ?
Comment 15 Vinson Lee 2017-02-07 06:41:36 UTC
Here is make check failure with the debug statements from attachment 129357 [details] [review].

FAIL: glsl/tests/cache-test
===========================

Failed to create ./cache-test-tmp/xdg-cache-home for shader cache (No such file or directory)---disabling.
Failed to create ./cache-test-tmp/mesa-glsl-cache-dir for shader cache (No such file or directory)---disabling.
_mesa_sha1_compute ca db d9 c9 9c d7 e1 b4 e0 a5 6f cf 25 8e 83 b5 9f 74 5a 56
disk_cache_put: get_cache_file() returned "./cache-test-tmp/mesa-glsl-cache-dir/ca/dbd9c99cd7e1b4e0a56fcf258e83b59f745a56".
NOTE the directory name wrt sha1[0].

_mesa_sha1_compute 4d 78 cc 97 5b 41 97 6b b e3 58 2c 90 bf bd 6 26 a f4 d8
disk_cache_put: get_cache_file() returned "./cache-test-tmp/mesa-glsl-cache-dir/4d/78cc975b41976b0be3582c90bfbd06260af4d8".
NOTE the directory name wrt sha1[0].

_mesa_sha1_compute 60 ca cb f3 d7 2e 1e 78 34 20 3d a6 8 3 7b 1b f8 3b 40 e8
test_put_and_get: Adjusting the first byte of one_KB_key[] the to 0xca.
This is to ensure that the correct folder is picked "ca/" for the eviction that follows.
disk_cache_put: get_cache_file() returned "./cache-test-tmp/mesa-glsl-cache-dir/ca/cacbf3d72e1e7834203da608037b1bf83b40e8".
NOTE the directory name wrt sha1[0].

Error: Test 'disk_cache_put eviction with MAX_SIZE=1K' failed: Expected=1, Actual=2
disk_cache_put: get_cache_file() returned "./cache-test-tmp/mesa-glsl-cache-dir/ca/dbd9c99cd7e1b4e0a56fcf258e83b59f745a56".
NOTE the directory name wrt sha1[0].

disk_cache_put: get_cache_file() returned "./cache-test-tmp/mesa-glsl-cache-dir/4d/78cc975b41976b0be3582c90bfbd06260af4d8".
NOTE the directory name wrt sha1[0].

_mesa_sha1_compute 3b 71 f4 3f f3 f 4b 15 b5 cd 85 dd 9e 95 eb c7 e8 4e b5 a3
disk_cache_put: get_cache_file() returned "./cache-test-tmp/mesa-glsl-cache-dir/ca/71f43ff30f4b15b5cd85dd9e95ebc7e84eb5a3".
NOTE the directory name wrt sha1[0].

Error: Test 'eviction after overflow with MAX_SIZE=1M' failed: Expected=2, Actual=3
Comment 16 Timothy Arceri 2017-02-08 04:12:59 UTC
Created attachment 129404 [details] [review]
Possible fix

Does this patch help?
Comment 17 Timothy Arceri 2017-02-08 04:19:33 UTC
Created attachment 129405 [details] [review]
Possible fix

Please try this one instead
Comment 18 Timothy Arceri 2017-02-08 04:52:44 UTC
Created attachment 129406 [details] [review]
Possible fix

Note to self, test patch *then* post.

Please try this one.
Comment 19 Vinson Lee 2017-02-09 22:29:27 UTC
Applied attachment 129406 [details] [review]. make check passes.

====================================================
   Mesa 17.1.0-devel: src/compiler/test-suite.log
====================================================

# TOTAL: 10
# PASS:  10
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2


Tested-by: Vinson Lee <vlee@freedesktop.org>
Comment 20 Timothy Arceri 2017-02-10 12:52:29 UTC
Should now be fixed by:

commit d7b3707c612027b354deea6bc5eae56a02d5f8d5
Author: Timothy Arceri <tarceri@itsqueeze.com>
Date:   Wed Feb 8 15:05:19 2017 +1100

    util/disk_cache: use stat() to check if entry is a directory
    
    d_type is not supported on all systems.
    
    Tested-by: Vinson Lee <vlee@freedesktop.org>
    Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97967


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.