Bug 89219 - arb_uniform_buffer_object-bufferstorage fails now and then
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
Reported: 2015-02-19 06:42 UTC by Tapani Pälli
Modified: 2016-03-02 00:26 UTC


Attachments
workaround for the issue (625 bytes, patch)
2015-02-19 07:38 UTC, Tapani Pälli

Description Tapani Pälli 2015-02-19 06:42:17 UTC
The test fails seemingly at random. A failure looks like this:

--- 8< ---
UBO ub_pos_size: index = 0, size = 16
UBO ub_color: index = 1, size = 32
UBO ub_rot: index = 2, size = 16
Probe color at (40,40)
  Expected: 1.000000 0.000000 0.000000 0.500000
  Observed: 0.200000 0.200000 0.200000 0.200000
Probe color at (120,40)
  Expected: 0.000000 1.000000 0.000000 0.250000
  Observed: 0.200000 0.200000 0.200000 0.200000
Probe color at (40,120)
  Expected: 0.000000 0.000000 1.000000 0.200000
  Observed: 0.200000 0.200000 0.200000 0.200000
Comment 1 Tapani Pälli 2015-02-19 06:44:43 UTC
It looks to me like "ARB_sync" does not work. If I call glFinish() instead of using a fence sync, the test always passes.
Comment 2 Tapani Pälli 2015-02-19 07:12:53 UTC
It actually fails every time when run without '-fbo -auto'. The window contents make it easy to see why: there should always be 4 rectangles, but in the failing case some are missing.
Comment 3 Tapani Pälli 2015-02-19 07:38:39 UTC
Created attachment 113653
workaround for the issue

This patch for Mesa makes the test always pass.
Comment 4 Tapani Pälli 2015-02-19 08:03:49 UTC
Some more info: drm_intel_gem_bo_wait sometimes fails with 'timer expired' and sometimes with 'unknown error'. The workaround fixes this. Another option would be to start handling errors in the function.
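To illustrate the error-handling option, here is a minimal sketch in C. It assumes the wait call returns 0 on success and a negative errno value on failure (so 'timer expired' would arrive as -ETIME). The names wait_result and classify_wait are hypothetical and are not part of Mesa or libdrm.

```c
#include <errno.h>

/* Hypothetical result type for a buffer wait; not a Mesa or libdrm type. */
enum wait_result {
    WAIT_SIGNALED,   /* the wait completed, the buffer is idle */
    WAIT_TIMED_OUT,  /* the timeout elapsed before the buffer went idle */
    WAIT_FAILED      /* any other error */
};

/*
 * Classify the return value of a wait call.
 * Assumption: 0 means success and failures are negative errno values.
 */
static enum wait_result classify_wait(int ret)
{
    if (ret == 0)
        return WAIT_SIGNALED;
    if (ret == -ETIME)
        return WAIT_TIMED_OUT;
    return WAIT_FAILED;
}
```

With something like this, a caller could report a timeout to the application instead of treating every return from the wait as "signaled".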
Comment 5 Ian Romanick 2015-02-19 18:51:15 UTC
The sporadic failures are not limited to HSW.  I see them on IVB as well.  I suspect Mark sees them on other platforms.

Ilia: Did this test ever consistently pass on whatever hardware you developed it on?  Also, what hardware was that?

I'm not sure if this is a bug in Mesa or in the test.
Comment 6 Ilia Mirkin 2015-02-19 18:58:45 UTC
(In reply to Ian Romanick from comment #5)
> The sporadic failures are not limited to HSW.  I see them on IVB as well. 
> I suspect Mark sees them on other platforms.
> 
> Ilia: Did this test ever consistently pass on whatever hardware you
> developed it on?  Also, what hardware was that?
> 
> I'm not sure if this is a bug in Mesa or in the test.

The test passes pretty reliably on nvc0 right now. It definitely didn't originally and I had to fix up some stuff in the driver... (see commit 9a37eb8adb6558a4abf47774b583cb582a0ae116). I'm fairly sure it passes reliably on nv50 hw as well (which also had to get similar fixes).

But if it makes you feel better, 'bin/arb_uniform_buffer_object-rendering offset' fails fairly reliably on nvc0 and I have no clue why.

Basically the NVIDIA chips appear to cache constbuf lookups, so when you change them behind their backs [and there are also ways to change them in front of their backs], you have to tell them to flush their caches. I wouldn't be surprised if there were a similar situation with Intel chips.
Comment 7 Mark Janes 2015-02-19 19:32:32 UTC
I've disabled this test, but only on haswell systems.  I don't recall intermittent failures on other platforms, but I'll report them if I see them.

If it fails on other platforms, it is not to the same degree as on hsw.
Comment 8 Mark Janes 2015-02-19 21:34:34 UTC
I just noticed it fail on bay trail today.
Comment 9 Alejandro Piñeiro (freenode IRC: apinheiro) 2015-03-12 12:17:43 UTC
(In reply to Mark Janes from comment #7)
> I've disabled this test, but only on haswell systems.

I have been trying to reproduce the test failure on my Haswell machine, but I have not been able to. Each time I do 1000 runs of the test, and it always passes. I also tried running it manually without -fbo and -auto, as suggested in comment 2, but I always get those 4 squares.

Initially I assumed that this had been fixed by a specific change made after the bug report, so I tried to bisect for that change. As the description didn't specify a bad commit, I just tried older commits:

 * ~mid-February => works fine
 * 10.3.2 => works fine
 * 10.2.9 => works fine
 * 10.1.6 => test is skipped, as it needs GL_ARB_buffer_storage to be implemented

So right now I assume that the problem got introduced and quickly solved between releases.
Comment 10 Tapani Pälli 2015-03-12 12:25:49 UTC
This was fixed by:

commit 10c82c6c5fc415d323a5e9c6acdc6a4c85d6b712
Author: Kristian Høgsberg <krh@bitplanet.net>
Date:   Mon Mar 2 16:19:52 2015 -0800

    i965: Fix uint64_t overflow in intel_client_wait_sync()
    
    DRM_IOCTL_I915_GEM_WAIT takes an int64_t for the timeout value but
    GL_ARB_sync takes an uint64_t.  Further, the ioctl used to wait
    indefinitely when passed a negative timeout, but it's been broken and
    now returns immediately in that case.  Thus, if an application passes
    UINT64_MAX to wait forever, we overflow to -1LL and return immediately.
    Work around this mess by clamping the wait timeout to INT64_MAX.
    
    Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
    Reviewed-by: Chad Versace <chad.versace@intel.com>
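
The clamping described in the commit message can be sketched as follows. This is only an illustration of the idea, not the actual Mesa change; the helper name clamp_wait_timeout is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert an unsigned GL_ARB_sync timeout (in
 * nanoseconds) to the signed value the wait ioctl takes. Values above
 * INT64_MAX are clamped so they cannot wrap around to a negative number.
 */
static int64_t clamp_wait_timeout(uint64_t gl_timeout_ns)
{
    if (gl_timeout_ns > (uint64_t)INT64_MAX)
        return INT64_MAX;
    return (int64_t)gl_timeout_ns;
}
```

For example, clamp_wait_timeout(UINT64_MAX) yields INT64_MAX, whereas a plain cast of UINT64_MAX to int64_t typically yields -1 on two's-complement platforms, which is the value that made the wait return immediately.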
Comment 11 Mark Janes 2015-11-13 16:58:38 UTC
This bug still occurs intermittently:
http://otc-mesa-ci.jf.intel.com/job/mesa_master_daily/1650/

/tmp/build_root/m32/lib/piglit/bin/bufferstorage-persistent read -auto -fbo
Probe [0] failed. Expected: 17.000000  Observed: 0.000000
Probe [1] failed. Expected: 13.000000  Observed: 0.000000
Probe [3] failed. Expected: 17.000000  Observed: 0.000000
Probe [4] failed. Expected: 18.000000  Observed: 0.000000
Probe [6] failed. Expected: 12.000000  Observed: 0.000000
Probe [7] failed. Expected: 13.000000  Observed: 0.000000
Probe [9] failed. Expected: 12.000000  Observed: 0.000000
Probe [10] failed. Expected: 18.000000  Observed: 0.000000
Probe [12] failed. Expected: 27.000000  Observed: 0.000000
Probe [13] failed. Expected: 13.000000  Observed: 0.000000
Probe [15] failed. Expected: 27.000000  Observed: 0.000000
Probe [16] failed. Expected: 18.000000  Observed: 0.000000
Probe [18] failed. Expected: 22.000000  Observed: 0.000000
Probe [19] failed. Expected: 13.000000  Observed: 0.000000
Probe [21] failed. Expected: 22.000000  Observed: 0.000000
Probe [22] failed. Expected: 18.000000  Observed: 0.000000
Probe [24] failed. Expected: 37.000000  Observed: 0.000000
Probe [25] failed. Expected: 13.000000  Observed: 0.000000
Probe [27] failed. Expected: 37.000000  Observed: 0.000000
Probe [28] failed. Expected: 18.000000  Observed: 0.000000
Probe [30] failed. Expected: 32.000000  Observed: 0.000000
Probe [31] failed. Expected: 13.000000  Observed: 0.000000
Probe [33] failed. Expected: 32.000000  Observed: 0.000000
Probe [34] failed. Expected: 18.000000  Observed: 0.000000
Probe [36] failed. Expected: 47.000000  Observed: 0.000000
Probe [37] failed. Expected: 13.000000  Observed: 0.000000
Probe [39] failed. Expected: 47.000000  Observed: 0.000000
Probe [40] failed. Expected: 18.000000  Observed: 0.000000
Probe [42] failed. Expected: 42.000000  Observed: 0.000000
Probe [43] failed. Expected: 13.000000  Observed: 0.000000
Probe [45] failed. Expected: 42.000000  Observed: 0.000000
Probe [46] failed. Expected: 18.000000  Observed: 0.000000
Comment 12 Mark Janes 2015-11-19 16:58:09 UTC
Additionally, arb_buffer_storage tests fail in the same fashion on hsw:

piglit.spec.arb_buffer_storage.bufferstorage-persistent read client-storage.hswgt1m64 (from piglit)
Failing for the past 1 build (Since Unstable#258206 )
Took 17 ms.
Standard Output

/tmp/build_root/m64/lib/piglit/bin/bufferstorage-persistent read client-storage -auto -fbo
Probe [0] failed. Expected: 17.000000  Observed: 0.000000

Standard Error


start time: 1447930617.42
end time: 1447930617.44
Comment 13 Mark Janes 2016-03-02 00:26:48 UTC
I can't reproduce this bug with kernel 4.4 and the latest Mesa.

