Bug 98330

Summary: [dEQP, EGL] dEQP-EGL.functional.buffer_age.no_preserve fails
Product: Mesa
Reporter: Mark Janes <mark.a.janes>
Component: Drivers/DRI/i965
Assignee: Tapani Pälli <lemody>
Status: RESOLVED FIXED
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium    
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 98315    
Attachments: fix
apitrace snapshot for "odd_clear_render_even_none"
apitrace snapshot for "odd_clear_render_even_none" with dummy clear
hack patch
mesa fix

Description Mark Janes 2016-10-19 18:07:09 UTC
Only on haswell, this category of tests fails with:

dEQP-EGL.functional.buffer_age.no_preserve.no_resize.odd_render_clear_even_render_render

Image comparison failed:
	allowed position deviation = (2, 2, 0)
	color threshold = (8, 8, 8, 0)
Comment 1 Mark Janes 2016-10-19 21:56:42 UTC
Correction.  These tests are flaky on HSW.  I've disabled them in the CI until this bug is addressed.
Comment 2 Mark Janes 2016-10-31 21:36:59 UTC
These tests may have been failing on Haswell because those machines were updated to Linux 4.7.

After updating a Skylake machine to Linux 4.7, I see similar failures.

Tapani, can you confirm?
Comment 3 Tapani Pälli 2016-11-02 07:19:45 UTC
These tests are skipped on my SKL machine:

"NotSupported (EGL_EXT_buffer_age is not supported at teglBufferAgeTests.cpp:401)"

and fail consistently on my HSW machine:

"Fail (Fail, buffer content is not well preserved when age > 0)"

dEQP version on both machines matches:
812d768b55dcedf2c0fda63e69db3c05600f379d

and Mesa as well:
9f0726f3e509c80c78ddb5e7411fa34f676de71d

The kernel on SKL is drm-nightly 4.8.0+; HSW has 4.7.9-200.fc24.x86_64. On top of this, I have a feeling that this test will behave differently on DRI2 vs DRI3 because of the preserved bit support ...

First I'll have to figure out what is required for EGL_EXT_buffer_age to be reported as supported.
Comment 4 Tapani Pälli 2016-12-09 08:29:10 UTC
It looks like the tests always verify the content regardless of whether the preserved bit is set. I've sent a patch to Google and am waiting for their feedback.
Comment 5 Mark Janes 2016-12-09 15:31:56 UTC
Please attach the patch here, and I'll use it in my testing.
Comment 6 Tapani Pälli 2016-12-09 15:51:03 UTC
Created attachment 128391 [details] [review]
fix

The patch changes the test to verify buffer pixels only when the preserved bit is set.
Comment 7 Mark Janes 2016-12-09 22:06:59 UTC
From the dEQP test engineers:

 From: Mika Isojarvi <misojarvi@google.com>
 Subject: Re: dEQP-EGL.functional.buffer_age.no_preserve bug
 To: Mark Janes <mark.a.janes@intel.com>
 Cc: deqp-external-requests <deqp-external-requests@google.com>
 Date: Fri, 09 Dec 2016 13:52:15 -0800

 Hi Mark,

 The preserve bit is not relevant for the EGL buffer age extensions. Setting
 the preserve bit means that the color buffer must be preserved over the
 eglSwapBuffers() call, so the color buffer content at the end of the frame
 and at the beginning of the next frame must be exactly the same. The EGL buffer
 age extension allows querying the age of the color buffer's contents at the
 beginning of the frame, e.g. if the driver returns 2 then the color buffer's
 contents must match the content from two frames ago (or it can return 0 if the
 color buffer contents don't match any previous frame). This guarantee does
 not require that the color buffer preserved bit is set.

 The test case uses the preserve bit to check that, if the preserve bit is
 set, the buffer age is always 1, meaning that the color buffer contents are
 the same as at the end of the previous frame.

 Regards,
 Mika
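
To make the semantics described in this reply concrete, here is a minimal Python sketch of a swap chain that tracks buffer age. It is a toy model, not dEQP or Mesa code: `SwapChain`, `simulate_ages` and the colour strings are hypothetical names chosen for illustration, and it assumes buffers are handed out in simple round-robin order.

```python
class SwapChain:
    """Toy swap chain: round-robin buffers, each remembering when it was last shown."""

    def __init__(self, num_buffers=2):
        self.contents = [None] * num_buffers    # None = undefined contents
        self.last_frame = [None] * num_buffers  # frame index of the last swap, if any
        self.frame = 0                          # index of the frame being drawn
        self.back = 0                           # index of the current back buffer

    def buffer_age(self):
        # 0 means "contents match no previous frame"; N means "N frames ago".
        last = self.last_frame[self.back]
        return 0 if last is None else self.frame - last

    def draw(self, content):
        self.contents[self.back] = content

    def swap(self):
        self.last_frame[self.back] = self.frame
        self.frame += 1
        self.back = (self.back + 1) % len(self.contents)


def simulate_ages(num_buffers, frames):
    """Draw one colour per frame and return the age reported at the start of each frame."""
    chain = SwapChain(num_buffers)
    history = []  # contents presented at each swap, oldest first
    ages = []
    for colour in frames:
        age = chain.buffer_age()
        ages.append(age)
        if age > 0:
            # The guarantee from the email: contents equal the frame `age` frames ago.
            assert chain.contents[chain.back] == history[-age]
        chain.draw(colour)
        history.append(chain.contents[chain.back])
        chain.swap()
    return ages


print(simulate_ages(2, ["red", "green", "blue", "white"]))
print(simulate_ages(1, ["red", "green", "blue", "white"]))
```

Under these assumptions a two-buffer chain reports age 0 for each fresh buffer and 2 afterwards, while a one-buffer chain reports 1 from the second frame on, which loosely mirrors the preserved case described above.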
Comment 8 Tapani Pälli 2016-12-10 06:15:36 UTC
Right, now I remember how this is supposed to work. So Mesa is quite likely just setting the wrong age; I will continue investigating.
Comment 9 Tapani Pälli 2016-12-28 07:11:48 UTC
Updating some status here since I've been debugging this for quite a long time now ... :/

Mesa correctly returns a buffer age of 0 for fresh buffers, and after that the age is always 2 (double buffering) on my system (using DRI3). I implemented an alternative age calculation for DRI3 buffers, but it gives the same result of 2, which I believe is correct. Right now I'm trying to understand the failure from the test's point of view, which compares the reference renderer's images to the GLES2 renderer's. I'm using the test "odd_clear_render_even_none", which is the 'simple case'.
Comment 10 Tapani Pälli 2017-01-02 12:08:22 UTC
I've noticed at least one thing that causes issues with tests that use the type 'none' as one method. In those tests there are frames where nothing happens between two swapbuffer commands; in these cases it looks like the backbuffer contents do not change (swap does not happen?). I will attach two images from apitrace: the first is from a regular run of the test, and in the second I'm making an empty clear (with scissor 0,0,0,0) whenever nothing happens in that frame.
Comment 11 Tapani Pälli 2017-01-02 12:09:20 UTC
Created attachment 128709 [details]
apitrace snapshot for "odd_clear_render_even_none"
Comment 12 Tapani Pälli 2017-01-02 12:09:44 UTC
Created attachment 128710 [details]
apitrace snapshot for "odd_clear_render_even_none" with dummy clear
Comment 13 Tapani Pälli 2017-01-05 11:31:33 UTC
Created attachment 128769 [details] [review]
hack patch

Here's a hack patch to dEQP that makes all of these tests pass. I think I understand why bad stuff happens; however, I'm not sure how to make things work correctly yet.
Comment 14 Tapani Pälli 2017-01-05 12:20:15 UTC
Created attachment 128770 [details] [review]
mesa fix

Patch sent to mesa-dev; this fixes all failures for me, and no regressions were seen in CI.
Comment 15 Tapani Pälli 2017-01-09 06:14:29 UTC
commit 8b43f4201129a5d11ebf314f9ae612289fd0994e
Author: Tapani Pälli <tapani.palli@intel.com>
Date:   Thu Jan 5 13:40:35 2017 +0200

    i965: call intel_prepare_render always when reading pixels
    
    Currently we do this only in the fallback code (when tiled memcpy
    version failed) but it needs to be done always so that we have
    correct read and write buffer in place. No regressions seen in CI.
    
    Fixes:
            dEQP-EGL.functional.buffer_age.*
    
    Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98330
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Chad Versace <chadversary@chromium.org>
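
The effect described in the commit message can be illustrated with a small Python toy model. This is not Mesa code and does not reproduce the actual driver logic: `Drawable`, `Context`, `prepare_render` and `read_back_sequence` are hypothetical names, and the stale-binding mechanism is an assumption made only to show why a read path that skips the "prepare" step could return the wrong buffer in frames where nothing is drawn.

```python
class Drawable:
    """Window-system side: owns the buffers and decides which one is the back buffer."""

    def __init__(self, num_buffers=2):
        self.buffers = [f"undefined-{i}" for i in range(num_buffers)]
        self.back = 0

    def swap(self):
        self.back = (self.back + 1) % len(self.buffers)


class Context:
    """Driver side: caches the back buffer index until it is told to refresh."""

    def __init__(self, drawable, always_prepare):
        self.drawable = drawable
        self.always_prepare = always_prepare
        self.bound = drawable.back

    def prepare_render(self):
        # Stand-in for the driver step that picks up the current buffers.
        self.bound = self.drawable.back

    def clear(self, colour):
        self.prepare_render()  # drawing always refreshes the binding
        self.drawable.buffers[self.bound] = colour

    def read_pixels(self):
        if self.always_prepare:  # models the behaviour after the fix
            self.prepare_render()
        return self.drawable.buffers[self.bound]


def read_back_sequence(always_prepare):
    """Clear in the first two frames, draw nothing in the last two, read back every frame."""
    drawable = Drawable()
    ctx = Context(drawable, always_prepare)
    seen = []
    for colour in ["red", "green", None, None]:  # None = frame of type 'none'
        if colour is not None:
            ctx.clear(colour)
        seen.append(ctx.read_pixels())
        drawable.swap()
    return seen


print(read_back_sequence(always_prepare=False))
print(read_back_sequence(always_prepare=True))
```

In this model, the third frame reads "green" without the refresh, because the cached binding still points at the buffer used by the last draw, and reads "red" once every read refreshes the binding first. That matches the observation in Comment 10 that the back buffer contents appear unchanged in 'none' frames.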
