Bug 67934 - [SNB/IVB/HSW 9.2 Bisected]Ogles2conform/GL2Tests/glUniform/glUniform.test fails with gnome-session enable compositing
Status: VERIFIED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 9.2
Hardware: All Linux (All)
Importance: high major
Assigned To: Chad Versace
Depends on:
Blocks: 67224
Reported: 2013-08-09 06:03 UTC by lu hua
Modified: 2013-08-26 05:49 UTC (History)

Attachments
output (22.58 KB, text/plain)
2013-08-14 08:24 UTC, lu hua
Test logs on Ivybridge (72.87 KB, text/plain)
2013-08-17 00:31 UTC, Anuj Phogat

Description lu hua 2013-08-09 06:03:30 UTC
System Environment:
--------------------------
Arch:             x86_64
Platform:         Sandybridge
Libdrm:		(master)libdrm-2.4.46-26-g3c967e715528ee52195c178c4d09d03b643f0c06
Mesa:		(9.2)10ff10c89e8a3d65c9c97564e010884ac8610ca5
Xserver:	(server-1.13-branch)xorg-server-1.13.4
Xf86_video_intel:(master)2.21.14-6-gc01c66bca2c64ae2d77233b6ccdca26431ee51b8
Cairo:		(master)46d9db96d460fea72f0420102e8a90c6a7231f79
Libva:		(master)d2dbc3f69c69e5933e7b3da429c0fb0cae5b98b0
Libva_intel_driver:(master)8bf807539c1790d6eee531373131672d38c82b31
Kernel:	(drm-intel-fixes) b250da79a0c972ef7f6d58ebd1083cab066e6c82

Bug detailed description:
-------------------------
The test fails on Sandybridge when gnome-session is running with compositing enabled. It fails on the Mesa 9.2 branch and works well on the master branch.
The following cases also have this issue:
GL2ExtensionTests_data_type_10_10_10_2_data_type_10_10_10_2.test
GL3Tests_transform_feedback2_transform_feedback2_framebuffer.test
GL3Tests_transform_feedback_transform_feedback_misc.test

Bisect shows: 9aeb967e75f01afe6df8f0033d129243279a8cd2 is the first bad commit.
commit 9aeb967e75f01afe6df8f0033d129243279a8cd2
Author:     Ian Romanick <ian.d.romanick@intel.com>
AuthorDate: Thu Jul 18 17:38:16 2013 -0700
Commit:     Ian Romanick <ian.d.romanick@intel.com>
CommitDate: Tue Aug 6 12:19:40 2013 -0700

    mesa: Treat glBindRenderbuffer and glBindRenderbufferEXT correctly

    Allow user-generated names for glBindRenderbufferEXT on desktop GL.
    Disallow its use altogether for core profiles.

    v2: Disallow glBindRenderbufferEXT in 3.1 by not installing it in the
    dispatch table.  Suggested by Jordan.

    Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
    Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> [v1]
    Cc: mesa-stable@lists.freedesktop.org
    (cherry picked from commit 97965e87fc0771a99c16b639caed01e5d0b64353)

Reproduce steps:
----------------
1. xinit
2. gnome-session
3. ./GTF -width=64 -height=64 -run=GL2Tests/glUniform/glUniform.test
Comment 1 lu hua 2013-08-14 01:59:33 UTC
It also happens on ivybridge and haswell.
Comment 2 Ian Romanick 2013-08-14 02:41:23 UTC
This patch exists on master and 9.2.  I don't think the bisect is correct.
Comment 3 lu hua 2013-08-14 08:24:30 UTC
I bisected it again.
a3f48d97cd0ea679f35e39176c4d8b960151e627 is the first bad commit
commit a3f48d97cd0ea679f35e39176c4d8b960151e627
Author:     Eric Anholt <eric@anholt.net>
AuthorDate: Fri Jun 21 15:34:52 2013 -0700
Commit:     Ian Romanick <ian.d.romanick@intel.com>
CommitDate: Mon Aug 5 17:07:15 2013 -0700

    egl: Restore "bogus" DRI2 invalidate event code.

    I had removed it in commit 1e7776ca2bc59a6978d9b933d23852d47078dfa8
    because it was obviously wrong -- why do we care whether the server is a
    version that emits events, if we're not watching for the server's events,
    anyway?  And why would you only invalidate on a server that emits
    invalidate events, when the comment said to emit invalidates if the server
    *doesn't*?  Only, I missed that we otherwise don't flag that our buffers
    might have changed at swap time at all, so the driver was only checking
    for new buffers when triggered by the Viewport hack.  Of course you don't
    expect Viewport to be called after a swap.

    So, this is effectively a revert of the previous commit, except that I
    dropped the check for only emitting invalidates on a new server -- we
    *always* need to invalidate if we're doing a SwapBuffers.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63435
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    Cc: "9.1 and 9.2" <mesa-stable@lists.freedesktop.org>
    (cherry picked from commit eed0a80137dfac641adfd39ce316938dbcf2be10)
Comment 4 lu hua 2013-08-14 08:24:51 UTC
Created attachment 84044 [details]
output
Comment 5 Anuj Phogat 2013-08-17 00:31:12 UTC
Created attachment 84159 [details]
Test logs on Ivybridge

Reproduced the failure on Ivybridge. The test passes most of the time when run with the -minfmt option. I also confirm that the observed failure is triggered by the bisected commit (a3f48d9). I do not understand the DRI2-related changes in the bisected commit. It seems that running the test continuously, without much delay between iterations, triggers the failure. Running the test without -minfmt does exactly that.
Comment 6 Chad Versace 2013-08-17 01:39:02 UTC
I have verified Lu Hua's bisect to

commit a3f48d97cd0ea679f35e39176c4d8b960151e627
Author:     Eric Anholt <eric@anholt.net>
AuthorDate: Fri Jun 21 15:34:52 2013 -0700
Commit:     Ian Romanick <ian.d.romanick@intel.com>
CommitDate: Mon Aug 5 17:07:15 2013 -0700

    egl: Restore "bogus" DRI2 invalidate event code.

using 

   cmd: ./GTF -width=64 -height=64 -run=GL2Tests/glUniform/glUniform.test
   haswell
   archlinux x86-64
   xorg-server-1.14.2-2
   gnome-shell-3.8.4-1 (with compositing enabled)

The test fails intermittently with a failure rate of approximately 5%.

I observed no failures when replacing gnome-shell with either
  - openbox, no compositing
  - openbox, with xcompmgr as compositor

Between commits during the bisect, I installed a proper Mesa package and restarted X, to ensure that the X server and GTF were using the same Mesa.

Paul is the resident expert on inconsistent test failures due to DRI2 bugs. Maybe he has a clue, because I sure don't.

Paul, I recall that you submitted DRI2 fixes to the X server. Are your fixes present in xorg-server-1.14.2? Are your fixes relevant to this bug?
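
A failure rate of about 5% means a single passing run says little about whether a commit is good. As a rough sketch (assuming each run fails independently with the same probability, which may not hold for a race between the client and the compositor), the number of consecutive passes needed before calling a commit "good" can be estimated as follows. The function names are illustrative and are not part of GTF or Mesa.

```python
import math


def prob_all_pass(runs, failure_rate):
    """Probability that a bad commit passes `runs` times in a row."""
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate, confidence=0.95):
    """Smallest number of consecutive passes after which the chance that a
    bad commit slipped through is at most 1 - confidence.

    Assumes independent runs with a constant failure rate.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    n = runs_needed(0.05)
    print(n, prob_all_pass(n, 0.05))
```

Under these assumptions, a 5% failure rate calls for several dozen clean runs per bisect step, which is why a bisect of an intermittent failure can easily land on the wrong commit.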
Comment 7 Chad Versace 2013-08-17 01:41:30 UTC
Paul, this bug bisects to a commit on the 9.2 branch that alters DRI2 behavior. And, you are our resident expert on erratic test failure due to DRI2 bugs.

I recall that you submitted DRI2 fixes to the X server. Are your fixes present in xorg-server-1.14.2? Are your fixes relevant to this bug, or the bisected commit?
Comment 8 Paul Berry 2013-08-17 16:16:32 UTC
(In reply to comment #7)
> Paul, this bug bisects to a commit on the 9.2 branch that alters DRI2
> behavior. And, you are our resident expert on erratic test failure due to
> DRI2 bugs.
> 
> I recall that you submitted DRI2 fixes to the X server. Are your fixes
> present in xorg-server-1.14.2? Are your fixes relevant to this bug, or the
> bisected commit?

It was actually Eric who submitted fixes to the X server, and his fixes landed (see xserver commit 77e51d5bbb97eb5c9d9dbff9a7c44d7e53620e68).  I don't know if that commit is relevant or not.

Here's what I discovered in my testing this morning:

- Checking out master and reverting a3f48d97cd0ea679f35e39176c4d8b960151e627 causes Mesa to avoid querying the X server for new buffers after a swap.  For some reason this makes the problem go away (although it's definitely not a solution--that commit is necessary!).

- If I run the test with "-imagefileio" and look at GL2Tests/glUniform/glUniformMatrix4fv_2loc_array_of_mat4_first_loc_no_transpose_second_loc_no_transpose_vert_64x64.{good,output,diff}.ppm I can see that the quad that was rendered looks fine, but the background of the image (the clear color) differs between the images.

- When drawing both the "good" and "output" images, the clear color is set to (0.1, 0.2, 0.3).  However, the clear color that actually gets drawn in the "good" case is (0.349, 0.482, 0.584).  This is what you would get if you sRGB encoded (0.1, 0.2, 0.3).

- Disabling fast color clears has no effect on the problem.

- Disabling the blorp color clear path makes the problem go away.


From this I'm inferring that:

(a) When Mesa receives a new buffer from the X server, somehow that is causing some piece of sRGB state to change.

(b) The blorp color clear path uses that state to decide whether to sRGB encode the clear color.  The meta path doesn't.


I still haven't figured out:

- Which piece of sRGB state is changing?

- Should it be changing or not?

- Should color clears use that piece of state to decide whether to sRGB encode the clear color?
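
The sRGB observation above can be checked numerically. The sketch below applies the standard sRGB transfer function to the clear color (0.1, 0.2, 0.3). The function name is illustrative, not a Mesa symbol. The computed values should agree with the observed (0.349, 0.482, 0.584) to within about one 8-bit step (1/255), since the observed values were read back from an 8-bit image.

```python
def srgb_encode(linear):
    """Standard sRGB transfer function: linear value in [0, 1] -> encoded."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1.0 / 2.4) - 0.055


CLEAR_COLOR = (0.1, 0.2, 0.3)        # value passed to glClearColor
OBSERVED = (0.349, 0.482, 0.584)     # value reported in the "good" image

encoded = tuple(srgb_encode(c) for c in CLEAR_COLOR)

if __name__ == "__main__":
    for lin, enc, obs in zip(CLEAR_COLOR, encoded, OBSERVED):
        print(f"linear={lin:.3f} encoded={enc:.4f} observed={obs:.3f}")
```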
Comment 9 Chad Versace 2013-08-21 01:11:59 UTC
I solved the bug.

The first call to glClear believes the buffer is SARGB. If the X server later re-allocates the color buffer, then later calls to glClear believe the buffer is ARGB.

The driver changes the renderbuffer's format from SARGB to ARGB during the first call to eglMakeCurrent. This works around a quirk in GLES3. The root cause of the bug is that the driver applies the workaround too late, *after* the renderbuffer's miptree is already allocated.

The fix is to move the workaround (intel_gles3_srgb_workaround) up two lines, before the call to intel_prepare_render.

Patches will arrive on mesa-dev after they get regression tested.
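
The ordering problem described above can be illustrated with a toy model. This is only a sketch: the class and functions are hypothetical stand-ins, not the real i965 data structures, and it assumes that a clear takes its format from the miptree, which captures the renderbuffer's format at allocation time.

```python
class Renderbuffer:
    """Hypothetical stand-in for a driver renderbuffer."""

    def __init__(self, fmt):
        self.format = fmt            # format recorded on the renderbuffer
        self.miptree_format = None   # None means no miptree allocated yet


def prepare_render(rb):
    """Stand-in for intel_prepare_render: allocate a miptree if missing."""
    if rb.miptree_format is None:
        rb.miptree_format = rb.format


def srgb_workaround(rb):
    """Stand-in for intel_gles3_srgb_workaround: SARGB becomes ARGB."""
    if rb.format == "SARGB":
        rb.format = "ARGB"


def make_current(rb, workaround_first):
    """Model of the first MakeCurrent, with the two calls in either order."""
    if workaround_first:
        srgb_workaround(rb)
        prepare_render(rb)
    else:
        prepare_render(rb)
        srgb_workaround(rb)


def server_reallocates(rb):
    """Model of the X server handing back a new color buffer after a swap."""
    rb.miptree_format = None
    prepare_render(rb)


def clear_formats(workaround_first):
    """Return the format seen by a clear before and after a reallocation."""
    rb = Renderbuffer("SARGB")
    make_current(rb, workaround_first)
    first = rb.miptree_format
    server_reallocates(rb)
    later = rb.miptree_format
    return first, later


if __name__ == "__main__":
    print("workaround after prepare_render: ", clear_formats(False))
    print("workaround before prepare_render:", clear_formats(True))
```

In this model, applying the workaround after the miptree exists leaves the first buffer as SARGB and every reallocated buffer as ARGB, so the clear color changes only when the server reallocates. Applying the workaround first gives ARGB in both cases.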
Comment 10 Chad Versace 2013-08-22 17:59:50 UTC
Fixed by the following commit on master, which will be cherry-picked to 9.2.

commit ce8639a766d0c36e676eea6f55135d9dccf1cb90
Author: Chad Versace <chad.versace@linux.intel.com>
Date:   Tue Aug 20 17:36:24 2013 -0700

    i965: Fix misapplication of gles3 srgb workaround
Comment 11 lu hua 2013-08-26 05:49:01 UTC
Verified. Fixed.