Bug 97532

Summary: Regression: GLB 2.7 & Glmark-2 GLES versions segfault due to linker precision error (259fc505) on dead variable
Product: Mesa Reporter: Eero Tamminen <eero.t.tamminen>
Component: glsl-compilerAssignee: Ian Romanick <idr>
Status: VERIFIED FIXED QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium CC: tfiga
Version: git   
Hardware: Other   
OS: All   
See Also: https://github.com/glmark2/glmark2/issues/25
https://bugs.freedesktop.org/show_bug.cgi?id=97804
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 103491    
Attachments: patch
Patch for environment variable that turns error into warning
A patch that implements a drirc option

Description Eero Tamminen 2016-08-29 10:46:07 UTC
Following commit breaks all GLBenchmark 2.7 tests:
--------------------------------------------------
commit 259fc505454ea6a67aeacf6cdebf1398d9947759
Author:     Ian Romanick <ian.d.romanick@intel.com>
AuthorDate: Tue May 24 12:04:53 2016 -0700
Commit:     Ian Romanick <ian.d.romanick@intel.com>
CommitDate: Fri Aug 26 15:03:15 2016 -0700

    glsl/linker: Fail linking on ES if uniform precision qualifiers don't match
    
    When GL_OES_geometry_shader is enabled, this fixes
    dEQP-GLES31.functional.shaders.linkage.geometry.uniform.rules.type_mismatch_1.
    
    Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
    Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
--------------------------------------------------

Wit that commit in, GLBenchmark 2.7 segfaults to:
--------------------------------------------------
...
Log: glDiscardFramebufferEXT = 0x7ff44873b0a0
Log: error: declarations for uniform `fragmentColorVP` have mismatching precision qualifiers

Program received signal SIGSEGV, Segmentation fault.
0x0000000000482cf0 in cFont::Draw(float, float) ()
(gdb) bt
#0  0x0000000000482cf0 in cFont::Draw(float, float) ()
#1  0x000000000043233a in GLB::Application::DrawFont(char const*) ()
#2  0x00000000004336ad in GLB::Application::RenderLoadingScreen() ()
#3  0x000000000042e1ec in main ()
(gdb) disassemble
...
   0x0000000000482cd0 <+240>:	callq  0x42d940 <glUniform3fv@plt>
   0x0000000000482cd5 <+245>:	mov    0xa0(%rbx),%eax
   0x0000000000482cdb <+251>:	mov    0x2ae45e(%rip),%rbp        # 0x731140
   0x0000000000482ce2 <+258>:	mov    0x2ae15f(%rip),%r14        # 0x730e48
   0x0000000000482ce9 <+265>:	mov    0x2ae2a8(%rip),%r13        # 0x730f98
=> 0x0000000000482cf0 <+272>:	movb   $0x1,0x0(%rbp,%rax,2)
   0x0000000000482cf5 <+277>:	mov    0xa4(%rbx),%eax
...
(gdb) info registers 
rax            0xffffffff	4294967295
rbx            0x748188	7635336
rcx            0xb	11
rdx            0x7c5280	8147584
rsi            0x502	1282
rdi            0x7a1268	8000104
rbp            0x734d80	0x734d80 <GLB::OpenGLStateManager::m_desired_enabled_vertex_attrib_arrays>
--------------------------------------------------

With the patch reverted, GLB works fine.

Problem is a vertex/fragment shader pair compiled at GLB startup, which declares the same (above mentioned) uniform in both shaders.  Vertex shader uses high precision and fragment shader medium precision with GLES (typical case).

Vertex shader doesn't use the given uniform, so the error is bogus.  Errors like this should be handled only after optimizations, not before.

Erroring on dead code is nasty, warning on dead code would be OK in this case, but in general warnings on dead code are nasty too (e.g. bogus warnings about array accesses that are guarded by correct array size checks, like in some of the 4.x GCC versions).

Most of the GLB tests don't use that shader, and T-Rex test uses it only at startup.  T-Rex in GfxBench 3.1 GLES version doesn't have that shader, so GfxBench works fine.

Question: Is this linker error relevant/appropriate for GLES v2.x, or should it be generated only for v3.x?
Comment 1 Mark Janes 2016-08-31 15:40:49 UTC
My GLBenchmark 2.7 egypt/trex automation continues to provide results.  Based on this bug report, I would expect it to crash.

Can you provide the invocation of the crashing benchmark?  I'd like to reproduce.
Comment 2 Eero Tamminen 2016-08-31 15:55:38 UTC
For example:
./build_x86_64/binaries/GLBenchmark -data data -w 1366 -h 768 -t GLB27_TRex_C24Z16_FixedTimeStep

But basically anything should crash right at start when GLB 2.7 compiles all of its shaders.

Maybe you have somehow different version of GLB 2.7.  If you use MESA_SHADER_DUMP_PATH to get shaders out of GLB, do you see the indicated variable in shaders:
$ grep fragmentColorVP *
FS_d049e6b822a18cb092e02530736ad6229104e9bb.glsl:uniform vec3 fragmentColorVP;
FS_d049e6b822a18cb092e02530736ad6229104e9bb.glsl:	gl_FragColor = tex * vec4(fragmentColorVP, 1.0);
VS_51d5e7cdfb8adc088cc5592d75b01fea7bf0ee69.glsl:uniform vec3 fragmentColorVP;

?
Comment 3 Kenneth Graunke 2016-08-31 21:36:47 UTC
My copy of GLB 2.7 continues to work fine as well, both in GL and ES builds.  The shaders in my copy do not contain that code.
Comment 4 Tapani Pälli 2016-09-01 06:26:28 UTC
(In reply to Eero Tamminen from comment #0)
> Question: Is this linker error relevant/appropriate for GLES v2.x, or should
> it be generated only for v3.x?

GLSL ES 1.0.17 states that precision qualifiers for uniforms must match (in section 10 Issues) so looks appropriate.
Comment 5 Tapani Pälli 2016-09-01 08:39:22 UTC
(In reply to Tapani Pälli from comment #4)
> (In reply to Eero Tamminen from comment #0)
> > Question: Is this linker error relevant/appropriate for GLES v2.x, or should
> > it be generated only for v3.x?
> 
> GLSL ES 1.0.17 states that precision qualifiers for uniforms must match (in
> section 10 Issues) so looks appropriate.

having said that, when linking we should know if uniform was used at all by the earlier shader stage and could utilize this information to print only warning in this case .. should first verify if this would be ok by the spec and check other vendor implementations how they behave in such situation
Comment 6 Eero Tamminen 2016-09-01 09:41:11 UTC
(In reply to Kenneth Graunke from comment #3)
> My copy of GLB 2.7 continues to work fine as well, both in GL and ES builds.
> The shaders in my copy do not contain that code.

Because GfxBench T-Rex version doesn't contain that either, I guess your GLB 2.7 version is newer than the GLB 2.7 we're using.

GLB test-cases shaders contain huge amounts of dead code, so it's not surprise if some of that is removed in minor version updates (some other driver may have complained about this issue also).

If your GLB 2.7 is also an official version, we can update GLB 2.7.  I think the error is still a bug though.

(I was wondering why the dead uniform hadn't been removed before linking, but Tapani said any shader stage can define uniform location used in the shader stages.  I.e. I guess info for all declared uniforms is retained to linking stage in case other stages were missing location info.)
Comment 7 Tapani Pälli 2016-09-01 10:33:45 UTC
Created attachment 126156 [details] [review]
patch

Here's somewhat experimental patch to make it warning instead of error if existing uniform was not read/written in the validate globals check.
Comment 8 Kenneth Graunke 2016-09-02 01:15:12 UTC
You have to enforce language rules regardless of whether or not code is used.

Nobody would suggest that a C compiler should compile:

void main()
{
   int x = "lol, who needs rules?";
   return 0;
}

just because 'x' isn't used.  That's crazy.
Comment 9 Ian Romanick 2016-09-02 03:10:26 UTC
(In reply to Kenneth Graunke from comment #8)
> You have to enforce language rules regardless of whether or not code is used.
> 
> Nobody would suggest that a C compiler should compile:
> 
> void main()
> {
>    int x = "lol, who needs rules?";
>    return 0;
> }
> 
> just because 'x' isn't used.  That's crazy.

Except that's legal C. :)

Either way... I added that check specifically for a CTS test.  If, as Tapani says, the same text exists in earlier versions of GLSL ES, then we should keep the err.
Comment 10 Tapani Pälli 2016-09-02 04:54:17 UTC
(In reply to Kenneth Graunke from comment #8)
> You have to enforce language rules regardless of whether or not code is used.
> 
> Nobody would suggest that a C compiler should compile:
> 
> void main()
> {
>    int x = "lol, who needs rules?";
>    return 0;
> }
> 
> just because 'x' isn't used.  That's crazy.

agreed, I sent the patch only because spec language is a bit vague (surprise!), it says:

"If uniforms are used in both the vertex and fragment shaders, developers should be warned if the precisions are different. Conversion of precision should never be implicit.

... blah blah ..

RESOLUTION: Yes, precision qualifiers for uniforms must match."

so ... conversion of precision should never happen implicitly, and precision qualifiers _must_ match, however developers should be 'warned'
Comment 11 Eero Tamminen 2016-09-02 10:45:23 UTC
(In reply to Kenneth Graunke from comment #8)
> You have to enforce language rules regardless of whether or not code is used.
> 
> Nobody would suggest that a C compiler should compile:
> 
> void main()
> {
>    int x = "lol, who needs rules?";
>    return 0;
> }
> 
> just because 'x' isn't used.  That's crazy.

C-code variant of this would be:
----------- foo.c -----------
#include <stdint.h>
extern int32_t x;
----------- bar.c -----------
#include <stdint.h>

extern int64_t x;

void do_something_with_x(...)
{
...
}
-----------------------------

And question would be whether C-linker should warn that global variable 'x' has been declared with different type/size in different objects, because program where this library would be linked can only be providing one type.

Whether things work in above example, depends only on whether the int64_t type matches the (at this moment unknown) "external" type, because that's the only type that's actually used by the code.  Usage here (in above C-code, and Egypt shaders) isn't dependent on whether the code is optimized, because the other declared type is NOT used at all, it's just declared.  Spec talks only about mismatch in use, not definition.


One can say that linker doesn't know whether the reason why uniform isn't used in the linked code is result of shader stage optimization or not, i.e. is it in use in unoptimized code.  That's why it must definitely be warned about.

However, only the case where different types are actually in use, is the only case where it's certain that it's not going to work -> clearly an error where program should be stopped.


The reason why I don't think there should be an error about the first case is that it can easily be a regression on fully functional programs (like here), and those regressions will go unnoticed because Mesa's regular testing is mainly for GL, there's not much testing for GLES (WebGL etc).
Comment 12 Kenneth Graunke 2016-09-04 08:13:49 UTC
(In reply to Eero Tamminen from comment #11)
> The reason why I don't think there should be an error about the first case
> is that it can easily be a regression on fully functional programs (like
> here), and those regressions will go unnoticed because Mesa's regular
> testing is mainly for GL, there's not much testing for GLES (WebGL etc).

That's not true at all...we run dEQP and the ES conformance suite on every commit.  That's something like 90,000 GLES-only tests, which is definitely tests than we run for GL.

You have a broken application.  If the latest release works, you should upgrade.  If it doesn't work, we should contact Kishonti and ask them to fix it.  It's a simple fix and I believe they would be responsive.
Comment 13 Kenneth Graunke 2016-09-06 19:29:03 UTC
Eero is right, the GLES version of GLBenchmark breaks.  The GL version works fine.

I double checked the rules and Ian's patch is absolutely correct - the ES 3.2 spec says:

"Uniforms in shaders all share a single global name space when
 linked into a program or separable program. Hence, the types,
 precisions and any location specifiers of all declared uniform
 variables with the same name must match across shaders that
 are linked into a single program."

In addition to the uniform not actually being used in the vertex shader...neither declaration is actually qualified with a specific precision.  It's just using the default precision in both cases.  One thought would be to enforce the "must match" rule when explicitly qualified, and take the maximum if all declarations are inheriting a default precision.  (I'm trying to think if there's some weasily way this could actually be legal - it seems surprising that this app has been out for 3 years and no other ES vendor has hit this bug...)

Or perhaps we just need to add a driconf option to either resort to "take the max" or just force everything to highp (like in desktop GL).

Ian, thoughts?
Comment 14 Eero Tamminen 2016-09-07 09:52:32 UTC
(In reply to Kenneth Graunke from comment #12)
> That's not true at all...we run dEQP and the ES conformance suite on every
> commit.  That's something like 90,000 GLES-only tests, which is definitely
> tests than we run for GL.

Sorry, I was unclear.  I wasn't thinking of API & unit test-cases. Such tests are naturally maintained & fixed (Mesa commit breaking GLB was done to fix conformance test).

By "use-cases" I meant real world GLES apps, games and benchmarks, things that users may be using years after buying them and their maintenance having stopped. E.g. stuff in Android & ChromeOS app stores that's still sold but not anymore maintained, or something like GLB 2.7.


(In reply to Kenneth Graunke from comment #13)
> Eero is right, the GLES version of GLBenchmark breaks.  The GL version works
> fine.
> 
> I double checked the rules and Ian's patch is absolutely correct - the ES
> 3.2 spec says:
> 
> "Uniforms in shaders all share a single global name space when
>  linked into a program or separable program. Hence, the types,
>  precisions and any location specifiers of all declared uniform
>  variables with the same name must match across shaders that
>  are linked into a single program."

3.1 & 3.2 specs are explicit that declarations must match.  3.0 was a bit more vague whether this applies to non-active uniform declarations.  2.0 spec doesn't say anything about matching, just a bit about whether uniforms are active or not.


[...]
> - it seems surprising that this app has been out for 3 years and no
> other ES vendor has hit this bug...)

GLB is GLES 2.x, and the matching requirement is only in GLES 3.x spec.


> Or perhaps we just need to add a driconf option to either resort to "take
> the max" or just force everything to highp (like in desktop GL).

Or give warnings for apps requesting GLES 2.x context, when both of the compared stage's uniforms aren't used, and error only when both are in use, or context is GLES 3.x+?

Tricky thing here is that AFAIK Mesa gives GLES 3.x context also for apps requesting 2.x one (which also causes them to be much slower when they use 1x1 cube maps, due to GLES 3.x sampling rule about cube map corners).
Comment 15 Eero Tamminen 2016-09-07 13:32:49 UTC
Glmark2 GLES version in latest Ubuntu (16.04) also segfaults to this in following subtests:
- ideas (modelview uniform declared also in FS but not used)
- jellyfish (uCurrentTime uniform used both in VS & FS)
- terrain (viewMatrix uniform declared also in FS but not used)

For example:
-------------------------
$ apt-get install glmark2-es2
$ glmark2-es2 -b terrain
...
Error: Failed to link program created from files None and None:  error: declarations for uniform `viewMatrix` have mismatching precision qualifiers
Segmentation fault (core dumped)
-------------------------

Like GLB, it's GLES v2 program.

Ideas & terrain can be run by fixing Tapani's patch to check whether uniform is unused in either of the stages (not just VS), but jellyfish test is using the uniform in both stages with different precision, so it has a real bug (which might show on backends that implement FP16 support).


Glmark2 is open source so it can be (eventually) fixed:
  https://github.com/glmark2/glmark2

But I don't think this and GLB are the only GLES 2.x programs with the problem.
Comment 16 Eero Tamminen 2016-09-13 15:42:06 UTC
Filed bug against glmark2 and checked the conformance test suite.

CTS tests check that also unused uniforms precision check fails and has check also for GLSL_VERSION_100_ES.  I.e. giving just warning on unused declarations, or with GLES <v3 won't fly.

I guess this needs another drirc/env setting, which is specified in drirc e.g. for "GLBenchmark" and "glmark2-es2" binaries.  My proposal for setting name would be "linker_precision_check", which defaults to true.
Comment 17 Eero Tamminen 2016-09-15 08:33:33 UTC
Upstream fixed glmark2-es2 test-suite shader issues.  Jellyfish one turned out to be bug in Mesa precision handling (see bug 97804).
Comment 18 Eero Tamminen 2016-09-16 16:05:28 UTC
Created attachment 126582 [details]
Patch for environment variable that turns error into warning

I experimented with adding drirc option for this, but had no good solution for that [1].  So I opted for just adding an environment variable.  Does that seem OK?


[1] Compiler part of Mesa didn't have any drirc stuff yet, so it would need drirc parsing somewhere higher, then either adding member to shader struct for the option, or propagating options stuff along function calls. Or making some drirc options global available.

I have working patch also of last "solution", but it's really awful (couldn't decide where options are initially read/cached, so I stuck it into library constructor, but it seems that xmlconfig.c gets included into a lot libs generated by Mesa...).  I can provide it for entertainment value if requested.
Comment 19 Eero Tamminen 2016-11-02 15:32:13 UTC
With Mesa bug 97804 and glmark2 upstream code fixed, GLB 2.7 is the only currently known test-suite/program this affects on *Linux desktop* [1].

Kenneth marked this early on as "NOTOURBUG" which is valid, but not necessarily best option.  GLB 2.7 is still used (despite its problems, it's probably most widely used GLES 2.x benchmark) and Mesa now enforcing GLES 3.x error for harmless & earlier accepted inconsistency in GLES 2.0 programs, is a functionality regression (although Intel backend doesn't yet even support FP16).

Kenneth, I'm opening this to get (your or somebody else) opinion on adding user workaround for this, e.g. the attached trivial patch.


[1] Linux desktop has only handful of GLES programs (and most of them compositors i.e. not really using complex shaders).  There could be other regressing cases on ChromeOS (WebGL www-pages) or Android.
Comment 20 Mark Janes 2016-11-04 03:40:37 UTC
Eero, we talked about this bug in our weekly meeting today, and agreed that it should be fixed as soon as possible, and put into the 13.0.1 release.  Everyone agreed that your fix was the right thing to do, but that it should be put behind a drirc option, so regular users benefit without knowing about an environment variable.

If you can put your patch behind a drirc option, then it will be reviewed and merged.  If you don't want to do that, we can assign it to someone else.  You've put a lot of work into this bug, and I'd like for you to get the credit for it in the git history.
Comment 21 Eero Tamminen 2016-11-04 08:55:25 UTC
(In reply to Mark Janes from comment #20)
> Eero, we talked about this bug in our weekly meeting today, and agreed that
> it should be fixed as soon as possible, and put into the 13.0.1 release. 
> Everyone agreed that your fix was the right thing to do, but that it should
> be put behind a drirc option, so regular users benefit without knowing about
> an environment variable.
>
> If you can put your patch behind a drirc option, then it will be reviewed
> and merged.  If you don't want to do that, we can assign it to someone else.

I think that should be done by somebody who has better opinion on what's best place in the compiler side to initialize drirc stuff, and what's the best place to pass the info from there to the linker, than me.

Even better would be refactoring all the separate places in Mesa where drirc is initialized sp that it's done only in one place, but that would be a *lot* more work (Martin had some initial patches for that from long ago, but he gave up on it)...
Comment 22 Kenneth Graunke 2016-11-08 23:19:18 UTC
Just discovered another reason I didn't notice this bug: this shader is only used for the loading screen.  If you run it with -skip_load_frames, it works fine.
Comment 23 Kenneth Graunke 2016-11-09 00:00:19 UTC
Created attachment 127853 [details] [review]
A patch that implements a drirc option

Here's a patch that implements a drirc option.
Comment 24 Ian Romanick 2016-11-09 00:18:05 UTC
We implement the specification.  In cases where the specification does not match the reality of shipping applications and shipping implementations, we work to change the specification.

The only application that does not work is one that has been abandoned by the developer because it was replaced by a newer version that does work.  Moreover, there is a workaround to allow the older version to work.

(In reply to Eero Tamminen from comment #19)
> [1] Linux desktop has only handful of GLES programs (and most of them
> compositors i.e. not really using complex shaders).  There could be other
> regressing cases on ChromeOS (WebGL www-pages) or Android.

I suspect this will not be a problem for WebGL.  WebGL conformance is even more picky about such restrictions that the native API CTS.

There have been no reports of Android applications failing due to this change.  Should there be any problems with Android applications, they will be handled on a case-by-case basis.
Comment 25 Eero Tamminen 2016-11-09 14:40:00 UTC
(In reply to Kenneth Graunke from comment #22)
> Just discovered another reason I didn't notice this bug: this shader is only
> used for the loading screen.  If you run it with -skip_load_frames, it works
> fine.

Thanks, I didn't know about that, it indeed works fine.


(In reply to Kenneth Graunke from comment #23)
> Created attachment 127853 [details] [review] [review]
> A patch that implements a drirc option
> 
> Here's a patch that implements a drirc option.

Thanks, this works fine too, and the patch is much smaller than I expected!

Comment added at the end of patch would need to be updated before commiting.  

This same override would be needed also for currently available glmark2-es2 releases (fixes for the bugs are only in glmark2 git repo).


(In reply to Ian Romanick from comment #24)
> (In reply to Eero Tamminen from comment #19)
> > [1] Linux desktop has only handful of GLES programs (and most of them
> > compositors i.e. not really using complex shaders).  There could be other
> > regressing cases on ChromeOS (WebGL www-pages) or Android.
> 
> I suspect this will not be a problem for WebGL.  WebGL conformance is even
> more picky about such restrictions that the native API CTS.
>
> There have been no reports of Android applications failing due to this
> change.  Should there be any problems with Android applications, they will
> be handled on a case-by-case basis.

With the GLB workaround option available, this is fine for me.  If there are other buggy proprietary apps, Kenneth's patch can be applied. -> Verified

---
We just don't know whether there are other problematic apps:
* Different WebGL scripts on random www-pages aren't conformance tested.  They can have this bug if they're developed against Browsers & 3D driver versions that haven't (had) this check
* AFAIK there aren't Android devices shipping with Mesa 13.0 (which is the first version with this issue), so there cannot be any reports of issues yet
* I didn't find (by googling) any reports of GLB 2.x failing with any other driver (and it works fine with GLES version of Intel Windows driver)
* Glmark2 had the same issue and nobody had noticed anything on Linux or Android with other drivers (at least I was the first one to file a bug about it)
Comment 26 Tapani Pälli 2017-02-03 10:35:08 UTC
this issue is hit by 'Gears4Android', if we will have some more interesting apps hitting it then we will likely implement workaround like in comment #23
Comment 27 Mauro Rossi 2017-07-17 21:22:10 UTC
Hi,

several Android apps are affected besides Gears,
as examples: Rajawali and Dragonball Z Dokkan Battle
featuring black screens like Gears, because of linker_error()

As a Proof of Concept linker_error() was replaced by linker_warning()
and Rajawali and Dragonball Z Dokkan Battle are working again.

For Android drirc option is unpractical

Mauro
Comment 28 Eero Tamminen 2017-08-14 14:48:32 UTC
(In reply to Mauro Rossi from comment #27)
> several Android apps are affected besides Gears,
> as examples: Rajawali and Dragonball Z Dokkan Battle
> featuring black screens like Gears, because of linker_error()
> 
> As a Proof of Concept linker_error() was replaced by linker_warning()
> and Rajawali and Dragonball Z Dokkan Battle are working again.

Was it enough to replace error with warning only in the case where the mismatching variable declarations were unused, or did you need to disable error also for variables that were actually used?
Comment 29 Tomasz Figa 2017-08-31 08:22:34 UTC
This is also affecting Forge of Empires on Android. With Kenneth's patch and the drirc option enabled for <application name="Default"> the game starts working again.
Comment 30 Eero Tamminen 2017-08-31 09:36:17 UTC
(In reply to Tomasz Figa from comment #29)
> This is also affecting Forge of Empires on Android. With Kenneth's patch and
> the drirc option enabled for <application name="Default"> the game starts
> working again.

IMHO it would be good the check whether the problem is just with a dead variable, or is there a real functional issue with the shader.

(If other drivers have accepted latter, they're pretty broken too...)

If env vars work with Android, you can do that either by using my original patch:
https://bugs.freedesktop.org/attachment.cgi?id=126582

Otherwise you could add the check from it to Kenneth's:
   if ((existing->data.used && var->data.used) 
   -> still error
   else
   -> warning

Or dump & inspect the shader code directly.
Comment 31 Tomasz Figa 2017-08-31 12:05:41 UTC
(In reply to Eero Tamminen from comment #30)
> (In reply to Tomasz Figa from comment #29)
> > This is also affecting Forge of Empires on Android. With Kenneth's patch and
> > the drirc option enabled for <application name="Default"> the game starts
> > working again.
> 
> IMHO it would be good the check whether the problem is just with a dead
> variable, or is there a real functional issue with the shader.
> 
[...]
> 
> Or dump & inspect the shader code directly.

I have a full trace captured with GAPID, which includes shaders code. Can't share here due to potential legal issues, but I can share my observations:

- vertex shader starts with "precision highp float",
- fragment shader starts with "precision mediump float",
- both shaders define "uniform mat4 M1" after that,
- both shaders define "uniform mat4 M2" after that,
- "M1" is not used in either of the shaders,
- "M2" is used only in vertex shader,
- the link fails with "error: declarations for uniform `M1` have mismatching precision qualifiers"
Comment 32 Tomasz Figa 2017-08-31 12:10:54 UTC
(In reply to Tomasz Figa from comment #31)
> (In reply to Eero Tamminen from comment #30)
> > (In reply to Tomasz Figa from comment #29)
> > > This is also affecting Forge of Empires on Android. With Kenneth's patch and
> > > the drirc option enabled for <application name="Default"> the game starts
> > > working again.
> > 
> > IMHO it would be good the check whether the problem is just with a dead
> > variable, or is there a real functional issue with the shader.
> > 
> [...]
> > 
> > Or dump & inspect the shader code directly.
> 
> I have a full trace captured with GAPID, which includes shaders code. Can't
> share here due to potential legal issues, but I can share my observations:
> 
> - vertex shader starts with "precision highp float",
> - fragment shader starts with "precision mediump float",
> - both shaders define "uniform mat4 M1" after that,
> - both shaders define "uniform mat4 M2" after that,
> - "M1" is not used in either of the shaders,
> - "M2" is used only in vertex shader,
> - the link fails with "error: declarations for uniform `M1` have mismatching
> precision qualifiers"

Replying myself, looks like the application is buggy or we are misunderstanding the spec (my understanding is exactly as found earlier in thhis bug).
Comment 33 Eero Tamminen 2017-08-31 13:33:10 UTC
Ok, so it's the same case as with GLB (and git version of fixed Glmark), the shaders are violating the spec, but it's not a functionality problem because variables in question are just declared with conflicting precision, but not used with conflicting precision.

It would be interesting to know whether the other drivers that allow this behavior, would fail when variables with mismatched precision are actually used across shader stages.

And does that behavior change depending on whether application is using GLES 2.x or GLES 3.x context (GLES 2.x spec didn't require this check, it came afterwards, but Mesa promotes all GLES 2.x contexts to 3.x).
Comment 34 Tomasz Figa 2017-09-10 03:25:26 UTC
The case seems to be developing further and it turns out some versions of GLES implementation for ARM Mali also prohibits such behavior:

https://github.com/cocos2d/cocos2d-x/issues/17256

It's likely that the engine gets fixed, but I wonder if app developers upgrade their apps, though.
Comment 35 Tomasz Figa 2017-09-10 03:51:32 UTC
By the way, it is not true that GLES 2.x didn't require this check, at least according to following thread:

https://github.com/AnalyticalGraphicsInc/cesium/issues/817

It refers to GLSL ES 1.00 specification (http://www.khronos.org/registry/OpenGL/specs/es/2.0/GLSL_ES_Specification_1.00.pdf) and quotes the following part:

> Do precision qualifiers for uniforms need to match?
> Option 1: Yes.
> Uniforms are defined to behave as if they are using the same storage in the
> vertex and fragment processors and may be implemented this way.
> If uniforms are used in both the vertex and fragment shaders, developers 
> should be warned if the precisions are different. Conversion of precision 
> should never be implicit.
> Option 2: No.
> Uniforms may be used by both shaders but the same precision may not be 
> available in both so there is a justification for allowing them to be 
> different.
> Using the same uniform in the vertex and fragment shaders will always require 
> the precision to be specified in the vertex shader (since the default 
> precision is highp). This is an unnecessary burden on developers.
> RESOLUTION: Yes, precision qualifiers for uniforms must match.
Comment 36 Eero Tamminen 2017-09-11 07:52:15 UTC
(In reply to Tomasz Figa from comment #35)
> By the way, it is not true that GLES 2.x didn't require this check, at least
> according to following thread:
> 
> https://github.com/AnalyticalGraphicsInc/cesium/issues/817
> 
> It refers to GLSL ES 1.00 specification
> (http://www.khronos.org/registry/OpenGL/specs/es/2.0/GLSL_ES_Specification_1.
> 00.pdf) and quotes the following part:
[...]

If you read it more carefully, it talks only about *used* uniforms.

GLES 3.x is more explicit and requires that *declared* uniforms have to match.


Different drivers could interpret *used* in different ways, as the match requirement being for uniforms that are:
* declared (same as GLES 3.x)
* referred after post-processing, or
* linked after all optimizations are done

(Last one is the only state that can cause real errors, but checking also declarations makes things easier for developers, things behave more consistently between drivers and different compiler versions.)

That's why I asked whether the problematic Android games actually use the uniforms or not.
Comment 37 Tomasz Figa 2017-09-11 07:56:26 UTC
Good point. I was an obvious example of interpreting "used" as "declared".
Comment 38 Ian Romanick 2017-09-25 21:00:25 UTC
(In reply to Eero Tamminen from comment #36)
> (In reply to Tomasz Figa from comment #35)
> > By the way, it is not true that GLES 2.x didn't require this check, at least
> > according to following thread:
> > 
> > https://github.com/AnalyticalGraphicsInc/cesium/issues/817
> > 
> > It refers to GLSL ES 1.00 specification
> > (http://www.khronos.org/registry/OpenGL/specs/es/2.0/GLSL_ES_Specification_1.
> > 00.pdf) and quotes the following part:
> [...]
> 
> If you read it more carefully, it talks only about *used* uniforms.
> 
> GLES 3.x is more explicit and requires that *declared* uniforms have to
> match.
> 
> 
> Different drivers could interpret *used* in different ways, as the match
> requirement being for uniforms that are:
> * declared (same as GLES 3.x)
> * referred after post-processing, or
> * linked after all optimizations are done

For the most part, the GLSL solves this problem by requiring things that are statically referenced to match.  This means rules are applied to some things that may be unnecessary, but it avoids the problem of depending on how hard a particular compiler will optimize things.

> (Last one is the only state that can cause real errors, but checking also
> declarations makes things easier for developers, things behave more
> consistently between drivers and different compiler versions.)
> 
> That's why I asked whether the problematic Android games actually use the
> uniforms or not.
Comment 39 Kenneth Graunke 2017-09-28 21:46:05 UTC
It sounds like we're going to take Tomasz's patch to relax this in GLSL ES 1.00.  We're not certain what the right behavior is for ES 3.00 / 3.10 / 3.20.  Ian is looking into this.
Comment 40 Mark Janes 2017-11-06 15:58:54 UTC
I could not tell if any fix for this bug has been made to master, let alone the stable branches.
Comment 41 Tomasz Figa 2017-11-07 04:03:45 UTC
Looks like Kenneth just pushed the fix:

https://cgit.freedesktop.org/mesa/mesa/commit/src/compiler/glsl?id=0886be093fb871b0b6169718277e0f4d18df3ea7
Comment 42 Eero Tamminen 2017-11-07 17:17:43 UTC
All my use-cases work nowadays without this fix.  -> Could somebody from the Android side (where the fix is still needed) verify this instead?
Comment 43 Tomasz Figa 2017-12-19 04:44:05 UTC
Verified that Android games that used to suffer from this problem are fine on 17.3 branch.
Comment 44 Eero Tamminen 2017-12-19 08:53:28 UTC
Thanks, marking the bug as verified.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.