Bug 95005 - Unreal engine demos segfault after shader compilation error with OpenGL 4.3
Summary: Unreal engine demos segfault after shader compilation error with OpenGL 4.3
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: glsl-compiler (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-18 22:26 UTC by Bas Nieuwenhuizen
Modified: 2016-06-01 03:35 UTC (History)
9 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Shader (2.00 KB, text/plain)
2016-04-18 22:26 UTC, Bas Nieuwenhuizen
Details
Hack (1.15 KB, text/plain)
2016-04-18 22:35 UTC, Bas Nieuwenhuizen
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Bas Nieuwenhuizen 2016-04-18 22:26:12 UTC
Created attachment 123034 [details]
Shader

The attached shader is used by the Elemental Demo, extracted with apitrace. It gets the following compilation error:

0:9(31): error: struct 'TEXCOORD0' previously defined

After some discussion on #dri-devel, we think this is invalid GLSL. 

However, given the large number of games using the engine we might still want to support it. (TODO: is it really the engine or just multiple demos?)

I think adding a drirc quick is going to be infeasible given that the engine seems to be used by many games.
Comment 1 Bas Nieuwenhuizen 2016-04-18 22:35:36 UTC
Created attachment 123036 [details]
Hack

The attached patch allows the demos to not crash and render similar images to when GL4.3 is not enabled. However, the patch is not a proper fix.
Comment 2 Ian Romanick 2016-04-18 22:50:58 UTC
Has anyone check to see of glslang accepts this?
Comment 3 Bas Nieuwenhuizen 2016-04-18 22:56:45 UTC
glslang produces

Warning, version 430 is not yet complete; most version-specific features are present, but some are missing.
ERROR: 0:9: '' :  syntax error
ERROR: 1 compilation errors.  No code generated.

for me. (No flags, just glslangValidator elemental.geom)

No very informative, but it seems to refer to line 9, which was also the location of the mesa error.
Comment 4 Ian Romanick 2016-04-18 23:14:05 UTC
Hmm... I think the shader may be correct.  Section 4.1.8 (Structures) of the GLSL 1.40 and later specs say (emphasis mine):

    "...where name becomes the user-defined type, and can be used to declare
    variables to be of this new type. The name shares the same name space as
    other variables, types, and functions. *All previously visible variables,
    types, constructors, or functions with that name are hidden.*"

Based on this, the second TEXCOORD0 should "shadow" the first.

It's worth noting that the GLSL 1.30 spec says:

    "...where name becomes the user-defined type, and can be used to declare
    variables to be of this new type. The name shares the same name space as
    other variables, types, and functions, with the same scoping rules."

I have no recollection why this change was made.  I could find nothing in the 1.40 spec to explain the change.  I believe Mesa implements the GLSL 1.30 rule in all GLSL versions.

Curiously, all versions of GLSL ES contain the GLSL 1.40 language.
Comment 5 Bas Nieuwenhuizen 2016-04-18 23:36:02 UTC
That created confusion on IRC too. I think that it isn't allowed based on

"All variable names, structure type names, and function names in a given scope share the same name space. Function names can be redeclared in the same scope, with the same or different parameters, without error. An implicitly sized array can be re-declared in the same scope as an array of the same base type. Otherwise, within one compilation unit, a declared name cannot be redeclared in the same scope; doing so results in a redeclaration compile-time error."

(4.2 Scoping, GLSL 4.30 spec)

So I'm not entirely sure if you quoted patch conflicts with this, or that it should be interpreted as hiding things on declaration, but not specifiying when the declaration is allowed.
Comment 6 Christoph Haag 2016-04-20 21:02:56 UTC
Just FYI: the patch from comment 1 helps for the (very) old unreal demos that can be downloaded from the unreal wiki, but the current unreal engine has a similar (something with TEXCOORD0 again), but different problem.
For demos with the current engine I only know of the unreal tournament alpha that can be downloaded free of cost from here after registering at the forums:
https://www.epicgames.com/unrealtournament/forums/showthread.php?12011-Unreal-Tournament-Pre-Alpha-Playable-Build

Here is an apitrace, 78 megabytes:
http://haagch.frickel.club/files/UE4-Linux-Shipping.trace

apitrace replay shows this error message:

32283 @2 glLinkProgram(program = 3071)
32283: warning: link failed
32283: warning: error: vertex shader output `out_TEXCOORD0' declared as type `#anon_struct_0001', but geometry shader input declared as type `#anon_struct_0003[3]'
error: vertex shader output `out_TEXCOORD1' declared as type `#anon_struct_0002', but geometry shader input declared as type `#anon_struct_0004[3]'

32284: message: major api error 3: GL_INVALID_OPERATION in glUseProgram(program 3071 not linked)
32284 @2 glUseProgram(program = 3071)
Comment 7 Ilia Mirkin 2016-04-20 23:04:06 UTC
(In reply to Christoph Haag from comment #6)
> 32283 @2 glLinkProgram(program = 3071)
> 32283: warning: link failed
> 32283: warning: error: vertex shader output `out_TEXCOORD0' declared as type
> `#anon_struct_0001', but geometry shader input declared as type
> `#anon_struct_0003[3]'
> error: vertex shader output `out_TEXCOORD1' declared as type
> `#anon_struct_0002', but geometry shader input declared as type
> `#anon_struct_0004[3]'

----------------
GS:

#version 430 
#extension GL_ARB_separate_shader_objects : enable
#define INTERFACE_LOCATION(Pos) layout(location=Pos) 
#define INTERFACE_BLOCK(Pos, Interp, Modifiers, Semantic, PreType, PostType) layout(location=Pos) Interp Modifiers struct { PreType PostType; }


in gl_PerVertex
{
        vec4 gl_Position;
        float gl_ClipDistance[6];
} gl_in[];
out gl_PerVertex
{
        vec4 gl_Position;
        float gl_ClipDistance[6];
};

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;

INTERFACE_BLOCK(0, noperspective , in , TEXCOORD0, vec2, Data) in_TEXCOORD0[3];
INTERFACE_BLOCK(1, , in , TEXCOORD1, uint, Data) in_TEXCOORD1[3];
INTERFACE_BLOCK(0, noperspective , out , TEXCOORD0, vec2, Data) out_TEXCOORD0;
INTERFACE_BLOCK(1, flat , out , HLSLCC_LAYER_INDEX, uint, Data) out_HLSLCC_LAYER_INDEX;

---------------
VS:

#version 430 
#extension GL_ARB_separate_shader_objects : enable
#define INTERFACE_LOCATION(Pos) layout(location=Pos) 
#define INTERFACE_BLOCK(Pos, Interp, Modifiers, Semantic, PreType, PostType) layout(location=Pos) Interp Modifiers struct { PreType PostType; }


out gl_PerVertex
{
        vec4 gl_Position;
        float gl_ClipDistance[6];
};
uniform ivec4 vu_i[1];
uniform vec4 vu_h[1];
INTERFACE_LOCATION(0) in vec2 in_ATTRIBUTE0;
INTERFACE_LOCATION(1) in vec2 in_ATTRIBUTE1;
INTERFACE_BLOCK(0, noperspective , out , TEXCOORD0, vec2, Data) out_TEXCOORD0;
INTERFACE_BLOCK(1, , out , TEXCOORD1, uint, Data) out_TEXCOORD1;

---------------

So it looks like they dropped the "semantic" which was giving the named types and now just have anon types. Which I guess don't match up between programs for some reason? Probably due to some array bs... that seems like an unrelated issue though.
Comment 8 Ian Romanick 2016-05-10 18:30:56 UTC
This seems related to a bug that was reported via the mesa-dev mailing list back in 2013:

https://lists.freedesktop.org/archives/mesa-dev/2013-November/048843.html
Comment 9 Dave Airlie 2016-05-17 01:03:07 UTC
https://patchwork.freedesktop.org/patch/87530/

https://patchwork.freedesktop.org/patch/87532/

Proposed fixes for both bugs.
Comment 10 Mark Janes 2016-05-20 12:57:09 UTC
The fix for this bug (pushed on master) regresses a piglit test:

piglit.spec.glsl-1_50.compiler.interface-blocks-name-reused-globally-4.vert

Is the test wrong, or is there a real regression here? No dEQP or CTS tests were affected.
Comment 11 Mark Janes 2016-05-20 15:59:43 UTC
The piglit test does indeed show a regression.  Mesa crashes on a NULL dereference.  I've sent a patch to the list to guard against this case.
Comment 12 Shawn Starr 2016-05-22 04:27:10 UTC
neither patch fixes UE4 4.5.1 based games, get a dialog box (with nothing inside it) and crashes sadly.
Comment 13 Alexandre Demers 2016-05-26 05:22:13 UTC
(In reply to Shawn Starr from comment #12)
> neither patch fixes UE4 4.5.1 based games, get a dialog box (with nothing
> inside it) and crashes sadly.

Which patches are you refering to? You need the three of them for your test. If you are using latest mesa code, all three patches (1 from Mark and 2 from Ian) are included.
Comment 14 Alexandre Demers 2016-05-27 20:47:46 UTC
Since both patches were pushed (and the piglit test fixed) and since nouveau driver is now enabling OpenGL 4.3 by default (for nvc0), what's preventing us from exposing OpenGL 4.3 for radeonsi also? Both drivers are concerned by the pushed patches, isn't it?

Was somebody able to test against a UE4 4.5.1 based game with the latest git code since Shawn's comment?
Comment 15 Bas Nieuwenhuizen 2016-05-27 20:51:12 UTC
A radeonsi GL4.3 patch was just pushed.

As far as Shawn's comment goes I think he's referring to the ARK games, which still fail. Those also have a separate bug at https://bugs.freedesktop.org/show_bug.cgi?id=95374 .
Comment 16 Marek Olšák 2016-05-30 18:34:43 UTC
Can this bug be closed then?
Comment 17 Mike Lothian 2016-06-01 01:42:22 UTC
Running this on Skylake doesn't seem to render anything for me, I just see what was behind the window previously - the music plays.

On Tonga the demo loads but it seems some of the textures aren't working properly, it's really noticeable on the running lava which looks like grey squares glued together
Comment 18 Mike Lothian 2016-06-01 01:45:25 UTC
With Skylake I just wasn't letting it run long enough:

Using binned.
Increasing per-process limit of core file size to infinity.
- Existing per-process limit (soft=18446744073709551615, hard=18446744073709551615) is enough for us (need only 18446744073709551615)
intel_do_flush_locked failed: Input/output error
Signal 11 caught.
LowLevelFatalError [File:C:\UnrealEngine\4.5-src\Engine\Source\Runtime\RenderCore\Private\RenderResource.cpp] [Line: 118] 
A FRenderResource was deleted without being released first!
Signal 11 caught.
EngineCrashHandler: Signal=11
EngineCrashHandler: Signal=11
Segmentation fault (core dumped)

This was in the dmesg:

[11929.845840] [drm] GPU HANG: ecode 9:0:0x8ed9fff2, in RenderThread 1 [31514], reason: Engine(s) hung, action: reset
[11929.847998] drm/i915: Resetting chip after gpu hang
Comment 19 Ilia Mirkin 2016-06-01 01:47:51 UTC
(In reply to Mike Lothian from comment #17)
> Running this on Skylake doesn't seem to render anything for me, I just see
> what was behind the window previously - the music plays.

Odd. It used to work earlier on, but now I get one frame of rendering (same as when nouveau was avoiding writing to images with the wrong number of components declared), and then black and then it dies with

intel_batchbuffer.c:421: _intel_batchbuffer_flush: Assertion `!brw->no_batch_wrap' failed.

I think this might need jljusten's series among other things, not sure.

> 
> On Tonga the demo loads but it seems some of the textures aren't working
> properly, it's really noticeable on the running lava which looks like grey
> squares glued together

The grayish lava instead of orange/red is something I saw on nouveau as well.
Comment 20 Michel Dänzer 2016-06-01 03:35:35 UTC
(In reply to Ilia Mirkin from comment #19)
> The grayish lava instead of orange/red is something I saw on nouveau as well.

Could be a UE4 bug (could be one of the reasons why UE4 doesn't seem to use the 4.3 path with the proprietary drivers).

Anyway, the bug this report is about is clearly fixed, any remaining issues should be tracked separately.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct.