Bug 93840 - [i965] Compiler backend uses too much stack with Alien: Isolation
Summary: [i965] Compiler backend uses too much stack with Alien: Isolation
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 77449
Reported: 2016-01-24 15:55 UTC by Darius Spitznagel
Modified: 2019-07-04 18:00 UTC (History)

See Also:
i915 platform:
i915 features:


Attachments
Alien: Isolation GDB log (226.78 KB, text/plain)
2016-02-14 14:00 UTC, Darius Spitznagel
[PATCH] i965/fs: Allow spilling for SIMD16 compute shaders (2.88 KB, patch)
2016-02-22 20:31 UTC, Jordan Justen
Alien: Isolation GDB log with applied patch (224.97 KB, text/plain)
2016-02-24 21:18 UTC, Darius Spitznagel
patch (1.60 KB, patch)
2016-08-30 19:35 UTC, Matt Turner
AI shaders (deleted)
2016-08-31 19:39 UTC, Darius Spitznagel

Description Darius Spitznagel 2016-01-24 15:55:50 UTC
I tested this game with my Iris Pro 5200 (i965) and mesa master as GL_ARB_compute_shader for Intel is ready there.

But the game quits right before the 20th Century Fox video should play.
When I disable compute_shader (MESA_EXTENSION_OVERRIDE=-GL_ARB_compute_shader) I can get into the game, but all I see are Ripley's legs; the rest is totally dark. Switching SSAO doesn't help.

I also did an apitrace with compute_shader disabled and can confirm that the game requests compute_shader right before the start of the 20th Century Fox video.

I have posted this issue also here...
https://bugs.freedesktop.org/show_bug.cgi?id=93144
But as suggested by Alexandre Demers I created a new report for i965.

Maybe the game also needs GL_ARB_stencil_texturing which is ready for Intel gen8 GPUs but not gen7 (like mine).
I write this because Shadow of Mordor also uses compute_shader but this game works great on Intel gen7 and mesa master.
Comment 1 Darius Spitznagel 2016-02-14 14:00:01 UTC
Created attachment 121746 [details]
Alien: Isolation GDB log
Comment 2 Darius Spitznagel 2016-02-14 14:01:26 UTC
I have created a backtrace with gdb.
Hope I did it right and it helps.
On every stop I did "bt" and "bt full" then "cont" until the program terminated.

Debugging Symbols installed:
libgbm1-dbg
libgl1-mesa-dri-dbg
libgl1-mesa-glx-dbg
libglapi-mesa-dbg

Environment:
export MESA_GL_VERSION_OVERRIDE=4.3 MESA_GLSL_VERSION_OVERRIDE=430

Mesa:
darius@pc1:~$ glxinfo | grep OpenGL
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Haswell Desktop 
OpenGL core profile version string: 3.3 (Core Profile) Mesa 11.2.0-devel (git-a4cff18)
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 11.2.0-devel (git-a4cff18)
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 11.2.0-devel (git-a4cff18)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:
Comment 3 Darius Spitznagel 2016-02-19 23:57:45 UTC
Hello devs,

come on, guys,
I know you all (me too) have plenty to do, but I wish that someone could have a look into this.
I think the more games (apps) run on FOSS drivers, the better for all.
I would really like to help as much as I can, but without any answer my hands are tied.

Kind regards
Darius
Comment 4 Jordan Justen 2016-02-22 20:31:41 UTC
Created attachment 121900 [details] [review]
[PATCH] i965/fs: Allow spilling for SIMD16 compute shaders

Does the attached patch help with the crash?

Note, the one compute shader program I looked at took
about 20 seconds to generate a program for, so startup
may take a while.

Additionally it had a fair amount of register spilling,
and therefore the performance may not be very good.
Comment 5 Darius Spitznagel 2016-02-24 17:09:29 UTC
(In reply to Jordan Justen from comment #4)
> Created attachment 121900 [details] [review]
> [PATCH] i965/fs: Allow spilling for SIMD16 compute shaders
> 
> Does the attached patch help with the crash?
> 
> Note, the one compute shader program I looked at took
> about 20 seconds to generate a program for, so startup
> may take a while.
> 
> Additionally it had a fair amount of register spilling,
> and therefore the performance may not be very good.

Sadly no.

Tested with...
darius@pc1:~$ glxinfo | grep OpenGL
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Haswell Desktop 
OpenGL core profile version string: 3.3 (Core Profile) Mesa 11.3.0-devel (git-c95d5c5)
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 11.3.0-devel (git-c95d5c5)
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 11.3.0-devel (git-c95d5c5)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:
Comment 6 Darius Spitznagel 2016-02-24 21:18:22 UTC
Created attachment 121953 [details]
Alien: Isolation GDB log with applied patch
Comment 7 Darius Spitznagel 2016-02-24 21:20:10 UTC
I have created a new backtrace log with applied patch.
Comment 8 Jordan Justen 2016-03-10 18:58:47 UTC
A commit related to this bug was merged. Apparently
it does not fix the issue:

commit e1d54b1ba5a9d579020fab058bb065866bc35554

    i965/fs: Allow spilling for SIMD16 compute shaders
Comment 9 Eero Tamminen 2016-08-29 08:03:26 UTC
Darius, could you check this with Mesa 12.x?
Comment 10 Darius Spitznagel 2016-08-29 16:13:28 UTC
(In reply to Eero Tamminen from comment #9)
> Darius, could you check this with Mesa 12.x?

Still crashing.

Tested with...
glxinfo   | grep OpenGL
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Haswell Desktop 
OpenGL core profile version string: 3.3 (Core Profile) Mesa 12.1.0-devel (git-4c53267)
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 12.1.0-devel (git-4c53267)
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 12.1.0-devel (git-4c53267)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
OpenGL ES profile extensions:
Comment 11 Eero Tamminen 2016-08-30 07:46:57 UTC
(In reply to Darius Spitznagel from comment #10)
> OpenGL renderer string: Mesa DRI Intel(R) Haswell Desktop 
> OpenGL core profile version string: 3.3 (Core Profile) Mesa 12.1.0-devel
> (git-4c53267)

I forgot you use Haswell, where Mesa doesn't yet expose the GL 4.x required by the game.  The game developer has also stated (in bug 93144 comment 31) that the game *requires* compute shaders (which you explicitly disable in the first comment of this bug).

The fp64 support still missing for GL 4.3 GEN7 shouldn't affect your game though. Could you try the game also with:
MESA_GL_VERSION_OVERRIDE=4.3 MESA_GLSL_VERSION_OVERRIDE=430 ?
Comment 12 Darius Spitznagel 2016-08-30 19:12:51 UTC
(In reply to Eero Tamminen from comment #11)
> (In reply to Darius Spitznagel from comment #10)
> > OpenGL renderer string: Mesa DRI Intel(R) Haswell Desktop 
> > OpenGL core profile version string: 3.3 (Core Profile) Mesa 12.1.0-devel
> > (git-4c53267)
> 
> I forgot you use Haswell, where Mesa doesn't yet expose GL 4.x required
> by the game.  Game developer has also stated (in bug 93144 comment 31) that
> the game *requires* compute shaders (which you explicitly disable in first
> comment of this bug).
> 
> The fp64 support still missing for GL 4.3 GEN7 shouldn't affect your game
> though. Could you try the game also with:
> MESA_GL_VERSION_OVERRIDE=4.3 MESA_GLSL_VERSION_OVERRIDE=430 ?

The test I did yesterday was run with "MESA_GL_VERSION_OVERRIDE=4.3 MESA_GLSL_VERSION_OVERRIDE=430".
Sorry I didn't mention it.
Comment 13 Darius Spitznagel 2016-08-30 19:21:35 UTC
"MESA_GL_VERSION_OVERRIDE=4.3 MESA_GLSL_VERSION_OVERRIDE=430 %command%" is only set for this game inside Steam Client - not in my bash environment where I called glxinfo to show you that I was using git master.
Comment 14 Matt Turner 2016-08-30 19:35:34 UTC
Created attachment 126122 [details] [review]
patch

(In reply to Darius Spitznagel from comment #1)
> Created attachment 121746 [details]
> Alien: Isolation GDB log

The memset() at line 935 of a4cff18:src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp is

>   memset(last_grf_write, 0, sizeof(last_grf_write));

The backtrace shows

#1  0x00007fffe79015fe in memset (__len=482688, __ch=0, __dest=0x7fffa216e550) at /usr/include/x86_64-linux-gnu/bits/string3.h:84

__len=482688?!

last_grf_write is sized as

>   schedule_node *last_grf_write[grf_count * 16];

and 482688 / 16 is 30168. So we have 30168 virtual registers?!

I would very much like to see this shader... could you try to capture it with MESA_SHADER_CAPTURE_PATH=/tmp/alienisolation [...] (after making that directory)? It will write shaders to the directory, and I would expect the last one it writes to be the one causing the trouble. :)



I expect the attached patch will fix it, though I'm not sure it's exactly what we want to do. (Of course capture the shaders without this patch)
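
For illustration only, the stack-versus-heap distinction behind this can be sketched in C. This is not the attached patch (its contents are not reproduced in this thread); `schedule_node` is reduced to an opaque struct and the helper name `alloc_last_grf_write` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Opaque stand-in for the scheduler's node type (assumption for this sketch). */
struct schedule_node;

/*
 * Hypothetical helper: allocate the per-register tracking array on the heap
 * instead of as a variable-length array on the stack. calloc() zero-fills the
 * memory, which takes the place of the memset() seen in the backtrace.
 * Returns NULL on overflow or allocation failure; the caller must free().
 */
static struct schedule_node **alloc_last_grf_write(size_t grf_count)
{
    if (grf_count > SIZE_MAX / 16)
        return NULL; /* grf_count * 16 would overflow */
    return calloc(grf_count * 16, sizeof(struct schedule_node *));
}
```

With an allocation like this, the array size no longer depends on how much stack the calling thread happens to have.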
Comment 15 Darius Spitznagel 2016-08-30 20:26:44 UTC
Hello Matt,

great, it's working with your patch as you expected:)
No more crash at start!

I only played about 2 minutes.
The only strange thing that I saw were flashing little white boxes (not spots, max 4x4 pixels or more) which maybe should simulate dust particles in the air - but this is not the issue of this bug report.

Many thanks Matt!

Do you have doubts this patch could land in mesa?
> I expect the attached patch will fix it, though I'm not sure it's exactly what we want to do.
Comment 16 Matt Turner 2016-08-30 21:41:31 UTC
(In reply to Darius Spitznagel from comment #15)
> Do you have doubts this patch could land in mesa?

I cannot say without seeing the shader. 30 thousand virtual registers used is an incredible amount. It's possible the shader is significantly larger than anything we've seen before, but it's equally likely that we have a bug somewhere else that's causing it.

Once I see the shader I should be able to determine what to do next.
Comment 17 Matt Turner 2016-08-31 18:52:25 UTC
So there's no misunderstanding, I'm waiting for you to capture the shader (without my patch).
Comment 18 Darius Spitznagel 2016-08-31 19:39:41 UTC
Created attachment 126150 [details]
AI shaders

Sorry Matt, I was very busy until now.
Here it comes...
Comment 19 Matt Turner 2016-08-31 20:01:12 UTC
(In reply to Darius Spitznagel from comment #18)
> Created attachment 126150 [details]
> AI shaders
> 
> Sorry Matt, I was very busy until now.
> Here it comes...

No problem. Sorry, I didn't mean to rush you. Just wanted to make sure I was communicating clearly.

Doesn't look like there are any compute shaders in there. I'll investigate if our capturing mechanism has a bug. (And I'll have a Bugzilla admin delete the attachment, since we probably cannot distribute the shaders. Sorry, I should have asked you to email them to me).
Comment 20 Tollef Fog Heen 2016-09-01 19:11:43 UTC
The content of attachment 126150 [details] has been deleted for the following reason:

Not public.
Comment 21 Eero Tamminen 2017-02-10 17:43:31 UTC
(In reply to Matt Turner from comment #14)
> Created attachment 126122 [details] [review]
> patch
> 
> (In reply to Darius Spitznagel from comment #1)
> > Created attachment 121746 [details]
> > Alien: Isolation GDB log
> 
> The memset() at line 935 of
> a4cff18:src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp is
> 
> >   memset(last_grf_write, 0, sizeof(last_grf_write));
> 
> The backtrace shows
> 
> #1  0x00007fffe79015fe in memset (__len=482688, __ch=0,
> __dest=0x7fffa216e550) at /usr/include/x86_64-linux-gnu/bits/string3.h:84
> 
> __len=482688?!

In my case with today's Mesa & Alien: Isolation:
Thread 17 "WinMain" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f09cc122700 (LWP 23775)]
__memset_avx2 () at ../sysdeps/x86_64/multiarch/memset-avx2.S:161
161	../sysdeps/x86_64/multiarch/memset-avx2.S: No such file or directory.
(gdb) bt
#0  __memset_avx2 () at ../sysdeps/x86_64/multiarch/memset-avx2.S:161
#1  0x00007f09de059b0c in memset (__len=349056, __ch=0, __dest=0x7f09cc0c98c0) at /usr/include/x86_64-linux-gnu/bits/string3.h:90
#2  fs_instruction_scheduler::calculate_deps (this=0x7f09cc11ee90) at ../../../../../../src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp:991

-> 349056 byte memset.

After applying patch similar to yours, game crashes in startup when compiler is using another stack array.


> last_grf_write is sized as
> 
> >   schedule_node *last_grf_write[grf_count * 16];
>
> and 482688 / 16 is 30168. So we have 30168 virtual registers?!

This is a 64-bit program and pointers are 8 bytes, so 482688/16/8 = 3771, or in my case, 349056/16/8 = 2727.
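
The same arithmetic as a small C sketch, assuming the array holds grf_count * 16 pointers as in the declaration quoted above (the function name is made up for this example):

```c
#include <stddef.h>

/*
 * Recover the virtual register count from the memset() length shown in a
 * backtrace, assuming the array holds grf_count * 16 pointers of ptr_size
 * bytes each.
 */
static size_t grf_count_from_memset_len(size_t len, size_t ptr_size)
{
    return len / 16 / ptr_size;
}
```

With 8-byte pointers, a length of 482688 gives 3771 and a length of 349056 gives 2727, matching the figures above.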

After a lot of reading assembly, head scratching of how the memset() could crash when according to GDB memory addresses are fine, I went through process memory mappings and it all came clear...


With glibc, thread stacks default to 8MB, except for the main thread. However, this game sets the stack sizes of several of its threads to a few hundred kB (one thread was set to only 34KB).

The process's thread stack mappings are followed by a 4kB "canary" page which doesn't have read/write access rights.  The segfaults happen when Mesa crosses the boundary from the 384 kB stack into that page.
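
A minimal sketch of how one could check what a thread gets when nothing overrides the defaults, using only POSIX attribute calls. The struct and function names are made up for this example, and the values reported depend on the system (resource limits, libc).

```c
#include <pthread.h>
#include <stddef.h>

/* Sizes a newly created thread would get if the application overrides nothing. */
struct thread_defaults {
    size_t stack_size; /* bytes of stack */
    size_t guard_size; /* bytes of inaccessible guard area next to the stack */
};

/* Returns 0 on success, or a pthread error number. */
static int query_thread_defaults(struct thread_defaults *out)
{
    pthread_attr_t attr;
    int err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    err = pthread_attr_getstacksize(&attr, &out->stack_size);
    if (err == 0)
        err = pthread_attr_getguardsize(&attr, &out->guard_size);

    pthread_attr_destroy(&attr);
    return err;
}
```

Comparing these defaults with the sizes the game passes to pthread_attr_setstacksize() shows how much smaller its thread stacks are.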


When I manually tried the following with GDB:
------------------------
(gdb) b pthread_attr_setstacksize
Breakpoint 6 at 0x7f7ea34f21d0: file pthreadP.h, line 631.
(gdb) c
Continuing.
[New Thread 0x7f7e84532700 (LWP 3333)]

Thread 6 "WinMain" hit Breakpoint 6, __pthread_attr_setstacksize (attr=0x7f7e85bf7890, stacksize=393216) at pthread_attr_setstacksize.c:38
38	in pthread_attr_setstacksize.c
(gdb) finish
Run till exit from #0  __pthread_attr_setstacksize (attr=0x7f7e85bf7890, stacksize=393216) at pthread_attr_setstacksize.c:38
0x0000000000b1dd75 in ?? ()
Value returned is $12 = 0
(gdb) delete 6
(gdb) print __pthread_attr_setstacksize(0x7f7e85bf7890, 4194304)
$13 = 0
(gdb) b pthread_attr_setstacksize
Breakpoint 7 at 0x7f7ea34f21d0: file pthreadP.h, line 631.
(gdb) c
------------------------

With the too-small stack sizes changed to something larger (in this case 4MB) at run-time this way, the game started fine; it just takes a *long* time.


So, either this game needs some LD_PRELOAD that maps pthread_attr_setstacksize() function to a no-op, or Mesa compiler needs to be changed to use heap for anything that might be even a bit larger (which can make it a bit slower).
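
A minimal sketch of the no-op variant of such a shim, assuming it is built as a shared object and loaded ahead of libc. It has not been tried against the game itself.

```c
#include <pthread.h>
#include <stddef.h>

/*
 * Interposed replacement for pthread_attr_setstacksize(): accept the call,
 * report success, and leave the attribute untouched, so threads created
 * from it keep the default stack size.
 */
int pthread_attr_setstacksize(pthread_attr_t *attr, size_t stacksize)
{
    (void)attr;
    (void)stacksize;
    return 0;
}
```

Built with something like `gcc -shared -fPIC -o nostacksize.so nostacksize.c` and loaded with `LD_PRELOAD=./nostacksize.so` (both file names are placeholders), this would make every stack size request from the application a silent no-op.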


It would be interesting to know why the other compilers work fine with this; are they much more frugal in their stack usage?
Comment 22 Eero Tamminen 2017-02-13 13:02:39 UTC
(In reply to Eero Tamminen from comment #21)
> So, either this game needs some LD_PRELOAD that maps
> pthread_attr_setstacksize() function to a no-op, or Mesa compiler needs to
> be changed to use heap for anything that might be even a bit larger (which
> can make it a bit slower).

Short term workaround in Mesa could be checking current thread's stack size before compilation and fixing the size, if it's set too low by the application.

As to longer term solution... Just changing compiler to do larger allocs from heap instead of using stack, will still assume certain amount of stack being free, and it's not easy to track how much each compiler commit changes that.
-> Better would be doing compilation in separate thread (i.e. where application won't mess with its stack size).
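
A rough sketch of the separate-thread idea, not Mesa code: the work runs on a thread whose stack size is chosen by the caller instead of by the application. `run_with_stack` and `stack_hungry_job` are hypothetical names, and the job is only a stand-in for a compile step with a large stack frame.

```c
#include <pthread.h>
#include <stddef.h>

/*
 * Run fn(arg) on a fresh thread whose stack size is chosen here rather than
 * inherited from whatever the application configured. Returns 0 on success
 * (with fn's return value in *result), or a pthread error number.
 */
static int run_with_stack(void *(*fn)(void *), void *arg,
                          size_t stack_bytes, void **result)
{
    pthread_attr_t attr;
    pthread_t thread;
    int err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    err = pthread_attr_setstacksize(&attr, stack_bytes);
    if (err == 0)
        err = pthread_create(&thread, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    if (err != 0)
        return err;

    return pthread_join(thread, result);
}

/* Stand-in for a stack-hungry compile step: zero and sum a 1 MiB local array. */
static void *stack_hungry_job(void *arg)
{
    volatile unsigned char scratch[1 << 20];
    size_t sum = 0;
    (void)arg;
    for (size_t i = 0; i < sizeof(scratch); i++)
        scratch[i] = 0;
    for (size_t i = 0; i < sizeof(scratch); i++)
        sum += scratch[i];
    return sum == 0 ? (void *)1 : NULL;
}
```

Calling `run_with_stack(stack_hungry_job, NULL, (size_t)8 << 20, &out)` gives the job an 8MB stack no matter how small the calling thread's own stack is.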
Comment 23 Eero Tamminen 2017-02-13 16:50:19 UTC
(In reply to Eero Tamminen from comment #22)
> Short term workaround in Mesa could be checking current thread's stack size
> before compilation and fixing the size, if it's set too low by the
> application.

Tried:

* Adding such functionality to _mesa_compile_shader() -> there were still crashes

* Using LD_PRELOAD for pthread_attr_setstacksize() which filters out all calls setting stack sizes <8MB -> game works fine (so doing same manually with Gdb wasn't just timing related luck)

-> I assume compiler thread isn't the only one with stack size issues

(And this isn't the only issue with this game's stack handling, its stack is both writable & executable which is security-wise nasty for anything networked.)


Btw. While testing the Mesa workaround, I bumped also into larger reg allocation:
(gdb) bt
#0  __memset_avx2 () at ../sysdeps/x86_64/multiarch/memset-avx2.S:161
#1  0x00007f935282bf5c in memset (__len=968064, __ch=0, __dest=0x7f934000a6c0) at /usr/include/x86_64-linux-gnu/bits/string3.h:90
#2  fs_instruction_scheduler::calculate_deps (this=0x7f93400f6e90) at ../../../../../../src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp:991
(gdb) print 968064/16/8
$1 = 7563

That's 1MB stack allocation for this one array... 


> As to longer term solution... Just changing compiler to do larger allocs
> from heap instead of using stack, will still assume certain amount of stack
> being free, and it's not easy to track how much each compiler commit changes
> that.
> -> Better would be doing compilation in separate thread (i.e. where
> application won't mess with its stack size).

That would hopefully also speed up AlienIsolation startup, both to main menu, and from that to actual game.  Currently it's really slow.
Comment 24 Ian Romanick 2017-02-13 19:45:12 UTC
(In reply to Eero Tamminen from comment #21)
> (In reply to Matt Turner from comment #14)
> > Created attachment 126122 [details] [review]
> > patch
> > 
> > (In reply to Darius Spitznagel from comment #1)
> > > Created attachment 121746 [details]
> > > Alien: Isolation GDB log
> > 
> > The memset() at line 935 of
> > a4cff18:src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp is
> > 
> > >   memset(last_grf_write, 0, sizeof(last_grf_write));
> > 
> > The backtrace shows
> > 
> > #1  0x00007fffe79015fe in memset (__len=482688, __ch=0,
> > __dest=0x7fffa216e550) at /usr/include/x86_64-linux-gnu/bits/string3.h:84
> > 
> > __len=482688?!
> 
> In my case with today's Mesa & Alien: Isolation:
> Thread 17 "WinMain" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7f09cc122700 (LWP 23775)]
> __memset_avx2 () at ../sysdeps/x86_64/multiarch/memset-avx2.S:161
> 161	../sysdeps/x86_64/multiarch/memset-avx2.S: No such file or directory.
> (gdb) bt
> #0  __memset_avx2 () at ../sysdeps/x86_64/multiarch/memset-avx2.S:161
> #1  0x00007f09de059b0c in memset (__len=349056, __ch=0,
> __dest=0x7f09cc0c98c0) at /usr/include/x86_64-linux-gnu/bits/string3.h:90
> #2  fs_instruction_scheduler::calculate_deps (this=0x7f09cc11ee90) at
> ../../../../../../src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp:991
> 
> -> 349056 byte memset.
> 
> After applying patch similar to yours, game crashes in startup when compiler
> is using another stack array.

Eero, are you able to collect the shaders that are being compiled when the stack explodes?  Matt's suspicion is that there is another bug somewhere that causes things to go sideways.  Without seeing the shaders, it's just a guessing game.
Comment 25 Eero Tamminen 2017-02-14 11:51:01 UTC
Got an apitrace, I'll mail you & Matt a link to it.


There are multiple FS shaders that spill 20 to 50 items, and compute shaders like this:
-------------------------
43621: message: shader compiler performance issue 655: compute shader triggered register spilling.  Try reducing the number of live scalar values to improve performance.
43621: message: shader compiler issue 656: CS SIMD32 shader: 23733 inst, 0 loops, 415136 cycles, 1507:3332 spills:fills, Promoted 0 constants, compacted 379728 to 283984 bytes.
47174: message: shader compiler performance issue 828: SIMD16 shader failed to compile: CS compile failed: Failure to register allocate.  Reduce number of live scalar values to avoid this.
47174: message: shader compiler issue 829: CS SIMD8 shader: 6683 inst, 0 loops, 83406 cycles, 263:590 spills:fills, Promoted 18 constants, compacted 106928 to 72800 bytes.
-------------------------

(The first CS shader takes at least 10 min to compile before getting to game menus, and same, or similar shader takes another 10 min when starting game from the menu.)
Comment 26 Eero Tamminen 2017-02-14 15:23:37 UTC
Alien Isolation apitrace replay naturally works fine without pthread_attr_setstacksize() LD_PRELOAD hack.

I think this can be marked as NOTOURBUG.  There's no reason why game should lower its thread stack sizes.  It's 64-bit program, so it won't run out of address space, and unused stack doesn't consume any real memory because Linux uses overcommit.
Comment 27 Darius Spitznagel 2017-02-14 18:05:48 UTC
Hello Eero,

> (The first CS shader takes at least 10 min to compile before getting to game
> menus, and same, or similar shader takes another 10 min when starting game
> from the menu.)

this really confuses me.

With my Intel Haswell Iris Pro 5200 system I need less than 3 minutes to get into the game and play (continue from save). You say you need 20 min?
I use Matt's patch of course.

What I see is that video playback and shader compilation all run on one CPU core out of 8. What a waste!
The videos (original from the movie) are all blocked by shader compilation. I don't think this is what the game developers intended.
Splitting video and shader compilation into at least two threads should already bring better loading times and makes total sense.
And as far as I know this is the job of the OpenGL driver and not the application.

Mesa+radeonsi seems to run fine, look here...
https://bugs.freedesktop.org/show_bug.cgi?id=93144

In Comment 31...
https://bugs.freedesktop.org/show_bug.cgi?id=93144#c31 Edwin Smith from Feral is happy to help.
Comment 28 Eero Tamminen 2017-02-15 08:46:30 UTC
(In reply to Darius Spitznagel from comment #27)
> > (The first CS shader takes at least 10 min to compile before getting to game
> > menus, and same, or similar shader takes another 10 min when starting game
> > from the menu.)

Sorry, I should have properly measured it.   The whole startup to menu is ~9 min, and ~9 min from menu to game, on SKL i5 with default Mesa compile options (no debug, -O2).

Based on INTEL_DEBUG=perf output, I would guess that the indicated single compute shader takes maybe half of that time (and few other compute shaders are quite slow to compile too), i.e. I was off by factor of 2.


> this really confuse me.
> 
> With my Intel Haswell Iris Pro 5200 system I need less then 3 Minutes to get
> into the game and play (continue from save).
>
> I use Matts patch of course.

At least in my case, on SKL GT2 with Mesa git from earlier this week, Matt's patch isn't enough if Alien Isolation also compiles compute shaders.  It will then just run out of stack in another function.


Are you forcing GL 4.3 / compute on, like earlier? [1]

If not, which Mesa version do you use?  Haswell officially provides GL 4.2 in Mesa 17.0.0 (required FP64 came on Jan 12th), which was released 2 days ago.    Only with latest Mesa git version Haswell officially supports GL 4.5 (Jan 16th commit).  Compute shaders come in GL 4.3.

[1] Game will start, but doesn't render correctly if compute shader support is missing:
  https://bugs.freedesktop.org/show_bug.cgi?id=93144#c31


Also, which exact CPU model do you have?


> What I see is that video-play and shader compilation all run on one cpu core
> out of 8. What a waste!
> The videos (original from the movie) are all blocked by shader compilation.
> I don't think this is the intention from the game developers.
> To split video and shader compile into at least two threads should already
> bring better loading times and makes total sense.

Game needs to know when the parallel shader compilation has finished.  That's an extension on top of GL 4.5:
https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_parallel_shader_compile.txt

Game needs to specifically have support for that.


> And as far as I know this is the part of the OpenGL driver and not the
> application.
> 
> Mesa+radeonsi seems to run fine, look here...
> https://bugs.freedesktop.org/show_bug.cgi?id=93144

That uses the gallium / LLVM backend.  Maybe it uses less stack, or did at that point and would now also run out of stack.

300-400 KB of stack is ridiculously low amount of stack if there's something that actually uses it.  Game cannot assume that driver will use less than that (game itself will use some of that stack too).
Comment 29 Darius Spitznagel 2017-02-15 18:30:15 UTC
> Are you forcing GL 4.3 / compute on, like earlier? [1]
> 
> If not, which Mesa version do you use?  Haswell officially provides GL 4.2
> in Mesa 17.0.0 (required FP64 came on Jan 12th), which was released 2 days
> ago.    Only with latest Mesa git version Haswell officially supports GL 4.5
> (Jan 16th commit).  Compute shaders come in GL 4.3.
> 
> [1] Game will start, but doesn't render correctly if compute shader support
> is missing:
>   https://bugs.freedesktop.org/show_bug.cgi?id=93144#c31

I use Mesa 17.0.0 with no override.
Mesa 17.0.0 exposes OpenGL 4.5 for Haswell. The release notes at http://mesa3d.org are totally WRONG!!!
Look here...
https://cgit.freedesktop.org/mesa/mesa/commit/?id=d2590eb65ff28a9cbd592353d15d7e6cbd2c6fc6

Clone branch mesa-17.0.0 and search for "Haswell" and you will see it.

IMPORTANT: You also need a newer kernel otherwise CS will not be exposed due to missing "pipelined register writes" support (see intel_extensions.c)
Kernel 4.8 should be fine.

So I use no magic but Matt's patch and some others for Divinity: OS.

> Also, which exact CPU model do you have?

Intel(R) Core(TM) i7-4770R CPU @ 3.20GHz

> Game needs to know when the parallel shader compilation has finished. 
> That's an extension on top of GL 4.5:
> https://www.khronos.org/registry/OpenGL/extensions/ARB/
> ARB_parallel_shader_compile.txt
> 
> Game needs to specifically have support for that.

AHA, very interesting! Thanks for this info.
 
> That uses gallium / LLVM backend.  Maybe it uses less stack, or used at that
> point, and would now also run out of stack.

I know.

> 300-400 KB of stack is ridiculously low amount of stack if there's something
> that actually uses it.  Game cannot assume that driver will use less than
> that (game itself will use some of that stack too).

A good reason to look at what others do.
Comment 30 Eero Tamminen 2017-02-16 12:58:45 UTC
(In reply to Darius Spitznagel from comment #29)
> I use Mesa 17.0.0 with no override.
> Mesa 17.0.0 exposes OpenGL 4.5 for Haswell. The release notes at
> http://mesa3d.org are totally WRONG!!!..
...
> Clone branch mesa-17.0.0 and search for "Haswell" and you will see it.

I see (should have checked that myself as it seemed odd).


> So I use no magic but Matts patch and some others for Divinity: OS.

Then it's indeed weird that you don't see the stack overflow issues I'm seeing.

You aren't by any chance using Timothy's shader cache stuff?


>> Also, which exact CPU model do you have?
> 
> Intel(R) Core(TM) i7-4770R CPU @ 3.20GHz

That cannot explain >2x shader compilation speed difference either.
Comment 31 Darius Spitznagel 2017-02-18 14:53:19 UTC
> You aren't by any chance using Timothy's shader cache stuff?

No.


In the meantime I have compiled mesa master and applied Matt's patch + divinity (vendor+ARB_shading_language_include) patch.

darius@pc1:~$ glxinfo | grep OpenGL
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Haswell Desktop 
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.1.0-devel (git-ad019bf5c6)
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 17.1.0-devel (git-ad019bf5c6)
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 17.1.0-devel (git-ad019bf5c6)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
OpenGL ES profile extensions:

Alien: Isolation behaves in the same way as with 17.0.0...
- no crash.
- ~3 minutes to get into game from save.

Maybe something is wrong with your build-environment?

Here are some details of my current system:
Debian 9.0 Stretch (Testing - Full freeze as of 5th February)

Intel(R) Core(TM) i7-4770R CPU @ 3.20GHz (Haswell, Iris Pro 5200)

8GB of RAM

libdrm 2.4.75 (needed to compile mesa master / before 2.4.74)

LLVM 3.9.1

gcc version 6.3.0 20170205 (Debian 6.3.0-6)

cat /var/log/Xorg.0.log| grep DRI
[  6530.102] (II) glamor: EGL version 1.4 (DRI2):
[  6530.211] (II) modeset(0): [DRI2] Setup complete
[  6530.211] (II) modeset(0): [DRI2]   DRI driver: i965
[  6530.211] (II) modeset(0): [DRI2]   VDPAU driver: i965
[  6530.215] (II) GLX: Initialized DRI2 GL provider for screen 0

Gnome Flashback session manager
Comment 32 Darius Spitznagel 2017-02-21 20:03:56 UTC
Hello Eero,

is there any news on this?

SKL and HSW are very different.
Did you test mesa git with HSW instead of SKL?

Maybe Xorg does play a role here? Forgot to mention my current Xorg version in last post...

xserver-xorg-core 1.19.1
Comment 33 Eero Tamminen 2017-02-22 14:34:31 UTC
(In reply to Darius Spitznagel from comment #32)
> are there any news on this?

I've provided Matt with apitrace of the startup, so he can investigate the compiler side.


> SKL and HSW are very different.
> Did you test mesa git with HSW instead of SKL?

I don't have a setup where I could try the real game on one, but the apitrace replay didn't compile CS shaders as large, instruction-count-wise, as on SKL (before it failed to run the rest of the trace); this was the largest:
43621: message: shader compiler issue 653: CS SIMD16 shader: 5210 inst, 0 loops, 118772 cycles, 207:471 spills:fills, Promoted 0 constants, compacted 83360 to 58672 bytes.


Btw. I tried Apitrace trace on BDW and as expected, it had the same issue as SKL.  Perf says time is spent in intel backend:
-----------------------
Overhead  Symbol                                          
  47,44%  ra_allocate
  15,38%  fs_visitor::virtual_grf_interferes
  12,04%  fs_visitor::assign_regs
   8,95%  ra_add_node_adjacency
   2,98%  decrement_q.isra.2
   1,63%  ra_add_node_interference
-----------------------
Which seems very similar to bug 98455.


> Maybe Xorg does play a role here? Forgot to mention my current Xorg version
> in last post...
> 
> xserver-xorg-core 1.19.1

X version should have no impact on what gets compiled in compute shader.

On BDW+, SIMD32 is needed to satisfy compute workgroup requirements, but from the Apitrace output it seems that on HSW SIMD16 is enough, and that requires far fewer registers.

-> That explains why things work on HSW, but not on anything newer.  A >3x larger number of registers needs more stack to process and is *much* slower to compile (it spills a lot).
Comment 34 _archuser_ 2017-03-15 01:40:04 UTC
FYI:
Mesa 17.0.1 on Haswell with Kernel 4.10.1 on arch:

Without overrides I get the incompatibility warning and a crash while loading the menu. By disabling the shader prewarmer in ~/.local/share/feral-interactive/AlienIsolation/preferences
(EnableShaderWarmer / EnableShaderWarmerPredraw = 0) the game starts and is playable (very fast load times, bad hiccups as in-game shaders are compiled on the fly?)
The prewarmer seems to compile all shaders for the game in one go; maybe that is the reason for such a huge shader program?
Comment 35 Darius Spitznagel 2017-03-15 17:41:46 UTC
(In reply to _archuser_ from comment #34)
> FYI:
> Mesa 17.0.1 on Haswell with Kernel 4.10.1 on arch:
> 
> Without overrides I get the incompatibility warning and crash while loading
> the menu, by disabling the shader prewarmer in
> ~/.local/share/feral-interactive/AlienIsolation/preferences
> (EnableShaderWarmer / EnableShaderWarmerPredraw = 0) the game starts and is
> playable (very fast load times, bad hickups as in game shaders are compiled
> on the fly?)
> The prewarmer seems to compile all shaders for the game in one go, maybe
> that is a reason for such a huge running shader program ?

I can confirm it works this way without Matt's patch.
Now I need less than 1 minute to get into the game from a save.
You are right... now some shaders get compiled on the fly when you enter some rooms/floors, which causes stuttering.

I also see the little flashing white boxes I have reported in comment 15.
It seems something is not rendered correctly?!
I will try enabling/disabling some effects and report back.

The only difference is that I don't get an incompatibility warning and I also use Mesa 17.0.1 now.
My kernel is 4.9.13 on Haswell Iris Pro 5200.

Eero, could you check your Skylake system with "EnableShaderWarmer" and "EnableShaderWarmerPredraw" set to 0 (disabled)?

Nice catch _archuser_:)
Comment 36 Eero Tamminen 2017-03-16 12:40:04 UTC
(In reply to _archuser_ from comment #34)
> by disabling the shader prewarmer in
> ~/.local/share/feral-interactive/AlienIsolation/preferences
> (EnableShaderWarmer / EnableShaderWarmerPredraw = 0) the game starts and is
> playable (very fast load times, bad hickups as in game shaders are compiled
> on the fly?)
> The prewarmer seems to compile all shaders for the game in one go, maybe
> that is a reason for such a huge running shader program ?

By using those options, the game starts very quickly on SKL too, and there are no crashes during startup or short gameplay (just more stuttering during actual gameplay due to on-demand shader compilation).

As it starts fine without the dummy version of pthread_attr_setstacksize(), none of those mega compute shaders are needed / used during startup or early gameplay.

I wonder at which place in the game they're actually used...
Comment 37 Darius Spitznagel 2017-03-16 16:59:20 UTC
(In reply to Eero Tamminen from comment #36)
> (In reply to _archuser_ from comment #34)
> > by disabling the shader prewarmer in
> > ~/.local/share/feral-interactive/AlienIsolation/preferences
> > (EnableShaderWarmer / EnableShaderWarmerPredraw = 0) the game starts and is
> > playable (very fast load times, bad hickups as in game shaders are compiled
> > on the fly?)
> > The prewarmer seems to compile all shaders for the game in one go, maybe
> > that is a reason for such a huge running shader program ?
> 
> By using those options, the game starts very quickly on SKL too, and there
> are no crashes during startup or short gameplay (just more stuttering during
> actual gameplay due to on-demand shader compilation).
> 
> As it starts fine without dummy version of pthread_attr_setstacksize(), 
> none of those mega compute shaders is needed / used during startup or early
> gameplay.
> 
> I wonder at which place in the game they're actually used...

Good news:)
But the real question is how this problem can be solved so that Intel i965 users get an "out of the box experience".
I don't know if Feral uses a shader warmer in more games. The ones I have besides Alien: Isolation work without any "tricks".

Maybe Alex Smith from Feral, who is involved in radv development, could give more insight?
It would be interesting to know if the shader warmer is also a problem for other Mesa drivers like radeon, radeonsi, nouveau...?

This report is now over 1 year old and thanks to _archuser_ there is now a workaround for intel.

I'm happy that I can now compile new versions of Mesa without Matt's patch, as it seems it will never go upstream and does not work with Skylake.

For me this report/issue can be closed, but I would of course prefer an "out of the box experience" solution for Intel users.
There is definitely a problem somewhere with huge/massive shader programs (maybe simply a misbehavior) on i965.
Comment 38 Matt Turner 2017-03-16 17:12:35 UTC
Cc'ing Edwin Smith from Feral.

Edwin, Alien Isolation crashes with i965 due to the i965 compiler using too much stack space. This is certainly something we can fix (and indeed I did with the patch in comment #14), but Eero identified that the game seems to be self-limiting the amount of stack space (See comment #21).

Can you shed some light on why Alien Isolation calls pthread_attr_setstacksize() to limit its threads' stack sizes?
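
For readers unfamiliar with the call in question, here is a minimal sketch of how a program can request a reduced thread stack through a pthread attribute object. The helper name `stack_size_after_request` is hypothetical and nothing here is taken from the game's code; it only shows the POSIX calls involved.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: requests `requested` bytes of stack in a thread
 * attribute object and returns the size the object then reports, or 0 if
 * the request was rejected. */
size_t stack_size_after_request(size_t requested)
{
    pthread_attr_t attr;
    size_t actual = 0;
    int rc;

    if (pthread_attr_init(&attr) != 0)
        return 0;

    /* The call fails with EINVAL if the size is below PTHREAD_STACK_MIN. */
    if (pthread_attr_setstacksize(&attr, requested) == 0) {
        if (pthread_attr_getstacksize(&attr, &actual) != 0)
            actual = 0;
    }

    rc = pthread_attr_destroy(&attr);
    assert(rc == 0);
    (void)rc; /* silence unused-variable warnings when NDEBUG is defined */

    return actual;
}
```

Any thread later created with such an attribute object runs all of its code, including driver code such as a shader compiler, within that stack limit.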
Comment 39 Marc Di Luzio 2017-03-17 13:09:27 UTC
(In reply to Matt Turner from comment #38)
> Cc'ing Edwin Smith from Feral.
> 
> Edwin, Alien Isolation crashes with i965 due to the i965 compiler using too
> much stack space. This is certainly something we can fix (and indeed I did
> with the patch in comment #14), but Eero identified that the game seems to
> be self-limiting the amount of stack space (See comment #21).
> 
> Can you shed some light on why Alien Isolation calls
> pthread_attr_setstacksize() to limit its threads' stack sizes?

Marc (from Feral) here, Edwin copied me in. There's a couple of things here I can shed some light on.

To answer Darius first, for the shader warmer in AI we didn't test heavily on Mesa as support wasn't really on the roadmap. That'll explain why this didn't get spotted in production. We've not seen similar issues with shader warmers on other drivers, but that's explained by...

Matt - AI is self-limiting its stack sizes for its rendering thread, as it does on other platforms. We didn't change that because at the time it didn't show up as an issue in our testing. Obviously that's not the case anymore. What's interesting is that AI stands out because on Intel it's using the same GLX_MESA_multithread_makecurrent path that XCOM:EU used, which we later ditched when we moved to our own dispatch method. This means the shader compiles will be happening on that same limited thread, causing you folks problems.

I suspect if you disable GLX_MESA_multithread_makecurrent temporarily you'll see the problem go away without the need for the patch, although you'll see a change in performance.

Interestingly we have recently spotted the issue in a different game on Mesa using a similar un-shipped non-dispatch method, so we've fixed that now internally and have made space for the Mesa compiler.

Also, from what I can tell, our games after AI are all giving the driver almost a full system default thread stack on our dispatch thread, so this hopefully won't be an issue going forward.

Cheers for the fix in the meantime.
Comment 40 Eero Tamminen 2017-03-17 13:52:42 UTC
To summarize the earlier discussion & analysis...

* the crash

This is due to the game setting a ridiculously small (<400kB instead of the default 8MB) stack for the compiler thread.

It's a bug in the game.  Changing the stack size is unnecessary because:
- Linux uses overcommit, i.e. unused mappings don't take real memory
- the game uses so few threads, and is a 64-bit process, so even without overcommit the default (8MB) thread stack sizes won't be any problem

On the Mesa side, for HSW the issue could be worked around by allocating one compiler array from the heap instead of the stack (Matt's patch), but that isn't enough for newer HW.
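
A minimal sketch of the general technique behind that workaround, not the actual Mesa patch: scratch storage whose size grows with the input is taken from the heap, so the thread's stack limit no longer bounds it. The function `sum_with_heap_scratch` and its workload are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a compiler pass that needs one scratch entry per
 * node. Returns 0 + 1 + ... + (count - 1), or -1 if allocation fails. */
long long sum_with_heap_scratch(size_t count)
{
    int *scratch;
    long long total = 0;

    assert(count > 0);

    /* A stack version would be `int scratch[count];` (a VLA) or a large
     * fixed-size array; both consume the calling thread's stack. */
    scratch = malloc(count * sizeof *scratch);
    if (scratch == NULL)
        return -1;

    for (size_t i = 0; i < count; i++)
        scratch[i] = (int)i;
    for (size_t i = 0; i < count; i++)
        total += scratch[i];

    free(scratch);
    return total;
}
```

With one million entries the scratch array takes about 4MB, which would not fit in a sub-400kB thread stack but is unremarkable as a heap allocation.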


* slow startup

This is related to the above: the stack usage comes from a few huge compute shaders, which register-spill badly, especially on BDW+ where HW requires compiling them in SIMD32 mode for the requested workgroup size (on HSW they can be compiled in SIMD16 mode).

If those heavy compute shaders are actually used in the game (with the user-selected GFX quality level)[1], this is a Mesa i965 backend problem instead of a game issue:
- i965 register allocation and spill handling need to be improved to speed up compilation
- With a shader cache, compilation could be skipped after the first compile (the game compiles these shaders before getting to the main menu, *and* after the game is started from the menu), but that's not yet enabled for Intel

[1] After disabling the shader warmer, none of these heavy compute shaders seem to be used early in the game (neither in story, nor survival mode).


Threaded shader compilation could help with both of these issues (a separate thread would have a full stack), but e.g. the ARB_parallel_shader_compile extension isn't supported by Mesa yet, and it would require the game to utilize it.
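
A rough sketch of the "separate thread would have full stack" idea using plain pthreads. This is neither ARB_parallel_shader_compile nor Mesa code; `stack_hungry_job`, `run_job_on_big_stack`, the 1 MiB scratch size and the 8 MiB worker stack are assumptions chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SCRATCH_BYTES (1024 * 1024)          /* more than a ~400kB stack holds */
#define WORKER_STACK_BYTES (8 * 1024 * 1024) /* assumed "full" stack size */

/* Hypothetical stand-in for a stack-hungry compile job. */
static void *stack_hungry_job(void *arg)
{
    volatile unsigned char scratch[SCRATCH_BYTES];

    /* Touch one byte per 4 KiB so the stack space is really used. */
    for (size_t i = 0; i < SCRATCH_BYTES; i += 4096)
        scratch[i] = 1;

    *(int *)arg = scratch[0];
    return NULL;
}

/* Runs the job on a worker thread with an explicitly sized stack.
 * Returns 1 if the job completed, 0 if the thread could not be set up. */
int run_job_on_big_stack(void)
{
    pthread_attr_t attr;
    pthread_t worker;
    int done = 0;
    int rc;

    if (pthread_attr_init(&attr) != 0)
        return 0;

    if (pthread_attr_setstacksize(&attr, WORKER_STACK_BYTES) != 0 ||
        pthread_create(&worker, &attr, stack_hungry_job, &done) != 0) {
        pthread_attr_destroy(&attr);
        return 0;
    }
    pthread_attr_destroy(&attr);

    rc = pthread_join(worker, NULL);
    assert(rc == 0);
    (void)rc; /* silence unused-variable warnings when NDEBUG is defined */

    return done;
}
```

The caller's own stack size does not matter here: the job only consumes the worker's stack, which is why moving compilation off a size-limited thread sidesteps the limit.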
Comment 41 Darius Spitznagel 2017-03-23 07:46:44 UTC
(In reply to Marc Di Luzio from comment #39)

> I suspect if you disable GLX_MESA_multithread_makecurrent temporarily you'll
> see the problem go away without the need for the patch, although you'll see
> a change in performance.

Hello Marc,
how can I disable this?

Neither
https://www.mesa3d.org/envvars.html
nor
https://www.mesa3d.org/extensions.html
explains it.

Trying "GLX_MESA_multithread_makecurrent=false %command%" in Steam did not work.
Comment 42 Matt Turner 2017-03-23 18:46:24 UTC
(In reply to Marc Di Luzio from comment #39) 
> Interestingly we have recently spotted the issue in a different game on Mesa
> using a similar un-shipped non-dispatch method, so we've fixed that now
> internally and have made space for the Mesa compiler.
> 
> Also, from what I can tell, our games after AI are all giving the driver
> almost a full system default thread stack on our dispatch thread, so this
> hopefully won't be an issue going forward.
> 
> Cheers for the fix in the mean time.

Thanks Marc. Is that to say that we shouldn't expect an update or fix to Alien Isolation?

Trying to determine whether my patch should be committed or not.
Comment 43 Marc Di Luzio 2017-03-24 09:56:29 UTC
(In reply to Darius Spitznagel from comment #41)
> Hello Marc,
> how can I disable this?

You'd have to compile a custom Mesa that doesn't expose the extension, but since that's not feasible for most, here's a method using our preferences:

In ~/.local/share/feral-interactive/AlienIsolation/preferences you'll find an "SDL" "Config" section. On the "EnableForceFeedback" line, replace the string with "DisableDispatchlessContext" and set it to 1. This will emulate not having multithread_makecurrent. It'll also more closely match how the game runs on AMD and Nvidia.


(In reply to Matt Turner from comment #42)
> Thanks Marc. Is that to say that we shouldn't expect an update or fix to
> Alien Isolation?

I'm hesitant to offer an exact time frame for a fix but consider it on our list.
Comment 44 Darius Spitznagel 2017-04-28 16:11:59 UTC
Sorry for being late but there is no SDL Config section in my preferences file.
Comment 45 Marc Di Luzio 2017-05-02 09:11:14 UTC
(In reply to Darius Spitznagel from comment #44)
> Sorry for being late but there is no SDL Config section in my preferences
> file.

Right, sorry, you'll have to add it.

It should look like this:

            <key name="SDL">
                <key name="Config">
                    <value name="DisableDispatchlessContext" type="integer">1</value>
                </key>
            </key>

With the "SDL" key section nested within the "Software" category.

Cheers,
Comment 46 Marc Di Luzio 2017-05-02 09:13:01 UTC
(In reply to Marc Di Luzio from comment #45)
> (In reply to Darius Spitznagel from comment #44)
> > Sorry for being late but there is no SDL Config section in my preferences
> > file.
> 
> Right, sorry, you'll have to add it.
>
>             <key name="SDL">
>                 <key name="Config">
>                     <value name="DisableDispatchlessContext"
> type="integer">1</value>
>                 </key>
>             </key>


Apologies for the spam, but let's correct that to be on one line for neatness...

<key name="SDL">
    <key name="Config">
        <value name="DisableDispatchlessContext" type="integer">1</value>
    </key>
</key>
Comment 47 Darius Spitznagel 2017-05-04 15:27:43 UTC
(In reply to Marc Di Luzio from comment #46)
> (In reply to Marc Di Luzio from comment #45)
> > (In reply to Darius Spitznagel from comment #44)
> > > Sorry for being late but there is no SDL Config section in my preferences
> > > file.
> > 
> > Right, sorry, you'll have to add it.
> >
> >             <key name="SDL">
> >                 <key name="Config">
> >                     <value name="DisableDispatchlessContext"
> > type="integer">1</value>
> >                 </key>
> >             </key>
> 
> 
> Apologies for the spam, but let's correct that to be on one line for
> neatness...
> 
> <key name="SDL">
>     <key name="Config">
>         <value name="DisableDispatchlessContext" type="integer">1</value>
>     </key>
> </key>

I can confirm - it works.
Thank you, Marc.
Hope your (Feral's) patch for this issue will come soon.
Comment 48 Timothy Arceri 2019-07-04 02:28:25 UTC
It was confirmed multiple times in this bug report that this wasn't a Mesa bug.

The game works out of the box on my Skylake so it would seem Feral fixed the issue as promised in one of its updates. Closing.
Comment 49 Matt Turner 2019-07-04 18:00:46 UTC
(In reply to Matt Turner from comment #14)
> Created attachment 126122 [details] [review]
> patch
> 

For what it's worth, I committed basically an identical patch as a part of the fp64 work:

commit 18b4e87370d3ebb9d7dbb51e58b2da1b64a2227f
Author: Matt Turner <mattst88@gmail.com>
Date:   Mon Dec 10 11:50:55 2018 -0800

    intel/compiler: Heap-allocate temporary storage

No idea if they fixed the game, but I think we can safely say this is fixed.

