Bug 93840 - [i965] Compiler backend uses too much stack with Alien: Isolation
Summary: [i965] Compiler backend uses too much stack with Alien: Isolation
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
Reported: 2016-01-24 15:55 UTC by Darius Spitznagel
Modified: 2017-02-22 14:34 UTC

Attachments
Alien: Isolation GDB log (226.78 KB, text/plain)
2016-02-14 14:00 UTC, Darius Spitznagel
[PATCH] i965/fs: Allow spilling for SIMD16 compute shaders (2.88 KB, patch)
2016-02-22 20:31 UTC, Jordan Justen
Alien: Isolation GDB log with applied patch (224.97 KB, text/plain)
2016-02-24 21:18 UTC, Darius Spitznagel
patch (1.60 KB, patch)
2016-08-30 19:35 UTC, Matt Turner
AI shaders (deleted)
2016-08-31 19:39 UTC, Darius Spitznagel

Description Darius Spitznagel 2016-01-24 15:55:50 UTC
I tested this game with my Iris Pro 5200 (i965) and Mesa master, since GL_ARB_compute_shader for Intel is ready there.

But the game quits right before the 20th Century Fox video should play.
When I disable compute_shader (MESA_EXTENSION_OVERRIDE=-GL_ARB_compute_shader) I can get into the game, but all I see are Ripley's legs; the rest is totally dark. Toggling SSAO doesn't help.

I also made an apitrace with compute_shader disabled and can confirm that the game requests compute_shader right before the start of the 20th Century Fox video.

I have also posted this issue here...
https://bugs.freedesktop.org/show_bug.cgi?id=93144
But as suggested by Alexandre Demers I created a new report for i965.

Maybe the game also needs GL_ARB_stencil_texturing, which is ready for Intel gen8 GPUs but not gen7 (like mine).
I mention this because Shadow of Mordor also uses compute_shader, but that game works great on Intel gen7 and Mesa master.
Comment 1 Darius Spitznagel 2016-02-14 14:00:01 UTC
Created attachment 121746
Alien: Isolation GDB log
Comment 2 Darius Spitznagel 2016-02-14 14:01:26 UTC
I have created a backtrace with gdb.
Hope I did it right and it helps.
On every stop I did "bt" and "bt full" then "cont" until the program terminated.

Debugging Symbols installed:
libgbm1-dbg
libgl1-mesa-dri-dbg
libgl1-mesa-glx-dbg
libglapi-mesa-dbg

Environment:
export MESA_GL_VERSION_OVERRIDE=4.3 MESA_GLSL_VERSION_OVERRIDE=430

Mesa:
darius@pc1:~$ glxinfo | grep OpenGL
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Haswell Desktop 
OpenGL core profile version string: 3.3 (Core Profile) Mesa 11.2.0-devel (git-a4cff18)
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 11.2.0-devel (git-a4cff18)
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 11.2.0-devel (git-a4cff18)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:
Comment 3 Darius Spitznagel 2016-02-19 23:57:45 UTC
Hello devs,

come on guys,
I know you all (me too) have plenty to do, but I wish someone could have a look into this.
I think the more games (apps) run on FOSS drivers, the better for all.
I would really like to help as much as I can, but without any answer my hands are tied.

Kind regards
Darius
Comment 4 Jordan Justen 2016-02-22 20:31:41 UTC
Created attachment 121900
[PATCH] i965/fs: Allow spilling for SIMD16 compute shaders

Does the attached patch help with the crash?

Note, the one compute shader program I looked at took
about 20 seconds to generate a program for, so startup
may take a while.

Additionally it had a fair amount of register spilling,
and therefore the performance may not be very good.
Comment 5 Darius Spitznagel 2016-02-24 17:09:29 UTC
(In reply to Jordan Justen from comment #4)
> Created attachment 121900 [details] [review] [review]
> [PATCH] i965/fs: Allow spilling for SIMD16 compute shaders
> 
> Does the attached patch help with the crash?
> 
> Note, the one compute shader program I looked at took
> about 20 seconds to generate a program for, so startup
> may take a while.
> 
> Additionally it had a fair amount of register spilling,
> and therefore the performance may not be very good.

Sadly no.

Tested with...
darius@pc1:~$ glxinfo | grep OpenGL
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Haswell Desktop 
OpenGL core profile version string: 3.3 (Core Profile) Mesa 11.3.0-devel (git-c95d5c5)
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 11.3.0-devel (git-c95d5c5)
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 11.3.0-devel (git-c95d5c5)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:
Comment 6 Darius Spitznagel 2016-02-24 21:18:22 UTC
Created attachment 121953
Alien: Isolation GDB log with applied patch
Comment 7 Darius Spitznagel 2016-02-24 21:20:10 UTC
I have created a new backtrace log with applied patch.
Comment 8 Jordan Justen 2016-03-10 18:58:47 UTC
A commit related to this bug was merged. Apparently
it does not fix the issue:

commit e1d54b1ba5a9d579020fab058bb065866bc35554

    i965/fs: Allow spilling for SIMD16 compute shaders
Comment 9 Eero Tamminen 2016-08-29 08:03:26 UTC
Darius, could you check this with Mesa 12.x?
Comment 10 Darius Spitznagel 2016-08-29 16:13:28 UTC
(In reply to Eero Tamminen from comment #9)
> Darius, could you check this with Mesa 12.x?

Still crashing.

Tested with...
glxinfo   | grep OpenGL
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Haswell Desktop 
OpenGL core profile version string: 3.3 (Core Profile) Mesa 12.1.0-devel (git-4c53267)
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 12.1.0-devel (git-4c53267)
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 12.1.0-devel (git-4c53267)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
OpenGL ES profile extensions:
Comment 11 Eero Tamminen 2016-08-30 07:46:57 UTC
(In reply to Darius Spitznagel from comment #10)
> OpenGL renderer string: Mesa DRI Intel(R) Haswell Desktop 
> OpenGL core profile version string: 3.3 (Core Profile) Mesa 12.1.0-devel
> (git-4c53267)

I forgot you use Haswell, where Mesa doesn't yet expose the GL 4.x required by the game.  The game developer has also stated (in bug 93144 comment 31) that the game *requires* compute shaders (which you explicitly disable in the first comment of this bug).

The fp64 support still missing for GL 4.3 on GEN7 shouldn't affect your game though. Could you try the game also with:
MESA_GL_VERSION_OVERRIDE=4.3 MESA_GLSL_VERSION_OVERRIDE=430 ?
Comment 12 Darius Spitznagel 2016-08-30 19:12:51 UTC
(In reply to Eero Tamminen from comment #11)
> (In reply to Darius Spitznagel from comment #10)
> > OpenGL renderer string: Mesa DRI Intel(R) Haswell Desktop 
> > OpenGL core profile version string: 3.3 (Core Profile) Mesa 12.1.0-devel
> > (git-4c53267)
> 
> I forgot you use Haswell, where Mesa doesn't yet expose GL GL 4.x required
> by the game.  Game developer has also stated (in bug 93144 comment 31) that
> the game *requires* compute shaders (which you explicitly disable in first
> comment of this bug).
> 
> The fp64 support still missing for GL 4.3 GEN7 shouldn't affect your game
> though. Could you try the game also with:
> MESA_GL_VERSION_OVERRIDE=4.3 MESA_GLSL_VERSION_OVERRIDE=430 ?

The test I did yesterday was run with "MESA_GL_VERSION_OVERRIDE=4.3 MESA_GLSL_VERSION_OVERRIDE=430".
Sorry I didn't mention it.
Comment 13 Darius Spitznagel 2016-08-30 19:21:35 UTC
"MESA_GL_VERSION_OVERRIDE=4.3 MESA_GLSL_VERSION_OVERRIDE=430 %command%" is only set for this game inside Steam Client - not in my bash environment where I called glxinfo to show you that I was using git master.
Comment 14 Matt Turner 2016-08-30 19:35:34 UTC
Created attachment 126122
patch

(In reply to Darius Spitznagel from comment #1)
> Created attachment 121746 [details]
> Alien: Isolation GDB log

The memset() at line 935 of a4cff18:src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp is

>   memset(last_grf_write, 0, sizeof(last_grf_write));

The backtrace shows

#1  0x00007fffe79015fe in memset (__len=482688, __ch=0, __dest=0x7fffa216e550) at /usr/include/x86_64-linux-gnu/bits/string3.h:84

__len=482688?!

last_grf_write is sized as

>   schedule_node *last_grf_write[grf_count * 16];

and 482688 / 16 is 30168. So we have 30168 virtual registers?!

I would very much like to see this shader... could you try to capture it with MESA_SHADER_CAPTURE_PATH=/tmp/alienisolation [...] (after making that directory)? It will write shaders to the directory, and I would expect the last one it writes to be the one causing the trouble. :)



I expect the attached patch will fix it, though I'm not sure it's exactly what we want to do. (Of course capture the shaders without this patch)
Comment 15 Darius Spitznagel 2016-08-30 20:26:44 UTC
Hello Matt,

great, it's working with your patch as you expected:)
No more crash at start!

I only played about 2 minutes.
The only strange thing I saw were little flashing white boxes (not spots, roughly 4x4 pixels or more), which maybe should simulate dust particles in the air, but that is not the issue of this bug report.

Many thanks Matt!

Do you have doubts this patch could land in mesa?
> I expect the attached patch will fix it, though I'm not sure it's exactly what we want to do.
Comment 16 Matt Turner 2016-08-30 21:41:31 UTC
(In reply to Darius Spitznagel from comment #15)
> Do you have doubts this patch could land in mesa?

I cannot say without seeing the shader. 30 thousand virtual registers used is an incredible amount. It's possible the shader is significantly larger than anything we've seen before, but it's equally likely that we have a bug somewhere else that's causing it.

Once I see the shader I should be able to determine what to do next.
Comment 17 Matt Turner 2016-08-31 18:52:25 UTC
So there's no misunderstanding, I'm waiting for you to capture the shader (without my patch).
Comment 18 Darius Spitznagel 2016-08-31 19:39:41 UTC
Created attachment 126150
AI shaders

Sorry Matt, I was very busy until now.
Here it comes...
Comment 19 Matt Turner 2016-08-31 20:01:12 UTC
(In reply to Darius Spitznagel from comment #18)
> Created attachment 126150 [details]
> AI shaders
> 
> Sorry Matt, I was very busy until now.
> Here it comes...

No problem. Sorry, I didn't mean to rush you. Just wanted to make sure I was communicating clearly.

Doesn't look like there are any compute shaders in there. I'll investigate if our capturing mechanism has a bug. (And I'll have a Bugzilla admin delete the attachment, since we probably cannot distribute the shaders. Sorry, I should have asked you to email them to me).
Comment 20 Tollef Fog Heen 2016-09-01 19:11:43 UTC
The content of attachment 126150 has been deleted for the following reason:

Not public.
Comment 21 Eero Tamminen 2017-02-10 17:43:31 UTC
(In reply to Matt Turner from comment #14)
> Created attachment 126122 [details] [review] [review]
> patch
> 
> (In reply to Darius Spitznagel from comment #1)
> > Created attachment 121746 [details]
> > Alien: Isolation GDB log
> 
> The memset() at line 935 of
> a4cff18:src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp is
> 
> >   memset(last_grf_write, 0, sizeof(last_grf_write));
> 
> The backtrace shows
> 
> #1  0x00007fffe79015fe in memset (__len=482688, __ch=0,
> __dest=0x7fffa216e550) at /usr/include/x86_64-linux-gnu/bits/string3.h:84
> 
> __len=482688?!

In my case with today's Mesa & Alien: Isolation:
Thread 17 "WinMain" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f09cc122700 (LWP 23775)]
__memset_avx2 () at ../sysdeps/x86_64/multiarch/memset-avx2.S:161
161	../sysdeps/x86_64/multiarch/memset-avx2.S: No such file or directory.
(gdb) bt
#0  __memset_avx2 () at ../sysdeps/x86_64/multiarch/memset-avx2.S:161
#1  0x00007f09de059b0c in memset (__len=349056, __ch=0, __dest=0x7f09cc0c98c0) at /usr/include/x86_64-linux-gnu/bits/string3.h:90
#2  fs_instruction_scheduler::calculate_deps (this=0x7f09cc11ee90) at ../../../../../../src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp:991

-> 349056 byte memset.

After applying a patch similar to yours, the game crashes at startup when the compiler uses another stack array.


> last_grf_write is sized as
> 
> >   schedule_node *last_grf_write[grf_count * 16];
>
> and 482688 / 16 is 30168. So we have 30168 virtual registers?!

This is a 64-bit program and pointers are 8 bytes, so 482688/16/8 = 3771 virtual registers, or in my case, 349056/16/8 = 2727.
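
The size arithmetic above can be double-checked with a small C helper. This is only a sketch: the function name is made up for illustration, and the factor of 16 is taken from the `grf_count * 16` array declaration quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Mesa: recover grf_count from the
 * memset() length seen in the backtrace.  last_grf_write is declared as
 * `schedule_node *last_grf_write[grf_count * 16]`, so its size in bytes
 * is grf_count * 16 * (size of one pointer). */
size_t grf_count_from_memset_len(size_t len, size_t pointer_size)
{
    assert(pointer_size > 0);
    return len / 16 / pointer_size;
}
```

With 8-byte pointers, grf_count_from_memset_len(482688, 8) gives 3771 and grf_count_from_memset_len(349056, 8) gives 2727, matching the numbers above.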

After a lot of reading assembly and head scratching over how the memset() could crash when, according to GDB, the memory addresses are fine, I went through the process memory mappings and it all became clear...


By default, Glibc thread stacks are 8MB (except for the main thread). However, this game sets the stack sizes of several of its threads to a few hundred kB (one thread was set to only 34kB).

Each thread stack mapping in the process is followed by a 4kB "canary" (guard) page without read/write access rights.  The segfaults happen when Mesa crosses the boundary from the 384 kB stack into that page.


When I manually did the following with Gdb:
------------------------
(gdb) b pthread_attr_setstacksize
Breakpoint 6 at 0x7f7ea34f21d0: file pthreadP.h, line 631.
(gdb) c
Continuing.
[New Thread 0x7f7e84532700 (LWP 3333)]

Thread 6 "WinMain" hit Breakpoint 6, __pthread_attr_setstacksize (attr=0x7f7e85bf7890, stacksize=393216) at pthread_attr_setstacksize.c:38
38	in pthread_attr_setstacksize.c
(gdb) finish
Run till exit from #0  __pthread_attr_setstacksize (attr=0x7f7e85bf7890, stacksize=393216) at pthread_attr_setstacksize.c:38
0x0000000000b1dd75 in ?? ()
Value returned is $12 = 0
(gdb) delete 6
(gdb) print __pthread_attr_setstacksize(0x7f7e85bf7890, 4194304)
$13 = 0
(gdb) b pthread_attr_setstacksize
Breakpoint 7 at 0x7f7ea34f21d0: file pthreadP.h, line 631.
(gdb) c
------------------------

to change the too-small stack sizes to something larger (in this case 4MB) at run-time, the game started fine; it just takes a *long* time.


So, either this game needs some LD_PRELOAD that maps the pthread_attr_setstacksize() function to a no-op, or the Mesa compiler needs to be changed to use the heap for anything that might be even a bit larger (which can make it a bit slower).
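
The LD_PRELOAD option could look roughly like the following. This is a minimal sketch, not a tested fix for the game: the file name nostacksize.c and the build line in the comment are assumptions, and the shim simply ignores every stack size request so that threads keep the Glibc default.

```c
/* Assumed build and use (file names are hypothetical):
 *   gcc -shared -fPIC -o nostacksize.so nostacksize.c
 *   LD_PRELOAD=./nostacksize.so <game binary>
 */
#include <pthread.h>
#include <stddef.h>

/* Takes precedence over Glibc's pthread_attr_setstacksize() when preloaded:
 * the requested size is ignored and success is reported, so the attribute
 * object keeps its default stack size. */
int pthread_attr_setstacksize(pthread_attr_t *attr, size_t stacksize)
{
    (void)attr;
    (void)stacksize;
    return 0;
}
```

The shim changes behavior for every caller in the process, including ones that ask for a larger stack, so it is only suitable as a diagnostic workaround.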


It's interesting that the other compilers work fine with this; are they much more frugal in their stack usage?
Comment 22 Eero Tamminen 2017-02-13 13:02:39 UTC
(In reply to Eero Tamminen from comment #21)
> So, either this game needs some LD_PRELOAD that maps
> pthread_attr_setstacksize() function to a no-op, or Mesa compiler needs to
> be changed to use heap for anything that might be even a bit larger (which
> can make it a bit slower).

A short term workaround in Mesa could be checking the current thread's stack size before compilation and fixing the size, if it's set too low by the application.
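
The checking half of that idea can be sketched as below. It relies on pthread_getattr_np(), a Glibc extension, and the function names are made up for illustration. The sketch only detects a small stack; it does not attempt to enlarge it.

```c
#define _GNU_SOURCE /* for pthread_getattr_np(), a Glibc extension */
#include <pthread.h>
#include <stddef.h>

/* Return the calling thread's total stack size in bytes, or 0 on error. */
size_t current_thread_stack_bytes(void)
{
    pthread_attr_t attr;
    void *stack_addr = NULL;
    size_t stack_size = 0;

    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return 0;
    if (pthread_attr_getstack(&attr, &stack_addr, &stack_size) != 0)
        stack_size = 0;
    pthread_attr_destroy(&attr);
    return stack_size;
}

/* Hypothetical policy check: is the whole stack at least `needed` bytes?
 * This ignores how much of the stack is already in use. */
int stack_is_large_enough(size_t needed)
{
    size_t have = current_thread_stack_bytes();
    return have != 0 && have >= needed;
}
```

A caller could, for example, compare the result against a few megabytes before starting a compile and log a warning when the application has set the stack lower than that.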

As to a longer term solution... Just changing the compiler to do larger allocations from the heap instead of the stack would still assume that a certain amount of stack is free, and it's not easy to track how much each compiler commit changes that.
-> Better would be to do compilation in a separate thread (i.e. one where the application won't mess with the stack size).
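
The separate-thread idea can be sketched with plain POSIX threads. This is an illustration under assumptions, not how Mesa does or would do it: both function names are hypothetical, and stack_hungry_job() is only a stand-in for a compile step that needs a lot of stack.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Run fn(arg) on a new thread whose stack is stack_bytes large and wait
 * for it to finish.  Returns 0 on success or a pthreads error code. */
int run_on_thread_with_stack(size_t stack_bytes,
                             void *(*fn)(void *), void *arg, void **result)
{
    pthread_attr_t attr;
    pthread_t thread;
    int err = pthread_attr_init(&attr);

    if (err != 0)
        return err;
    err = pthread_attr_setstacksize(&attr, stack_bytes);
    if (err == 0)
        err = pthread_create(&thread, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    if (err != 0)
        return err;
    return pthread_join(thread, result);
}

/* Stand-in for a stack-hungry compile job: zero a local array of the same
 * size as the largest memset() in the backtraces (968064 bytes). */
void *stack_hungry_job(void *arg)
{
    char scratch[968064];

    memset(scratch, 0, sizeof(scratch));
    scratch[sizeof(scratch) - 1] = 1;
    return scratch[sizeof(scratch) - 1] ? arg : NULL;
}
```

Calling run_on_thread_with_stack(8 * 1024 * 1024, stack_hungry_job, &job, &result) gives the job an 8MB stack regardless of what the calling thread was created with.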
Comment 23 Eero Tamminen 2017-02-13 16:50:19 UTC
(In reply to Eero Tamminen from comment #22)
> Short term workaround in Mesa could be checking current thread's stack size
> before compilation and fixing the size, if it's set too low by the
> application.

Tried:

* Adding such functionality to _mesa_compile_shader() -> there were still crashes

* Using LD_PRELOAD for pthread_attr_setstacksize() which filters out all calls setting stack sizes <8MB -> game works fine (so doing same manually with Gdb wasn't just timing related luck)

-> I assume compiler thread isn't the only one with stack size issues

(And this isn't the only issue with this game's stack handling, its stack is both writable & executable which is security-wise nasty for anything networked.)


Btw. while testing the Mesa workaround, I also bumped into a larger register allocation:
(gdb) bt
#0  __memset_avx2 () at ../sysdeps/x86_64/multiarch/memset-avx2.S:161
#1  0x00007f935282bf5c in memset (__len=968064, __ch=0, __dest=0x7f934000a6c0) at /usr/include/x86_64-linux-gnu/bits/string3.h:90
#2  fs_instruction_scheduler::calculate_deps (this=0x7f93400f6e90) at ../../../../../../src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp:991
(gdb) print 968064/16/8
$1 = 7563

That's a nearly 1MB stack allocation for this one array...
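
For comparison, moving such a table to the heap could look like the sketch below. The names are hypothetical and this is not the attached patch; it only shows a zeroed heap allocation standing in for a stack array followed by memset().

```c
#include <stdint.h>
#include <stdlib.h>

/* Number of pointer slots per virtual GRF, from `grf_count * 16`. */
#define SLOTS_PER_GRF 16

/* Hypothetical replacement for a stack array plus memset(): allocate the
 * table zeroed on the heap.  Returns NULL on overflow or allocation
 * failure; the caller must free() the result. */
void **alloc_last_write_table(size_t grf_count)
{
    if (grf_count > SIZE_MAX / SLOTS_PER_GRF / sizeof(void *))
        return NULL;
    return calloc(grf_count * SLOTS_PER_GRF, sizeof(void *));
}
```

With grf_count = 7563 and 8-byte pointers this allocates the same 968064 bytes, but failure shows up as a NULL return instead of a segfault on the guard page.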


> As to longer term solution... Just changing compiler to do larger allocs
> from heap instead of using stack, will still assume certain amount of stack
> being free, and it's not easy to track how much each compiler commit changes
> that.
> -> Better would be doing compilation in separate thread (i.e. where
> application won't mess with its stack size).

That would hopefully also speed up AlienIsolation startup, both to main menu, and from that to actual game.  Currently it's really slow.
Comment 24 Ian Romanick 2017-02-13 19:45:12 UTC
(In reply to Eero Tamminen from comment #21)
> (In reply to Matt Turner from comment #14)
> > Created attachment 126122 [details] [review] [review] [review]
> > patch
> > 
> > (In reply to Darius Spitznagel from comment #1)
> > > Created attachment 121746 [details]
> > > Alien: Isolation GDB log
> > 
> > The memset() at line 935 of
> > a4cff18:src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp is
> > 
> > >   memset(last_grf_write, 0, sizeof(last_grf_write));
> > 
> > The backtrace shows
> > 
> > #1  0x00007fffe79015fe in memset (__len=482688, __ch=0,
> > __dest=0x7fffa216e550) at /usr/include/x86_64-linux-gnu/bits/string3.h:84
> > 
> > __len=482688?!
> 
> In my case with today's Mesa & Alien Isoation:
> Thread 17 "WinMain" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7f09cc122700 (LWP 23775)]
> __memset_avx2 () at ../sysdeps/x86_64/multiarch/memset-avx2.S:161
> 161	../sysdeps/x86_64/multiarch/memset-avx2.S: No such file or directory.
> (gdb) bt
> #0  __memset_avx2 () at ../sysdeps/x86_64/multiarch/memset-avx2.S:161
> #1  0x00007f09de059b0c in memset (__len=349056, __ch=0,
> __dest=0x7f09cc0c98c0) at /usr/include/x86_64-linux-gnu/bits/string3.h:90
> #2  fs_instruction_scheduler::calculate_deps (this=0x7f09cc11ee90) at
> ../../../../../../src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp:991
> 
> -> 349056 byte memset.
> 
> After applying patch similar to yours, game crashes in startup when compiler
> is using another stack array.

Eero, are you able to collect the shaders that are being compiled when the stack explodes?  Matt's suspicion is that there is another bug somewhere that causes things to go sideways.  Without seeing the shaders, it's just a guessing game.
Comment 25 Eero Tamminen 2017-02-14 11:51:01 UTC
Got an apitrace, I'll mail you & Matt a link to it.


There are multiple FS shaders that spill 20 to 50 items, and compute shaders like this:
-------------------------
43621: message: shader compiler performance issue 655: compute shader triggered register spilling.  Try reducing the number of live scalar values to improve performance.
43621: message: shader compiler issue 656: CS SIMD32 shader: 23733 inst, 0 loops, 415136 cycles, 1507:3332 spills:fills, Promoted 0 constants, compacted 379728 to 283984 bytes.
47174: message: shader compiler performance issue 828: SIMD16 shader failed to compile: CS compile failed: Failure to register allocate.  Reduce number of live scalar values to avoid this.
47174: message: shader compiler issue 829: CS SIMD8 shader: 6683 inst, 0 loops, 83406 cycles, 263:590 spills:fills, Promoted 18 constants, compacted 106928 to 72800 bytes.
-------------------------

(The first CS shader takes at least 10 min to compile before getting to game menus, and same, or similar shader takes another 10 min when starting game from the menu.)
Comment 26 Eero Tamminen 2017-02-14 15:23:37 UTC
Alien Isolation apitrace replay naturally works fine without pthread_attr_setstacksize() LD_PRELOAD hack.

I think this can be marked as NOTOURBUG.  There's no reason why the game should lower its thread stack sizes.  It's a 64-bit program, so it won't run out of address space, and unused stack doesn't consume any real memory because Linux uses overcommit.
Comment 27 Darius Spitznagel 2017-02-14 18:05:48 UTC
Hello Eero,

> (The first CS shader takes at least 10 min to compile before getting to game
> menus, and same, or similar shader takes another 10 min when starting game
> from the menu.)

this really confuses me.

With my Intel Haswell Iris Pro 5200 system I need less than 3 minutes to get into the game and play (continue from save). You say you need 20 min?
I use Matt's patch of course.

What I see is that video playback and shader compilation all run on one CPU core out of 8. What a waste!
The videos (originals from the movie) are all blocked by shader compilation. I don't think this is what the game developers intended.
Splitting video and shader compilation into at least two threads should already bring better loading times and makes total sense.
And as far as I know this is the job of the OpenGL driver and not the application.

Mesa+radeonsi seems to run fine, look here...
https://bugs.freedesktop.org/show_bug.cgi?id=93144

In Comment 31...
https://bugs.freedesktop.org/show_bug.cgi?id=93144#c31 Edwin Smith from Feral is happy to help.
Comment 28 Eero Tamminen 2017-02-15 08:46:30 UTC
(In reply to Darius Spitznagel from comment #27)
> > (The first CS shader takes at least 10 min to compile before getting to game
> > menus, and same, or similar shader takes another 10 min when starting game
> > from the menu.)

Sorry, I should have properly measured it.   The whole startup to menu is ~9 min, and ~9 min from menu to game, on SKL i5 with default Mesa compile options (no debug, -O2).

Based on INTEL_DEBUG=perf output, I would guess that the indicated single compute shader takes maybe half of that time (and few other compute shaders are quite slow to compile too), i.e. I was off by factor of 2.


> this really confuse me.
> 
> With my Intel Haswell Iris Pro 5200 system I need less then 3 Minutes to get
> into the game and play (continue from save).
>
> I use Matts patch of course.

At least in my case, on SKL GT2 with Mesa git from earlier this week, Matt's patch isn't enough if Alien: Isolation also compiles compute shaders.  It will then just run out of stack in another function.


Are you forcing GL 4.3 / compute on, like earlier? [1]

If not, which Mesa version do you use?  Haswell officially provides GL 4.2 in Mesa 17.0.0 (required FP64 came on Jan 12th), which was released 2 days ago.    Only with latest Mesa git version Haswell officially supports GL 4.5 (Jan 16th commit).  Compute shaders come in GL 4.3.

[1] Game will start, but doesn't render correctly if compute shader support is missing:
  https://bugs.freedesktop.org/show_bug.cgi?id=93144#c31


Also, which exact CPU model do you have?


> What I see is that video-play and shader compilation all run on one cpu core
> out of 8. What a waste!
> The videos (original from the movie) are all blocked by shader compilation.
> I don't think this is the intention from the game developers.
> To split video and shader compile into at least two threads should already
> bring better loading times and makes total sense.

Game needs to know when the parallel shader compilation has finished.  That's an extension on top of GL 4.5:
https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_parallel_shader_compile.txt

Game needs to specifically have support for that.
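
As a rough illustration of what that support involves on the application side: with ARB_parallel_shader_compile, a program can poll COMPLETION_STATUS_ARB through glGetProgramiv() instead of blocking on the link result. The sketch below takes the query function as a pointer so it runs without a GL context; program_link_finished() and fake_get_programiv() are hypothetical names, and the fake driver exists for demonstration only.

```c
#include <stddef.h>

/* Enum value from the ARB_parallel_shader_compile specification. */
#define COMPLETION_STATUS_ARB 0x91B1

/* Same shape as glGetProgramiv(program, pname, params), passed in as a
 * pointer so this sketch needs neither GL headers nor a GL context. */
typedef void (*get_programiv_fn)(unsigned int program, unsigned int pname,
                                 int *params);

/* Non-blocking check: returns 1 once the driver reports that compiling
 * and linking of `program` has finished, else 0. */
int program_link_finished(get_programiv_fn get_programiv, unsigned int program)
{
    int done = 0;

    get_programiv(program, COMPLETION_STATUS_ARB, &done);
    return done != 0;
}

/* Fake driver for demonstration only: reports "finished" on the third poll. */
void fake_get_programiv(unsigned int program, unsigned int pname, int *params)
{
    static int polls = 0;

    (void)program;
    if (pname == COMPLETION_STATUS_ARB)
        *params = (++polls >= 3);
}
```

A game loop would call program_link_finished() once per frame with the real glGetProgramiv and keep playing its intro video until it returns 1.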


> And as far as I know this is the part of the OpenGL driver and not the
> application.
> 
> Mesa+radeonsi seems to run fine, look here...
> https://bugs.freedesktop.org/show_bug.cgi?id=93144

That uses the gallium / LLVM backend.  Maybe it uses less stack, or did at that point and would now also run out of stack.

300-400 KB of stack is a ridiculously low amount if there's something that actually uses it.  The game cannot assume that the driver will use less than that (the game itself will use some of that stack too).
Comment 29 Darius Spitznagel 2017-02-15 18:30:15 UTC
> Are you forcing GL 4.3 / compute on, like earlier? [1]
> 
> If not, which Mesa version do you use?  Haswell officially provides GL 4.2
> in Mesa 17.0.0 (required FP64 came on Jan 12th), which was released 2 days
> ago.    Only with latest Mesa git version Haswell officially supports GL 4.5
> (Jan 16th commit).  Compute shaders come in GL 4.3.
> 
> [1] Game will start, but doesn't render correctly if compute shader support
> is missing:
>   https://bugs.freedesktop.org/show_bug.cgi?id=93144#c31

I use Mesa 17.0.0 with no override.
Mesa 17.0.0 exposes OpenGL 4.5 for Haswell. The release notes at http://mesa3d.org are totally WRONG!!!
Look here...
https://cgit.freedesktop.org/mesa/mesa/commit/?id=d2590eb65ff28a9cbd592353d15d7e6cbd2c6fc6

Clone branch mesa-17.0.0 and search for "Haswell" and you will see it.

IMPORTANT: You also need a newer kernel otherwise CS will not be exposed due to missing "pipelined register writes" support (see intel_extensions.c)
Kernel 4.8 should be fine.

So I use no magic but Matt's patch and some others for Divinity: OS.

> Also, which exact CPU model do you have?

Intel(R) Core(TM) i7-4770R CPU @ 3.20GHz

> Game needs to know when the parallel shader compilation has finished. 
> That's an extension on top of GL 4.5:
> https://www.khronos.org/registry/OpenGL/extensions/ARB/
> ARB_parallel_shader_compile.txt
> 
> Game needs to specifically have support for that.

AHA, very interesting! Thanks for this info.
 
> That uses gallium / LLVM backend.  Maybe it uses less stack, or used at that
> point, and would now also run out of stack.

I know.

> 300-400 KB of stack is ridiculously low amount of stack if there's something
> that actually uses it.  Game cannot assume that driver will use less than
> that (game itself will use some of that stack too).

A good point to look what others do.
Comment 30 Eero Tamminen 2017-02-16 12:58:45 UTC
(In reply to Darius Spitznagel from comment #29)
> I use Mesa 17.0.0 with no override.
> Mesa 17.0.0 exposes OpenGL 4.5 for Haswell. The release notes at
> http://mesa3d.org are totally WRONG!!!..
...
> Clone branch mesa-17.0.0 and search for "Haswell" and you will see it.

I see (should have checked that myself as it seemed odd).


> So I use no magic but Matts patch and some others for Divinity: OS.

Then it's indeed weird that you don't see the stack overflow issues I'm seeing.

You aren't by any chance using Timothy's shader cache stuff?


>> Also, which exact CPU model do you have?
> 
> Intel(R) Core(TM) i7-4770R CPU @ 3.20GHz

That cannot explain >2x shader compilation speed difference either.
Comment 31 Darius Spitznagel 2017-02-18 14:53:19 UTC
> You aren't by any chance using Timothy's shader cache stuff?

No.


In the meantime I have compiled Mesa master and applied Matt's patch + the Divinity (vendor+ARB_shading_language_include) patch.

darius@pc1:~$ glxinfo | grep OpenGL
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Haswell Desktop 
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.1.0-devel (git-ad019bf5c6)
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 17.1.0-devel (git-ad019bf5c6)
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 17.1.0-devel (git-ad019bf5c6)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
OpenGL ES profile extensions:

Alien: Isolation behaves in the same way as with 17.0.0...
- no crash.
- ~3 minutes to get into game from save.

Maybe something is wrong with your build-environment?

Here are some details of my current system:
Debian 9.0 Stretch (Testing - Full freeze as of 5th February)

Intel(R) Core(TM) i7-4770R CPU @ 3.20GHz (Haswell GT Iris Pro 5200)

8GB of RAM

libdrm 2.4.75 (needed to compile mesa master / before 2.4.74)

LLVM 3.9.1

gcc version 6.3.0 20170205 (Debian 6.3.0-6)

cat /var/log/Xorg.0.log| grep DRI
[  6530.102] (II) glamor: EGL version 1.4 (DRI2):
[  6530.211] (II) modeset(0): [DRI2] Setup complete
[  6530.211] (II) modeset(0): [DRI2]   DRI driver: i965
[  6530.211] (II) modeset(0): [DRI2]   VDPAU driver: i965
[  6530.215] (II) GLX: Initialized DRI2 GL provider for screen 0

Gnome Flashback session manager
Comment 32 Darius Spitznagel 2017-02-21 20:03:56 UTC
Hello Eero,

is there any news on this?

SKL and HSW are very different.
Did you test mesa git with HSW instead of SKL?

Maybe Xorg does play a role here? Forgot to mention my current Xorg version in last post...

xserver-xorg-core 1.19.1
Comment 33 Eero Tamminen 2017-02-22 14:34:31 UTC
(In reply to Darius Spitznagel from comment #32)
> are there any news on this?

I've provided Matt with apitrace of the startup, so he can investigate the compiler side.


> SKL and HSW are very different.
> Did you test mesa git with HSW instead of SKL?

I don't have a setup where I could try the real game on one, but the apitrace replay didn't compile CS shaders as large (instruction-count-wise) as on SKL (before it failed to run the rest of the trace). This was the largest:
43621: message: shader compiler issue 653: CS SIMD16 shader: 5210 inst, 0 loops, 118772 cycles, 207:471 spills:fills, Promoted 0 constants, compacted 83360 to 58672 bytes.


Btw. I tried Apitrace trace on BDW and as expected, it had the same issue as SKL.  Perf says time is spent in intel backend:
-----------------------
Overhead  Symbol                                          
  47,44%  ra_allocate
  15,38%  fs_visitor::virtual_grf_interferes
  12,04%  fs_visitor::assign_regs
   8,95%  ra_add_node_adjacency
   2,98%  decrement_q.isra.2
   1,63%  ra_add_node_interference
-----------------------
Which seems very similar to bug 98455.


> Maybe Xorg does play a role here? Forgot to mention my current Xorg version
> in last post...
> 
> xserver-xorg-core 1.19.1

X version should have no impact on what gets compiled in compute shader.

On BDW+, SIMD32 is needed to satisfy compute workgroup requirements, but from the Apitrace output it seems that on HSW SIMD16 is enough, and that requires far fewer registers.

-> That explains why things work on HSW, but not on anything newer.  A >3x larger number of registers needs more stack to process and is *much* slower to compile (it spills a lot).

