Bug 97267

Summary: [BDW] GL45-CTS.texture_cube_map_array.sampling asserts inside brw_fs.cpp
Product: Mesa
Reporter: Ian Romanick <idr>
Component: Drivers/DRI/i965
Assignee: Ian Romanick <idr>
Status: RESOLVED FIXED
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal
Priority: medium
CC: currojerez, mark.a.janes
Version: git
Hardware: All
OS: All
Whiteboard:
i915 platform: i915 features:

Description Ian Romanick 2016-08-09 20:20:02 UTC
$ gdb --args ./glcts --deqp-case=GL45-CTS.texture_cube_map_array.sampling
GNU gdb (GDB) Fedora 7.10.1-31.fc23
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./glcts...done.
(gdb) r
Starting program: opengl-cts/cts/cts/glcts --deqp-case=GL45-CTS.texture_cube_map_array.sampling
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-17.fc23.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
dEQP Core GL-CTS-2.0 (0x0052484b) starting..
  target implementation = 'X11-EGL'
Mesa warning: couldn't open libtxc_dxtn.so, software DXTn compression/decompression unavailable

Test case 'GL45-CTS.texture_cube_map_array.sampling'..
glcts: brw_fs.cpp:4254: void lower_sampler_logical_send_gen7(const brw::fs_builder&, fs_inst*, opcode, const fs_reg&, const fs_reg&, fs_reg, const fs_reg&, const fs_reg&, const fs_reg&, const fs_reg&, const fs_reg&, const fs_reg&, unsigned int, unsigned int): Assertion `inst->mlen <= 11' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff5f31a28 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.6-19.fc23.x86_64 elfutils-libelf-0.166-1.fc23.x86_64 elfutils-libs-0.166-1.fc23.x86_64 expat-2.1.1-2.fc23.x86_64 libattr-2.4.47-14.fc23.x86_64 libcap-2.24-8.fc23.x86_64 libgcc-5.3.1-6.fc23.x86_64 libselinux-2.4-4.fc23.x86_64 libstdc++-5.3.1-6.fc23.x86_64 mesa-libGLU-9.0.0-9.fc23.x86_64 pcre-8.39-2.fc23.x86_64 systemd-libs-222-14.fc23.x86_64 xz-libs-5.2.1-3.fc23.x86_64 zlib-1.2.8-9.fc23.x86_64
(gdb) bt
#0  0x00007ffff5f31a28 in raise () from /lib64/libc.so.6
#1  0x00007ffff5f3362a in abort () from /lib64/libc.so.6
#2  0x00007ffff5f2a227 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff5f2a2d2 in __assert_fail () from /lib64/libc.so.6
#4  0x00007ffff2402051 in lower_sampler_logical_send_gen7 (bld=..., 
    inst=0x29e6960, op=SHADER_OPCODE_TXL, coordinate=..., shadow_c=..., 
    lod=..., lod2=..., sample_index=..., mcs=..., surface=..., 
    sampler=..., offset_value=..., coord_components=4, grad_components=0)
    at brw_fs.cpp:4254
#5  0x00007ffff24021f8 in lower_sampler_logical_send (bld=..., 
    inst=0x29e6960, op=SHADER_OPCODE_TEX) at brw_fs.cpp:4279
#6  0x00007ffff2402bd7 in fs_visitor::lower_logical_sends (this=
    0x7fffffff9390) at brw_fs.cpp:4432
#7  0x00007ffff2407930 in fs_visitor::optimize (this=0x7fffffff9390)
    at brw_fs.cpp:5759
#8  0x00007ffff240a001 in fs_visitor::run_cs (this=0x7fffffff9390)
    at brw_fs.cpp:6233
#9  0x00007ffff240b8f4 in brw_compile_cs (compiler=0x1ed7b40, 
    log_data=0x1ff9500, mem_ctx=0x25c3fe0, key=0x7fffffffaed0, 
    prog_data=0x7fffffffad90, src_shader=0x2a35f90, 
    shader_time_index=-1, final_assembly_size=0x7fffffffae4c, 
    error_str=0x7fffffffad88) at brw_fs.cpp:6788
#10 0x00007ffff234b355 in brw_codegen_cs_prog (brw=0x1ff9500, 
    prog=0x20fa380, cp=0x28dffe0, key=0x7fffffffaed0) at brw_cs.c:127
#11 0x00007ffff234b8ea in brw_cs_precompile (ctx=0x1ff9500, 
    shader_prog=0x20fa380, prog=0x28dffe0) at brw_cs.c:260
#12 0x00007ffff2356720 in brw_shader_precompile (ctx=0x1ff9500, 
    sh_prog=0x20fa380) at brw_link.cpp:68
#13 0x00007ffff2357086 in brw_link_shader (ctx=0x1ff9500, 
    shProg=0x20fa380) at brw_link.cpp:283
#14 0x00007ffff217f67b in _mesa_glsl_link_shader (ctx=0x1ff9500, 
    prog=0x20fa380) at program/ir_to_mesa.cpp:3070
#15 0x00007ffff201388d in _mesa_link_program (ctx=0x1ff9500, 
    shProg=0x20fa380) at main/shaderapi.c:1093
#16 0x00007ffff2014922 in _mesa_LinkProgram (programObj=193)
    at main/shaderapi.c:1594
#17 0x0000000000dc341b in glcts::TextureCubeMapArraySamplingTest::programDefinition::link (this=0x7fffffffc300)
    at cts/glesext/texture_cube_map_array/esextcTextureCubeMapArraySampling.cpp:5066
#18 0x0000000000dbf452 in glcts::TextureCubeMapArraySamplingTest::link (
    this=0x2037860, info=...)
    at cts/glesext/texture_cube_map_array/esextcTextureCubeMapArraySampling.cpp:3681
#19 0x0000000000dc2dcf in glcts::TextureCubeMapArraySamplingTest::programCollectionForFunction::init (this=0x7fffffffc300, gl=..., 
    shader_group=..., test=...)
    at cts/glesext/texture_cube_map_array/esextcTextureCubeMapArraySampling.cpp:4856
#20 0x0000000000dc2c4b in glcts::TextureCubeMapArraySamplingTest::programCollectionForFormat::init (this=0x7fffffffc300, gl=..., 
    shader_collection=..., test=...)
    at cts/glesext/texture_cube_map_array/esextcTextureCubeMapArraySampling.cpp:4796
#21 0x0000000000dc065c in glcts::TextureCubeMapArraySamplingTest::testFormats (this=0x2037860, 
    formats=std::vector of length 5, capacity 8 = {...}, 
    resolutions=std::vector of length 4, capacity 4 = {...})
    at cts/glesext/texture_cube_map_array/esextcTextureCubeMapArraySampling.cpp:4082
#22 0x0000000000dbed59 in glcts::TextureCubeMapArraySamplingTest::iterate
    (this=0x2037860)
    at cts/glesext/texture_cube_map_array/esextcTextureCubeMapArraySampling.cpp:3609
#23 0x00000000015462c7 in tcu::TestCaseWrapper::iterateTestCase (
    this=0x1eb44c8, testCase=0x2037860)
    at framework/common/tcuTestCaseWrapper.cpp:133
#24 0x0000000000e20a98 in glcts::TestCaseWrapper::iterateTestCase (
    this=0x1eb44c8, testCase=0x2037860)
    at cts/common/glcTestCaseWrapper.cpp:111
#25 0x00000000015479f1 in tcu::TestExecutor::iterate (this=0x1eb3c30)
    at framework/common/tcuTestExecutor.cpp:360
#26 0x000000000153ae67 in tcu::App::iterate (this=0x1eb2da0)
    at framework/common/tcuApp.cpp:137
#27 0x00000000007f9600 in main (argc=2, argv=0x7fffffffdcb8)
    at framework/platform/tcuMain.cpp:73
(gdb)
Comment 1 Ian Romanick 2016-08-09 20:21:26 UTC
Curro: git-blame says you wrote the assertion.  Could you take a look at this?
Comment 2 Francisco Jerez 2016-08-11 03:21:11 UTC
(In reply to Ian Romanick from comment #1)
> Curro: git-blame says you wrote the assertion.  Could you take a look at
> this?

I've been trying to reproduce this today, but the test passes for me...  Are you running this on mesa master or do you have additional changes applied?
Comment 3 Ian Romanick 2016-08-11 16:29:09 UTC
(In reply to Francisco Jerez from comment #2)
> (In reply to Ian Romanick from comment #1)
> > Curro: git-blame says you wrote the assertion.  Could you take a look at
> > this?
> 
> I've been trying to reproduce this today, but the test passes for me...  Are
> you running this on mesa master or do you have additional changes applied?

I started looking at this one because Mark had already observed a crash in this test on the CI.  Were you perhaps running a release build without assertions? :)  What platform are you on?  I and the CI saw this on BDW.  It looks like the CI does not encounter this on SKL.  Not sure about other platforms.

https://github.com/janesma/mesa_jenkins/blob/master/cts-test/bdw.conf
https://github.com/janesma/mesa_jenkins/blob/master/cts-test/skl.conf
Comment 4 Francisco Jerez 2016-08-11 19:23:40 UTC
(In reply to Ian Romanick from comment #3)
> (In reply to Francisco Jerez from comment #2)
> > (In reply to Ian Romanick from comment #1)
> > > Curro: git-blame says you wrote the assertion.  Could you take a look at
> > > this?
> > 
> > I've been trying to reproduce this today, but the test passes for me...  Are
> > you running this on mesa master or do you have additional changes applied?
> 
> I started looking at this one because Mark had already observed a crash in
> this test on the CI.  Were you perhaps running a release build without
> assertions? :)  What platform are you on?  I and the CI saw this on BDW.  It
> looks like the CI does not encounter this on SKL.  Not sure about other
> platforms.
> 
> https://github.com/janesma/mesa_jenkins/blob/master/cts-test/bdw.conf
> https://github.com/janesma/mesa_jenkins/blob/master/cts-test/skl.conf

Right, I was testing on SKL, which doesn't crash at that point; I can reproduce the crash by running it on BDW.  I believe the reason it works on SKL is that it now uses the TXL_LZ message instead of TXL, which reduces the payload size by two and keeps it from exceeding the sampler payload limit.  I believe that on Gen7-8 platforms sampling from a cubemap array with a shadow comparator on a non-FS stage has been broken ever since the FS back-end started supporting non-FS stages...  I'll send a patch.
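
For anyone following the arithmetic, here is a minimal sketch of the payload-size reasoning. It is not Mesa code: sampler_payload_regs and fits_in_sampler_message are hypothetical helpers, and the model assumes that every message parameter is one 32-bit value per channel, so that it occupies exec_size / 8 registers. The limit of 11 is taken from the failed assertion `inst->mlen <= 11`.

```cpp
#include <cassert>

// Limit taken from the failed assertion `inst->mlen <= 11` in the log above.
const unsigned kMaxSamplerMessageRegs = 11;

// Hypothetical helper (not a Mesa function): number of registers in a
// sampler message payload.
// Assumption: each parameter is one 32-bit value per channel and a register
// holds 8 of them, so a parameter costs exec_size / 8 registers.
unsigned sampler_payload_regs(unsigned num_params, unsigned exec_size,
                              bool has_header)
{
    assert(exec_size == 8 || exec_size == 16);
    return (has_header ? 1u : 0u) + num_params * (exec_size / 8);
}

// Hypothetical helper: true if the payload stays within the limit.
bool fits_in_sampler_message(unsigned num_params, unsigned exec_size,
                             bool has_header)
{
    return sampler_payload_regs(num_params, exec_size, has_header) <=
           kMaxSamplerMessageRegs;
}
```

Under these assumptions, and assuming no message header, a TXL on a shadow cubemap array carries 4 coordinate components (coord_components=4 in the backtrace), the shadow comparator and the LOD, i.e. 6 parameters. sampler_payload_regs(6, 16, false) gives 12 registers, which is over the limit. Dropping the LOD, as TXL_LZ does, leaves 5 parameters and 10 registers, which matches the "reduces the payload size by two" observation for SIMD16.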
Comment 5 Francisco Jerez 2016-08-13 05:14:53 UTC
That turned out to be the tip of the iceberg...  The TEX, TXL and TXB instructions were all trying to use SIMD16 for shadow cubemap arrays, which exceeds the maximum message size supported by the sampler, and TG4_OFFSET was falling back to SIMD8 incorrectly in some cases.  It should be fixed by this series I just sent to the mailing list:

https://lists.freedesktop.org/archives/mesa-dev/2016-August/125922.html
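
As a rough illustration of the kind of fallback described above (not the code from the series), the sketch below picks a SIMD width for a sampler message so that the payload stays within the limit. pick_sampler_simd_width is a hypothetical helper, and the register cost per parameter (exec_size / 8) is an assumption of this model; the limit of 11 comes from the assertion in the original report.

```cpp
#include <cassert>

// Limit taken from the assertion `inst->mlen <= 11` in the original report.
const unsigned kMaxSamplerMessageRegs = 11;

// Hypothetical helper (not a Mesa function): choose the execution width for
// a sampler message.
// Assumption: each parameter costs exec_size / 8 registers, plus one
// register if a message header is present.
unsigned pick_sampler_simd_width(unsigned num_params, unsigned exec_size,
                                 bool has_header)
{
    assert(exec_size == 8 || exec_size == 16);
    const unsigned header_regs = has_header ? 1u : 0u;

    // A SIMD16 message that would not fit is split into SIMD8 messages.
    if (exec_size == 16 &&
        header_regs + num_params * 2 > kMaxSamplerMessageRegs)
        return 8;

    return exec_size;
}
```

With this model, a 6-parameter message requested at SIMD16 falls back to SIMD8, while a 5-parameter message stays at SIMD16 even with a header (1 + 10 = 11 registers).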
Comment 6 Francisco Jerez 2016-08-16 23:38:47 UTC
Should be fixed in master now.
