Bug 36919 - Yo Frankie: Crash in dst_register
Summary: Yo Frankie: Crash in dst_register
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Mesa core (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: mesa-dev
QA Contact:
URL: http://www.yofrankie.org/
Whiteboard:
Keywords:
: 36935 (view as bug list)
Depends on:
Blocks:
 
Reported: 2011-05-06 12:57 UTC by Sven Arvidsson
Modified: 2011-05-11 06:25 UTC (History)
3 users (show)

See Also:
i915 platform:
i915 features:


Attachments
backtrace (15.20 KB, text/plain)
2011-05-06 12:57 UTC, Sven Arvidsson
Details
Debug log (442.45 KB, text/plain)
2011-05-08 12:14 UTC, Sven Arvidsson
Details
apitrace of yo frankie (950.73 KB, application/x-gzip)
2011-05-08 13:08 UTC, Sven Arvidsson
Details
apitrace log (921.21 KB, application/x-gzip)
2011-05-09 12:39 UTC, Sven Arvidsson
Details

Description Sven Arvidsson 2011-05-06 12:57:38 UTC
Created attachment 46407 [details]
backtrace

When the game "Yo Frankie" is run with shaders enabled, it crashes in dst_register. Same problem with both r600g and llvmpipe.

Pretty sure this is a regression so I will bisect when I have time.


Program received signal SIGSEGV, Segmentation fault.
0xb676807e in dst_register (ctx=0x8975dd8, procType=0, ureg=0x104a03c8, 
    program=0x1073cf68, numInputs=2, inputMapping=0xbfffe888, 
    inputSemanticName=0xbfffe92c "\001\003x\266P\346\217\b\020", 
    inputSemanticIndex=0xbfffe94c "", interpMode=0xbfffe808, numOutputs=0, 
    outputMapping=0xbfffe900, 
    outputSemanticName=0xbfffe96c "\322\365t\266\360\370\215\b(Ś\b\017", 
    outputSemanticIndex=0xbfffe98c "\230\351\377\277@,_\017j\313ѿ", 
    passthrough_edgeflags=0 '\000') at state_tracker/st_mesa_to_tgsi.c:201
201	      return t->outputs[t->outputMapping[index]];
Comment 1 Sven Arvidsson 2011-05-06 13:07:17 UTC
Forgot to mention, this is using git master 27d3e0b25cc3f2bd9f72778f0c9f54cb90c48622
Comment 2 Alex Deucher 2011-05-07 08:11:30 UTC
*** Bug 36935 has been marked as a duplicate of this bug. ***
Comment 3 Sven Arvidsson 2011-05-07 13:33:48 UTC
I'm getting a very similar crash with the game Eternal Lands.
http://www.eternal-lands.com/
Comment 4 Brian Paul 2011-05-08 06:44:13 UTC
Can you run w/ a debug build and this env var:

export ST_DEBUG=mesa,tgsi

This will print the shader/program code to stdout.  Please attach that output to this bug.
Comment 5 Sven Arvidsson 2011-05-08 12:14:02 UTC
Created attachment 46451 [details]
Debug log

(In reply to comment #4)
> Can you run w/ a debug build and this env var:
> 
> export ST_DEBUG=mesa,tgsi
> 
> This will print the shader/program code to stdout.  Please attach that output
> to this bug.

Debug log is attached.
Comment 6 Brian Paul 2011-05-08 12:43:58 UTC
The last fragment shader looks suspicious:

# Fragment Program/Shader 0
  0: MAD_SAT TEMP[1].x, INPUT[3].xxxx, STATE[0].xxxx, STATE[0].yyyy;
  1: LRP OUTPUT[2].xyz, TEMP[1].xxxx, TEMP[0], STATE[1];
  2: MOV OUTPUT[2].w, TEMP[0];
  3: END

There are two reads from the TEMP[0] register, but it's never written to.  Though, this isn't exactly what the assertion is complaining about.

Since the Shader ID/number is zero, I think this is probably coming from the fixed-function fragment shader generation code.

Would you be willing to grab apitrace (https://github.com/apitrace/apitrace) and use that to get a trace of GL calls?  I could use that to debug this.  Otherwise, I'll have to download/install Yo Frankie some other day (I'm on a slow connection at the airport today).

If this is a regression, bisecting would be good too.
Comment 7 Sven Arvidsson 2011-05-08 13:08:21 UTC
Created attachment 46452 [details]
apitrace of yo frankie
Comment 8 Sven Arvidsson 2011-05-08 13:21:55 UTC
> Would you be willing to grab apitrace (https://github.com/apitrace/apitrace)
> and use that to get a trace of GL calls?  I could use that to debug this. 
> Otherwise, I'll have to download/install Yo Frankie some other day (I'm on a
> slow connection at the airport today).
> 
> If this is a regression, bisecting would be good too.

Sure, I've been meaning to give apitrace a try. The attached log is the output of tracedump, the complete trace is available here:
http://dl.dropbox.com/u/28577999/yofrankie-linux-i386.bin.trace.7z

I'll probably have time to try bisecting sometime tomorrow.
Comment 9 Jose Fonseca 2011-05-09 06:26:24 UTC
There was a regression in apitrace over the last couple of days (mea culpa), and this trace was affected by it -- it shows bogus calls such as:

  glCullFace(mode = GLX_RGBA)

So I'm afraid it is not reliable. A new trace will be necessary...
Comment 10 Sven Arvidsson 2011-05-09 12:39:12 UTC
Created attachment 46502 [details]
apitrace log

(In reply to comment #9)
> There was a regression in apitrace over the last couple of days (mea culpa),
> and this trace was affected by it -- it shows bogus calls such as:
> 
>   glCullFace(mode = GLX_RGBA)
> 
> So I'm afraid it is not reliable. A new trace will be necessary...

Thanks for letting us know! :) A new log has been attached, and a new trace available here:
http://dl.dropbox.com/u/28577999/yofrankie-linux-i386.bin.trace.7z
Comment 11 Jose Fonseca 2011-05-09 15:41:08 UTC
Thanks.  This trace is better, but it will cause an unrelated crash because Mesa's glVertexAttribPointer and glVertexAttribPointerARB functions don't alias, and apitrace wasn't expecting this.  I'll make a fix for it.

We're making progress on glretrace debugging but I'm afraid we're not progressing on this bug. I think it's better to put apitrace aside for this bug, at least until I get it working correctly with Yo Frankie.
Comment 12 Jose Fonseca 2011-05-10 05:46:02 UTC
I reproduced the bug here:

(gdb) bt
#0  0x00007ffff681aaa7 in _debug_assert_fail (expr=0x7ffff70285d7 "reg.File != TGSI_FILE_NULL", 
    file=0x7ffff70285b0 "src/gallium/auxiliary/tgsi/tgsi_ureg.h", line=863, function=0x7ffff7028c51 "ureg_writemask")
    at src/gallium/auxiliary/util/u_debug.c:282
#1  0x00007ffff6765607 in ureg_writemask (reg=..., writemask=7) at src/gallium/auxiliary/tgsi/tgsi_ureg.h:863
#2  0x00007ffff6766667 in translate_dst (t=0x7fffffffc1d0, DstReg=0xd217f34, saturate=0 '\000') at src/mesa/state_tracker/st_mesa_to_tgsi.c:310
#3  0x00007ffff6767699 in compile_instruction (t=0x7fffffffc1d0, inst=0xd217f18) at src/mesa/state_tracker/st_mesa_to_tgsi.c:673
#4  0x00007ffff67691bb in st_translate_mesa_program (ctx=0xf444b0, procType=0, ureg=0xf3ef2d0, program=0x111b7010, numInputs=2, 
    inputMapping=0x7fffffffd3a0, inputSemanticName=0x7fffffffd4b0 "\001\003", <incomplete sequence \372>, inputSemanticIndex=0x7fffffffd490 "", 
    interpMode=0x7fffffffd320, numOutputs=0, outputMapping=0x7fffffffd420, outputSemanticName=0x7fffffffd470 "\002", 
    outputSemanticIndex=0x7fffffffd450 "\240\324\377\377\377\177", passthrough_edgeflags=0 '\000') at src/mesa/state_tracker/st_mesa_to_tgsi.c:1230
#5  0x00007ffff66a48c0 in st_translate_fragment_program (st=0xf94810, stfp=0x111b7010, key=0x7fffffffd5a0) at src/mesa/state_tracker/st_program.c:609
#6  0x00007ffff66a4a23 in st_get_fp_variant (st=0xf94810, stfp=0x111b7010, key=0x7fffffffd5a0) at src/mesa/state_tracker/st_program.c:667
#7  0x00007ffff674dc6d in update_fp (st=0xf94810) at src/mesa/state_tracker/st_atom_shader.c:86
#8  0x00007ffff6749e05 in st_validate_state (st=0xf94810) at src/mesa/state_tracker/st_atom.c:196
#9  0x00007ffff67559ed in st_Clear (ctx=0xf444b0, mask=16) at src/mesa/state_tracker/st_cb_clear.c:497
#10 0x00007ffff66cb7df in _mesa_Clear (mask=256) at src/mesa/main/clear.c:250
#11 0x00000000006fbd36 in KX_KetsjiEngine::RenderShadowBuffers(KX_Scene*) ()
#12 0x00000000006fc1f3 in KX_KetsjiEngine::Render() ()
#13 0x000000000055ef12 in GPG_Application::processEvent(GHOST_IEvent*) ()
#14 0x000000000056be14 in GHOST_EventManager::dispatchEvent(GHOST_IEvent*) ()
#15 0x000000000056bd1d in GHOST_EventManager::dispatchEvent() ()
#16 0x000000000056bd6c in GHOST_EventManager::dispatchEvents() ()
#17 0x000000000056b548 in GHOST_System::dispatchEvents() ()
#18 0x000000000055d906 in main ()
(gdb) call _mesa_print_instruction(&program->Instructions[i])
LRP OUTPUT[2].xyz, TEMP[1].xxxx, TEMP[0], STATE[1];
(gdb) call _mesa_print_program(program)
# Fragment Program/Shader 0
  0: MAD_SAT TEMP[1].x, INPUT[3].xxxx, STATE[0].xxxx, STATE[0].yyyy;
  1: LRP OUTPUT[2].xyz, TEMP[1].xxxx, TEMP[0], STATE[1];
  2: MOV OUTPUT[2].w, TEMP[0];
  3: END
(gdb) 

The problem is that stfp->Base.Base.OutputsWritten is 0, which causes st_translate_mesa_program() is receiving a bad numOutputs=0, and eventually the assertion.


(gdb) p stfp->Base.Base.OutputsWritten
$18 = 0
(gdb) p stfp->Base.Base
$19 = {
  Id = 0, 
  String = 0x0, 
  RefCount = 3, 
  Target = 34820, 
  Format = 34933, 
  Resident = 1 '\001', 
  Instructions = 0xd217ed0, 
  InputsRead = 10, 
  OutputsWritten = 0, 
  SystemValuesRead = 0, 
  InputFlags = {0 <repeats 32 times>}, 
  OutputFlags = {0 <repeats 64 times>}, 
  TexturesUsed = {0 <repeats 16 times>}, 
  SamplersUsed = 0, 
  ShadowSamplers = 0, 
  Parameters = 0xc8e2a90, 
  LocalParams = {{0, 0, 0, 0} <repeats 1024 times>}, 
  Varying = 0x0, 
  Attributes = 0x0, 
  SamplerUnits = "\000\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017", 
  SamplerTargets = {TEXTURE_BUFFER_INDEX <repeats 16 times>}, 
  IndirectRegisterFiles = 0, 
  NumInstructions = 4, 
  NumTemporaries = 2, 
  NumParameters = 0, 
  NumAttributes = 0, 
  NumAddressRegs = 0, 
  NumAluInstructions = 1, 
  NumTexInstructions = 0, 
  NumTexIndirections = 1, 
  NumNativeInstructions = 0, 
  NumNativeTemporaries = 0, 
  NumNativeParameters = 0, 
  NumNativeAttributes = 0, 
  NumNativeAddressRegs = 0, 
  NumNativeAluInstructions = 0, 
  NumNativeTexInstructions = 0, 
  NumNativeTexIndirections = 0
}
Comment 13 Jose Fonseca 2011-05-11 06:25:10 UTC
I've managed to get a gltrace that captures the assertion failure by recording the app with AMD's fglrx driver:

  http://people.freedesktop.org/~jrfonseca/traces/yofrankie-1.1b-fglrx.trace

And I've run an automated git bisect with it:

  git bisect start 2812b60a84a2992516db28b37b305c2a7eb5f424 5d7c27f5ec2f30c264dc2d53c4980970b3a13ee5
  git bisect run /path/to/glretrace-bisect.py --build 'scons -j4 libgl-xlib' -- -b -db /srv/shared/automation/traces/yofrankie-1.1b-fglrx.trace

(I'll commit the glretrace-bisect.py script after cleanup it up)
and got

3aa21f93dc1329c6f956277f2746c2a0bdae5446 is the first bad commit
commit 3aa21f93dc1329c6f956277f2746c2a0bdae5446
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Fri Apr 15 18:38:54 2011 -0700

    mesa: Fix bugs in ff fragment shader fog handling
    
    This patch fixes two bugs related to fog in the fixed-function
    fragment shader generation code.
    
    Fog was only lowered to instructions if MRTs were used.  The fragment
    shader assembler always lowers "fog option" code to instructions, and
    many drivers (e.g., r300) expect this.
    
    When fog lowering did happen, it was after the instruction count was
    checked against implementation limits.  Since fog lowering may add up
    to 5 instructions, a program that was below the limits before lowering
    may exceed the limits after lowering.
    
    NOTE: This is a candidate for the stable branches.
    
    Reviewed-by: Eric Anholt <eric@anholt.net>
    Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
    Acked-by: Alex Deucher <alexdeucher@gmail.com>

It's a simple change, and finding the fix was trivial:

commit 7a271151a5d6990c8325008f19ab621d730d06c4
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed May 11 14:01:17 2011 +0100

    mesa: Don't append fog code for programs that don't output color.
    
    Fixes fdo 36919.
    
    NOTE: This is a candidate for the stable branches.
    
    It should be cherry-picked to the sames branches that
    3aa21f93dc1329c6f956277f2746c2a0bdae5446 was.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.