Bug 27512

Summary: Illegal instruction _mesa_x86_64_transform_points4_general
Product: Mesa
Reporter: John Wimer <john>
Component: Mesa core
Assignee: mesa-dev
Status: RESOLVED FIXED
QA Contact:
Severity: normal    
Priority: medium    
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments: attachment-30621-0.html
Use prefetcht1 instead of prefetch[w]

Description John Wimer 2010-04-07 05:09:31 UTC
Version is 7.7-4ubuntu1 on Pentium 4 with 82945G/GZ Integrated Graphics Controller, running Ubuntu 10.04beta1 x86-64.

The crash occurs when starting a movie in XBMC, which calls glDisable(GL_FRAGMENT_PROGRAM_ARB).

Program received signal SIGILL, Illegal instruction.
0x00007fffe8d41d09 in _mesa_x86_64_transform_points4_general ()
   from /usr/lib/dri/i915_dri.so
(gdb) bt
#0  0x00007fffe8d41d09 in _mesa_x86_64_transform_points4_general ()
   from /usr/lib/dri/i915_dri.so
#1  0x00007fffe8c5c5cf in ?? () from /usr/lib/dri/i915_dri.so
#2  0x00007fffe8c4ea12 in _tnl_run_pipeline () from /usr/lib/dri/i915_dri.so
#3  0x00007fffe8ba52f8 in ?? () from /usr/lib/dri/i915_dri.so
#4  0x00007fffe8c4f8dc in _tnl_draw_prims () from /usr/lib/dri/i915_dri.so
#5  0x00007fffe8c4fbe6 in _tnl_vbo_draw_prims () from /usr/lib/dri/i915_dri.so
#6  0x00007fffe8c47fb5 in vbo_exec_vtx_flush () from /usr/lib/dri/i915_dri.so
#7  0x00007fffe8c43ad5 in vbo_exec_FlushVertices_internal ()
   from /usr/lib/dri/i915_dri.so
#8  0x00007fffe8c43ba2 in vbo_exec_FlushVertices ()
   from /usr/lib/dri/i915_dri.so
#9  0x00007fffe8be632c in _mesa_set_enable () from /usr/lib/dri/i915_dri.so
#10 0x0000000000a3f542 in Shaders::CARBShaderProgram::Disable (
    this=0x7fffdc150d48) at Shader.cpp:496
#11 0x000000000099dc60 in CLinuxRendererGL::RenderSinglePass (this=0x20c8790, 
    index=<value optimized out>, field=<value optimized out>)
    at LinuxRendererGL.cpp:1289
#12 0x000000000099eac1 in CLinuxRendererGL::RenderUpdate (this=0x20c8790, 
    clear=true, flags=0, alpha=255) at LinuxRendererGL.cpp:760
#13 0x000000000099b6e4 in CXBMCRenderManager::PresentSingle (this=0x10e5fc0)
    at RenderManager.cpp:435
#14 0x000000000099be8a in CXBMCRenderManager::Present (this=0x10e5fc0)


Someone observed this same crash one year ago: http://haxordbox.com/index.php?option=content&task=view&id=79

Apologies for not having full debug symbols, that appears to be a separate problem in Ubuntu packaging.
GDB issues this warning:
warning: the debug information found in "/usr/lib/debug//usr/lib/dri/i915_dri.so" does not match "/usr/lib/dri/i915_dri.so" (CRC mismatch).
Comment 1 Karl Schultz 2010-05-14 16:21:01 UTC
Possible workaround is to set the MESA_NO_ASM env var.

Further, I noted in the xform4.S file that there are several instances of

.byte 0x66, 0x66, 0x90		/* manual align += 3 */

but one instance of

.byte 0x66, 0x66, 0x66, 0x90		/* manual align += 3 */

in the failing _mesa_x86_64_transform_points4_general function.

I think the point of these lines is to pad the code by 3 bytes for alignment: not three separate nops, but a single nop preceded by two operand-size override prefixes. The single instance noted above inserts 4 bytes, which is probably not what was intended.

Just a wild guess; hope it helps.
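
To make the byte counts in this comment concrete, here is a minimal C sketch that measures the two padding sequences quoted above. The function and array names are made up for illustration and are not part of Mesa. It only counts bytes; it does not show that the longer sequence is illegal, and the later comments in this thread trace the SIGILL to the prefetch instruction instead.

```c
#include <stddef.h>

/* The two padding sequences quoted from xform4.S in this comment. */
const unsigned char pad_common[]  = { 0x66, 0x66, 0x90 };        /* most sites */
const unsigned char pad_general[] = { 0x66, 0x66, 0x66, 0x90 };  /* points4_general */

/* Hypothetical helper: count the leading 0x66 operand-size prefixes and the
 * 0x90 (nop) opcode that follows them. Returns the total length in bytes,
 * or 0 if the bytes are not a prefixed nop. */
size_t prefixed_nop_length(const unsigned char *bytes, size_t n)
{
    size_t i = 0;

    while (i < n && bytes[i] == 0x66)
        i++;
    if (i < n && bytes[i] == 0x90)
        return i + 1;
    return 0;
}
```

Calling `prefixed_nop_length(pad_common, sizeof pad_common)` gives 3 and the same call on `pad_general` gives 4, which matches the one-byte discrepancy with the "manual align += 3" comment.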
Comment 2 Chris Wilson 2010-07-19 10:23:00 UTC
Reassigning away from the driver as it appears to be an issue in the common x86-64 assembly, and hopefully someone more knowledgeable will be able to give a definite answer.
Comment 3 Chris Wilson 2010-07-25 14:46:21 UTC
*** Bug 29245 has been marked as a duplicate of this bug. ***
Comment 4 Marek Olšák 2011-03-02 06:14:33 UTC
Is this still an issue with the current Mesa master branch?
Comment 5 John Wimer 2011-03-02 07:23:13 UTC
(In reply to comment #4)
> Is this still an issue with the current Mesa master branch?

I haven't got hardware to test it on, but the code hasn't changed except for this commit, http://cgit.freedesktop.org/mesa/mesa/commit/?id=3fda80246f0c41edebdfb4b1ce35bb4726a8c521, and I don't think that is related.
Comment 6 Michael Harder 2016-01-05 07:13:57 UTC
I am experiencing a crash with a SIGILL, Illegal instruction in Debian when using Kodi.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/x86_64-linux-gnu/kodi/kodi.bin --standalone'.
Program terminated with signal SIGILL, Illegal instruction.
#0  _mesa_x86_64_transform_points4_general () at x86-64/xform4.S:72
72              prefetch 16(%rdx)
[Current thread is 1 (Thread 0x7f9054aeb9c0 (LWP 791))]


Thread 1 (Thread 0x7f9054aeb9c0 (LWP 791)):
#0  _mesa_x86_64_transform_points4_general () at x86-64/xform4.S:72
#1  0x00007f902577102d in run_vertex_stage (ctx=0x1ae3248, stage=<optimized out>) at tnl/t_vb_vertex.c:160
#2  0x00007f902575fc62 in _tnl_run_pipeline (ctx=ctx@entry=0x1ae3248) at tnl/t_pipeline.c:241
#3  0x00007f90258f856f in intelRunPipeline (ctx=0x1ae3248) at intel_tris.c:1086
#4  0x00007f902575f27c in _tnl_draw_prims (ctx=0x1ae3248, prim=0x1b53938, nr_prims=1, ib=0x0, index_bounds_valid=<optimized out>, min_index=0, max_index=7, tfb_vertcount=0x0, stream=0, indirect=0x0) at tnl/t_draw.c:521
#5  0x00007f9025745504 in vbo_exec_vtx_flush (exec=0x1b53158, keepUnmapped=keepUnmapped@entry=0 '\000') at vbo/vbo_exec_draw.c:422
#6  0x00007f902572732f in vbo_exec_wrap_buffers (exec=exec@entry=0x1b53158) at vbo/vbo_exec_api.c:104
#7  0x00007f90257278e3 in vbo_exec_wrap_upgrade_vertex (exec=0x1b53158, attr=attr@entry=3, newSize=newSize@entry=4) at vbo/vbo_exec_api.c:280
#8  0x00007f9025727e73 in vbo_exec_fixup_vertex (ctx=ctx@entry=0x1ae3248, attr=attr@entry=3, newSize=newSize@entry=4, newType=newType@entry=5126) at vbo/vbo_exec_api.c:406
#9  0x00007f902572fe6e in vbo_Color4f (x=<optimized out>, y=<optimized out>, z=<optimized out>, w=<optimized out>) at vbo/vbo_attrib_tmp.h:402
#10 0x00000000009a7535 in CLinuxRendererGL::RenderUpdate(bool, unsigned int, unsigned int) ()
#11 0x000000000099ff84 in CXBMCRenderManager::PresentSingle(bool, unsigned int, unsigned int) ()
#12 0x00000000009a02f2 in CXBMCRenderManager::Render(bool, unsigned int, unsigned int, bool) ()
#13 0x0000000000eb63a8 in CGUIWindowFullScreen::Render() ()
#14 0x000000000081f239 in CGUIControl::DoRender() ()
#15 0x00000000008008a4 in CGUIWindow::DoRender() ()
#16 0x000000000080661e in CGUIWindowManager::RenderPass() const ()
#17 0x0000000000806853 in CGUIWindowManager::Render() ()
#18 0x0000000000d09d33 in CApplication::RenderNoPresent() ()
#19 0x0000000000d0df31 in CApplication::Render() ()
#20 0x0000000000dae551 in CXBApplicationEx::Run() ()
#21 0x0000000000db3dfb in XBMC_Run ()
#22 0x00000000006cb2e8 in main ()
Comment 7 Roland Scheidegger 2016-01-05 15:22:29 UTC
(In reply to Michael Harder from comment #6)
> I am experiencing a crash with a SIGILL, Illegal instruction in Debian when
> using Kodi.
> 
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/usr/lib/x86_64-linux-gnu/kodi/kodi.bin --standalone'.
> Program terminated with signal SIGILL, Illegal instruction.
> #0  _mesa_x86_64_transform_points4_general () at x86-64/xform4.S:72
> 72              prefetch 16(%rdx)
> [Current thread is 1 (Thread 0x7f9054aeb9c0 (LWP 791))]
> 

Oh, what CPU? As far as I can tell, Intel CPUs never supported "prefetch" (new ones support prefetchw, which is the same opcode with a different ModR/M byte), only prefetcht0/t1/t2/nta, so I wonder how this is supposed to work.
Comment 8 Michael Harder 2016-01-05 19:04:28 UTC
From 'cat /proc/cpuinfo'
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz

and from 'lspci'
VGA compatible controller: Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (rev 04)
Comment 9 Roland Scheidegger 2016-01-05 20:27:29 UTC
(In reply to Michael Harder from comment #8)
> From 'cat /proc/cpuinfo'
> model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz


Are you using any special build flags? As I said, I can't see how this code could work with Intel CPUs. There are other functions which should work (like _mesa_sse_transform_points4_general), albeit these might be working in 32-bit builds only. Not my area of expertise...
At a quick glance USE_X86_64_ASM actually might be defined by default, but this particular CPU instruction just doesn't look like it could run on Intel CPUs. Unless some CPUs tolerate that instruction even if the manuals don't say so (would not be all that surprising even; the OP also had a P4, so maybe all later CPUs support prefetch/prefetchw for some reason regardless...). If so, the code should be fixed up (replacing prefetch/prefetchw with one of prefetcht0/t1/t2/nta, which should run on all x86_64-capable CPUs). I don't really know that code, though...
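
One way to answer the "does this CPU support prefetch/prefetchw" question at run time is to look at the CPUID feature bits. The sketch below is only an illustration: the function and macro names are invented here, and this is not how Mesa selects its transform functions. It tests two bits of CPUID leaf 0x80000001 (ECX bit 8, reported by Linux as `3dnowprefetch`, and EDX bit 31, 3DNow!), and deliberately does not treat the long-mode bit as evidence, since the CPUs in this report are 64-bit Intel parts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in CPUID leaf 0x80000001. */
#define CPUID_EXT_ECX_3DNOWPREFETCH (1u << 8)   /* PREFETCH/PREFETCHW */
#define CPUID_EXT_EDX_3DNOW         (1u << 31)  /* 3DNow! */

/* Hypothetical helper: decide from raw register values whether the 3DNow!
 * prefetch instructions are advertised. */
bool cpu_has_3dnow_prefetch(uint32_t ext_ecx, uint32_t ext_edx)
{
    return (ext_ecx & CPUID_EXT_ECX_3DNOWPREFETCH) != 0 ||
           (ext_edx & CPUID_EXT_EDX_3DNOW) != 0;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Query the CPU this code runs on (GCC/Clang on x86 only). */
bool running_cpu_has_3dnow_prefetch(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false;
    return cpu_has_3dnow_prefetch(ecx, edx);
}
#endif
```

A CPU that reports only the long-mode bit (EDX bit 29) and neither of the two bits above is classified as lacking the instructions, which is consistent with the SIGILL seen on the Pentium 4 systems in this report.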
Comment 10 Patrick Baggett 2016-01-05 20:57:06 UTC
Created attachment 120821 [details]
attachment-30621-0.html

Given that there is a _mesa_3dnow_transform_points4_2d in the x86-64 asm
(using MMX/3DNow! is deprecated in x86-64), it appears that this code was
copy-pasted. I wrote a quick patch to change prefetch[w] to prefetcht1,
which is more or less the equivalent in SSE. However, I'm not actually sure
those prefetches really benefit the code since they appear to be monotonic
addresses and hinting only 16 bytes ahead (a cache line is almost always at
least 32 bytes) -- maybe that sort of testing is for another day.
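
The "only 16 bytes ahead" point can be checked with simple address arithmetic. The sketch below assumes a 16-byte stride between points and line-aligned data; the real stride in the assembly is a run-time value, so these numbers are illustrative only, and the function names are made up. It counts how often the prefetch target lies in the cache line that the current access already touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if addr and addr + offset fall in the same cache line. */
int same_cache_line(uintptr_t addr, size_t offset, size_t line_size)
{
    /* Cache line sizes are powers of two. */
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    return (addr / line_size) == ((addr + offset) / line_size);
}

/* For n accesses at 0, stride, 2*stride, ..., count how many prefetches of
 * (address + offset) target the line the access itself is already in. */
size_t count_same_line_prefetches(size_t n, size_t stride, size_t offset,
                                  size_t line_size)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (same_cache_line((uintptr_t)(i * stride), offset, line_size))
            count++;
    }
    return count;
}
```

Under these assumptions, `count_same_line_prefetches(4, 16, 16, 64)` is 3 and `count_same_line_prefetches(2, 16, 16, 32)` is 1, so with 64-byte lines three out of four such prefetches would name a line that is already being loaded.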
Comment 11 Patrick Baggett 2016-01-05 21:09:57 UTC
Created attachment 120822 [details] [review]
Use prefetcht1 instead of prefetch[w]

This should fix the SIGILL when running this code. It replaces 3DNow! prefetch[w] instructions with SSE prefetcht1. I'm still not convinced that the prefetch logic actually has any beneficial performance characteristics.
Comment 12 Roland Scheidegger 2016-01-05 21:13:53 UTC
(In reply to Patrick Baggett from comment #10)
> Created attachment 120821 [details]
> attachment-30621-0.html
> 
> Given that there is a _mesa_3dnow_transform_points4_2d in the x86-64 asm
> (using MMX/3DNow! is deprecated in x86-64), it appears that this code was
> copy-pasted. I wrote a quick patch to change prefetch[w] to prefetcht1,
> which is more or less the equivalent in SSE. However, I'm not actually sure
> those prefetches really benefit the code since they appear to be monotonic
> addresses and hinting only 16 bytes ahead (a cache line is almost always at
> least 32 bytes) -- maybe that sort of testing is for another day.

I'd agree that it's dubious that "modern" CPUs would benefit - as you said, the addresses are monotonic and hw prefetchers should certainly handle that pretty well. Though you could argue someone might still use some CPUs with terrible prefetchers, and the prefetch instructions should not hurt (at least not much) on modern CPUs either...
Comment 13 Roland Scheidegger 2016-01-05 21:19:11 UTC
(In reply to Patrick Baggett from comment #11)
> Created attachment 120822 [details] [review]
> Use prefetcht1 instead of prefetch[w]
> 
> This should fix the SIGILL when running this code. It replaces 3DNow!
> prefetch[w] instructions with SSE prefetcht1. I'm still not convinced that
> the prefetch logic actually has any beneficial performance characteristics.

I don't see much point in replacing the prefetch[w] instructions in the 3dnow functions, though. Still, I'd guess that since this is x86_64 it should work...
Comment 14 Michael Harder 2016-01-05 22:07:35 UTC
(In reply to Patrick Baggett from comment #11)
> Created attachment 120822 [details] [review]
> Use prefetcht1 instead of prefetch[w]
> 
> This should fix the SIGILL when running this code. It replaces 3DNow!
> prefetch[w] instructions with SSE prefetcht1. I'm still not convinced that
> the prefetch logic actually has any beneficial performance characteristics.

Thank you. This has resolved the issue for me.
Comment 15 Michael Harder 2016-01-12 01:39:08 UTC
It worked for a few days but now I get this:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/x86_64-linux-gnu/kodi/kodi.bin --standalone'.
Program terminated with signal SIGILL, Illegal instruction.
#0  _mesa_x86_64_transform_points4_general () at x86-64/xform4.S:72
72              prefetcht1 16(%rdx)
[Current thread is 1 (Thread 0x7fd24af779c0 (LWP 797))]
Comment 16 Michael Harder 2016-01-26 03:44:16 UTC
I've been able to reinstall and get it working with the patch again. Not sure what I was doing wrong before. Do I need to do anything to move this along?
Comment 17 Timothy Arceri 2016-02-03 11:28:43 UTC
(In reply to Michael Harder from comment #15)
> It worked for a few days but now I get this:
> 
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/usr/lib/x86_64-linux-gnu/kodi/kodi.bin --standalone'.
> Program terminated with signal SIGILL, Illegal instruction.
> #0  _mesa_x86_64_transform_points4_general () at x86-64/xform4.S:72
> 72              prefetcht1 16(%rdx)
> [Current thread is 1 (Thread 0x7fd24af779c0 (LWP 797))]

I ran into this problem with my new old hardware I've been playing with recently.

The problem can be reproduced running a number of piglit tests such as:

./bin/fbo-stencil readpixels GL_DEPTH24_STENCIL8 -auto -fbo


The patch doesn't fix the problem as it seems prefetcht1 doesn't like offsets.
If I remove the offset from all instances (for example, prefetcht1 16(%rdx) -> prefetcht1 (%rdx)), the piglit test now passes. Not sure how to work around this problem.
Comment 18 Roland Scheidegger 2016-02-03 17:11:55 UTC
(In reply to Timothy Arceri from comment #17)
> I ran into this problem with my new old hardware I've been playing with
> recently.
> 
> The problem can be reproduced running a number of piglit tests such as:
> 
> ./bin/fbo-stencil readpixels GL_DEPTH24_STENCIL8 -auto -fbo
> 
> 
> The patch doesn't fix the problem as it seems prefetcht1 doesn't like
> offsets.
> If I change for example prefetcht1 16(%rdx) -> prefetcht1 (%rdx) removing
> the offset for all instances the piglit will now pass. Not sure how to work
> around this problem.

That doesn't make sense to me. The offset is just part of the memory operand. Unless the assembler encodes it wrong I can't see why that wouldn't work (which I would think to be unlikely, but the locality hints are also encoded into the mod r/m byte - what's the encoding of the instruction?)
I suppose a solution would just be to ditch prefetch - as was pointed out it's not really far ahead enough in any case, even k8 and p4 had primitive hw prefetchers which should make such a simple software prefetch completely unnecessary.
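
For reference on the encoding question, the sketch below decodes the opcode and ModR/M bytes of the instructions discussed here. The byte sequences are hand-assembled from the published opcode tables (0F 0D /0 and /1 for the 3DNow! prefetch and prefetchw, 0F 18 /0 through /3 for prefetchnta/t0/t1/t2); they were not dumped from the crashing binary, and the enum and function names are invented for illustration. It shows that the displacement only changes the mod field and adds a trailing byte, while the hint lives in the reg field.

```c
#include <stddef.h>
#include <stdint.h>

enum prefetch_kind {
    PF_NONE,            /* not one of the prefetch forms handled here */
    PF_3DNOW_PREFETCH,  /* 0F 0D /0 */
    PF_3DNOW_PREFETCHW, /* 0F 0D /1 */
    PF_SSE_NTA,         /* 0F 18 /0 */
    PF_SSE_T0,          /* 0F 18 /1 */
    PF_SSE_T1,          /* 0F 18 /2 */
    PF_SSE_T2           /* 0F 18 /3 */
};

/* Hand-assembled encodings (assumption: no prefixes, base register %rdx). */
const uint8_t enc_prefetch_16_rdx[]   = { 0x0F, 0x0D, 0x42, 0x10 }; /* prefetch 16(%rdx) */
const uint8_t enc_prefetcht1_16_rdx[] = { 0x0F, 0x18, 0x52, 0x10 }; /* prefetcht1 16(%rdx) */
const uint8_t enc_prefetcht1_rdx[]    = { 0x0F, 0x18, 0x12 };       /* prefetcht1 (%rdx) */

/* Classify an unprefixed two-byte-opcode instruction by opcode and the
 * reg field (bits 5:3) of its ModR/M byte. */
enum prefetch_kind classify_prefetch(const uint8_t *code, size_t len)
{
    if (len < 3 || code[0] != 0x0F)
        return PF_NONE;

    unsigned mod = code[2] >> 6;
    unsigned reg = (code[2] >> 3) & 7u;

    if (mod == 3)               /* prefetches take a memory operand */
        return PF_NONE;

    if (code[1] == 0x0D) {
        if (reg == 0) return PF_3DNOW_PREFETCH;
        if (reg == 1) return PF_3DNOW_PREFETCHW;
        return PF_NONE;         /* other /r values are not handled here */
    }
    if (code[1] == 0x18) {
        switch (reg) {
        case 0: return PF_SSE_NTA;
        case 1: return PF_SSE_T0;
        case 2: return PF_SSE_T1;
        case 3: return PF_SSE_T2;
        default: return PF_NONE;
        }
    }
    return PF_NONE;
}
```

Both `prefetcht1` encodings classify as `PF_SSE_T1` whether or not the 16-byte displacement is present, which fits the expectation that the offset by itself should not make the instruction illegal.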
Comment 19 Timothy Arceri 2016-02-03 21:10:04 UTC
(In reply to Roland Scheidegger from comment #18)
> 
> That doesn't make sense to me. The offset is just part of the memory
> operand. Unless the assembler encodes it wrong I can't see why that wouldn't
> work (which I would think to be unlikely, but the locality hints are also
> encoded into the mod r/m byte - what's the encoding of the instruction?)
> I suppose a solution would just be to ditch prefetch - as was pointed out
> it's not really far ahead enough in any case, even k8 and p4 had primitive
> hw prefetchers which should make such a simple software prefetch completely
> unnecessary.

I shouldn't play with asm past my bedtime ... I was having trouble getting the patch to apply, so I recreated it myself. It seems the problem was that I missed one instruction. All works well once that is fixed, so I've sent that patch to the mailing list.
Comment 20 Timothy Arceri 2016-02-04 11:05:28 UTC
Fix pushed.

commit	9c78cfd547a69f6f45d7acaa8ade681640caee95

mesa: Use SSE prefetch instructions rather than 3DNow instructions

64-bit Pentium 4 CPUs don't have the 3DNow prefetch instructions
which results in an Illegal instruction crash.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Tested-by: Timothy Arceri <t_arceri@yahoo.com.au>
https://bugs.freedesktop.org/show_bug.cgi?id=27512
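
As a rough C analogue of what the fixed routine does (this is not the code from the commit, which changes the x86-64 assembly), the sketch below transforms 4-component points by a 4x4 column-major matrix and issues a read prefetch for the next point through the GCC/Clang builtin `__builtin_prefetch`. The function name and data layout are assumptions made for illustration. A locality argument of 2 is expected to map to an SSE prefetch hint such as prefetcht1 on x86 rather than a 3DNow! instruction, but that is worth confirming in the generated assembly for a given compiler and flags.

```c
#include <stddef.h>

/* Hypothetical C counterpart of a "transform points4 general" routine.
 * m is a 4x4 matrix in column-major order; src and dst each hold n points
 * of 4 floats, stored back to back. */
void transform_points4(float *dst, const float m[16], const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const float *p = src + 4 * i;
        float *q = dst + 4 * i;

#if defined(__GNUC__)
        /* Hint that the next point will be read soon. The address may be
         * one past the end on the last iteration; a prefetch does not fault. */
        __builtin_prefetch(p + 4, 0, 2);
#endif

        const float x = p[0], y = p[1], z = p[2], w = p[3];

        q[0] = m[0] * x + m[4] * y + m[8]  * z + m[12] * w;
        q[1] = m[1] * x + m[5] * y + m[9]  * z + m[13] * w;
        q[2] = m[2] * x + m[6] * y + m[10] * z + m[14] * w;
        q[3] = m[3] * x + m[7] * y + m[11] * z + m[15] * w;
    }
}
```

As discussed in the comments above, the hint may bring little or no benefit on CPUs with hardware prefetchers; it is kept here only to mirror the structure of the assembly.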
