2516 – some rasterization fallbacks cause segfaults

Summary: some rasterization fallbacks cause segfaults

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/r200
Version:	git
Hardware:	x86 (IA32) Linux (All)

Importance:	high normal
Assignee:	Default DRI bug account

Duplicates (1):	2593

Reported:	2005-02-09 15:54 UTC by Roland Scheidegger
Modified:	2009-08-24 12:23 UTC
CC List:	0 users

Attachments
possible fix for rasterization fallbacks (1.28 KB, patch) 2005-03-09 15:11 UTC, Roland Scheidegger

Description Roland Scheidegger 2005-02-09 15:54:01 UTC

the use of texture borders causes segfaults with the r200 driver. Not that
anyone would use that feature, I just discovered that accidentally. Start
texdown, hit "b", boom.
Actually, at one time I did not get a segfault, but a drm error instead
([drm:radeon_cp_dispatch_texture] *ERROR* EFAULT). Since texture borders should
cause a fallback afaik, I'm not sure how that happened.
As pretty much always when something goes wrong in the pixel fallback path, I
can't get a backtrace from that either, since the segfault happens behind a
LOCK_HARDWARE.

Comment 1 Roland Scheidegger 2005-02-09 17:57:20 UTC

Some more digging found this is not only related to texture borders, other
rasterization fallbacks suffer from that as well (easily tested with texenv,
just changed one of the texture wrap modes to GL_CLAMP_TO_BORDER which causes a
rast fallback). Summary changed accordingly.
It reaches the r200Fallback path just fine, and then strangely later dies in
_tnl_run_pipeline called from r200WrapRunPipeline.

Comment 2 Roland Scheidegger 2005-02-09 18:29:30 UTC

Simply calling UNLOCK_HARDWARE in r200SpanRenderStart was all that was needed to
get this locally debuggable.

That said, I have no idea what is going on...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 988616000 (LWP 8096)]
_tnl_build_vertices (ctx=0x8060990, start=0, end=0, newinputs=0) at tnl/t_vertex.c:1379
1379             a[j].inputstride = vptr->stride;
(gdb) bt
#0  _tnl_build_vertices (ctx=0x8060990, start=0, end=0, newinputs=0)
    at tnl/t_vertex.c:1379
#1  0x3afb4fe3 in run_render (ctx=0x8060990, stage=0x0) at tnl/t_vb_render.c:296
#2  0x3afa5fde in _tnl_run_pipeline (ctx=0x8060990) at tnl/t_pipeline.c:159
#3  0x3af2a92f in r200WrapRunPipeline (ctx=0x8060990) at r200_state.c:2316
#4  0x3afc340a in _tnl_flush_vtx (ctx=0x8060990) at tnl/t_vtx_exec.c:282
#5  0x3afbe8a8 in _tnl_FlushVertices (ctx=0x8060990, flags=1) at tnl/t_vtx_api.c:838
#6  0x3af3ca4c in r200FlushVertices (ctx=0x8060990, flags=1) at r200_swtcl.c:906
#7  0x3af8d6af in _mesa_TexImage2D (target=3553, level=0, internalFormat=6407,
    width=258, height=258, border=1, format=6407, type=5121, pixels=0x41180008)
    at main/teximage.c:2096
#8  0x08049321 in MeasureDownloadRate () at texdown.c:163
#9  0x080495b2 in Display () at texdown.c:256
#10 0x3aaeafa1 in fghcbDisplayWindow () from /usr/lib/libglut.so.3
#11 0x3aaee36a in fgEnumWindows () from /usr/lib/libglut.so.3
#12 0x3aaeb2ef in glutMainLoopEvent () from /usr/lib/libglut.so.3
#13 0x3aaebbf5 in glutMainLoop () from /usr/lib/libglut.so.3
#14 0x0804999c in main (argc=1, argv=0xaffff1c4) at texdown.c:384

Comment 3 Roland Scheidegger 2005-02-25 10:22:25 UTC

*** Bug 2593 has been marked as a duplicate of this bug. ***

Comment 4 Roland Scheidegger 2005-03-04 16:48:49 UTC

Some more debugging info (still from texdown):
(gdb) print count
$13 = 3
(gdb) print ((TNLcontext *)ctx->swtnl_context)->clipspace->attr[0]
$14 = {attrib = 0, format = 6, vertoffset = 0, vertattrsize = 16,
  inputptr = 0x8207b90 "", inputstride = 16, insert = 0x3b0d0db0,
  emit = 0x3afeb3a0 <insert_4f_3>, extract = 0x3afec7d0 <extract_4f_viewport>,
  vp = 0x808cd90}
(gdb) print ((TNLcontext *)ctx->swtnl_context)->clipspace->attr[1]
$15 = {attrib = 3, format = 15, vertoffset = 144, vertattrsize = 4,
  inputptr = 0x80614ec "", inputstride = 0, insert = 0x3b0d0eac,
  emit = 0x3afeb910 <insert_4ub_4f_rgba_3>,
  extract = 0x3afec990 <extract_4chan_4f_rgba>, vp = 0x808cd90}
(gdb) print ((TNLcontext *)ctx->swtnl_context)->clipspace->attr[2]
$16 = {attrib = 8, format = 3, vertoffset = 16, vertattrsize = 16,
  inputptr = 0x81f2954 "", inputstride = 16, insert = 0x3b0d0d5c,
  emit = 0x3afeb4b0 <insert_2f_2>, extract = 0x3afec8a0 <extract_4f>, vp = 0x808cd90}
(gdb) print VB->AttribPtr
$17 = {0x0, 0x81f69ec, 0x81f6a08, 0x81f6a24, 0x81f6a40, 0x81f6a5c, 0x81f6a78,
  0x81f6a94, 0x81f6ab0, 0x81f6acc, 0x81f6ae8, 0x81f6b04, 0x81f6b20, 0x81f6b3c,
  0x81f6b58, 0x81f6b74, 0x81f6b90, 0x81f6bac, 0x81f6bc8, 0x81f6be4, 0x81f6c00,
  0x81f6c1c, 0x81f6c38, 0x81f6c54, 0x81f6c70, 0x81f6c8c, 0x81f6ca8, 0x81f6cc4,
  0x81f6ce0, 0x0, 0x0}

So, the first pointer (for attrib 0 (position)) is the NULL pointer leading to
the segfault, the other ones (attrib 3 (color0), attrib 8 (tex0)) look rather
reasonable (?). Unfortunately I'm no tnl wizard...
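
For illustration only, here is a minimal sketch of the failure mode seen in
the gdb output above: a loop copies the stride from each attribute's source
vector, and the entry for attrib 0 is NULL. All names (toy_vector, toy_attr,
toy_update_input_strides) are hypothetical stand-ins, not Mesa's actual types
or functions, and the NULL check only helps to locate the missing input; it
is not a proposed fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-attribute source vector (not Mesa's type). */
struct toy_vector {
    const void *data;
    unsigned stride;
};

/* Hypothetical stand-in for one clipspace attribute entry. */
struct toy_attr {
    size_t attrib;        /* index into the attribute pointer table */
    unsigned inputstride; /* filled in from the source vector */
};

/*
 * Copy strides from the source vectors, much like the crashing line does,
 * but stop at the first attribute whose source vector is NULL.
 * Returns the index of that attribute, or -1 if every input was present.
 */
static int toy_update_input_strides(struct toy_attr *attrs, size_t count,
                                    const struct toy_vector *const *table,
                                    size_t table_len)
{
    assert(attrs != NULL && table != NULL);
    for (size_t j = 0; j < count; j++) {
        const struct toy_vector *v =
            attrs[j].attrib < table_len ? table[attrs[j].attrib] : NULL;
        if (v == NULL)
            return (int)j;   /* this is where a NULL dereference would occur */
        attrs[j].inputstride = v->stride;
    }
    return -1;
}
```

With a table whose slot 0 is NULL and attributes {0, 3, 8}, as in the dump
above, the function reports index 0 instead of crashing.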

Comment 5 Roland Scheidegger 2005-03-09 15:06:35 UTC

Some more debugging showed the problem:
When a rasterization fallback is hit, _tnl_need_projected_coords must be set to
true. This indeed happens, however r200ChooseVertexState gets called later and
may, depending on the current vertex format, set it to false again. When it is
false, VB->NdcPtr will be NULL (in t_vb_vertex.c, run_vertex_stage), which in
turn causes VB->AttribPtr[VERT_ATTRIB_POS] to be NULL too (ss_context.c,
_swsetup_RenderStart) --> boom.
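
The sequence above can be modelled with a small self-contained sketch. This is
a toy model under stated assumptions, not driver code: toy_ctx and all toy_*
functions are hypothetical, and the "guarded" variant is just one conceivable
way to avoid the clobbering, not necessarily what the attached patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver/tnl state discussed above. */
struct toy_ctx {
    bool raster_fallback;       /* a rasterization fallback is active */
    bool need_projected_coords; /* models the "need projected coords" flag */
    float ndc[4];               /* storage for projected coordinates */
    const float *ndc_ptr;       /* models VB->NdcPtr */
    const float *pos_attr;      /* models VB->AttribPtr[VERT_ATTRIB_POS] */
};

/* Entering the fallback requests projected coordinates. */
static void toy_enter_fallback(struct toy_ctx *c)
{
    assert(c != NULL);
    c->raster_fallback = true;
    c->need_projected_coords = true;
}

/* Buggy variant: decides from the vertex format alone, clobbering the flag. */
static void toy_choose_vertex_state_buggy(struct toy_ctx *c, bool format_wants_ndc)
{
    assert(c != NULL);
    c->need_projected_coords = format_wants_ndc;
}

/* Guarded variant: never clears the flag while a fallback is active. */
static void toy_choose_vertex_state_guarded(struct toy_ctx *c, bool format_wants_ndc)
{
    assert(c != NULL);
    c->need_projected_coords = format_wants_ndc || c->raster_fallback;
}

/* Models the vertex stage plus render start: no projection, no position. */
static void toy_run_pipeline(struct toy_ctx *c)
{
    assert(c != NULL);
    c->ndc_ptr = c->need_projected_coords ? c->ndc : NULL;
    c->pos_attr = c->ndc_ptr;
}
```

Calling toy_enter_fallback, then the buggy chooser with false, then
toy_run_pipeline leaves pos_attr NULL, which mirrors the crash; the guarded
chooser leaves it pointing at valid storage.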

Comment 6 Roland Scheidegger 2005-03-09 15:11:08 UTC

Created attachment 2067
possible fix for rasterization fallbacks

this patch fixes the segfaults (tested with gl-117 and texdown), however I'm
not sure if it is actually correct. I can't quite follow the control flow of
the tnl pipeline; what should be executed and validated, and what actually is,
is hard to understand. Would someone with a more thorough understanding of the
tnl pipeline care to comment?

Comment 7 Dieter Nützel 2005-03-12 12:25:07 UTC

Sorry Roland, 
 
but your fix isn't enough. 
Play around with 's', 'f' and 'b'. 
 
progs/demos> GL_VENDOR = Tungsten Graphics, Inc. 
GL_VERSION = 1.3 Mesa 6.3 
GL_RENDERER = Mesa DRI R200 20041207 AGP 4x x86/MMX+/3DNow!+/SSE TCL 
w*h=65536  count=655  time=3.000000 
w*h=66564  count=617  time=3.002000 
w*h=66564  count=684  time=3.001000 
w*h=66564  count=984  time=3.001000 
w*h=66564  count=675  time=3.003000 
DRM_RADEON_TEXTURE: return = -14 
   offset=0xe9e00000 
   image width=258 height=130 
    blit width=258 height=131 data=0x4506f200 
 
[1]    Exitcode 1                        ./texdown

Comment 8 Roland Scheidegger 2005-03-12 16:30:42 UTC

(In reply to comment #7)
> but your fix isn't enough. 
> Play around with 's', 'f' and 'b'. 
> DRM_RADEON_TEXTURE: return = -14 
>    offset=0xe9e00000 
>    image width=258 height=130 
>     blit width=258 height=131 data=0x4506f200 
I'll look at it, but I'm pretty sure this is a different problem (though related
to the texture border raster fallback). When analyzing the segfaults I already
noticed that the driver will attempt to upload textures with borders, which will
always cause a software fallback anyway, and wondered if it's actually safe to
do that. Looks like that's not the case...
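
As a side note, the "return = -14" quoted above appears to be the same error
as the EFAULT from the description: kernel interfaces conventionally return
negative errno values, and on Linux EFAULT is 14. A tiny sketch (the toy_*
helper names are made up for illustration):

```c
#include <errno.h>
#include <string.h>

/* Returns nonzero if a kernel-style return value is -EFAULT. */
static int toy_is_efault(int ret)
{
    return ret == -EFAULT;
}

/* Turns a kernel-style negative errno return into a readable message. */
static const char *toy_describe_return(int ret)
{
    return ret < 0 ? strerror(-ret) : "success";
}
```

On a Linux system, toy_is_efault(-14) is true and toy_describe_return(-14)
gives the C library's text for EFAULT.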

Comment 9 Eric Anholt 2005-05-31 15:21:22 UTC

I can reproduce a similar backtrace on radeon and r200 using neverball, which is
doing a rasterization fallback because I'm running at 16bpp (no stencil).

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1214635360 (LWP 1346)]
0xb70559a6 in update_input_ptrs (ctx=0x0, start=0) at tnl/t_vertex.c:386
386           a[j].inputptr = ((GLubyte *)vptr->data) + start * vptr->stride;
(gdb) bt
#0  0xb70559a6 in update_input_ptrs (ctx=0x0, start=0) at tnl/t_vertex.c:386
#1  0xb7055a52 in _tnl_build_vertices (ctx=0x8110180, start=0, end=0, newinputs=4294967295)
    at tnl/t_vertex.c:408
#2  0xb704a066 in run_render (ctx=0x8110180, stage=0x830d314) at tnl/t_vb_render.c:295
#3  0xb70410b5 in _tnl_run_pipeline (ctx=0x8110180) at tnl/t_pipeline.c:159
#4  0xb6fba915 in r200WrapRunPipeline (ctx=0x8110180) at r200_state.c:2316
#5  0xb704640c in _tnl_playback_vertex_list (ctx=0x8110180, data=0x85e5198)
    at tnl/t_save_playback.c:209
#6  0xb6feccb3 in execute_list (ctx=0x8110180, list=135332224) at main/dlist.c:5679
#7  0xb6fef760 in _mesa_CallList (list=80) at main/dlist.c:6747
#8  0xb703a9e2 in neutral_CallList (i=0) at vtxfmt_tmp.h:301

Comment 10 Roland Scheidegger 2005-05-31 17:09:10 UTC

(In reply to comment #9)
> I can reproduce a similar backtrace on radeon and r200 using neverball
I'll commit the fix if you don't come up with a better one. The radeon driver
has only been affected since it was converted to t_vertex.

Comment 11 Eric Anholt 2005-05-31 17:59:55 UTC

I was just noting the ability to reproduce, not saying something either way
about the patch.  I haven't looked into the patch.

Comment 12 Eric Anholt 2005-06-26 14:59:08 UTC

Patch committed to CVS.  Thanks!

Comment 13 Adam Jackson 2009-08-24 12:23:03 UTC

Mass version move, cvs -> git
