Bug 30266 - Regression, segfault in libdrm_intel when calling glBitmap
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: Other All
Importance: high critical
Assigned To: Eric Anholt
Reported: 2010-09-19 09:32 UTC by Thomas Jones
Modified: 2011-07-07 15:18 UTC

Attachments
test code showing the bug (478.80 KB, application/gzip)
2011-05-25 19:01 UTC, Thomas Jones
0001-intel-Fix-use-of-freed-buffer-if-glBitmap-is-called-.patch (1.57 KB, patch)
2011-07-06 11:45 UTC, Eric Anholt

Description Thomas Jones 2010-09-19 09:32:22 UTC
After I rebuilt mesa with --enable-debug, the crash moved up into mesa itself rather than libdrm_intel. Here's the backtrace from the debug build:

Program received signal SIGSEGV, Segmentation fault.
intel_batchbuffer_emit_reloc_fenced (batch=0xa4c240, buffer=0x0, 
    read_domains=2, write_domain=2, delta=0) at intel_batchbuffer.c:226
226	   assert(delta < buffer->size);
(gdb) bt
#0  intel_batchbuffer_emit_reloc_fenced (batch=0xa4c240, buffer=0x0, 
    read_domains=2, write_domain=2, delta=0) at intel_batchbuffer.c:226
#1  0x00007fffe5c090ed in intelEmitImmediateColorExpandBlit (intel=0x710960, 
    cpp=<value optimized out>, src_bits=<value optimized out>, 
    src_size=<value optimized out>, fg_color=<value optimized out>, 
    dst_pitch=<value optimized out>, dst_buffer=0x0, dst_offset=0, 
    dst_tiling=1, x=88, y=4, w=8, h=16, logic_op=5379) at intel_blit.c:447
#2  0x00007fffe5c1bb96 in do_blit_bitmap (ctx=<value optimized out>, 
    x=<value optimized out>, y=<value optimized out>, 
    width=<value optimized out>, height=<value optimized out>, 
    unpack=<value optimized out>, pixels=0x7fffffffdd50 "")
    at intel_pixel_bitmap.c:279
#3  intelBitmap (ctx=<value optimized out>, x=<value optimized out>, 
    y=<value optimized out>, width=<value optimized out>, 
    height=<value optimized out>, unpack=<value optimized out>, 
    pixels=0x7fffffffdd50 "") at intel_pixel_bitmap.c:511
#4  0x00007fffe5c9bd9a in _mesa_Bitmap (width=8, height=16, 
    xorig=<value optimized out>, yorig=<value optimized out>, 
    xmove=<value optimized out>, ymove=<value optimized out>, 
    bitmap=0x7fffffffdd50 "") at main/drawpix.c:284
#5  0x000000000040674a in draw_string (str=0x44193c "Done\n")
    at src/opengl/glmisc.c:176
#6  0x000000000040468c in init_gl (opt_data=0x7fffffffde60, width=512, 
    height=512) at src/opengl/glmain.c:106
#7  0x0000000000403c63 in main (argc=1, argv=0x7fffffffdfb8)
    at src/opengl/sdlmain.c:19
I would include a small test program that reproduces the bug, but I couldn't get a reduced one to trigger it...
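
In the backtrace above, frame #0 is entered with buffer=0x0, and line 226 evaluates buffer->size inside the assert, so the assert itself is what dereferences the NULL pointer. The following is a minimal sketch of that failure mode and of a guarded check. It is not the Mesa or libdrm code: `toy_bo` and `toy_reloc_is_valid` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer object; NOT the libdrm type. */
struct toy_bo {
    uint32_t size; /* size of the buffer in bytes */
};

/*
 * Returns true when a relocation `delta` bytes into `bo` would land inside
 * the buffer. The NULL test comes first, so a missing buffer is reported
 * as invalid instead of being dereferenced.
 */
static bool toy_reloc_is_valid(const struct toy_bo *bo, uint32_t delta)
{
    if (bo == NULL)
        return false;
    return delta < bo->size;
}
```

A guard like this would only turn the segfault into a cleaner failure. It says nothing about why the destination buffer was NULL in the first place.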
Comment 1 Thomas Jones 2010-10-23 12:59:54 UTC
Oh, I should note that you can grab the code for the app from my GitHub here: http://github.com/Spudd86/julia-vis
Comment 2 Eric Anholt 2011-05-25 14:46:49 UTC
Don't suppose you could come up with a smaller testcase for this that we could put into the regression test suite?  http://cgit.freedesktop.org/piglit is where we keep testcases, and http://cgit.freedesktop.org/piglit/tree/tests/general/provoking-vertex.c would be an example of a simple testcase.

I tried building your program, but got lost in some dependency hell in the process :(
Comment 3 Thomas Jones 2011-05-25 17:51:47 UTC
I tried to reduce it to a small testcase before posting the bug, but I failed. I have no idea what I missed... I thought I had pretty much all the same GL calls that led up to the crash, but it failed to die.

As for building my program, I don't think it should have any dependencies other than SDL, fftw and obviously mesa...

I'll take another shot at reducing it though.

The odd thing is that it only happens early on. I'm using this code to draw some text: back when mesa was still using the old shader compiler, it took a significant amount of time to compile the shaders, so I had the program tell you what it was doing. The same code is later used to draw the framerate, but only the message about shaders caused the crash.

I will also make sure it still happens at the same time :P
Comment 4 Thomas Jones 2011-05-25 18:05:49 UTC
Oh right, you also need portaudio for it to compile...

And I tried running it again with mesa from a week or two ago and it seemed to cause X to hang... VT switching didn't work either, but the mouse pointer kept moving.

I'll try again with mesa master, and if it still crashes or does nasty things to X I'll take another shot at a reduced test case.
Comment 5 Thomas Jones 2011-05-25 19:01:00 UTC
Created attachment 47168
test code showing the bug

Yup, it still crashes X... but I've got a test case that just segfaults... I'm not sure what's going on here.

I've added the test case as an attachment. It's not very reduced: it's mostly my start-up code paths copied and pasted, with some branches removed because they aren't taken in the case we are interested in (or they don't matter either way).
Comment 6 Thomas Jones 2011-05-25 19:05:00 UTC
(In reply to comment #5)
> Created an attachment (id=47168) [details]
> test code showing the bug
> 
> Yup it still crashes X... but I've got a test case that just segfaults... I'm
> not sure what's going on here. 
> 
> I've added the test case as an attachment, it's not very reduced, mostly just
> almost all of my start up code paths copy/pasted with some branches removed
> because they aren't taken in the case we are interested in (or they don't
> matter either way)

I'm also not 100% certain this is the same crash: the mesa I'm running against was built with --enable-debug, but the backtrace is different from the one above (in particular, it is inside libdrm).
Comment 7 Eric Anholt 2011-06-03 13:09:26 UTC
For me, that testcase is starting up, showing a window briefly, and exiting cleanly.
Comment 8 Eric Anholt 2011-06-08 15:47:00 UTC
Here's another thought for how to get something that can reproduce the problem for me: apitrace might correctly capture the GL calls so I can replay them here with hopefully no differences from your system.

https://github.com/apitrace/apitrace
Comment 9 Eric Anholt 2011-07-05 21:54:28 UTC
Also, all 4 binaries of julia-vis are working for me (just starting up, watching a while, and exiting) now that I got it built.
Comment 10 Eric Anholt 2011-07-05 21:56:12 UTC
There's a bunch of usual bug report info missing here -- I wonder if maybe you're just getting a GPU hang on whatever hardware you're on and that's what's different between your system and mine?  That's the only thing that should be able to make X hang.

(http://intellinuxgraphics.org/how_to_report_bug.html)
Comment 11 Thomas Jones 2011-07-06 09:09:30 UTC
(In reply to comment #10)
> There's a bunch of usual bug report info missing here -- I wonder if maybe
> you're just getting a GPU hang on whatever hardware you're on and that's what's
> different between your system and mine?  That's the only thing that should be
> able to make X hang.
> 
> (http://intellinuxgraphics.org/how_to_report_bug.html)

My system is:

Gentoo, Linux 2.6.39-gentoo-r2, on Lenovo T500 with a GM45 Express 

lspci -v for the GPU

00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device 20e4
	Flags: bus master, fast devsel, latency 0, IRQ 44
	Memory at f4400000 (64-bit, non-prefetchable) [size=4M]
	Memory at d0000000 (64-bit, prefetchable) [size=256M]
	I/O ports at 1800 [size=8]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [d0] Power Management version 3
	Kernel driver in use: i915

00:02.1 Display controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
	Subsystem: Lenovo Device 20e4
	Flags: bus master, fast devsel, latency 0
	Memory at f4200000 (64-bit, non-prefetchable) [size=1M]
	Capabilities: [d0] Power Management version 3

I don't remember what versions of Xorg or anything else I had at the time of the original report, but right now I'm running xorg 1.10.2, libdrm 2.4.26, and xf86-video-intel 2.15.0 (plus two patches to fix bug #36319).

Also, did you pull from my git repo since the last time? If so, you'll have a version that's missing the call that triggered the segfault (I pushed some changes in April). I'm pretty sure that commit 36dd78edcc0db843a395ec929a7955d8b943cbb2 still has it.

I'm not sure the original segfault is still present, since crashing X is not really the same thing at all...
Comment 12 Thomas Jones 2011-07-06 09:19:43 UTC
Hmm, I just decided to run it inside valgrind. No Xorg hangs or segfaults, but it did find this:

==16343== Invalid read of size 8
==16343==    at 0xFEC0C71: drm_intel_gem_bo_free (intel_bufmgr_gem.c:878)
==16343==    by 0xFEC0E5C: drm_intel_gem_bo_unreference_final (intel_bufmgr_gem.c:979)
==16343==    by 0xFEC1067: drm_intel_gem_bo_unreference (intel_bufmgr_gem.c:995)
==16343==    by 0xF9926A9: intel_prepare_render (intel_context.c:471)
==16343==    by 0xF9A15BF: intelBitmap (intel_pixel_bitmap.c:227)
==16343==    by 0xFB57E28: _mesa_Bitmap (drawpix.c:246)
==16343==    by 0x405FB4: draw_string (glmisc.c:176)
==16343==    by 0x404AB9: init_gl (glmain.c:108)
==16343==    by 0x403CB2: main (sdlmain.c:19)
==16343==  Address 0x1173c638 is 152 bytes inside a block of size 192 free'd
==16343==    at 0x4C27B8D: free (vg_replace_malloc.c:366)
==16343==    by 0xFEC0CBB: drm_intel_gem_bo_free (intel_bufmgr_gem.c:889)
==16343==    by 0xFEC0E5C: drm_intel_gem_bo_unreference_final (intel_bufmgr_gem.c:979)
==16343==    by 0xFEC1067: drm_intel_gem_bo_unreference (intel_bufmgr_gem.c:995)
==16343==    by 0xF9926A9: intel_prepare_render (intel_context.c:471)
==16343==    by 0xF9A15BF: intelBitmap (intel_pixel_bitmap.c:227)
==16343==    by 0xFB57E28: _mesa_Bitmap (drawpix.c:246)
==16343==    by 0x405FB4: draw_string (glmisc.c:176)
==16343==    by 0x404BAE: init_gl (glmain.c:106)
==16343==    by 0x403CB2: main (sdlmain.c:19)
==16343== 
==16343== Invalid read of size 4
==16343==    at 0xFEC0C8D: drm_intel_gem_bo_free (intel_bufmgr_gem.c:883)
==16343==    by 0xFEC0E5C: drm_intel_gem_bo_unreference_final (intel_bufmgr_gem.c:979)
==16343==    by 0xFEC1067: drm_intel_gem_bo_unreference (intel_bufmgr_gem.c:995)
==16343==    by 0xF9926A9: intel_prepare_render (intel_context.c:471)
==16343==    by 0xF9A15BF: intelBitmap (intel_pixel_bitmap.c:227)
==16343==    by 0xFB57E28: _mesa_Bitmap (drawpix.c:246)
==16343==    by 0x405FB4: draw_string (glmisc.c:176)
==16343==    by 0x404AB9: init_gl (glmain.c:108)
==16343==    by 0x403CB2: main (sdlmain.c:19)
==16343==  Address 0x1173c5d4 is 52 bytes inside a block of size 192 free'd
==16343==    at 0x4C27B8D: free (vg_replace_malloc.c:366)
==16343==    by 0xFEC0CBB: drm_intel_gem_bo_free (intel_bufmgr_gem.c:889)
==16343==    by 0xFEC0E5C: drm_intel_gem_bo_unreference_final (intel_bufmgr_gem.c:979)
==16343==    by 0xFEC1067: drm_intel_gem_bo_unreference (intel_bufmgr_gem.c:995)
==16343==    by 0xF9926A9: intel_prepare_render (intel_context.c:471)
==16343==    by 0xF9A15BF: intelBitmap (intel_pixel_bitmap.c:227)
==16343==    by 0xFB57E28: _mesa_Bitmap (drawpix.c:246)
==16343==    by 0x405FB4: draw_string (glmisc.c:176)
==16343==    by 0x404BAE: init_gl (glmain.c:106)
==16343==    by 0x403CB2: main (sdlmain.c:19)
==16343== 
==16343== Invalid free() / delete / delete[]
==16343==    at 0x4C27B8D: free (vg_replace_malloc.c:366)
==16343==    by 0xFEC0CBB: drm_intel_gem_bo_free (intel_bufmgr_gem.c:889)
==16343==    by 0xFEC0E5C: drm_intel_gem_bo_unreference_final (intel_bufmgr_gem.c:979)
==16343==    by 0xFEC1067: drm_intel_gem_bo_unreference (intel_bufmgr_gem.c:995)
==16343==    by 0xF9926A9: intel_prepare_render (intel_context.c:471)
==16343==    by 0xF9A15BF: intelBitmap (intel_pixel_bitmap.c:227)
==16343==    by 0xFB57E28: _mesa_Bitmap (drawpix.c:246)
==16343==    by 0x405FB4: draw_string (glmisc.c:176)
==16343==    by 0x404AB9: init_gl (glmain.c:108)
==16343==    by 0x403CB2: main (sdlmain.c:19)
==16343==  Address 0x1173c5a0 is 0 bytes inside a block of size 192 free'd
==16343==    at 0x4C27B8D: free (vg_replace_malloc.c:366)
==16343==    by 0xFEC0CBB: drm_intel_gem_bo_free (intel_bufmgr_gem.c:889)
==16343==    by 0xFEC0E5C: drm_intel_gem_bo_unreference_final (intel_bufmgr_gem.c:979)
==16343==    by 0xFEC1067: drm_intel_gem_bo_unreference (intel_bufmgr_gem.c:995)
==16343==    by 0xF9926A9: intel_prepare_render (intel_context.c:471)
==16343==    by 0xF9A15BF: intelBitmap (intel_pixel_bitmap.c:227)
==16343==    by 0xFB57E28: _mesa_Bitmap (drawpix.c:246)
==16343==    by 0x405FB4: draw_string (glmisc.c:176)
==16343==    by 0x404BAE: init_gl (glmain.c:106)
==16343==    by 0x403CB2: main (sdlmain.c:19)


Looks like a use-after-free inside glBitmap().

It looks to me like something could still be wrong; it's just not causing a segfault anymore.

For reference, here's the first error that valgrind spat out (the output above is from the tail; I killed the program):

==16343== Invalid read of size 2
==16343==    at 0xF9A1A00: intelBitmap (intel_pixel_bitmap.c:268)
==16343==    by 0xFB57E28: _mesa_Bitmap (drawpix.c:246)
==16343==    by 0x405FB4: draw_string (glmisc.c:176)
==16343==    by 0x404BAE: init_gl (glmain.c:106)
==16343==    by 0x403CB2: main (sdlmain.c:19)
==16343==  Address 0x1173c6b8 is 24 bytes inside a block of size 88 free'd
==16343==    at 0x4C27B8D: free (vg_replace_malloc.c:366)
==16343==    by 0xF99B419: intel_region_release (intel_regions.c:299)
==16343==    by 0xF9C9F82: brw_set_draw_region (brw_vtbl.c:131)
==16343==    by 0xF990FBC: intel_draw_buffer (intel_buffers.c:232)
==16343==    by 0xF992633: intel_prepare_render (intel_context.c:436)
==16343==    by 0xF9A15BF: intelBitmap (intel_pixel_bitmap.c:227)
==16343==    by 0xFB57E28: _mesa_Bitmap (drawpix.c:246)
==16343==    by 0x405FB4: draw_string (glmisc.c:176)
==16343==    by 0x404BAE: init_gl (glmain.c:106)
==16343==    by 0x403CB2: main (sdlmain.c:19)


As a side note, if you see anything other than black with some text on it from gl-test then the bug I reported is not occurring.
Comment 13 Eric Anholt 2011-07-06 11:45:18 UTC
Created attachment 48827
0001-intel-Fix-use-of-freed-buffer-if-glBitmap-is-called-.patch

That happens to be the commit I was on, and glBitmap() is definitely called.  But if the bug is related to prepare_render, it may be that we're looking up the buffer to render to before we're updating the list of current buffers... and there it is.  Patch attached. I don't see the bug because my window manager isn't causing a buffer change there, so I don't hit this path.

Note that after glXSwapBuffers(), the contents of the backbuffer are undefined, so you'll want to initialize that somehow.  The reason nobody else ran into this bug is that before doing some glBitmap() rendering after a swap, they've done a glClear() or other rendering that updated the buffers already, so the prepare_render in glBitmap was a no-op.  So, this is a driver bug triggered by an app bug :)
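
The ordering problem described in this comment can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the Mesa code and not the attached patch: `toy_region`, `toy_framebuffer`, `toy_prepare_render` and `toy_region_for_draw` are hypothetical names invented for illustration. The point is that a pointer fetched from the framebuffer before the prepare step dangles once that step replaces the buffers, so the lookup has to happen afterwards.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins; none of these are Mesa's real types or functions. */
struct toy_region {
    int generation; /* which buffer update this region belongs to */
};

struct toy_framebuffer {
    struct toy_region *color_region; /* owned by the framebuffer */
    int needs_update;                /* set when a swap invalidated the buffers */
    int generation;                  /* counts buffer updates so far */
};

static struct toy_region *toy_region_new(int generation)
{
    struct toy_region *region = malloc(sizeof(*region));
    if (region == NULL)
        abort();
    region->generation = generation;
    return region;
}

static void toy_framebuffer_init(struct toy_framebuffer *fb)
{
    fb->generation = 0;
    fb->needs_update = 0;
    fb->color_region = toy_region_new(fb->generation);
}

static void toy_framebuffer_fini(struct toy_framebuffer *fb)
{
    free(fb->color_region);
    fb->color_region = NULL;
}

/* Models the prepare step: if the buffers changed, the old region is
 * released and a fresh one is attached. */
static void toy_prepare_render(struct toy_framebuffer *fb)
{
    if (!fb->needs_update)
        return; /* common case: earlier rendering already did the update */

    free(fb->color_region); /* pointers fetched before this line now dangle */
    fb->generation++;
    fb->color_region = toy_region_new(fb->generation);
    fb->needs_update = 0;
}

/*
 * Safe ordering: update the buffers first, then look the region up.
 * The buggy ordering would read fb->color_region into a local, call
 * toy_prepare_render(), and then use the stale local.
 */
static struct toy_region *toy_region_for_draw(struct toy_framebuffer *fb)
{
    toy_prepare_render(fb);
    return fb->color_region;
}
```

In this model, setting `needs_update` to 1 plays the role of the swap. A caller that goes through `toy_region_for_draw()` always gets the current region, while one that cached `color_region` before the prepare step would be reading freed memory, which is the kind of error valgrind reported above.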
Comment 14 Thomas Jones 2011-07-06 12:10:58 UTC
(In reply to comment #13)
> Created an attachment (id=48827) [details]
> 0001-intel-Fix-use-of-freed-buffer-if-glBitmap-is-called-.patch
> 
> That happens to be the commit I was on, and glBitmap() is definitely called. 
> But if the bug is related to prepare_render, it may be that we're looking up
> the buffer to render to before we're updating the list of current buffers...
> and there it is.  Patch attached. I don't see the bug because my window manager
> isn't causing a buffer change there, so I don't hit this path.
> 
> Note that after glXSwapBuffers(), the contents of the backbuffer is undefined,
> so you'll want to initialize that somehow.  The reason nobody else ran into
> this bug is that before doing some glBitmap() rendering after a swap, they've
> done a glClear() or other rendering that updated the buffers already, so the
> prepare_render in glBitmap was a noop.  So, this is a driver bug triggered by
> an app bug :)

Eh, I don't really care much. I only added the text display there because I started writing this thing before the shader compiler landed, and back then it took several seconds to start up. That text was there to reassure myself it wasn't hung or something, so I wasn't too annoyed at taking it out. I'll add some glClear() calls though.

And the patch does seem to fix it, valgrind no longer complains about the glBitmap() calls.
Comment 15 Eric Anholt 2011-07-07 15:18:46 UTC
commit 066bee64e1611093c7e641ba77bbd43f70d08cec
Author: Eric Anholt <eric@anholt.net>
Date:   Wed Jul 6 11:31:00 2011 -0700

    intel: Fix use of freed buffer if glBitmap is called after a swap.
    
    Regions looked up from the framebuffer are invalid after
    intel_prepare_render().
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30266
    Tested-by: Thomas Jones <thomas.jones@utoronto.ca>