Summary: | Firefox GLX test crashes with Mesa 7.12-devel, Gallium 0.4 on AMD JUNIPER | ||
---|---|---|---|
Product: | Mesa | Reporter: | Benoit Jacob <bjacob> |
Component: | Drivers/Gallium/r600 | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED NOTOURBUG | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | bjacob, jlp.bugs, MostAwesomedude |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
URL: | https://crash-stats.mozilla.com/report/index/4a1da9e3-fef3-4ce6-be03-ce07d2110827 | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Stand-alone version of Mozilla's glxtext
Stand-alone version of Mozilla's glxtext, v2 Stand-alone version of Mozilla's glxtext, v3 |
Description
Benoit Jacob
2011-08-27 11:21:33 UTC
Notice that besides looking at the source code (above link) you can also try this yourself just by running Firefox. Any version >= 6 will do, but the present crash report is on Nightly (9): http://nightly.mozilla.org/ ... to check this in Firefox, go to the about:support page and scroll down to Graphics. If the GLX test process crashed, this will be explicitly reported there, like "GLXtest process failed (received signal 11)" (In reply to comment #3) > You can find the source code of this test here: > http://hg.mozilla.org/mozilla-central/file/d0700ba932b4/toolkit/xre/glxtest.cpp#l107 It has at least one obvious flaw: dlsym() can only be used to retrieve glXGetProcAddress(ARB) and the symbols defined by the Linux OpenGL ABI (basically OpenGL 1.2 + GL_ARB_multitexture). All other GL(X) symbols need to be retrieved with glXGetProcAddress(ARB). I didn't look further than that. > Unfortunately I don't have any more information about it: no stack, no more > device information. Even assuming the crash wasn't due to a test bug, it'll be hard to do anything about it without that information. (In reply to comment #3) > (In reply to comment #3) > > You can find the source code of this test here: > > http://hg.mozilla.org/mozilla-central/file/d0700ba932b4/toolkit/xre/glxtest.cpp#l107 > > It has at least one obvious flaw: dlsym() can only be used to retrieve > glXGetProcAddress(ARB) and the symbols defined by the Linux OpenGL ABI > (basically OpenGL 1.2 + GL_ARB_multitexture). All other GL(X) symbols need to > be retrieved with glXGetProcAddress(ARB). That is kind of odd, but I don't think it's the problem. The only GL symbol that they use dlsym on is glGetString. The library has to statically export that. :) glXQueryExtension, glXQueryVersion, glXCreatePixmap, glXMakeCurrent, glXDestroyPixmap, and glXDestroyContext are in the same boat. Technically speaking, glXChooseFBConfig, glXGetVisualFromFBConfig, and glXCreateNewContext should be retrieved via glXGetProcAddress. If dlsym returns non-NULL, it seems unlikely to be a problem on our stack. Since is already done for glXBindTexImageEXT, it is a good idea to use glXGetProcAddress for these functions as well. > I didn't look further than that. > > > > Unfortunately I don't have any more information about it: no stack, no more > > device information. > > Even assuming the crash wasn't due to a test bug, it'll be hard to do anything > about it without that information. I hacked up a stand-alone version, but I wasn't able to reproduce the crash on the Intel drivers. Created attachment 50665 [details]
Stand-alone version of Mozilla's glxtext
Looks like this crash report came from me. With Nightly from today and Mesa 7.12-devel (git-110f846) I still get "GLXtest process failed (received signal 11)" in about:support. But if I download the glxtest attachment from Ian and compile it with "g++ -ldl -lX11 glxtest.cpp -o glxtest" I don't get any crash, it exits normally. Oh, and the hardware is AMD Radeon HD 5750 Hi Jure :-) Thanks Ian and Michel, will use glXGetProcAddress. (In reply to comment #6) > Looks like this crash report came from me. With Nightly from today and Mesa > 7.12-devel (git-110f846) I still get "GLXtest process failed (received signal > 11)" in about:support. But if I download the glxtest attachment from Ian and > compile it with "g++ -ldl -lX11 glxtest.cpp -o glxtest" I don't get any crash, > it exits normally. That's really weird isn't it. Can you run the standalone glxtest, compiled with -g3, in valgrind --tool=memcheck so we see if it's hitting uninitialized memory? Created attachment 50707 [details]
Stand-alone version of Mozilla's glxtext, v2
The original version used an uninitialized value for the file descriptor it was writing to. This version now uses standard output, so one can see what information glxtest is sending to Firefox.
(In reply to comment #9) > Created an attachment (id=50707) [details] > Stand-alone version of Mozilla's glxtext, v2 > > The original version used an uninitialized value er, sorry, it used -1, but i still guess that using stdout is better. > --- Comment #9 from Benoit Jacob <bjacob@mozilla.com> 2011-08-30 14:00:11 PDT ---
> The original version used an uninitialized value for the file descriptor it was
> writing to. This version now uses standard output, so one can see what
> information glxtest is sending to Firefox.
>
Not that it matters much, but stdout is 1, not 0 ;)
(In reply to comment #11) > > --- Comment #9 from Benoit Jacob <bjacob@mozilla.com> 2011-08-30 14:00:11 PDT --- > > The original version used an uninitialized value for the file descriptor it was > > writing to. This version now uses standard output, so one can see what > > information glxtest is sending to Firefox. > > > Not that it matters much, but stdout is 1, not 0 ;) Oops! You're right. Now I'm totally confused as to why my change actually made it print stuff to stdout. Created attachment 50724 [details]
Stand-alone version of Mozilla's glxtext, v3
This is what valgrind gives on my laptop with ATI Mobility Radeon 5470: $ valgrind --tool=memcheck ./glxtest ==20468== Memcheck, a memory error detector ==20468== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. ==20468== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info ==20468== Command: ./glxtest ==20468== ==20468== Invalid read of size 4 ==20468== at 0x52341B2: _mesa_make_extension_string (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x5246397: _mesa_make_current (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x5236B01: st_api_make_current (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x521F352: dri_make_current (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x51FF655: driBindContext (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x3C14056B0E: dri2_bind_context (in /usr/lib64/opengl/xorg-x11/lib/libGL.so.1.2) ==20468== by 0x3C1403032D: MakeContextCurrent (in /usr/lib64/opengl/xorg-x11/lib/libGL.so.1.2) ==20468== by 0x400F1C: glxtest() (glxtest.cpp:193) ==20468== by 0x4010B2: main (glxtest.cpp:239) ==20468== Address 0x4f157b0 is 0 bytes inside a block of size 1 alloc'd ==20468== at 0x4A063E4: calloc (vg_replace_malloc.c:467) ==20468== by 0x5234429: _mesa_make_extension_string (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x5246397: _mesa_make_current (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x5236B01: st_api_make_current (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x521F352: dri_make_current (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x51FF655: driBindContext (in /usr/lib64/mesa/r600g_dri.so) ==20468== by 0x3C14056B0E: dri2_bind_context (in /usr/lib64/opengl/xorg-x11/lib/libGL.so.1.2) ==20468== by 0x3C1403032D: MakeContextCurrent (in /usr/lib64/opengl/xorg-x11/lib/libGL.so.1.2) ==20468== by 0x400F1C: glxtest() (glxtest.cpp:193) ==20468== by 0x4010B2: main (glxtest.cpp:239) ==20468== VENDOR X.Org RENDERER Gallium 0.4 on AMD CEDAR VERSION 2.1 Mesa 7.12-devel (git-82fff5f) TFP TRUE ==20468== ==20468== HEAP SUMMARY: ==20468== in use at exit: 212,008 bytes in 531 blocks ==20468== total heap usage: 3,105 allocs, 2,574 frees, 2,729,162 bytes allocated ==20468== ==20468== LEAK SUMMARY: ==20468== definitely lost: 1,948 bytes in 4 blocks ==20468== indirectly lost: 0 bytes in 0 blocks ==20468== possibly lost: 120 bytes in 3 blocks ==20468== still reachable: 209,940 bytes in 524 blocks ==20468== suppressed: 0 bytes in 0 blocks ==20468== Rerun with --leak-check=full to see details of leaked memory ==20468== ==20468== For counts of detected and suppressed errors, rerun with: -v ==20468== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 11 from 9) Ups forgot the debugging symbols for mesa: $ valgrind --tool=memcheck ./glxtest ==28048== Memcheck, a memory error detector ==28048== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. ==28048== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info ==28048== Command: ./glxtest ==28048== ==28048== Invalid read of size 4 ==28048== at 0x54B71B2: _mesa_make_extension_string (extensions.c:912) ==28048== by 0x54C9397: _mesa_make_current (context.c:1514) ==28048== by 0x54B9B01: st_api_make_current (st_manager.c:789) ==28048== by 0x54A2352: dri_make_current (dri_context.c:205) ==28048== by 0x5482655: driBindContext (dri_util.c:196) ==28048== by 0x509FB0E: dri2_bind_context (dri2_glx.c:151) ==28048== by 0x507932D: MakeContextCurrent (glxcurrent.c:269) ==28048== by 0x400F1C: glxtest() (glxtest.cpp:193) ==28048== by 0x4010B2: main (glxtest.cpp:239) ==28048== Address 0x4f157b0 is 0 bytes inside a block of size 1 alloc'd ==28048== at 0x4A063E4: calloc (vg_replace_malloc.c:467) ==28048== by 0x54B7429: _mesa_make_extension_string (extensions.c:776) ==28048== by 0x54C9397: _mesa_make_current (context.c:1514) ==28048== by 0x54B9B01: st_api_make_current (st_manager.c:789) ==28048== by 0x54A2352: dri_make_current (dri_context.c:205) ==28048== by 0x5482655: driBindContext (dri_util.c:196) ==28048== by 0x509FB0E: dri2_bind_context (dri2_glx.c:151) ==28048== by 0x507932D: MakeContextCurrent (glxcurrent.c:269) ==28048== by 0x400F1C: glxtest() (glxtest.cpp:193) ==28048== by 0x4010B2: main (glxtest.cpp:239) ==28048== VENDOR X.Org RENDERER Gallium 0.4 on AMD CEDAR VERSION 2.1 Mesa 7.12-devel (git-11f6466) TFP TRUE ==28048== ==28048== HEAP SUMMARY: ==28048== in use at exit: 212,008 bytes in 531 blocks ==28048== total heap usage: 3,105 allocs, 2,574 frees, 2,729,162 bytes allocated ==28048== ==28048== LEAK SUMMARY: ==28048== definitely lost: 1,948 bytes in 4 blocks ==28048== indirectly lost: 0 bytes in 0 blocks ==28048== possibly lost: 120 bytes in 3 blocks ==28048== still reachable: 209,940 bytes in 524 blocks ==28048== suppressed: 0 bytes in 0 blocks ==28048== Rerun with --leak-check=full to see details of leaked memory ==28048== ==28048== For counts of detected and suppressed errors, rerun with: -v ==28048== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 11 from 9) Is there anything else I can test to help fix this? (In reply to comment #16) > Is there anything else I can test to help fix this? Looking at the Valgrind error from comment 15, it suggests that _mesa_make_extension_string (extensions.c:912) makes the assumption that it's safe to read 4 bytes from a string and that assumption fails as the string is empty with just 1 byte allocated (and set to 0) by calloc. This kind of error is very commonly found in string manipulation and is generally considered safe to ignore as it can hardly lead to a crash. Indeed, AFAIK it could only crash if reading those 4 bytes led to crossing a (4k) page boundary, which would mean that the pointer was not even 4-byte-aligned... It's not inconceivable to have a Firefox-specific crash there as Firefox uses a custom memory allocator (jemalloc). What you could do to investigate this bug, if you can recompile Mesa, is add a printf in _mesa_make_extension_string (extensions.c:912) to print this pointer. And perhaps add a printf at the next line to see if it's there that we crash. ... or just use gdb to do the same. Not sure why I was recommending printf-based debugging ;-) oh yes, it might be a little tricky with gdb as this is in a child process, you might have to break on fork to get the child process PID and then attach gdb to the child process. (In reply to comment #17) > Looking at the Valgrind error from comment 15, it suggests that > _mesa_make_extension_string (extensions.c:912) makes the assumption that it's > safe to read 4 bytes from a string and that assumption fails as the string is > empty with just 1 byte allocated (and set to 0) by calloc. Indeed, looks like a bug in gcc/glibc. Note that current Mesa Git works around this by rounding up the allocation sizes to multiples of four bytes. Note, again (see comment 17), that this is an extremely common bug in string functions, and that I'm not at all sure that it's the cause of the crash since it could only lead to a crash if the string were not 4-byte aligned, which would be surprising. It isn't a bug, it is a perfectly valid and entirely safe optimization. Valgrind supplies its own overrides for the string functions that don't do this optimization specifically to prevent false positives from the C library, but in this case the false positive results from a call to strlen() that gcc has inlined. Does jemalloc (or Firefox's fork of jemalloc) obey the platform ABI requirements for the alignment of the return values of the malloc functions? Because gcc's value range propagation has carried the fact that the pointer was returned by calloc to the point where it has inlined strlen(), and knowing that the pointer has to be word aligned, it has omitted the usual byte-at-a-time strlen() preamble and skipped straight to word-at-a-time loads. jemalloc's INSTALL lists the following option: --disable-tiny Disable tiny (sub-quantum-sized) object support. Technically it is not legal for a malloc implementation to allocate objects with less than quantum alignment (8 or 16 bytes, depending on architecture), but in practice it never causes any problems if, for example, 4-byte allocations are 4-byte-aligned. If, for example, jemalloc decides to stick a 4 byte allocation in the last 4 bytes of a page on a 64-bit plaform (i.e. word size is 8 bytes), and gcc's value range propagation can follow a pointer from a malloc function to an inlined strlen, things will explode violently. Looking at Firefox's fork of jemalloc, TINY_MIN_2POW is 1, making the smallest allocation size 2 bytes, which is broken. (In reply to comment #22) > Looking at Firefox's fork of jemalloc, TINY_MIN_2POW is 1, making the smallest > allocation size 2 bytes, which is broken. This sounds like something that would be very much worth discussing with the JEMalloc author, Jason Evans, by filing a bug at bugzilla.mozilla.org, product 'Core', component 'jemalloc', CC'ing Jason Evans ... and CC me too please, I'm interested. I'm not really competent to file this myself. Already filed https://bugzilla.mozilla.org/show_bug.cgi?id=691003 yesterday. Upstream jemalloc is fine (and has diverged rather significantly from Firefox's copy). Thanks a lot. https://bugzilla.mozilla.org/show_bug.cgi?id=691003 is fixed as of today, will be in tomorrow's Nightly. It seems that this bug could have explained a lot of mysterious Firefox crashiness on linux. Having it fixed fills me with indescriptible glee. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.