Bug 99527 - Provide option for llvmpipe JIT code to run cleanly under valgrind
Summary: Provide option for llvmpipe JIT code to run cleanly under valgrind
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/llvmpipe (show other bugs)
Version: 13.0
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-25 01:37 UTC by John Firebaugh
Modified: 2018-04-13 09:58 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description John Firebaugh 2017-01-25 01:37:31 UTC
Currently llvmpipe JIT code is known to trigger errors when run under valgrind. For example, bug #29922 reports the following, which I also observe:

==17795== Conditional jump or move depends on uninitialised value(s)
==17795==    at 0x573F792: ???
==17795==    by 0x4171342: lp_rast_shade_quads_mask (lp_rast.c:473)
==17795==    by 0x4173EE9: do_block_4_3 (lp_rast_tri_tmp.h:61)
==17795==    by 0x4178087: lp_rast_triangle_3_16 (lp_rast_tri.c:229)
==17795==    by 0x4171913: rasterize_bin (lp_rast.c:667)
==17795==    by 0x4171ACE: rasterize_scene (lp_rast.c:766)
==17795==    by 0x4171BA4: lp_rast_queue_scene (lp_rast.c:791)
==17795==    by 0x4178EB4: lp_scene_rasterize (lp_scene.c:405)
==17795==    by 0x4179DF4: lp_setup_rasterize_scene (lp_setup.c:158)
==17795==    by 0x417A296: set_scene_state (lp_setup.c:260)
==17795==    by 0x417A39C: lp_setup_flush (lp_setup.c:295)
==17795==    by 0x416E756: llvmpipe_flush (lp_flush.c:56)

That bug is closed as RESOLVED WONTFIX but I would like to ask that this be reconsidered. Conscientious downstream developers want to make sure their code runs cleanly under valgrind. If libraries they use trigger lots of errors, it makes this task more difficult. For instance, I first had to determine whether or not this error represented a misuse of OpenGL by my own code. In this case, it's possible to search for "valgrind lp_rast_shade_quads_mask" and find the above bug report, so I was able to reasonably conclude that this was not a bug I was responsible for. In many of the other errors in JIT code that valgrind reports, that's not the case, and I'm still not 100% sure of the status -- whether it's a bug in my code, a bug in llvm, a supposedly harmless use of an uninitialized value, or a true false positive.

I'm not the only one dissatisfied with the status quo. For a more strongly worded opinion, see http://www.americanteeth.org/2013/08/14/valgrind-is-not-optional/.

If you believe that fixing these errors would harm performance of production builds, please consider using the `--enable-valgrind` configure flag as an explicit opt-in mechanism.

For reference, here are some of the other errors I have received:

==9337== Conditional jump or move depends on uninitialised value(s)
==9337==    at 0x402E63D: ???
==9337==    by 0xD32C84D: lp_rast_shade_quads_all (lp_rast_priv.h:271)
==9337==    by 0xD32C368: block_full_4 (lp_rast_tri.c:46)
==9337==    by 0xD329222: do_block_16_32_3 (lp_rast_tri_tmp.h:167)
==9337==    by 0xD328E52: lp_rast_triangle_32_3 (lp_rast_tri_tmp.h:305)
==9337==    by 0xD32073C: do_rasterize_bin (lp_rast.c:609)
==9337==    by 0xD3203EB: rasterize_bin (lp_rast.c:628)
==9337==    by 0xD31FBD1: rasterize_scene (lp_rast.c:688)
==9337==    by 0xD321823: thread_function (lp_rast.c:828)
==9337==    by 0xD321A61: impl_thrd_routine (threads_posix.h:87)
==9337==    by 0x4E42183: start_thread (pthread_create.c:312)
==9337==    by 0x6A6E37C: clone (clone.S:111)
==9337==  Uninitialised value was created by a heap allocation
==9337==    at 0x4C2B221: operator new(unsigned long) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/valgrind/3.12.0/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9337==    by 0xDB14217: llvm::User::operator new(unsigned long, unsigned int) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/mesa/13.0.3/lib/dri/swrast_dri.so)
==9337==    by 0xDA60CDA: llvm::ConstantFP::get(llvm::LLVMContext&, llvm::APFloat const&) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/mesa/13.0.3/lib/dri/swrast_dri.so)
==9337==    by 0xDA629BD: llvm::ConstantFP::get(llvm::Type*, double) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/mesa/13.0.3/lib/dri/swrast_dri.so)
==9337==    by 0xD29993E: lp_build_const_elem (lp_bld_const.c:309)
==9337==    by 0xD2999F0: lp_build_const_vec (lp_bld_const.c:333)
==9337==    by 0xD29B902: lp_build_conv (lp_bld_conv.c:654)
==9337==    by 0xD29B08E: lp_build_conv_auto (lp_bld_conv.c:491)
==9337==    by 0xD344C3C: generate_unswizzled_blend (lp_state_fs.c:1884)
==9337==    by 0xD342505: generate_fragment (lp_state_fs.c:2452)
==9337==    by 0xD340947: generate_variant (lp_state_fs.c:2637)
==9337==    by 0xD33FC79: llvmpipe_update_fs (lp_state_fs.c:3204)
==9337== 

==9337== Thread 3 llvmpipe-1:
==9337== Use of uninitialised value of size 8
==9337==    at 0x4035AEE: ???
==9337==    by 0x40354D4: ???
==9337==    by 0xD31F6A6: lp_rast_shade_quads_mask (lp_rast.c:457)
==9337==    by 0xD32D069: do_block_4_32_2 (lp_rast_tri_tmp.h:67)
==9337==    by 0xD328943: do_block_16_32_2 (lp_rast_tri_tmp.h:152)
==9337==    by 0xD328610: lp_rast_triangle_32_2 (lp_rast_tri_tmp.h:305)
==9337==    by 0xD32073C: do_rasterize_bin (lp_rast.c:609)
==9337==    by 0xD3203EB: rasterize_bin (lp_rast.c:628)
==9337==    by 0xD31FBD1: rasterize_scene (lp_rast.c:688)
==9337==    by 0xD321823: thread_function (lp_rast.c:828)
==9337==    by 0xD321A61: impl_thrd_routine (threads_posix.h:87)
==9337==    by 0x4E42183: start_thread (pthread_create.c:312)
==9337==  Uninitialised value was created by a heap allocation
==9337==    at 0x4C2B221: operator new(unsigned long) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/valgrind/3.12.0/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9337==    by 0xDB14217: llvm::User::operator new(unsigned long, unsigned int) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/mesa/13.0.3/lib/dri/swrast_dri.so)
==9337==    by 0xDA60CDA: llvm::ConstantFP::get(llvm::LLVMContext&, llvm::APFloat const&) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/mesa/13.0.3/lib/dri/swrast_dri.so)
==9337==    by 0xDA629BD: llvm::ConstantFP::get(llvm::Type*, double) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/mesa/13.0.3/lib/dri/swrast_dri.so)
==9337==    by 0xD29993E: lp_build_const_elem (lp_bld_const.c:309)
==9337==    by 0xD2999F0: lp_build_const_vec (lp_bld_const.c:333)
==9337==    by 0xD296F04: lp_build_sin_or_cos (lp_bld_arit.c:2914)
==9337==    by 0xD2965EE: lp_build_sin (lp_bld_arit.c:2930)
==9337==    by 0xD2C9A63: sin_emit_cpu (lp_bld_tgsi_action.c:2028)
==9337==    by 0xD2CC759: lp_build_tgsi_inst_llvm (lp_bld_tgsi.c:306)
==9337==    by 0xD2CCD79: lp_build_tgsi_llvm (lp_bld_tgsi.c:523)
==9337==    by 0xD2CF475: lp_build_tgsi_soa (lp_bld_tgsi_soa.c:4058)
==9337==
Comment 1 Roland Scheidegger 2017-01-25 02:25:36 UTC
I agree it would be really nice if we wouldn't get valgrind errors.
If you figure out how to fix it, patches welcome...
I tried to look into it at some point but couldn't really figure it out (didn't invest all that much time though). I'm not even sure this isn't a valgrind bug (last I checked there could still be some problems with simd instructions).

Tracking this stuff down in jit code isn't exactly easy, and having these harmless errors makes it more difficult to debug real issues (I've seen invalid reads and writes which needed to be fixed, and they got kinda buried in the valgrind output).
Comment 2 Jose Fonseca 2017-01-27 22:59:42 UTC
I agree with Roland.  I wouldn't oppose patches fixing these (including --enable-valgrind build option if performance suffered).


But it won't be trivial.  A big source of issues is the fact that we need padding to fill the SIMD instructions lanes.  That is, we might need to process 3 floats, but then we allocate tempoary variables worth of 8 floats, where the last 5 are unitizialied garbagge.  It's not hard to fix this, but there are hundreds of places this happens.


I suspect that in many cases are actually false positives, ie, limitatio in valgrind analysis: we use uninitialize memory for temporary calculations, but they are soon discarded.


How about `--suppressions` option? For llvmpipe there doesn't seem to be many different stack traces.


Another option is to use call VALGRIND_DISABLE_ERROR_REPORTING / VALGRIND_ENABLE_ERROR_REPORTING before/after JIT code.


Feel free to experiment with any of the above FWIW.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.