Bug 99527

Summary: Provide option for llvmpipe JIT code to run cleanly under valgrind
Product: Mesa Reporter: John Firebaugh <john.firebaugh>
Component: Drivers/Gallium/llvmpipeAssignee: mesa-dev
Status: RESOLVED MOVED QA Contact: mesa-dev
Severity: normal    
Priority: medium    
Version: 13.0   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:

Description John Firebaugh 2017-01-25 01:37:31 UTC
Currently llvmpipe JIT code is known to trigger errors when run under valgrind. For example, bug #29922 reports the following, which I also observe:

==17795== Conditional jump or move depends on uninitialised value(s)
==17795==    at 0x573F792: ???
==17795==    by 0x4171342: lp_rast_shade_quads_mask (lp_rast.c:473)
==17795==    by 0x4173EE9: do_block_4_3 (lp_rast_tri_tmp.h:61)
==17795==    by 0x4178087: lp_rast_triangle_3_16 (lp_rast_tri.c:229)
==17795==    by 0x4171913: rasterize_bin (lp_rast.c:667)
==17795==    by 0x4171ACE: rasterize_scene (lp_rast.c:766)
==17795==    by 0x4171BA4: lp_rast_queue_scene (lp_rast.c:791)
==17795==    by 0x4178EB4: lp_scene_rasterize (lp_scene.c:405)
==17795==    by 0x4179DF4: lp_setup_rasterize_scene (lp_setup.c:158)
==17795==    by 0x417A296: set_scene_state (lp_setup.c:260)
==17795==    by 0x417A39C: lp_setup_flush (lp_setup.c:295)
==17795==    by 0x416E756: llvmpipe_flush (lp_flush.c:56)

That bug is closed as RESOLVED WONTFIX but I would like to ask that this be reconsidered. Conscientious downstream developers want to make sure their code runs cleanly under valgrind. If libraries they use trigger lots of errors, it makes this task more difficult. For instance, I first had to determine whether or not this error represented a misuse of OpenGL by my own code. In this case, it's possible to search for "valgrind lp_rast_shade_quads_mask" and find the above bug report, so I was able to reasonably conclude that this was not a bug I was responsible for. In many of the other errors in JIT code that valgrind reports, that's not the case, and I'm still not 100% sure of the status -- whether it's a bug in my code, a bug in llvm, a supposedly harmless use of an uninitialized value, or a true false positive.

I'm not the only one dissatisfied with the status quo. For a more strongly worded opinion, see http://www.americanteeth.org/2013/08/14/valgrind-is-not-optional/.

If you believe that fixing these errors would harm performance of production builds, please consider using the `--enable-valgrind` configure flag as an explicit opt-in mechanism.

For reference, here are some of the other errors I have received:

==9337== Conditional jump or move depends on uninitialised value(s)
==9337==    at 0x402E63D: ???
==9337==    by 0xD32C84D: lp_rast_shade_quads_all (lp_rast_priv.h:271)
==9337==    by 0xD32C368: block_full_4 (lp_rast_tri.c:46)
==9337==    by 0xD329222: do_block_16_32_3 (lp_rast_tri_tmp.h:167)
==9337==    by 0xD328E52: lp_rast_triangle_32_3 (lp_rast_tri_tmp.h:305)
==9337==    by 0xD32073C: do_rasterize_bin (lp_rast.c:609)
==9337==    by 0xD3203EB: rasterize_bin (lp_rast.c:628)
==9337==    by 0xD31FBD1: rasterize_scene (lp_rast.c:688)
==9337==    by 0xD321823: thread_function (lp_rast.c:828)
==9337==    by 0xD321A61: impl_thrd_routine (threads_posix.h:87)
==9337==    by 0x4E42183: start_thread (pthread_create.c:312)
==9337==    by 0x6A6E37C: clone (clone.S:111)
==9337==  Uninitialised value was created by a heap allocation
==9337==    at 0x4C2B221: operator new(unsigned long) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/valgrind/3.12.0/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9337==    by 0xDB14217: llvm::User::operator new(unsigned long, unsigned int) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/mesa/13.0.3/lib/dri/swrast_dri.so)
==9337==    by 0xDA60CDA: llvm::ConstantFP::get(llvm::LLVMContext&, llvm::APFloat const&) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/mesa/13.0.3/lib/dri/swrast_dri.so)
==9337==    by 0xDA629BD: llvm::ConstantFP::get(llvm::Type*, double) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/mesa/13.0.3/lib/dri/swrast_dri.so)
==9337==    by 0xD29993E: lp_build_const_elem (lp_bld_const.c:309)
==9337==    by 0xD2999F0: lp_build_const_vec (lp_bld_const.c:333)
==9337==    by 0xD29B902: lp_build_conv (lp_bld_conv.c:654)
==9337==    by 0xD29B08E: lp_build_conv_auto (lp_bld_conv.c:491)
==9337==    by 0xD344C3C: generate_unswizzled_blend (lp_state_fs.c:1884)
==9337==    by 0xD342505: generate_fragment (lp_state_fs.c:2452)
==9337==    by 0xD340947: generate_variant (lp_state_fs.c:2637)
==9337==    by 0xD33FC79: llvmpipe_update_fs (lp_state_fs.c:3204)
==9337== 

==9337== Thread 3 llvmpipe-1:
==9337== Use of uninitialised value of size 8
==9337==    at 0x4035AEE: ???
==9337==    by 0x40354D4: ???
==9337==    by 0xD31F6A6: lp_rast_shade_quads_mask (lp_rast.c:457)
==9337==    by 0xD32D069: do_block_4_32_2 (lp_rast_tri_tmp.h:67)
==9337==    by 0xD328943: do_block_16_32_2 (lp_rast_tri_tmp.h:152)
==9337==    by 0xD328610: lp_rast_triangle_32_2 (lp_rast_tri_tmp.h:305)
==9337==    by 0xD32073C: do_rasterize_bin (lp_rast.c:609)
==9337==    by 0xD3203EB: rasterize_bin (lp_rast.c:628)
==9337==    by 0xD31FBD1: rasterize_scene (lp_rast.c:688)
==9337==    by 0xD321823: thread_function (lp_rast.c:828)
==9337==    by 0xD321A61: impl_thrd_routine (threads_posix.h:87)
==9337==    by 0x4E42183: start_thread (pthread_create.c:312)
==9337==  Uninitialised value was created by a heap allocation
==9337==    at 0x4C2B221: operator new(unsigned long) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/valgrind/3.12.0/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9337==    by 0xDB14217: llvm::User::operator new(unsigned long, unsigned int) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/mesa/13.0.3/lib/dri/swrast_dri.so)
==9337==    by 0xDA60CDA: llvm::ConstantFP::get(llvm::LLVMContext&, llvm::APFloat const&) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/mesa/13.0.3/lib/dri/swrast_dri.so)
==9337==    by 0xDA629BD: llvm::ConstantFP::get(llvm::Type*, double) (in /home/travis/build/mapbox/mapbox-gl-native/mason_packages/linux-x86_64/mesa/13.0.3/lib/dri/swrast_dri.so)
==9337==    by 0xD29993E: lp_build_const_elem (lp_bld_const.c:309)
==9337==    by 0xD2999F0: lp_build_const_vec (lp_bld_const.c:333)
==9337==    by 0xD296F04: lp_build_sin_or_cos (lp_bld_arit.c:2914)
==9337==    by 0xD2965EE: lp_build_sin (lp_bld_arit.c:2930)
==9337==    by 0xD2C9A63: sin_emit_cpu (lp_bld_tgsi_action.c:2028)
==9337==    by 0xD2CC759: lp_build_tgsi_inst_llvm (lp_bld_tgsi.c:306)
==9337==    by 0xD2CCD79: lp_build_tgsi_llvm (lp_bld_tgsi.c:523)
==9337==    by 0xD2CF475: lp_build_tgsi_soa (lp_bld_tgsi_soa.c:4058)
==9337==
Comment 1 Roland Scheidegger 2017-01-25 02:25:36 UTC
I agree it would be really nice if we wouldn't get valgrind errors.
If you figure out how to fix it, patches welcome...
I tried to look into it at some point but couldn't really figure it out (didn't invest all that much time though). I'm not even sure this isn't a valgrind bug (last I checked there could still be some problems with simd instructions).

Tracking this stuff down in jit code isn't exactly easy, and having these harmless errors makes it more difficult to debug real issues (I've seen invalid reads and writes which needed to be fixed, and they got kinda buried in the valgrind output).
Comment 2 Jose Fonseca 2017-01-27 22:59:42 UTC
I agree with Roland.  I wouldn't oppose patches fixing these (including --enable-valgrind build option if performance suffered).


But it won't be trivial.  A big source of issues is the fact that we need padding to fill the SIMD instructions lanes.  That is, we might need to process 3 floats, but then we allocate tempoary variables worth of 8 floats, where the last 5 are unitizialied garbagge.  It's not hard to fix this, but there are hundreds of places this happens.


I suspect that in many cases are actually false positives, ie, limitatio in valgrind analysis: we use uninitialize memory for temporary calculations, but they are soon discarded.


How about `--suppressions` option? For llvmpipe there doesn't seem to be many different stack traces.


Another option is to use call VALGRIND_DISABLE_ERROR_REPORTING / VALGRIND_ENABLE_ERROR_REPORTING before/after JIT code.


Feel free to experiment with any of the above FWIW.
Comment 3 GitLab Migration User 2019-09-18 18:32:40 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/242.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.