Bug 98602

Summary: Data races when rendering from multiple threads
Product: Mesa
Reporter: Steinar H. Gunderson <sgunderson>
Component: Drivers/DRI/i965
Assignee: Intel 3D Bugs Mailing List <intel-3d-bugs>
Status: RESOLVED MOVED
QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal
Priority: medium
CC: chadversary
Version: 13.0
Hardware: Other
OS: All
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=99085
          https://bugs.freedesktop.org/show_bug.cgi?id=99209
Whiteboard:
i915 platform:
i915 features:

Description Steinar H. Gunderson 2016-11-05 19:12:25 UTC
Hi,

I have an application that renders from multiple threads. Each thread uses its own context, but the contexts share data. I believe I stay clear of illegal behavior (such as rendering from a texture in thread A while rendering into the same texture in thread B), but the application still crashes or gets spurious GL errors. Helgrind confirms that there are indeed races within Mesa.

I've already submitted patches for two of those races (in Mesa core), but the remaining ones are too deep in the i965 internals for me to understand. I haven't actually seen these remaining races turn into crashes, but it would be nice to have them fixed nevertheless. I don't know whether they stem from the same root issue, so I'm filing only one bug.

The first is a race between rendering and setting a fence:

==14794== Possible data race during read of size 8 at 0x34552520 by thread #1
==14794== Locks held: none
==14794==    at 0x1F1752F3: brw_emit_surface_state (brw_wm_surface_state.c:162)
==14794==    by 0x1F176EFB: brw_update_texture_surface (brw_wm_surface_state.c:629)
==14794==    by 0x1F1770D7: update_stage_texture_surfaces (brw_wm_surface_state.c:1258)
==14794==    by 0x1F1771DB: brw_update_texture_surfaces (brw_wm_surface_state.c:1289)
==14794==    by 0x1F16C5C8: check_and_emit_atom (brw_state_upload.c:763)
==14794==    by 0x1F16C5C8: brw_upload_pipeline_state (brw_state_upload.c:876)
==14794==    by 0x1F16C5C8: brw_upload_render_state (brw_state_upload.c:898)
==14794==    by 0x1F14E9F8: brw_try_draw_prims (brw_draw.c:584)
==14794==    by 0x1F14E9F8: brw_draw_prims (brw_draw.c:675)
==14794==    by 0x1EF24A79: vbo_draw_arrays (vbo_exec_array.c:467)
==14794==    by 0x60FBC35: movit::EffectChain::execute_phase(movit::Phase*, bool, std::set<int, std::less<int>, std::allocator<int> >*, std::map<movit::Phase*, unsigned int, std::less<movit::Phase*>, std::allocator<std::pair<movit::Phase* const, unsigned int> > >*, std::set<movit::Phase*, std::less<movit::Phase*>, std::allocator<movit::Phase*> >*) (effect_chain.cpp:1956)
==14794==    by 0x60FC37B: movit::EffectChain::render_to_fbo(unsigned int, unsigned int, unsigned int) (effect_chain.cpp:1785)
==14794==    by 0x12EBA7: render_to_screen (effect_chain.h:346)
==14794==    by 0x12EBA7: GLWidget::paintGL() (glwidget.cpp:109)
==14794==    by 0x4070C53: QGLWidget::glDraw() (in /usr/lib/x86_64-linux-gnu/libQt5OpenGL.so.5.7.1)
==14794==    by 0x40705FC: QGLWidget::paintEvent(QPaintEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5OpenGL.so.5.7.1)
==14794==
==14794== This conflicts with a previous write of size 8 by thread #20
==14794== Locks held: 3, at addresses 0x3E7BE0 0x1ACC42A8 0x2DE3E538
==14794==    at 0x1F6CF490: drm_intel_update_buffer_offsets2 (intel_bufmgr_gem.c:2254)
==14794==    by 0x1F6CF490: do_exec2 (intel_bufmgr_gem.c:2411)
==14794==    by 0x1F6D1839: drm_intel_gem_bo_context_exec (intel_bufmgr_gem.c:2454)
==14794==    by 0x1F184B9C: do_flush_locked (intel_batchbuffer.c:359)
==14794==    by 0x1F184B9C: _intel_batchbuffer_flush.part.2 (intel_batchbuffer.c:422)
==14794==    by 0x1EEC89C8: _mesa_FenceSync (syncobj.c:295)
==14794==    by 0x19E1E9: locked_glFenceSync (ref_counted_gl_sync.h:27)
==14794==    by 0x19E1E9: RefCountedGLsync (ref_counted_gl_sync.h:20)
==14794==    by 0x19E1E9: QuickSyncEncoderImpl::end_frame(long, long, std::vector<RefCountedFrame, std::allocator<RefCountedFrame> > const&) (quicksync_encoder.cpp:1899)
==14794==    by 0x19E92B: QuickSyncEncoder::end_frame(long, long, std::vector<RefCountedFrame, std::allocator<RefCountedFrame> > const&) (quicksync_encoder.cpp:2183)
==14794==    by 0x1A3766: VideoEncoder::end_frame(long, long, std::vector<RefCountedFrame, std::allocator<RefCountedFrame> > const&) (video_encoder.cpp:133)
==14794==    by 0x16E0FD: Mixer::render_one_frame(long) (mixer.cpp:819)
==14794==  Address 0x34552520 is 48 bytes inside a block of size 256 alloc'd
==14794==    at 0x4C2DFE5: calloc (vg_replace_malloc.c:711)
==14794==    by 0x1F6D0188: drm_intel_gem_bo_alloc_internal (intel_bufmgr_gem.c:805)
==14794==    by 0x1F18C66D: miptree_create (intel_mipmap_tree.c:715)
==14794==    by 0x1F18BD19: intel_miptree_create (intel_mipmap_tree.c:739)
==14794==    by 0x1F1950F7: intel_miptree_create_for_teximage (intel_tex_image.c:88)
==14794==    by 0x1F1940DC: intel_alloc_texture_image_buffer (intel_tex.c:95)
==14794==    by 0x1F194DF5: intelTexImage (intel_tex_image.c:119)
==14794==    by 0x1EEDE3D6: teximage (teximage.c:3066)
==14794==    by 0x1EEDF1CF: _mesa_TexImage2D (teximage.c:3105)
==14794==    by 0x61078BC: movit::ResourcePool::create_2d_texture(int, int, int) (resource_pool.cpp:379)
==14794==    by 0x60FBDC9: movit::EffectChain::execute_phase(movit::Phase*, bool, std::set<int, std::less<int>, std::allocator<int> >*, std::map<movit::Phase*, unsigned int, std::less<movit::Phase*>, std::allocator<std::pair<movit::Phase* const, unsigned int> > >*, std::set<movit::Phase*, std::less<movit::Phase*>, std::allocator<movit::Phase*> >*) (effect_chain.cpp:1881)
==14794==    by 0x60FC37B: movit::EffectChain::render_to_fbo(unsigned int, unsigned int, unsigned int) (effect_chain.cpp:1785)
==14794==  Block was alloc'd by thread #20
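
In the trace above, the writer holds three locks while the reader holds none, so the locks cannot order the two accesses to the 8-byte field. A minimal sketch of the general remedy, where reader and writer take the same lock, might look like the following. This is a standalone model, not Mesa or libdrm code: SharedBuffer, update_offset, read_offset and run_offset_demo are hypothetical names, and which lock the driver should actually use is left open.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a buffer object shared between two contexts.
struct SharedBuffer {
    std::mutex lock;           // taken by readers as well as writers
    std::uint64_t offset = 0;  // models the 8-byte field from the report
};

// Writer side: models the submit path updating the offset.
void update_offset(SharedBuffer &bo, std::uint64_t new_offset) {
    std::lock_guard<std::mutex> guard(bo.lock);
    bo.offset = new_offset;
}

// Reader side: models the draw path reading the offset.
std::uint64_t read_offset(SharedBuffer &bo) {
    std::lock_guard<std::mutex> guard(bo.lock);
    return bo.offset;
}

// Runs one writer and one reader concurrently; returns the final offset.
std::uint64_t run_offset_demo() {
    SharedBuffer bo;
    std::thread writer([&bo] {
        for (std::uint64_t i = 1; i <= 1000; ++i) {
            update_offset(bo, i * 4096);
        }
    });
    std::thread reader([&bo] {
        for (int i = 0; i < 1000; ++i) {
            std::uint64_t seen = read_offset(bo);
            assert(seen % 4096 == 0);  // only fully written values are seen
            (void)seen;
        }
    });
    writer.join();
    reader.join();
    return read_offset(bo);
}
```

Run under Helgrind, a model like this should stay quiet, because both accesses are ordered by the same mutex. Removing the lock_guard from read_offset reproduces the shape of the report above.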

Here's a race between drawing and uploading a texture:

==14794== Possible data race during write of size 1 at 0x3314FF69 by thread #1
==14794== Locks held: none
==14794==    at 0x1F6CE387: do_bo_emit_reloc (intel_bufmgr_gem.c:1984)
==14794==    by 0x1F6CE62C: drm_intel_gem_bo_emit_reloc (intel_bufmgr_gem.c:2066)
==14794==    by 0x1F17539E: brw_emit_surface_state (brw_wm_surface_state.c:169)
==14794==    by 0x1F176EFB: brw_update_texture_surface (brw_wm_surface_state.c:629)
==14794==    by 0x1F1770D7: update_stage_texture_surfaces (brw_wm_surface_state.c:1258)
==14794==    by 0x1F1771DB: brw_update_texture_surfaces (brw_wm_surface_state.c:1289)
==14794==    by 0x1F16C5C8: check_and_emit_atom (brw_state_upload.c:763)
==14794==    by 0x1F16C5C8: brw_upload_pipeline_state (brw_state_upload.c:876)
==14794==    by 0x1F16C5C8: brw_upload_render_state (brw_state_upload.c:898)
==14794==    by 0x1F14E9F8: brw_try_draw_prims (brw_draw.c:584)
==14794==    by 0x1F14E9F8: brw_draw_prims (brw_draw.c:675)
==14794==    by 0x1EF24A79: vbo_draw_arrays (vbo_exec_array.c:467)
==14794==    by 0x60FBC35: movit::EffectChain::execute_phase(movit::Phase*, bool, std::set<int, std::less<int>, std::allocator<int> >*, std::map<movit::Phase*, unsigned int, std::less<movit::Phase*>, std::allocator<std::pair<movit::Phase* const, unsigned int> > >*, std::set<movit::Phase*, std::less<movit::Phase*>, std::allocator<movit::Phase*> >*) (effect_chain.cpp:1956)
==14794==    by 0x60FC37B: movit::EffectChain::render_to_fbo(unsigned int, unsigned int, unsigned int) (effect_chain.cpp:1785)
==14794==    by 0x12EBA7: render_to_screen (effect_chain.h:346)
==14794==    by 0x12EBA7: GLWidget::paintGL() (glwidget.cpp:109)
==14794==
==14794== This conflicts with a previous write of size 1 by thread #20
==14794== Locks held: none
==14794==    at 0x1F6CE387: do_bo_emit_reloc (intel_bufmgr_gem.c:1984)
==14794==    by 0x1F6CE62C: drm_intel_gem_bo_emit_reloc (intel_bufmgr_gem.c:2066)
==14794==    by 0x1F17539E: brw_emit_surface_state (brw_wm_surface_state.c:169)
==14794==    by 0x1F176EFB: brw_update_texture_surface (brw_wm_surface_state.c:629)
==14794==    by 0x1F1770D7: update_stage_texture_surfaces (brw_wm_surface_state.c:1258)
==14794==    by 0x1F1771DB: brw_update_texture_surfaces (brw_wm_surface_state.c:1289)
==14794==    by 0x1F16C5C8: check_and_emit_atom (brw_state_upload.c:763)
==14794==    by 0x1F16C5C8: brw_upload_pipeline_state (brw_state_upload.c:876)
==14794==    by 0x1F16C5C8: brw_upload_render_state (brw_state_upload.c:898)
==14794==    by 0x1F14E9F8: brw_try_draw_prims (brw_draw.c:584)
==14794==    by 0x1F14E9F8: brw_draw_prims (brw_draw.c:675)
==14794==  Address 0x3314ff69 is 233 bytes inside a block of size 256 alloc'd
==14794==    at 0x4C2DFE5: calloc (vg_replace_malloc.c:711)
==14794==    by 0x1F6D0188: drm_intel_gem_bo_alloc_internal (intel_bufmgr_gem.c:805)
==14794==    by 0x1F18C66D: miptree_create (intel_mipmap_tree.c:715)
==14794==    by 0x1F18BD19: intel_miptree_create (intel_mipmap_tree.c:739)
==14794==    by 0x1F1950F7: intel_miptree_create_for_teximage (intel_tex_image.c:88)
==14794==    by 0x1F1940DC: intel_alloc_texture_image_buffer (intel_tex.c:95)
==14794==    by 0x1F194DF5: intelTexImage (intel_tex_image.c:119)
==14794==    by 0x1EEDE3D6: teximage (teximage.c:3066)
==14794==    by 0x1EEDF1CF: _mesa_TexImage2D (teximage.c:3105)
==14794==    by 0x166B3E: operator() (mixer.cpp:458)
==14794==    by 0x166B3E: std::_Function_handler<void (), Mixer::bm_frame(unsigned int, unsigned short, bmusb::FrameAllocator::Frame, unsigned long, bmusb::VideoFormat, bmusb::FrameAllocator::Frame, unsigned long, bmusb::AudioFormat)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (functional:1740)
==14794==    by 0x16F824: operator() (functional:2136)
==14794==    by 0x16F824: Mixer::thread_func() (mixer.cpp:592)
==14794==    by 0xA95590E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==14794==  Block was alloc'd by thread #20
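
Here both threads perform the same one-byte write from the same source line, with no locks held, on a buffer that both contexts can reach. One possible way to model this, assuming the byte is a flag that every writer sets to the same value (the trace does not confirm that), is to keep per-context bookkeeping private and make only the shared flag atomic. SharedTarget, Batch, emit_reloc and run_reloc_demo are hypothetical names and not libdrm API.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical shared object: a texture's buffer, visible to both contexts.
struct SharedTarget {
    std::atomic<bool> referenced{false};  // models the 1-byte field
};

// Hypothetical per-context batch: private to one thread, so it needs no lock.
struct Batch {
    std::vector<SharedTarget *> relocs;
};

// Records a reference in the private batch and marks the shared target.
void emit_reloc(Batch &batch, SharedTarget &target) {
    batch.relocs.push_back(&target);
    target.referenced.store(true, std::memory_order_relaxed);
}

// Two "contexts" reference the same target concurrently.
bool run_reloc_demo() {
    SharedTarget texture;
    Batch batch_a;
    Batch batch_b;
    std::thread a([&] {
        for (int i = 0; i < 100; ++i) emit_reloc(batch_a, texture);
    });
    std::thread b([&] {
        for (int i = 0; i < 100; ++i) emit_reloc(batch_b, texture);
    });
    a.join();
    b.join();
    assert(batch_a.relocs.size() == 100);
    assert(batch_b.relocs.size() == 100);
    return texture.referenced.load(std::memory_order_relaxed);
}
```

Relaxed ordering is enough in this model because nothing else is published through the flag. If the real field guards other data, a stronger ordering or a lock would be needed.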

And here's a race between rendering to an FBO and rendering to the screen (FBO 0):

==14794== ----------------------------------------------------------------
==14794==
==14794== Possible data race during write of size 4 at 0x2B206718 by thread #20
==14794== Locks held: none
==14794==    at 0x1F19624A: intel_update_max_level (intel_tex_validate.c:55)
==14794==    by 0x1F19624A: intel_finalize_mipmap_tree (intel_tex_validate.c:88)
==14794==    by 0x1F196583: brw_validate_textures (intel_tex_validate.c:199)
==14794==    by 0x1F14E46B: brw_try_draw_prims (brw_draw.c:448)
==14794==    by 0x1F14E46B: brw_draw_prims (brw_draw.c:675)
==14794==    by 0x1EF24A79: vbo_draw_arrays (vbo_exec_array.c:467)
==14794==    by 0x60FBC35: movit::EffectChain::execute_phase(movit::Phase*, bool, std::set<int, std::less<int>, std::allocator<int> >*, std::map<movit::Phase*, unsigned int, std::less<movit::Phase*>, std::allocator<std::pair<movit::Phase* const, unsigned int> > >*, std::set<movit::Phase*, std::less<movit::Phase*>, std::allocator<movit::Phase*> >*) (effect_chain.cpp:1956)
==14794==    by 0x60FC37B: movit::EffectChain::render_to_fbo(unsigned int, unsigned int, unsigned int) (effect_chain.cpp:1785)
==14794==    by 0x16E037: Mixer::render_one_frame(long) (mixer.cpp:805)
==14794==    by 0x16F88F: Mixer::thread_func() (mixer.cpp:598)
==14794==    by 0xA95590E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==14794==    by 0x4C31D06: mythread_wrapper (hg_intercepts.c:389)
==14794==    by 0x72BC463: start_thread (pthread_create.c:333)
==14794==    by 0xB2209DE: clone (clone.S:105)
==14794==
==14794== This conflicts with a previous write of size 4 by thread #1
==14794== Locks held: none
==14794==    at 0x1F19624A: intel_update_max_level (intel_tex_validate.c:55)
==14794==    by 0x1F19624A: intel_finalize_mipmap_tree (intel_tex_validate.c:88)
==14794==    by 0x1F196583: brw_validate_textures (intel_tex_validate.c:199)
==14794==    by 0x1F14E46B: brw_try_draw_prims (brw_draw.c:448)
==14794==    by 0x1F14E46B: brw_draw_prims (brw_draw.c:675)
==14794==    by 0x1EF24A79: vbo_draw_arrays (vbo_exec_array.c:467)
==14794==    by 0x60FBC35: movit::EffectChain::execute_phase(movit::Phase*, bool, std::set<int, std::less<int>, std::allocator<int> >*, std::map<movit::Phase*, unsigned int, std::less<movit::Phase*>, std::allocator<std::pair<movit::Phase* const, unsigned int> > >*, std::set<movit::Phase*, std::less<movit::Phase*>, std::allocator<movit::Phase*> >*) (effect_chain.cpp:1956)
==14794==    by 0x60FC37B: movit::EffectChain::render_to_fbo(unsigned int, unsigned int, unsigned int) (effect_chain.cpp:1785)
==14794==    by 0x12EBA7: render_to_screen (effect_chain.h:346)
==14794==    by 0x12EBA7: GLWidget::paintGL() (glwidget.cpp:109)
==14794==    by 0x4070C53: QGLWidget::glDraw() (in /usr/lib/x86_64-linux-gnu/libQt5OpenGL.so.5.7.1)
==14794==  Address 0x2b206718 is 1,048 bytes inside a block of size 1,088 alloc'd
==14794==    at 0x4C2DFE5: calloc (vg_replace_malloc.c:711)
==14794==    by 0x1F194291: intelNewTextureObject (intel_tex.c:35)
==14794==    by 0x1EEE3A81: create_textures (texobj.c:1227)
==14794==    by 0x173504: PBOFrameAllocator::PBOFrameAllocator(unsigned long, unsigned int, unsigned int, unsigned long, unsigned int, unsigned int, unsigned int) (pbo_frame_allocator.cpp:38)
==14794==    by 0x16A11B: Mixer::configure_card(unsigned int, bmusb::CaptureInterface*, bool) (mixer.cpp:273)
==14794==    by 0x16BD4E: Mixer::Mixer(QSurfaceFormat const&, unsigned int) (mixer.cpp:179)
==14794==    by 0x12E52B: operator() (glwidget.cpp:54)
==14794==    by 0x12E52B: _M_invoke<> (functional:1400)
==14794==    by 0x12E52B: operator() (functional:1389)
==14794==    by 0x12E52B: void std::__once_call_impl<std::_Bind_simple<GLWidget::initializeGL()::{lambda()#1} ()> >() (mutex:587)
==14794==    by 0x72C3778: __pthread_once_slow (pthread_once.c:116)
==14794==    by 0x12EEE8: __gthread_once (gthr-default.h:699)
==14794==    by 0x12EEE8: call_once<GLWidget::initializeGL()::<lambda()> > (mutex:619)
==14794==    by 0x12EEE8: GLWidget::initializeGL() (glwidget.cpp:58)
==14794==    by 0x407067C: QGLWidget::glInit() (in /usr/lib/x86_64-linux-gnu/libQt5OpenGL.so.5.7.1)
==14794==    by 0x40762DB: QGLWidget::resizeEvent(QResizeEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5OpenGL.so.5.7.1)
==14794==    by 0x4FDE62D: QWidget::event(QEvent*) (in /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5.7.1)
==14794==  Block was alloc'd by thread #1
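
In this trace, both threads write a 4-byte field in the same texture object during draw-time validation, again without any lock. A rough sketch of one way to make such lazy validation safe for a shared object, assuming a per-object mutex is acceptable, is shown below. TextureObject, its fields, validate and run_validate_demo are illustrative names and do not reflect the actual i965 structures.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared texture object; field names are illustrative only.
struct TextureObject {
    std::mutex lock;
    int base_level = 0;
    int requested_max_level = 3;
    int validated_max_level = -1;  // derived value, filled in lazily at draw time
};

// Models draw-time validation: recompute the derived field under the lock
// and store it only when it actually changes.
int validate(TextureObject &tex) {
    std::lock_guard<std::mutex> guard(tex.lock);
    int level = std::max(tex.base_level, tex.requested_max_level);
    if (tex.validated_max_level != level) {
        tex.validated_max_level = level;
    }
    return tex.validated_max_level;
}

// One thread stands in for the FBO renderer, the other for the screen renderer.
int run_validate_demo() {
    TextureObject tex;
    auto draw_loop = [&tex] {
        for (int i = 0; i < 1000; ++i) {
            int level = validate(tex);
            assert(level == 3);
            (void)level;
        }
    };
    std::thread fbo_thread(draw_loop);
    std::thread screen_thread(draw_loop);
    fbo_thread.join();
    screen_thread.join();
    return validate(tex);
}
```

The design choice here is to pay for a lock on every draw. Validating the shared object once, before it becomes visible to a second context, would avoid that cost, but whether the driver can do so is not something the traces show.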

There are more, but they seem similar, and possibly harmless (like drm_intel_gem_bo_busy() racing against itself to set what's basically just a cached flag).
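
As an illustration of the "cached flag" case, here is a minimal sketch of how such a flag could be made race-free without a lock, using a relaxed atomic. It is a model only: CachedBuffer, is_busy, run_busy_demo and query_busy_from_kernel are hypothetical, and the stub always reports the buffer as not busy.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for asking the kernel whether a buffer is still in use.
bool query_busy_from_kernel() { return false; }

struct CachedBuffer {
    std::atomic<bool> idle{false};  // cache: once idle, skip further queries
};

// Returns whether the buffer is busy, caching an idle result.
bool is_busy(CachedBuffer &bo) {
    if (bo.idle.load(std::memory_order_relaxed)) {
        return false;
    }
    bool busy = query_busy_from_kernel();
    bo.idle.store(!busy, std::memory_order_relaxed);
    return busy;
}

// Two threads query the same buffer concurrently; returns the cached flag.
bool run_busy_demo() {
    CachedBuffer bo;
    auto poll_loop = [&bo] {
        for (int i = 0; i < 1000; ++i) {
            bool busy = is_busy(bo);
            assert(!busy);  // the stub never reports a busy buffer
            (void)busy;
        }
    };
    std::thread a(poll_loop);
    std::thread b(poll_loop);
    a.join();
    b.join();
    return bo.idle.load(std::memory_order_relaxed);
}
```

Both threads may still query the stub redundantly, which matches the "possibly harmless" reading: the result is the same either way, and the atomic only removes the formal data race.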

Comment 1 GitLab Migration User 2019-09-25 18:59:03 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1549
