Created attachment 94169 [details]
program which seg fault.
I have an example program that creates 2 contexts for 2 threads. The program runs for 2-3 seconds and then crashes with a segmentation fault. It crashes in glClear( GL_COLOR_BUFFER_BIT ), exactly at: rctx->gtt += rr->buf->size; (line 114; function: r600_context_add_resource_size; file: r600_context_adr600_buffer_common.c)
Configure flags: autogen.sh --with-dri-drivers="" --enable-gbm --enable-gallium-gbm --enable-glx-tls --with-gallium-drivers=r600 --enable-driglx-direct --enable-opencl --enable-opencl-icd --with-llvm-shared-libs --enable-r600-llvm-compiler --enable-openvg --enable-gles2 --enable-gles1 --prefix=/usr/local --enable-xa --enable-gallium-egl --enable-texture-float --enable-selinux
Please attach a backtrace of the crash from gdb.
Also, running the program in valgrind (with/out --tool=helgrind) might give some hints.
Created attachment 94275 [details]
gdb full backtrace
Created attachment 94276 [details]
valgrind + hellgrind
Created attachment 94277 [details]
Created attachment 94321 [details] [review]
Bail if rr->buf is NULL
Does this patch help?
(In reply to comment #3)
> valgrind + hellgrind
Thanks, but the proper spelling is 'helgrind', so it was still using memcheck as you can see. Just as a note for next time.
(In reply to comment #5)
> Created attachment 94321 [details] [review]
> Bail if rr->buf is NULL
> Does this patch help?
> (In reply to comment #3)
> > valgrind + hellgrind
> Thanks, but the proper spelling is 'helgrind', so it was still using
> memcheck as you can see. Just as a note for next time.
With the patch applied, the program now crashes with SIGSEGV in evergreen_emit_vertex_buffers (evergreen_state.c:2479).
Created attachment 94418 [details]
gdb with patch
Created attachment 94419 [details]
valgrind only (with patch)
Created attachment 94422 [details]
valgrind + helgrind (with patch)
AFAICT the problem is that both threads access the same struct r600_resource concurrently. It might be relatively easy to avoid the crashes by updating the buf member atomically in r600_init_resource() instead of setting it to NULL first in r600_invalidate_buffer(), but I suspect there could be more subtle issues with other members, in particular valid_buffer_range.
Marek, any thoughts on how to solve this?
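To illustrate the idea of updating the buf member atomically instead of setting it to NULL first, here is a minimal sketch using C11 atomics. The types fake_buf and fake_resource and the functions swap_buffer and resource_size are hypothetical stand-ins invented for this example; they are not Mesa's actual structures or the contents of any attached patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the winsys buffer; not Mesa's real type. */
struct fake_buf {
    size_t size;
};

/* Hypothetical stand-in for struct r600_resource, shared by two threads. */
struct fake_resource {
    _Atomic(struct fake_buf *) buf;
};

/* Publish the replacement buffer in a single atomic exchange, so a
 * concurrent reader sees either the old or the new pointer, never NULL.
 * Returns the old buffer so the caller can release it. */
struct fake_buf *swap_buffer(struct fake_resource *res, struct fake_buf *new_buf)
{
    assert(res != NULL);
    return atomic_exchange(&res->buf, new_buf);
}

/* Reader side: load the pointer once and only use the local copy. */
size_t resource_size(struct fake_resource *res)
{
    struct fake_buf *b = atomic_load(&res->buf);
    return b ? b->size : 0;
}
```

This only closes the window in which a reader can observe a NULL pointer. A reader may still hold the old pointer after the swap, so the old buffer would need reference counting or a similar lifetime scheme before it is freed, and other members of the resource are not covered at all.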
All writes to valid_buffer_range are protected by a mutex. Only the reads are not.
I've got no idea what to do with invalidate_buffer. If we added mutexes everywhere, it would slow down the driver.
I think that calling BufferData in one thread and using the buffer for rendering in some other thread is a race condition in the application and should be fixed in the app.
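As a sketch of what fixing this on the application side could look like, the snippet below serializes the upload and the draw with a mutex. The struct shared_vbo and its functions are hypothetical names made up for this example, and the GL calls are only indicated in comments so the snippet compiles without a GL context.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical application-side wrapper around a buffer object that is
 * shared by an uploader thread and a renderer thread. */
struct shared_vbo {
    pthread_mutex_t lock;
    unsigned char data[64];   /* stand-in for the GL buffer store */
    size_t size;
};

void shared_vbo_init(struct shared_vbo *v)
{
    assert(v != NULL);
    pthread_mutex_init(&v->lock, NULL);
    v->size = 0;
}

/* Uploader thread: the glBufferData() call would go where memcpy is. */
int shared_vbo_upload(struct shared_vbo *v, const void *src, size_t n)
{
    if (n > sizeof v->data)
        return -1;
    pthread_mutex_lock(&v->lock);
    memcpy(v->data, src, n);
    v->size = n;
    pthread_mutex_unlock(&v->lock);
    return 0;
}

/* Renderer thread: binding the buffer and issuing the draw call would
 * happen while the lock is held. Returns the size it observed. */
size_t shared_vbo_draw(struct shared_vbo *v)
{
    pthread_mutex_lock(&v->lock);
    size_t n = v->size;
    pthread_mutex_unlock(&v->lock);
    return n;
}
```

The mutex only keeps the two threads from touching the buffer at the same time. With real GL contexts the application would probably also need to make sure the upload has completed (for example with glFinish or a fence sync) and re-bind the buffer in the rendering context before drawing.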
If the program called MapBufferRange(MAP_INVALIDATE_BUFFER_BIT) instead of BufferData, I think it would be okay to crash:
From GL 4.4 spec:
"MAP_INVALIDATE_BUFFER_BIT indicates that the previous contents of the entire buffer may be discarded. Data within the entire buffer are undefined with the exception of subsequently written data. No GL error is generated if subsequent GL operations access unwritten data, but the result is undefined and system errors (possibly including program termination) may occur."
In other words, no context should access the buffer until UnmapBuffer() is complete, because there may be unwritten data.
Created attachment 95375 [details] [review]
Could you please try this patch?
Created attachment 95378 [details] [review]
possible fix 2
Please test this one. The previous patch wasn't correct.
I'm sorry that I abandoned this bug. I just read comment 13 and realized that it was my fault. Do I still need to test this patch?
Was this fixed?
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/495.