Bug 75061

Summary: bug in clearing color buffer
Product: Mesa Reporter: Maxim <ya.maxis11>
Component: Drivers/Gallium/r600Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED QA Contact:
Severity: normal    
Priority: medium CC: simonandric5
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: program which seg fault.
gdb full backtrace
valgrind + hellgrind
valgrind only
Bail if rr->buf is NULL
gdb with patch
valgrind only (with patch)
valgrind + helgrind (with patch)
possible fix
possible fix 2

Description Maxim 2014-02-16 17:22:00 UTC
Created attachment 94169 [details]
program which seg fault.

I have example program, that creates 2 context for 2 threads. Program runs 2-3 seconds and crashes (segmentation fault). It crashes in glClear( GL_COLOR_BUFFER_BIT ); Exactly in: rctx->gtt += rr->buf->size; (114 line; function: r600_context_add_resource_size; file: r600_context_adr600_buffer_common.c) Flags: autogen.sh --with-dri-drivers="" --enable-gbm --enable-gallium-gbm --enable-glx-tls --with-gallium-drivers=r600 --enable-driglx-direct --enable-opencl --enable-opencl-icd --with-llvm-shared-libs --enable-r600-llvm-compiler --enable-openvg --enable-gles2 --enable-gles1 --prefix=/usr/local --enable-xa --enable-gallium-egl --enable-texture-float --enable-selinux
Comment 1 Michel Dänzer 2014-02-17 06:45:35 UTC
Please attach a backtrace of the crash from gdb.

Also, running the program in valgrind (with/out --tool=helgrind) might give some hints.
Comment 2 Maxim 2014-02-18 10:10:10 UTC
Created attachment 94275 [details]
gdb full backtrace
Comment 3 Maxim 2014-02-18 10:10:42 UTC
Created attachment 94276 [details]
valgrind + hellgrind
Comment 4 Maxim 2014-02-18 10:11:14 UTC
Created attachment 94277 [details]
valgrind only
Comment 5 Michel Dänzer 2014-02-19 04:03:03 UTC
Created attachment 94321 [details] [review]
Bail if rr->buf is NULL

Does this patch help?

(In reply to comment #3)
> valgrind + hellgrind

Thanks, but the proper spelling is 'helgrind', so it was still using memcheck as you can see. Just as a note for next time.
Comment 6 Maxim 2014-02-20 09:43:27 UTC
(In reply to comment #5)
> Created attachment 94321 [details] [review] [review]
> Bail if rr->buf is NULL
> 
> Does this patch help?
> 
> (In reply to comment #3)
> > valgrind + hellgrind
> 
> Thanks, but the proper spelling is 'helgrind', so it was still using
> memcheck as you can see. Just as a note for next time.

SIGSEGV in evergreen_emit_vertex_buffers (evergreen_state.c:2479)
Comment 7 Maxim 2014-02-20 09:43:56 UTC
Created attachment 94418 [details]
gdb with patch
Comment 8 Maxim 2014-02-20 09:44:35 UTC
Created attachment 94419 [details]
valgrind only (with patch)
Comment 9 Maxim 2014-02-20 09:52:36 UTC
Created attachment 94422 [details]
valgrind + helgrind (with patch)
Comment 10 Maxim 2014-02-23 09:09:24 UTC
Any progress?
Comment 11 Michel Dänzer 2014-02-24 07:04:30 UTC
AFAICT the problem is that both threads access the same struct r600_resource concurrently. It might be relatively easy to avoid the crashes by updating the buf member atomically in r600_init_resource() instead of setting it to NULL first in r600_invalidate_buffer(), but I suspect there could be more subtle issues with other members, in particular valid_buffer_range.

Marek, any thoughts on how to solve this?
Comment 12 Marek Olšák 2014-02-24 12:04:42 UTC
All writes to valid_buffer_range are protected by a mutex. Only the reads are not.

I've got no idea what to do with invalidate_buffer. If we added mutexes everywhere, it would slow down the driver.

I think that calling BufferData in one thread and using the buffer for rendering in some other thread is a race condition in the application and should be fixed in the app.
Comment 13 Marek Olšák 2014-02-24 12:36:17 UTC
If the program called MapBufferRange(MAP_INVALIDATE_BUFFER_BIT) instead of BufferData, I think it would be okay to crash:

From GL 4.4 spec:

"MAP_INVALIDATE_BUFFER_BIT indicates that the previous contents of the entire buffer may be discarded. Data within the entire buffer are undefined with the exception of subsequently written data. No GL error is generated if subsequent GL operations access unwritten data, but the result is undefined and system errors (possibly including program termination) may occur."

In other words, no context should access the buffer until UnmapBuffer() is complete, because there may be unwritten data.
Comment 14 Marek Olšák 2014-03-08 22:39:36 UTC
Created attachment 95375 [details] [review]
possible fix

Could you please try this patch?
Comment 15 Marek Olšák 2014-03-08 22:57:28 UTC
Created attachment 95378 [details] [review]
possible fix 2

Please test this one. The previous patch wasn't correct.
Comment 16 Maxim 2014-09-17 23:27:29 UTC
I'm sorry that I leave this bug. Just read comment 13 and realized that it was my fault. Do I need to test this patch?
Comment 17 Vedran Miletić 2017-03-22 15:24:02 UTC
Was this fixed?
Comment 18 GitLab Migration User 2019-09-18 19:15:06 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/495.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.