Bug 97537 - nvc0 occasionally crashes in glDrawArrays in a multi-threaded/multi-context app
Summary: nvc0 occasionally crashes in glDrawArrays in a multi-threaded/multi-context app
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/nouveau
Version: 11.2
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Nouveau Project
URL:
Whiteboard:
Keywords:
Depends on: 92077
Blocks:
Reported: 2016-08-29 20:41 UTC by Suzuki, Shinji
Modified: 2019-09-18 20:43 UTC (History)

Description Suzuki, Shinji 2016-08-29 20:41:11 UTC
Ubuntu 16.04.01 LTS (version of libgl1-mesa-dri is 11.2.0-1ubuntu2.1) / Nvidia Quadro K4000.

I am examining a sample app from the SDK for a video capture card. The app is multi-threaded. It runs fine for some time (around 10-20 minutes) but eventually crashes in glDrawArrays. I intend to come up with a small reproduction case if time permits, but for now I'll attach the stack trace from the app in the hope that someone can figure out what's going wrong.

(gdb) core glDrawArrays.core
warning: core file may not match specified executable file.
[New LWP 14181]
[New LWP 14175]
[New LWP 14179]
[New LWP 14180]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `{app-name-hidden}'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  nvc0_resource_validate (flags=<optimized out>, res=<optimized out>)
    at ../../../../../src/gallium/drivers/nouveau/nvc0/nvc0_screen.h:156
156     ../../../../../src/gallium/drivers/nouveau/nvc0/nvc0_screen.h: No such file or directory.
[Current thread is 1 (Thread 0x7fffee488700 (LWP 14181))]
(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7fffee488700 (LWP 14181) nvc0_resource_validate (flags=<optimized out>,
    res=<optimized out>)
    at ../../../../../src/gallium/drivers/nouveau/nvc0/nvc0_screen.h:156
  2    Thread 0x7ffff7fd2740 (LWP 14175) 0x00007ffff741798d in recvmsg ()
    at ../sysdeps/unix/syscall-template.S:84
  3    Thread 0x7fffef48a700 (LWP 14179) 0x00007ffff6684687 in ioctl ()
    at ../sysdeps/unix/syscall-template.S:84
  4    Thread 0x7fffeec89700 (LWP 14180) 0x00007ffff7416867 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x611838 <gPboFifo+344>)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
(gdb) where
#0  nvc0_resource_validate (flags=<optimized out>, res=<optimized out>)
    at ../../../../../src/gallium/drivers/nouveau/nvc0/nvc0_screen.h:156
#1  nvc0_bufctx_fence (nvc0=nvc0@entry=0x7842c0, bufctx=<optimized out>,
    on_flush=on_flush@entry=true)
    at ../../../../../src/gallium/drivers/nouveau/nvc0/nvc0_context.c:434
#2  0x00007ffff38b69c3 in nvc0_state_validate (nvc0=nvc0@entry=0x7842c0,
    mask=mask@entry=4294967295)
    at ../../../../../src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c:760
#3  0x00007ffff38c1dcd in nvc0_draw_vbo (pipe=0x7842c0, info=0x7fffee487890)
    at ../../../../../src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c:972
#4  0x00007ffff3572e77 in st_draw_vbo (ctx=0x7ae730, prims=<optimized out>, nr_prims=1,
    ib=0x0, index_bounds_valid=<optimized out>, min_index=0, max_index=3,
    tfb_vertcount=0x0, stream=0, indirect=0x0)
    at ../../../src/mesa/state_tracker/st_draw.c:288
#5  0x00007ffff353994a in vbo_draw_arrays (ctx=0x7ae730, mode=6, start=0, count=4,
    numInstances=1, baseInstance=0) at ../../../src/mesa/vbo/vbo_exec_array.c:497
#6  0x0000000000404e8b in renderToWindow (target=4) at main.cpp:825
#7  0x0000000000405efd in RenderThread (pArg=0x7fffffffe140) at main.cpp:1105
#8  0x00007ffff740e6fa in start_thread (arg=0x7fffee488700) at pthread_create.c:333
#9  0x00007ffff668eb5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) p res
$1 = <optimized out>
(gdb) up
#1  nvc0_bufctx_fence (nvc0=nvc0@entry=0x7842c0, bufctx=<optimized out>,
    on_flush=on_flush@entry=true)
    at ../../../../../src/gallium/drivers/nouveau/nvc0/nvc0_context.c:434
434     ../../../../../src/gallium/drivers/nouveau/nvc0/nvc0_context.c: No such file or directory.
(gdb) p res
$2 = (struct nv04_resource *) 0xfa
(gdb) p ref
$3 = (struct nouveau_bufref *) 0x786f18
(gdb) p ref->priv
$4 = (void *) 0xfa
(gdb)
Comment 1 Ilia Mirkin 2016-08-29 20:46:34 UTC
Does the app ever draw from multiple threads at once (into separate contexts)? If so, it's a known issue. You can use this branch, it may help:

https://github.com/imirkin/mesa/commits/locking
Comment 2 Suzuki, Shinji 2016-08-29 21:05:33 UTC
Yes, each thread draws into its own context with no explicit synchronization other than GL fences, which are used to synchronize access to the shared resources (texture and PBO).

Thank you for the info. I'll examine the branch and get my feet wet building Mesa myself.
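
For an app that draws from several threads into separate contexts, as described above, one possible application-side stopgap is to serialize all GL submissions behind a single process-wide lock. The following is only a sketch under that assumption: it has not been verified against this crash, and the names g_glSubmitLock, submitFrame, renderLoop and runSerialized are hypothetical (they belong neither to Mesa nor to the SDK sample). The GL calls are replaced by a stand-in that records how many threads are inside the critical section at once.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical process-wide lock guarding every GL submission.
static std::mutex g_glSubmitLock;

// Bookkeeping used only to check that submissions never overlap.
static std::atomic<int> g_inFlight{0};
static std::atomic<int> g_maxInFlight{0};

// Stand-in for the real per-context work (for example the glDrawArrays
// call made from renderToWindow). Replace the body with actual GL calls.
static void submitFrame()
{
    int now = ++g_inFlight;
    int seen = g_maxInFlight.load();
    // Record the highest number of concurrent submitters observed.
    while (now > seen && !g_maxInFlight.compare_exchange_weak(seen, now)) {
    }
    std::this_thread::yield();
    --g_inFlight;
}

// Each render thread takes the lock around every batch of GL calls.
static void renderLoop(int frames)
{
    for (int i = 0; i < frames; ++i) {
        std::lock_guard<std::mutex> guard(g_glSubmitLock);
        submitFrame();
    }
}

// Runs `threads` render loops concurrently and returns the peak number of
// threads that were inside submitFrame() at the same time.
int runSerialized(int threads, int frames)
{
    assert(threads > 0 && frames > 0);
    g_inFlight = 0;
    g_maxInFlight = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back(renderLoop, frames);
    for (auto &worker : pool)
        worker.join();
    return g_maxInFlight.load();
}
```

Calling runSerialized(4, 1000) should report a peak of 1, meaning no two threads were submitting at once. In a real app the lock would have to cover every GL entry point that can submit work (draws, uploads, buffer swaps), which costs parallelism; it is a workaround at best, not a substitute for locking inside the driver.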
Comment 3 Suzuki, Shinji 2016-09-01 17:05:00 UTC
I managed to have glxinfo run with newly built libs through the use of LIBGL_DRIVERS_PATH and LD_LIBRARY_PATH env-vars. But once I copy them to the system dir (/usr/lib/x86_64_linux_gnu/{|mesa|dri}) and reboot then ibus-ui-gtk3 fails to load due to segfault. Since the libs also have an issue that glxinfo shows incorrect values for "Max core profile version" and "Max compat profile version", "OpenGL shading language version string", I guess I'd give Ubuntu 16.10 a try.
Comment 4 Ilia Mirkin 2016-09-01 22:19:31 UTC
(In reply to shinji.suzuki from comment #3)
> I managed to have glxinfo run with newly built libs through the use of
> LIBGL_DRIVERS_PATH and LD_LIBRARY_PATH env-vars. But once I copy them to the
> system dir (/usr/lib/x86_64_linux_gnu/{|mesa|dri}) and reboot then
> ibus-ui-gtk3 fails to load due to segfault. Since the libs also have an
> issue that glxinfo shows incorrect values for "Max core profile version" and
> "Max compat profile version", "OpenGL shading language version string", I
> guess I'd give Ubuntu 16.10 a try.

Just run the application in question with LD_LIBRARY_PATH=... (note - using LIBGL_DRIVERS_PATH is highly unadvisable unless you *really* know what you're doing. Just build with some prefix, install, and point LD_LIBRARY_PATH at that.)

Anyways, if doing your own mesa build is beyond your powers that's fine. We can keep this open and I can assume it'll magically get fixed when I get around to finishing up those locking patches.
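
When pointing LD_LIBRARY_PATH at a private install as suggested above, it can help to confirm from inside the process which GL-related shared objects were actually loaded. The following is a rough sketch, assuming a glibc system (dl_iterate_phdr is a glibc extension declared in <link.h>). The helper names looksLikeGlLibrary and loadedGlLibraries, and the substrings used for matching, are assumptions made for this example.

```cpp
#include <link.h>   // dl_iterate_phdr, struct dl_phdr_info (glibc-specific)
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Heuristic: true when a shared-object path looks GL-related.
// The substrings are a guess chosen for this sketch, not an official list.
bool looksLikeGlLibrary(const char *path)
{
    return path != nullptr &&
           (std::strstr(path, "libGL") != nullptr ||
            std::strstr(path, "libglapi") != nullptr ||
            std::strstr(path, "_dri.so") != nullptr);
}

// Callback invoked once per loaded object; collects matching paths.
static int collectGlLibrary(struct dl_phdr_info *info, size_t, void *data)
{
    assert(data != nullptr);
    auto *found = static_cast<std::vector<std::string> *>(data);
    if (looksLikeGlLibrary(info->dlpi_name))
        found->emplace_back(info->dlpi_name);
    return 0;  // zero keeps the iteration going
}

// Returns the paths of all currently loaded GL-related shared objects.
std::vector<std::string> loadedGlLibraries()
{
    std::vector<std::string> found;
    dl_iterate_phdr(collectGlLibrary, &found);
    return found;
}
```

The DRI driver is typically loaded only once a context has been created, so loadedGlLibraries() is best called after the app has made a context current. If the printed paths sit under the chosen install prefix rather than the system directories, the private build is the one in use.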
Comment 5 Tomasz Paweł Gajc 2016-12-10 14:36:17 UTC
Looks like this is related to broken multi-threading in nouveau, see linked bugs.
Comment 6 GitLab Migration User 2019-09-18 20:43:03 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1110.

