Bug 98645

Summary: X Freeze while rendering video with multiple displays and TearFree enabled
Product: DRI
Reporter: Charlotte Manning <charlotte0m>
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: normal
Priority: medium
CC: julien.isorce, n770galaxy
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
Attachments:
Description | Flags
system diagnostics | none
X Server log with xf86-video-ati master and DRI3 | none
Script to build mesa and x server from master | none

Description Charlotte Manning 2016-11-08 22:37:52 UTC
Created attachment 127849 [details]
system diagnostics

This was first observed while rendering video in our application after upgrading to Ubuntu 16.04 and enabling the TearFree setting.  We have reproduced the issue outside our application with a minimal code sample:
    https://github.com/charo-m/4K_displaywall_bench/tree/amd_xfreeze_repro

Steps to reproduce:
git clone https://github.com/charo-m/4K_displaywall_bench.git
cd 4K_displaywall_bench
git checkout amd_xfreeze_repro
sudo apt-get install libxrandr-dev libxinerama-dev libxcursor-dev libglfw3-dev libglm-dev
cmake .
make
./4K_displaywall_bench -width 3840 -height 1080 -swap_interval 1 -gl 3 -i images/bubble4K.png


X.Org X Server 1.18.4
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT GL [FirePro W9000]
OpenGL renderer string: Gallium 0.4 on AMD TAHITI (DRM 2.43.0, LLVM 3.8.0)
OpenGL version string: 3.0 Mesa 11.2.0
OpenGL core profile version string: 4.1 (Core Profile) Mesa 11.2.0
OS:    Ubuntu 16.04.1 LTS
Kernel:  4.4.0-45-generic

Two 2K monitors:
  1920x1080 @ 60Hz  (not rotated)  DP1 at pos 0x0, DP2 at pos 1920x0
Server hardware:
  Dell  Precision T5600

**** Option "TearFree" "on"  ****


The issue does not seem to happen with a single monitor.  The code sample re-uploads the same image data every frame with glTexSubImage2D.  The "-swap_interval [0|1]" option sets the swap interval through glXSwapIntervalEXT: 0 disables vsync and 1 enables it.  The issue appears to be a race condition, so it is more likely to occur under certain conditions (at least two monitors, a T5600, TearFree enabled).

single monitor:
./4K_displaywall_bench -width 1920 -height 1080 -swap_interval 1 -gl 3 -i images/bubble4K.png   -> unlikely to happen at all / never reproduced

two monitors, viewport covers both, vsync enabled:
./4K_displaywall_bench -width 3840 -height 1080 -swap_interval 1 -gl 3 -i images/bubble4K.png   -> most likely to happen (usually within 15 min, but not always)

two monitors, viewport only covers first, vsync enabled:
./4K_displaywall_bench -width 1920 -height 1080 -swap_interval 1 -gl 3 -i images/bubble4K.png   -> unlikely to happen

two monitors, viewport covers both, vsync disabled:
./4K_displaywall_bench -width 3840 -height 1080 -swap_interval 0 -gl 3 -i images/bubble4K.png   -> unlikely to happen


stacktrace:
Thread 2 (Thread 0x7f61abb7e700 (LWP 8249)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f61afed8213 in cnd_wait (mtx=0xdef5b0, cond=0xdef5d8)
    at ../../../../../../include/c11/threads_posix.h:159
#2  pipe_semaphore_wait (sema=0xdef5b0) at ../../../../../../src/gallium/auxiliary/os/os_thread.h:259
#3  radeon_drm_cs_emit_ioctl (param=param@entry=0xdef180)
    at ../../../../../../src/gallium/winsys/radeon/drm/radeon_drm_winsys.c:688
#4  0x00007f61afed7937 in impl_thrd_routine (p=<optimized out>) at ../../../../../../include/c11/threads_posix.h:87
#5  0x00007f61b34ea6fa in start_thread (arg=0x7f61abb7e700) at pthread_create.c:333
#6  0x00007f61b3d48b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7f61b4ecb740 (LWP 8248)):
#0  0x00007f61b3d3ce8d in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f61b289cc62 in poll (__timeout=-1, __nfds=1, __fds=0x7ffdbd4045a0)
    at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2  _xcb_conn_wait (c=c@entry=0xd64000, cond=cond@entry=0x7ffdbd4046c0, vector=vector@entry=0x0, 
    count=count@entry=0x0) at ../../src/xcb_conn.c:459
#3  0x00007f61b289e617 in wait_for_reply (c=c@entry=0xd64000, request=279249, e=e@entry=0x7ffdbd404790)
    at ../../src/xcb_in.c:516
#4  0x00007f61b289e721 in xcb_wait_for_reply (c=c@entry=0xd64000, request=279249, e=e@entry=0x7ffdbd404790)
    at ../../src/xcb_in.c:546
#5  0x00007f61b373da47 in _XReply (dpy=dpy@entry=0xd62cf0, rep=rep@entry=0x7ffdbd404810, extra=extra@entry=0, 
    discard=discard@entry=0) at ../../src/xcb_io.c:602
#6  0x00007f61b1c3a29a in DRI2GetBuffersWithFormat (dpy=0xd62cf0, drawable=2097159, width=width@entry=0xf52e38, 
    height=height@entry=0xf52e3c, attachments=0x7ffdbd4049b0, count=1, outCount=0x7ffdbd404970)
    at ../../../src/glx/dri2.c:491
#7  0x00007f61b1c3a5d7 in dri2GetBuffersWithFormat (driDrawable=<optimized out>, width=0xf52e38, height=0xf52e3c, 
    attachments=<optimized out>, count=<optimized out>, out_count=0x7ffdbd404970, loaderPrivate=0xf53df0)
    at ../../../src/glx/dri2_glx.c:900
#8  0x00007f61afb4674a in dri2_drawable_get_buffers (count=<synthetic pointer>, atts=0xf54550, drawable=0xf53ee0)
    at ../../../../../src/gallium/state_trackers/dri/dri2.c:213
#9  dri2_allocate_textures (ctx=0xd971f0, drawable=0xf53ee0, statts=0xf54550, statts_count=2)
    at ../../../../../src/gallium/state_trackers/dri/dri2.c:407
#10 0x00007f61afb42f9c in dri_st_framebuffer_validate (stctx=<optimized out>, stfbi=<optimized out>, 
    statts=0xf54550, count=2, out=0x7ffdbd404ae0)
    at ../../../../../src/gallium/state_trackers/dri/dri_drawable.c:83
#11 0x00007f61afa693be in st_framebuffer_validate (stfb=0xf540f0, st=st@entry=0xf4acc0)
    at ../../../src/mesa/state_tracker/st_manager.c:202
#12 0x00007f61afa6a929 in st_manager_validate_framebuffers (st=st@entry=0xf4acc0)
    at ../../../src/mesa/state_tracker/st_manager.c:877
#13 0x00007f61afa0fa12 in st_validate_state (st=st@entry=0xf4acc0, pipeline=pipeline@entry=ST_PIPELINE_RENDER)
    at ../../../src/mesa/state_tracker/st_atom.c:235
#14 0x00007f61afa17a61 in st_Clear (ctx=0xf06460, mask=18) at ../../../src/mesa/state_tracker/st_cb_clear.c:393
#15 0x0000000000405605 in proto::Scene::Draw() ()
#16 0x0000000000402ca5 in main ()
Comment 1 Michel Dänzer 2016-11-09 00:50:06 UTC
Does the problem also occur with DRI3 enabled?

Does it still happen with current xf86-video-ati Git master?
Comment 2 Josep Torra 2016-11-09 01:30:40 UTC
Yes, it occurs with DRI3 enabled and with last week's xf86-video-ati master.

I'll double check again tomorrow.
Comment 3 Josep Torra 2016-11-09 12:42:41 UTC
Tried again with xf86-video-ati master and DRI3 with similar results.

Application stack trace:

#0  0x00007fcbeb83eb5d in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007fcbe840ec62 in poll (__timeout=-1, __nfds=1, __fds=0x7ffe81b03b80) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2  _xcb_conn_wait (c=c@entry=0x32fb8b0, cond=cond@entry=0x3314dd8, vector=vector@entry=0x0, count=count@entry=0x0) at ../../src/xcb_conn.c:459
#3  0x00007fcbe84109a9 in xcb_wait_for_special_event (c=0x32fb8b0, se=0x3314db0) at ../../src/xcb_in.c:789
#4  0x00007fcbedab16b7 in dri3_find_back (draw=draw@entry=0x3313838) at ../../../src/loader/loader_dri3_helper.c:380
#5  0x00007fcbedab2460 in dri3_get_buffer (driDrawable=<optimized out>, draw=0x3313838, buffer_type=loader_dri3_buffer_back, format=4099) at ../../../src/loader/loader_dri3_helper.c:1193
#6  loader_dri3_get_buffers (driDrawable=<optimized out>, format=4099, stamp=0x3314f70, loaderPrivate=0x3313838, buffer_mask=<optimized out>, buffers=0x7ffe81b03dd0)
    at ../../../src/loader/loader_dri3_helper.c:1370
#7  0x00007fcbde5372b3 in dri_image_drawable_get_buffers (statts_count=<optimized out>, statts=<optimized out>, images=<optimized out>, drawable=<optimized out>)
    at ../../../../../src/gallium/state_trackers/dri/dri2.c:279
#8  dri2_allocate_textures (ctx=0x33e1310, drawable=0x3314f70, statts=0x34d5cd0, statts_count=2) at ../../../../../src/gallium/state_trackers/dri/dri2.c:402
#9  0x00007fcbde533f9c in dri_st_framebuffer_validate (stctx=<optimized out>, stfbi=<optimized out>, statts=0x34d5cd0, count=2, out=0x7ffe81b03f60)
    at ../../../../../src/gallium/state_trackers/dri/dri_drawable.c:83
#10 0x00007fcbde45a3be in st_framebuffer_validate (stfb=0x34d5870, st=st@entry=0x34cdae0) at ../../../src/mesa/state_tracker/st_manager.c:202
#11 0x00007fcbde45b680 in st_api_make_current (stapi=<optimized out>, stctxi=0x34cdae0, stdrawi=0x3314f70, streadi=0x3314f70) at ../../../src/mesa/state_tracker/st_manager.c:783
#12 0x00007fcbde533a71 in dri_make_current (cPriv=<optimized out>, driDrawPriv=0x33e0c60, driReadPriv=0x33e0c60) at ../../../../../src/gallium/state_trackers/dri/dri_context.c:245
#13 0x00007fcbde532a46 in driBindContext (pcp=<optimized out>, pdp=<optimized out>, prp=<optimized out>) at ../../../../../../src/mesa/drivers/dri/common/dri_util.c:532
#14 0x00007fcbedaace9a in dri3_bind_context (context=0x33152b0, old=<optimized out>, draw=<optimized out>, read=<optimized out>) at ../../../src/glx/dri3_glx.c:214
#15 0x00007fcbeda803b5 in MakeContextCurrent (dpy=0x32fa660, draw=draw@entry=2097155, read=read@entry=2097155, gc_user=gc_user@entry=0x33152b0) at ../../../src/glx/glxcurrent.c:228

X Server stack trace:

#0  0x00007faf254b59e3 in select () at ../sysdeps/unix/syscall-template.S:84
#1  0x000055dadffb0807 in WaitForSomething (pClientsReady=pClientsReady@entry=0x55dae09a9000) at ../../os/WaitFor.c:227
#2  0x000055dadfe54b8e in Dispatch () at ../../dix/dispatch.c:359
#3  0x000055dadfe58dd3 in dix_main (argc=5, argv=0x7fff4ccafac8, envp=<optimized out>) at ../../dix/main.c:300
#4  0x00007faf253d9830 in __libc_start_main (main=0x55dadfe43010 <main>, argc=5, argv=0x7fff4ccafac8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff4ccafab8)
    at ../csu/libc-start.c:291
#5  0x000055dadfe43049 in _start ()
Comment 4 Josep Torra 2016-11-09 12:44:09 UTC
Created attachment 127866 [details]
X Server log with xf86-video-ati master and DRI3
Comment 5 Josep Torra 2016-11-09 23:06:53 UTC
Created attachment 127885 [details]
Script to build mesa and x server from master

We built the mesa and X server master branches with this script, and the issue is still reproducible.
Comment 6 Josep Torra 2016-11-15 09:50:14 UTC
Following a discussion on IRC, we tried radeon.msi=0, and this seems to solve the problem. That suggests the root cause is in the kernel driver and needs to be addressed there.
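
For anyone trying the same workaround, here is a minimal sketch for confirming which interrupt mode the radeon driver actually ended up with. It assumes the usual /proc/interrupts layout, where an MSI line contains "PCI-MSI" and names the driver at the end of the line. radeon_irq_mode is a hypothetical helper name, not part of any existing tool.

```shell
#!/bin/bash
# Hypothetical helper: classify the radeon interrupt line found in
# /proc/interrupts-style text read from stdin.
# Prints "msi", "legacy" (any non-MSI line), or "none" (no radeon line).
radeon_irq_mode() {
    local line
    # First line that mentions radeon as a whole word; empty if absent.
    line=$(grep -w radeon | head -n 1 || true)
    case "$line" in
        "")        echo "none" ;;
        *PCI-MSI*) echo "msi" ;;
        *)         echo "legacy" ;;
    esac
}
```

Typical use would be `radeon_irq_mode < /proc/interrupts`. With radeon.msi=0 in effect the expected answer is "legacy", and /sys/module/radeon/parameters/msi should read 0.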
Comment 7 Alex Deucher 2016-11-15 14:23:12 UTC
MSI problems tend to be platform problems.  What system is this?  What system chipset?
Comment 8 Josep Torra 2016-11-15 18:52:50 UTC
It's most often reproduced on Dell T5600 systems with the "Intel Corporation C600/X79 series chipset" and a FirePro W9000.

We have also reproduced it on a few other systems based on the "Intel Corporation C610/X99 series chipset" with a FirePro W600.

So far it has not been reproducible on an Intel Skylake system (Asus Z170 Pro Gaming motherboard) with a FirePro W600.

On the Dell T5600 the issue is masked when drm debug output is enabled as follows:

echo 0xf > /sys/module/drm/parameters/debug

It's still triggered with 0xe instead of 0xf.
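
To make the difference between the two masks explicit, here is a small sketch that decodes the low four bits of a drm debug value. The bit meanings follow the kernel's DRM_UT_* definitions (0x01 core, 0x02 driver, 0x04 kms, 0x08 prime); higher bits are ignored here, and drm_debug_decode is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: decode the low four bits of a drm.debug mask.
# Bit values follow the kernel's DRM_UT_* definitions:
#   0x01 core, 0x02 driver, 0x04 kms, 0x08 prime
drm_debug_decode() {
    local mask=$(( $1 ))   # accepts decimal or 0x-prefixed hex
    local out=""
    if [ $(( mask & 0x01 )) -ne 0 ]; then out="$out core"; fi
    if [ $(( mask & 0x02 )) -ne 0 ]; then out="$out driver"; fi
    if [ $(( mask & 0x04 )) -ne 0 ]; then out="$out kms"; fi
    if [ $(( mask & 0x08 )) -ne 0 ]; then out="$out prime"; fi
    # Strip the leading space left by the first appended name.
    echo "${out# }"
}

drm_debug_decode 0xf   # prints: core driver kms prime
drm_debug_decode 0xe   # prints: driver kms prime
```

The only category that differs is core, so it is the core-level messages that hide the freeze. That would be consistent with the extra logging changing timing, though nothing here establishes it.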
Comment 9 Martin Peres 2019-11-19 09:19:38 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/753.
