Bug 71759 - Intel driver fails with "intel_do_flush_locked failed: No such file or directory" if buffer imported with EGL_NATIVE_PIXMAP_KHR
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assignee: Martin Peres
QA Contact: Intel 3D Bugs Mailing List
 
Reported: 2013-11-18 20:24 UTC by Axel Davy
Modified: 2016-10-15 21:59 UTC (History)


Attachments
INTEL_DEBUG=batch totem somefile.mp4 (92.49 KB, text/plain)
2015-12-09 07:58 UTC, Jan Alexander Steffens (heftig)
dmesg with drm.debug=7 (DRI3) (222.28 KB, text/plain)
2016-02-27 17:45 UTC, Fabrice Bellet
command output (DRI3) (183.83 KB, text/plain)
2016-02-27 17:47 UTC, Fabrice Bellet
dmesg with drm.debug=7 (DRI2) (831.11 KB, text/plain)
2016-02-27 17:53 UTC, Fabrice Bellet
command output (DRI2) (1.96 MB, text/plain)
2016-02-27 17:54 UTC, Fabrice Bellet
intel: use the same bufmgr when opening the same device (1.41 KB, patch)
2016-04-12 14:03 UTC, Fabrice Bellet
i965: import prime buffers in the current context, not screen (2.68 KB, patch)
2016-08-03 10:15 UTC, Martin Peres
i965: import prime buffers in the current context, not screen (2.77 KB, patch)
2016-08-03 10:34 UTC, Martin Peres
Xorg log with the crash. (41.61 KB, text/plain)
2016-08-06 08:18 UTC, Ionut Biru
[WIP] dri3: import prime buffers in the currently-bound screen (5.50 KB, patch)
2016-10-04 10:13 UTC, Martin Peres

Description Axel Davy 2013-11-18 20:24:23 UTC
Hello,

Glamor enables DDX to create a pixmap from a bo by passing a GEM name.

This is useful for DDX, since they like to have access to the bo to support DRI2 and other features.

I'm using gbm_bo for the XWayland wlglamor DDX.
 
Since I would like wlglamor to work on a render node, and render nodes are not allowed to manipulate GEM names, I was trying to get Glamor to create the pixmap from a gbm_bo instead of a GEM name.

It is in this situation that I hit the bug described below.


When importing GEM names, what Glamor does is:
. use eglCreateImageKHR with the EGL_DRM_BUFFER_MESA parameter to get an EGLImage from the name
. create a texture from the image
. use the texture to render.

What I did is:
. use eglCreateImageKHR with EGL_NATIVE_PIXMAP_KHR to get an EGLImage from the gbm_bo
. same as before

The only changes are to get the EGLImage.

But with that, the DDX won't work, and will get "intel_do_flush_locked failed: No such file or directory".

I've not debugged enough to have the precise location where it fails, but I know it isn't in eglCreateImageKHR, which returns with a valid image.

I've gone through the code for eglCreateImageKHR with EGL_NATIVE_PIXMAP_KHR,
and I've found nothing incorrect.

My bet is that something that is set when importing a name (perhaps kernel side?), isn't set when we import a gbm_bo (eglCreateImageKHR just duplicates the descriptor of the image contained in the gbm_bo)

I've tested my code with a radeon card, and it worked, so the bug is intel specific.
Comment 1 Axel Davy 2013-12-02 19:05:02 UTC
When doing the same thing conceptually, but a different way,
I don't get problems anymore.

With the Glamor patches to support DRI3, when creating a texture, a gbm_bo is created and imported as EGLImage, and then converted to texture.

This is similar to the use case I described, which was hitting the bug, except that there the DDX was creating the gbm_bo, whereas here there are no issues.

If I hit the bug again, I'll give more details about it, 
but for now there is no need to fix it (since the glamor DRI3 helpers suppress the need for wlglamor to create textured pixmaps itself).
Comment 2 Igor Gnatenko 2014-07-11 23:05:11 UTC
Hi,

I'm getting
intel_do_flush_locked failed: No such file or directory
when trying to load a video. I now have the latest Mesa git and x11-drv-intel with DRI3 enabled.
Comment 3 trondah 2014-07-29 20:47:13 UTC
Same here, I get:

intel_do_flush_locked failed: No such file or directory

When trying to play movies with totem/snappy.

Intel HD5000
Comment 4 Keith Packard 2014-09-15 14:46:35 UTC
Do you have a small example which fails so that we can try to reproduce it here?
Comment 5 Fabrice Bellet 2014-10-29 21:51:28 UTC
On a thinkpad X220 (Intel HD Graphics 3000), running the latest packages of the upcoming Fedora 21, updates-testing repo enabled, with gstreamer1-vaapi installed, and libva-intel-driver from rpmfusion :

$ wget http://samples.mplayerhq.hu/V-codecs/h264/NeroAVC.mp4
[...]
$ gst-launch-1.0 filesrc location=NeroAVC.mp4 ! qtdemux ! h264parse ! vaapidecode ! videoconvert ! cluttersink
libva info: VA-API version 0.35.1
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib64/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_0_35
libva info: va_openDriver() returns 0
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Got context from element 'vaapidecode0': gst.vaapi.Display=context, display=(GstVaapiDisplay)NULL;
Pipeline is PREROLLED ...
intel_do_flush_locked failed: No such file or directory

-> stack:

Breakpoint 1, do_flush_locked (brw=<optimized out>) at intel_batchbuffer.c:282
282	      fprintf(stderr, "intel_do_flush_locked failed: %s\n", strerror(-ret));
(gdb) bt
#0  0x00007fffe2c846ba in _intel_batchbuffer_flush (brw=<optimized out>) at intel_batchbuffer.c:282
#1  0x00007fffe2c846ba in _intel_batchbuffer_flush (brw=0x1e98408, file=0x0, 
    file@entry=0x7fffe2e05ca0 "brw_context.c", line=9330936, line@entry=231) at intel_batchbuffer.c:330
#2  0x00007fffe2c84a8f in _intel_batchbuffer_flush (brw=brw@entry=0x1e98408, file=file@entry=0x7fffe2e05ca0 "brw_context.c", line=line@entry=231) at intel_batchbuffer.c:295
#3  0x00007fffe2ca9855 in intel_glFlush (ctx=0x1e98408) at brw_context.c:231
#4  0x00007fffe29f2778 in _mesa_make_current (newCtx=newCtx@entry=0x0, drawBuffer=drawBuffer@entry=0x0, readBuffer=readBuffer@entry=0x0) at ../../src/mesa/main/context.c:1629
#5  0x00007fffe2cab47f in intelUnbindContext (driContextPriv=<optimized out>) at brw_context.c:909
#6  0x00007fffe2c4da95 in driUnbindContext (pcp=0x1e423c0) at dri_util.c:579
#7  0x00007fffed4dd585 in MakeContextCurrent (dpy=0x98b810, draw=73400331, read=73400331, gc_user=0x628630) at glxcurrent.c:229
#8  0x00007fffed2bc5d0 in vaCopySurfaceGLX_impl_libva (ctx=0x1dd0d40, gl_surface=0x2e3aee0, surface=<optimized out>, flags=<optimized out>) at va_glx_impl.c:1060
#9  0x00007fffed75c52f in gst_vaapi_texture_put_surface () at /lib64/libgstvaapi-glx-1.4.so.0
#10 0x00007fffe8675f76 in clutter_gst_gl_texture_upload_upload (sink=0x8e60f8, buffer=0x7fffd80030b0)
    at ./clutter-gst-video-sink.c:1542
#11 0x00007fffe867787e in clutter_gst_source_dispatch (source=0x96d200, callback=0x0, user_data=0x8e60f8)
    at ./clutter-gst-video-sink.c:627
#12 0x00007ffff7385afb in g_main_context_dispatch (context=0x1d87bc0) at gmain.c:3111
#13 0x00007ffff7385afb in g_main_context_dispatch (context=context@entry=0x1d87bc0) at gmain.c:3710
#14 0x00007ffff7385e98 in g_main_context_iterate (context=0x1d87bc0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3781
#15 0x00007ffff73861c2 in g_main_loop_run (loop=0x1dd98e0) at gmain.c:3975
#16 0x00007ffff7b0773a in gst_bus_poll (bus=0x7635c0 [GstBus], events=27, timeout=0) at gstbus.c:1091
#17 0x00000000004046e8 in event_loop (pipeline=0x1dc2150 [GstPipeline], blocking=blocking@entry=0, do_progress=do_progress@entry=1, target_state=target_state@entry=GST_STATE_PLAYING) at gst-launch.c:512
#18 0x000000000040362b in main (argc=13, argv=0x7fffffffe008) at gst-launch.c:1062
(gdb) 

(same pipeline works fine on another Fedora-21 box, with an AMD card, and the vdpau vaapi driver.)
Comment 6 Fabrice Bellet 2014-10-30 15:03:58 UTC
I confirm this is related to DRI3: with the LIBGL_DRI3_DISABLE environment variable set, it works for me.
Comment 7 Jan Alexander Steffens (heftig) 2015-12-09 07:58:01 UTC
Created attachment 120428
INTEL_DEBUG=batch totem somefile.mp4

I'm now seeing this when attempting to play videos in Totem with gstreamer-vaapi installed and DRI3 enabled.

totem 3.18.1
gstreamer-vaapi 0.7.0
libva* 1.6.1
gnome-shell 3.18.3
mesa 11.0.6
Xorg 1.18.0
xf86-video-intel 2.99.917-515-gda9ad38
libdrm 2.4.65
linux 4.3
Comment 8 Jan Alexander Steffens (heftig) 2015-12-09 21:20:19 UTC
100% reproducible on two systems with HSW.

gst-launch-1.0 filesrc location=somefile.mp4 ! qtdemux ! vaapidecode ! glupload ! fakesink

seems to be the minimal pipeline needed to trigger the crash with a h.264 MP4 file.

Alternatively,

gst-launch-1.0 videotestsrc ! x264enc ! vaapidecode ! glupload ! fakesink

and

gst-launch-1.0 videotestsrc ! vaapiencode_h264 ! vaapidecode ! glupload ! fakesink

also crash.
Comment 9 Fabrice Bellet 2016-02-27 17:45:13 UTC
The bug is still there on Fedora 23, with the VA-API Intel driver. It is triggered by this GStreamer pipeline: gst-launch-1.0 filesrc location=~/Downloads/NeroAVC.mp4 ! decodebin ! cluttersink

with NeroAVC.mp4 taken from http://samples.mplayerhq.hu/V-codecs/h264/

It happens when DRI3 is enabled. Setting LIBGL_DRI3_DISABLE=1 is a possible workaround, as the code path in the Intel driver is different in the DRI2 case; I provide logs for that case as well, in case they are needed.
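
As a rough illustration of applying this workaround from a launcher program rather than from the shell, the sketch below sets the variable in the current process so that any child started afterwards inherits it. The helper name toy_disable_dri3 is hypothetical, and the sketch assumes the variable is set before the GL library is loaded.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* setenv() is POSIX, not ISO C */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: export LIBGL_DRI3_DISABLE=1 for this process and
 * for children spawned later. Returns 0 on success, -1 on failure. */
static int toy_disable_dri3(void)
{
    int ret = setenv("LIBGL_DRI3_DISABLE", "1", 1 /* overwrite */);

    /* Sanity check for the sketch: on success the value reads back. */
    assert(ret != 0 || strcmp(getenv("LIBGL_DRI3_DISABLE"), "1") == 0);
    return ret;
}
```

From a shell, prefixing the command with LIBGL_DRI3_DISABLE=1 has the same effect for that one invocation.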

Here is the output from the pipeline obtained with CLUTTER_DEBUG, COGL_DEBUG, GST_DEBUG set, with INTEL_DEBUG=bat,tex,dri, and with latest mesa from git master.

The problem is caused by drmIoctl() returning -2 (-ENOENT) for the I915_GEM_EXECBUFFER2 command. I also provide a related log obtained with drm.debug=7.
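
The misleading error text follows from the return value: the backtrace in comment 5 shows the driver printing strerror(-ret), and errno 2 is ENOENT, whose standard message is "No such file or directory" even though no file is involved. A minimal sketch of that mapping, where format_flush_error is a hypothetical helper and not a Mesa function:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper that formats a negative errno-style return value
 * the same way as the fprintf() visible in the comment 5 backtrace. */
static int format_flush_error(char *buf, size_t len, int ret)
{
    assert(ret <= 0); /* the driver passes a negated errno value */
    return snprintf(buf, len, "intel_do_flush_locked failed: %s",
                    strerror(-ret));
}
```

Calling it with -ENOENT reproduces the message from the bug title.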

Hope this helps,
Comment 10 Fabrice Bellet 2016-02-27 17:45:55 UTC
Created attachment 122003
dmesg with drm.debug=7 (DRI3)
Comment 11 Fabrice Bellet 2016-02-27 17:47:25 UTC
Created attachment 122004
command output (DRI3)
Comment 12 Fabrice Bellet 2016-02-27 17:53:37 UTC
Created attachment 122005
dmesg with drm.debug=7 (DRI2)
Comment 13 Fabrice Bellet 2016-02-27 17:54:12 UTC
Created attachment 122006
command output (DRI2)
Comment 14 Fabrice Bellet 2016-02-27 20:57:18 UTC
The -ENOENT value is returned from i915_gem_execbuffer_relocate_entry() :

Feb 27 21:53:16 bonobo.bellet.info kernel: Call Trace:
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffff813b0c9f>] dump_stack+0x44/0x55
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffffa023bee4>] i915_gem_execbuffer_relocate_entry+0xbb/0x639 [i915]
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffff813da2fd>] ? swiotlb_map_sg_attrs+0x6d/0x130
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffffa023c4f8>] i915_gem_execbuffer_relocate_vma.isra.23+0x96/0xee [i915]
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffffa01bc60a>] ? i915_gem_object_pin+0x3a/0x40 [i915]
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffffa01ab8c1>] ? i915_gem_execbuffer_reserve_vma.isra.18+0x91/0x150 [i915]
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffffa01abc9a>] ? i915_gem_execbuffer_reserve.isra.19+0x31a/0x360 [i915]
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffffa01aca9a>] i915_gem_do_execbuffer.isra.25+0xa5a/0x1310 [i915]
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffff810f96d9>] ? vprintk_default+0x29/0x40
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffff811a99d2>] ? printk+0x57/0x73
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffffa01adf82>] i915_gem_execbuffer2+0xb2/0x240 [i915]
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffffa0031602>] drm_ioctl+0x152/0x540 [drm]
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffffa01aded0>] ? i915_gem_execbuffer+0x310/0x310 [i915]
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffff8133baac>] ? selinux_file_ioctl+0x10c/0x1c0
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffff8123e7f8>] do_vfs_ioctl+0x298/0x480
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffff81146d2b>] ? __audit_syscall_entry+0xab/0xf0
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffff81333323>] ? security_file_ioctl+0x43/0x60
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffff8123ea59>] SyS_ioctl+0x79/0x90
Feb 27 21:53:16 bonobo.bellet.info kernel:  [<ffffffff8179996e>] entry_SYSCALL_64_fastpath+0x12/0x71
Comment 15 Fabrice Bellet 2016-02-28 11:41:12 UTC
The missed relocation comes from there in userspace:

(gdb) bt
#0  0x00007fffe382e759 in do_bo_emit_reloc (bo=0x555555bfd570, offset=32356, target_bo=0x555555bfeaf0, target_offset=0, read_domains=4, write_domain=0, need_fence=false) at intel_bufmgr_gem.c:1968
#1  0x00007fffe382ea31 in drm_intel_gem_bo_emit_reloc (bo=0x555555bfd570, offset=<optimized out>, target_bo=0x555555bfeaf0, target_offset=<optimized out>, read_domains=<optimized out>, write_domain=<optimized out>) at intel_bufmgr_gem.c:2066
#2  0x00007fffe3829f65 in drm_intel_bo_emit_reloc (bo=<optimized out>, offset=<optimized out>, target_bo=<optimized out>, target_offset=<optimized out>, read_domains=<optimized out>, write_domain=<optimized out>) at intel_bufmgr.c:205
#3  0x00007fffe414923d in brw_update_texture_surface (ctx=0x555555b96b98, unit=0, surf_offset=0x555555bbc440, for_gather=false)	at brw_wm_surface_state.c:388
#4  0x00007fffe4149db1 in update_stage_texture_surfaces (brw=0x555555b96b98, prog=0x555555f39850, stage_state=0x555555bbc410, for_gather=false) at brw_wm_surface_state.c:849
#5  0x00007fffe4149ebd in brw_update_texture_surfaces (brw=0x555555b96b98) at brw_wm_surface_state.c:880
#6  0x00007fffe413fd7f in check_and_emit_atom (brw=0x555555b96b98, state=0x7fffffffd030, atom=0x555555bbcb90) at brw_state_upload.c:771
#7  0x00007fffe414021d in brw_upload_pipeline_state (brw=0x555555b96b98, pipeline=BRW_RENDER_PIPELINE) at brw_state_upload.c:865
#8  0x00007fffe414038d in brw_upload_render_state (brw=0x555555b96b98) at brw_state_upload.c:904
#9  0x00007fffe4120f61 in brw_try_draw_prims (ctx=0x555555b96b98, arrays=0x555555bf4b90, prims=0x555555bf2d78, nr_prims=1, ib=0x0, min_index=0,	max_index=3, indirect=0x0) at brw_draw.c:560
#10 0x00007fffe4121384 in brw_draw_prims (ctx=0x555555b96b98, prims=0x555555bf2d78, nr_prims=1, ib=0x0, index_bounds_valid=1 '\001', min_index=0, max_index=3, unused_tfb_object=0x0, stream=0, indirect=0x0) at brw_draw.c:650
#11 0x00007fffe3ece0f5 in vbo_exec_vtx_flush (exec=0x555555bf2598, keepUnmapped=1 '\001') at vbo/vbo_exec_draw.c:422
#12 0x00007fffe3ec6c6a in vbo_exec_FlushVertices_internal (exec=0x555555bf2598, unmap=1 '\001') at vbo/vbo_exec_api.c:624
#13 0x00007fffe3ec86d4 in vbo_exec_FlushVertices (ctx=0x555555b96b98, flags=1) at vbo/vbo_exec_api.c:1261
#14 0x00007fffe3d3ea21 in enable_texture (ctx=0x555555b96b98, state=0 '\000', texBit=1024) at main/enable.c:228
#15 0x00007fffe3d407a7 in _mesa_set_enable (ctx=0x555555b96b98,	cap=3553, state=0 '\000') at main/enable.c:683
#16 0x00007fffe3d4198b in _mesa_Disable (cap=3553) at main/enable.c:1048
#17 0x00007fffc470f326 in gl_unbind_texture (ts=0x555555e632f0)	at gstvaapiutils_glx.c:569
#18 0x00007fffc4710075 in gl_unbind_pixmap_object (pixo=0x555555e632e0) at gstvaapiutils_glx.c:990
#19 0x00007fffc470d98e in gst_vaapi_texture_glx_put_surface_unlocked (base_texture=0x5555559cf850, surface=0x7fffcc067450, crop_rect=0x7fffffffd420, flags=0) at gstvaapitexture_glx.c:391
#20 0x00007fffc470da85 in gst_vaapi_texture_glx_put_surface (texture=0x5555559cf850, surface=0x7fffcc067450, crop_rect=0x7fffffffd420, flags=0)	at gstvaapitexture_glx.c:413
#21 0x00007fffc4f6206d in gst_vaapi_texture_put_surface	(texture=0x5555559cf850, surface=0x7fffcc067450, crop_rect=0x7fffffffd420, flags=0) at gstvaapitexture.c:373
#22 0x00007fffc5226179 in gst_vaapi_texture_upload (meta=0x7fffbc006978, texture_id=0x7fffffffd4d0) at gstvaapivideometa_texture.c:200
#23 0x00007fffee3f74af in clutter_gst_gl_texture_upload_upload (sink=0x555555b7f8b0 [ClutterGstVideoSink], buffer=0x7fffd8047460) at ./clutter-gst-video-sink.c:1542
#24 0x00007fffee3f5d76 in clutter_gst_source_dispatch (source=0x555555b81f70, callback=0x0, user_data=0x0) at ./clutter-gst-video-sink.c:627
#25 0x00007ffff7376e3a in g_main_context_dispatch (context=0x5555559bf880) at gmain.c:3154
#26 0x00007ffff7376e3a in g_main_context_dispatch (context=context@entry=0x5555559bf880) at gmain.c:3769
#27 0x00007ffff73771d0 in g_main_context_iterate (context=0x5555559bf880, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3840
#28 0x00007ffff73774f2 in g_main_loop_run (loop=0x555555b84470) at gmain.c:4034
#29 0x00007ffff7afbf79 in gst_bus_poll (bus=0x55555578b3e0 [GstBus], events=<optimized out>, timeout=18446744073709551615) at gstbus.c:1153
#30 0x00005555555588b8 in event_loop (pipeline=0x555555b821f0 [GstPipeline], blocking=1, do_progress=1,	target_state=GST_STATE_PAUSED) at gst-launch.c:532
#31 0x0000555555557812 in main (argc=7, argv=0x7fffffffdb78) at gst-launch.c:1072

(gdb) print ((drm_intel_bo_gem *)brw->batch.bo)->name
$50 = 0x7fffe434165c "batchbuffer"
(gdb) print ((drm_intel_bo_gem *)brw->batch.bo)->gem_handle
$51 = 22
(gdb) print ((drm_intel_bo_gem *)mt->bo)->name 
$52 = 0x7fffe383ca93 "prime"
(gdb) print ((drm_intel_bo_gem *)mt->bo)->gem_handle
$53 = 1
Comment 16 Fabrice Bellet 2016-03-01 10:26:41 UTC
More debugging hints: two GLX contexts are created (one by Cogl, and another by gstreamer-vaapi), and two different bufmgrs are created. The failing relocation targets a buffer that belongs to the bufmgr of the other context. When intel_update_image_buffers() calls loader_dri3_get_buffers() -> dri3_get_pixmap_buffer() -> loader_dri3_create_image() -> intel_create_image_from_fds() -> drm_intel_bo_gem_create_from_prime(), the image->bo->bufmgr created there is different from brw->bufmgr.
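
To make the failure mode concrete, here is a deliberately simplified toy model, not libdrm or kernel code. It assumes each buffer manager keeps a private table of handles and that a relocation can only resolve handles known to the manager submitting the batch. All toy_* names are hypothetical; the handle values 22 and 1 are the ones printed in comment 15.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_HANDLES 8

/* Toy stand-in for a buffer manager: a private table of GEM-like
 * handles. This is not the libdrm data structure. */
struct toy_bufmgr {
    int handles[TOY_MAX_HANDLES];
    size_t count;
};

/* Register a handle with one manager (toy equivalent of an import). */
static int toy_import(struct toy_bufmgr *mgr, int handle)
{
    assert(mgr != NULL);
    if (mgr->count == TOY_MAX_HANDLES)
        return -ENOMEM;
    mgr->handles[mgr->count++] = handle;
    return 0;
}

/* Resolve a relocation target: succeeds only if the handle is known to
 * the manager that submits the batch, otherwise returns -ENOENT. */
static int toy_relocate(const struct toy_bufmgr *submitter, int handle)
{
    assert(submitter != NULL);
    for (size_t i = 0; i < submitter->count; i++)
        if (submitter->handles[i] == handle)
            return 0;
    return -ENOENT;
}
```

In this model, a "prime" buffer imported through one manager and referenced from a batch built by another yields -ENOENT, which matches the symptom seen here.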
Comment 17 Fabrice Bellet 2016-04-12 14:03:01 UTC
Created attachment 122880
intel: use the same bufmgr when opening the same device

This is certainly an ugly fix but it works for me: using the same bufmgr when opening the same device in drm.
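
The general idea of handing out one shared manager per device can be sketched as a small reference-counted cache. This is an illustration only and not the attached patch: the key device_id is an assumed stand-in for whatever identifies the opened device, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 4

/* Toy buffer manager record; device_id stands in for whatever
 * identifies the opened device node (an assumption of this sketch). */
struct toy_cached_bufmgr {
    unsigned long device_id;
    int refcount;
};

static struct toy_cached_bufmgr toy_cache[TOY_CACHE_SLOTS];

/* Return the manager already created for this device, or claim a free
 * slot for a new one. Returns NULL when the cache is full. */
static struct toy_cached_bufmgr *toy_bufmgr_get(unsigned long device_id)
{
    struct toy_cached_bufmgr *free_slot = NULL;

    for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
        struct toy_cached_bufmgr *slot = &toy_cache[i];

        if (slot->refcount > 0 && slot->device_id == device_id) {
            slot->refcount++;
            return slot;
        }
        if (slot->refcount == 0 && free_slot == NULL)
            free_slot = slot;
    }
    if (free_slot != NULL) {
        free_slot->device_id = device_id;
        free_slot->refcount = 1;
    }
    return free_slot;
}

/* Drop one reference; the slot becomes reusable when it reaches zero. */
static void toy_bufmgr_put(struct toy_cached_bufmgr *mgr)
{
    assert(mgr != NULL && mgr->refcount > 0);
    mgr->refcount--;
}
```

Two opens of the same device then receive the same manager, while a different device identifier gets its own.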
Comment 18 Pacho Ramos 2016-05-28 09:04:25 UTC
I am still suffering from this bug, which causes Totem to crash, but Fabrice's patch for libdrm fixed it. Thanks! :)

Could it be included please?
Comment 19 Pacho Ramos 2016-06-23 14:24:16 UTC
What is the problem with the patch causing this issue to be still unresolved? :|
Comment 20 Iiro Laiho 2016-07-10 20:26:07 UTC
@Ian Romanick,

You have been marked as "assignee" of this bug. Are you still developing for freedesktop.org? Fabrice seems to have done quite a bit of work troubleshooting this and writing the patch. Could you answer whether the patch can be accepted or whether there is something wrong with it?

I bumped up the importance of this bug a notch, since it now seems to cause actual playback problems with H.264 videos in Totem on Fedora:

https://bugzilla.redhat.com/show_bug.cgi?id=1309446
Comment 21 Martin Peres 2016-07-13 17:19:54 UTC
So, I finally took the time to go through the stack today and inspect everything. Thanks to Fabrice for your analysis and initial patch!

So, the goal of my analysis was to check who was opening the different fds, who created different contexts and where buffers would be. Turns out that while mesa gets its fd most of the time from the XServer (except when using PRIME), vaapi opens its own fd (dri2_util.c:198) based on the device name returned by DRI2 (yes, VAAPI does not support DRI3, which is cause for concern for users of the modesetting driver).

Cogl and vaapi then create a ton of contexts (one per texture :o). When the time comes to import into Cogl the frame rendered by vaapi, PRIME is used by Cogl because it got a DRI3 context. The context that creates the texture is the one created for the texture (which has its own bufmgr, because the fds did not match), but when the import is done by Mesa, it is done with the screen's bufmgr ... which is of course not the same as the one from the texture's context.

Now, here are the million-dollar questions:
 - Why is intel_create_image_from_fds using the screen's bufmgr instead of the current context's?
 - If there is no way around this issue, is this why there is code in libdrm to give away the same bufmgr when the fd is the same?

If the answer to the second question is YES, I can see why it would work when dealing with Mesa only (since the fd is received from the X server). However, this is unsatisfactory for libva, which does not use Mesa's code for DRI2 but instead opens its own fd. Since we have to make sure that GL textures are shared, we need to make sure that the same bufmgr is given to all the contexts for the same GPU. Fabrice's solution is in this regard not complete because it assumes there is only one node exposed per GPU ... which has not been true since render nodes were introduced. On the modesetting driver, card0 is always picked (for DRI2 and DRI3), which means that Fabrice's solution would work, but only on modesetting. On xf86-video-intel, renderD128 is returned for DRI3 instead of card0, so the inode would differ. I will try to fix this tomorrow by using the new functions in libdrm to find the node type we want. In any case, this will have a severe performance impact on context creation time, so I will be sure to actually benchmark this!
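
The card0 versus renderD128 distinction can be illustrated with DRM minor numbers. The sketch below assumes the conventional layout in which primary (cardN) nodes use minors 0-63, control nodes 64-127 and render (renderDN) nodes 128-191. The toy_* names are hypothetical and this is not the libdrm API.

```c
#include <assert.h>

/* Node classes by DRM minor number, under the layout assumed above. */
enum toy_node_type {
    TOY_NODE_INVALID = -1,
    TOY_NODE_PRIMARY = 0,
    TOY_NODE_CONTROL = 1,
    TOY_NODE_RENDER = 2
};

/* Classify a minor number: each class spans 64 consecutive minors. */
static enum toy_node_type toy_node_type_from_minor(int minor)
{
    if (minor < 0 || minor > 191)
        return TOY_NODE_INVALID;
    return (enum toy_node_type)(minor >> 6);
}

/* Index of a render node within its class, e.g. 0 for renderD128. */
static int toy_render_index(int minor)
{
    assert(toy_node_type_from_minor(minor) == TOY_NODE_RENDER);
    return minor - 128;
}
```

Because card0 (minor 0) and renderD128 (minor 128) are distinct nodes of different classes, any sharing scheme keyed on the node itself sees two devices even when both refer to the same GPU.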
Comment 22 Kenneth Graunke 2016-07-13 19:18:45 UTC
(In reply to Martin Peres from comment #21)
> Now, here are the million-dollar questions:
>  - Why is intel_create_image_from_fds using the screen's bufmgr instead
> of the current context's?

They're the same.  In brwCreateContext,

   brw->bufmgr = screen->bufmgr;
Comment 23 Martin Peres 2016-07-13 19:35:22 UTC
(In reply to Kenneth Graunke from comment #22)
> (In reply to Martin Peres from comment #21)
> > Now, here are the million-dollar questions:
> >  - Why is intel_create_image_from_fds using the screen's bufmgr instead
> > of the current context's?
> 
> They're the same.  In brwCreateContext,
> 
>    brw->bufmgr = screen->bufmgr;


Thanks Kenneth! I will resume this work tomorrow, it was getting a little late :)
Comment 24 Martin Peres 2016-07-14 15:20:09 UTC
Didn't get more than half an hour today on this, but Kenneth was right. I made the assumption that there would be only one screen created for all the GL contexts but it turns out that's where I was wrong. Some more tracing showed that the bufmgr allocation was indeed done only in CreateScreen2.

So, we have got 3 different bufmgrs: one for Totem's Clutter GL context, one for the VAAPI Intel driver and one for clutter-gst. The relocation issue happens when sharing from the clutter-gst context to Totem's Clutter context. This means that the current patch would be sort of OK, assuming that both Mesa instances would pick up the same DRI version (it would only have been a problem for the Intel DDX).

I will cook up a patch to make sure that we share the same bufmgr for both the render node and the normal node, otherwise we are in violation of the GL spec AFAIK. It is a bit nasty though to try to share buffers between two OpenGL contexts created from two different X connections :s
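
The relationship between screens, contexts and imported images described here can be captured in a few lines. This is a toy model under the assumptions stated in comments 22 and 24 (one manager per screen, contexts borrowing their screen's manager); the toy_* types are hypothetical and are not Mesa structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mgr { int id; };

/* One manager per screen, allocated at screen creation. */
struct toy_screen { struct toy_mgr mgr; };

/* A context borrows its screen's manager. */
struct toy_context { struct toy_mgr *bufmgr; };

/* An imported image remembers the manager that imported it. */
struct toy_image { struct toy_mgr *bufmgr; };

static struct toy_context toy_create_context(struct toy_screen *screen)
{
    assert(screen != NULL);
    struct toy_context ctx = { .bufmgr = &screen->mgr };
    return ctx;
}

static struct toy_image toy_import_image(struct toy_screen *screen)
{
    assert(screen != NULL);
    struct toy_image img = { .bufmgr = &screen->mgr };
    return img;
}

/* The condition that fails in this bug: the image and the context that
 * samples from it must agree on the manager. */
static bool toy_image_matches_context(const struct toy_image *img,
                                      const struct toy_context *ctx)
{
    return img->bufmgr == ctx->bufmgr;
}
```

Contexts created from the same screen always agree with each other, but an image imported through a second screen does not match them, which is the mismatch reported in comment 16.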
Comment 25 Behrang Saeedzadeh 2016-07-19 12:41:37 UTC
For some reason this bug crashes Totem. I think Totem should have instead reported an error (e.g. Cannot playback the current movie due to an internal error).

Also this error happens on openSUSE Tumbleweed, Gnome + X (no Wayland) if certain Intel packages are installed (can't remember for sure).
Comment 26 Iiro Laiho 2016-07-19 17:19:05 UTC
Behrang Saeedzadeh,

Yes, it would definitely be preferable to fail in a more graceful way. Preferably it would pass the error messages from libraries to users. You could consider making a bug report for GNOME project against Totem.

Hopefully Martin will be able to submit a patch for this.
Comment 27 Timo Gurr 2016-07-21 15:43:58 UTC
I don't want to hijack this thread, but I think I ran into the same error, just not with Totem but with obs-studio version 0.15.2 (https://obsproject.com/). It's reproducible for me when, right after starting the application, I click on:

Sources -> Add -> Window Capture (Xcomposite) -> Create new -> click OK.

The application crashes and I can see the following lines on the command line:

info: source 'Fensteraufnahme (Xcomposite)' (xcomposite_input) created
intel_do_flush_locked failed: No such file or directory
QObject::~QObject: Timers cannot be stopped from another thread
Comment 28 Timo Gurr 2016-07-21 16:45:50 UTC
As a quick test I've applied the patch provided in Comment 17 (https://bugs.freedesktop.org/show_bug.cgi?id=71759#c17) to libdrm-2.4.69 and the function in obs-studio mentioned above doesn't crash the application anymore and seems to work for me now.
Comment 29 Martin Peres 2016-07-21 20:47:52 UTC
(In reply to Timo Gurr from comment #28)
> As a quick test I've applied the patch provided in Comment 17
> (https://bugs.freedesktop.org/show_bug.cgi?id=71759#c17) to libdrm-2.4.69
> and the function in obs-studio mentioned above doesn't crash the application
> anymore and seems to work for me now.

Good to know it affects more cases! Can you make an apitrace? That would be really helpful.
Comment 30 Timo Gurr 2016-07-22 11:18:14 UTC
(In reply to Martin Peres from comment #29)
> Good to know it affects more cases! Can you make an apitrace? That would be
> really helpful.

I have never done that before; can you please point me to an easily understandable guide? Also feel free to email me directly in this regard to avoid spamming this thread with unrelated content. Thanks!
Comment 31 Martin Peres 2016-07-22 11:29:52 UTC
(In reply to Timo Gurr from comment #30)
> (In reply to Martin Peres from comment #29)
> > Good to know it affects more cases! Can you make an apitrace? That would be
> > really helpful.
> 
> I never did that before, can you please point me to an easily understandable
> guide? Also feel free to email me directly in this regard to avoid spamming
> this thread with unrelated content. Thanks!

Oh, then no worries, I just did it myself :)

That looks like another clean open source example. I will try to trace it next week and should hopefully be able to narrow it down and propose a fix. I have been a bit busy these past two days.
Comment 32 Martin Peres 2016-08-03 10:15:45 UTC
Created attachment 125507
i965: import prime buffers in the current context, not screen

Here is a new patch that should be way less hacky. More explanation about the issue is found as a comment in the code.

This patch fixes all the applications mentioned in this bug. I will use the patch a little on my machine before sending it to mesa-dev. Please do so too :)
Comment 33 Martin Peres 2016-08-03 10:34:28 UTC
Created attachment 125508
i965: import prime buffers in the current context, not screen

v2 with a better comment.
Comment 34 Fabrice Bellet 2016-08-03 19:05:35 UTC
The patch works for me (Fedora 23, Mesa 11.1.0). Thanks a lot!
Comment 35 Ionut Biru 2016-08-06 08:17:18 UTC
(In reply to Martin Peres from comment #33)
> Created attachment 125508
> i965: import prime buffers in the current context, not screen
> 
> v2 with a better comment.


For me this patch crashes X. I do not use xf86-video-intel, only modesetting.
My setup is simple: GNOME, with GDM starting in Wayland mode. When I try to log in with my username to an X session, it crashes.

If I install xf86-video-intel, X starts working fine.
Comment 36 Ionut Biru 2016-08-06 08:18:07 UTC
Created attachment 125574
Xorg log with the crash.
Comment 37 Martin Peres 2016-08-06 13:54:59 UTC
(In reply to Ionut Biru from comment #35)
> (In reply to Martin Peres from comment #33)
> > Created attachment 125508
> > i965: import prime buffers in the current context, not screen
> > 
> > v2 with a better comment.
> 
> 
> For me this patch crashes X, I do not use xf86-video-intel, only
> modesetting.
> My setup is simple, Gnome, GDM starts in wayland mode but trying to login my
> username, with an X session, it crashes. 
> 
> If I install xf86-video-intel, X starts working fine.

Hmm, thanks for testing, Ionut! I will take a look at this on Monday.
Comment 38 Martin Peres 2016-08-08 10:10:57 UTC
(In reply to Martin Peres from comment #37)
> (In reply to Ionut Biru from comment #35)
> > (In reply to Martin Peres from comment #33)
> > > Created attachment 125508
> > > i965: import prime buffers in the current context, not screen
> > > 
> > > v2 with a better comment.
> > 
> > 
> > For me this patch crashes X, I do not use xf86-video-intel, only
> > modesetting.
> > My setup is simple, Gnome, GDM starts in wayland mode but trying to login my
> > username, with an X session, it crashes. 
> > 
> > If I install xf86-video-intel, X starts working fine.
> 
> Hmm, thanks for testing, Ionut! I will take a look at this on Monday.

I honestly tried my best to reproduce the issue. I used the Arch-provided X-server along with 12.0.1-6, as found in archive.archlinux.org, and could not reproduce.

The bug should be gpu-independent, but what GPU did you use?
Comment 39 Ionut Biru 2016-08-12 08:54:29 UTC
It's the one bundled on the CPU; I have an Intel(R) Core(TM) i5-3450.
lspci returns:
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
Comment 40 Iiro Laiho 2016-08-15 15:49:34 UTC
Shouldn't the status be something else than NEW?
Comment 41 Kenneth Graunke 2016-08-17 22:44:25 UTC
(In reply to Ionut Biru from comment #35)
> (In reply to Martin Peres from comment #33)
> > Created attachment 125508
> > i965: import prime buffers in the current context, not screen
> > 
> > v2 with a better comment.
> 
> 
> For me this patch crashes X, I do not use xf86-video-intel, only
> modesetting.
> My setup is simple, Gnome, GDM starts in wayland mode but trying to login my
> username, with an X session, it crashes. 
> 
> If I install xf86-video-intel, X starts working fine.

I can't reproduce this.  I built Mesa master with Martin's patch applied, replaced my system i965_dri.so with the new one, ran "sudo systemctl start gdm", verified that it was indeed running GDM on Wayland, and started a normal GNOME (X-based) session.  It worked fine.  I verified that it was using modesetting (from Arch Linux's xorg-server 1.18.4 package).
Comment 42 Luke McKee 2016-09-05 20:27:14 UTC
As for the discussion about this defect not existing or being hard to replicate...

I'm a Gentoo user. The Mesa 12.0.1 ebuild has this defect.
"intel_do_flush_locked failed: No such file or directory", and system call trace showed some kind of locking error before totem borked.

To resolve this issue I used the "i965: import prime buffers in the current context, not screen" patch from Intel (https://bugs.freedesktop.org/attachment.cgi?id=125508), and now Totem works using VA-API acceleration.

Only mesa was rebuilt with the patch to resolve this issue.

This is the hardware I am running.

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 60
model name	: Intel(R) Core(TM) i5-4460  CPU @ 3.20GHz
stepping	: 3
microcode	: 0x19

00:02.0 0300: 8086:0412 (rev 06) (prog-if 00 [VGA controller])
	Subsystem: 1462:7850
	Flags: bus master, fast devsel, latency 0, IRQ 24
	Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	I/O ports at f000 [size=64]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [d0] Power Management version 2
	Capabilities: [a4] PCI Advanced Features
	Kernel driver in use: i915

If you need any more information on my setup to assist in replicating the defect let me know.
Comment 43 Stefan Dirsch 2016-09-19 10:06:57 UTC
Ionut, any chance to debug this issue via gdb on your machine?
Comment 44 Martin Peres 2016-09-19 10:10:44 UTC
(In reply to Stefan Dirsch from comment #43)
> Ionut, any chance to debug this issue via gdb on your machine?

I will have a new version of the patch next week, when XDC is done.

If it still is problematic for Ionut, then we can ask for more testing.

Sorry everyone for the delay, I was on vacation and now I am knee-deep in the organisation of XDC.
Comment 45 Ionut Biru 2016-09-21 11:29:19 UTC
(In reply to Martin Peres from comment #44)
> (In reply to Stefan Dirsch from comment #43)
> > Ionut, any chance to debug this issue via gdb on your machine?
> 
> I will have a new version of the patch next week, when XDC is done.
> 

Alright. Looking forward to a new version.
Comment 46 Martin Peres 2016-10-04 10:13:14 UTC
Created attachment 126988
[WIP] dri3: import prime buffers in the currently-bound screen

Here is the new version of the patch which takes into account Chad and Kristian's feedback.

Ionut, could you please try it? It fixes the totem and obs case for me.
Comment 47 Timo Aaltonen 2016-10-05 14:28:23 UTC
fwiw, the patch works here too
Comment 48 Ionut Biru 2016-10-06 12:30:30 UTC
(In reply to Martin Peres from comment #46)
> Created attachment 126988
> [WIP] dri3: import prime buffers in the currently-bound screen
> 
> Here is the new version of the patch which takes into account Chad and
> Kristian's feedback.
> 
> Ionut, could you please try it? It fixes the totem and obs case for me.

Seems to work from what I can see. It doesn't crash X and totem and smplayer+mpv still work. I don't know what else I should test.
Comment 49 Martin Peres 2016-10-08 07:04:28 UTC
(In reply to Ionut Biru from comment #48)
> (In reply to Martin Peres from comment #46)
> > Created attachment 126988
> > [WIP] dri3: import prime buffers in the currently-bound screen
> > 
> > Here is the new version of the patch which takes into account Chad and
> > Kristian's feedback.
> > 
> > Ionut, could you please try it? It fixes the totem and obs case for me.
> 
> Seems to work from what I can see. It doesn't crash X and totem and
> smplayer+mpv still work. I don't know what else I should test.

Very good. I reworked the patches and landed them in mesa. Thanks a lot everyone, it took almost three years to fix this bug :s
Comment 50 Pacho Ramos 2016-10-08 07:12:36 UTC
Thanks a lot :D

Will it be included in 12.0.4, or will we need to wait for the next major mesa version?

Best regards!
Comment 51 Martin Peres 2016-10-08 07:23:19 UTC
(In reply to Pacho Ramos from comment #50)
> Thanks a lot :D
> 
> Will it be included in 12.0.4, or will we need to wait for the next major
> mesa version?
> 
> Best regards!

I got the patchset reviewed by the release maintainer and he told me to CC: stable. So it should go to 12.0.4.
Comment 52 Damian Dixon 2016-10-12 08:12:42 UTC
(In reply to Behrang Saeedzadeh from comment #25)
> For some reason this bug crashes Totem. I think Totem should have instead
> reported an error (e.g. Cannot playback the current movie due to an internal
> error).
> 
> Also this error happens on openSUSE Tumbleweed, Gnome + X (no Wayland) if
> certain Intel packages are installed (can't remember for sure).

However, this is valid if the two contexts are in different threads, as the safest thread-safe way of using Xlib in multiple threads is to use multiple X11 Display connections along with XInitThreads.
Comment 53 Damian Dixon 2016-10-12 08:18:29 UTC
(In reply to Damian Dixon from comment #52)
> (In reply to Behrang Saeedzadeh from comment #25)
> > For some reason this bug crashes Totem. I think Totem should have instead
> > reported an error (e.g. Cannot playback the current movie due to an internal
> > error).
> > 
> > Also this error happens on openSUSE Tumbleweed, Gnome + X (no Wayland) if
> > certain Intel packages are installed (can't remember for sure).
> 
> However, this is valid if the two contexts are in different threads, as the
> safest thread-safe way of using Xlib in multiple threads is to use multiple
> X11 Display connections along with XInitThreads.

Sorry, the above comment was actually in reply to:


> I will cook up a patch to make sure that we share the same bufmgr for both the
> render node and the normal node, otherwise we are in violation of the GL spec
> AFAIK. This is a bit nasty though to try to share buffers between two opengl
> context created from two different X connections :s
Comment 54 Martin Peres 2016-10-12 08:24:54 UTC
(In reply to Damian Dixon from comment #53)
> (In reply to Damian Dixon from comment #52)
> > (In reply to Behrang Saeedzadeh from comment #25)
> > > For some reason this bug crashes Totem. I think Totem should have instead
> > > reported an error (e.g. Cannot playback the current movie due to an internal
> > > error).
> > > 
> > > Also this error happens on openSUSE Tumbleweed, Gnome + X (no Wayland) if
> > > certain Intel packages are installed (can't remember for sure).
> > 
> > However, this is valid if the two contexts are in different threads, as the
> > safest thread-safe way of using Xlib in multiple threads is to use multiple
> > X11 Display connections along with XInitThreads.
> 
> Sorry, the above comment was actually in reply to:
> 
> 
> > I will cook up a patch to make sure that we share the same bufmgr for both the
> > render node and the normal node, otherwise we are in violation of the GL spec
> > AFAIK. This is a bit nasty though to try to share buffers between two opengl
> > context created from two different X connections :s

Yeah, this is not what the patch I landed does. We are actually importing the buffer in the screen that the application set as current.
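
To illustrate why the screen used for the import matters, here is a toy model in C. It is not Mesa code: `toy_screen`, `toy_import` and `toy_flush` are hypothetical names, and the model rests on the assumption that a buffer handle is only meaningful to the screen (buffer manager) that issued it. Under that assumption, flushing work that references a handle issued by a different screen fails with ENOENT, whose message is "No such file or directory", while importing into the screen that is actually current works.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_HANDLES 8

/* Hypothetical stand-in for a per-screen buffer manager.
 * Slot i holds the global id of the buffer behind local handle i + 1;
 * 0 marks a free slot. */
struct toy_screen {
    int buffers[TOY_MAX_HANDLES];
};

/* Import a shared buffer (identified by a non-zero global id) into one
 * screen. Returns a local handle >= 1, or -ENOSPC if the table is full. */
int toy_import(struct toy_screen *s, int global_id)
{
    assert(s != NULL && global_id != 0);
    for (int i = 0; i < TOY_MAX_HANDLES; i++) {
        if (s->buffers[i] == 0) {
            s->buffers[i] = global_id;
            return i + 1;
        }
    }
    return -ENOSPC;
}

/* Submit work that references a local handle. Returns 0 on success, or
 * -ENOENT if this screen never issued that handle. */
int toy_flush(const struct toy_screen *s, int handle)
{
    assert(s != NULL);
    if (handle < 1 || handle > TOY_MAX_HANDLES || s->buffers[handle - 1] == 0)
        return -ENOENT;
    return 0;
}
```

With two zero-initialised screens `a` and `b`, a handle returned by `toy_import(&a, id)` flushes fine on `a` but yields `-ENOENT` on `b`. Calling `toy_import(&b, id)` first, i.e. importing into the screen that will do the flush, gives a handle that `b` accepts.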
Comment 55 Matt Turner 2016-10-15 21:59:41 UTC
(In reply to Martin Peres from comment #51)
> (In reply to Pacho Ramos from comment #50)
> > Thanks a lot :D
> > 
> > Will it be included in 12.0.4, or will we need to wait for the next major
> > mesa version?
> > 
> > Best regards!
> 
> I got the patchset reviewed by the release maintainer and he told me to CC:
> stable. So it should go to 12.0.4.

commit a599b1c2037ac8aca6c92350c8a7b3e42c81deaa
Author: Martin Peres <martin.peres@linux.intel.com>
Date:   Thu Oct 6 17:10:35 2016 +0300

    loader/dri3: import prime buffers in the currently-bound screen

