Bug 106337

Summary: eglWaitClient() doesn't work as documented using DRI2 backend
Product: Mesa Reporter: Mike Gorchak <mgorchak>
Component: EGLAssignee: Tapani Pälli <lemody>
Status: RESOLVED MOVED QA Contact: mesa-dev
Severity: normal    
Priority: medium    
Version: 18.0   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: fix attempt
fix attempt v2

Description Mike Gorchak 2018-05-01 17:23:40 UTC
According to EGL 1.4 specification eglWaitClient() should be equivalent of glFinish() call, but according to the function code of dri2_wait_client() it does just flush() without waiting for any pending operations on drawable surface.
Comment 1 Tapani Pälli 2018-05-02 04:50:46 UTC
(In reply to mgorchak@qnx.com from comment #0)
> According to EGL 1.4 specification eglWaitClient() should be equivalent of
> glFinish() call, but according to the function code of dri2_wait_client() it
> does just flush() without waiting for any pending operations on drawable
> surface.

Do you see issues with some particular driver? The flush() implementation should take care that pending operations are done.
Comment 2 Mike Gorchak 2018-05-02 09:11:54 UTC
Yes, it is i965 DRI driver on ApolloLake (BXT) hardware.

The test was following: BO object (LINEAR) is shared between compositor and GLES app (used as KHRImage for native pixmap). GLES app does fill operation for this BO object and compositor reads back content of BO object using CPU (mapped as WC memory). Very often appears that fill operation is not complete and compositor reads back old content or semi-filled content.

Performance of fill operation using eglWaitClient() for full hd image size is about 1300 fills per second, when glFinish() is used it is about 700 FPS and content of BO object is always correct.
Comment 3 Tapani Pälli 2018-05-02 10:16:13 UTC
Yeah, this is the case. We could change this to use flush_with_flags and have a new flag that causes us to wait for last batchbuffer. I can give this a shot.
Comment 4 Tapani Pälli 2018-05-02 11:29:56 UTC
Created attachment 139271 [details] [review]
fix attempt

Would this fix things for you?
Comment 5 Mike Gorchak 2018-05-02 11:46:19 UTC
Will test within 3-4 hours and let you know. Thank you very much for the prompt fix!
Comment 6 Mike Gorchak 2018-05-02 16:41:45 UTC
I was able to test your changes and had to add following addition to the intel_screen.c:

@@ -171,7 +176,7 @@
 }

 static const struct __DRI2flushExtensionRec intelFlushExtension = {
-    .base = { __DRI2_FLUSH, 4 },
+    .base = { __DRI2_FLUSH, 5 },

     .flush              = intel_dri2_flush,
     .invalidate         = dri2InvalidateDrawable,

Now I can confirm that it flushes all data to drawable surface and waits for it properly. Speed has been decreases dramatically, only a bit better than glFinish(). I think we cannot do too much with it.

Another "issue", which I'm not sure if it is issue or expected behavior, related to this topic: when FBO is used together with surfaceless contexts. 

eglWaitClient() bails out with error if surfaceless contexts are in use to draw to FBO. Is this expected behavior?

Specification says: "All rendering calls for the currently bound context, for the current rendering API, made prior to eglWaitClient are guaranteed to be executed before native rendering calls made after eglWaitClient." and it doesn't mention "surfaces", only "contexts".

Usually most people create dummy 1x1 pbuffers for FBO rendering, but surfaceless contexts are more convenient.
Comment 7 Tapani Pälli 2018-05-02 18:31:54 UTC
(In reply to Mike Gorchak from comment #6)
> I was able to test your changes and had to add following addition to the
> intel_screen.c:
> 
> @@ -171,7 +176,7 @@
>  }
> 
>  static const struct __DRI2flushExtensionRec intelFlushExtension = {
> -    .base = { __DRI2_FLUSH, 4 },
> +    .base = { __DRI2_FLUSH, 5 },

sorry did not remember to bump the version
 
>      .flush              = intel_dri2_flush,
>      .invalidate         = dri2InvalidateDrawable,
> 
> Now I can confirm that it flushes all data to drawable surface and waits for
> it properly. Speed has been decreases dramatically, only a bit better than
> glFinish(). I think we cannot do too much with it.

OK, yeah depending on the usecase you could perhaps do multibuffering
 
> Another "issue", which I'm not sure if it is issue or expected behavior,
> related to this topic: when FBO is used together with surfaceless contexts. 
> 
> eglWaitClient() bails out with error if surfaceless contexts are in use to
> draw to FBO. Is this expected behavior?
> 
> Specification says: "All rendering calls for the currently bound context,
> for the current rendering API, made prior to eglWaitClient are guaranteed to
> be executed before native rendering calls made after eglWaitClient." and it
> doesn't mention "surfaces", only "contexts".

Right, this feels like another bug.
Comment 8 Tapani Pälli 2018-05-03 05:53:30 UTC
Created attachment 139292 [details] [review]
fix attempt v2

I thought about this more and I guess ideally we should just call glFinish(), this way we will ensure same behaviour. Mike, I would appreciate if you can test this one too.
Comment 9 Mike Gorchak 2018-05-03 13:03:38 UTC
Just checked generic path - all works fine, here is very minor fix to the provided fix:

+      _eglLog(_EGL_WARNING, "DRI2: failed to find glFlush entry point");

should be changed glFlush to glFinish :)

Am I understand right that this fix is not supposed to fix FBOs + surfaceless context issue? It happens that in case of surfaceless context dri2_wait_client() is not called at all and aborted at _eglWaitClientCommon() function with following check:

   /* let bad current context imply bad current surface */
   if (_eglGetContextHandle(ctx) == EGL_NO_CONTEXT ||
       _eglGetSurfaceHandle(ctx->DrawSurface) == EGL_NO_SURFACE)
      RETURN_EGL_ERROR(disp, EGL_BAD_CURRENT_SURFACE, EGL_FALSE);
Comment 10 Tapani Pälli 2018-05-03 13:05:40 UTC
(In reply to Mike Gorchak from comment #9)
> Just checked generic path - all works fine, here is very minor fix to the
> provided fix:
> 
> +      _eglLog(_EGL_WARNING, "DRI2: failed to find glFlush entry point");
> 
> should be changed glFlush to glFinish :)

Thanks, I'll send this one to the list.

> Am I understand right that this fix is not supposed to fix FBOs +
> surfaceless context issue? It happens that in case of surfaceless context
> dri2_wait_client() is not called at all and aborted at
> _eglWaitClientCommon() function with following check:
> 
>    /* let bad current context imply bad current surface */
>    if (_eglGetContextHandle(ctx) == EGL_NO_CONTEXT ||
>        _eglGetSurfaceHandle(ctx->DrawSurface) == EGL_NO_SURFACE)
>       RETURN_EGL_ERROR(disp, EGL_BAD_CURRENT_SURFACE, EGL_FALSE);

Yes that is a separate issue, please file another bug about that one.
Comment 11 Mike Gorchak 2018-05-03 13:23:59 UTC
> Yes that is a separate issue, please file another bug about that one.

Done! Bug 106377.

Thank you!
Comment 12 Tapani Pälli 2018-05-17 07:53:15 UTC
There has been some discussion in the mesa-dev mailing list, consensus there was one should rather use glFinish on the client side in this case.

Mesa-dev discussion:
https://lists.freedesktop.org/archives/mesa-dev/2018-May/194808.html

Khronos discussion:
https://gitlab.khronos.org/egl/API/issues/12
Comment 13 GitLab Migration User 2019-09-18 18:07:34 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/162.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.