Bug 91082 - Crash/Fail when calling glFenceSync
Summary: Crash/Fail when calling glFenceSync
Status: RESOLVED INVALID
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Duplicates: 98293
Depends on:
Blocks:
 
Reported: 2015-06-24 11:45 UTC by Damian Dixon
Modified: 2017-02-10 22:39 UTC

See Also:
i915 platform:
i915 features:


Attachments
drm.debug=0xe dmesg output (380.57 KB, text/plain)
2016-10-10 11:17 UTC, Damian Dixon
aub dump (15.51 KB, application/x-bzip)
2016-10-10 12:33 UTC, Damian Dixon

Description Damian Dixon 2015-06-24 11:45:39 UTC
Mesa Version: 10.2.7
CentOS: 7 updated as of today (24 June 2015).


Crashes when calling:

170	    // Insert a fence to mark when this chunk is available for reuse.
171	    chunk->m_fence = glFenceSync( GL_SYNC_GPU_COMMANDS_COMPLETE, 0 );


partial stack trace:

#14 0x00007ffff4736e49 in __run_exit_handlers () from /lib64/libc.so.6
#15 0x00007ffff4736e95 in exit () from /lib64/libc.so.6
#16 0x00007fffd44bf2e6 in _intel_batchbuffer_flush.part.3 () from /usr/lib64/dri/i965_dri.so
#17 0x00007fffd42ee809 in _mesa_FenceSync () from /usr/lib64/dri/i965_dri.so
#18 0x00007ffff753d2c2 in GLHelpers::RingBuffer::returnChunk (this=0x11eb090, chunk=0x1203930)
    at /projects/maplink_ogl_vector/SDK/OpenGLDrawingSurface/core/utility/ringbuffer.cpp:171


The following is reported on the console:

intel_do_flush_locked failed: No such file or directory


The code works on Windows when using NVIDIA and AMD drivers. Not that that is much help other than to indicate that the code we've written at least works elsewhere...

I have to return the hardware by the end of the week.
Comment 1 Damian Dixon 2015-06-24 12:35:58 UTC
Further testing has shown that this problem occurs when I have multiple OpenGL contexts in different threads.

This approach works on Windows 7/8.1 and on Linux with AMD and NVIDIA drivers. It also works on Windows 8.1 with the intel drivers.

This is a major limitation of the Intel driver, as it stops applications from uploading and drawing in a background thread while maintaining a responsive GUI.
Comment 2 Kenneth Graunke 2015-06-24 22:55:10 UTC
Mesa 10.2 is pretty ancient - the 10.2 series is originally from June 2014.  The last bug fixes to that branch were in September 2014.

Could you try with Mesa 10.6 (the most recent release)?
Comment 3 Eero Tamminen 2016-09-12 13:06:26 UTC
(In reply to Damian Dixon from comment #0)
> The following is reported on the console:
> 
> intel_do_flush_locked failed: No such file or directory

I.e. flushing the batch failed and libdrm returned an error from the kernel ioctl().

"No such file or directory" means that user space gave the kernel an invalid handle.

To find out exactly what the problem is, you need to add "drm.debug=0xe" to the kernel command line and provide the "dmesg" output captured after the problem occurs.


(In reply to Damian Dixon from comment #1)
> Further testing has shown that this problem occurs when I have multiple
> OpenGL contexts in different threads.
> 
> This approach works on Windows 7/8.1 and on Linux with AMD and NVIDIA
> drivers. It also works on Windows 8.1 with the intel drivers.

My first guess would be that something in your program had already closed the handle when glFenceSync() was called.

On which Intel HW did this happen?


(In reply to Kenneth Graunke from comment #2)
> Mesa 10.2 is pretty ancient - the 10.2 series is originally from June 2014. 
> The last bug fixes to that branch were in September 2014.
> 
> Could you try with Mesa 10.6 (the most recent release)?

Some time has passed since then; Mesa 11.2 is now common in distros.  Could you test that, or a later version, or provide a minimized version of your use case for testing?
Comment 4 Yury Zhuravlev 2016-10-07 21:41:49 UTC
Hello!

I have a similar problem. 
When KWin calls glFenceSync, I get a crash and this message:
No provider of glFenceSync found

This is a really new problem, as I sync Mesa from master every week or month.

PS: The hardware is Haswell.
Comment 5 Damian Dixon 2016-10-08 08:51:15 UTC
The problem still happens for me with 12.1 on an HD 4600.
Comment 6 Damian Dixon 2016-10-08 09:14:05 UTC
Creating a small sample would not be easy. I tried using apitrace but it did not like the Mesa code calling exit. I'll try this again after commenting out the exit.
Comment 7 Yury Zhuravlev 2016-10-08 16:44:16 UTC
Sorry about my post; I have found the actual bug and its solution:
https://github.com/anholt/libepoxy/issues/25
Comment 8 Damian Dixon 2016-10-10 11:17:39 UTC
Created attachment 127163
drm.debug=0xe dmesg output
Comment 9 Damian Dixon 2016-10-10 11:19:02 UTC
I have added the dmesg output after enabling drm.debug.

The application works as follows:

In the main thread we create an OpenGL context. We then start a background thread with its own X11 connection and create an OpenGL context in that thread, with sharing of textures turned on.

The main thread and the background thread will not be terminated until the application is terminated. Neither thread will use the other's X11 connection or its OpenGL context.

The background thread draws into an offscreen texture. This texture is passed to the foreground thread, which then displays the texture on the screen using two triangles.

The idea is to ensure that the GUI is always responsive regardless of the complexity of the rendering.
Comment 10 Damian Dixon 2016-10-10 11:24:53 UTC
(In reply to Damian Dixon from comment #6)
> Creating a small sample would not be easy. I tried using apitrace but it did
> not like the Mesa code calling exit. I'll try this again after commenting
> out the exit.

I tried apitrace again and it does not allow me to replicate the issue.
Comment 11 Damian Dixon 2016-10-10 12:33:47 UTC
Created attachment 127164
aub dump

Running with the latest intel-gpu-tools from git this morning:

SimpleGLSurfaceSample]$ intel_aubdump  --output=trace.aub -v ./simpleglsample maps/NaturalEarthRaster/NaturalEarthRaster.map
[intel_aubdump: intercept drm ioctl on fd 13]
[intel_aubdump: running, output file trace.aub, chipset id 0x0412, gen 7]
drm_intel_gem_bo_context_exec 0, 0x9e5800 0x9d1220 1
drm_intel_gem_bo_context_exec 0, 0x9e5c80 0x9d1220 1
Max Vertex Uniform Components   16384
Max Geometry Uniform Components 16384
Max Fragment Uniform Components 16384
Max Uniform Locations           98304
[intel_aubdump: intercept drm ioctl on fd 15]
Max Vertex Uniform Components   16384
Max Geometry Uniform Components 16384
Max Fragment Uniform Components 16384
Max Uniform Locations           98304
Dbg::lastError : code 2002 : category : 4 : msg : 
[intel_aubdump: intercept drm ioctl on fd 13]
Bus error
Comment 12 Damian Dixon 2016-10-10 12:38:33 UTC
The version of Mesa I am currently using is Mesa 12.1.0-devel (git-a602601)

git clone -b i965-fp64-gen7-scalar-vec4-rc2 https://github.com/Igalia/mesa.git .


Basically we are tracking the changes for fp64 as we require FP64 and transform feedback 2.

If you need me to use a different version I can do so.
Comment 13 Damian Dixon 2016-10-17 12:31:18 UTC
It appears that one of the patches attached to:

https://bugs.freedesktop.org/show_bug.cgi?id=71759

fixes this problem for me.

Thanks to Mark Young and Greg Fischer at LunarG for pointing me to this.

The patch that worked is as follows:

Fabrice Bellet, change to libDRM:
intel/intel_bufmgr_gem.c
https://bugs.freedesktop.org/attachment.cgi?id=122880


The other patch does not help in this case, though it does not make things worse:

https://bugs.freedesktop.org/attachment.cgi?id=126988
Comment 14 Damian Dixon 2016-10-17 13:10:04 UTC
I tried to move this to libdrm but failed.

I've created a new bug on libdrm pointing back to this and the patch that worked for me.

https://bugs.freedesktop.org/show_bug.cgi?id=98293
Comment 15 Matt Turner 2016-11-02 06:49:31 UTC
The commit (a599b1c2037ac8aca6c92350c8a7b3e42c81deaa) that fixed bug 71759 is in mesa-13.0.0. Please test that and mark the bug as REOPENED if you can reproduce and RESOLVED/* if you cannot reproduce.
Comment 16 Matt Turner 2016-11-02 06:49:44 UTC
*** Bug 98293 has been marked as a duplicate of this bug. ***
Comment 17 Annie 2017-02-10 22:39:01 UTC
Dear Reporter,

This Mesa bug has been in the "NEEDINFO" status for over 60 days. I am closing this bug based on lack of response but feel free to reopen if resolution is still needed. Please ensure you're supplying the correct information as requested.

Thank you.

