Bug 55282

Summary: Crash in drm_intel_gem_bo_unreference() in intel_bufmgr_gem.c
Product: libva Reporter: Gautam <manamgautam>
Component: intel Assignee: haihao <haihao.xiang>
Status: RESOLVED FIXED QA Contact:
Severity: critical    
Priority: medium CC: ben, chris, daniel, greaterd, jbarnes, remidesmarais, seanvk
Version: unspecified   
Hardware: Other   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: sample application to play videos [MP4 with H264 codec] from current directory. It has a script file to build and run.
Kernel configuration file.
Patch for mutex in gen6_mfd_free_avc_surface
updated patch

Description Gautam 2012-09-24 12:54:23 UTC
Created attachment 67629 [details]
sample application to play videos [MP4 with H264 codec] from current directory. It has a script file to build and run.

Hardware:-
 Using x86_64 kernel and user land.
 CPU and GPU: Intel(R) Core(TM) i3-2105 CPU i965 chipset
 linux 3.5.0
Packages used :-
  Packages are taken from http://intellinuxgraphics.org/2012.07.html .
  Driver name: intel-driver
  Driver source code repository: http://cgit.freedesktop.org/vaapi/intel-driver/
  Driver version: 1.0.18 (latest stable)

Crash Details:-
  When we run the sample application [wall] to play videos with hardware-accelerated decoding [gst-vaapi] and render through cluttersink, the application crashes randomly with the following error:
wall: intel_bufmgr_gem.c:1116: drm_intel_gem_bo_unreference: Assertion`((&bo_gem->refcount)->atomic) > 0' failed.

The sample application takes .mp4 files [H264 codec] from the current directory and plays them one by one in a loop.

Reproducibility :-80% [sometimes within 5 to 15 mins]

Observation:-
The problem was observed as a double free triggered by a race condition in gen6_mfd_free_avc_surface(). The two functions i965_PutSurface() and i965_EndPicture() in i965_drv_video.c call gen6_mfd_free_avc_surface(), directly or indirectly, simultaneously from two threads.

One of the threads unreferences "gen6_avc_surface->dmv_top" while, at the same time, the other thread hits the assertion because the reference count is already zero.

If we bypass this assertion, both i965_PutSurface() and i965_EndPicture() pass the NULL check on "gen6_avc_surface" in gen6_mfd_free_avc_surface() and then try to free the same pointer.

Solution:-
We fixed this issue by adding a mutex in the function gen6_mfd_free_avc_surface().
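
For illustration only, the idea can be sketched in a stand-alone program. This is not the attached patch and does not use the driver's real types: demo_bo, demo_avc_surface, demo_free_avc_surface() and demo_run() are hypothetical stand-ins, and the sketch assumes that the NULL check, the unreference and the clearing of the pointer all happen while the mutex is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted buffer object. */
struct demo_bo {
    int refcount;
};

/* Hypothetical stand-in for the per-surface AVC private data. */
struct demo_avc_surface {
    struct demo_bo *dmv_top;
};

static pthread_mutex_t free_lock = PTHREAD_MUTEX_INITIALIZER;
static int free_count; /* how many times the surface was really released */

/* Mirrors the shape of the failing check: refcount must be > 0. */
static void demo_bo_unreference(struct demo_bo *bo)
{
    assert(bo->refcount > 0);
    if (--bo->refcount == 0)
        free(bo);
}

/*
 * Takes the address of the surface pointer so that the NULL check,
 * the release and the reset to NULL form one critical section.
 */
static void demo_free_avc_surface(struct demo_avc_surface **slot)
{
    pthread_mutex_lock(&free_lock);
    struct demo_avc_surface *surface = *slot;
    if (surface != NULL) {
        demo_bo_unreference(surface->dmv_top);
        free(surface);
        *slot = NULL; /* the second caller now sees NULL and does nothing */
        free_count++;
    }
    pthread_mutex_unlock(&free_lock);
}

static void *worker(void *arg)
{
    demo_free_avc_surface(arg);
    return NULL;
}

/* Two threads race to free one surface; returns the number of releases. */
int demo_run(void)
{
    struct demo_avc_surface *surface = malloc(sizeof(*surface));
    if (surface == NULL)
        return -1;
    surface->dmv_top = malloc(sizeof(*surface->dmv_top));
    if (surface->dmv_top == NULL) {
        free(surface);
        return -1;
    }
    surface->dmv_top->refcount = 1;
    free_count = 0;

    pthread_t a, b;
    if (pthread_create(&a, NULL, worker, &surface) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, &surface) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return free_count;
}
```

With the lock in place, whichever thread gets there first releases the surface and the other one finds the pointer already NULL. Without the lock, both threads could pass the NULL check before either clears the pointer, which is the double unreference described above.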
Comment 1 Gautam 2012-09-26 13:34:46 UTC
Created attachment 67728 [details]
Kernel configuration file.

3.5.0 kernel configuration file.
Comment 2 Gautam 2012-09-26 13:35:55 UTC
OS: vanilla 3.5.0 linux kernel
No external patches were applied to the kernel. We are using 3rd party drivers for which we have full source code.
Comment 3 haihao 2012-09-27 07:31:08 UTC
(In reply to comment #0)
> Solution:-
>            We fixed this issue by adding mutex in the function
> gen6_mfd_free_avc_surface().

Where is the patch?
Comment 4 Gautam 2012-09-27 09:29:22 UTC
Created attachment 67759 [details]
Patch for mutex in gen6_mfd_free_avc_surface

The patch adds a mutex in the gen6_mfd_free_avc_surface function.
Comment 5 Gautam 2012-10-12 09:07:36 UTC
Created attachment 68477 [details]
updated patch

The fix in the previous patch was incomplete, so it has been updated.
Comment 6 Gautam 2012-10-12 09:13:09 UTC
The packages used are

clutter-gst: 1.5.6
gstreamer-0.10.36
gst-plugins-base-0.10.36
gst-plugins-bad-0.10.23
gst-plugins-good-0.10.31
gst-plugins-ugly-0.10.19
gstreamer-vaapi: 0.10 version 0.3.7
libva-1.1.0
intel-driver 1.0.18
xorg-server-1.12.1
linux 3.5.0
Comment 7 haihao 2012-10-24 08:31:14 UTC
The patch looks good to me, although I still can't reproduce the issue. I pushed your patch with some modifications to fix the same issue on other platforms.

Thanks a lot.
