Bug 90730 - The value of cl_event->ref_n is 2 initially, this causes a memory leak
Summary: The value of cl_event->ref_n is 2 initially, this causes a memory leak
Status: RESOLVED NOTABUG
Alias: None
Product: Beignet
Classification: Unclassified
Component: Beignet (show other bugs)
Version: unspecified
Hardware: All All
: medium normal
Assignee: Zhigang Gong
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-28 22:30 UTC by TÖRÖK Attila
Modified: 2015-05-29 01:25 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments

Description TÖRÖK Attila 2015-05-28 22:30:14 UTC
The cl_event_new function in cl_event.c returns an event with an initial ref_n value of 2, which causes it to never be deleted, when used correctly.

The problem is that upon initialization, the ref_n field gets a value of 1, but in the end the "constructor" will call the cl_event_add_ref on the event, which will increase the value of ref_n to 2, and this is obviously not correct.

So either the value to which it is initialized in the beginning should be 0, or the cl_event_add_ref function should not be called on it in cl_event_new.

The bug can be demonstrated with this little code for example: https://gist.github.com/torokati44/ca6da2112c0d784f4e79
It does nothing, but executes an empty kernel a whole lot of times in succession, and it seems like there is nothing wrong with it.

After around 65000 (which is a value that made me think of an overflow) executions, the result of clEnqueueNDRangeKernel will be CL_OUT_OF_RESOURCES, but only if the context is created with profiling enabled.

This is because the timestamp buffer of the internal gpgpu event (which is allocated only if profiling is enabled) will not be freed for any of the created events. The event structures themselves leak too, but those are not so noticeable.

Applying either one of the two fixes suggested above makes the problem disappear.

So yeah, please fix it, I had a wonderful 3 days hunting it down. :)

Attila
Comment 1 TÖRÖK Attila 2015-05-29 01:16:33 UTC
Well, further testing has shown that simply applying one of my suggestions causes some other, quite strange errors in cases that are more complicated than the linked little example. (Like a segmentation fault upon entering the clWaitForEvents function, but before executing its first line.)
Also, some utests fail.
I'm not sure if it is just a local build error, but you will surely know better.
And the leakage should still be taken care of somehow.
Thank you in advance.
Comment 2 Zhigang Gong 2015-05-29 01:21:05 UTC
(In reply to TÖRÖK Attila from comment #0)
> The cl_event_new function in cl_event.c returns an event with an initial
> ref_n value of 2, which causes it to never be deleted, when used correctly.
> 
> The problem is that upon initialization, the ref_n field gets a value of 1,
> but in the end the "constructor" will call the cl_event_add_ref on the
> event, which will increase the value of ref_n to 2, and this is obviously
> not correct.
> 
> So either the value to which it is initialized in the beginning should be 0,
> or the cl_event_add_ref function should not be called on it in cl_event_new.
> 
> The bug can be demonstrated with this little code for example:
> https://gist.github.com/torokati44/ca6da2112c0d784f4e79
> It does nothing, but executes an empty kernel a whole lot of times in
> succession, and it seems like there is nothing wrong with it.
> 
> After around 65000 (which is a value that made me think of an overflow)
> executions, the result of clEnqueueNDRangeKernel will be
> CL_OUT_OF_RESOURCES, but only if the context is created with profiling
> enabled.
> 
> This is because the timestamp buffer of the internal gpgpu event (which is
> allocated only if profiling is enabled) will not be freed for any of the
> created events. The event structures themselves leak too, but those are not
> so noticeable.
> 
> Applying either one of the two fixes suggested above makes the problem
> disappear.
> 
> So yeah, please fix it, I had a wonderful 3 days hunting it down. :)
> 
> Attila

This is not a bug, you need to release the event after the clWaitForEvents.
Comment 3 TÖRÖK Attila 2015-05-29 01:25:27 UTC
Oh, yes. You are right. How embarrassing. Sorry for wasting your time. I guess I'll have to RTFM next time.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.