Bug 102655

Summary: [CI][HSW] igt@gem_flink_race@flink_close - Failed assertion: obj_count == 0
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: IGT
Assignee: Default DRI bug account <dri-devel>
Status: CLOSED FIXED
QA Contact:
Severity: critical
Priority: high
CC: intel-gfx-bugs, marta.lofstedt
Version: XOrg git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: HSW
i915 features: GEM/Other

Description Martin Peres 2017-09-11 10:20:08 UTC
On CI_DRM_3063, the machine shard-hsw hits the following assert when running igt@gem_flink_race@flink_close:

(gem_flink_race:1665) CRITICAL: Test assertion failure function test_flink_close, file gem_flink_race.c:180:
(gem_flink_race:1665) CRITICAL: Failed assertion: obj_count == 0
(gem_flink_race:1665) CRITICAL: Last errno: 9, Bad file descriptor
(gem_flink_race:1665) CRITICAL: error: -1 != 0

Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3063/shard-hsw3/igt@gem_flink_race@flink_close.html
Comment 1 Chris Wilson 2017-09-11 10:25:42 UTC
(gem_flink_race:1665) INFO: leaked -1 objects

If only all of our tests were that efficient! No more oom!
Comment 2 Chris Wilson 2017-09-11 10:35:10 UTC
Given all the floating references held on objects, I'm not sure we can do a "stable_obj_count" without looping and sleeping. I'm not sure of all the places where we hold such objects on a timer, so getting the interval right or adding a flush control is hard. E.g. one thing we do not flush before counting is the kms workqueues, which hold a reference on the old object until they finish.
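To illustrate the "loop and sleep" idea above, here is a minimal sketch that polls a counter until it stops changing. It is not the IGT implementation: `stable_obj_count`, `count_fn`, `struct fake_driver` and `fake_read_count` are hypothetical names, and the fake driver merely simulates deferred work dropping one reference per read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns the current object count. */
typedef int (*count_fn)(void *ctx);

/*
 * Poll until the count is unchanged for `settle` consecutive reads,
 * sleeping `interval_ms` between reads. Returns the settled count,
 * or -1 if it did not settle within `max_reads` reads.
 */
static int stable_obj_count(count_fn read_count, void *ctx,
			    int settle, int max_reads, long interval_ms)
{
	struct timespec delay = {
		.tv_sec = interval_ms / 1000,
		.tv_nsec = (interval_ms % 1000) * 1000000L,
	};
	int prev, same = 0;

	assert(read_count != NULL && settle > 0);

	prev = read_count(ctx);
	for (int i = 1; i < max_reads; i++) {
		int cur;

		nanosleep(&delay, NULL);
		cur = read_count(ctx);
		if (cur == prev) {
			if (++same >= settle)
				return cur;
		} else {
			/* Count moved: restart the settle window. */
			same = 0;
			prev = cur;
		}
	}
	return -1;
}

/* Simulated driver (assumption): deferred work frees one object per read. */
struct fake_driver {
	int live;     /* objects legitimately alive */
	int deferred; /* objects still referenced by pending work */
};

static int fake_read_count(void *ctx)
{
	struct fake_driver *drv = ctx;
	int count = drv->live + drv->deferred;

	if (drv->deferred > 0)
		drv->deferred--;
	return count;
}
```

With `live = 5` and `deferred = 3`, the loop returns 5 once the deferred references have drained. The weakness noted in the comment still applies: if the interval is shorter than the slowest timer, the count can look settled while work is still pending.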
Comment 3 Chris Wilson 2017-09-13 15:55:57 UTC
*** Bug 102696 has been marked as a duplicate of this bug. ***
Comment 4 Marta Löfstedt 2017-10-13 05:57:07 UTC
Also seen on CI_DRM_3223, HSW shards:

(prime_self_import:1525) CRITICAL: Test assertion failure function test_export_close_race, file prime_self_import.c:363:
(prime_self_import:1525) CRITICAL: Failed assertion: obj_count == 0
(prime_self_import:1525) CRITICAL: Last errno: 9, Bad file descriptor
(prime_self_import:1525) CRITICAL: error: -32 != 0
Subtest export-vs-gem_close-race failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3223/shard-hsw4/igt@prime_self_import@export-vs-gem_close-race.html

and: 
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_339/shard-hsw3/igt@prime_self_import@export-vs-gem_close-race.html
Comment 5 Chris Wilson 2017-10-17 09:48:11 UTC
Had a thought:
drm/i915: Flush the idle-worker for debugfs/i915_drop_caches
https://patchwork.freedesktop.org/patch/183116/
Comment 6 Chris Wilson 2017-10-19 08:10:46 UTC
This should help:

commit 8d03573de74ebd38d1047131a698a2068605efed
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Oct 18 13:28:14 2017 +0100

    lib: Flush the driver's internal cache of objects before counting
    
    As the driver itself keeps a cache of objects, these too need to be
    flushed prior to producing a stable count of objects.
    

There is still an open issue for dropping framebuffers and their ilk, but it is unlikely to be affecting this test. Treating as fixed; we will know if it recurs.
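The effect described in the commit message can be sketched with a toy model. This is not i915 or IGT code: `struct toy_driver` and the `toy_*` helpers are hypothetical, and the model simply assumes the driver holds one already-freed object in an internal cache that gets emptied while the test runs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (assumption): `cached` objects are released lazily. */
struct toy_driver {
	int live;   /* objects owned by userspace handles */
	int cached; /* freed objects the driver has not released yet */
};

static int toy_count(const struct toy_driver *drv)
{
	return drv->live + drv->cached;
}

/* Stand-in for flushing the driver's internal cache. */
static void toy_flush(struct toy_driver *drv)
{
	drv->cached = 0;
}

/*
 * Test body: create and close one object (net zero). Meanwhile the
 * driver's background worker happens to empty its cache.
 */
static void toy_test_body(struct toy_driver *drv)
{
	drv->live++;
	drv->live--;
	drv->cached = 0;
}

/* Returns (count after test) - (count before test). */
static int toy_leak_delta(struct toy_driver *drv, int flush_first)
{
	int before, after;

	assert(drv != NULL);

	if (flush_first)
		toy_flush(drv);
	before = toy_count(drv);

	toy_test_body(drv);

	if (flush_first)
		toy_flush(drv);
	after = toy_count(drv);

	return after - before;
}
```

Starting from `live = 4, cached = 1`, the delta is -1 without flushing, which matches the shape of the "leaked -1 objects" message in Comment 1, and 0 when the cache is flushed before each count.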
