Bug 103182

Summary: [CI][BAT] igt@kms_busy@basic-flip-a / igt@prime_vgem@basic-fence-flip - fail - Failed assertion: nanosleep(&tv, NULL) == -1
Product: DRI
Component: DRM/Intel
Status: RESOLVED FIXED
Severity: normal
Priority: medium
Version: DRI git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: ALL
i915 features: display/Other
Reporter: Marta Löfstedt <marta.lofstedt>
Assignee: Juha-Pekka Heikkilä <juhapekka.heikkila>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: intel-gfx-bugs

Description Marta Löfstedt 2017-10-10 07:58:59 UTC
CI_DRM_3200 APL-shards igt@prime_vgem@basic-fence-flip

Fail:
(prime_vgem:1818) CRITICAL: Test assertion failure function flip_to_vgem, file prime_vgem.c:681:
(prime_vgem:1818) CRITICAL: Failed assertion: nanosleep(&tv, NULL) == -1
(prime_vgem:1818) CRITICAL: flip to busy front blocked
Subtest basic-fence-flip failed.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3200/shard-apl1/igt@prime_vgem@basic-fence-flip.html
Comment 1 Marta Löfstedt 2017-11-06 06:47:31 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3312/fi-bwr-2160/igt@kms_busy@basic-flip-b.html


(kms_busy:2060) CRITICAL: Test assertion failure function flip_to_fb, file kms_busy.c:132:
(kms_busy:2060) CRITICAL: Failed assertion: nanosleep(&tv, NULL) == -1
(kms_busy:2060) CRITICAL: flip to fb[1] blocked waiting for busy fb
Subtest basic-flip-B failed.
Comment 2 Marta Löfstedt 2018-02-09 07:21:20 UTC
Last seen CI_DRM_3605: 2018-01-08 / 222 runs ago
Comment 3 Marta Löfstedt 2018-03-14 06:41:12 UTC
Re-opened for CNL:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3921/fi-cnl-drrs/igt@kms_busy@basic-flip-a.html

(kms_busy:3606) CRITICAL: Test assertion failure function flip_to_fb, file ../tests/kms_busy.c:132:
(kms_busy:3606) CRITICAL: Failed assertion: nanosleep(&tv, NULL) == -1
(kms_busy:3606) CRITICAL: flip to fb[1] blocked waiting for busy fb
Subtest basic-flip-A failed.
Comment 4 Marta Löfstedt 2018-03-23 12:45:22 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3972/fi-blb-e6850/igt@prime_vgem@basic-fence-flip.html

(prime_vgem:3594) CRITICAL: Test assertion failure function flip_to_vgem, file ../tests/prime_vgem.c:681:
(prime_vgem:3594) CRITICAL: Failed assertion: nanosleep(&tv, NULL) == -1
(prime_vgem:3594) CRITICAL: flip to busy back blocked
Subtest basic-fence-flip failed.
Comment 5 Martin Peres 2018-04-27 14:03:16 UTC
Also seen on BWR: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8824/fi-bwr-2160/igt@prime_vgem@basic-fence-flip.html

I moved the platform to ALL, because both BLB and CNL saw the problem, which pretty much means all Intel platforms are affected.
Comment 6 Martin Peres 2018-11-15 15:26:20 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5144/fi-gdg-551/igt@prime_vgem@basic-fence-flip.html

Starting subtest: basic-fence-flip
(prime_vgem:1355) CRITICAL: Test assertion failure function flip_to_vgem, file ../tests/prime_vgem.c:722:
(prime_vgem:1355) CRITICAL: Failed assertion: nanosleep(&tv, NULL) == -1
(prime_vgem:1355) CRITICAL: flip to busy back blocked
Subtest basic-fence-flip failed.

<5> [280.491987] Asynchronous wait on fence vgem:unbound:1 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Comment 7 Abdiel Janulgue 2018-11-29 09:44:10 UTC
The test does use prime buffers, but the assertion fails because drmModePageFlip is being blocked or something similar. I suggest the display folks first investigate why that is the case.
Comment 8 Francesco Balestrieri 2018-11-29 09:46:31 UTC
Assigning to Jani for further dispatching.
Comment 9 Lakshmi 2019-02-26 08:29:24 UTC
JP, any updates here?
Comment 10 Juha-Pekka Heikkilä 2019-02-26 09:11:57 UTC
(In reply to Lakshmi from comment #9)
> JP, any updates here?

I'm currently looking into this; I have only just started on it.
Comment 11 Juha-Pekka Heikkilä 2019-02-28 09:13:12 UTC
I tried reproducing on HSW, which is one of the listed platforms, but it doesn't reproduce for me. GDG seems to be the most likely candidate for reproducing this; I need to look around for such an old box to try it on.
Comment 12 Lakshmi 2019-03-29 13:28:56 UTC
The impact of this bug is that flipping would be slow, which would not be noticeable in normal usage.

Setting the priority to Medium.
Comment 13 Chris Wilson 2019-04-01 07:42:25 UTC
This trims about 1s off the runtime -- from looking at the gdg results, it was borderline even on success. Restructuring the wait should avoid false positives.


commit 20f8f52498b9c382076cbc85079df44079e5500f (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Mar 30 23:46:36 2019 +0000

    kms_busy: Use igt_waitchildren_timeout()
    
    Replace the convoluted raising of SIGALRM from the child with an
    interruptible sleep in the parent with the equivalent and far more
    natural igt_waitchildren_timeout().
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103182
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
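
For readers unfamiliar with the failing assertion: the commit message above describes the old scheme as a child raising SIGALRM while the parent sits in an interruptible sleep, so `nanosleep(&tv, NULL) == -1` means "the sleep was cut short by the signal". The following is a minimal sketch of that idea using plain POSIX calls only. It is not the IGT code; `sleep_was_interrupted`, `ms_to_timespec`, `on_alarm` and the millisecond parameters are hypothetical names chosen for illustration, and the child's short sleep merely stands in for the page flip.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Empty handler: its only job is to make nanosleep() return early. */
static void on_alarm(int sig) { (void)sig; }

static struct timespec ms_to_timespec(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    return ts;
}

/*
 * Hypothetical helper, not part of IGT.
 * Returns 1 if the parent's sleep was interrupted by the child's SIGALRM,
 * 0 if the sleep ran to completion (the "blocked" case), -1 on setup error.
 */
static int sleep_was_interrupted(long work_ms, long timeout_ms)
{
    struct sigaction sa, old;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, &old) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        sigaction(SIGALRM, &old, NULL);
        return -1;
    }
    if (child == 0) {
        /* Stand-in for the potentially blocking call under test. */
        struct timespec work = ms_to_timespec(work_ms);
        nanosleep(&work, NULL);
        kill(getppid(), SIGALRM);   /* tell the parent we finished */
        _exit(0);
    }

    /* Parent: -1 with EINTR means the child signalled us in time. */
    struct timespec tv = ms_to_timespec(timeout_ms);
    int ret = nanosleep(&tv, NULL);
    int interrupted = (ret == -1 && errno == EINTR);

    /* Clean up: stop the child if it is still running, then reap it. */
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    sigaction(SIGALRM, &old, NULL);
    return interrupted;
}
```

One weakness of this pattern, as far as the sketch goes, is that a signal delivered before the parent has entered nanosleep() is simply lost, so the sleep then runs its full length and the check fails even though the child finished. Whether that is what happened in the CI failures here is not established by this report.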
Comment 14 Chris Wilson 2019-04-02 09:03:32 UTC
commit f539e21e934019f0196fee646f351b4e30a8c341 (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Apr 1 08:55:20 2019 +0100

    prime_vgem: Replace nanosleep with igt_waitchildren_timeout
    
    We want to use a child in order to detect an uninterruptable sleep (a
    potential bug we might hit), but we can use igt_waitchildren_timeout()
    to replace our risky self-signaling + nanosleep.
    
    v2: Remove the now redundant signal() setup.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103182
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
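
The idea behind the replacement, as described in the two commit messages, is that the parent waits on the child with a timeout instead of relying on self-signaling. Below is a minimal sketch of a timed wait on a child using only POSIX calls. It is an assumption-laden illustration, not the implementation of igt_waitchildren_timeout(): `child_finished_in_time`, `sleep_ms` and the polling interval are hypothetical, and the child's sleep stands in for the flip.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/*
 * Hypothetical helper, not part of IGT.
 * Returns 1 if the child exited with status 0 before the deadline,
 * 0 if the deadline passed first (the child is then killed),
 * -1 on error or abnormal child exit.
 */
static int child_finished_in_time(long work_ms, long timeout_ms)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        sleep_ms(work_ms);          /* stand-in for the blocking call */
        _exit(0);
    }

    const long step_ms = 10;        /* assumed polling interval */
    for (long waited = 0; waited < timeout_ms; waited += step_ms) {
        int status;
        pid_t r = waitpid(child, &status, WNOHANG);

        if (r == child)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : -1;
        if (r < 0)
            return -1;
        sleep_ms(step_ms);
    }

    /*
     * Deadline passed. SIGKILL only takes effect once the kernel lets the
     * child run again, so this final waitpid() blocks until it has died.
     */
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    return 0;
}
```

Because the parent observes the child's exit status directly, there is no signal that can arrive too early and be lost; the result depends only on whether the child finished before the deadline.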
Comment 15 Martin Peres 2019-08-28 11:51:16 UTC
(In reply to Chris Wilson from comment #14)
> commit f539e21e934019f0196fee646f351b4e30a8c341 (HEAD, upstream/master)
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Apr 1 08:55:20 2019 +0100
> 
>     prime_vgem: Replace nanosleep with igt_waitchildren_timeout
>     
>     We want to use a child in order to detect an uninterruptable sleep (a
>     potential bug we might hit), but we can use igt_waitchildren_timeout()
>     to replace our risky self-signaling + nanosleep.
>     
>     v2: Remove the now redundant signal() setup.
>     
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103182
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Looks good, thanks!
Comment 16 CI Bug Log 2019-08-28 11:51:54 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.
