Bug 109225

Summary: [CI][DRMTIP] igt@kms_atomic_transition@plane-all-modeset-transition* - fail - Last errno: 28, No space left on device
Product: DRI Reporter: Martin Peres <martin.peres>
Component: IGT Assignee: Stanislav Lisovskiy <stanislav.lisovskiy>
Status: CLOSED FIXED QA Contact:
Severity: normal    
Priority: medium CC: intel-gfx-bugs
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: HSW, ICL, KBL, SKL i915 features: display/Other

Description Martin Peres 2019-01-04 14:11:59 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_168/fi-skl-6700k2/igt@kms_atomic_transition@plane-all-modeset-transition.html

Starting subtest: plane-all-modeset-transition
(kms_atomic_transition:1188) igt_kms-CRITICAL: Test assertion failure function do_display_commit, file ../lib/igt_kms.c:3320:
(kms_atomic_transition:1188) igt_kms-CRITICAL: Failed assertion: ret == 0
(kms_atomic_transition:1188) igt_kms-CRITICAL: Last errno: 28, No space left on device
(kms_atomic_transition:1188) igt_kms-CRITICAL: error: -28 != 0
Subtest plane-all-modeset-transition failed.
Comment 1 CI Bug Log 2019-01-04 14:12:59 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* SKL: igt@kms_atomic_transition@plane-all-modeset-transition* - fail - Last errno: 28, No space left on device
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_11112/shard-kbl7/igt@kms_atomic_transition@plane-all-transition-nonblocking.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_174/fi-skl-6700k2/igt@kms_atomic_transition@plane-all-transition-nonblocking.html
Comment 2 CI Bug Log 2019-01-29 07:19:04 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SKL: igt@kms_atomic_transition@plane-all-modeset-transition* - fail - Last errno: 28, No space left on device -}
{+ SKL: igt@kms_atomic_transition@plane-all-modeset-transition* - fail - Last errno: 28, No space left on device +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_177/fi-skl-6700k2/igt@kms_atomic_transition@plane-use-after-nonblocking-unbind-fencing.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_191/fi-skl-6700k2/igt@kms_atomic_transition@plane-all-modeset-transition-fencing.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_194/fi-skl-6700k2/igt@kms_atomic_transition@plane-all-modeset-transition-fencing.html
Comment 3 Stanislav Lisovskiy 2019-01-30 14:19:48 UTC
After some struggles with kms_atomic_transition I've got a feeling I know what it can be related to.
Comment 4 Stanislav Lisovskiy 2019-02-07 12:57:19 UTC
All the recent issues seem to be either skips or dmesg-warn due to FIFO underrun:

http://gfx-ci.fi.intel.com/cibuglog-ng/issue/1015/history

For example:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_209/fi-icl-u3/igt@kms_atomic_transition@plane-all-modeset-transition-fencing.html

It looks like the filter is wrong, since in most of those results there was no -ENOSPC error, for example:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_209/fi-icl-u2/igt@kms_atomic_transition@plane-all-transition-nonblocking.html
Comment 5 Stanislav Lisovskiy 2019-02-12 11:07:09 UTC
The actual reason for the "No space left on device" issue is in the IGT itself.

I've noticed there are "EDID invalid" messages in the attached dmesg, always when the issue happens:

i915 0000:00:02.0: DP-2: EDID is invalid:
<4>[   46.742040] 	[00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<4>[   46.742041] 	[00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<4>[   46.742043] 	[00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<4>[   46.742044] 	[00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<4>[   46.742045] 	[00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<4>[   46.742046] 	[00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<4>[   46.742047] 	[00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<4>[   46.742049] 	[00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<7>[   46.742399] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:107:DP-2] probed modes :
<7>[   46.742403] [drm:drm_mode_debug_printmodeline] Modeline 165:"1024x768" 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa

This causes drm to fall back to the fixed 1024x768 mode, as it could not read the display EDID (see intel_dp.c and drm_edid.c). This resolution is lower than the typical resolution used in the test case. The second problem is that run_transition_test doesn't clean up the plane size property which it sets during the test run. In the next test case we create a framebuffer of the size of the current output mode (mode->hdisplay x mode->vdisplay). Usually it is the same size as before, but because of this EDID it happens to be smaller (1024x768). So whenever proper cleanup wasn't done after the previous run_transition_test call, we attempt to set a plane size bigger than the framebuffer size, which causes drm_atomic_plane_check to return -ENOSPC.
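To make the failing condition concrete, here is a minimal standalone C sketch of the kind of bounds check described above. It is a simplified model, not the kernel source: `struct fb_size` and `check_src_in_fb` are hypothetical names, and the sketch assumes only that the plane's source rectangle is given in 16.16 fixed point and is rejected with -ENOSPC when it does not fit inside the framebuffer.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a framebuffer's dimensions, in pixels. */
struct fb_size {
	uint32_t width;
	uint32_t height;
};

/*
 * Simplified model of the source-rectangle check (assumption: src_* are
 * 16.16 fixed point). Returns 0 if the rectangle lies inside the
 * framebuffer and -ENOSPC otherwise.
 */
static int check_src_in_fb(const struct fb_size *fb,
			   uint32_t src_x, uint32_t src_y,
			   uint32_t src_w, uint32_t src_h)
{
	/* Convert the framebuffer size to 16.16 without overflowing. */
	uint64_t fb_w = (uint64_t)fb->width << 16;
	uint64_t fb_h = (uint64_t)fb->height << 16;

	/* The subtraction is only evaluated once src_w/src_h are known to fit. */
	if (src_w > fb_w || src_x > fb_w - src_w ||
	    src_h > fb_h || src_y > fb_h - src_h)
		return -ENOSPC;

	return 0;
}
```

With a 1024x768 framebuffer, a plane size left over from a larger mode (1920x1080 is used here purely as an illustrative value) is rejected, while a 1024x768 plane passes.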

To fix that we need to clean up all the plane properties associated with this output before we proceed with the next test case; otherwise IGT seems to commit them in the first commit, when we associate the output with a pipe. I've simulated this either by making drm intentionally return a wrong EDID, so that the fixed mode is used, or by simply decreasing mode->hdisplay/vdisplay the second time the function is called; this always results in ENOSPC. A simple cure is to add igt_plane_set_size to the cleanup, so that the plane size is reduced to 0 or the plane is disabled.
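The effect of the missing cleanup can be reproduced with a toy model. The sketch below does not use the IGT API; `toy_plane`, `toy_mode`, `toy_commit` and `toy_run` are hypothetical names, and the model assumes only what is described above: the plane size persists between runs, the first commit of a run carries whatever size was left behind, and the framebuffer matches the current mode.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical plane size property; it persists across test runs. */
struct toy_plane {
	int w, h;
};

/* Hypothetical output mode; the framebuffer is created with this size. */
struct toy_mode {
	int hdisplay, vdisplay;
};

/* Reject a plane that is larger than the framebuffer, as described above. */
static int toy_commit(const struct toy_plane *p, const struct toy_mode *fb)
{
	if (p->w > fb->hdisplay || p->h > fb->vdisplay)
		return -ENOSPC;
	return 0;
}

/* One test run; 'cleanup' selects whether the plane size is reset afterwards. */
static int toy_run(struct toy_plane *p, const struct toy_mode *mode, bool cleanup)
{
	/* The first commit carries any plane size left over from the last run. */
	int ret = toy_commit(p, mode);

	if (ret)
		return ret;

	/* The test then sizes the plane to the current mode and commits again. */
	p->w = mode->hdisplay;
	p->h = mode->vdisplay;
	ret = toy_commit(p, mode);

	/* The proposed cure: reset the plane size before the next run. */
	if (cleanup) {
		p->w = 0;
		p->h = 0;
	}
	return ret;
}
```

Running `toy_run` first with a large mode and then with 1024x768 returns -ENOSPC on the second run when `cleanup` is false, and 0 on both runs when it is true.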

So this is actually an IGT issue. Another problem is the "Invalid EDID" being read from the display; however, drm seems to act as expected here.
Comment 6 CI Bug Log 2019-02-20 12:11:01 UTC
A CI Bug Log filter associated to this bug has been updated:

{- SKL: igt@kms_atomic_transition@plane-all-modeset-transition* - fail - Last errno: 28, No space left on device -}
{+ All machines: igt@kms_atomic_transition@plane-all-modeset-transition* - fail - Failed assertion: ret == 0, Last errno: 28, No space left on device +}

 No new failures caught with the new filter
Comment 7 CI Bug Log 2019-02-20 12:11:24 UTC
A CI Bug Log filter associated to this bug has been updated:

{- All machines: igt@kms_atomic_transition@plane-all-modeset-transition* - fail - Failed assertion: ret == 0, Last errno: 28, No space left on device -}
{+ All machines: igt@kms_atomic_transition@plane-all-modeset-transition* - fail - Failed assertion: ret == 0, Last errno: 28, No space left on device +}

 No new failures caught with the new filter
Comment 8 Martin Peres 2019-02-20 12:13:35 UTC
(In reply to Stanislav Lisovskiy from comment #4)
> All the recent issues seem to be either skips or dmesg-warn due to FIFO
> underrun:
> 
> http://gfx-ci.fi.intel.com/cibuglog-ng/issue/1015/history
> 
> For example:
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_209/fi-icl-u3/
> igt@kms_atomic_transition@plane-all-modeset-transition-fencing.html
> 
> It looks like the filter is wrong, since in most of those results there was
> no -ENOSPC error, for example:
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_209/fi-icl-u2/
> igt@kms_atomic_transition@plane-all-transition-nonblocking.html

Sorry, the filter was definitely not filed properly... It was matching every failure happening on any of the igt@kms_atomic_transition@plane-all-modeset-transition* tests...

Sorry about that!
Comment 9 Petri Latvala 2019-03-06 12:57:45 UTC
commit 91908d36d0d5c90eea86e29736d2748d5ec55335
Author: Stanislav Lisovskiy <stanislav.lisovskiy@gmail.com>
Date:   Tue Feb 19 11:38:00 2019 +0200

    igt/tests: Fix error checking in kms_atomic_transition
Comment 10 Martin Peres 2019-03-06 18:54:22 UTC
(In reply to Petri Latvala from comment #9)
> commit 91908d36d0d5c90eea86e29736d2748d5ec55335
> Author: Stanislav Lisovskiy <stanislav.lisovskiy@gmail.com>
> Date:   Tue Feb 19 11:38:00 2019 +0200
> 
>     igt/tests: Fix error checking in kms_atomic_transition

Still happening pretty much every single drmtip run:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_236/fi-skl-6700k2/igt@kms_atomic_transition@plane-all-transition-nonblocking.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_236/fi-hsw-4770r/igt@kms_atomic_transition@plane-all-transition-nonblocking.html
Comment 11 Stanislav Lisovskiy 2019-03-07 07:50:59 UTC
This happens only on machines which have this "i915 0000:00:02.0: HDMI-A-1: EDID is invalid:" message, which makes them fall back to the 1024x768 mode (which means the fb is smaller than usual). There is probably still a bug in the tests with the same root cause (-ENOSPC is returned only when the plane size happens to be larger than the fb).
Comment 12 Stanislav Lisovskiy 2019-03-13 11:15:19 UTC
(In reply to Stanislav Lisovskiy from comment #11)
> This happens only on machines which have this "i915 0000:00:02.0: HDMI-A-1:
> EDID is invalid:" message, which makes them fall back to the 1024x768 mode
> (which means the fb is smaller than usual). There is probably still a bug in
> the tests with the same root cause (-ENOSPC is returned only when the plane
> size happens to be larger than the fb).

(In reply to Martin Peres from comment #10)
> (In reply to Petri Latvala from comment #9)
> > commit 91908d36d0d5c90eea86e29736d2748d5ec55335
> > Author: Stanislav Lisovskiy <stanislav.lisovskiy@gmail.com>
> > Date:   Tue Feb 19 11:38:00 2019 +0200
> > 
> >     igt/tests: Fix error checking in kms_atomic_transition
> 
> Still happening pretty much every single drmtip run:
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_236/fi-skl-6700k2/
> igt@kms_atomic_transition@plane-all-transition-nonblocking.html
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_236/fi-hsw-4770r/
> igt@kms_atomic_transition@plane-all-transition-nonblocking.html

Looks like this was still without that change. I checked that the latest runs are not failing on this machine, and the line number where the commit fails doesn't seem to match the code after that change.

I think those are not happening anymore. Also, in those failures it looks like my IGT patch was not applied, because line 501 in the stack trace (the igt_display_commit2 call which fails with ENOSPC) corresponds to an older code version, i.e. it can't be 501 with the latest change (line 501 with the last commit corresponds to completely different code).
Comment 13 Stanislav Lisovskiy 2019-03-18 07:17:15 UTC
Ping. See above message once again - proposing to close this bug.
Comment 14 Lakshmi 2019-03-18 08:56:22 UTC
This issue used to appear in every drmtip run. Last seen in drmtip_236 (1 week, 6 days / 187 runs ago).
Let's wait for 2 more drmtip runs and close if no failures are seen.
Comment 15 Stanislav Lisovskiy 2019-03-18 09:03:07 UTC
I think it stopped appearing right after my patch. In the link which Martin posted, the stack trace is still from the old IGT code (with my changes, there is no igt_display_commit at line 501). So I'm afraid there was no reason to reopen it :D
Comment 16 Martin Peres 2019-03-18 13:55:13 UTC
(In reply to Stanislav Lisovskiy from comment #15)
> I think it stopped appearing right after my patch. In the link which Martin
> posted, the stack trace is still from the old IGT code (with my changes,
> there is no igt_display_commit at line 501). So I'm afraid there was no
> reason to reopen it :D

Here is the patch from Stan:

commit 91908d36d0d5c90eea86e29736d2748d5ec55335
Author:     Stanislav Lisovskiy <stanislav.lisovskiy@gmail.com>
AuthorDate: Tue Feb 19 11:38:00 2019 +0200
Commit:     Petri Latvala <petri.latvala@intel.com>
CommitDate: Wed Mar 6 14:53:51 2019 +0200

    igt/tests: Fix error checking in kms_atomic_transition
    
    There is no guarantee that error return value will be
    always EINVAL, made a check more general as it can be
    ERANGE, ENOSPC, EINVAL and probably others, which all
    mean the same in context of this test case: i.e this sprite
    size is not valid.
    
    v2: Added macro to make check look a bit nicer.
    v3: Removed redundant debug line.
    v4: Added assertion if error is not EINVAL as expected,
        other errors except EINVAL are considered now a failures.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109225
    Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
    Reviewed-by: Stuart Summers <stuart.summers@intel.com>
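The v4 behaviour described in the commit message can be sketched as a small classifier. This is only an illustration of the stated rule, not the code from the commit: `enum size_probe` and `classify_commit_result` are hypothetical names, and the sketch assumes the test tries a sprite size, treats -EINVAL as "this size is not valid", and treats any other error as a test failure.

```c
#include <errno.h>

/* Hypothetical outcome of probing one sprite size with a test commit. */
enum size_probe {
	SIZE_OK,         /* commit succeeded, the size is usable */
	SIZE_REJECTED,   /* -EINVAL: expected rejection, the size is not valid */
	SIZE_UNEXPECTED  /* any other error: treated as a test failure */
};

/* Map a commit return value onto the three cases from the v4 note. */
static enum size_probe classify_commit_result(int ret)
{
	if (ret == 0)
		return SIZE_OK;
	if (ret == -EINVAL)
		return SIZE_REJECTED;
	return SIZE_UNEXPECTED;
}
```

Under this rule, the -ENOSPC seen in this bug would be reported as an unexpected error rather than silently accepted as an invalid size.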


And the last failure was:

commit 27027cf078e5e8c4ced3b7d941890659e4adf1cd
Author:     Nischala Yelchuri <nischala.yelchuri@intel.com>
AuthorDate: Fri Mar 1 11:49:00 2019 -0800
Commit:     Chris Wilson <chris@chris-wilson.co.uk>
CommitDate: Sat Mar 2 20:25:43 2019 +0000

    tests/kms_cursor_legacy: Add missing munmap
    
    Added munmap and replaced hard-coded values with PAGE_SIZE macro.
    
    Cc: Easwar Hariharan <easwar.hariharan@intel.com>
    Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
    Signed-off-by: Nischala Yelchuri <nischala.yelchuri@intel.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Given that we had a 100% reproduction rate on drmtip and that the last failure was seen 10 runs ago (drmtip_236 and we are now at drmtip_246), I think it is safe to close it again!

Sorry for the noise, Stan!
Comment 17 CI Bug Log 2019-03-18 13:55:34 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.
