Bug 104097 - [CI][SHARDS] igt@pm_rpm@i2c - fail - Test assertion failure function test_i2c - Failed assertion: diff <= vga_outputs && diff >= 0
Summary: [CI][SHARDS] igt@pm_rpm@i2c - fail - Test assertion failure function test_i2c...
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high normal
Assignee: Imre Deak
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-12-05 08:00 UTC by Marta Löfstedt
Modified: 2019-06-11 05:40 UTC
CC List: 3 users

See Also:
i915 platform: BDW, BSW/CHT, BYT, CFL, CNL, HSW, KBL, SKL
i915 features: power/runtime PM


Description Marta Löfstedt 2017-12-05 08:00:28 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3449/shard-kbl1/igt@pm_rpm@i2c.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3450/shard-kbl7/igt@pm_rpm@i2c.html

(pm_rpm:1747) CRITICAL: Test assertion failure function test_i2c, file pm_rpm.c:650:
(pm_rpm:1747) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:1747) CRITICAL: Last errno: 6, No such device or address
Subtest i2c failed.
Comment 1 Octavio 2018-03-14 17:18:25 UTC
The test also fails on CFL QA 

 igt@pm_rpm@i2c

======================================
        Graphic stack
======================================

======================================
             Software
======================================
kernel version              : 4.16.0-rc5-drm-intel-qa-ww11-commit-307515c+
hostname                    : CFL-2
architecture                : x86_64
os version                  : Ubuntu 17.10
os codename                 : artful
kernel driver               : i915
bios revision               : 118.9
bios release date           : 01/12/2018
ksc                         : 1.5
hardware acceleration       : disabled
swap partition              : enabled on (/dev/sda2)

======================================
        Graphic drivers
======================================
grep: /opt/X11R7/var/log/Xorg.0.log: No such file or directory
libdrm                      : 2.4.91
intel-gpu-tools (tag)       : intel-gpu-tools-1.21-211-g1bb3995e
intel-gpu-tools (commit)    : 1bb3995e

======================================
             Hardware
======================================
motherboard model          : CoffeeLakeClientPlatform
motherboard id             : CoffeeLakeSUDIMMRVP
form factor                : Desktop
manufacturer               : IntelCorporation
cpu family                 : Other
cpu family id              : 6
cpu information            : Genuine Intel(R) CPU 0000 @ 3.60GHz
gpu card                   : Intel Corporation Device 3e92 (prog-if 00 [VGA controller])
memory ram                 : 15.57 GB
max memory ram             : 32 GB
cpu thread                 : 12
cpu core                   : 6
cpu model                  : 158
cpu stepping               : 10
socket                     : Other
hard drive                 : 447GiB (480GB)
current cd clock frequency : 337500 kHz
maximum cd clock frequency : 675000 kHz
displays connected         : eDP-1 DP-1 DP-2

======================================
             Firmware
======================================
dmc fw loaded             : yes
dmc version               : 1.4
guc fw loaded             : fetch SUCCESS, load SUCCESS
guc version wanted        : wanted 9.39, found 9.39
guc version found         : wanted 9.39, found 9.39

======================================
             kernel parameters
======================================
quiet drm.debug=0x1e auto panic=1 nmi_watchdog=panic intel_iommu=igfx_off fsck.repair=yes i915.enable_guc=-1 i915.alpha_support=1 resume=/dev/sda2 fastboot


====================================== 
             output
======================================

(pm_rpm:11326) CRITICAL: Test assertion failure function test_i2c, file pm_rpm.c:650:
(pm_rpm:11326) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:11326) CRITICAL: Last errno: 6, No such device or address
Subtest i2c failed.
**** DEBUG ****
(pm_rpm:11326) DEBUG: Test requirement passed: modprobe("i2c-dev") == 0
(pm_rpm:11326) DEBUG: Test requirement passed: i2c_dev_files
(pm_rpm:11326) DEBUG: Test requirement passed: enable_one_screen_with_type(data, SCREEN_TYPE_ANY)
(pm_rpm:11326) igt-pm-DEBUG: igt_get_runtime_pm_status() == status took 0ms
(pm_rpm:11326) igt-pm-DEBUG: igt_get_runtime_pm_status() == status took 95ms
(pm_rpm:11326) DEBUG: i2c edids:2 drm edids:3 vga outputs:0
(pm_rpm:11326) CRITICAL: Test assertion failure function test_i2c, file pm_rpm.c:650:
(pm_rpm:11326) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:11326) CRITICAL: Last errno: 6, No such device or address
(pm_rpm:11326) igt-core-INFO: Stack trace:
(pm_rpm:11326) igt-core-INFO:   #0 [__igt_fail_assert+0x101]
(pm_rpm:11326) igt-core-INFO:   #1 [main+0x1376]
(pm_rpm:11326) igt-core-INFO:   #2 [__libc_start_main+0xf1]
(pm_rpm:11326) igt-core-INFO:   #3 [_start+0x29]
(pm_rpm:11326) igt-core-INFO:   #4 [<unknown>+0x29]
****  END  ****
Comment 2 Marta Löfstedt 2018-03-16 08:59:58 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_1/fi-kbl-7500u/igt@pm_rpm@i2c.html

(pm_rpm:1319) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:1319) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:1319) CRITICAL: Last errno: 6, No such device or address
Subtest i2c failed.
Comment 3 Marta Löfstedt 2018-03-19 09:07:18 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_2/fi-cfl-s2/igt@pm_rpm@i2c.html

(pm_rpm:1429) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:1429) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:1429) CRITICAL: Last errno: 121, Remote I/O error
Subtest i2c failed.
Comment 4 Marta Löfstedt 2018-03-19 15:21:03 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_4/fi-bdw-5557u/igt@pm_rpm@i2c.html

(pm_rpm:1384) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:1384) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:1384) CRITICAL: Last errno: 6, No such device or address
Subtest i2c failed.
Comment 5 Marta Löfstedt 2018-03-27 09:45:38 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_7/fi-cfl-s3/igt@pm_rpm@i2c.html

(pm_rpm:2044) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:2044) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:2044) CRITICAL: Last errno: 110, Connection timed out
Subtest i2c failed.
Comment 6 Marta Löfstedt 2018-04-03 12:53:15 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_11/fi-cnl-y3/igt@pm_rpm@i2c.html

(pm_rpm:2578) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:2578) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:2578) CRITICAL: Last errno: 121, Remote I/O error
Comment 8 Martin Peres 2018-05-03 16:22:04 UTC
Also seen on HSW: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_30/fi-hsw-4770r/igt@pm_rpm@i2c.html

(pm_rpm:2016) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:2016) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:2016) CRITICAL: Last errno: 6, No such device or address
Subtest i2c failed.
Comment 9 Martin Peres 2018-11-16 15:54:08 UTC
Also seen on ICL: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5150/shard-iclb4/igt@pm_rpm@i2c.html

Starting subtest: i2c
(pm_rpm:2372) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:681:
(pm_rpm:2372) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:2372) CRITICAL: Last errno: 121, Remote I/O error
Subtest i2c failed.
Comment 10 CI Bug Log 2019-02-08 14:31:30 UTC
A CI Bug Log filter associated to this bug has been updated:

{- HSW BSW BDW SKL KBL CFL GLK CNL : pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 -}
{+ BYT HSW BSW BDW SKL KBL CFL GLK CNL : pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_211/fi-byt-clapper/igt@pm_rpm@i2c.html
Comment 11 CI Bug Log 2019-03-13 10:32:22 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW BSW BDW SKL KBL CFL GLK CNL : pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 -}
{+ BYT HSW BSW BDW SKL KBL CFL GLK CNL : pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_241/fi-hsw-peppy/igt@i915_pm_rpm@i2c.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_242/fi-hsw-peppy/igt@i915_pm_rpm@i2c.html
Comment 12 Imre Deak 2019-04-05 17:11:37 UTC
On fi-skl-lmem and fi-hsw-4770r, the EDID read from the monitor is consistently corrupted in the same way, so I think the EDID memory in the monitor itself is corrupted. We could either replace the monitor or try to fix and reflash the EDID.
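Imre's diagnosis is that the EDID stored in the monitor is itself damaged. For reference, an EDID 1.x base block is 128 bytes long, begins with the fixed header 00 FF FF FF FF FF FF 00, and ends with a checksum byte chosen so that all 128 bytes sum to 0 modulo 256. The sketch below applies just those two checks; `edid_block_looks_valid()` is a hypothetical helper, not IGT's own validator, which may check more or less than this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EDID_BLOCK_LEN 128

/* Fixed 8-byte pattern at the start of every EDID base block. */
static const uint8_t edid_header[8] = {
	0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00
};

/*
 * Hypothetical helper (not IGT's validator): true if the 128-byte block
 * starts with the EDID header and all of its bytes sum to 0 modulo 256,
 * which is what the trailing checksum byte is meant to guarantee.
 */
static bool edid_block_looks_valid(const uint8_t *block)
{
	uint8_t sum = 0;

	if (memcmp(block, edid_header, sizeof(edid_header)) != 0)
		return false;

	/* uint8_t arithmetic wraps, so this is the sum modulo 256. */
	for (size_t i = 0; i < EDID_BLOCK_LEN; i++)
		sum += block[i];

	return sum == 0;
}
```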
Comment 13 Jani Saarinen 2019-04-22 14:55:29 UTC
Also seen on ICL quite regularly. Imre, please check.
Comment 14 James Ausmus 2019-04-24 23:47:30 UTC
Not a customer impacting bug on ICL - DRM is successfully getting the proper EDIDs, it's just the direct i2c EDID gathering method that is failing. DRM has the necessary EDID info, so no user impact on the functionality of i915.
Comment 15 Arek Hiler 2019-04-30 05:10:02 UTC
Indeed, on ICL we have seen only the (i2c edids = drm edids - 1) failure mode, e.g.:
    (i915_pm_rpm:12284) DEBUG: i2c edids:1 drm edids:2 vga outputs:0

On other machines we see:
    (i915_pm_rpm:1186) DEBUG: i2c edids:1 drm edids:0 vga outputs:0

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_270/fi-hsw-4770r/igt@i915_pm_rpm@i2c.html
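Both failure modes can be checked against the assertion directly. The sketch below assumes the test computes `diff = drm_edids - i2c_edids` (an assumption, but the only reading consistent with the debug lines quoted in this bug) and tolerates up to `vga_outputs` EDIDs that DRM sees but the i2c scan misses. `i2c_check_passes()` is a hypothetical name.

```c
#include <stdbool.h>

/*
 * Hypothetical restatement of the failing assertion. Assumes
 * diff = drm_edids - i2c_edids: DRM may report up to vga_outputs
 * more EDIDs than the i2c scan, but never fewer.
 */
static bool i2c_check_passes(int i2c_edids, int drm_edids, int vga_outputs)
{
	int diff = drm_edids - i2c_edids;

	return diff <= vga_outputs && diff >= 0;
}
```

With the counts logged above, the ICL case (i2c 1, drm 2, vga 0) gives diff = 1 and trips the upper bound, while the other machines (i2c 1, drm 0, vga 0) give diff = -1 and trip the lower bound. The CFL QA log in comment 1 (i2c 2, drm 3, vga 0) is the same kind of failure as ICL.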

The current implementation is rather naive: it walks /dev/i2c-* and tries to read out an EDID from each device:
         while ((dirent = readdir(dir))) {
                 if (strncmp(dirent->d_name, "i2c-", 4) == 0) {
                         sprintf(full_name, "/dev/%s", dirent->d_name);
                         fd = open(full_name, O_RDWR);
                         igt_assert_neq(fd, -1);
                         if (i2c_edid_is_valid(fd))
                                 ret++;
                         close(fd);
                 }
         }
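`i2c_edid_is_valid()` is not shown here. One common way to fetch an EDID from userspace through i2c-dev is a combined `I2C_RDWR` transfer: a one-byte write of the start offset to the DDC EEPROM address 0x50, followed by a 128-byte read. The sketch below does that; `read_edid_block()` is a hypothetical helper and may differ from what IGT actually does. If the transfer fails, `ioctl()` sets `errno`, and ENXIO (6), EREMOTEIO (121) and ETIMEDOUT (110), the values in the "Last errno" lines above, are plausible results of a failed DDC read. Note, though, that "Last errno" only reports whatever `errno` held when the assertion fired, so it is not guaranteed to come from the EDID read.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/i2c.h>
#include <linux/i2c-dev.h>

#define DDC_EDID_ADDR   0x50	/* standard DDC address of the EDID EEPROM */
#define EDID_BLOCK_LEN  128

/*
 * Hypothetical helper: read the first 128-byte EDID block from an
 * already-open /dev/i2c-N file descriptor.
 * Returns 0 on success or -errno if the transfer failed.
 */
static int read_edid_block(int fd, uint8_t edid[EDID_BLOCK_LEN])
{
	uint8_t offset = 0;
	struct i2c_msg msgs[2] = {
		{	/* set the EEPROM read pointer to offset 0 */
			.addr = DDC_EDID_ADDR,
			.flags = 0,
			.len = 1,
			.buf = &offset,
		},
		{	/* then read one EDID block back */
			.addr = DDC_EDID_ADDR,
			.flags = I2C_M_RD,
			.len = EDID_BLOCK_LEN,
			.buf = edid,
		},
	};
	struct i2c_rdwr_ioctl_data xfer = {
		.msgs = msgs,
		.nmsgs = 2,
	};

	if (ioctl(fd, I2C_RDWR, &xfer) < 0)
		return -errno;

	return 0;
}
```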

The drmModeRes part is no more sophisticated.


I think we should change the implementation a bit and add extra logging around this: use /sys/class/drm/card?-*/i2c-* to do the readouts, and compare EDIDs on a per-connector basis, printing out both if one is corrupted. This may tell us something about particular screens or connectors that are troublesome.
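A minimal sketch of the per-connector comparison proposed above, assuming the i2c readout and the DRM EDID blob for the same connector have already been paired up (for example via the /sys/class/drm/card?-*/i2c-* entries mentioned in the comment). `compare_connector_edids()` and `dump_edid()` are hypothetical names; the point is only that a mismatch reports the connector and prints both copies, so a corrupted byte pattern becomes visible in the log.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Print an EDID as rows of 16 hex bytes. */
static void dump_edid(const char *label, const uint8_t *edid, size_t len)
{
	printf("%s:\n", label);
	for (size_t i = 0; i < len; i++)
		printf("%02x%s", edid[i],
		       (i % 16 == 15 || i == len - 1) ? "\n" : " ");
}

/*
 * Hypothetical per-connector check: returns the number of differing
 * bytes and, if there are any, dumps both copies for inspection.
 */
static size_t compare_connector_edids(const char *connector,
				      const uint8_t *i2c_edid,
				      const uint8_t *drm_edid, size_t len)
{
	size_t mismatches = 0;

	for (size_t i = 0; i < len; i++)
		if (i2c_edid[i] != drm_edid[i])
			mismatches++;

	if (mismatches) {
		printf("%s: %zu of %zu EDID bytes differ\n",
		       connector, mismatches, len);
		dump_edid("i2c", i2c_edid, len);
		dump_edid("drm", drm_edid, len);
	}

	return mismatches;
}
```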
Comment 16 Oleg Vasilev 2019-05-07 11:04:44 UTC
I've submitted a patch with more verbose output, as Arek suggested:

https://patchwork.freedesktop.org/series/60357/
Comment 17 Arek Hiler 2019-06-11 05:40:56 UTC
(In reply to James Ausmus from comment #14)
> Not a customer impacting bug on ICL - DRM is successfully getting the proper
> EDIDs, it's just the direct i2c EDID gathering method that is failing. DRM
> has the necessary EDID info, so no user impact on the functionality of i915.

Indeed on ICL we only see:
DEBUG: i2c edids:1 drm edids:2 vga outputs:0  

Also, the bug has not been seen on ICL in 3 weeks (since CI_DRM_6085). Before that, the reproduction rate was about 1 in every 5 CI_DRM runs. We are now at CI_DRM_6225, more than 5 × 10 = 50 runs since the last occurrence, so I'm taking the ICL tag out.
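The numbers in this comment can be made concrete. Assuming each CI run reproduces the ICL failure independently with probability about 1/5 (the stated rate; independence is an assumption), the gap since the last occurrence, the expected number of failures over it, and the chance of seeing none at all are:

$$
\begin{aligned}
N &= 6225 - 6085 = 140 \text{ runs},\\
\mathbb{E}[\text{failures}] &= 140 \cdot \tfrac{1}{5} = 28,\\
P(\text{no failures}) &= \left(1 - \tfrac{1}{5}\right)^{140} = 0.8^{140} \approx 2.7 \times 10^{-14}.
\end{aligned}
$$

Under that assumption, 140 clean runs are very unlikely to be luck if the rate had stayed at 1 in 5.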


On all other platforms, some occurrences go the other way around:
(i915_pm_rpm:1086) DEBUG: i2c edids:1 drm edids:0 vga outputs:0

Oleg's patches will give us more detail here: the kernel side ties the i2c devices to their connectors, and the test then logs more about each failure (which connector, and which part failed: the readout, or a mismatch?) and dumps the raw values.
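The questions raised in this comment (did the readout fail, or is there a mismatch?) amount to classifying each connector. A sketch, assuming each side yields either an EDID buffer or nothing for a given connector; the enum and `classify_connector()` are hypothetical and not the names used in Oleg's patches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-connector outcomes; not the names used in the patches. */
enum edid_check_result {
	EDID_MATCH,	/* both sides returned the same EDID */
	EDID_NONE,	/* neither side returned an EDID */
	EDID_ONLY_DRM,	/* DRM has an EDID, the i2c readout produced none */
	EDID_ONLY_I2C,	/* the i2c readout produced an EDID, DRM has none */
	EDID_MISMATCH,	/* both returned an EDID, but the bytes differ */
};

/* A NULL pointer or zero length means "no EDID from this side". */
static enum edid_check_result
classify_connector(const unsigned char *i2c_edid, size_t i2c_len,
		   const unsigned char *drm_edid, size_t drm_len)
{
	bool have_i2c = i2c_edid != NULL && i2c_len > 0;
	bool have_drm = drm_edid != NULL && drm_len > 0;

	if (!have_i2c && !have_drm)
		return EDID_NONE;
	if (!have_i2c)
		return EDID_ONLY_DRM;
	if (!have_drm)
		return EDID_ONLY_I2C;
	if (i2c_len != drm_len || memcmp(i2c_edid, drm_edid, i2c_len) != 0)
		return EDID_MISMATCH;
	return EDID_MATCH;
}
```

Under this scheme, the ICL pattern (i2c 1, drm 2) would surface as EDID_ONLY_DRM on one connector, and the pattern on other machines (i2c 1, drm 0) as EDID_ONLY_I2C, provided the extra i2c EDID can be tied to a connector at all.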

Keeping the bug at high priority, since DRM failing to read your monitor's EDID correctly is a serious problem for users.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.