Bug 104097 - [CI][SHARDS] igt@pm_rpm@i2c - fail - Failed assertion: diff <= vga_outputs && diff >= 0 / !(edid_mistmach_i2c_vs_drm)
Summary: [CI][SHARDS] igt@pm_rpm@i2c - fail - Failed assertion: diff <= vga_outputs &&...
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high normal
Assignee: Imre Deak
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-12-05 08:00 UTC by Marta Löfstedt
Modified: 2019-07-19 12:44 UTC
CC List: 3 users

See Also:
i915 platform: BDW, BSW/CHT, BYT, CFL, CNL, HSW, KBL, SKL
i915 features: power/runtime PM


Description Marta Löfstedt 2017-12-05 08:00:28 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3449/shard-kbl1/igt@pm_rpm@i2c.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3450/shard-kbl7/igt@pm_rpm@i2c.html

(pm_rpm:1747) CRITICAL: Test assertion failure function test_i2c, file pm_rpm.c:650:
(pm_rpm:1747) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:1747) CRITICAL: Last errno: 6, No such device or address
Subtest i2c failed.
Comment 1 Octavio 2018-03-14 17:18:25 UTC
The test also fails on CFL QA 

 igt@pm_rpm@i2c

======================================
        Graphic stack
======================================

======================================
             Software
======================================
kernel version              : 4.16.0-rc5-drm-intel-qa-ww11-commit-307515c+
hostname                    : CFL-2
architecture                : x86_64
os version                  : Ubuntu 17.10
os codename                 : artful
kernel driver               : i915
bios revision               : 118.9
bios release date           : 01/12/2018
ksc                         : 1.5
hardware acceleration       : disabled
swap partition              : enabled on (/dev/sda2)

======================================
        Graphic drivers
======================================
grep: /opt/X11R7/var/log/Xorg.0.log: No such file or directory
libdrm                      : 2.4.91
intel-gpu-tools (tag)       : intel-gpu-tools-1.21-211-g1bb3995e
intel-gpu-tools (commit)    : 1bb3995e

======================================
             Hardware
======================================
motherboard model          : CoffeeLakeClientPlatform
motherboard id             : CoffeeLakeSUDIMMRVP
form factor                : Desktop
manufacturer               : IntelCorporation
cpu family                 : Other
cpu family id              : 6
cpu information            : Genuine Intel(R) CPU 0000 @ 3.60GHz
gpu card                   : Intel Corporation Device 3e92 (prog-if 00 [VGA controller])
memory ram                 : 15.57 GB
max memory ram             : 32 GB
cpu thread                 : 12
cpu core                   : 6
cpu model                  : 158
cpu stepping               : 10
socket                     : Other
hard drive                 : 447GiB (480GB)
current cd clock frequency : 337500 kHz
maximum cd clock frequency : 675000 kHz
displays connected         : eDP-1 DP-1 DP-2

======================================
             Firmware
======================================
dmc fw loaded             : yes
dmc version               : 1.4
guc fw loaded             : fetch SUCCESS, load SUCCESS
guc version wanted        : wanted 9.39, found 9.39
guc version found         : wanted 9.39, found 9.39

======================================
             kernel parameters
======================================
quiet drm.debug=0x1e auto panic=1 nmi_watchdog=panic intel_iommu=igfx_off fsck.repair=yes i915.enable_guc=-1 i915.alpha_support=1 resume=/dev/sda2 fastboot


====================================== 
             output
======================================

(pm_rpm:11326) CRITICAL: Test assertion failure function test_i2c, file pm_rpm.c:650:
(pm_rpm:11326) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:11326) CRITICAL: Last errno: 6, No such device or address
Subtest i2c failed.
**** DEBUG ****
(pm_rpm:11326) DEBUG: Test requirement passed: modprobe("i2c-dev") == 0
(pm_rpm:11326) DEBUG: Test requirement passed: i2c_dev_files
(pm_rpm:11326) DEBUG: Test requirement passed: enable_one_screen_with_type(data, SCREEN_TYPE_ANY)
(pm_rpm:11326) igt-pm-DEBUG: igt_get_runtime_pm_status() == status took 0ms
(pm_rpm:11326) igt-pm-DEBUG: igt_get_runtime_pm_status() == status took 95ms
(pm_rpm:11326) DEBUG: i2c edids:2 drm edids:3 vga outputs:0
(pm_rpm:11326) CRITICAL: Test assertion failure function test_i2c, file pm_rpm.c:650:
(pm_rpm:11326) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:11326) CRITICAL: Last errno: 6, No such device or address
(pm_rpm:11326) igt-core-INFO: Stack trace:
(pm_rpm:11326) igt-core-INFO:   #0 [__igt_fail_assert+0x101]
(pm_rpm:11326) igt-core-INFO:   #1 [main+0x1376]
(pm_rpm:11326) igt-core-INFO:   #2 [__libc_start_main+0xf1]
(pm_rpm:11326) igt-core-INFO:   #3 [_start+0x29]
(pm_rpm:11326) igt-core-INFO:   #4 [<unknown>+0x29]
****  END  ****
Comment 2 Marta Löfstedt 2018-03-16 08:59:58 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_1/fi-kbl-7500u/igt@pm_rpm@i2c.html

(pm_rpm:1319) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:1319) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:1319) CRITICAL: Last errno: 6, No such device or address
Subtest i2c failed.
Comment 3 Marta Löfstedt 2018-03-19 09:07:18 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_2/fi-cfl-s2/igt@pm_rpm@i2c.html

(pm_rpm:1429) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:1429) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:1429) CRITICAL: Last errno: 121, Remote I/O error
Subtest i2c failed.
Comment 4 Marta Löfstedt 2018-03-19 15:21:03 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_4/fi-bdw-5557u/igt@pm_rpm@i2c.html

(pm_rpm:1384) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:1384) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:1384) CRITICAL: Last errno: 6, No such device or address
Subtest i2c failed.
Comment 5 Marta Löfstedt 2018-03-27 09:45:38 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_7/fi-cfl-s3/igt@pm_rpm@i2c.html

(pm_rpm:2044) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:2044) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:2044) CRITICAL: Last errno: 110, Connection timed out
Subtest i2c failed.
Comment 6 Marta Löfstedt 2018-04-03 12:53:15 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_11/fi-cnl-y3/igt@pm_rpm@i2c.html

(pm_rpm:2578) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:2578) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:2578) CRITICAL: Last errno: 121, Remote I/O error
Comment 8 Martin Peres 2018-05-03 16:22:04 UTC
Also seen on HSW: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_30/fi-hsw-4770r/igt@pm_rpm@i2c.html

(pm_rpm:2016) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:2016) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:2016) CRITICAL: Last errno: 6, No such device or address
Subtest i2c failed.
Comment 9 Martin Peres 2018-11-16 15:54:08 UTC
Also seen on ICL: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5150/shard-iclb4/igt@pm_rpm@i2c.html

Starting subtest: i2c
(pm_rpm:2372) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:681:
(pm_rpm:2372) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:2372) CRITICAL: Last errno: 121, Remote I/O error
Subtest i2c failed.
Comment 10 CI Bug Log 2019-02-08 14:31:30 UTC
A CI Bug Log filter associated to this bug has been updated:

{- HSW BSW BDW SKL KBL CFL GLK CNL : pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 -}
{+ BYT HSW BSW BDW SKL KBL CFL GLK CNL : pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_211/fi-byt-clapper/igt@pm_rpm@i2c.html
Comment 11 CI Bug Log 2019-03-13 10:32:22 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW BSW BDW SKL KBL CFL GLK CNL : pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 -}
{+ BYT HSW BSW BDW SKL KBL CFL GLK CNL : pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_241/fi-hsw-peppy/igt@i915_pm_rpm@i2c.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_242/fi-hsw-peppy/igt@i915_pm_rpm@i2c.html
Comment 12 Imre Deak 2019-04-05 17:11:37 UTC
On fi-skl-lmem and fi-hsw-4770r the EDID read from the monitor is consistently corrupted in the same way. So I think the monitor's EDID memory is corrupted; we could change the monitor or try to fix and reflash the EDID.
Comment 13 Jani Saarinen 2019-04-22 14:55:29 UTC
Also seen on ICL quite consistently. Imre, please check.
Comment 14 James Ausmus 2019-04-24 23:47:30 UTC
Not a customer impacting bug on ICL - DRM is successfully getting the proper EDIDs, it's just the direct i2c EDID gathering method that is failing. DRM has the necessary EDID info, so no user impact on the functionality of i915.
Comment 15 Arek Hiler 2019-04-30 05:10:02 UTC
Indeed, on ICL we have seen only the (i2c edids = drm edids - 1) failure mode, e.g.:
    (i915_pm_rpm:12284) DEBUG: i2c edids:1 drm edids:2 vga outputs:0

On other machines we see:
    (i915_pm_rpm:1186) DEBUG: i2c edids:1 drm edids:0 vga outputs:0

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_270/fi-hsw-4770r/igt@i915_pm_rpm@i2c.html
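A minimal sketch of the check that trips, written as a standalone C function. It assumes, from the assertion text and the debug line, that the test computes diff as drm edids minus i2c edids; `i2c_check_passes` is a hypothetical name, not the IGT function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the failing assertion:
 * diff = drm_edids - i2c_edids must lie in [0, vga_outputs].
 * A positive diff is tolerated only up to the number of VGA outputs,
 * whose EDIDs the raw i2c method is expected to miss sometimes. */
static bool i2c_check_passes(int i2c_edids, int drm_edids, int vga_outputs)
{
	int diff = drm_edids - i2c_edids;

	assert(i2c_edids >= 0 && drm_edids >= 0 && vga_outputs >= 0);
	return diff <= vga_outputs && diff >= 0;
}
```

Plugging in the ICL numbers (i2c 1, drm 2, vga 0) gives diff = 1 > 0, while the other machines' numbers (i2c 1, drm 0, vga 0) give diff = -1 < 0. Both fail, but from opposite ends of the allowed range.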

The current implementation is rather naive: it goes through /dev/i2c-* and tries to read out an EDID from each device:
         while ((dirent = readdir(dir))) {
                 if (strncmp(dirent->d_name, "i2c-", 4) == 0) {
                         sprintf(full_name, "/dev/%s", dirent->d_name);
                         fd = open(full_name, O_RDWR);
                         igt_assert_neq(fd, -1);
                         if (i2c_edid_is_valid(fd))
                                 ret++;
                         close(fd);
                 }
         }
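The loop relies on `i2c_edid_is_valid()`, whose body is not quoted here. As a sketch of what the validation half of such a check typically involves, assuming it follows the standard EDID 1.x base-block rules (fixed 8-byte header, all 128 bytes summing to 0 modulo 256), it could look like this. `edid_block_is_valid` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EDID_BLOCK_SIZE 128

/* Fixed header every EDID 1.x base block starts with. */
static const unsigned char edid_header[8] = {
	0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00
};

/* Hypothetical check: the header must match and all 128 bytes must sum
 * to 0 modulo 256 (the last byte is the checksum). */
static bool edid_block_is_valid(const unsigned char *edid)
{
	unsigned char sum = 0; /* wraps modulo 256 by design */
	size_t i;

	assert(edid != NULL);
	if (memcmp(edid, edid_header, sizeof(edid_header)) != 0)
		return false;
	for (i = 0; i < EDID_BLOCK_SIZE; i++)
		sum += edid[i];
	return sum == 0;
}
```

A block that is a correct header followed mostly by 0xff, like the corrupted i2c readouts on shard-hsw1, will typically fail the checksum even though the header looks fine.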

The drmModeRes part is just as simple: it counts the connectors for which DRM reports a valid EDID.


I think we should change the implementation a bit and add extra logging around this: use /sys/class/drm/card?-*/i2c-* to do the readouts, compare EDIDs on a per-connector basis, and print out both in case one is corrupted. This may tell us something about particular screens or connectors that are troublesome.
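A minimal sketch of the proposed per-connector enumeration. It assumes the kernel exposes an `i2c-N` entry inside each connector's sysfs directory, which is what the suggested path implies; the exact layout is an assumption and depends on kernel support. `parse_connector_i2c` and `list_connector_i2c` are hypothetical names.

```c
#include <assert.h>
#include <glob.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Split ".../cardN-<connector>/i2c-M" into "<connector>" and "i2c-M".
 * Returns false if the path does not have that shape. */
static bool parse_connector_i2c(const char *path, char *conn, size_t conn_len,
				char *bus, size_t bus_len)
{
	const char *last, *start, *dash;

	assert(path && conn && bus);
	last = strrchr(path, '/');
	if (!last || strncmp(last + 1, "i2c-", 4) != 0)
		return false;

	/* Walk back to the start of the "cardN-<connector>" component. */
	start = last;
	while (start > path && start[-1] != '/')
		start--;
	if (strncmp(start, "card", 4) != 0)
		return false;

	/* The connector name follows the first '-' after "cardN". */
	dash = memchr(start, '-', (size_t)(last - start));
	if (!dash)
		return false;

	snprintf(conn, conn_len, "%.*s", (int)(last - dash - 1), dash + 1);
	snprintf(bus, bus_len, "%s", last + 1);
	return true;
}

/* Print each connector together with the /dev node to read it through. */
static void list_connector_i2c(void)
{
	glob_t g;
	size_t i;

	if (glob("/sys/class/drm/card?-*/i2c-*", 0, NULL, &g) == 0) {
		for (i = 0; i < g.gl_pathc; i++) {
			char conn[64], bus[32];

			if (parse_connector_i2c(g.gl_pathv[i], conn, sizeof(conn),
						bus, sizeof(bus)))
				printf("%s -> /dev/%s\n", conn, bus);
		}
	}
	globfree(&g);
}
```

From there, each /dev/i2c-N can be read and compared against the EDID DRM reports for the connector with the same name, so a mismatch can be attributed to a specific connector.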
Comment 16 Oleg Vasilev 2019-05-07 11:04:44 UTC
I've submitted a patch with more verbose output, as Arek suggested

https://patchwork.freedesktop.org/series/60357/
Comment 17 Arek Hiler 2019-06-11 05:40:56 UTC
(In reply to James Ausmus from comment #14)
> Not a customer impacting bug on ICL - DRM is successfully getting the proper
> EDIDs, it's just the direct i2c EDID gathering method that is failing. DRM
> has the necessary EDID info, so no user impact on the functionality of i915.

Indeed on ICL we only see:
DEBUG: i2c edids:1 drm edids:2 vga outputs:0  

Also, the bug has not been seen on ICL in 3 weeks (since CI_DRM_6085). Prior to that, the reproduction rate was ~1 in every 5 CI_DRMs. We are now at CI_DRM_6225, 140 runs after the last occurrence and well past 5x10=50 runs, so I'm taking the ICL tag out.


On everything else we have some occurrences happening the other way around:
(i915_pm_rpm:1086) DEBUG: i2c edids:1 drm edids:0 vga outputs:0

The patches made by Oleg will allow us to get more details on this by tying the i2c devices to connectors on the kernel side, and then the test logging more information about the failure - which connector, what part has failed (readout? do we have a mismatch?) and dump the raw values.
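To illustrate the kind of logging described above, here is a sketch of a per-connector comparison that dumps both EDIDs on a mismatch and prints NULL for a side whose readout failed. The function names and the exact output format are assumptions for illustration, not the code from Oleg's series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define EDID_BLOCK_SIZE 128

/* Print one EDID as "label: 0x.. 0x.. ...", or "label: NULL" when the
 * readout failed (edid == NULL). */
static void dump_edid(FILE *out, const char *label, const unsigned char *edid)
{
	int i;

	fprintf(out, "%s: ", label);
	if (!edid) {
		fprintf(out, "NULL\n");
		return;
	}
	for (i = 0; i < EDID_BLOCK_SIZE; i++)
		fprintf(out, "0x%02x ", edid[i]);
	fprintf(out, "\n");
}

/* Returns true when both sides agree (both missing, or byte-identical);
 * otherwise logs the connector and both dumps and returns false. */
static bool compare_connector_edids(FILE *out, const char *connector,
				    const unsigned char *i2c_edid,
				    const unsigned char *drm_edid)
{
	assert(out && connector);
	if (!i2c_edid && !drm_edid)
		return true;
	if (i2c_edid && drm_edid &&
	    memcmp(i2c_edid, drm_edid, EDID_BLOCK_SIZE) == 0)
		return true;

	fprintf(out, "Detected EDID mismatch on connector %s\n", connector);
	dump_edid(out, "i2c", i2c_edid);
	dump_edid(out, "drm", drm_edid);
	return false;
}
```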

Keeping the bug at high priority, as DRM not reading out your monitor's EDID correctly is a serious problem for users.
Comment 18 Arek Hiler 2019-06-18 06:41:05 UTC
With the history somewhat de-cluttered now that ICL seems to be fixed, we can take a look
at the new landscape of failures. Two machines fail most consistently
(on every [[idle run]]).


[[fi-hsw-4770r]]:
connector 76: type DP-1, status: connected
        physical dimensions: 0x0mm
        subpixel order: Unknown
        CEA rev: 0
        DPCD rev: 11
        audio support: no
        DP branch device present: yes
                Type: VGA
                ID: DpVga
                HW: 5.6
                SW: 1.48
        modes:
                "1024x768": 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
                "800x600": 60 40000 800 840 968 1056 600 601 605 628 0x40 0x5
                "800x600": 56 36000 800 824 896 1024 600 601 603 625 0x40 0x5
                "848x480": 60 33750 848 864 976 1088 480 486 494 517 0x40 0x5
                "640x480": 60 25175 640 656 752 800 480 490 492 525 0x40 0xa

This one seems to have a VGA dongle connected, but the machine has no native VGA.
Because of that, the test fails to realize it's VGA and does not ignore it.
The dongle is the most naive one possible, just a few pins connected through
resistors, so the EDID that DRM sees is faked by the DpVga HW to reflect
the default VGA modes. There is nothing on the other side when using i2c
directly. The test has to be fixed so that it is aware of DpVga.
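A sketch of making the VGA count DpVga-aware. It assumes the DP downstream port type is somehow available to the test (at this point it was not exposed to userspace); `struct conn_info`, the enums, and `count_vga_like_outputs` are hypothetical stand-ins, not libdrm types or IGT functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a connector; real code would fill
 * this from libdrm plus whatever interface exposes the DP downstream
 * port type. */
enum conn_type { CONN_VGA, CONN_DP, CONN_HDMI, CONN_EDP };
enum dfp_type { DFP_NONE, DFP_VGA, DFP_HDMI, DFP_DVI };

struct conn_info {
	enum conn_type type;
	enum dfp_type downstream; /* DFP_NONE if no DP branch device */
	int connected;
};

/* Count connected outputs whose EDID the raw i2c method may legitimately
 * miss: native VGA connectors plus DP connectors driving a VGA converter. */
static int count_vga_like_outputs(const struct conn_info *c, size_t n)
{
	size_t i;
	int count = 0;

	assert(c != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!c[i].connected)
			continue;
		if (c[i].type == CONN_VGA ||
		    (c[i].type == CONN_DP && c[i].downstream == DFP_VGA))
			count++;
	}
	return count;
}
```

With fi-hsw-4770r's DP-1 modelled as { CONN_DP, DFP_VGA, 1 }, the count becomes 1, so drm edids exceeding i2c edids by one would fall inside the tolerated range.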


[[fi-skl-lmem]]:
connector 86: type DP-1, status: connected
        physical dimensions: 0x0mm
        subpixel order: Unknown
        CEA rev: 0
        DPCD rev: 12
        audio support: no
        DP branch device present: yes
                Type: HDMI
                ID: 175IB0
                HW: 1.0
                SW: 7.32
                Max TMDS clock: 600000 kHz
                Max bpc: 12
        modes:
                "1024x768": 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
                "800x600": 60 40000 800 840 968 1056 600 601 605 628 0x40 0x5
                "800x600": 56 36000 800 824 896 1024 600 601 603 625 0x40 0x5
                "848x480": 60 33750 848 864 976 1088 480 486 494 517 0x40 0x5
                "640x480": 60 25175 640 656 752 800 480 490 492 525 0x40 0xa

This one has an HDMI dummy connected to the non-native (LSPcon) HDMI on the board.
The suspicious thing here is the modes: they are the default VGA ones.
The HDMI dummy may be a faulty one that fails to talk i2c, with the LSPcon HW
faking an EDID. I would advise replacing the dongle.
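Both connector dumps above show the same five modes, which are identified as the default VGA set. As a hedged sketch, a test could flag a connector whose mode list contains only those modes as a hint that its EDID may be synthesized rather than read from a real sink. The heuristic, the struct, and the function names are assumptions for illustration, not something the test currently does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct simple_mode {
	int hdisplay, vdisplay, vrefresh;
};

/* The five modes seen in both connector dumps above. */
static const struct simple_mode default_modes[] = {
	{ 1024, 768, 60 }, { 800, 600, 60 }, { 800, 600, 56 },
	{ 848, 480, 60 }, { 640, 480, 60 },
};

#define N_DEFAULT_MODES (sizeof(default_modes) / sizeof(default_modes[0]))

static bool mode_is_default(const struct simple_mode *m)
{
	size_t i;

	for (i = 0; i < N_DEFAULT_MODES; i++)
		if (default_modes[i].hdisplay == m->hdisplay &&
		    default_modes[i].vdisplay == m->vdisplay &&
		    default_modes[i].vrefresh == m->vrefresh)
			return true;
	return false;
}

/* Hypothetical heuristic: true if the list is non-empty and every mode
 * belongs to the default set, i.e. the sink reported nothing beyond it. */
static bool modes_look_synthesized(const struct simple_mode *modes, size_t n)
{
	size_t i;

	assert(modes != NULL || n == 0);
	if (n == 0)
		return false;
	for (i = 0; i < n; i++)
		if (!mode_is_default(&modes[i]))
			return false;
	return true;
}
```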


[idle run]:     https://intel-gfx-ci.01.org/#idle-runs
[fi-hsw-4770r]: https://intel-gfx-ci.01.org/hardware.html#fi-hsw-4770r
[fi-skl-lmem]:  https://intel-gfx-ci.01.org/hardware.html#fi-skl-lmem
Comment 19 CI Bug Log 2019-06-19 08:45:21 UTC
A CI Bug Log filter associated to this bug has been updated:

{- BYT HSW BSW BDW SKL KBL CFL GLK CNL : pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 -}
{+ all machines : igt@i915_pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 / !(edid_mistmach_i2c_vs_drm) +}


  No new failures caught with the new filter
Comment 20 Arek Hiler 2019-06-25 08:30:08 UTC
1. the patch has landed, more data on affected connectors incoming :-)
2. updated the bug title to reflect the new assertion
3. fi-skl-lmem HDMI dummy replaced, hopefully it will get better
4. Oleg was looking into exposing information about the DP downstream port to userspace, so we can ignore DpVga the same way we ignore native VGA

No progress on fixing, but a lot of progress on getting more data. Thanks Oleg!
Comment 21 Arek Hiler 2019-07-01 07:15:49 UTC
skl-lmem hasn't hit the issue again since the HDMI dummy was replaced.

Now, the biggest source of those failures are the HSW shards.

We can observe 3 failure modes:

1. corrupted EDID
(i915_pm_rpm:15800) CRITICAL: Detected EDID mismatch on connector HDMI-A-1
(i915_pm_rpm:15800) CRITICAL: i2c: 0x00 0xff 0xff 0xff 0xff 0xff 0xff 0x00 0x05 0xe3 0xcd 0x0c 0x07 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 
(i915_pm_rpm:15800) CRITICAL: drm: 0x00 0xff 0xff 0xff 0xff 0xff 0xff 0x00 0x24 0xf4 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x1d 0x01 0x03 0x80 0x34 0x1e 0x78 0x02 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x21 0x08 0x00 0xd1 0xc0 0x81 0xc0 0x61 0x40 0x45 0x40 0x31 0x40 0x01 0x01 0x01 0x01 0x01 0x01 0x02 0x3a 0x80 0x18 0x71 0x38 0x2d 0x40 0x58 0x2c 0x45 0x00 0x08 0x2c 0x21 0x00 0x00 0x18 0x00 0x00 0x00 0xfd 0x00 0x3b 0x3d 0x42 0x44 0x0f 0x00 0x0a 0x20 0x20 0x20 0x20 0x20 0x20 0x00 0x00 0x00 0xfc 0x00 0x49 0x47 0x54 0x0a 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01 0x0a 


2. failed to read EDID over i2c
(i915_pm_rpm:11717) DEBUG: I2C access failed with errno 6, No such device or address
(i915_pm_rpm:11717) CRITICAL: Detected EDID mismatch on connector HDMI-A-2
(i915_pm_rpm:11717) CRITICAL: i2c: NULL
(i915_pm_rpm:11717) CRITICAL: drm: 0x00 0xff 0xff 0xff 0xff 0xff 0xff 0x00 0x24 0xf4 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x1d 0x01 0x03 0x80 0x34 0x1e 0x78 0x02 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x21 0x08 0x00 0xd1 0xc0 0x81 0xc0 0x61 0x40 0x45 0x40 0x31 0x40 0x01 0x01 0x01 0x01 0x01 0x01 0x02 0x3a 0x80 0x18 0x71 0x38 0x2d 0x40 0x58 0x2c 0x45 0x00 0x08 0x2c 0x21 0x00 0x00 0x18 0x00 0x00 0x00 0xfd 0x00 0x3b 0x3d 0x42 0x44 0x0f 0x00 0x0a 0x20 0x20 0x20 0x20 0x20 0x20 0x00 0x00 0x00 0xfc 0x00 0x49 0x47 0x54 0x0a 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01 0x0a 


3. DRM failed to get EDID:
(i915_pm_rpm:14052) CRITICAL: Detected EDID mismatch on connector HDMI-A-1
(i915_pm_rpm:14052) CRITICAL: i2c: 0x00 0xff 0xff 0xff 0xff 0xff 0xff 0x00 0x05 0xe3 0xcd 0x0c 0x07 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 
(i915_pm_rpm:14052) CRITICAL: drm: NULL
<4> [3217.152221] i915 0000:00:02.0: HDMI-A-1: EDID is invalid:
<4> [3217.152223] 	[00] BAD  00 ff ff ff ff ff ff 00 ff ff ff ff ff ff ff ff
<4> [3217.152225] 	[00] BAD  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
<4> [3217.152226] 	[00] BAD  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
<4> [3217.152227] 	[00] BAD  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
<4> [3217.152228] 	[00] BAD  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
<4> [3217.152229] 	[00] BAD  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
<4> [3217.152230] 	[00] BAD  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
<4> [3217.152231] 	[00] BAD  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff


On shard-hsw1 we often see the EDID full of 0xff for both the DRM and the manual i2c read (cases 1 and 3); on the other shards we quite often fail the i2c read with "No such device or address" (case 2). I would advise swapping dongles between shard-hsw1 and one other HSW shard to see whether the other machine becomes plagued with 0xffs and hsw1 gets the sporadic "No such device or address". This would help us attribute the issue either to the machine or to a faulty dongle.
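The three failure modes above map directly onto which side produced data, so they can be told apart mechanically. A sketch, assuming each readout is either NULL (failed) or a 128-byte block; the enum and `classify_edid_mismatch` are hypothetical names.

```c
#include <string.h>

#define EDID_BLOCK_SIZE 128

enum edid_failure {
	EDID_OK,              /* both missing, or both identical */
	EDID_CORRUPTED,       /* both read, contents differ (case 1) */
	EDID_I2C_READ_FAILED, /* only DRM has an EDID (case 2) */
	EDID_DRM_READ_FAILED, /* only the raw i2c read returned data (case 3) */
};

static enum edid_failure classify_edid_mismatch(const unsigned char *i2c_edid,
						const unsigned char *drm_edid)
{
	if (!i2c_edid && !drm_edid)
		return EDID_OK;
	if (!i2c_edid)
		return EDID_I2C_READ_FAILED;
	if (!drm_edid)
		return EDID_DRM_READ_FAILED;
	return memcmp(i2c_edid, drm_edid, EDID_BLOCK_SIZE) == 0 ?
	       EDID_OK : EDID_CORRUPTED;
}
```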
Comment 22 Arek Hiler 2019-07-01 11:20:29 UTC
shard-hsw1 now has a new HDMI dongle and is suppressed; we'll see in a few days whether that solves any issues.
Comment 23 Arek Hiler 2019-07-19 12:44:36 UTC
shard-hsw1 seems to be better now, and behaves just like the other HSW.

The biggest contributor to the failures is VGA over DP. Oleg has proposed a series of patches that expose DP's downstream port type to userspace: https://patchwork.freedesktop.org/series/63027/

With this in place we will be able to treat DP->VGA as VGA (i.e. ignore them). Then we can focus on all the other failures. I have a funny feeling that HSWs are going to be quite problematic.

