Summary: | [CI][SHARDS] igt@pm_rpm@i2c - fail - Failed assertion: diff <= vga_outputs && diff >= 0 / !(edid_mistmach_i2c_vs_drm) | ||
---|---|---|---|
Product: | DRI | Reporter: | Marta Löfstedt <marta.lofstedt> |
Component: | DRM/Intel | Assignee: | Imre Deak <imre.deak> |
Status: | RESOLVED MOVED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | high | CC: | arkadiusz.hiler, intel-gfx-bugs, sudeep.dutt |
Version: | DRI git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | BDW, BSW/CHT, BYT, CFL, CNL, HSW, KBL, SKL | i915 features: | power/runtime PM |
Description
Marta Löfstedt
2017-12-05 08:00:28 UTC
The test also fails on CFL QA

igt@pm_rpm@i2c

======================================
Graphic stack
======================================

======================================
Software
======================================
kernel version : 4.16.0-rc5-drm-intel-qa-ww11-commit-307515c+
hostname : CFL-2
architecture : x86_64
os version : Ubuntu 17.10
os codename : artful
kernel driver : i915
bios revision : 118.9
bios release date : 01/12/2018
ksc : 1.5
hardware acceleration : disabled
swap partition : enabled on (/dev/sda2)

======================================
Graphic drivers
======================================
grep: /opt/X11R7/var/log/Xorg.0.log: No such file or directory
libdrm : 2.4.91
intel-gpu-tools (tag) : intel-gpu-tools-1.21-211-g1bb3995e
intel-gpu-tools (commit) : 1bb3995e

======================================
Hardware
======================================
motherboard model : CoffeeLakeClientPlatform
motherboard id : CoffeeLakeSUDIMMRVP
form factor : Desktop
manufacturer : IntelCorporation
cpu family : Other
cpu family id : 6
cpu information : Genuine Intel(R) CPU 0000 @ 3.60GHz
gpu card : Intel Corporation Device 3e92 (prog-if 00 [VGA controller])
memory ram : 15.57 GB
max memory ram : 32 GB
cpu thread : 12
cpu core : 6
cpu model : 158
cpu stepping : 10
socket : Other
hard drive : 447GiB (480GB)
current cd clock frequency : 337500 kHz
maximum cd clock frequency : 675000 kHz
displays connected : eDP-1 DP-1 DP-2

======================================
Firmware
======================================
dmc fw loaded : yes
dmc version : 1.4
guc fw loaded : fetch SUCCESS, load SUCCESS
guc version wanted : wanted 9.39, found 9.39
guc version found : wanted 9.39, found 9.39

======================================
kernel parameters
======================================
quiet drm.debug=0x1e auto panic=1 nmi_watchdog=panic intel_iommu=igfx_off fsck.repair=yes i915.enable_guc=-1 i915.alpha_support=1 resume=/dev/sda2 fastboot

======================================
output
======================================
(pm_rpm:11326) CRITICAL: Test assertion failure function test_i2c, file pm_rpm.c:650:
(pm_rpm:11326) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:11326) CRITICAL: Last errno: 6, No such device or address
Subtest i2c failed.
**** DEBUG ****
(pm_rpm:11326) DEBUG: Test requirement passed: modprobe("i2c-dev") == 0
(pm_rpm:11326) DEBUG: Test requirement passed: i2c_dev_files
(pm_rpm:11326) DEBUG: Test requirement passed: enable_one_screen_with_type(data, SCREEN_TYPE_ANY)
(pm_rpm:11326) igt-pm-DEBUG: igt_get_runtime_pm_status() == status took 0ms
(pm_rpm:11326) igt-pm-DEBUG: igt_get_runtime_pm_status() == status took 95ms
(pm_rpm:11326) DEBUG: i2c edids:2 drm edids:3 vga outputs:0
(pm_rpm:11326) CRITICAL: Test assertion failure function test_i2c, file pm_rpm.c:650:
(pm_rpm:11326) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:11326) CRITICAL: Last errno: 6, No such device or address
(pm_rpm:11326) igt-core-INFO: Stack trace:
(pm_rpm:11326) igt-core-INFO: #0 [__igt_fail_assert+0x101]
(pm_rpm:11326) igt-core-INFO: #1 [main+0x1376]
(pm_rpm:11326) igt-core-INFO: #2 [__libc_start_main+0xf1]
(pm_rpm:11326) igt-core-INFO: #3 [_start+0x29]
(pm_rpm:11326) igt-core-INFO: #4 [<unknown>+0x29]
**** END ****

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_1/fi-kbl-7500u/igt@pm_rpm@i2c.html
(pm_rpm:1319) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:1319) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:1319) CRITICAL: Last errno: 6, No such device or address
Subtest i2c failed.

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_2/fi-cfl-s2/igt@pm_rpm@i2c.html
(pm_rpm:1429) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:1429) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:1429) CRITICAL: Last errno: 121, Remote I/O error
Subtest i2c failed.

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_4/fi-bdw-5557u/igt@pm_rpm@i2c.html
(pm_rpm:1384) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:1384) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:1384) CRITICAL: Last errno: 6, No such device or address
Subtest i2c failed.

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_7/fi-cfl-s3/igt@pm_rpm@i2c.html
(pm_rpm:2044) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:2044) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:2044) CRITICAL: Last errno: 110, Connection timed out
Subtest i2c failed.

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_11/fi-cnl-y3/igt@pm_rpm@i2c.html
(pm_rpm:2578) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:2578) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:2578) CRITICAL: Last errno: 121, Remote I/O error

More machines:
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_16/fi-bsw-n3050/igt@pm_rpm@i2c.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_16/fi-glk-1/igt@pm_rpm@i2c.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_16/fi-skl-guc/igt@pm_rpm@i2c.html

Also seen on HSW:
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_30/fi-hsw-4770r/igt@pm_rpm@i2c.html
(pm_rpm:2016) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:650:
(pm_rpm:2016) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:2016) CRITICAL: Last errno: 6, No such device or address
Subtest i2c failed.

Also seen on ICL:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5150/shard-iclb4/igt@pm_rpm@i2c.html
Starting subtest: i2c
(pm_rpm:2372) CRITICAL: Test assertion failure function test_i2c, file ../tests/pm_rpm.c:681:
(pm_rpm:2372) CRITICAL: Failed assertion: diff <= vga_outputs && diff >= 0
(pm_rpm:2372) CRITICAL: Last errno: 121, Remote I/O error
Subtest i2c failed.

A CI Bug Log filter associated to this bug has been updated:
{- HSW BSW BDW SKL KBL CFL GLK CNL : pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 -}
{+ BYT HSW BSW BDW SKL KBL CFL GLK CNL : pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 +}
New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_211/fi-byt-clapper/igt@pm_rpm@i2c.html

A CI Bug Log filter associated to this bug has been updated:
{- BYT HSW BSW BDW SKL KBL CFL GLK CNL : pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 -}
{+ BYT HSW BSW BDW SKL KBL CFL GLK CNL : pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 +}
New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_241/fi-hsw-peppy/igt@i915_pm_rpm@i2c.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_242/fi-hsw-peppy/igt@i915_pm_rpm@i2c.html

On fi-skl-lmem and fi-hsw-4770r the EDID read from the monitor is consistently corrupted in the same way, so I think the EDID memory itself is corrupted. We could change the monitor, or try to fix and reflash the EDID.

Also seen on ICL quite continuously. Imre, please check.

Not a customer impacting bug on ICL - DRM is successfully getting the proper EDIDs, it's just the direct i2c EDID gathering method that is failing. DRM has the necessary EDID info, so no user impact on the functionality of i915.

Indeed, on ICL we have seen only the (i2c edids = drm edids - 1) failure mode, e.g.:
(i915_pm_rpm:12284) DEBUG: i2c edids:1 drm edids:2 vga outputs:0

On other machines we see:
(i915_pm_rpm:1186) DEBUG: i2c edids:1 drm edids:0 vga outputs:0
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_270/fi-hsw-4770r/igt@i915_pm_rpm@i2c.html

The current implementation is rather naive, and goes through /dev/i2c-* trying to read out the EDID:

    while ((dirent = readdir(dir))) {
        if (strncmp(dirent->d_name, "i2c-", 4) == 0) {
            sprintf(full_name, "/dev/%s", dirent->d_name);
            fd = open(full_name, O_RDWR);
            igt_assert_neq(fd, -1);
            if (i2c_edid_is_valid(fd))
                ret++;
            close(fd);
        }
    }

The drmModeRes part is as sophisticated:

    while ((dirent = readdir(dir))) {
        if (strncmp(dirent->d_name, "i2c-", 4) == 0) {
            sprintf(full_name, "/dev/%s", dirent->d_name);
            fd = open(full_name, O_RDWR);
            igt_assert_neq(fd, -1);
            if (i2c_edid_is_valid(fd))
                ret++;
            close(fd);
        }
    }

I think we should change the implementation a bit and add extra logging around this, i.e. use /sys/class/drm/card?-*/i2c-* to do the readouts, and compare EDIDs on a per-connector basis, printing out both in case one is corrupted. This may tell us something about the particular screens or connectors that are being troublesome.

I've submitted a patch with more verbose output, as Arek suggested:
https://patchwork.freedesktop.org/series/60357/

(In reply to James Ausmus from comment #14)
> Not a customer impacting bug on ICL - DRM is successfully getting the proper
> EDIDs, it's just the direct i2c EDID gathering method that is failing. DRM
> has the necessary EDID info, so no user impact on the functionality of i915.

Indeed on ICL we only see:
DEBUG: i2c edids:1 drm edids:2 vga outputs:0

Also, the bug has not been seen on ICL in 3 weeks (since CI_DRM_6085). Prior to that the reproduction rate was ~1 every 5 CI_DRMs. We are now at CI_DRM_6225, which is more than 5x10=50 runs since the last occurrence, so I am taking the ICL tag out.
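
The per-connector comparison proposed above can be sketched roughly as follows. This is only an illustration, not the code from the submitted patch: the helper names (`edid_block_is_valid`, `dump_edid`, `edid_readouts_match`) are hypothetical, the validity check assumes the standard 8-byte EDID header plus the base-block checksum, and the readout from /sys/class/drm/card?-*/i2c-* is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define EDID_BLOCK_SIZE 128

/* Hypothetical helper: checks the fixed 8-byte EDID header and the
 * base-block checksum (all 128 bytes must sum to 0 modulo 256). */
static bool edid_block_is_valid(const uint8_t *buf, size_t len)
{
	static const uint8_t header[8] = {
		0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00
	};
	uint8_t sum = 0;

	if (!buf || len < EDID_BLOCK_SIZE)
		return false;
	if (memcmp(buf, header, sizeof(header)) != 0)
		return false;
	for (size_t i = 0; i < EDID_BLOCK_SIZE; i++)
		sum += buf[i];	/* uint8_t arithmetic wraps modulo 256 */
	return sum == 0;
}

/* Print one readout as "label: 0x00 0xff ..." or "label: NULL". */
static void dump_edid(FILE *out, const char *label,
		      const uint8_t *buf, size_t len)
{
	assert(out && label);
	fprintf(out, "%s:", label);
	if (!buf) {
		fprintf(out, " NULL\n");
		return;
	}
	for (size_t i = 0; i < len; i++)
		fprintf(out, " 0x%02x", buf[i]);
	fputc('\n', out);
}

/* Compare the i2c and DRM readouts for one connector and dump both on
 * a mismatch. Two missing readouts (both NULL) count as a match. */
static bool edid_readouts_match(FILE *out, const char *connector,
				const uint8_t *i2c, const uint8_t *drm,
				size_t len)
{
	bool same;

	if (!i2c || !drm)
		same = (i2c == drm);
	else
		same = memcmp(i2c, drm, len) == 0;

	if (!same) {
		fprintf(out, "Detected EDID mismatch on connector %s\n",
			connector);
		dump_edid(out, "i2c", i2c, len);
		dump_edid(out, "drm", drm, len);
	}
	return same;
}
```

Printing both buffers next to the connector name is what makes a failure attributable to a particular screen or dongle, rather than to a bare count of valid EDIDs.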

On everything else we have some occurrences happening the other way around:
(i915_pm_rpm:1086) DEBUG: i2c edids:1 drm edids:0 vga outputs:0

The patches made by Oleg will allow us to get more details on this by tying the i2c devices to connectors on the kernel side, with the test then logging more information about the failure: which connector, what part has failed (readout? do we have a mismatch?), and a dump of the raw values.

Keeping the bug at high priority, as not having your monitor's EDID read out correctly by DRM is a serious problem for users.

With the history a bit de-cluttered by ICLs being seemingly fixed, we can take a look at the new landscape of failures. There are two machines that fail most consistently (every [[idle run]]).

[[fi-hsw-4770r]]:
connector 76: type DP-1, status: connected
    physical dimensions: 0x0mm
    subpixel order: Unknown
    CEA rev: 0
    DPCD rev: 11
    audio support: no
    DP branch device present: yes
        Type: VGA
        ID: DpVga
        HW: 5.6
        SW: 1.48
    modes:
        "1024x768": 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
        "800x600": 60 40000 800 840 968 1056 600 601 605 628 0x40 0x5
        "800x600": 56 36000 800 824 896 1024 600 601 603 625 0x40 0x5
        "848x480": 60 33750 848 864 976 1088 480 486 494 517 0x40 0x5
        "640x480": 60 25175 640 656 752 800 480 490 492 525 0x40 0xa

This one seems to have a VGA dongle connected, but it does not have native VGA. Because of that the test fails to realize it's VGA and does not ignore it. The dongle is the most naive one possible, just a few pins connected through resistors, so the EDID that DRM sees is faked by the DpVga HW to reflect the default VGA modes. There is nothing on the other side when using i2c directly. The test has to be fixed so that it is aware of DpVga.
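
To illustrate why an unrecognized DP->VGA dongle trips the count-based assertion, here is a minimal sketch. It assumes that `diff` in the failing assertion is `drm edids - i2c edids`, which is consistent with the logged values but is not shown in this report. The struct, the `dp_vga_outputs` counter, and `edid_counts_ok` are hypothetical names, not part of the test.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-run EDID counts, named after the fields in the test's debug line.
 * dp_vga_outputs is a hypothetical extra counter for DP->VGA dongles. */
struct edid_counts {
	int i2c_edids;
	int drm_edids;
	int vga_outputs;
	int dp_vga_outputs;
};

/* Mirrors "diff <= vga_outputs && diff >= 0", optionally tolerating
 * DP->VGA dongles the same way native VGA outputs are tolerated. */
static bool edid_counts_ok(const struct edid_counts *c, bool tolerate_dp_vga)
{
	int diff, tolerated;

	assert(c);
	diff = c->drm_edids - c->i2c_edids;	/* assumed definition of diff */
	tolerated = c->vga_outputs;
	if (tolerate_dp_vga)
		tolerated += c->dp_vga_outputs;
	return diff <= tolerated && diff >= 0;
}
```

With made-up counts for a single dongle (`i2c_edids = 0`, `drm_edids = 1`, `vga_outputs = 0`, `dp_vga_outputs = 1`) the check fails unless DP->VGA is tolerated. The reverse case from the log above (`i2c edids:1 drm edids:0`) fails either way, so it needs a separate explanation.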

[[fi-skl-lmem]]:
connector 86: type DP-1, status: connected
    physical dimensions: 0x0mm
    subpixel order: Unknown
    CEA rev: 0
    DPCD rev: 12
    audio support: no
    DP branch device present: yes
        Type: HDMI
        ID: 175IB0
        HW: 1.0
        SW: 7.32
        Max TMDS clock: 600000 kHz
        Max bpc: 12
    modes:
        "1024x768": 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
        "800x600": 60 40000 800 840 968 1056 600 601 605 628 0x40 0x5
        "800x600": 56 36000 800 824 896 1024 600 601 603 625 0x40 0x5
        "848x480": 60 33750 848 864 976 1088 480 486 494 517 0x40 0x5
        "640x480": 60 25175 640 656 752 800 480 490 492 525 0x40 0xa

This one has an HDMI dummy connected to the non-native (LSPcon) HDMI on the board. The suspicious thing here is the modes: they are the default VGA ones. The HDMI dummy may be a faulty one that fails to talk i2c, with the LSPcon HW faking an EDID. I would advise replacing the dongle.

[idle run]: https://intel-gfx-ci.01.org/#idle-runs
[fi-hsw-4770r]: https://intel-gfx-ci.01.org/hardware.html#fi-hsw-4770r
[fi-skl-lmem]: https://intel-gfx-ci.01.org/hardware.html#fi-skl-lmem

A CI Bug Log filter associated to this bug has been updated:
{- BYT HSW BSW BDW SKL KBL CFL GLK CNL : pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 -}
{+ all machines : igt@i915_pm_rpm@i2c - Failed assertion: diff <= vga_outputs && diff >= 0 / !(edid_mistmach_i2c_vs_drm) +}
No new failures caught with the new filter

1. the patch has landed, more data on affected connectors incoming :-)
2. updated the bug title to reflect the new assertion
3. fi-skl-lmem HDMI dummy replaced, hopefully it will get better
4. Oleg was looking into getting information on the DP downstream port exposed to userspace, so we can ignore DpVga akin to how we ignore native VGA

No progress on fixing, but a lot of progress on getting more data. Thanks Oleg!

skl-lmem hasn't seen the issue again since the HDMI dummy was changed. Now the biggest source of those failures is the HSW shards. We can observe 3 failure modes:

1.
corrupted EDID (i915_pm_rpm:15800) CRITICAL: Detected EDID mismatch on connector HDMI-A-1 (i915_pm_rpm:15800) CRITICAL: i2c: 0x00 0xff 0xff 0xff 0xff 0xff 0xff 0x00 0x05 0xe3 0xcd 0x0c 0x07 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff (i915_pm_rpm:15800) CRITICAL: drm: 0x00 0xff 0xff 0xff 0xff 0xff 0xff 0x00 0x24 0xf4 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x1d 0x01 0x03 0x80 0x34 0x1e 0x78 0x02 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x21 0x08 0x00 0xd1 0xc0 0x81 0xc0 0x61 0x40 0x45 0x40 0x31 0x40 0x01 0x01 0x01 0x01 0x01 0x01 0x02 0x3a 0x80 0x18 0x71 0x38 0x2d 0x40 0x58 0x2c 0x45 0x00 0x08 0x2c 0x21 0x00 0x00 0x18 0x00 0x00 0x00 0xfd 0x00 0x3b 0x3d 0x42 0x44 0x0f 0x00 0x0a 0x20 0x20 0x20 0x20 0x20 0x20 0x00 0x00 0x00 0xfc 0x00 0x49 0x47 0x54 0x0a 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01 0x0a 2. 
failed to read EDID over i2c (i915_pm_rpm:11717) DEBUG: I2C access failed with errno 6, No such device or address (i915_pm_rpm:11717) CRITICAL: Detected EDID mismatch on connector HDMI-A-2 (i915_pm_rpm:11717) CRITICAL: i2c: NULL (i915_pm_rpm:11717) CRITICAL: drm: 0x00 0xff 0xff 0xff 0xff 0xff 0xff 0x00 0x24 0xf4 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x1d 0x01 0x03 0x80 0x34 0x1e 0x78 0x02 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x21 0x08 0x00 0xd1 0xc0 0x81 0xc0 0x61 0x40 0x45 0x40 0x31 0x40 0x01 0x01 0x01 0x01 0x01 0x01 0x02 0x3a 0x80 0x18 0x71 0x38 0x2d 0x40 0x58 0x2c 0x45 0x00 0x08 0x2c 0x21 0x00 0x00 0x18 0x00 0x00 0x00 0xfd 0x00 0x3b 0x3d 0x42 0x44 0x0f 0x00 0x0a 0x20 0x20 0x20 0x20 0x20 0x20 0x00 0x00 0x00 0xfc 0x00 0x49 0x47 0x54 0x0a 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01 0x0a 3. DRM failed to get EDID: (i915_pm_rpm:14052) CRITICAL: Detected EDID mismatch on connector HDMI-A-1 (i915_pm_rpm:14052) CRITICAL: i2c: 0x00 0xff 0xff 0xff 0xff 0xff 0xff 0x00 0x05 0xe3 0xcd 0x0c 0x07 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff (i915_pm_rpm:14052) CRITICAL: drm: NULL <4> [3217.152221] i915 0000:00:02.0: HDMI-A-1: EDID is invalid: <4> [3217.152223] [00] BAD 00 ff ff ff ff ff ff 00 ff ff ff ff ff ff ff ff <4> [3217.152225] [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff <4> [3217.152226] [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff
<4> [3217.152227] [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
<4> [3217.152228] [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
<4> [3217.152229] [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
<4> [3217.152230] [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
<4> [3217.152231] [00] BAD ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

On shard-hsw1 we often see the EDID being full of 0xff both for DRM and for the manual i2c read (cases 1 and 3). On the other shards we quite often fail to read i2c with "No such device or address" (case 2).

I would advise switching dongles between hsw1 and one other HSW machine to see whether the other machine becomes plagued with 0xffs and hsw1 gets the sporadic "No such device or address". This would help us attribute the issue either to the machine or to a faulty dongle.

shard-hsw1 now has a new HDMI dongle and is suppressed. We'll see in a few days whether that solves any issues.

shard-hsw1 seems to be better now, and behaves just like the other HSWs.

The biggest contributor to the failures is VGA over DP. Oleg has proposed a series of patches that expose DP's downstream port type to userspace:
https://patchwork.freedesktop.org/series/63027/

With this in place we will be able to treat DP->VGA as VGA (i.e. ignore them). Then we can focus on all the other failures. I have a funny feeling that HSWs are going to be quite problematic.

This issue still occurs every day. Last occurrence was 2 days ago. Seen on skl, icl, hsw and tgl. The EDID is compared from 2 sources, i2c and DRM. In most failures either the i2c EDID is read as all zeroes or the DRM EDID is NULL, and the mismatch causes the failure. Failure to get the correct EDID would mean improper display configuration.

-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/68.