Summary: | [CI][BAT] igt@pm_rpm@module-reload - dmesg-fail - Failed assertion: c1->mmWidth == c2->mmWidth | |
---|---|---|---
Product: | DRI | Reporter: | Martin Peres <martin.peres>
Component: | DRM/Intel | Assignee: | Anshuman Gupta <anshuman.gupta>
Status: | RESOLVED MOVED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | normal | |
Priority: | high | CC: | anshuman.gupta, intel-gfx-bugs, james.ausmus
Version: | XOrg git | |
Hardware: | Other | |
OS: | All | |
Whiteboard: | ReadyForDev | |
i915 platform: | SKL | i915 features: | display/LSPCON
Description
Martin Peres
2018-10-22 07:40:33 UTC
Is this perhaps related to the "[drm:drm_mode_config_cleanup] *ERROR* connector HDMI-A-1 leaked!" earlier in the dmesg?

It is not reproducible anymore; I have tested it on drm-tip 4.19.0-rc8 and on the latest drm internal.

sudo ./tests/pm_rpm --run-subtest module-reload
IGT-Version: 1.23-gea37f9c (x86_64) (Linux: 4.19.0-rc8-custom x86_64)
Runtime PM support: 1
PC8 residency support: 0
DMC: fw loaded: yes
Starting subtest: module-reload
Reloading i915 with disable_display=1 mmio_debug=-1

Runtime PM support: 1
PC8 residency support: 0
DMC: fw loaded: yes
Reloading i915 with mmio_debug=-1

Runtime PM support: 1
PC8 residency support: 0

PS: When tested on the latest drm-tip, the test gets killed by an "unable to handle kernel paging request" error in the kernel, which does not look like the same issue.

[ 246.498589] BUG: unable to handle kernel paging request at ffffffffc03802f8
[ 246.498594] PGD 17440c067 P4D 17440c067 PUD 17440e067 PMD 177950067 PTE 0
[ 246.498599] Oops: 0000 [#1] SMP PTI
[ 246.498603] CPU: 1 PID: 2429 Comm: pm_rpm Tainted: G R 4.20.0-rc1+ #1

(In reply to Anshuman Gupta from comment #2)
The reproduction rate has been low, to say the least. So don't assume this is fixed :s

I would say there are easier bugs to get started with: https://intel-gfx-ci.01.org/cibuglog/ <-- order by reproduction rate and pick one you like!

(In reply to Martin Peres from comment #3)
I have tried to reproduce the issue, but had no luck.
However, I analyzed the logs and was able to reach a conclusion from them. After resume there is a hotplug event from HDMI, and the EDID that is read is invalid. The driver then retries the EDID read with GPIO bit-banging enabled; that read fails as well, and the HDMI connector status changes from connected to disconnected.

<7> [343.557663] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:125:HDMI-A-2]
<7> [343.557756] [drm:intel_hdmi_detect [i915]] [CONNECTOR:125:HDMI-A-2]
<7> [343.569930] [drm:intel_runtime_resume [i915]] Resuming device
--------------------------------------------------------------------------
<4> [343.632835] i915 0000:00:02.0: HDMI-A-2: EDID is invalid:
<4> [343.632841] \x09[00] BAD 00 ff ff ff ff ff ff 00 05 e3 79 28 2b 09 00 00
<4> [343.632844] \x09[00] BAD 32 1b 01 03 80 3e 22 78 2a 08 a5 a2 57 4f a2 28
<4> [343.632847] \x09[00] BAD 0f 50 54 bf ef 00 d1 c0 b3 00 95 00 81 80 80 40
<4> [343.632849] \x09[00] BAD 81 c0 01 01 01 01 4d d0 00 a0 f0 70 3e 80 30 20
<4> [343.632852] \x09[00] BAD 20 00 2d 55 21 00 00 1a a3 66 00 a0 f0 70 1f 80
<4> [343.632854] \x09[00] BAD 30 20 35 00 6d 55 21 00 00 1a 00 00 00 fc 00 55
<4> [343.632857] \x09[00] BAD 32 38 37 39 47 36 0a 20 20 20 20 20 00 00 00 fd
<4> [343.632859] \x09[00] BAD 00 17 50 1e 8c 3c 00 0a 20 20 20 20 20 20 01 88
<7> [343.632933] [drm:intel_hdmi_set_edid [i915]] HDMI GMBUS EDID read failed, retry using GPIO bit-banging
<7> [343.632989] [drm:intel_gmbus_force_bit [i915]] enabling bit-banging on i915 gmbus dpb. force bit now 1
<7> [343.704493] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent adapter i915 gmbus dpb
<7> [343.704595] [drm:intel_gmbus_force_bit [i915]] disabling bit-banging on i915 gmbus dpb. force bit now 0
<7> [343.705207] [drm:do_gmbus_xfer [i915]] GMBUS [i915 gmbus dpb] NAK for addr: 0040 w(1)
<7> [343.705259] [drm:do_gmbus_xfer [i915]] GMBUS [i915 gmbus dpb] NAK on first message, retry
<7> [343.705817] [drm:do_gmbus_xfer [i915]] GMBUS [i915 gmbus dpb] NAK for addr: 0040 w(1)
--------------------------------------------------------------------------------
<7> [343.710557] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:125:HDMI-A-2] status updated from connected to disconnected
<7> [343.710563] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:125:HDMI-A-2] disconnected

The HDMI EDID read may fail while resuming because of a poor sink device or a poor HDMI cable, which causes the mismatch between the connector's pre-suspend width and its width during suspend.

I had a discussion with Shashank; this is really specific to the HDMI DRM policy of continuing with a modeset despite an invalid EDID.

Is this a real display or a dummy on GLK? https://intel-gfx-ci.01.org/hardware/fi-glk-j4005/i915_display_info.txt

(In reply to Jani Saarinen from comment #6)
Hard to tell, https://intel-gfx-ci.01.org/hardware.html#fi-glk-j4005 is not saying, so I guess it means we just can't get the EDID(?).

A CI Bug Log filter associated to this bug has been updated:
{- GLK: igt@pm_rpm@module-reload - dmesg-fail - Failed assertion: c1->mmWidth == c2->mmWidth -}
{+ SKL GLK: igt@pm_rpm@module-reload - dmesg-fail - Failed assertion: c1->mmWidth == c2->mmWidth +}
No new failures caught with the new filter

Created attachment 143223 [details]
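The "EDID is invalid" dump quoted in the analysis earlier in this thread is a single 128-byte EDID base block. As a reading aid, here is a minimal sketch of two checks the EDID format defines for such a block: the fixed 8-byte header, and a checksum under which all 128 bytes sum to 0 modulo 256. The function names are hypothetical and this is not the kernel's validation code, which may apply further checks; the hex bytes are copied from the log in this thread with the "[00] BAD" prefixes removed.

```python
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])
EDID_BLOCK_SIZE = 128


def parse_hex_dump(text: str) -> bytes:
    """Turn whitespace-separated hex bytes (as printed in dmesg) into bytes."""
    return bytes(int(token, 16) for token in text.split())


def check_edid_base_block(block: bytes) -> dict:
    """Report header and checksum status for one 128-byte EDID base block."""
    if len(block) != EDID_BLOCK_SIZE:
        raise ValueError(f"expected {EDID_BLOCK_SIZE} bytes, got {len(block)}")
    return {
        "header_ok": block[:8] == EDID_HEADER,
        # A well-formed block sums to 0 modulo 256; the last byte is the checksum.
        "checksum_remainder": sum(block) % 256,
    }


def fix_checksum(block: bytes) -> bytes:
    """Return a copy whose last byte makes the block sum to 0 modulo 256."""
    body = block[:-1]
    return body + bytes([(-sum(body)) % 256])


# Block 0 as dumped in the dmesg excerpt from this bug.
DUMP = """
00 ff ff ff ff ff ff 00 05 e3 79 28 2b 09 00 00
32 1b 01 03 80 3e 22 78 2a 08 a5 a2 57 4f a2 28
0f 50 54 bf ef 00 d1 c0 b3 00 95 00 81 80 80 40
81 c0 01 01 01 01 4d d0 00 a0 f0 70 3e 80 30 20
20 00 2d 55 21 00 00 1a a3 66 00 a0 f0 70 1f 80
30 20 35 00 6d 55 21 00 00 1a 00 00 00 fc 00 55
32 38 37 39 47 36 0a 20 20 20 20 20 00 00 00 fd
00 17 50 1e 8c 3c 00 0a 20 20 20 20 20 20 01 88
"""

if __name__ == "__main__":
    print(check_edid_base_block(parse_hex_dump(DUMP)))
```

A non-zero `checksum_remainder` would point at corrupted bytes on the DDC link rather than at a missing sink, which is the kind of distinction the cable/device hypothesis above depends on.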
A CI Bug Log filter associated to this bug has been updated:
{- SKL GLK: igt@pm_rpm@module-reload - dmesg-fail - Failed assertion: c1->mmWidth == c2->mmWidth -}
{+ SKL GLK: igt@pm_rpm@module-reload - fail / dmesg-fail - Failed assertion: c1->mmWidth == c2->mmWidth +}
New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5478/fi-skl-6770hq/igt@pm_rpm@module-reload.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5479/fi-skl-6770hq/igt@pm_rpm@module-reload.html
* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5483/fi-skl-6770hq/igt@pm_rpm@module-reload.html

A CI Bug Log filter associated to this bug has been updated:
{- SKL GLK: igt@pm_rpm@module-reload - fail / dmesg-fail - Failed assertion: c1->mmWidth == c2->mmWidth -}
{+ SKL GLK: igt@pm_rpm@(module-reload|drm-resources-equal) - fail / dmesg-fail - Failed assertion: c1->mmWidth == c2->mmWidth +}
New failures caught by the filter:
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_206/fi-skl-6770hq/igt@pm_rpm@drm-resources-equal.html
* https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_208/fi-skl-6770hq/igt@pm_rpm@drm-resources-equal.html

(In reply to CI Bug Log from comment #11)
The fi-skl-6770hq setup (https://intel-gfx-ci.01.org/hardware.html) is supposed to have only one DP display connected, but after analyzing the logs it seems that downstream ports are present.
Can somebody confirm whether this setup uses any downstream ports? This FDO bug has recently been reported multiple times on the same setup.

I'm sorry, but can you explain to me what 'downstream port' means in this context?

Anshuman is unable to reproduce this issue locally. Further investigation is needed. Priority is dropped to high.

I am not able to reproduce this locally; I ran this IGT test continuously for three days. This bug has been seen on
Can we get access to this particular fi-skl-6770hq CI machine?

@Lakshmi can we get access to this machine?

This bug hasn't been seen in months and should be closed unless there are any objections. Anshuman Gupta: Do you have any objection to closing this bug, or do you still require more information?

Yes, we can close this, as I was unable to reproduce the issue.

Reopening: this issue is still happening very regularly, and the history can be found in the CI bug log.

This issue has only been seen on the CI H/W fi-skl-6770hq, which uses LSPCON. https://intel-gfx-ci.01.org/hardware/fi-skl-6770hq/i915_display_info.txt shows DP "branch device present: yes" with device type HDMI. The probable root cause is that the DP-1 connector is disconnected when the IGT module-reload test probes the DP-1 connector modes, which results in a mismatch of the DP-1 connector's physical width.

<7> [275.707285] [drm:intel_runtime_suspend [i915]] Suspending device
<7> [275.710274] [drm:intel_runtime_suspend [i915]] Device suspended
<7> [275.795462] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:86:DP-1]
-------------------------------------------------------------------------------------------------------------------------------
<7> [276.074575] [drm:lspcon_wait_mode [i915]] Current LSPCON mode PCON
<7> [276.075459] [drm:intel_dp_read_dpcd [i915]] DPCD: 12 14 c4 01 01 15 01 81 02 01 04 01 0f 00 01
<7> [276.076282] [drm:drm_dp_read_desc] DP branch: OUI 00-1c-f8 dev-ID 175IB0 HW-rev 1.0 SW-rev 7.64 quirks 0x0000
<7> [276.077021] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:86:DP-1] status updated from connected to disconnected
<7> [276.077024] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:86:DP-1] disconnected
<7> [276.077204] [drm:intel_dp_detect [i915]] [CONNECTOR:86:DP-1]

This error has been occurring on the SKL 6770HQ CI machine every 4-5 runs for the past two months. It didn't occur from August 15 - Sept 17, 2019. Also, the suspected reason for the failure doesn't seem to occur in the recent runs - DP-1 connects/reconnects after the device resumes from suspend.

-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/178.
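Both analyses in this thread rest on spotting a "status updated from connected to disconnected" line between suspend and the failing assertion. For triaging further CI logs, a minimal sketch of extracting those transitions might look like the following. The regular expression is modelled only on the dmesg excerpts quoted in this bug, so the exact message format is an assumption that may not hold for other kernel versions, and `Transition` and `find_status_transitions` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class Transition(NamedTuple):
    timestamp: float
    connector_id: int
    connector_name: str
    old_status: str
    new_status: str


# Modelled on lines such as:
# <7> [276.077021] [drm:drm_helper_probe_single_connector_modes]
#     [CONNECTOR:86:DP-1] status updated from connected to disconnected
STATUS_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"\[drm:drm_helper_probe_single_connector_modes\]\s+"
    r"\[CONNECTOR:(?P<id>\d+):(?P<name>[^\]]+)\]\s+"
    r"status updated from (?P<old>\w+) to (?P<new>\w+)"
)


def find_status_transitions(dmesg_text: str) -> List[Transition]:
    """Collect every connector status change reported in a dmesg capture."""
    transitions = []
    for line in dmesg_text.splitlines():
        match = STATUS_RE.search(line)
        if match:
            transitions.append(
                Transition(
                    timestamp=float(match["ts"]),
                    connector_id=int(match["id"]),
                    connector_name=match["name"],
                    old_status=match["old"],
                    new_status=match["new"],
                )
            )
    return transitions


# Two lines taken from the logs in this bug, plus one that must not match.
SAMPLE = """\
<7> [343.710557] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:125:HDMI-A-2] status updated from connected to disconnected
<7> [276.077021] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:86:DP-1] status updated from connected to disconnected
<7> [276.077024] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:86:DP-1] disconnected
"""

if __name__ == "__main__":
    for transition in find_status_transitions(SAMPLE):
        print(transition)
```

Filtering the result for `old_status == "connected"` and `new_status == "disconnected"` gives the candidate events to correlate with a `c1->mmWidth == c2->mmWidth` failure in the same run.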