Bug 108511 - [CI][BAT] igt@pm_rpm@module-reload - dmesg-fail - Failed assertion: c1->mmWidth == c2->mmWidth
Summary: [CI][BAT] igt@pm_rpm@module-reload - dmesg-fail - Failed assertion: c1->mmWid...
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: highest normal
Assignee: Anshuman Gupta
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-10-22 07:40 UTC by Martin Peres
Modified: 2018-11-28 15:52 UTC

See Also:
i915 platform: GLK
i915 features: power/runtime PM


Description Martin Peres 2018-10-22 07:40:33 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4685/fi-glk-j4005/igt@pm_rpm@module-reload.html

Starting subtest: module-reload
(pm_rpm:3294) CRITICAL: Test assertion failure function assert_drm_connectors_equal, file ../tests/pm_rpm.c:510:
(pm_rpm:3294) CRITICAL: Failed assertion: c1->mmWidth == c2->mmWidth
(pm_rpm:3294) CRITICAL: error: 620 != 0
Subtest module-reload failed.
Comment 1 James Ausmus 2018-10-24 16:48:13 UTC
Is this perhaps related to the "[drm:drm_mode_config_cleanup] *ERROR* connector HDMI-A-1 leaked!" earlier in the dmesg?
Comment 2 Anshuman Gupta 2018-11-15 12:05:17 UTC
It is not reproducible anymore; I have tested it on drm-tip 4.19.0-rc8 and on the latest internal drm tree.

sudo ./tests/pm_rpm --run-subtest module-reload
IGT-Version: 1.23-gea37f9c (x86_64) (Linux: 4.19.0-rc8-custom x86_64)
Runtime PM support: 1
PC8 residency support: 0
DMC: fw loaded: yes
Starting subtest: module-reload
Reloading i915 with disable_display=1 mmio_debug=-1

Runtime PM support: 1
PC8 residency support: 0
DMC: fw loaded: yes
Reloading i915 with mmio_debug=-1

Runtime PM support: 1
PC8 residency support: 0

PS: When tested on the latest drm-tip, the test gets killed by an "unable to handle kernel paging request" oops in the kernel, which doesn't look like the same issue.

[  246.498589] BUG: unable to handle kernel paging request at ffffffffc03802f8
[  246.498594] PGD 17440c067 P4D 17440c067 PUD 17440e067 PMD 177950067 PTE 0
[  246.498599] Oops: 0000 [#1] SMP PTI
[  246.498603] CPU: 1 PID: 2429 Comm: pm_rpm Tainted: G  R               4.20.0-rc1+ #1
Comment 3 Martin Peres 2018-11-15 15:38:10 UTC
(In reply to Anshuman Gupta from comment #2)

The reproduction rate has been low, to say the least. So don't assume this is fixed :s

I would say there are easier bugs to get started with: https://intel-gfx-ci.01.org/cibuglog/ <-- order by reproduction rate and pick one you like!
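
Because the reproduction rate is low, a single passing run says little. A minimal Python sketch for running a command repeatedly and recording which iterations did not pass is shown here; it is not part of IGT, the function name `failing_iterations` is hypothetical, and it simply treats any nonzero exit status as "did not pass".

```python
import subprocess
import sys
from typing import List, Sequence


def failing_iterations(cmd: Sequence[str], iterations: int) -> List[int]:
    """Run cmd `iterations` times; return the 0-based indices of runs
    that exited with a nonzero status."""
    failures = []
    for i in range(iterations):
        # Capture output so a long loop does not flood the terminal.
        result = subprocess.run(
            list(cmd), stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
        if result.returncode != 0:
            failures.append(i)
            print(f"run {i}: exit status {result.returncode}", file=sys.stderr)
    return failures
```

With the command from comment 2, a call could look like `failing_iterations(["sudo", "./tests/pm_rpm", "--run-subtest", "module-reload"], 50)`; an empty result after many runs is weak evidence only, not proof of a fix.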
Comment 4 Anshuman Gupta 2018-11-28 13:34:41 UTC
(In reply to Martin Peres from comment #3)

I tried to reproduce the issue, but had no luck.

However, I analyzed the logs and was able to reach a conclusion from them.

After resume there is a hotplug event from HDMI. The driver reads an invalid EDID, then retries the read with GPIO bit-banging enabled; that read also fails, and the HDMI connector status changes from connected to disconnected.

<7> [343.557663] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:125:HDMI-A-2]
<7> [343.557756] [drm:intel_hdmi_detect [i915]] [CONNECTOR:125:HDMI-A-2]
<7> [343.569930] [drm:intel_runtime_resume [i915]] Resuming device
--------------------------------------------------------------------------
<4> [343.632835] i915 0000:00:02.0: HDMI-A-2: EDID is invalid:
<4> [343.632841] \x09[00] BAD  00 ff ff ff ff ff ff 00 05 e3 79 28 2b 09 00 00
<4> [343.632844] \x09[00] BAD  32 1b 01 03 80 3e 22 78 2a 08 a5 a2 57 4f a2 28
<4> [343.632847] \x09[00] BAD  0f 50 54 bf ef 00 d1 c0 b3 00 95 00 81 80 80 40
<4> [343.632849] \x09[00] BAD  81 c0 01 01 01 01 4d d0 00 a0 f0 70 3e 80 30 20
<4> [343.632852] \x09[00] BAD  20 00 2d 55 21 00 00 1a a3 66 00 a0 f0 70 1f 80
<4> [343.632854] \x09[00] BAD  30 20 35 00 6d 55 21 00 00 1a 00 00 00 fc 00 55
<4> [343.632857] \x09[00] BAD  32 38 37 39 47 36 0a 20 20 20 20 20 00 00 00 fd
<4> [343.632859] \x09[00] BAD  00 17 50 1e 8c 3c 00 0a 20 20 20 20 20 20 01 88
<7> [343.632933] [drm:intel_hdmi_set_edid [i915]] HDMI GMBUS EDID read failed, retry using GPIO bit-banging
<7> [343.632989] [drm:intel_gmbus_force_bit [i915]] enabling bit-banging on i915 gmbus dpb. force bit now 1
<7> [343.704493] [drm:drm_do_probe_ddc_edid] drm: skipping non-existent adapter i915 gmbus dpb
<7> [343.704595] [drm:intel_gmbus_force_bit [i915]] disabling bit-banging on i915 gmbus dpb. force bit now 0
<7> [343.705207] [drm:do_gmbus_xfer [i915]] GMBUS [i915 gmbus dpb] NAK for addr: 0040 w(1)
<7> [343.705259] [drm:do_gmbus_xfer [i915]] GMBUS [i915 gmbus dpb] NAK on first message, retry
<7> [343.705817] [drm:do_gmbus_xfer [i915]] GMBUS [i915 gmbus dpb] NAK for addr: 0040 w(1)
--------------------------------------------------------------------------------
<7> [343.710557] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:125:HDMI-A-2] status updated from connected to disconnected
<7> [343.710563] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:125:HDMI-A-2] disconnected
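
The connector status change in the excerpt above can be pulled out of a longer dmesg capture mechanically. This is a small Python sketch under the assumption that the debug message keeps the exact wording seen in this log; the regular expression and the name `connector_transitions` are illustrative, not part of any existing tool.

```python
import re
from typing import List, Tuple

# Matches lines such as:
# "... [CONNECTOR:125:HDMI-A-2] status updated from connected to disconnected"
STATUS_RE = re.compile(
    r"\[CONNECTOR:(?P<id>\d+):(?P<name>[\w-]+)\] "
    r"status updated from (?P<old>\w+) to (?P<new>\w+)"
)

# Three lines copied from the log excerpt above.
SAMPLE_DMESG = """\
<7> [343.557756] [drm:intel_hdmi_detect [i915]] [CONNECTOR:125:HDMI-A-2]
<7> [343.710557] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:125:HDMI-A-2] status updated from connected to disconnected
<7> [343.710563] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:125:HDMI-A-2] disconnected
"""


def connector_transitions(dmesg: str) -> List[Tuple[str, str, str]]:
    """Return (connector name, old status, new status) for each status change."""
    return [(m["name"], m["old"], m["new"]) for m in STATUS_RE.finditer(dmesg)]


if __name__ == "__main__":
    for name, old, new in connector_transitions(SAMPLE_DMESG):
        print(f"{name}: {old} -> {new}")
```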


The HDMI EDID read may fail while resuming because of a poor sink device or a poor HDMI cable, which caused the mismatch between the connector width recorded before suspend and the one read after resume.
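
As a cross-check of this reading, the 128 bytes from the "EDID is invalid" dump can be inspected directly. The following is a minimal Python sketch, not code from IGT or i915; the helper names (`parse_edid`, `checksum_ok`, `screen_size_mm`) are made up for illustration. It relies on the standard base EDID layout, where bytes 21 and 22 hold the screen size in centimetres and all 128 bytes must sum to 0 modulo 256. That the connector's mmWidth corresponds to byte 21 times 10 is an assumption here.

```python
from typing import Tuple

# Base EDID block copied from the "EDID is invalid" dump in the log above.
EDID_HEX = """
00 ff ff ff ff ff ff 00 05 e3 79 28 2b 09 00 00
32 1b 01 03 80 3e 22 78 2a 08 a5 a2 57 4f a2 28
0f 50 54 bf ef 00 d1 c0 b3 00 95 00 81 80 80 40
81 c0 01 01 01 01 4d d0 00 a0 f0 70 3e 80 30 20
20 00 2d 55 21 00 00 1a a3 66 00 a0 f0 70 1f 80
30 20 35 00 6d 55 21 00 00 1a 00 00 00 fc 00 55
32 38 37 39 47 36 0a 20 20 20 20 20 00 00 00 fd
00 17 50 1e 8c 3c 00 0a 20 20 20 20 20 20 01 88
"""

# Fixed 8-byte pattern that starts every base EDID block.
EDID_HEADER = bytes.fromhex("00ffffffffffff00")


def parse_edid(hex_dump: str) -> bytes:
    """Turn a whitespace-separated hex dump into raw bytes."""
    return bytes.fromhex("".join(hex_dump.split()))


def checksum_ok(block: bytes) -> bool:
    """A base EDID block is 128 bytes whose sum is 0 modulo 256."""
    return len(block) == 128 and sum(block) % 256 == 0


def screen_size_mm(block: bytes) -> Tuple[int, int]:
    """Bytes 21 and 22 give the screen width and height in cm."""
    return block[21] * 10, block[22] * 10


if __name__ == "__main__":
    edid = parse_edid(EDID_HEX)
    print("header ok:  ", edid[:8] == EDID_HEADER)
    print("checksum ok:", checksum_ok(edid))
    print("size in mm: ", screen_size_mm(edid))
```

For this dump the header is intact but the checksum test fails, and byte 21 is 0x3e (62 cm), i.e. 620 mm. That matches the `620 != 0` in the failed assertion: presumably 620 is the width from an earlier good EDID read, and 0 is what remains once the EDID can no longer be read.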

I discussed this with Shashank: it is specific to the HDMI DRM policy of continuing with the modeset despite an invalid EDID.
Comment 6 Jani Saarinen 2018-11-28 15:40:09 UTC
Is this a real display or a dummy on GLK?
https://intel-gfx-ci.01.org/hardware/fi-glk-j4005/i915_display_info.txt
Comment 7 Martin Peres 2018-11-28 15:52:03 UTC
(In reply to Jani Saarinen from comment #6)
> Is this real display or dummy on GLK? 
> https://intel-gfx-ci.01.org/hardware/fi-glk-j4005/i915_display_info.txt

Hard to tell: https://intel-gfx-ci.01.org/hardware.html#fi-glk-j4005 does not say, so I guess it means we just can't get the EDID(?).

