Bug 111180 - linux-5.1.19: i915 when ext HDMI is connected triggers too many interrupts and high load
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged, ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-21 17:21 UTC by mmokrejs@fold.natur.cuni.cz
Modified: 2019-09-09 14:51 UTC

See Also:
i915 platform: SNB
i915 features: display/HDMI


Attachments
dmesg-5.1.19 (65.50 KB, text/plain), 2019-07-21 17:28 UTC, mmokrejs@fold.natur.cuni.cz
dmesg-5.1.19__with_drm_debug.txt (3.98 MB, text/plain), 2019-07-22 15:26 UTC, mmokrejs@fold.natur.cuni.cz
dmesg-5.1.19__with_drm_debug2.txt (3.95 MB, text/plain), 2019-07-22 15:28 UTC, mmokrejs@fold.natur.cuni.cz
dmesg-5.1.19__HDMI_cable_pulled_out.txt (3.93 MB, text/plain), 2019-07-22 23:33 UTC, mmokrejs@fold.natur.cuni.cz
dmesg-5.3.0-rc1-drm-tip.txt (3.19 MB, text/plain), 2019-07-23 12:40 UTC, mmokrejs@fold.natur.cuni.cz
dmesg-5.3.0-rc1-drm-tip__HDMI_cable_unplugged.txt (3.94 MB, text/plain), 2019-07-24 06:27 UTC, mmokrejs@fold.natur.cuni.cz

Description mmokrejs@fold.natur.cuni.cz 2019-07-21 17:21:47 UTC
Hi,
  I have been using this Dell Vostro 3550 laptop (BIOS version A12) for many years; it has a Sandy Bridge CPU with integrated Intel graphics. For the past few days, the CPU fan has been running at maximum speed whenever I connect an HDMI cable (as I always did before).

On the kernel command line I had "i915.i915_enable_rc6=1", and I also tried appending "i915.enable_fbc=1", without any effect.
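
Editorial note: it may be worth checking whether the running i915 module actually exposes these two parameters, since parameter names have changed between kernel versions and an unrecognised one is typically ignored. The following is a minimal sketch, not part of the original report. The helper name `show_module_param` and the `PARAM_ROOT` override are hypothetical, and the sketch assumes the usual sysfs layout `/sys/module/<module>/parameters/<name>`.

```shell
#!/bin/sh
# Print the current value of a module parameter, or a note if sysfs does not expose it.
# show_module_param is a hypothetical helper name; PARAM_ROOT can be overridden for testing.
show_module_param() {
    module=$1
    name=$2
    file="${PARAM_ROOT:-/sys/module}/$module/parameters/$name"
    if [ -r "$file" ]; then
        printf '%s.%s=%s\n' "$module" "$name" "$(cat "$file")"
    else
        printf '%s.%s: not exposed (or not readable)\n' "$module" "$name"
    fi
}

# Example (run as root, since some parameters are readable by root only):
#   show_module_param i915 enable_fbc
#   show_module_param i915 i915_enable_rc6
```

A missing file is only a hint: a parameter declared without sysfs permissions would not appear here either.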

[    2.506643] [drm] Disabling ppGTT for VT-d support
[    2.506734] [drm] VT-d active for gfx access
[    2.506809] i915 0000:00:02.0: vgaarb: deactivate vga console
[    2.507340] Console: switching to colour dummy device 80x25
[    2.507512] [drm] DMAR active, disabling use of stolen memory
[    2.510243] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    2.510248] [drm] Driver supports precise vblank timestamp query.
[    2.513861] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    2.532685] [drm] Initialized i915 1.6.0 20190207 for 0000:00:02.0 on minor 0


$ xrandr
Screen 0: minimum 8 x 8, current 1920 x 1200, maximum 32767 x 32767
LVDS1 connected primary (normal left inverted right x axis y axis)
   1366x768      60.05 +  40.01  
   1280x720      59.74  
   1024x768      60.00  
   1024x576      60.00    59.90    59.82  
   960x540       60.00    59.63    59.82  
   800x600       60.32    56.25  
   864x486       60.00    59.92    59.57  
   640x480       59.94  
   720x405       59.51    60.00    58.99  
   680x384       60.00  
   640x360       59.84    59.32    60.00  
DP1 disconnected (normal left inverted right x axis y axis)
HDMI1 connected 1920x1200+0+0 (normal left inverted right x axis y axis) 620mm x 340mm
   2048x1152     60.00  
   1920x1200     59.95* 
   1920x1080     60.00    60.00    50.00    59.94  
   1920x1080i    60.00    50.00    59.94  
   1600x1200     60.00  
   1680x1050     59.88  
   1600x900      60.00  
   1280x1024     75.02    60.02  
   1280x800      59.91  
   1152x864      75.00  
   1280x720      60.00    50.00    59.94  
   1024x768      75.03    70.07    60.00  
   832x624       74.55  
   800x600       72.19    75.00    60.32    56.25  
   720x576       50.00  
   720x576i      50.00  
   720x480       60.00    59.94  
   720x480i      60.00    59.94  
   640x480       75.00    72.81    66.67    60.00    59.94  
   720x400       70.08  
VGA1 disconnected (normal left inverted right x axis y axis)
VIRTUAL1 disconnected (normal left inverted right x axis y axis)
$

$ top

top - 19:14:38 up 52 min,  1 user,  load average: 1,80, 1,84, 1,94
Tasks: 179 total,   2 running, 177 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,7 us,  1,0 sy,  0,0 ni, 96,7 id,  0,2 wa,  1,3 hi,  0,2 si,  0,0 st
MiB Mem :  15979,9 total,  13080,3 free,   1618,2 used,   1281,4 buff/cache
MiB Swap:  26224,1 total,  26224,1 free,      0,0 used.  13707,9 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 9308 root      20   0       0      0      0 I   1,0   0,0   0:01.34 kworker/u8:1-i915-dp
   50 root      20   0       0      0      0 R   0,7   0,0   0:16.96 kworker/1:1+events
                                                                
No process really needs CPU time; the high load is presumably due only to the many interrupts being triggered.
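
Editorial note: the interrupt rate can be measured directly by sampling /proc/interrupts twice. This is a minimal sketch under stated assumptions, not something from the original report. The helper names `irq_total` and `irq_rate` are hypothetical, and matching is done on the last column of each line, which works for a line labelled exactly `i915`.

```shell
#!/bin/sh
# Sum the per-CPU counters of every line whose last field equals the given name.
# irq_total is a hypothetical helper; the optional second argument is a file in
# /proc/interrupts format (defaults to the live file).
irq_total() {
    awk -v name="$1" '
        NR == 1 { ncpu = NF; next }      # header row: one column per CPU
        $NF == name {
            for (i = 2; i <= ncpu + 1; i++) sum += $i
        }
        END { printf "%.0f\n", sum }
    ' "${2:-/proc/interrupts}"
}

# Report interrupts per second for the named source over an interval (default 5 s).
irq_rate() {
    interval=${2:-5}
    before=$(irq_total "$1")
    sleep "$interval"
    after=$(irq_total "$1")
    echo $(( (after - before) / interval ))
}
```

Running `irq_rate i915 10` once with the HDMI cable plugged in and once with it unplugged would give two comparable numbers.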


$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0      0 13378072 129548 1200160    0    0   135    41 2011 4429  4  3 89  3  0
 0  0      0 13378332 129564 1200200    0    0     0    76 3711 7258  0  2 95  4  0
 0  0      0 13378332 129564 1200200    0    0     0     0 3707 7380  0  2 98  1  0
 0  0      0 13378332 129564 1200200    0    0     0     0 4362 8916  0  3 97  0  0
 0  0      0 13378332 129564 1200200    0    0     0     0 4026 7785  0  1 99  0  0
 0  0      0 13378332 129564 1200200    0    0     0     0 3623 6965  0  1 99  0  0
 1  1      0 13378332 129572 1200192    0    0     0    76 3576 6627  0  2 96  2  0
 1  0      0 13378332 129572 1200200    0    0     0     0 3531 6535  1  3 95  1  0
 0  0      0 13378332 129572 1200200    0    0     0     0 3600 6947  1  2 97  0  0



# cat /proc/interrupts     
           CPU0       CPU1       
  0:          6          0  IR-IO-APIC   2-edge      timer
  1:         17          0  IR-IO-APIC   1-edge      i8042
  8:          0          1  IR-IO-APIC   8-edge      rtc0
  9:          0          4  IR-IO-APIC   9-fasteoi   acpi
 12:          0          0  IR-IO-APIC  12-edge      i8042
 16:         82          0  IR-IO-APIC  16-fasteoi   ehci_hcd:usb1
 18:          0          0  IR-IO-APIC  18-fasteoi   i801_smbus
 23:          0         64  IR-IO-APIC  23-fasteoi   ehci_hcd:usb2
 24:          0          0  DMAR-MSI   0-edge      dmar0
 25:          0          0  DMAR-MSI   1-edge      dmar1
 30:          0          0  IR-PCI-MSI 473088-edge      pciehp
 31:          0    7695604  IR-PCI-MSI 32768-edge      i915
 32:          0          0  IR-PCI-MSI 360448-edge      mei_me
 33:          0      26111  IR-PCI-MSI 512000-edge      ahci[0000:00:1f.2]
 34:      14883          0  IR-PCI-MSI 2621440-edge      p1p1
 35:          0      35186  IR-PCI-MSI 5767168-edge      xhci_hcd
 36:          0          0  IR-PCI-MSI 5767169-edge      xhci_hcd
 37:          0          0  IR-PCI-MSI 5767170-edge      xhci_hcd
 38:          0          0  IR-PCI-MSI 5767171-edge      xhci_hcd
 39:          0          0  IR-PCI-MSI 5767172-edge      xhci_hcd
 40:          0          0  IR-PCI-MSI 442368-edge      snd_hda_intel:card0
 41:          0          1  IR-PCI-MSI 4718592-edge      iwlwifi
NMI:          0          0   Non-maskable interrupts
LOC:     541295    4096293   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:         16        642   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:     643529     256795   Rescheduling interrupts
CAL:      58108      32453   Function call interrupts
TLB:     197656     133555   TLB shootdowns
TRM:       3820       3820   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:         11         12   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0   Posted-interrupt notification event
NPI:          0          0   Nested posted-interrupt event
PIW:          0          0   Posted-interrupt wakeup event
#

# w
 19:17:11 up 55 min,  1 user,  load average: 1.48, 1.76, 1.90


# lspci -vvv -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
	Subsystem: Dell 2nd Generation Core Processor Family Integrated Graphics Controller
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 31
	Region 0: Memory at f6800000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at f000 [size=64]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee00018  Data: 0000
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a4] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: i915


I tried older kernels but could not find one with which the issue disappears. Maybe it was caused by some xfce4 or drm driver update (rather than a kernel update), but I cannot figure out which one that would be. At the moment I have x11-drivers/xf86-video-intel-2.99.917_p20190301 and x11-libs/libdrm-2.4.99 (Gentoo Linux).

The fan goes off (and the interrupt rate drops) if I unplug the HDMI cable from the DisplayPort.

The issue is somewhat milder if I quit X11 and stay in the framebuffer console.

Thank you for any clues
Comment 1 mmokrejs@fold.natur.cuni.cz 2019-07-21 17:28:28 UTC
Created attachment 144835 [details]
dmesg-5.1.19
Comment 2 Lakshmi 2019-07-22 14:02:25 UTC
(In reply to mmokrejs@fold.natur.cuni.cz from comment #1)
> Created attachment 144835 [details]
> dmesg-5.1.19

Can you please attach the dmesg from boot with the kernel parameters drm.debug=0x1e log_buf_len=4M? That will give us more information about the issue.
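
Editorial note: before capturing the log, it can help to confirm that both parameters really reached the kernel; the larger log buffer is there so that the verbose debug output does not push the early boot messages out. The sketch below is an illustration only. The helper name `has_boot_param` and the output file name are hypothetical.

```shell
#!/bin/sh
# Return success if the kernel command line contains exactly the given parameter.
# has_boot_param is a hypothetical helper; the command line defaults to /proc/cmdline.
has_boot_param() {
    param=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: confirm both parameters are active, then save the kernel log.
#   has_boot_param drm.debug=0x1e && has_boot_param log_buf_len=4M \
#       && dmesg > dmesg-with-drm-debug.txt
```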

Btw, have you tried to verify the issue with drmtip?

What is the impact on you as a user due to the interrupts?
Comment 3 mmokrejs@fold.natur.cuni.cz 2019-07-22 15:26:48 UTC
Created attachment 144842 [details]
dmesg-5.1.19__with_drm_debug.txt
Comment 4 mmokrejs@fold.natur.cuni.cz 2019-07-22 15:28:29 UTC
Created attachment 144843 [details]
dmesg-5.1.19__with_drm_debug2.txt

A bit more, collected over a longer period. Uptime 11 minutes.
Comment 5 mmokrejs@fold.natur.cuni.cz 2019-07-22 15:33:31 UTC
(In reply to Lakshmi from comment #2)
> 
> Btw, have you tried to verify the issue with drmtip?

No.

> What is the impact on you as a user due to the interrupts?

Overheating CPU, fan at max speed, system responding a bit more slowly. But the external HDMI works (in 2D at least) as expected, and I can switch back to internal LCD-only state.
Comment 6 mmokrejs@fold.natur.cuni.cz 2019-07-22 23:33:12 UTC
Created attachment 144847 [details]
dmesg-5.1.19__HDMI_cable_pulled_out.txt

In this dmesg you can see the errors, and somewhere in it you can find the point where I pulled out the HDMI cable. Then, within a few seconds, the CPU fan speed went down, and I assume the temperature did as well.

$ cat /proc/interrupts 
           CPU0       CPU1       
  0:          6          0  IR-IO-APIC   2-edge      timer
  1:         17          0  IR-IO-APIC   1-edge      i8042
  8:          0          1  IR-IO-APIC   8-edge      rtc0
  9:          0          4  IR-IO-APIC   9-fasteoi   acpi
 12:          0          0  IR-IO-APIC  12-edge      i8042
 16:         82          0  IR-IO-APIC  16-fasteoi   ehci_hcd:usb1
 18:          0          0  IR-IO-APIC  18-fasteoi   i801_smbus
 23:          0         65  IR-IO-APIC  23-fasteoi   ehci_hcd:usb2
 24:          0          0  DMAR-MSI   0-edge      dmar0
 25:          0          0  DMAR-MSI   1-edge      dmar1
 30:          0          0  IR-PCI-MSI 473088-edge      pciehp
 31:          0   34727931  IR-PCI-MSI 32768-edge      i915
 32:          0          0  IR-PCI-MSI 360448-edge      mei_me
 33:          0      97197  IR-PCI-MSI 512000-edge      ahci[0000:00:1f.2]
 34:     114753          0  IR-PCI-MSI 2621440-edge      p1p1
 35:          0     207816  IR-PCI-MSI 5767168-edge      xhci_hcd
 36:          0          0  IR-PCI-MSI 5767169-edge      xhci_hcd
 37:          0          0  IR-PCI-MSI 5767170-edge      xhci_hcd
 38:          0          0  IR-PCI-MSI 5767171-edge      xhci_hcd
 39:          0          0  IR-PCI-MSI 5767172-edge      xhci_hcd
 40:          0          0  IR-PCI-MSI 442368-edge      snd_hda_intel:card0
 41:          0          1  IR-PCI-MSI 4718592-edge      iwlwifi
NMI:          0          0   Non-maskable interrupts
LOC:   12587184   10611394   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:        144       7182   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:    4603868    2600550   Rescheduling interrupts
CAL:     182044     122334   Function call interrupts
TLB:     488674     456196   TLB shootdowns
TRM:      37098      37101   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:         50         51   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0   Posted-interrupt notification event
NPI:          0          0   Nested posted-interrupt event
PIW:          0          0   Posted-interrupt wakeup event
$

Uptime 4hrs 9min.
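
Editorial note: dividing the i915 counter by the uptime gives a rough average rate. This is a sketch for illustration. The helper name `avg_irq_rate` is hypothetical, and the figure averages over the whole uptime, including any time the cable was unplugged, so it is at best a lower bound for the rate while HDMI is connected.

```shell
#!/bin/sh
# Average interrupts per second since boot: total count divided by uptime in seconds.
# avg_irq_rate is a hypothetical helper; integer division, so the result is rounded down.
avg_irq_rate() {
    count=$1
    hours=$2
    minutes=$3
    echo $(( count / (hours * 3600 + minutes * 60) ))
}

avg_irq_rate 34727931 4 9    # i915 count and uptime from this comment
avg_irq_rate 7695604 0 55    # i915 count from the description, uptime of about 55 min
```

Both calls come out at roughly 2,300 interrupts per second (2324 and 2332), which suggests the rate was fairly steady across the two 5.1.19 sessions.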
Comment 7 mmokrejs@fold.natur.cuni.cz 2019-07-22 23:35:05 UTC
Also, I rebooted the laptop into Win7, and there the HDMI connection does not trigger a high interrupt load or a hot CPU. So IMHO this is not faulty hardware but a Linux issue.
Comment 8 Lakshmi 2019-07-23 07:04:32 UTC
Can you reproduce this issue with drmtip (https://cgit.freedesktop.org/drm-tip) ?
Comment 9 mmokrejs@fold.natur.cuni.cz 2019-07-23 12:40:32 UTC
Created attachment 144854 [details]
dmesg-5.3.0-rc1-drm-tip.txt

The CPU fan runs at a lower speed, although that is still not ideal. The fan can stop completely (and it does stop if I unplug the HDMI cable). I still see the errors in the dmesg output, so I am not surprised that the system is still bothered by interrupts to some extent. But it is definitely better than 5.1.19 and the other kernels I tried.

What I tested now is:

# git log | head
commit efc5709a1e0f5050b197b51c3fce4cbcc95fe319
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Jul 23 11:11:03 2019 +0200

    drm-tip: 2019y-07m-23d-09h-10m-12s UTC integration manifest



# cat /proc/interrupts 
           CPU0       CPU1       
  0:          6          0  IR-IO-APIC   2-edge      timer
  1:         17          0  IR-IO-APIC   1-edge      i8042
  8:          0          1  IR-IO-APIC   8-edge      rtc0
  9:          0          4  IR-IO-APIC   9-fasteoi   acpi
 12:          0          0  IR-IO-APIC  12-edge      i8042
 16:         82          0  IR-IO-APIC  16-fasteoi   ehci_hcd:usb1
 18:          0          0  IR-IO-APIC  18-fasteoi   i801_smbus
 23:          0         64  IR-IO-APIC  23-fasteoi   ehci_hcd:usb2
 24:          0          0  DMAR-MSI   0-edge      dmar0
 25:          0          0  DMAR-MSI   1-edge      dmar1
 30:          0          0  IR-PCI-MSI 473088-edge      pciehp
 31:          0     229235  IR-PCI-MSI 32768-edge      i915
 32:          0          0  IR-PCI-MSI 360448-edge      mei_me
 33:          0      20477  IR-PCI-MSI 512000-edge      ahci[0000:00:1f.2]
 34:        842          0  IR-PCI-MSI 2621440-edge      p1p1
 35:          0       5606  IR-PCI-MSI 5767168-edge      xhci_hcd
 36:          0          0  IR-PCI-MSI 5767169-edge      xhci_hcd
 37:          0          0  IR-PCI-MSI 5767170-edge      xhci_hcd
 38:          0          0  IR-PCI-MSI 5767171-edge      xhci_hcd
 39:          0          0  IR-PCI-MSI 5767172-edge      xhci_hcd
 40:          0          0  IR-PCI-MSI 442368-edge      snd_hda_intel:card0
 41:          0          1  IR-PCI-MSI 4718592-edge      iwlwifi
NMI:          0          0   Non-maskable interrupts
LOC:     133679      48455   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          1        349   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:      46357      32603   Rescheduling interrupts
CAL:      18434       7871   Function call interrupts
TLB:      17100      17878   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          2          3   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0   Posted-interrupt notification event
NPI:          0          0   Nested posted-interrupt event
PIW:          0          0   Posted-interrupt wakeup event
# w
 14:37:53 up 8 min,  2 users,  load average: 0.44, 0.62, 0.37
...

# lspci -vvv -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
	Subsystem: Dell 2nd Generation Core Processor Family Integrated Graphics Controller
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 31
	Region 0: Memory at f6800000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at f000 [size=64]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee00018  Data: 0000
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a4] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: i915
Comment 10 mmokrejs@fold.natur.cuni.cz 2019-07-24 06:27:14 UTC
Created attachment 144864 [details]
dmesg-5.3.0-rc1-drm-tip__HDMI_cable_unplugged.txt

Likewise, you can see what happened after I unplugged the HDMI cable and the screen resolution was adjusted to 1366x768 (the laptop LCD screen).
Comment 11 Ville Syrjala 2019-09-09 14:51:45 UTC
You can try something like:
echo 15 > /sys/kernel/debug/dri/0/i915_hpd_storm_ctl

which should at least turn off the hpd irq for a few seconds after it's detected 15 spurious hotplug interrupts within one second. Obviously since the machine is generating these things constantly it'll ping-pong between the two states, but should at least let it sleep for a few seconds every now and then.
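
Editorial note: the command above can be wrapped in a small check so that a missing or read-only debugfs file produces a clear message. This is a minimal sketch, not an official tool. The helper name `set_hpd_storm_threshold` is hypothetical, and the default path is simply the one quoted in this comment.

```shell
#!/bin/sh
# Write an HPD storm threshold to the i915 debugfs control file and show the result.
# set_hpd_storm_threshold is a hypothetical helper; the default path is the one
# quoted above and can be overridden with a second argument.
set_hpd_storm_threshold() {
    threshold=$1
    ctl=${2:-/sys/kernel/debug/dri/0/i915_hpd_storm_ctl}
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (need root and a mounted debugfs)" >&2
        return 1
    fi
    echo "$threshold" > "$ctl" && cat "$ctl"
}

# Example (as root):
#   set_hpd_storm_threshold 15
```

Since this is a debugfs setting, it would need to be applied again after each reboot.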

