Bug 98501 - [i915][HSW] ACPI GPE06 storm
Summary: [i915][HSW] ACPI GPE06 storm
Status: REOPENED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: low minor
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-30 13:01 UTC by Pierre Moreau
Modified: 2018-09-10 18:37 UTC

See Also:
i915 platform: HSW
i915 features: display/Other


Attachments
dmesg (82.64 KB, text/plain)
2016-10-30 13:01 UTC, Pierre Moreau
Disassembled DSDT table (233.42 KB, text/plain)
2016-10-30 13:04 UTC, Pierre Moreau
Perf output on 4.10.1 (77.29 KB, text/plain)
2017-04-05 19:25 UTC, Pierre Moreau
Perf output on drm-tip (25 Jul. 2017) (76.33 KB, text/plain)
2017-07-25 21:47 UTC, Pierre Moreau
Dmesg of drm-tip with drm.debug=0xe (151.13 KB, text/plain)
2017-07-26 18:09 UTC, Pierre Moreau
Dmesg of drm-tip with drm.debug=0x1e (258.53 KB, text/plain)
2018-09-10 18:37 UTC, Pierre Moreau

Description Pierre Moreau 2016-10-30 13:01:53 UTC
Created attachment 127623 [details]
dmesg

Hardware: MacBook Pro 11,3 with i7-4850HQ, an Intel Iris Pro 5200 and an NVIDIA GeForce GT 750M
Architecture: x86_64
Kernel version: 4.8.5
Distribution: ArchLinux
Most likely related: https://bugs.freedesktop.org/show_bug.cgi?id=90014

I need to patch my kernel to actually be able to use the integrated GPU, as otherwise the EFI firmware turns it off (see http://lists.gnu.org/archive/html/grub-devel/2014-11/msg00034.html).

With the IGD activated and the i915 driver loaded, one kworker constantly uses about 70% of one core, and the counter in acpi/interrupts/gpe06 increases continuously. With the IGD activated but i915 not loaded, gpe06 remains at 0 interrupts and no kworker uses more than 5%.
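
The gpe06 counter described above can be sampled to put a number on the storm. This is only a minimal sketch: it assumes the counter lives at /sys/firmware/acpi/interrupts/gpe06 and that the first whitespace-separated token in that file is the interrupt count. The helper names and the one-second interval are illustrative, not taken from any tool.

```python
import time
from pathlib import Path

# Assumed location of the GPE 0x06 counter (an assumption, adjust if needed).
GPE_PATH = Path("/sys/firmware/acpi/interrupts/gpe06")


def parse_gpe_count(text):
    """Return the interrupt count, taken as the first token of the sysfs line."""
    return int(text.split()[0])


def interrupts_per_second(before, after, seconds):
    """Average rate between two counter samples taken `seconds` apart."""
    return (after - before) / seconds


def sample_rate(path=GPE_PATH, seconds=1.0):
    """Read the counter twice and return the observed interrupts per second."""
    before = parse_gpe_count(path.read_text())
    time.sleep(seconds)
    after = parse_gpe_count(path.read_text())
    return interrupts_per_second(before, after, seconds)


if __name__ == "__main__":
    try:
        print(f"gpe06: {sample_rate():.0f} interrupts/s")
    except OSError as err:
        # The file is missing or unreadable on this system.
        print(f"cannot read {GPE_PATH}: {err}")
```

Running it once with i915 loaded and once without should show the difference reported here: a steadily growing count in the first case and a rate of zero in the second.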
Comment 1 Pierre Moreau 2016-10-30 13:04:16 UTC
Created attachment 127624 [details]
Disassembled DSDT table
Comment 2 Pierre Moreau 2016-10-30 13:07:00 UTC
Lukas Wunner investigated this a bit; here are his comments:

Okay I've just looked at an acpidump I have here (I think yours is an
MBP11,3, rather than an MBP11,2), found this in the DSDT:

    Scope (\_GPE)
    {
    ...
        Method (_L06, 0, NotSerialized)  // _Lxx: Level-Triggered GPE
        {
            If (LAnd (\_SB.PCI0.IGPU.GSSE, LNot (GSMI)))
            {
                \_SB.PCI0.IGPU.GSCI ()
            }
            Else
            {
                Store (0x00, \_SB.PCI0.IGPU.GEFC)
                Store (0x01, SCIS) /* \SCIS */
                Store (0x00, \_SB.PCI0.IGPU.GSSE)
                Store (0x00, \_SB.PCI0.IGPU.SCIE)
            }
        }

This method is executed every time GPE 0x06 fires.  It's clearly
related to the Intel GPU but I don't know why it's generating an
interrupt storm.  The above method might give a hint:  The GSSE
bit queried in the if-condition and cleared in the else-branch is
defined further up:

                OperationRegion (IGDP, PCI_Config, 0x40, 0xC0)
                Field (IGDP, AnyAcc, NoLock, Preserve)
                {
                ...
                    Offset (0xA8), 
                    GSSE,   1, 
                    GSSB,   14, 
                    GSES,   1, 
                ...
                }

So GSSE is a bit in the PCI configuration space at offset
0x40 + 0xa8 = 0xe8.  You can see the current value of the bit
with "lspci -vvvvxxxx -s 0000:00:02.0".  But the question is
what this bit means.  The PRMs for this Haswell GPU are here:

https://01.org/linuxgraphics/documentation/hardware-specification-prms/2013-intel-core-processor-family

Here's the PRM documenting the registers in PCI configuration space:

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-hsw-pcie-config-registers_0.pdf

If you go to page 172 of that manual you'll find there's a 16 bit
register at offset 0xe8 which is laid out like this:

Bit 15		SMI or SCI event select (SMISCISEL)
Bit 14:1	Software scratch bits (SCISB)
Bit 0		Software SCI Event (GSSCIE)

Based on the name I would guess that GSSE in the DSDT corresponds to
"GSSCIE" and GSES corresponds to "SMISCISEL".

There are some more details in the manual, but SCI means System Control
Interrupt and SMI means System Management Interrupt.  There's a hidden
firmware on these Intel machines and if the CPU gets an SMI
it briefly stops the operating system, switches into System Management
Mode and executes an interrupt handler in the firmware.  It then switches
back to the OS and the OS has no idea anything happened.  An SCI, by
contrast, is delivered to the OS, which runs the matching ACPI GPE method
such as _L06 above.

With Thunderbolt we know that on non-Macs, the PCI tunnels are set up
by the firmware in System Management Mode.  Apple didn't like this for
some reason and steers the controller natively from the OS using a
dedicated driver.

Apparently these Intel GPUs also have some firmware code.  Apple only
supports macOS and Windows on these machines.  With the patch to make
the Intel GPU visible, the OS identifies as macOS to the EFI firmware.
Conceivably, the SCI/SMI stuff is set up differently by the EFI firmware
for Windows versus macOS and the Linux i915 driver isn't prepared for
that.



The output of `lspci -vvvvxxxx -s 0000:00:02.0` is:

00:02.0 VGA compatible controller: Intel Corporation Crystal Well Integrated Graphics Controller (rev 08) (prog-if 00 [VGA controller])
        Subsystem: Apple Inc. Device 012f
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 42
        Region 0: Memory at c1400000 (64-bit, non-prefetchable) [size=4M]
        Region 2: Memory at b0000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at 2000 [size=64]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee00018  Data: 0000
        Capabilities: [d0] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a4] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: i915
        Kernel modules: i915
00: 86 80 26 0d 07 04 90 00 08 00 00 03 00 00 00 00
10: 04 00 40 c1 00 00 00 00 0c 00 00 b0 00 00 00 00
20: 01 20 00 00 00 00 00 00 00 00 00 00 6b 10 2f 01
30: 00 00 00 00 90 00 00 00 00 00 00 00 00 01 00 00
40: 09 00 0c 01 89 21 00 42 d0 00 4c 74 00 00 00 00
50: 11 02 00 00 3d 00 00 00 00 00 00 00 01 00 a0 7b
60: 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 05 d0 01 00 18 00 e0 fe 00 00 00 00 00 00 00 00
a0: 00 00 00 00 13 00 06 03 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 01 a4 22 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 08 00 90 81 d1 7a
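
The register at 0xe8 can be read straight out of the dump above. The following is a rough sketch rather than a definitive decoding: the bit layout follows the DSDT field definition (GSSE 1 bit, GSSB 14 bits, GSES 1 bit), the mapping to the PRM names is the guess made earlier in this comment, and the value is only a snapshot taken at the time lspci ran. GSMI is not part of the dump, so it is treated as an unknown input; the function names are made up for illustration.

```python
# Row "e0:" copied from the lspci dump above (config space bytes 0xe0..0xef).
ROW_E0 = "00 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00"

IGDP_BASE = 0x40    # start of the IGDP OperationRegion in PCI config space
GSSE_OFFSET = 0xA8  # Offset (0xA8) inside that region


def read_le16(row, row_base, offset):
    """Return the little-endian 16-bit value at `offset` from one dump row."""
    data = bytes.fromhex(row)  # fromhex ignores the spaces between bytes
    i = offset - row_base
    return int.from_bytes(data[i:i + 2], "little")


def decode(value):
    """Split the 16-bit register into the three DSDT fields."""
    return {
        "GSSE": value & 0x1,            # bit 0, presumably GSSCIE in the PRM
        "GSSB": (value >> 1) & 0x3FFF,  # bits 14:1, software scratch bits
        "GSES": (value >> 15) & 0x1,    # bit 15, presumably SMISCISEL in the PRM
    }


def l06_calls_gsci(gsse, gsmi):
    """Mirror of the If condition in _L06: LAnd (GSSE, LNot (GSMI))."""
    return bool(gsse) and not bool(gsmi)


value = read_le16(ROW_E0, 0xE0, IGDP_BASE + GSSE_OFFSET)
fields = decode(value)
print(hex(value), fields)  # 0x8000 {'GSSE': 0, 'GSSB': 0, 'GSES': 1}

# GSMI is unknown, so try both values.
for gsmi in (0, 1):
    print("GSMI =", gsmi, "-> calls GSCI:", l06_calls_gsci(fields["GSSE"], gsmi))
```

In this snapshot GSSE is 0 and GSES is 1, so if _L06 ran with these values the If condition would be false whatever GSMI is, and the Else branch would execute. Whether that has anything to do with the storm is not established.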
Comment 3 Jari Tahvanainen 2017-04-04 13:23:26 UTC
Pierre - I'm sorry that we have neglected this bug for so long.

Reusing the request from https://bugs.freedesktop.org/show_bug.cgi?id=90014#c3: can you please execute 'perf record -g -a sleep 60; perf report | head -500' and attach the output?
Comment 4 Pierre Moreau 2017-04-05 19:25:38 UTC
Created attachment 130707 [details]
Perf output on 4.10.1

No worries, I could have pinged but forgot to do that. Here is the perf data on 4.10.1.

I was planning to try with a drm-nightly, but I haven’t done that yet. I could get the perf data on a nightly if needed.
Comment 5 Pierre Moreau 2017-05-25 09:50:53 UTC
This is still an issue with 4.11.2. Results from perf are similar to the results from 4.10.1.
Comment 6 Elizabeth 2017-07-25 17:30:25 UTC
(In reply to Pierre Moreau from comment #5)
> This is still an issue with 4.11.2. Results from perf are similar to the
> results from 4.10.1.

Hello Pierre,
Any change with 4.12 or higher and/or latest drm-tip?
Thanks.
Comment 7 Pierre Moreau 2017-07-25 18:40:03 UTC
(In reply to Elizabeth from comment #6)
> (In reply to Pierre Moreau from comment #5)
> > This is still an issue with 4.11.2. Results from perf are similar to the
> > results from 4.10.1.
> 
> Hello Pierre,
> Any change with 4.12 or higher and/or latest drm-tip?
> Thanks.

Hello Elizabeth,

I did try 4.12 yesterday but ended up with no screen and no keyboard. Since I’ll need to redo my kernel config and recompile a new kernel, which version would be better? I can do a 4.12 + drm-tip, to check that no regression occurred with 4.12.
Comment 8 Pierre Moreau 2017-07-25 19:53:26 UTC
I can confirm the issue is still present on 4.12.3. I’ll try on drm-tip as well.
Comment 9 Elizabeth 2017-07-25 20:38:28 UTC
(In reply to Pierre Moreau from comment #7)
> [...] which
> version would be better? I can do a 4.12 + drm-tip, to check that no
> regression occurred with 4.12.
Hello Pierre, 
Drm-tip would be better since it includes the latest changes made.

(In reply to Pierre Moreau from comment #8)
> I can confirm the issue is still present on 4.12.3. I’ll try on drm-tip as
> well.

Thanks for testing. If the problem is still present on drm-tip, could you please add new logs with the latest kernel?
Thank you.
Comment 10 Pierre Moreau 2017-07-25 21:47:39 UTC
Created attachment 132979 [details]
Perf output on drm-tip (25 Jul. 2017)

Same issue still on drm-tip. Here are the perf results on that kernel.
I’ll be happy to provide additional logs/testing.
Comment 11 Elizabeth 2017-07-26 14:43:16 UTC
(In reply to Pierre Moreau from comment #10)
> Created attachment 132979 [details]
> Perf output on drm-tip (25 Jul. 2017)
> 
> Same issue still on drm-tip. Here are the perf results on that kernel.
> I’ll be happy to provide additional logs/testing.

Thanks Pierre, 
Could you please also attach a dmesg from that kernel, booted with the drm.debug=0xe parameter set in GRUB?
Thank you.
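
For reference, drm.debug is a bitmask of debug categories. The sketch below decodes the value requested here; the bit names are listed from memory of the kernel's drm_print.h and should be treated as an assumption to verify against the tree actually being built. The function name is illustrative.

```python
# Category bits as recalled from include/drm/drm_print.h (an assumption,
# please verify against your kernel tree).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask):
    """Return the names of the categories enabled by `mask`, lowest bit first."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


print(decode_drm_debug(0xE))   # ['DRIVER', 'KMS', 'PRIME']
print(decode_drm_debug(0x1E))  # ['DRIVER', 'KMS', 'PRIME', 'ATOMIC']
```

Under that assumption, 0xe enables driver, KMS and PRIME messages, and the 0x1e value requested later in this bug adds the atomic category.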
Comment 12 Pierre Moreau 2017-07-26 18:09:20 UTC
Created attachment 132998 [details]
Dmesg of drm-tip with drm.debug=0xe

Here is the requested dmesg output.
Comment 13 Elizabeth 2017-08-03 00:11:43 UTC
(In reply to Pierre Moreau from comment #12)
> Created attachment 132998 [details]
> Dmesg of drm-tip with drm.debug=0xe
> 
> Here is the requested dmesg output.
Thank you for the information. If more logs are needed for this bug, we will comment below.
Comment 14 René Rebe 2017-08-06 09:05:54 UTC
Can confirm, happens on my MacBookPro11,3, too.
Comment 15 Pierre Moreau 2017-10-02 08:24:18 UTC
Friendly ping, any updates on this?
Comment 16 Jani Nikula 2018-01-22 11:45:25 UTC
Shot in the dark, try reverting 11825b0dba78 ("drm/i915: Enable GSE interrupt on BDW+").
Comment 17 Jani Nikula 2018-01-22 11:48:03 UTC
(In reply to Jani Nikula from comment #16)
> Shot in the dark, try reverting 11825b0dba78 ("drm/i915: Enable GSE
> interrupt on BDW+").

Oh, the bug's reported against hsw, so this shouldn't make a difference anyway.

I'm afraid debugging and reverse engineering mac curiosities isn't high on the list of priorities. Sorry.
Comment 18 Jani Saarinen 2018-03-29 07:10:21 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you also do that to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.
Comment 19 Pierre Moreau 2018-03-29 07:27:58 UTC
(In reply to Jani Nikula from comment #17)
> I'm afraid debugging and reverse engineering mac curiosities isn't high on
> the list of priorities. Sorry.

Oops, I somehow completely missed your comments. I can understand that it’s not high on your priority list. I might have a look at it when I get time.


(In reply to Jani Saarinen from comment #18)
> First of all, sorry about the spam.
> This is a mass update for our bugs.
> 
> Sorry if you find this annoying, but we are trying to understand whether
> this bug is still valid.
> If the bug investigation is still in progress, please ignore this, and I
> apologize!
> 
> If you think this is no longer valid, please comment on the bug so that it
> can be closed.
> If you haven't tested with our latest pre-upstream tree (drm-tip), can you
> also do that to see whether the issue is still valid there? If you cannot
> see the issue there, please comment on the bug.

No worries, I understand about mass checking the bugs for validation; we should do it again for Nouveau some day.

I’ll test drm-tip as soon as I can, but as of 4.16-rc7, the issue is still present.
Comment 20 Jani Saarinen 2018-04-23 07:51:12 UTC
Ok, thanks for the feedback.
Comment 21 Lakshmi 2018-09-08 22:35:45 UTC
Pierre, do you still have the issue? If not I can close this bug.
Comment 22 Lakshmi 2018-09-10 16:45:33 UTC
No feedback for many months, closing as RESOLVED/WORKSFORME.
Please re-open if the issue persists with the latest drm-tip (https://cgit.freedesktop.org/drm-tip) and send a dmesg from boot with the kernel parameters drm.debug=0x1e log_buf_len=4M.
Comment 23 Pierre Moreau 2018-09-10 18:02:47 UTC
Hello Lakshmi,

I have kept experiencing the issue since the last update back in March, and still do as of 4.18. I’ll kick off a build of drm-tip, try it, and update the issue with the information.
Comment 24 Pierre Moreau 2018-09-10 18:37:57 UTC
Created attachment 141512 [details]
Dmesg of drm-tip with drm.debug=0x1e

Still experiencing the issue on 8c3078ff800467d16aea5622b2fa325826d167c2 (current drm-tip HEAD; also applied the kernel patch for advertising as macOS (see #1 for details)).
I added the output from dmesg, as requested.

