Bug 102765 - HP Spectre x360 internal monitor intermittently flickering & freezing
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
 
Reported: 2017-09-14 22:16 UTC by Nick Coghlan
Modified: 2018-09-10 16:44 UTC

i915 platform: BDW
i915 features: display/eDP


Attachments
dmesg output without drm.debug set (oops) (76.70 KB, text/plain)
2017-09-14 22:16 UTC, Nick Coghlan
dmesg logs for healthy system state (265.17 KB, text/plain)
2017-10-31 12:34 UTC, Nick Coghlan
Dmesg logs for unhealthy system state (987.43 KB, text/plain)
2017-10-31 12:56 UTC, Nick Coghlan

Description Nick Coghlan 2017-09-14 22:16:49 UTC
Created attachment 134246
dmesg output without drm.debug set (oops)

This is a follow-up to BZ#101380: I'm getting screen flickering, freezing, and whiting-out symptoms similar to those described there, but that bug had previously been closed for lack of the requested debugging information. (If they'd be of value, I have some additional videos of the misbehaviour that I could upload, but the video already linked from that bug gives a pretty decent example of the visible symptoms.)

The attached dmesg log was generated with drm.debug=0xe in the boot options, just after the internal monitor had gone through an episode of freezing and whiting out (and while it was still flickering a bit).

The most suspicious looking lines are these ones:

==============
[   14.459261] ACPI Error: Field [D128] at 1152 exceeds Buffer [NULL] size 160 (bits) (20170303/dsopcode-236)
[   14.459268] ACPI Error: Method parse/execution failed [\HWMC] (Node ffff8edcc61507f8), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[   14.459279] ACPI Error: Method parse/execution failed [\_SB.WMID.WMAA] (Node ffff8edcc61523c0), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[   14.459338] ACPI Error: Field [D128] at 1152 exceeds Buffer [NULL] size 160 (bits) (20170303/dsopcode-236)
[   14.459342] ACPI Error: Method parse/execution failed [\HWMC] (Node ffff8edcc61507f8), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[   14.459349] ACPI Error: Method parse/execution failed [\_SB.WMID.WMAA] (Node ffff8edcc61523c0), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[   14.459404] ACPI Error: Field [D128] at 1152 exceeds Buffer [NULL] size 160 (bits) (20170303/dsopcode-236)
[   14.459408] ACPI Error: Method parse/execution failed [\HWMC] (Node ffff8edcc61507f8), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[   14.459415] ACPI Error: Method parse/execution failed [\_SB.WMID.WMAA] (Node ffff8edcc61523c0), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[   14.459474] input: HP WMI hotkeys as /devices/virtual/input/input27
[   14.459637] ACPI Error: Field [D128] at 1152 exceeds Buffer [NULL] size 160 (bits) (20170303/dsopcode-236)
[   14.459642] ACPI Error: Method parse/execution failed [\HWMC] (Node ffff8edcc61507f8), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[   14.459650] ACPI Error: Method parse/execution failed [\_SB.WMID.WMAA] (Node ffff8edcc61523c0), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[   14.459692] ACPI Error: Field [D128] at 1152 exceeds Buffer [NULL] size 160 (bits) (20170303/dsopcode-236)
[   14.459695] ACPI Error: Method parse/execution failed [\HWMC] (Node ffff8edcc61507f8), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[   14.459702] ACPI Error: Method parse/execution failed [\_SB.WMID.WMAA] (Node ffff8edcc61523c0), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
==============

Software details:

- Fedora 26
- KDE Plasma Desktop 5.10.1
- Display server: xorg-x11-server-Xorg-1.19.3-4.fc26.x86_64
- Driver: xorg-x11-drv-intel-2.99.917-28.20160929.fc26.x86_64

Hardware details:

$ sudo lspci -nn
[sudo] password for ncoghlan: 
00:00.0 Host bridge [0600]: Intel Corporation Broadwell-U Host Bridge -OPI [8086:1604] (rev 09)
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 5500 [8086:1616] (rev 09)
00:03.0 Audio device [0403]: Intel Corporation Broadwell-U Audio Controller [8086:160c] (rev 09)
00:14.0 USB controller [0c03]: Intel Corporation Wildcat Point-LP USB xHCI Controller [8086:9cb1] (rev 03)
00:16.0 Communication controller [0780]: Intel Corporation Wildcat Point-LP MEI Controller #1 [8086:9cba] (rev 03)
00:1b.0 Audio device [0403]: Intel Corporation Wildcat Point-LP High Definition Audio Controller [8086:9ca0] (rev 03)
00:1c.0 PCI bridge [0604]: Intel Corporation Wildcat Point-LP PCI Express Root Port #2 [8086:9c92] (rev e3)
00:1c.2 PCI bridge [0604]: Intel Corporation Wildcat Point-LP PCI Express Root Port #3 [8086:9c94] (rev e3)
00:1f.0 ISA bridge [0601]: Intel Corporation Wildcat Point-LP LPC Controller [8086:9cc3] (rev 03)
00:1f.2 SATA controller [0106]: Intel Corporation Wildcat Point-LP SATA Controller [AHCI Mode] [8086:9c83] (rev 03)
00:1f.3 SMBus [0c05]: Intel Corporation Wildcat Point-LP SMBus Controller [8086:9ca2] (rev 03)
01:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5227 PCI Express Card Reader [10ec:5227] (rev 01)
02:00.0 Network controller [0280]: Intel Corporation Wireless 7265 [8086:095a] (rev 50)
Comment 1 Jani Nikula 2017-09-18 08:52:59 UTC
(In reply to Nick Coghlan from comment #0)
> This is a follow-up to BZ#101380

Bug #101380.
Comment 2 Jani Nikula 2017-09-18 08:55:42 UTC
The ACPI errors belong to http://bugzilla.kernel.org/.
Comment 3 Elizabeth 2017-09-19 16:23:24 UTC
(In reply to Jani Nikula from comment #2)
> The ACPI errors belong to http://bugzilla.kernel.org/.
Hello Nick Coghlan, could you please clone this bug to bugzilla.kernel.org and add the reference to the URL field above?
Comment 4 Nick Coghlan 2017-09-20 02:52:58 UTC
ACPI bug filed: https://bugzilla.kernel.org/show_bug.cgi?id=197007

Since there weren't any other problems readily apparent in the dmesg log, and I do have that "acpi_backlight=vendor" setting enabled, I'm going to close this as "not your bug".
Comment 5 Nick Coghlan 2017-09-26 00:58:30 UTC
The initial reaction from kernel.org was "not our bug either": https://bugzilla.kernel.org/show_bug.cgi?id=197007#c2

Since the ACPI errors were the only odd thing in the dmesg logs, and they happen at startup regardless of whether the screen ever starts flickering or not, is there something else I can do to further investigate the screen misbehaviour?
Comment 6 Jani Nikula 2017-09-26 10:22:11 UTC
Please add drm.debug=14 module parameter, attach dmesg from boot to reproducing the problem, and if possible please attach a picture or video of the flicker.
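
Note on the value: drm.debug is a bitmask, so the decimal 14 requested here and the 0xe mentioned in the description are the same setting. The following minimal Python sketch decodes such a value. The bit-to-category table is an assumption based on the DRM_UT_* debug categories in the kernel's drm_print.h (newer kernels define more bits), and decode_drm_debug is a hypothetical helper name, not part of any kernel tooling.

```python
# Assumed bit assignments, based on the DRM_UT_* debug categories in the
# kernel's drm_print.h; newer kernels define additional bits.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(value):
    """Return the names of the debug categories enabled by a drm.debug value.

    Accepts an int or a string in decimal ("14") or hex ("0xe") notation.
    """
    mask = int(value, 0) if isinstance(value, str) else value
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


# Decimal and hex spellings of the same mask enable the same categories.
print(decode_drm_debug("14"))
print(decode_drm_debug("0xe"))
# The 0x1e value requested later in this bug additionally sets the 0x10 bit.
print(decode_drm_debug("0x1e"))
```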
Comment 7 Jani Nikula 2017-09-26 10:22:36 UTC
Oh, and do try current drm-tip branch of https://cgit.freedesktop.org/drm/drm-tip
Comment 8 Nick Coghlan 2017-09-27 02:55:37 UTC
The already attached dmesg log is from a session with drm.debug=0xe, and runs all the way from boot to an internal screen freeze+flicker+recovery. Unfortunately, the only errors listed are the ACPI ones from early in the boot cycle (and the response from the ACPI devs was that they don't see how those could be related to problems with the internal monitor).

This is a video of what the freezing misbehaviour looks like on my machine: https://www.dropbox.com/s/hjenwbeitamnnp5/2017-08-07%2009.59.28.mp4?dl=0

Bug #101380 was a previous report of a similar issue, and their video is what made me believe they were encountering the same problem: https://www.dropbox.com/s/4ffyk8ddtht2w10/example.mp4
Comment 9 Jani Nikula 2017-09-27 14:40:23 UTC
(In reply to Nick Coghlan from comment #8)
> The already attached dmesg log is from a session with drm.debug=0xe, and
> runs all the way from boot to an internal screen freeze+flicker+recovery.

There are zero i915 debug messages in the dmesg, so something apparently went wrong with the drm.debug. Or you have set a loglevel that drops debug messages.
Comment 10 Nick Coghlan 2017-10-31 08:05:30 UTC
The internal display has been even more erratic after a recentish kernel update (now on 4.13.9-200.fc26.x86_64, and symptoms now include going entirely dark, as well as lines in all sorts of colours, not just white), so I looked into this again and realised I hadn't been writing my /etc/default/grub updates back to the EFI boot config properly, so the drm.debug setting indeed wasn't being applied (/facepalm).

I've fixed that now, so should be able to provide proper debug logs tomorrow.
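
A setting edited in /etc/default/grub only takes effect once the generated boot config has been rewritten, so it is worth confirming what the running kernel actually received. A minimal Python sketch of that check follows; kernel_param is a hypothetical helper, and the sample string is illustrative only, not taken from the affected machine.

```python
import shlex


def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line, or None if absent.

    If the parameter appears more than once, the last occurrence is returned.
    Quoted tokens such as "acpi_osi=!Windows 2013" are kept together.
    """
    found = None
    for token in shlex.split(cmdline):
        key, _, value = token.partition("=")
        if key == name:
            found = value
    return found


# Illustrative command line only (an assumption, not the reporter's real one).
sample = (
    'BOOT_IMAGE=/vmlinuz-4.13.9-200.fc26.x86_64 ro quiet '
    'acpi_backlight=vendor "acpi_osi=!Windows 2013" drm.debug=0xe'
)

value = kernel_param(sample, "drm.debug")
if value is None:
    print("drm.debug was not passed to this kernel")
else:
    print("drm.debug=" + value)
```

On a live system, pass the contents of /proc/cmdline instead of the sample string; a result of None means the parameter never reached the kernel.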
Comment 11 Nick Coghlan 2017-10-31 12:34:31 UTC
Created attachment 135180
dmesg logs for healthy system state

Initial set of dmesg logs with system startup details. At this point, the internal monitor is still operating fine. (Hopefully this doesn't turn out to be a race condition, where adding debugging changes the timing enough that the issue doesn't occur)

I also tried the more detailed debug log settings requested at https://01.org/linuxgraphics/documentation/how-report-bugs for more recent kernels, but with that level of logging, the ring buffer had filled before I got to a terminal emulator, even with log_buf_len=2M set.
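
When the ring buffer overflows, the earliest messages are dropped, so a capture no longer runs from boot. A rough way to spot that is sketched below in Python, assuming dmesg lines carry the usual bracketed seconds prefix; starts_at_boot and its one-second threshold are hypothetical choices for illustration.

```python
import re

# Matches the leading "[   14.459261]" style timestamp on a dmesg line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def starts_at_boot(dmesg_text, max_first_timestamp=1.0):
    """Heuristic: True if the first timestamped line is from early boot.

    A first timestamp well after zero suggests that the start of the log
    was overwritten in the ring buffer before it was captured.
    """
    for line in dmesg_text.splitlines():
        match = TIMESTAMP.match(line)
        if match:
            return float(match.group(1)) <= max_first_timestamp
    return False  # no timestamped lines at all


# Illustrative inputs (assumed line shapes, not copied from the attachments).
complete = "[    0.000000] Linux version 4.13.9-200.fc26.x86_64\n[    0.000000] Command line: ro quiet"
truncated = "[   14.459261] ACPI Error: Method parse/execution failed"

print(starts_at_boot(complete))
print(starts_at_boot(truncated))
```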
Comment 12 Nick Coghlan 2017-10-31 12:56:11 UTC
Created attachment 135181
Dmesg logs for unhealthy system state

The good news is the internal monitor still misbehaved with debugging messages turned on.

The less helpful news is that aside from a couple of warnings about perf interrupts taking too long for the current sampling rate, there don't seem to be any notable differences between the logs for the healthy system state and the logs for the unhealthy one.
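
One way to make such a comparison less dependent on eyeballing is to strip the timestamps and list messages that appear only in the unhealthy log. A minimal Python sketch, assuming plain-text dmesg captures with bracketed timestamps; only_in_second is a hypothetical helper, and the sample lines are made up for illustration. Debug messages that embed changing values (addresses, counters) will still show up as differences, so this narrows the search rather than replacing a manual read.

```python
import re

# Leading "[   14.459261] " style timestamp, including trailing whitespace.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalise(dmesg_text):
    """Return the non-empty log lines with their timestamps removed."""
    return [TIMESTAMP.sub("", line) for line in dmesg_text.splitlines() if line.strip()]


def only_in_second(first_text, second_text):
    """List messages present in the second log but not the first.

    Order of first appearance is preserved and repeats are reported once.
    """
    seen = set(normalise(first_text))
    result = []
    for message in normalise(second_text):
        if message not in seen:
            seen.add(message)
            result.append(message)
    return result


# Made-up sample logs for illustration only.
healthy = "[    1.000000] message A\n[    2.000000] message B"
unhealthy = "[    1.100000] message A\n[  300.000000] message C\n[  301.000000] message C"

for message in only_in_second(healthy, unhealthy):
    print(message)
```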
Comment 13 Mark Spencer 2017-11-18 21:32:25 UTC
Hi Nick!

I'm the original bug reporter. We probably have the same issue, so having one bug report instead of two would be better, but never mind. I didn't respond on my bug report because I had nothing to add besides my logs, which the devs probably missed, judging by their answers to me.

For the record, I attached my logs to the original bug report here:
https://bugs.freedesktop.org/attachment.cgi?id=131875

I can confirm this bug affects kernel 4.14 as well. I would be glad if the devs could give us any clues about this. I noticed that we both use KDE; maybe it's related?


I also wanted to clear up some off-topic things:

1. The ACPI errors are from the hp_wmi kernel module. You can disable this module by creating a blacklist.conf file in /etc/modprobe.d and adding the line "blacklist hp_wmi" to it. On my machine (similar to yours) disabling it doesn't have any repercussions.

2. Are you sure you still have the sound problem on the 4.13 kernel? To my knowledge those sound issues were fixed a long time ago and you can safely drop "acpi_backlight=vendor", "acpi_osi=!Windows 2013" and "acpi_osi=!Windows 2012" from the cmdline.
Comment 14 Nick Coghlan 2017-11-19 03:51:24 UTC
Filing my own report as a new bug rather than reopening the original one came from https://bugs.freedesktop.org/show_bug.cgi?id=101380#c7 . I don't have a strong opinion on whether one or the other should be explicitly marked as a duplicate.

Being KDE specific certainly sounds plausible to me - I've seen at least one case (several years ago now) where KDE was hitting a crash bug in the driver that Gnome (2, at the time) consistently avoided due to differences in how they interacted with the driver.

And thanks for the other tips! I've started collecting notes on the current state of the Spectre's Linux driver support here: https://gitlab.com/ncoghlan_dev/desktop-linux/blob/master/README.md#hp-spectre-x360-fedora-22-kde-spin
Comment 15 Mark Spencer 2018-01-29 12:39:41 UTC
My machine has passed away. No idea if it was related to this issue, and I will never find out.

@Nick Coghlan you are on your own now. Good luck!
Comment 16 Nick Coghlan 2018-01-30 03:26:03 UTC
I've mostly moved on to a new machine as well (a System76 Galago Pro). I still have the HP Spectre x360 for the time being, but I haven't decided yet if I'm going to put a developmental kernel on it and start learning how to debug Linux hardware driver problems, or revert it to Windows and give it away.
Comment 17 Jani Saarinen 2018-03-29 07:10:28 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If investigation of the bug is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still present there? If you cannot see the issue there, please comment on the bug.
Comment 18 Nick Coghlan 2018-03-29 11:16:07 UTC
It was still happening with whichever version was current in Fedora as of my last comment above in January.

I'm updating the machine with a couple of months worth of Fedora updates now, but I'm not currently set up to run pre-release kernels and/or drivers.
Comment 19 Nick Coghlan 2018-03-29 11:25:21 UTC
Just noting which version of the kernel the machine is going to be running after the Fedora updates: kernel.x86_64 4.15.12-301.fc27

That machine has been switched off long enough that I doubt I'm going to have anything further to report tonight, though :)
Comment 20 Nick Coghlan 2018-03-29 13:21:55 UTC
I rebooted the Spectre into the 4.15.12 Fedora kernel, and it's still exhibiting the same misbehaviour reported previously.

If I attempt to reproduce this with a custom build from git, which parts of https://01.org/linuxgraphics/documentation/build-guide-0 will I need to follow? Just libdrm? Or more than that?
Comment 21 Elizabeth 2018-04-02 21:56:58 UTC
Hi, it should be the "building kernel" part. You can copy your current config from (usually) /boot/config-your_actual_kernel_version to .config in the git folder. If anything goes wrong after installing, you can still boot into your older kernel from grub and remove the custom build.
Comment 22 Jani Saarinen 2018-04-23 19:16:31 UTC
Nick, have you been able to test https://cgit.freedesktop.org/drm-tip ?
Comment 23 Lakshmi 2018-09-08 22:54:04 UTC
Nick, any updates on this issue? Do you still have the issue?

Try to reproduce the error using drm-tip (https://cgit.freedesktop.org/drm-tip) and kernel parameters drm.debug=0x1e log_buf_len=4M, and if the problem persists attach the full dmesg from boot.
Comment 24 Lakshmi 2018-09-10 16:44:24 UTC
No feedback for many months, closing as RESOLVED WORKSFORME.
Please re-open if the issue persists with the latest drm-tip (https://cgit.freedesktop.org/drm-tip) and attach dmesg from boot with kernel parameters drm.debug=0x1e log_buf_len=4M.

