Created attachment 134246
dmesg output without drm.debug set (oops)
This is a follow-up to BZ#101380: I'm getting screen flickering, freezing, and whiting-out symptoms similar to those described there, but that bug was closed for lack of the requested debugging information. (If they'd be of value, I have some additional videos of the misbehaviour that I could upload, but the video already linked from that bug gives a pretty decent example of the visible symptoms.)
The attached dmesg log was generated with drm.debug=0xe in the boot options, just after the internal monitor had gone through an episode of freezing and whiting out (while it was still flickering a bit).
The most suspicious-looking lines are these:
[ 14.459261] ACPI Error: Field [D128] at 1152 exceeds Buffer [NULL] size 160 (bits) (20170303/dsopcode-236)
[ 14.459268] ACPI Error: Method parse/execution failed [\HWMC] (Node ffff8edcc61507f8), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[ 14.459279] ACPI Error: Method parse/execution failed [\_SB.WMID.WMAA] (Node ffff8edcc61523c0), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[ 14.459338] ACPI Error: Field [D128] at 1152 exceeds Buffer [NULL] size 160 (bits) (20170303/dsopcode-236)
[ 14.459342] ACPI Error: Method parse/execution failed [\HWMC] (Node ffff8edcc61507f8), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[ 14.459349] ACPI Error: Method parse/execution failed [\_SB.WMID.WMAA] (Node ffff8edcc61523c0), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[ 14.459404] ACPI Error: Field [D128] at 1152 exceeds Buffer [NULL] size 160 (bits) (20170303/dsopcode-236)
[ 14.459408] ACPI Error: Method parse/execution failed [\HWMC] (Node ffff8edcc61507f8), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[ 14.459415] ACPI Error: Method parse/execution failed [\_SB.WMID.WMAA] (Node ffff8edcc61523c0), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[ 14.459474] input: HP WMI hotkeys as /devices/virtual/input/input27
[ 14.459637] ACPI Error: Field [D128] at 1152 exceeds Buffer [NULL] size 160 (bits) (20170303/dsopcode-236)
[ 14.459642] ACPI Error: Method parse/execution failed [\HWMC] (Node ffff8edcc61507f8), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[ 14.459650] ACPI Error: Method parse/execution failed [\_SB.WMID.WMAA] (Node ffff8edcc61523c0), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[ 14.459692] ACPI Error: Field [D128] at 1152 exceeds Buffer [NULL] size 160 (bits) (20170303/dsopcode-236)
[ 14.459695] ACPI Error: Method parse/execution failed [\HWMC] (Node ffff8edcc61507f8), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
[ 14.459702] ACPI Error: Method parse/execution failed [\_SB.WMID.WMAA] (Node ffff8edcc61523c0), AE_AML_BUFFER_LIMIT (20170303/psparse-543)
- Fedora 26
- KDE Plasma Desktop 5.10.1
- Display server: xorg-x11-server-Xorg-1.19.3-4.fc26.x86_64
- Driver: xorg-x11-drv-intel-2.99.917-28.20160929.fc26.x86_64
$ sudo lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Broadwell-U Host Bridge -OPI [8086:1604] (rev 09)
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 5500 [8086:1616] (rev 09)
00:03.0 Audio device [0403]: Intel Corporation Broadwell-U Audio Controller [8086:160c] (rev 09)
00:14.0 USB controller [0c03]: Intel Corporation Wildcat Point-LP USB xHCI Controller [8086:9cb1] (rev 03)
00:16.0 Communication controller [0780]: Intel Corporation Wildcat Point-LP MEI Controller #1 [8086:9cba] (rev 03)
00:1b.0 Audio device [0403]: Intel Corporation Wildcat Point-LP High Definition Audio Controller [8086:9ca0] (rev 03)
00:1c.0 PCI bridge [0604]: Intel Corporation Wildcat Point-LP PCI Express Root Port #2 [8086:9c92] (rev e3)
00:1c.2 PCI bridge [0604]: Intel Corporation Wildcat Point-LP PCI Express Root Port #3 [8086:9c94] (rev e3)
00:1f.0 ISA bridge [0601]: Intel Corporation Wildcat Point-LP LPC Controller [8086:9cc3] (rev 03)
00:1f.2 SATA controller [0106]: Intel Corporation Wildcat Point-LP SATA Controller [AHCI Mode] [8086:9c83] (rev 03)
00:1f.3 SMBus [0c05]: Intel Corporation Wildcat Point-LP SMBus Controller [8086:9ca2] (rev 03)
01:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5227 PCI Express Card Reader [10ec:5227] (rev 01)
02:00.0 Network controller [0280]: Intel Corporation Wireless 7265 [8086:095a] (rev 50)
(In reply to Nick Coghlan from comment #0)
> This is a follow-up to BZ#101380
The ACPI errors belong to http://bugzilla.kernel.org/.
(In reply to Jani Nikula from comment #2)
> The ACPI errors belong to http://bugzilla.kernel.org/.
Hello Nick Coghlan, could you please clone this bug to bugzilla.kernel.org and add the reference to the URL field above?
ACPI bug filed: https://bugzilla.kernel.org/show_bug.cgi?id=197007
Since there weren't any other problems readily apparent in the dmesg log, and I do have that "acpi_backlight=vendor" setting enabled, I'm going to close this as "not your bug".
The initial reaction from kernel.org was "not our bug either": https://bugzilla.kernel.org/show_bug.cgi?id=197007#c2
Since the ACPI errors were the only odd thing in the dmesg logs, and they happen at startup regardless of whether the screen ever starts flickering or not, is there something else I can do to further investigate the screen misbehaviour?
Please add the drm.debug=14 module parameter, attach the dmesg from boot through reproducing the problem, and if possible, attach a picture or video of the flicker.
Oh, and do try the current drm-tip branch of https://cgit.freedesktop.org/drm/drm-tip
The already attached dmesg log is from a session with drm.debug=0xe, and runs all the way from boot to an internal screen freeze+flicker+recovery. Unfortunately, the only errors listed are the ACPI ones from early in the boot cycle (and the response from the ACPI devs was that they don't see how those could be related to problems with the internal monitor).
This is a video of what the freezing misbehaviour looks like on my machine: https://www.dropbox.com/s/hjenwbeitamnnp5/2017-08-07%2009.59.28.mp4?dl=0
Bug #101380 was a previous report of a similar issue, and their video is what made me believe they were encountering the same problem: https://www.dropbox.com/s/4ffyk8ddtht2w10/example.mp4
(In reply to Nick Coghlan from comment #8)
> The already attached dmesg log is from a session with drm.debug=0xe, and
> runs all the way from boot to an internal screen freeze+flicker+recovery.
There are zero i915 debug messages in the dmesg, so something apparently went wrong with the drm.debug setting, or you have set a loglevel that drops debug messages.
The internal display has been even more erratic after a recent-ish kernel update (I'm now on 4.13.9-200.fc26.x86_64, and the symptoms now include going entirely dark, as well as lines in all sorts of colours, not just white), so I looked into this again and realised I hadn't been writing my /etc/default/grub updates back to the EFI boot config properly. The drm.debug setting indeed wasn't being applied (/facepalm).
I've fixed that now, so should be able to provide proper debug logs tomorrow.
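Since the drm.debug setting silently failed to apply here, it may be worth confirming after each reboot that the parameter actually reached the kernel. Below is a minimal Python sketch, assuming a Linux system with /proc and sysfs mounted: it reads /proc/cmdline (the command line the running kernel booted with) and /sys/module/drm/parameters/debug (the value the drm module is using right now, which is usually readable only by root). The function names are hypothetical helpers, not part of any existing tool.

```python
"""Check that drm.debug reached the running kernel (a sketch)."""
from pathlib import Path
from typing import Optional


def param_from_cmdline(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name=value` on a kernel command line, or None.

    If the parameter appears more than once, the last occurrence is
    returned. Quoted values containing spaces are not handled.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val
    return value


def parse_debug_value(text: str) -> int:
    """Parse a drm.debug value written in decimal ("14") or hex ("0xe")."""
    return int(text.strip(), 0)


def main() -> None:
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError as exc:
        print("cannot read /proc/cmdline:", exc)
        return
    requested = param_from_cmdline(cmdline, "drm.debug")
    print("drm.debug on the kernel command line:", requested)

    sysfs = Path("/sys/module/drm/parameters/debug")
    try:
        active = parse_debug_value(sysfs.read_text())
    except FileNotFoundError:
        print("drm module not loaded, so there is no active value to compare")
        return
    except PermissionError:
        print("run as root to read", sysfs)
        return
    print(f"drm.debug currently in effect: {active:#x}")
    if requested is not None and parse_debug_value(requested) != active:
        print("differs from the boot value (was it changed at runtime?)")


if __name__ == "__main__":
    main()
```

If the command-line value comes back as None after a reboot, the GRUB change most likely never made it into the boot entry that was actually used.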
Created attachment 135180
dmesg logs for healthy system state
Initial set of dmesg logs with system startup details. At this point, the internal monitor is still operating fine. (Hopefully this doesn't turn out to be a race condition, where adding debugging changes the timing enough that the issue doesn't occur.)
I also tried the more detailed debug log settings requested at https://01.org/linuxgraphics/documentation/how-report-bugs for more recent kernels, but with that level of logging, the ring buffer had filled before I got to a terminal emulator, even with log_buf_len=2M set.
Created attachment 135181
Dmesg logs for unhealthy system state
The good news is the internal monitor still misbehaved with debugging messages turned on.
The less helpful news is that aside from a couple of warnings about perf interrupts taking too long for the current sampling rate, there don't seem to be any notable differences between the logs for the healthy system state and the logs for the unhealthy one.
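One way to make that healthy-versus-unhealthy comparison a bit more mechanical is to strip the per-line timestamps and count which messages occur more often in the second log. Below is a minimal Python sketch, assuming both attachments are saved as plain-text dmesg output. It also masks long hex-looking tokens (such as the ACPI Node addresses above) so that boot-to-boot noise doesn't show up as a difference. The function names and file names are hypothetical.

```python
"""List dmesg messages that occur more often in one log than another (a sketch)."""
import re
import sys
from collections import Counter
from typing import Iterable

# The "[   14.459261] " prefix dmesg puts on each line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# Any 8-16 character lowercase hex token (kernel addresses such as ACPI Node
# values, but also plain numbers like 20170303); masked because addresses
# change from boot to boot.
HEX_TOKEN = re.compile(r"\b(?:0x)?[0-9a-f]{8,16}\b")


def normalise(line: str) -> str:
    """Strip the timestamp and mask hex tokens so lines compare by content."""
    line = TIMESTAMP.sub("", line.rstrip("\n"))
    return HEX_TOKEN.sub("<hex>", line)


def message_counts(lines: Iterable[str]) -> Counter:
    """Count each normalised, non-blank message."""
    return Counter(normalise(line) for line in lines if line.strip())


def only_in_second(first: Iterable[str], second: Iterable[str]) -> Counter:
    """Messages occurring more often in `second` than in `first`."""
    return message_counts(second) - message_counts(first)


def report(healthy_path: str, unhealthy_path: str) -> None:
    with open(healthy_path, errors="replace") as healthy, \
         open(unhealthy_path, errors="replace") as unhealthy:
        extra = only_in_second(healthy, unhealthy)
    for message, count in extra.most_common():
        print(f"{count:5d}  {message}")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        report(sys.argv[1], sys.argv[2])
    else:
        print(f"usage: {sys.argv[0]} healthy.log unhealthy.log")
```

Counting occurrences, rather than taking a plain set difference, keeps repeated warnings (like the perf sampling-rate ones) visible even when the same message also appeared once in the healthy log.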
I'm the original bug reporter. We probably have the same issue, so having one bug report instead of two would be better, but never mind. I didn't respond on my bug report because I had nothing to add besides my logs, which the devs probably missed, judging by their answers to me.
For the record, I attached my logs to the original bug report here:
I can confirm this bug affects kernel 4.14 as well. I would be glad if the devs could give us any clues about this. I noticed that we both use KDE; maybe it's related?
I wanted to clear up a couple of off-topic things:
1. The ACPI errors come from the hp_wmi kernel module. You can disable this module by creating a blacklist.conf file in /etc/modprobe.d and adding the line "blacklist hp_wmi" to it. On my machine (similar to yours), disabling it doesn't have any repercussions.
2. Are you sure you still have the sound problem on the 4.13 kernel? To my knowledge those sound issues were fixed a long time ago, and you can safely drop "acpi_backlight=vendor", "acpi_osi=!Windows 2013" and "acpi_osi=!Windows 2012" from the kernel command line.
I filed my own report as a new bug rather than reopening the original one based on https://bugs.freedesktop.org/show_bug.cgi?id=101380#c7. I don't have a strong opinion on whether one or the other should be explicitly marked as a duplicate.
Being KDE-specific certainly sounds plausible to me. I've seen at least one case (several years ago now) where KDE was hitting a crash bug in the driver that Gnome (2, at the time) consistently avoided due to differences in how they interacted with the driver.
And thanks for the other tips! I've started collecting notes on the current state of the Spectre's Linux driver support here: https://gitlab.com/ncoghlan_dev/desktop-linux/blob/master/README.md#hp-spectre-x360-fedora-22-kde-spin
My machine has passed away. No idea if it was related to this issue, and I'll never find out.
@Nick Coghlan you are on your own now. Good luck!
I've mostly moved on to a new machine as well (a System76 Galago Pro). I still have the HP Spectre x360 for the time being, but I haven't decided yet whether I'm going to put a development kernel on it and start learning how to debug Linux hardware driver problems, or revert it to Windows and give it away.
First of all, sorry about the spam.
This is a mass update for our bugs.
Sorry if you find this annoying, but we are trying to understand whether the bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!
If you think this is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still present there? If you cannot see the issue there, please comment on the bug.
It was still happening with whichever version was current in Fedora as of my last comment above in January.
I'm updating the machine with a couple of months worth of Fedora updates now, but I'm not currently set up to run pre-release kernels and/or drivers.
Just noting which version of the kernel the machine is going to be running after the Fedora updates: kernel.x86_64 4.15.12-301.fc27
That machine has been switched off long enough that I doubt I'm going to have anything further to report tonight, though :)
I rebooted the Spectre into the 4.15.12 Fedora kernel, and it's still exhibiting the same misbehaviour reported previously.
If I attempt to reproduce this with a custom build from git, which parts of https://01.org/linuxgraphics/documentation/build-guide-0 will I need to follow? Just libdrm? Or more than that?
Hi, it should be the "building kernel" section. You can copy your current config from (usually) /boot/config-<your current kernel version> to .config in the git folder. If anything goes wrong after installing, you can still boot into your older kernel from GRUB and remove the custom build.
Nick, have you been able to test https://cgit.freedesktop.org/drm-tip ?
Nick, any updates on this issue? Do you still have the issue?
Try to reproduce the error using drm-tip (https://cgit.freedesktop.org/drm-tip) and kernel parameters drm.debug=0x1e log_buf_len=4M, and if the problem persists attach the full dmesg from boot.
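For reference, drm.debug is a bitmask, so the 14 and 0xe used earlier in this bug are the same value, and 0x1e enables one additional category on top of them. Below is a small Python sketch that decodes the mask. The category names and bit values are my recollection of the DRM_UT_* constants in include/drm/drm_print.h for kernels of this era and should be checked against the headers of the kernel actually being run.

```python
"""Decode a drm.debug bitmask into DRM debug category names (a sketch)."""
from typing import List

# Assumed bit values of the DRM_UT_* categories (check include/drm/drm_print.h
# for the kernel you are running; newer kernels add more bits).
DRM_DEBUG_CATEGORIES = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
}


def decode_drm_debug(value: int) -> List[str]:
    """Return the category names enabled in a drm.debug value."""
    names = [name for bit, name in DRM_DEBUG_CATEGORIES.items() if value & bit]
    # Report any bits this table does not know about instead of dropping them.
    unknown = value & ~sum(DRM_DEBUG_CATEGORIES)
    if unknown:
        names.append(f"unknown bits {unknown:#x}")
    return names


if __name__ == "__main__":
    # int(text, 0) accepts both decimal ("14") and hex ("0x1e") spellings.
    for text in ("14", "0xe", "0x1e"):
        print(text, decode_drm_debug(int(text, 0)))
    # 14 ['DRIVER', 'KMS', 'PRIME']
    # 0xe ['DRIVER', 'KMS', 'PRIME']
    # 0x1e ['DRIVER', 'KMS', 'PRIME', 'ATOMIC']
```

Under these assumed bit values, the step from 0xe to 0x1e adds atomic modesetting debug output, which also makes the log noticeably larger; that is presumably why the request pairs it with a bigger log_buf_len.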
No feedback for many months, so closing as RESOLVED WORKSFORME.
Please reopen if the issue persists with the latest drm-tip (https://cgit.freedesktop.org/drm-tip), and attach the dmesg from boot with the kernel parameters drm.debug=0x1e log_buf_len=4M.