Bug 86000

Summary: 2-3 second black screen on MacBook Pro 11,1 (13" retina)
Product: DRI Reporter: samcurry <haskell.curry1909>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: minor    
Priority: medium CC: catalin.hritcu, haskell.curry1909, intel-gfx-bugs
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
dmesg
none
Xorg.0.log
none
dmesg
none
Xorg.0.log
none
dmesg-debug
none
dmesg-debug
none
dmesg-debug
none
dmesg-debug-latestKernel
none
dmesg-suspend-wakeup none

Description samcurry 2014-11-07 12:30:18 UTC
Created attachment 109081 [details]
dmesg

First off I apologize since I'll probably require some hand-holding. I have a MacBook11,1 (13" retina from 2014 as stated) running Arch Linux (regularly updated). The symptoms are simple: everything works fine (i.e. no tearing etc.) but the screen will randomly go black for about 2-3 seconds. It appears that tapping a key or moving the mouse does speed up "recovery", but otherwise I do not have a reliable way of triggering this problem. It *appears* to never happen after a fresh reboot, but only after a few hours of uptime including suspend-wake cycles. It's rather unpleasant but not a show-stopper of course.

I'm not a 100% sure, but it looks like there are dmesg entries accompanying every blackout.
Comment 1 samcurry 2014-11-07 12:31:04 UTC
Created attachment 109082 [details]
Xorg.0.log
Comment 2 samcurry 2014-11-07 17:56:12 UTC
Hm, this is interesting. I spent all day at work, and it seemed to work fine. Now, I'm at home, and it's started playing up within about 20 minutes...

Differences:
- playing music through USB audio interface (also have a different one at work, although today I didn't touch it)
- on battery power (although I'm pretty sure this has happened on AC too)
- different WiFi network (this has previously happened at work, though)

Seems the USB interface is perhaps most likely to be causing interference? Although, I've had this behaviour without it plugged in, too... Though maybe having used it once then unplugging it again might trigger whatever it is that goes wrong? How could I investigate this?

Another note: I have enabled a few of powertop's suggestions, such as autosuspend for certain USB devices... I will try disabling them again. Attaching another updated dmesg too.
Comment 3 samcurry 2014-11-07 18:05:11 UTC
Created attachment 109100 [details]
dmesg
Comment 4 samcurry 2014-11-07 18:05:35 UTC
Created attachment 109101 [details]
Xorg.0.log
Comment 5 samcurry 2014-11-08 12:42:26 UTC
OK, it's rather frustrating today: computer is idle, but every 30 seconds or so it happens. Here's some output (note that the timestamps in `dmesg` don't actually reflect wall time).

Now that I'm interacting with the machine, it seems to have let up a bit... So weird. Currently no USB devices connected, running on battery.

13:38:11 $ dmesg -T | tail
[Fri Nov  7 06:59:13 2014]  [<ffffffff81087d3b>] ? __queue_delayed_work+0x8b/0x1c0
[Fri Nov  7 06:59:13 2014]  [<ffffffff813b6e4a>] pm_runtime_work+0x7a/0xd0
[Fri Nov  7 06:59:13 2014]  [<ffffffff81088b85>] process_one_work+0x145/0x400
[Fri Nov  7 06:59:13 2014]  [<ffffffff8108914b>] worker_thread+0x6b/0x4a0
[Fri Nov  7 06:59:13 2014]  [<ffffffff810890e0>] ? init_pwq.part.22+0x10/0x10
[Fri Nov  7 06:59:13 2014]  [<ffffffff8108e06a>] kthread+0xea/0x100
[Fri Nov  7 06:59:13 2014]  [<ffffffff8108df80>] ? kthread_create_on_node+0x1b0/0x1b0
[Fri Nov  7 06:59:13 2014]  [<ffffffff8153c6fc>] ret_from_fork+0x7c/0xb0
[Fri Nov  7 06:59:13 2014]  [<ffffffff8108df80>] ? kthread_create_on_node+0x1b0/0x1b0
[Fri Nov  7 06:59:13 2014] ---[ end trace 1521c397a8b333ed ]---
13:38:20 $ dmesg -T | tail
[Fri Nov  7 06:59:49 2014]  [<ffffffff81087d3b>] ? __queue_delayed_work+0x8b/0x1c0
[Fri Nov  7 06:59:49 2014]  [<ffffffff813b6e4a>] pm_runtime_work+0x7a/0xd0
[Fri Nov  7 06:59:49 2014]  [<ffffffff81088b85>] process_one_work+0x145/0x400
[Fri Nov  7 06:59:49 2014]  [<ffffffff8108914b>] worker_thread+0x6b/0x4a0
[Fri Nov  7 06:59:49 2014]  [<ffffffff810890e0>] ? init_pwq.part.22+0x10/0x10
[Fri Nov  7 06:59:49 2014]  [<ffffffff8108e06a>] kthread+0xea/0x100
[Fri Nov  7 06:59:49 2014]  [<ffffffff8108df80>] ? kthread_create_on_node+0x1b0/0x1b0
[Fri Nov  7 06:59:49 2014]  [<ffffffff8153c6fc>] ret_from_fork+0x7c/0xb0
[Fri Nov  7 06:59:49 2014]  [<ffffffff8108df80>] ? kthread_create_on_node+0x1b0/0x1b0
[Fri Nov  7 06:59:49 2014] ---[ end trace 1521c397a8b333f2 ]---
13:38:46 $ dmesg -T | tail
[Fri Nov  7 07:00:03 2014]  [<ffffffffa00babb1>] ? kcryptd_crypt+0x211/0x4b9 [dm_crypt]
[Fri Nov  7 07:00:03 2014]  [<ffffffff813b6e4a>] pm_runtime_work+0x7a/0xd0
[Fri Nov  7 07:00:03 2014]  [<ffffffff81088b85>] process_one_work+0x145/0x400
[Fri Nov  7 07:00:03 2014]  [<ffffffff8108914b>] worker_thread+0x6b/0x4a0
[Fri Nov  7 07:00:03 2014]  [<ffffffff810890e0>] ? init_pwq.part.22+0x10/0x10
[Fri Nov  7 07:00:03 2014]  [<ffffffff8108e06a>] kthread+0xea/0x100
[Fri Nov  7 07:00:03 2014]  [<ffffffff8108df80>] ? kthread_create_on_node+0x1b0/0x1b0
[Fri Nov  7 07:00:03 2014]  [<ffffffff8153c6fc>] ret_from_fork+0x7c/0xb0
[Fri Nov  7 07:00:03 2014]  [<ffffffff8108df80>] ? kthread_create_on_node+0x1b0/0x1b0
[Fri Nov  7 07:00:03 2014] ---[ end trace 1521c397a8b333f7 ]---
13:39:09 $ dmesg -T | tail
[Fri Nov  7 07:00:33 2014]  [<ffffffff81087d3b>] ? __queue_delayed_work+0x8b/0x1c0
[Fri Nov  7 07:00:33 2014]  [<ffffffff813b6e4a>] pm_runtime_work+0x7a/0xd0
[Fri Nov  7 07:00:33 2014]  [<ffffffff81088b85>] process_one_work+0x145/0x400
[Fri Nov  7 07:00:33 2014]  [<ffffffff8108914b>] worker_thread+0x6b/0x4a0
[Fri Nov  7 07:00:33 2014]  [<ffffffff810890e0>] ? init_pwq.part.22+0x10/0x10
[Fri Nov  7 07:00:33 2014]  [<ffffffff8108e06a>] kthread+0xea/0x100
[Fri Nov  7 07:00:33 2014]  [<ffffffff8108df80>] ? kthread_create_on_node+0x1b0/0x1b0
[Fri Nov  7 07:00:33 2014]  [<ffffffff8153c6fc>] ret_from_fork+0x7c/0xb0
[Fri Nov  7 07:00:33 2014]  [<ffffffff8108df80>] ? kthread_create_on_node+0x1b0/0x1b0
[Fri Nov  7 07:00:33 2014] ---[ end trace 1521c397a8b333fc ]---
13:39:32 $ dmesg -T | tail
[Fri Nov  7 07:00:49 2014]  [<ffffffff81087d3b>] ? __queue_delayed_work+0x8b/0x1c0
[Fri Nov  7 07:00:49 2014]  [<ffffffff813b6e4a>] pm_runtime_work+0x7a/0xd0
[Fri Nov  7 07:00:49 2014]  [<ffffffff81088b85>] process_one_work+0x145/0x400
[Fri Nov  7 07:00:49 2014]  [<ffffffff8108914b>] worker_thread+0x6b/0x4a0
[Fri Nov  7 07:00:49 2014]  [<ffffffff810890e0>] ? init_pwq.part.22+0x10/0x10
[Fri Nov  7 07:00:49 2014]  [<ffffffff8108e06a>] kthread+0xea/0x100
[Fri Nov  7 07:00:49 2014]  [<ffffffff8108df80>] ? kthread_create_on_node+0x1b0/0x1b0
[Fri Nov  7 07:00:49 2014]  [<ffffffff8153c6fc>] ret_from_fork+0x7c/0xb0
[Fri Nov  7 07:00:49 2014]  [<ffffffff8108df80>] ? kthread_create_on_node+0x1b0/0x1b0
[Fri Nov  7 07:00:49 2014] ---[ end trace 1521c397a8b33401 ]---
13:39:45 $ dmesg -T | tail
[Fri Nov  7 07:01:01 2014]  [<ffffffff813b6e4a>] pm_runtime_work+0x7a/0xd0
[Fri Nov  7 07:01:01 2014]  [<ffffffff81088b85>] process_one_work+0x145/0x400
[Fri Nov  7 07:01:01 2014]  [<ffffffff8108914b>] worker_thread+0x6b/0x4a0
[Fri Nov  7 07:01:01 2014]  [<ffffffff810890e0>] ? init_pwq.part.22+0x10/0x10
[Fri Nov  7 07:01:01 2014]  [<ffffffff8108e06a>] kthread+0xea/0x100
[Fri Nov  7 07:01:01 2014]  [<ffffffff81070000>] ? register_exec_domain+0x20/0xa0
[Fri Nov  7 07:01:01 2014]  [<ffffffff8108df80>] ? kthread_create_on_node+0x1b0/0x1b0
[Fri Nov  7 07:01:01 2014]  [<ffffffff8153c6fc>] ret_from_fork+0x7c/0xb0
[Fri Nov  7 07:01:01 2014]  [<ffffffff8108df80>] ? kthread_create_on_node+0x1b0/0x1b0
[Fri Nov  7 07:01:01 2014] ---[ end trace 1521c397a8b33406 ]---
Comment 6 Imre Deak 2014-11-10 22:05:50 UTC
This is a breakage of power well refcounting around VDD on/off. I can think of some race that would be fixed by Ville's VDD/PPS locking fixes. Could you provide a log booting with drm.debug=14? Also could you try if you can reproduce the problem with the drm-intel-nightly branch from git://anongit.freedesktop.org/drm-intel and provide a log for that too?
Comment 7 samcurry 2014-11-12 10:52:24 UTC
OK, I have rebooted using the drm.debug=14 argument. I haven't yet had an error, but if I do, I'll post the `dmesg` output. If you think it's useless to do that without having tried the nightly driver, I can skip ahead and try to provoke the error using the bleeding edge driver.
Comment 8 samcurry 2014-11-12 13:02:18 UTC
Alright, I haven't yet had the black screen problem, but I have noticed that the error appears in dmesg when changing backlight brightness too. I'll attach the new dmesg shortly.

About trying drm-intel-nightly: I'm not quite sure how. I've cloned the repository, which too quite a while, but when I try to build the contents, it tells me to configure. I do `make xconfig` and accept all the defaults, then build, but I'm not sure if that's correct, nor what I should do to try the new driver (I take it I shouldn't be using the whole kernel compiled this way?). Please advise.
Comment 9 samcurry 2014-11-12 13:05:11 UTC
Created attachment 109336 [details]
dmesg-debug

dmesg, with drm.debug=14. Blank screen hasn't yet occurred, but similar errors appear when changing panel backlight brightness.
Comment 10 samcurry 2014-11-12 13:11:23 UTC
Created attachment 109337 [details]
dmesg-debug

dmesg, with drm.debug=14. Blank screen has just occurred; similar errors appear when changing panel backlight brightness.
Comment 11 Jani Nikula 2014-11-12 14:36:54 UTC
(In reply to samcurry from comment #8)
> Alright, I haven't yet had the black screen problem, but I have noticed that
> the error appears in dmesg when changing backlight brightness too. I'll
> attach the new dmesg shortly.

This is unsurprising and in line with comment #6.

Bug 86105 is a possible duplicate.

> About trying drm-intel-nightly: I'm not quite sure how. I've cloned the
> repository, which too quite a while, but when I try to build the contents,
> it tells me to configure. I do `make xconfig` and accept all the defaults,
> then build, but I'm not sure if that's correct, nor what I should do to try
> the new driver (I take it I shouldn't be using the whole kernel compiled
> this way?). Please advise.

I usually either run 'make localmodconfig' or start with the distro config. Many distros install it in /boot/config-*. It's possible your current kernel is also configured to include the config in /proc/config.gz.

This is generally not a kernel build helpline; please refer to http://kernelnewbies.org/KernelBuild or your distro's documentation (Google finds https://wiki.archlinux.org/index.php/Kernels/Compilation/Traditional).
Comment 12 samcurry 2014-11-12 15:12:03 UTC
Created attachment 109348 [details]
dmesg-debug

I'm really sorry for the spam, but I discovered that the kernel ring buffer was getting full, meaning that the log only showed the error state, not the boot. Here's the full log, including exceptions, but on the old kernel. I'm building the nightly kernel now, to try and reproduce the error.
Comment 13 samcurry 2014-11-12 18:18:07 UTC
(In reply to Imre Deak from comment #6)
> Also could you try if you can
> reproduce the problem with the drm-intel-nightly branch from
> git://anongit.freedesktop.org/drm-intel and provide a log for that too?

Well, I don't want to get ahead of myself, but after having compiled the kernel from drm-intel-nightly as suggested, so far so good! I'm not sure if there's any sense to it, but I'll attach my `dmesg` as suggested above, from the bleeding edge kernel.
Comment 14 samcurry 2014-11-12 18:20:48 UTC
Created attachment 109365 [details]
dmesg-debug-latestKernel
Comment 15 samcurry 2014-11-13 11:21:45 UTC
Created attachment 109399 [details]
dmesg-suspend-wakeup

OK, still no exceptions, which is nice.

Another problem has surfaced though, namely that my machine will not suspend properly. It suspends and then immediately awakes. Perhaps this requires opening a new bug? Or perhaps it relates to another component entirely?

This is the log from the moment I request a suspend, and at line 52,
> Nov 13 12:11:17 hostname kernel: ACPI: Low-level resume complete
it seems to have decided to wake up again. My apologies if this is entirely unrelated, but I only have this behaviour on the nightly kernel.
Comment 16 samcurry 2014-11-14 13:11:10 UTC
(In reply to samcurry from comment #15)
> Another problem has surfaced though, namely that my machine will not suspend
> properly. It suspends and then immediately awakes. Perhaps this requires
> opening a new bug? Or perhaps it relates to another component entirely?

Never mind, solved using method described in <https://bugzilla.kernel.org/show_bug.cgi?id=16512>.

Personally I consider the black-screen bug solved in the nightly kernel, in any case.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.