Bug 105637 - i915 crashes MacBook2,1 with commit 23ac12732825901b3fc6ac720958d8bff9a0d6ec (4.15)
Summary: i915 crashes MacBook2,1 with commit 23ac12732825901b3fc6ac720958d8bff9a0d6ec ...
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) All
Importance: low normal
Assignee: Lakshmi
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords: bisected
Duplicates: 106160
Depends on:
Blocks:
 
Reported: 2018-03-21 00:15 UTC by Dennis
Modified: 2019-01-13 01:29 UTC
CC List: 6 users

See Also:
i915 platform: I945GM
i915 features: display/Other


Attachments
modprobe i915 (20.42 KB, text/plain)
2018-03-21 00:16 UTC, Dennis
dmesg with modprobe i915 with drm.debug=14 (92.43 KB, text/plain)
2018-03-22 00:08 UTC, Dennis
using drm-tip, dmesg -w, I modprobe i915 at about 297s in (55.05 KB, text/plain)
2018-05-04 19:24 UTC, Dennis
dmesg modprobe i915 on linux 4.14.13 drm.debug=14 (at 146 seconds in) (129.00 KB, text/plain)
2018-05-20 09:54 UTC, Peter Nowee
git bisect log on linux (torvalds tree) for MacBook 2,1 i915 GPU hang (3.29 KB, text/plain)
2018-05-28 06:13 UTC, Peter Nowee
dmesg for last good commit d87ce76402950b drm.debug=14 (modprobe i915 at 116 seconds in) (115.08 KB, text/plain)
2018-09-17 10:00 UTC, Peter Nowee
dmesg for first bad commit 23ac127328259 drm.debug=14 (no log after modprobe i915 as it hangs the computer) (61.16 KB, text/plain)
2018-09-17 10:03 UTC, Peter Nowee
dmesg for first bad commit 23ac127328259 drm.debug=14 (modprobe i915 at 345 seconds in) (70.18 KB, text/plain)
2018-09-21 09:40 UTC, Peter Nowee
netconsole output with the above patches and the 5s delay (7.76 KB, text/plain)
2018-09-26 15:02 UTC, Dennis
[PATCH] drm/i915: Restore vblank interrupts earlier (2.46 KB, patch)
2018-09-27 19:41 UTC, Ville Syrjala
[PATCH v2] drm/i915: Restore vblank interrupts earlier (2.50 KB, patch)
2018-09-27 19:45 UTC, Ville Syrjala
output for the previous v2 patch (52.73 KB, text/plain)
2018-09-27 20:40 UTC, Dennis
[PATCH] drm/i915: Use the correct crtc when sanitizing plane mapping (5.55 KB, patch)
2018-09-28 12:57 UTC, Ville Syrjala
output for the previous correct-crtc patch (39.53 KB, text/plain)
2018-09-28 20:01 UTC, Dennis
[PATCH v2 2/3] drm/i915: Use the correct crtc when sanitizing plane mapping (7.51 KB, patch)
2018-10-03 16:31 UTC, Ville Syrjala

Description Dennis 2018-03-21 00:15:26 UTC
The screen backlight on my MacBook2,1 laptop no longer works, apparently because I'm booting in EFI mode instead of the previous BIOS (CSM?) mode. (My original laptop hard drive broke, and I only seem able to boot off an external USB drive in EFI mode.)

My machine has a 32-bit EFI and a 64-bit CPU.

Lukas Wunner (very helpful dude!!) suggests that i915 is required to control my backlight in EFI mode. (I don't think I used i915/DRM previously when booting normally off the internal hard drive, and the backlight worked fine there via /sys/class/backlight.)

  https://bugzilla.kernel.org/show_bug.cgi?id=199091

However, i915 is crashing my system: immediately after I try to load the driver, my laptop hangs and the screen gets messed up with random horizontal white lines. I'll attach the output I captured after recompiling i915 as a module and booting with the drm.debug=0xf kernel option.
Comment 1 Dennis 2018-03-21 00:16:08 UTC
Created attachment 138232 [details]
modprobe i915
Comment 2 Elizabeth 2018-03-21 15:36:40 UTC
Hello, could you also share the full dmesg with drm.debug=14 and xorg.log?
Comment 3 Dennis 2018-03-22 00:08:24 UTC
Created attachment 138266 [details]
dmesg with modprobe i915 with drm.debug=14
Comment 4 Dennis 2018-03-22 00:09:13 UTC
Comment on attachment 138266 [details]
dmesg with modprobe i915 with drm.debug=14

(I don't use or want X, so there's no xorg.log available.)
Comment 5 Jani Saarinen 2018-03-29 07:11:27 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to find out whether the bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), please do that as well to see whether the issue is still present there; if you cannot see the issue there, please comment on the bug.
Comment 6 Jani Saarinen 2018-04-25 11:28:44 UTC
Jani, any suggestions here?
Comment 7 Jani Nikula 2018-04-25 11:43:48 UTC
Subject says "crash", features says "GPU hang", comment #0 says backlight. Which is it?
Comment 8 Dennis 2018-04-25 18:45:23 UTC
GPU hang. The backlight/efi issue was simply my motivation to start using the i915 driver.
Comment 9 Jani Nikula 2018-04-26 07:10:18 UTC
(In reply to Dennis from comment #8)
> GPU hang. The backlight/efi issue was simply my motivation to start using
> the i915 driver.

But... there's zero indication of a GPU hang anywhere!
Comment 10 Jani Nikula 2018-04-26 07:14:46 UTC
Please add drm.debug=14, don't do any of the module-moving hacks indicated in the dmesg, and attach the full dmesg from boot, reproducing whatever problem you have.
Comment 11 Dennis 2018-04-26 11:05:20 UTC
I'm only able to get a dmesg if I first boot without i915 loaded, so I can SSH in and capture whatever output I can remotely before my system crashes. That's what the dmesg I provided is: my GPU/system hangs/crashes at the end of that log and I have to force a reboot.
Comment 12 Jani Saarinen 2018-05-04 12:26:08 UTC
Still missing the latest logs.
Comment 13 Dennis 2018-05-04 12:33:57 UTC
I was only asked for a "full dmesg with drm.debug=14 and xorg.log" (and I don't have X)... what am I missing?
Comment 14 Jani Saarinen 2018-05-04 12:52:30 UTC
Have you tried with the latest drm-tip (https://cgit.freedesktop.org/drm-tip)?
Comment 15 Dennis 2018-05-04 19:24:42 UTC
Created attachment 139355 [details]
using drm-tip, dmesg -w, I modprobe i915 at about 297s in
Comment 16 Peter Nowee 2018-05-20 09:54:15 UTC
Created attachment 139646 [details]
dmesg modprobe i915 on linux 4.14.13 drm.debug=14 (at 146 seconds in)

I have a MacBook2,1 as well and can confirm the issue. I am booting 
from the internal HDD, not an external USB drive. I am using Debian 
Stretch with stable and stable-backports kernels.

I tried loading i915 on several kernels:

- 4.9.88: Works.
- 4.11.6: Works.
- 4.12.6: Works.
- 4.13.13: Works.
- 4.14.13: Works.

- 4.15.11: Hangs.
- 4.16.5: Hangs.

Where it works (<=4.14), there are still warnings and traces in the 
dmesg logs that might provide some clues as to what is going wrong in 
later versions (>=4.15). Attached is the dmesg log for 4.14.13.
Comment 17 Jani Nikula 2018-05-21 09:10:02 UTC
It looks like a git bisect between v4.14 and v4.15 would be the best bet.
Comment 18 Peter Nowee 2018-05-22 14:24:52 UTC
No clues in the warnings and traces that the earlier kernels gave already?

(see 4.14 dmesg attachment of my previous comment)
Comment 19 Jani Nikula 2018-05-22 15:55:59 UTC
(In reply to Peter Nowee from comment #18)
> No clues in the warnings and traces that the earlier kernels gave already?

Not really.

It's also still completely unclear to me what the actual failure mode is.
Comment 20 Peter Nowee 2018-05-28 06:13:49 UTC
Created attachment 139806 [details]
git bisect log on linux (torvalds tree) for MacBook 2,1 i915 GPU hang

Ok, I did a git bisect on linux (torvalds tree) and the first bad commit is:

    commit 23ac12732825901b3fc6ac720958d8bff9a0d6ec
    Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Date:   Fri Nov 17 21:19:09 2017 +0200
    
        drm/i915: Redo plane sanitation during readout
        
        Unify the plane disabling during state readout by pulling the code into
        a new helper intel_plane_disable_noatomic(). We'll also read out the
        state of all planes, so that we know which planes really need to be
        diabled.
        
        Additonally we change the plane<->pipe mapping sanitation to work by
        simply disabling the offending planes instead of entire pipes. And
        we do it before we otherwise sanitize the crtcs, which means we don't
        have to worry about misassigned planes during crtc sanitation anymore.
        
        v2: Reoder patches to not depend on enum old_plane_id
        v3: s/for_each_pipe/for_each_intel_crtc/
        
        Cc: Thierry Reding <thierry.reding@gmail.com>
        Cc: Alex Villacís Lasso <alexvillacislasso@hotmail.com>
        Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103223
        Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
        Tested-by: Thierry Reding <thierry.reding@gmail.com>
        Link: https://patchwork.freedesktop.org/patch/msgid/20171117191917.11506-3-ville.syrjala@linux.intel.com
        Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
        (cherry picked from commit b1e01595a66dc206a2c75401ec4c285740537f3f)
        Signed-off-by: Jani Nikula <jani.nikula@intel.com>

Attached is the bisect log.

I don't really know where to go from here. I tried reverting that commit on top of the current master, but that leads to merge conflicts. I tried to resolve them but did not succeed because too much has changed. I hope someone else can pick it up from here, but let me know if you need more info or if I need to test something.
Comment 21 Dennis 2018-08-26 11:18:04 UTC
Still hangs/crashes in 4.18.5
Comment 22 Jani Nikula 2018-09-14 12:58:54 UTC
(In reply to Peter Nowee from comment #20)
> Ok, I did a git bisect on linux (torvalds tree) and the first bad commit is:
> 
>     commit 23ac12732825901b3fc6ac720958d8bff9a0d6ec
>     Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
>     Date:   Fri Nov 17 21:19:09 2017 +0200
>     
>         drm/i915: Redo plane sanitation during readout

Please double-check the bisect by trying commits 23ac127328259 and d87ce76402950b and attaching the dmesg for each.
Comment 23 Peter Nowee 2018-09-17 10:00:18 UTC
Created attachment 141595 [details]
dmesg for last good commit d87ce76402950b drm.debug=14 (modprobe i915 at 116 seconds in)
Comment 24 Peter Nowee 2018-09-17 10:03:03 UTC
Created attachment 141596 [details]
dmesg for first bad commit 23ac127328259 drm.debug=14 (no log after modprobe i915 as it hangs the computer)
Comment 25 Peter Nowee 2018-09-17 10:11:54 UTC
I double-checked both commits: when running `modprobe i915`, 23ac127328259 still hangs, whereas d87ce76402950b still works OK. See my previous two comments for the dmesg log attachments.

Note that I do not have any kernel logging from when it hangs. Reading the kernel messages over SSH also did not produce any additional information.
Comment 26 Peter Nowee 2018-09-21 09:40:40 UTC
Created attachment 141672 [details]
dmesg for first bad commit 23ac127328259 drm.debug=14 (modprobe i915 at 345 seconds in)

On a second try, I was able to get more kernel messages over SSH before it hung after all. See this new attachment for commit 23ac127328259 (first bad commit).
Comment 27 Peter Nowee 2018-09-21 14:28:29 UTC
*** Bug 106160 has been marked as a duplicate of this bug. ***
Comment 28 Jani Nikula 2018-09-26 11:13:50 UTC
(In reply to Peter Nowee from comment #20)
> Created attachment 139806 [details]
> git bisect log on linux (torvalds tree) for MacBook 2,1 i915 GPU hang
> 
> Ok, I did a git bisect on linux (torvalds tree) and the first bad commit is:
> 
>     commit 23ac12732825901b3fc6ac720958d8bff9a0d6ec
>     Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
>     Date:   Fri Nov 17 21:19:09 2017 +0200
>     
>         drm/i915: Redo plane sanitation during readout

Cc: Ville. His commit, confirmed regression, and an old platform. A perfect match. ;)
Comment 29 Ville Syrjala 2018-09-26 12:06:55 UTC
Is there an Ethernet port on the machine by any chance?

If yes, netconsole should be capable of giving us more of the dmesg.

If no, maybe something like this would let you catch more of the debug output before the hang:
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index c13b2f5704b6..3cc0df6b5ebe 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -16164,6 +16164,8 @@ intel_modeset_setup_hw_state(struct drm_device *dev,
        /* HW state is read out, now we need to sanitize this mess. */
        get_encoder_power_domains(dev_priv);
 
+       msleep(5000);
+
        intel_sanitize_plane_mapping(dev_priv);
 
        for_each_intel_encoder(dev, encoder) {


One peculiar thing about the good logs is that there are no planes enabled when we initialize the driver. That would suggest that it's using the VGA plane, which is rather odd considering it's supposed to be using EFI.

This should tell us whether the VGA plane is enabled or not:
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index c13b2f5704b6..0e976dd2535f 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -15311,10 +15311,13 @@ static void i915_disable_vga(struct drm_i915_private *dev_priv)
        u8 sr1;
        i915_reg_t vga_reg = i915_vgacntrl_reg(dev_priv);
 
+       DRM_DEBUG_KMS("VGA = 0x%08x\n", I915_READ(vga_reg));
+
        /* WaEnableVGAAccessThroughIOPort:ctg,elk,ilk,snb,ivb,vlv,hsw */
        vga_get_uninterruptible(pdev, VGA_RSRC_LEGACY_IO);
        outb(SR01, VGA_SR_INDEX);
        sr1 = inb(VGA_SR_DATA);
+       DRM_DEBUG_KMS("SR1 = 0x%02x\n", sr1);
        outb(sr1 | 1<<5, VGA_SR_DATA);
        vga_put(pdev, VGA_RSRC_LEGACY_IO);
        udelay(300);

Another peculiarity is the !connector_state->crtc warnings. Not sure what is causing those. Maybe the whole thing was just busted back then.
Comment 30 Dennis 2018-09-26 15:02:26 UTC
Created attachment 141751 [details]
netconsole output with the above patches and the 5s delay

The additional debug output lines said that VGA = 0x80100000 and SR1 = 0x00.

After the 5 second added delay, the next output line was:
  "plane B attached to the wrong pipe, disabling plane"
Comment 31 Ville Syrjala 2018-09-27 19:41:20 UTC
Created attachment 141764 [details] [review]
[PATCH] drm/i915: Restore vblank interrupts earlier

The obvious failure I see in the code is that we try a vblank wait before vblank interrupts have been enabled. That should not really cause a hang, though; rather, we should just get a WARN from the vblank code. Hmm. Or maybe not. I suppose bad things might happen if we haven't called drm_vblank_reset() yet. Worth a shot at least. Please test.
Comment 32 Ville Syrjala 2018-09-27 19:45:37 UTC
Created attachment 141765 [details] [review]
[PATCH v2] drm/i915: Restore vblank interrupts earlier

Let's try that again. This time in a form that actually builds :)
Comment 33 Dennis 2018-09-27 20:40:17 UTC
Created attachment 141768 [details]
output for the previous v2 patch

Progress! It doesn't crash this time! And I finally have access to my laptop's backlight again and can turn it off!! :P I applied the patch manually against my 4.18.5 kernel (and left the previous patches in as well, e.g. the 5-second delay). However, it seems to get into a loop where every ~10 s it tries to "intel_tv_detect" something. And I can't unload the module: it says that intel_gtt is being used. I also notice a kernel trace in the logs beginning with "vblank not available on crtc 0". (Thank you!)
Comment 34 Ville Syrjala 2018-09-28 12:57:32 UTC
Created attachment 141775 [details] [review]
[PATCH] drm/i915: Use the correct crtc when sanitizing plane mapping

Let's try to use the correct pipe when disabling the plane. Keep the previous patch as well.
Comment 35 Dennis 2018-09-28 13:42:00 UTC
I'm having trouble applying this patch to my 4.18 kernel, since the plane->get_hw_state() function only takes one parameter here: it doesn't accept that second &pipe argument, so it fails to compile :s.
Comment 36 Ville Syrjala 2018-09-28 15:25:27 UTC
(In reply to Dennis from comment #35)
> I'm having trouble applying this patch to my 4.18 kernel since the
> plane->get_hw_state() function only takes one parameter here - it doesn't
> use that second &pipe and fails to compile :s.

You need to cherry-pick commit fcba862e8428 ("drm/i915: Have plane->get_hw_state() return the current pipe")

Looks like it will apply cleanly onto 4.18.
Comment 37 Dennis 2018-09-28 20:01:20 UTC
Created attachment 141782 [details]
output for the previous correct-crtc patch

Things seem to work now, with no kernel traces, except that I still get those intel_tv_detect messages dumped every 10 seconds, and I can't unload the i915 module normally (I have to rmmod -f it).
Comment 38 Ville Syrjala 2018-09-28 20:19:19 UTC
(In reply to Dennis from comment #37)
> Created attachment 141782 [details]
> output for the previous correct-crtc patch
> 
> Things seem to work now, no kernel traces.

Great. I'll mail out the patches.

> Except I still get those
> intel_tv_detect messages being dumped every 10 seconds,

That's normal. It's just polling every 10s to see if you plugged a TV into the machine.

>  and I can't normally
> unload the i915 module (I have to rmmod -f it).

Probably fbcon is hanging on to the device. One of these should most likely allow you to rmmod cleanly:
echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 1 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
echo 1 > /sys/class/vtconsole/vtcon1/bind
Comment 39 Peter Nowee 2018-10-02 12:43:13 UTC
Just confirming that applying Ville's patches on top of drm-tip (4.19-rc6) solves the problem on my MacBook as well.
Comment 40 Ville Syrjala 2018-10-03 16:31:53 UTC
Created attachment 141858 [details] [review]
[PATCH v2 2/3] drm/i915: Use the correct crtc when sanitizing plane mapping

Slight adjustment to the patch based on review feedback. Can you double check that it still works as intended for you?
Comment 41 Dennis 2018-10-03 17:41:39 UTC
Excellent work! That patch applies cleanly and things finally work again. Merci beaucoup.
Comment 42 Ville Syrjala 2018-10-04 17:24:33 UTC
Fixes pushed:

commit 68bc30deac625b8be8d3950b30dc93d09a3645f5
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Wed Oct 3 17:49:51 2018 +0300

    drm/i915: Restore vblank interrupts earlier

commit 62358aa4ee86481ce044bef04859820e1bc7c1d9
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Wed Oct 3 17:50:17 2018 +0300

    drm/i915: Use the correct crtc when sanitizing plane mapping

Thanks for the great bug report and testing, chaps.
Comment 43 Lakshmi 2018-10-13 13:38:15 UTC
Peter/Dennis, 
Thanks for the feedback.
Closing this bug as fixed.
Comment 44 Peter Nowee 2019-01-13 01:29:14 UTC
Just received the 4.19.12 kernel through my distribution (Debian stretch-backports) and happy to see the fix made it through! Just in time to save the next Debian stable release for MacBook2,1 users. MacOS does not install on those anymore, so some of those users might be trying to install Linux sooner or later and would otherwise have run into this crash.

Thanks everyone, Dennis, Ville Syrjala for the fix, Jani Nikula, Jani Saarinen, Elizabeth, Lakshmi and all the other Intel people for great Linux support!

