Description
Dennis
2018-03-21 00:15:26 UTC
Created attachment 138232 [details]
modprobe i915
Hello, could you also share a full dmesg with drm.debug=14, and xorg.log?

Created attachment 138266 [details]
dmesg with modprobe i915 with drm.debug=14
Comment on attachment 138266 [details]
dmesg with modprobe i915 with drm.debug=14
(I don't use/want X, so no xorg.conf available)
First of all, sorry about the spam. This is a mass update for our bugs. Sorry if you find this annoying, but we are trying to understand whether this bug is still valid. If the investigation is still in progress, please ignore this, and I apologize! If you think the bug is no longer valid, please comment here so that it can be closed. If you haven't tested with our latest pre-upstream tree (drm-tip), please do that as well, and comment here if you can no longer see the issue there.

Jani, any suggestions here?

Subject says "crash", features says "GPU hang", comment #0 says backlight. Which is it?

GPU hang. The backlight/efi issue was simply my motivation to start using the i915 driver.

(In reply to Dennis from comment #8)
> GPU hang. The backlight/efi issue was simply my motivation to start using
> the i915 driver.

But... there's zero indication of a GPU hang anywhere!

Please add drm.debug=14, don't do any of the module moving hacks indicated in the dmesg, and attach full dmesg from boot, reproducing whatever problem you have.

I'm only able to get a dmesg if I first boot without i915 loaded, so I can ssh in, and capture whatever output I can remotely before my system crashes. That's what that dmesg that I provided is ... my gpu/system hangs/crashes at the end of that log and I have to forcefully reboot.

Still missing latest logs.

I was only asked for a "full dmesg with drm.debug=14 and xorg.log" (and I don't have X)... what am I missing?

Have you tried with latest drm-tip https://cgit.freedesktop.org/drm-tip?

Created attachment 139355 [details]
using drm-tip, dmesg -w, i modprobe i915 at about 297s in
Created attachment 139646 [details]
dmesg modprobe i915 on linux 4.14.13 drm.debug=14 (at 146 seconds in)
I have a MacBook2,1 as well and can confirm the issue. I am booting
from the internal HDD, not an external USB drive. I am using Debian
Stretch with stable and stable-backports kernels.
I tried loading i915 on several kernels:
- 4.9.88: Works.
- 4.11.6: Works.
- 4.12.6: Works.
- 4.13.13: Works.
- 4.14.13: Works.
- 4.15.11: Hangs.
- 4.16.5: Hangs.
Where it works (<=4.14), there are still warnings and traces in the
dmesg logs that might provide some clues as to what is going wrong in
later versions (>=4.15). Attached is the dmesg log for 4.14.13.
It looks like a git bisect between v4.14 and v4.15 would be the best bet.

No clues in the warnings and traces that the earlier kernels gave already? (see 4.14 dmesg attachment of my previous comment)

(In reply to Peter Nowee from comment #18)
> No clues in the warnings and traces that the earlier kernels gave already?

Not really. It's also still completely unclear to me what the actual failure mode is.

Created attachment 139806 [details]
git bisect log on linux (torvalds tree) for MacBook 2,1 i915 GPU hang

Ok, I did a git bisect on linux (torvalds tree) and the first bad commit is:

commit 23ac12732825901b3fc6ac720958d8bff9a0d6ec
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Fri Nov 17 21:19:09 2017 +0200

    drm/i915: Redo plane sanitation during readout

    Unify the plane disabling during state readout by pulling the code
    into a new helper intel_plane_disable_noatomic(). We'll also read
    out the state of all planes, so that we know which planes really
    need to be diabled.

    Additonally we change the plane<->pipe mapping sanitation to work by
    simply disabling the offending planes instead of entire pipes. And
    we do it before we otherwise sanitize the crtcs, which means we
    don't have to worry about misassigned planes during crtc sanitation
    anymore.

    v2: Reoder patches to not depend on enum old_plane_id
    v3: s/for_each_pipe/for_each_intel_crtc/

    Cc: Thierry Reding <thierry.reding@gmail.com>
    Cc: Alex Villacís Lasso <alexvillacislasso@hotmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103223
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Tested-by: Thierry Reding <thierry.reding@gmail.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171117191917.11506-3-ville.syrjala@linux.intel.com
    Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    (cherry picked from commit b1e01595a66dc206a2c75401ec4c285740537f3f)
    Signed-off-by: Jani Nikula <jani.nikula@intel.com>

Attached is the bisect log. I don't really know how to go from here. I tried reverting that commit on top of the current master, but that leads to merge conflicts. I tried to solve them, but did not succeed, because too much has changed. Hope someone else can pick it up from here. But let me know if you need some more info or if I need to test something.

Still hangs/crashes in 4.18.5

(In reply to Peter Nowee from comment #20)
> Ok, I did a git bisect on linux (torvalds tree) and the first bad commit is:
>
> commit 23ac12732825901b3fc6ac720958d8bff9a0d6ec
> Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Date: Fri Nov 17 21:19:09 2017 +0200
>
> drm/i915: Redo plane sanitation during readout

Please double check the bisect by trying commits 23ac127328259 and d87ce76402950b, and attaching dmesg for each.

Created attachment 141595 [details]
dmesg for last good commit d87ce76402950b drm.debug=14 (modprobe i915 at 116 seconds in)
Created attachment 141596 [details]
dmesg for first bad commit 23ac127328259 drm.debug=14 (no log after modprobe i915 as it hangs the computer)
I double-checked both commits: When running `modprobe i915`, 23ac127328259 still hangs, whereas d87ce76402950b still works ok. See my previous two comments for the dmesg log attachments. Note that I do not have any kernel logging from when it hangs. Reading the kernel messages over SSH also did not produce any additional information.

Created attachment 141672 [details]
dmesg for first bad commit 23ac127328259 drm.debug=14 (modprobe i915 at 345 seconds in)
On second try, I am able to get more kernel messages over SSH before it hangs after all. See this new attachment for commit 23ac127328259 (first bad commit).
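
For anyone repeating this, the bisect from comment #20 and the double check above can be sketched roughly as follows. This is a minimal sketch, not the exact commands that were run: the `run`, `bisect_begin` and `bisect_mark` helpers and the `DRY_RUN` variable are hypothetical names, and the v4.14/v4.15 endpoints are the ones suggested earlier in this bug. By default it only prints the git commands; every real step still needs a kernel build, a reboot and a manual `modprobe i915`.

```shell
# Dry-run helper (hypothetical): prints each command.
# Set DRY_RUN=0 to actually execute it inside a kernel git tree.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Start a bisect between the last release that works and the first that hangs.
bisect_begin() {
    run git bisect start
    run git bisect bad v4.15
    run git bisect good v4.14
}

# Record the result for the commit git has checked out, after building it,
# booting it and running `modprobe i915`.
# $1: "hang" if the machine froze, "ok" if the driver loaded.
bisect_mark() {
    case "$1" in
        hang) run git bisect bad ;;
        ok)   run git bisect good ;;
        *)    echo "usage: bisect_mark hang|ok" >&2; return 2 ;;
    esac
}
```

After `bisect_begin`, call `bisect_mark hang` or `bisect_mark ok` once per candidate kernel. When git reports the first bad commit, `git bisect log` prints the history of the session, and `git bisect reset` returns to the original checkout.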
*** Bug 106160 has been marked as a duplicate of this bug. ***

(In reply to Peter Nowee from comment #20)
> Created attachment 139806 [details]
> git bisect log on linux (torvalds tree) for MacBook 2,1 i915 GPU hang
>
> Ok, I did a git bisect on linux (torvalds tree) and the first bad commit is:
>
> commit 23ac12732825901b3fc6ac720958d8bff9a0d6ec
> Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Date: Fri Nov 17 21:19:09 2017 +0200
>
> drm/i915: Redo plane sanitation during readout

Cc: Ville. His commit, confirmed regression, and an old platform. A perfect match. ;)

Is there an ethernet port on the machine by any chance? If yes, netconsole should be capable of giving us more of the dmesg.

If no, maybe something like this would let you catch more of the debug output before the hang:

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index c13b2f5704b6..3cc0df6b5ebe 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -16164,6 +16164,8 @@ intel_modeset_setup_hw_state(struct drm_device *dev,
 	/* HW state is read out, now we need to sanitize this mess. */
 	get_encoder_power_domains(dev_priv);

+	msleep(5000);
+
 	intel_sanitize_plane_mapping(dev_priv);

 	for_each_intel_encoder(dev, encoder) {

One peculiar thing about the good logs is that there are no planes enabled when we initialize the driver. That would suggest that it's using the VGA plane, which is rather odd considering it's supposed to be using EFI.

This should tell whether the VGA plane is enabled or not:

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index c13b2f5704b6..0e976dd2535f 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -15311,10 +15311,13 @@ static void i915_disable_vga(struct drm_i915_private *dev_priv)
 	u8 sr1;
 	i915_reg_t vga_reg = i915_vgacntrl_reg(dev_priv);

+	DRM_DEBUG_KMS("VGA = 0x%08x\n", I915_READ(vga_reg));
+
 	/* WaEnableVGAAccessThroughIOPort:ctg,elk,ilk,snb,ivb,vlv,hsw */
 	vga_get_uninterruptible(pdev, VGA_RSRC_LEGACY_IO);
 	outb(SR01, VGA_SR_INDEX);
 	sr1 = inb(VGA_SR_DATA);
+	DRM_DEBUG_KMS("SR1 = 0x%02x\n", sr1);
 	outb(sr1 | 1<<5, VGA_SR_DATA);
 	vga_put(pdev, VGA_RSRC_LEGACY_IO);
 	udelay(300);

Another peculiarity are the !connector_state->crtc warns. Not sure what is causing those. Maybe the whole thing was just busted back then.

Created attachment 141751 [details]
netconsole output with the above patches and the 5s delay
The additional debug output lines said that VGA = 0x80100000 and SR1 = 0x00.
After the 5 second added delay, the next output line was:
"plane B attached to the wrong pipe, disabling plane"
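
The two values reported above can be decoded with a little shell arithmetic. This is a sketch under stated assumptions, not something posted in this bug: it assumes that bit 31 of the VGA control register is the "VGA display disable" bit (VGA_DISP_DISABLE in the i915 register definitions), and that bit 5 of SR01 is the "screen off" bit, which is the bit the debug patch's surrounding code sets with `sr1 | 1<<5`. The function names are made up for illustration.

```shell
# Decode the two values logged by the debug patch.
# Assumption: bit 31 of VGACNTRL is VGA_DISP_DISABLE.
vga_plane_state() {
    # $1: VGA control register value, e.g. 0x80100000
    if [ $(( ($1 >> 31) & 1 )) -eq 1 ]; then
        echo "VGA plane disabled"
    else
        echo "VGA plane enabled"
    fi
}

# Assumption: bit 5 of the VGA sequencer register SR01 is "screen off".
sr1_screen_state() {
    # $1: SR01 value, e.g. 0x00
    if [ $(( ($1 >> 5) & 1 )) -eq 1 ]; then
        echo "VGA screen off"
    else
        echo "VGA screen on"
    fi
}

vga_plane_state 0x80100000
sr1_screen_state 0x00
```

Under those assumptions, VGA = 0x80100000 suggests the VGA plane was already disabled when the driver read the register, while SR1 = 0x00 means the screen-off bit had not been set yet.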
Created attachment 141764 [details] [review]
[PATCH] drm/i915: Restore vblank interrupts earlier

The obvious fail I see in the code is that we're going to try a vblank wait without vblank interrupts yet being enabled. Though that should not really cause a hang, and rather we should just get a WARN from the vblank code. Hmm. Or maybe not. I suppose bad things might happen if we haven't called drm_vblank_reset() yet. Worth a shot at least. Please test.

Created attachment 141765 [details] [review]
[PATCH v2] drm/i915: Restore vblank interrupts earlier

Let's try that again. This time in a form that actually builds :)

Created attachment 141768 [details]
output for the previous v2 patch
Progress! It doesn't crash this time! And I finally have access to my laptop's backlight again and can turn it off!! :P I applied the patch manually against my kernel version 4.18.5 (and left the previous patches in as well, eg. the 5 second delay). However, it seems to get into a loop, where every ~10s it tries to "intel_tv_detect" something. And I can't unload the module - it says that intel_gtt is being used. I also notice a kernel trace in the logs beginning with "vblank not available on crtc 0". (Thank you!)
Created attachment 141775 [details] [review]
[PATCH] drm/i915: Use the correct crtc when sanitizing plane mapping

Let's try to use the correct pipe when disabling the plane. Keep the previous patch as well.

I'm having trouble applying this patch to my 4.18 kernel since the plane->get_hw_state() function only takes one parameter here - it doesn't use that second &pipe and fails to compile :s.

(In reply to Dennis from comment #35)
> I'm having trouble applying this patch to my 4.18 kernel since the
> plane->get_hw_state() function only takes one parameter here - it doesn't
> use that second &pipe and fails to compile :s.

You need to cherry-pick commit fcba862e8428 ("drm/i915: Have plane->get_hw_state() return the current pipe")

Looks like it will apply cleanly onto 4.18.

Created attachment 141782 [details]
output for the previous correct-crtc patch
Things seem to work now, no kernel traces. Except I still get those intel_tv_detect messages being dumped every 10 seconds, and I can't normally unload the i915 module (I have to rmmod -f it).
(In reply to Dennis from comment #37)
> Created attachment 141782 [details]
> output for the previous correct-crtc patch
>
> Things seem to work now, no kernel traces.

Great. I'll mail out the patches.

> Except I still get those
> intel_tv_detect messages being dumped every 10 seconds,

That's normal. It's just polling every 10s to see if you plugged a TV into the machine.

> and I can't normally
> unload the i915 module (I have to rmmod -f it).

Probably fbcon hanging on to the device. One of these should most likely allow you to rmmod cleanly:

echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 1 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
echo 1 > /sys/class/vtconsole/vtcon1/bind

Just confirming that applying Ville's patches on top of drm-tip (4.19-rc6) solves the problem on my MacBook as well.

Created attachment 141858 [details] [review]
[PATCH v2 2/3] drm/i915: Use the correct crtc when sanitizing plane mapping

Slight adjustment to the patch based on review feedback. Can you double check that it still works as intended for you?

Excellent work! That patch applies cleanly and things finally work again. Merci beaucoup.

Fixes pushed:

commit 68bc30deac625b8be8d3950b30dc93d09a3645f5
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Wed Oct 3 17:49:51 2018 +0300

    drm/i915: Restore vblank interrupts earlier

commit 62358aa4ee86481ce044bef04859820e1bc7c1d9
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Wed Oct 3 17:50:17 2018 +0300

    drm/i915: Use the correct crtc when sanitizing plane mapping

Thanks for the great bug report and testing chaps.

Peter/Dennis, Thanks for the feedback. Closing this bug as fixed.

Just received the 4.19.12 kernel through my distribution (Debian stretch-backports) and happy to see the fix made it through! Just in time to save the next Debian stable release for MacBook2,1 users. MacOS does not install on those anymore, so some of those users might be trying to install Linux sooner or later and would otherwise have run into this crash. Thanks everyone, Dennis, Ville Syrjala for the fix, Jani Nikula, Jani Saarinen, Elizabeth, Lakshmi and all the other Intel people for great Linux support!
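
As a footnote to the fbcon advice in the reply to comment #37, the unbinding can be made a little less trial-and-error by looking for the framebuffer console by name. This is a minimal sketch, assuming that fbcon identifies itself in the vtconsole `name` file with a string containing "frame buffer" (typically "(M) frame buffer device"); the `unbind_fbcon` function and its optional directory argument are hypothetical and not part of the advice given in this bug.

```shell
# Unbind the framebuffer console so i915 can be unloaded with a plain rmmod.
# $1: vtconsole sysfs directory (defaults to the real one, which needs root).
unbind_fbcon() {
    dir="${1:-/sys/class/vtconsole}"
    for vt in "$dir"/vtcon*; do
        [ -f "$vt/name" ] || continue
        # Assumption: fbcon's name contains "frame buffer".
        if grep -q "frame buffer" "$vt/name"; then
            echo 0 > "$vt/bind"
        fi
    done
}
```

On the affected machine this would be used as `unbind_fbcon && rmmod i915`, run as root. Passing a directory argument only exists so the function can be tried against a copy of the sysfs layout without touching the real console.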