Bug 58396 - [i945GM regression] in 3.7.0 kernel on macbook1,1 (kernel crash)
Summary: [i945GM regression] in 3.7.0 kernel on macbook1,1 (kernel crash)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: unspecified
Hardware: x86 (IA32) Linux (All)
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-17 09:05 UTC by Peter Ujfalusi
Modified: 2017-07-24 22:59 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
Kernel messages when i915 is loaded as module. (65.73 KB, text/plain)
2012-12-17 09:05 UTC, Peter Ujfalusi
no flags Details
bisect log between 3.6.0 and 3.7.0 (3.68 KB, text/plain)
2012-12-17 09:08 UTC, Peter Ujfalusi
no flags Details
lspci -v output on the machine (7.87 KB, text/plain)
2012-12-17 09:08 UTC, Peter Ujfalusi
no flags Details
dump encoder responsible for modeset (1.15 KB, patch)
2012-12-17 14:17 UTC, Daniel Vetter
no flags Details | Splinter Review
Patch to correct the check in intel_check_plane_mapping() (1.06 KB, patch)
2012-12-17 14:46 UTC, Peter Ujfalusi
no flags Details | Splinter Review
Dmesg with debug patch from comment #7 (65.08 KB, text/plain)
2012-12-17 17:13 UTC, Peter Ujfalusi
no flags Details
Dmesg with debug patch from comment #7 and commit fa55583 reverted (71.76 KB, text/plain)
2012-12-17 17:14 UTC, Peter Ujfalusi
no flags Details
PATCH: Extended check in intel_check_plane_mapping() (1.27 KB, patch)
2012-12-17 20:20 UTC, Peter Ujfalusi
no flags Details | Splinter Review
Dmesg with debug patch from comment #7 and patch from Comment #17 (66.21 KB, text/plain)
2012-12-17 21:28 UTC, Peter Ujfalusi
no flags Details
don't auto-disable disconnected outputs (1.83 KB, patch)
2012-12-17 23:46 UTC, Daniel Vetter
no flags Details | Splinter Review
Dmesg with debug patch from comment #7 and patch from Comment #22 (66.72 KB, text/plain)
2012-12-18 06:24 UTC, Peter Ujfalusi
no flags Details

Description Peter Ujfalusi 2012-12-17 09:05:49 UTC
Created attachment 71635 [details]
Kernel messages when i915 is loaded as module.

Macbook1,1 (http://www.everymac.com/systems/apple/macbook/specs/macbook_2.0_white.html) no longer boots with 3.7 kernel.

The screen turns blank and the machine no longer responds when Intel modeset is enabled.
I have managed to get a kernel log: compile the Intel gfx as module, blacklist it, boot the machine and via ssh insmod the i915.ko. I have attached the dmesg of events.
Comment 1 Peter Ujfalusi 2012-12-17 09:07:34 UTC
I have also bisected the kernel between 3.6.0 and 3.7.0:

fa55583797d12b10928a1813f3dcf066637caf5e is the first bad commit
commit fa55583797d12b10928a1813f3dcf066637caf5e
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Oct 10 23:14:00 2012 +0200

    drm/i915: fixup the plane->pipe fixup code
    
    We need to check whether the _other plane is on our pipe, not whether
    our plane is on the other pipe. Otherwise if not both pipes/planes are
    active, we won't properly clean up the mess and set up our desired
    plane->pipe mapping.
    
    v2: Fixup the logic, I've totally fumbled it. Noticed by Chris Wilson.
    
    v3: I've checked Bspec, and the flexible plane->pipe mapping is a
    gen2/3 feature, so test for that instead of PCH_SPLIT
    
    v4: Check whether we indeed have 2 pipes before checking the other
    pipe, to avoid upsetting i845g/i865g. Noticed by Chris Wilson.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51265
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49838
    Tested-by: Dave Airlie <airlied@gmail.com>
    Tested-by: Chris Wilson <chris@chris-wilson.co.uk> #855gm
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Attaching the bisect log as well to the bug.
Comment 2 Peter Ujfalusi 2012-12-17 09:08:03 UTC
Created attachment 71636 [details]
bisect log between 3.6.0 and 3.7.0
Comment 3 Peter Ujfalusi 2012-12-17 09:08:27 UTC
Created attachment 71637 [details]
lspci -v output on the machine
Comment 4 Peter Ujfalusi 2012-12-17 09:21:01 UTC
One more thing to add:
I have also tried:
git://people.freedesktop.org/~danvet/drm-intel drm-intel-nightly merged to mainline kernel.

Similar symptoms: the screen goes blank and the machine locks up.
Comment 5 Daniel Vetter 2012-12-17 09:32:24 UTC
Can you please boot 3.7 with the offending patch reverted and attach the dmesg (please boot with drm debugging enabled)?
Comment 6 Peter Ujfalusi 2012-12-17 09:37:56 UTC
(In reply to comment #5)
> Can you please boot 3.7 with the offending patch reverted and attach the
> dmesg (please boot with drm debugging enabled)?

I'll do it as soon as get home.
Comment 7 Daniel Vetter 2012-12-17 14:17:20 UTC
Created attachment 71656 [details] [review]
dump encoder responsible for modeset

Also please apply this little debug patch on top of a broken kernel and readd a new dmesg with drm debug enabled.

Analysis thus far is that we do a completely bogus modeset on the 2nd crtc, but while actually disabling the 1st crtc. Hence caller supplies no mode, and we fall over with a NULL deref trying to prepare the modeset on the 2nd crtc. Above patch might shed clue on why we try to do that bogus modeset.
Comment 8 Peter Ujfalusi 2012-12-17 14:45:31 UTC
(In reply to comment #7)
> Created attachment 71656 [details] [review] [review]
> dump encoder responsible for modeset
> 
> Also please apply this little debug patch on top of a broken kernel and
> readd a new dmesg with drm debug enabled.
> 
> Analysis thus far is that we do a completely bogus modeset on the 2nd crtc,
> but while actually disabling the 1st crtc. Hence caller supplies no mode,
> and we fall over with a NULL deref trying to prepare the modeset on the 2nd
> crtc. Above patch might shed clue on why we try to do that bogus modeset.

Will do. Meanwhile I looked at the offending patch:
fa55583 drm/i915: fixup the plane->pipe fixup code

The new check does not match the removed one so the code in intel_sanitize_crtc()
will going to take different path.
Snippet from the patch:

+
+       if ((val & DISPLAY_PLANE_ENABLE) &&
+           (!!(val & DISPPLANE_SEL_PIPE_MASK) == crtc->pipe))
+               return false;
+
...
-
-               if ((val & DISPLAY_PLANE_ENABLE) == 0 &&
-                   (!!(val & DISPPLANE_SEL_PIPE_MASK) == crtc->pipe))
-                       goto ok;
-

I'm mostly confident that this is the root cause of the issue I'm seeing.
I'l attach a patch to the bug to fix it. I have not tested it, but going to do that as soon as I got home.
Comment 9 Peter Ujfalusi 2012-12-17 14:46:16 UTC
Created attachment 71659 [details] [review]
Patch to correct the check in intel_check_plane_mapping()
Comment 10 Daniel Vetter 2012-12-17 14:59:55 UTC
(In reply to comment #9)
> Created attachment 71659 [details] [review] [review]
> Patch to correct the check in intel_check_plane_mapping()

The change you're reverting here is very much intentional - the old code checked whether the right plane is disabled and on the right pipe, whereas the new one whether the wrong plane is enabled and on the wrong pipe. This matters for the fixup code below, which without this change would disable the wrong pipe.
Comment 11 Daniel Vetter 2012-12-17 15:01:18 UTC
More succinct: We did change the sense of the "plane enabled?" check, but we also inverted the plane we actually check (see the added ! in reg = DSPCNTR(!crtc->plane);)
Comment 12 Peter Ujfalusi 2012-12-17 15:12:35 UTC
(In reply to comment #11)
> More succinct: We did change the sense of the "plane enabled?" check, but we
> also inverted the plane we actually check (see the added ! in reg =
> DSPCNTR(!crtc->plane);)

True, I have overlooked that change.
Will report back soon with the two dmesg (with debug patch from you) and with reverted fa55583.
Comment 13 Peter Ujfalusi 2012-12-17 15:28:40 UTC
(In reply to comment #11)
> More succinct: We did change the sense of the "plane enabled?" check, but we
> also inverted the plane we actually check (see the added ! in reg =
> DSPCNTR(!crtc->plane);)

Hrm, should the crtc->pipe need to be inverted as well?

Based on your earlier comment about the change in the check:
> whereas the new one whether the wrong plane is enabled and on the wrong pipe.

Something like this, so we check on the 'wrong plane' and on the 'wrong pipe':

@@ -8114,7 +8114,7 @@ intel_check_plane_mapping(struct intel_crtc *crtc)
        val = I915_READ(reg);
 
        if ((val & DISPLAY_PLANE_ENABLE) &&
-           (!!(val & DISPPLANE_SEL_PIPE_MASK) == crtc->pipe))
+           (!!(val & DISPPLANE_SEL_PIPE_MASK) == !crtc->pipe))
                return false;

Not sure... Will see when I get my hands on my laptop.
Comment 14 Peter Ujfalusi 2012-12-17 17:13:27 UTC
Created attachment 71675 [details]
Dmesg with debug patch from comment #7
Comment 15 Peter Ujfalusi 2012-12-17 17:14:47 UTC
Created attachment 71676 [details]
Dmesg with debug patch from comment #7 and commit fa55583 reverted

If I revert commit:
fa55583 drm/i915: fixup the plane->pipe fixup code

The display works fine.
Comment 16 Peter Ujfalusi 2012-12-17 17:15:19 UTC
(In reply to comment #13)
> @@ -8114,7 +8114,7 @@ intel_check_plane_mapping(struct intel_crtc *crtc)
>         val = I915_READ(reg);
>  
>         if ((val & DISPLAY_PLANE_ENABLE) &&
> -           (!!(val & DISPPLANE_SEL_PIPE_MASK) == crtc->pipe))
> +           (!!(val & DISPPLANE_SEL_PIPE_MASK) == !crtc->pipe))
>                 return false;
> 
> Not sure... Will see when I get my hands on my laptop.

This does not work.
Comment 17 Peter Ujfalusi 2012-12-17 20:20:22 UTC
Created attachment 71689 [details] [review]
PATCH: Extended check in intel_check_plane_mapping()

Hi Daniel,

this patch fixed the problem on my macbook1,1. KMS works fine after applying it.

We only checked if the 'wrong plane' (if it is enabled) was configured to use the crtc->pipe. I have added additional check which checks if the current plane's configuration in HW is correct or not (regarding to crtc->pipe).
I'm not really familiar with the terms in graphics area so the comments in the patch might be totally wrong.
I hope this helps to figure out a proper fix for the issue.

Thanks,
Peter
Comment 18 Daniel Vetter 2012-12-17 21:10:11 UTC
(In reply to comment #17)
> We only checked if the 'wrong plane' (if it is enabled) was configured to
> use the crtc->pipe. I have added additional check which checks if the
> current plane's configuration in HW is correct or not (regarding to
> crtc->pipe).
> I'm not really familiar with the terms in graphics area so the comments in
> the patch might be totally wrong.
> I hope this helps to figure out a proper fix for the issue.

Can you please attach debug dmesg for this patch (with the "dump encoder responsible for modeset" patch applied)? I still think that this is not required and only results in pipes getting turned off too often. Which then papers over a subsequent bug in the code. And we don't want to turn off pipes if not required, eventually that should allow us to seamlessly take over the bios configuration, speeding up boot times by quite a bit.
Comment 19 Daniel Vetter 2012-12-17 21:11:33 UTC
Then also please attach debug dmesg with just the debug patch applied, so that we can (hopefully) figure out what's going wrong.
Comment 20 Peter Ujfalusi 2012-12-17 21:18:07 UTC
(In reply to comment #19)
> Then also please attach debug dmesg with just the debug patch applied, so
> that we can (hopefully) figure out what's going wrong.

In Comment #14 you can find the dmesg with only the debug patch.
Under Comment #15 you can find the dmesg with the debug patch + reverted fa55583.
I would expect that with the last patch I have attached the dmesg would look the same as it is in Comment #15.
I'll reboot and post the dmesg soon.
Comment 21 Peter Ujfalusi 2012-12-17 21:28:28 UTC
Created attachment 71697 [details]
Dmesg with debug patch from comment #7 and patch from Comment #17

This dmesg contains the debug patch and the patch from Comment #17.
I expect this to be about the same as the dmesg in Comment #15.
Comment 22 Daniel Vetter 2012-12-17 23:46:42 UTC
Created attachment 71704 [details] [review]
don't auto-disable disconnected outputs

Sorry, I've missed your dmesg attachments, too many mails :(

Please test the attached patch. I still need to decide whether this is the "right" fix, but I think this should at least prevent your oops.

Chris, we need fastboot asap ;-)
Comment 23 Peter Ujfalusi 2012-12-18 06:24:00 UTC
Created attachment 71709 [details]
Dmesg with debug patch from comment #7 and patch from Comment #22

With the patch from Comment #22 KMS also works on macbook1,1
dmesg with the patch applied (+ the debug patch)
Comment 24 Peter Ujfalusi 2012-12-18 06:30:30 UTC
(In reply to comment #22)
> Please test the attached patch. I still need to decide whether this is the
> "right" fix, but I think this should at least prevent your oops.

With this patch KMS also works. If you decide that this is the right fix can you please make sure it hits linux-stable (3.7) soon.

Out of curiosity: do you have short explanation on why this patch actually helps/works?

Thanks,
Peter
Comment 25 Daniel Vetter 2012-12-18 08:43:34 UTC
> --- Comment #24 from Peter Ujfalusi <peter.ujfalusi@gmail.com> ---
> (In reply to comment #22)
>> Please test the attached patch. I still need to decide whether this is the
>> "right" fix, but I think this should at least prevent your oops.
>
> With this patch KMS also works. If you decide that this is the right fix can
> you please make sure it hits linux-stable (3.7) soon.

I'll submit it with the bug link updated asap for 3.8, if it passes
review and all, it should land in a stable backport in 1-2 weeks. But
holidays might delay that a bit.

> Out of curiosity: do you have short explanation on why this patch actually
> helps/works?

Your BIOS enables VGA on the same pipe 1 as it enables LVDS. When
setting up the initial kms config we go through all pipes: First we
disable pipe 0, since it's unused, then we'll enable the desired
config for LVDS on pipe 1.

The problem now is that for historical reasons the new i915 modeset
code disables any outputs when they're disconnected for _any_ modeset,
i.e. when we ensure that pipe 0 is off, it also wants to disable VGA
out on pipe 0. Imo this is braindead semantics, but it's bug-for-bug
compatible with the old modeset implementation from 3.6. The problem
now is that the new modeset always enables/disables the entire pipe,
so to disable VGA on pipe it needs to enable/disable LVDS on pipe 1,
too. There we fall over, since at we don't (yet) read out the mode set
up by the BIOS, hence the NULL deref oops.
Comment 26 Peter Ujfalusi 2012-12-18 10:42:01 UTC
(In reply to comment #25)
> I'll submit it with the bug link updated asap for 3.8, if it passes
> review and all, it should land in a stable backport in 1-2 weeks. But
> holidays might delay that a bit.

Cool, thanks.

> > Out of curiosity: do you have short explanation on why this patch actually
> > helps/works?
> 
> Your BIOS enables VGA on the same pipe 1 as it enables LVDS. When
> setting up the initial kms config we go through all pipes: First we
> disable pipe 0, since it's unused, then we'll enable the desired
> config for LVDS on pipe 1.
> 
> The problem now is that for historical reasons the new i915 modeset
> code disables any outputs when they're disconnected for _any_ modeset,
> i.e. when we ensure that pipe 0 is off, it also wants to disable VGA
> out on pipe 0. Imo this is braindead semantics, but it's bug-for-bug
> compatible with the old modeset implementation from 3.6. The problem
> now is that the new modeset always enables/disables the entire pipe,
> so to disable VGA on pipe it needs to enable/disable LVDS on pipe 1,
> too. There we fall over, since at we don't (yet) read out the mode set
> up by the BIOS, hence the NULL deref oops.

I see. Thank you for taking the time to explain it. It does make perfect sense. It would have taken a bit of time if I started to read the code and the debug messages to come to this conclusion (I'm an ASoC guy)...

Thanks again!
Peter
Comment 27 Florian Mickler 2013-01-04 09:23:15 UTC
A patch referencing this bug report has been merged in Linux v3.8-rc2:

commit b0a2658acb5bf9ca86b4aab011b7106de3af0add
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Dec 18 09:37:54 2012 +0100

    drm/i915: don't disable disconnected outputs
Comment 28 Daniel Vetter 2013-01-07 09:12:23 UTC
Patch is merged, closing bug. Thanks a lot for reporting this and please reopen if it breaks again.
Comment 29 Florian Mickler 2013-01-19 22:59:09 UTC
A patch referencing a commit referencing this bug report has been merged in Linux v3.8-rc4:

commit 3490ea5de6ac4af309c3df8a26a5cca61306334c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jan 7 10:11:40 2013 +0000

    drm/i915: Treat crtc->mode.clock == 0 as disabled


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.