Bug 38012 - [Arrandale] HP 8440p locks solid on 2.6.39 with 2.14.0+.
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Jesse Barnes
QA Contact: Xorg Project Team
Duplicates: 39108
Reported: 2011-06-06 15:17 UTC by Zephaniah E. Hull
Modified: 2011-11-15 10:07 UTC

Description Zephaniah E. Hull 2011-06-06 15:17:47 UTC
On my HP 8440p the system locks solid when xorg starts.

This does not happen when running Ubuntu 10.10 with the 2.12.0 driver; however, it does with the 2.14.0 driver.

This does not happen with 2.6.38, but does with 2.6.39.

There are no logs on the system after the crash, nothing gets written to the disk for either the kernel log or the Xorg log.

The git bisect log for this is below; however, I was unable to complete the bisect because my system will not boot inside the DRI branch which this was bisected to.

If necessary, I may be able to work around this with an install on an external drive, however that is time consuming enough that it would, at best, have to wait until next weekend.

Zephaniah E. Loss-Cutler-Hull.

git bisect start
# bad: [61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf] Linux 2.6.39
git bisect bad 61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf
# good: [521cb40b0c44418a4fd36dc633f575813d59a43d] Linux 2.6.38
git bisect good 521cb40b0c44418a4fd36dc633f575813d59a43d
# bad: [0df0914d414a504b975f3cc66ace0c16ef55b7f3] Merge branch 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
git bisect bad 0df0914d414a504b975f3cc66ace0c16ef55b7f3
# good: [6445ced8670f37cfc2c5e24a9de9b413dbfc788d] Merge branch 'staging-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6
git bisect good 6445ced8670f37cfc2c5e24a9de9b413dbfc788d
# good: [5a0efea09f42f7c92bd98a38d66b4dff9589266b] sparc64: Sharpen address space randomization calculations.
git bisect good 5a0efea09f42f7c92bd98a38d66b4dff9589266b
# bad: [40c7f2112ce18fa5eb6dc209c50dd0f046790191] Merge branch 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
git bisect bad 40c7f2112ce18fa5eb6dc209c50dd0f046790191
# bad: [4819d2e4310796c4e9eef674499af9b9caf36b5a] drm: Retry i2c transfer of EDID block after failure
git bisect bad 4819d2e4310796c4e9eef674499af9b9caf36b5a
Comment 1 Zephaniah E. Hull 2011-06-14 15:50:00 UTC
Alright, so I have bisected down my crash to bcd5023c961a44c7149936553b6929b2b233dd27, which alters the lid detection logic.

Reverting this change in 2.6.39 results in a kernel that is vaguely usable; however, there is no output to the laptop's display.

Closing the lid works, it redetects displays and we are still good.

Reopening the lid causes the laptop to lock solid, exactly as it was on starting X.

I am now attempting to bisect where eDP handling for my laptop broke, joy.

On the bright side, bcd5023c961a44c7149936553b6929b2b233dd27 is confirmed to impact my system as well. (HP 8440p)

Zephaniah E. Loss-Cutler-Hull.
Comment 2 Zephaniah E. Hull 2011-06-20 15:36:07 UTC
Alright, I have bisected this down to 9035a97a32836d0e456ddafaaf249a844e6e4b5e.

fe16d949b45036d9f80e20e07bde1ddacc930b10 is good.

452858338aec31c1f4414bf07f31663690479869 is good.

And 9035a97a is bad.

Now the catch: this is a merge commit, and I'm still trying to figure out how to untangle it to have something I can revert.

Suggestions are welcome.
Comment 3 Zephaniah E. Hull 2011-06-21 08:44:11 UTC
(In reply to comment #2)
> Alright, I have bisected this down to 9035a97a32836d0e456ddafaaf249a844e6e4b5e.
> 
> fe16d949b45036d9f80e20e07bde1ddacc930b10 is good.
> 
> 452858338aec31c1f4414bf07f31663690479869 is good.
> 
> And 9035a97a is bad.
> 
> Now the catch, this is a merge commit, and I'm still trying to figure out how
> to untangle it to have something I can revert.
> 
> Suggestions are welcome.

So, knowing that 4528583 was the drm-intel-fixes branch, I have tested a version of 9035a97 with all the differences in drivers/gpu/drm/i915 between fe16d94 and 9035a97 reverted, effectively reverting all of the pure intel changes brought in by the merge.

The resulting kernel still had the bug.

At this point, I am stumped, and cannot see how to bisect this problem any further.

Help?

Zephaniah E. Loss-Cutler-Hull.
Comment 4 Zephaniah E. Hull 2011-06-23 17:16:02 UTC
And it's been bisected down on the xorg driver side as well.

0d26d950fdada1f59dc6cb31fe2f03004825f773, 'KMS: add fake EDID on eDP too', is the commit which causes kernels from 9035a97 onward to hard lock when we try to bring the eDP up.

Due to some later changes a straight revert against 2.15.0 is not possible, but changing the line in question to:

if (is_panel(koutput->connector_type) && koutput->connector_type != DRM_MODE_CONNECTOR_eDP) {

allows 2.15.0 to work with 9035a97.
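
A minimal standalone sketch of what that one-line change does, for illustration only. It assumes is_panel() treats both LVDS and eDP as panels (the behaviour implied by 'KMS: add fake EDID on eDP too'; this is not a copy of the driver source). wants_panel_modes() and panel_check_self_test() are hypothetical names introduced here. The connector type values are the ones defined in the kernel's drm_mode.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Connector type values as defined in the kernel's drm_mode.h. */
#define DRM_MODE_CONNECTOR_LVDS        7
#define DRM_MODE_CONNECTOR_DisplayPort 10
#define DRM_MODE_CONNECTOR_eDP         14

/* Assumed shape of the driver helper: LVDS and eDP both count as panels. */
static bool is_panel(int connector_type)
{
    return connector_type == DRM_MODE_CONNECTOR_LVDS ||
           connector_type == DRM_MODE_CONNECTOR_eDP;
}

/*
 * Hypothetical wrapper around the modified condition from this comment:
 * take the panel-specific path only for panels that are not eDP.
 */
static bool wants_panel_modes(int connector_type)
{
    return is_panel(connector_type) &&
           connector_type != DRM_MODE_CONNECTOR_eDP;
}

/* Small self-check showing which connector types take the panel path. */
static void panel_check_self_test(void)
{
    assert(is_panel(DRM_MODE_CONNECTOR_eDP));                   /* still a panel... */
    assert(!wants_panel_modes(DRM_MODE_CONNECTOR_eDP));         /* ...but now skipped */
    assert(wants_panel_modes(DRM_MODE_CONNECTOR_LVDS));         /* LVDS unchanged */
    assert(!wants_panel_modes(DRM_MODE_CONNECTOR_DisplayPort)); /* never a panel */
}
```

Under these assumptions, the net effect is that the workaround leaves LVDS handling alone and only takes eDP out of the panel-specific path.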

Of course, the 9035a97 merge has very little to do with the decision to treat it as a panel, so I am well and truly lost as to what the _right_ fix is.

In the meantime, keeping the xorg driver from treating eDP as a panel at the very least lets my system work again, even if it's the wrong solution in the long term.

Zephaniah E. Loss-Cutler-Hull.
Comment 5 meng 2011-06-28 01:14:24 UTC
After testing, we can't reproduce the issue with commit 9035a97a on our Capella test machine. We run Ubuntu 11.04 with the 2.14.0 driver.
Comment 6 Chris Wilson 2011-07-13 13:11:01 UTC
*** Bug 39108 has been marked as a duplicate of this bug. ***
Comment 7 Chris Wilson 2011-07-13 13:24:14 UTC
As discussed on bug 39108:


commit 212fa9868767637e8f430485eeb522c99e63fd16
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jul 13 21:11:14 2011 +0100

    Disable adding normal RTF modes for an eDP
    
    This is causing a hard hang with 2.6.39+, we don't know why so play safe
    and disable for the time being.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=38012
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

as a temporary avoidance measure.
Comment 8 Chris Wilson 2011-07-22 13:03:50 UTC
Jesse posted a patch for a related bug [bug 36888], please can you test https://bugs.freedesktop.org/attachment.cgi?id=48015
Comment 9 Chris Wilson 2011-07-29 02:21:16 UTC
Also some major modesetting bugs were squashed in keithp/drm-intel-fixes

commit d74362c9e45689d8d7e3d4bcf6681c4358ef4f2e
Author: Keith Packard <keithp@keithp.com>
Date:   Thu Jul 28 14:47:14 2011 -0700

    drm/i915: Flush other plane register writes
    
    Writes to the plane control register are buffered in the chip until a
    write to the DSPADDR (pre-965) or DSPSURF (post-965) register occurs.
    
    This patch adds flushes in:
    
        intel_enable_plane
        gen6_init_clock_gating
        ivybridge_init_clock_gating
    
    Signed-off-by: Keith Packard <keithp@keithp.com>

commit 2704cf5fbd248871a745d210733c6319959d2b0c
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Thu Jul 28 11:52:45 2011 -0700

    drm/i915: flush plane control changes on ILK+ as well
    
    After writing to the plane control reg we need to write to the surface
    reg to trigger the double buffered register latch.  On previous
    chipsets, writing to DSPADDR was enough, but on ILK+ DSPSURF is the reg
    that triggers the double buffer latch.
    
    v2: write DSPADDR too to cover pre-965 chipsets
    v3: use flush_display_plane instead, that's what it's for
    v4: send the right patch
    
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Tested-by: Keith Packard <keithp@keithp.com>
    Reviewed-by: Keith Packard <keithp@keithp.com>
    Signed-off-by: Keith Packard <keithp@keithp.com>


please can you test those?
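
To illustrate the behaviour the two commit messages describe, here is a toy model in plain C. It is not driver code: struct plane_model and every function name below are hypothetical, and the model only assumes what the commit messages state, namely that a plane control write stays buffered until the surface/address register is written.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one display plane with a double-buffered control register. */
struct plane_model {
    uint32_t cntr_pending; /* last value written to the control register */
    uint32_t cntr_active;  /* value the hardware is actually using */
    uint32_t surf;         /* surface base address register */
};

/* A control write is only buffered; the active value does not change. */
static void write_plane_control(struct plane_model *p, uint32_t val)
{
    p->cntr_pending = val;
}

/* Writing the surface register latches any pending control value. */
static void write_plane_surface(struct plane_model *p, uint32_t addr)
{
    p->surf = addr;
    p->cntr_active = p->cntr_pending;
}

/* Flush: rewrite the surface register with its current value to force the latch. */
static void flush_plane(struct plane_model *p)
{
    write_plane_surface(p, p->surf);
}

/* Self-check: a control write has no effect until the flush happens. */
static void plane_model_self_check(void)
{
    struct plane_model p = { 0, 0, 0x1000 };

    write_plane_control(&p, 0x80000000u); /* e.g. an enable bit */
    assert(p.cntr_active == 0);           /* not latched yet */

    flush_plane(&p);
    assert(p.cntr_active == 0x80000000u); /* latched by the surface write */
    assert(p.surf == 0x1000);             /* surface address unchanged */
}
```

In this model, forgetting the flush leaves the plane running with its old control value, which is the class of problem the two patches address by adding the missing flushes.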
Comment 10 Zephaniah E. Hull 2011-07-31 06:53:06 UTC
(In reply to comment #9)
> Also some major modesetting bugs were squashed in keithp/drm-intel-fixes
> 
> commit d74362c9e45689d8d7e3d4bcf6681c4358ef4f2e
> commit 2704cf5fbd248871a745d210733c6319959d2b0c

> please can you test those?

I can confirm that 3.0 with those two patches applied works perfectly with the 2.14.0-4ubuntu7.1 xorg driver which ships with Ubuntu 11.04.

So I think we can call this bug solved, and the 212fa9868767637e8f430485eeb522c99e63fd16 xorg driver commit mentioned above can be reverted.

Any chance that we can get those two patches marked for stable kernel backports?

Thanks!
Zephaniah E. Loss-Cutler-Hull.
Comment 11 Zephaniah E. Hull 2011-08-01 10:42:16 UTC
(In reply to comment #10)
> So I think we can call this bug solved, and the
> 212fa9868767637e8f430485eeb522c99e63fd16 xorg driver commit mentioned above can
> be reverted.
> 
> Any chance that we can get those two patches marked for stable kernel
> backports?
> 
> Thanks!
> Zephaniah E. Loss-Cutler-Hull.

Or, not.

It worked once at home, and since then has been failing in the exact same manner, damn.

Sorry, still broken.

Zephaniah E. Loss-Cutler-Hull.
Comment 12 Zephaniah E. Hull 2011-08-04 14:59:26 UTC
> It worked once at home, and since then has been failing in the exact same
> manner, damn.
> 
> Sorry, still broken.
> 
> Zephaniah E. Loss-Cutler-Hull.

Alright, after some poking, the patches are definitely an improvement which should get backported, but they are not the whole story.

I will try to get some debug logs tomorrow showing the modes being set.

The 3.0 kernel with the patches applied works for the laptop when no external display is connected.

It also works when my display at home is connected.

It locks up solidly when my display at work, an Acer V193W, is connected.

Both are DVI monitors plugged into a docking station; the one at the office is 1440x900, the one at home is 1600x900.

The laptop is natively 1600x900; however, I have caught it going into 1440x900 scaled by default when the work monitor is plugged in.
Comment 13 Zephaniah E. Hull 2011-08-09 10:44:23 UTC
(In reply to comment #12)
> Alright, after some poking, the patches are definitely an improvement which
> should get backported, but they are not the whole story.

However, drm-intel-next as of 2cfad3600e192f867c0e03905c90903f189c010a (other versions not tested) appears to be working perfectly.

I'll test more, but so far it's working in all the configurations I use the system in.

Zephaniah E. Loss-Cutler-Hull.
Comment 14 Eugeni Dodonov 2011-10-03 09:51:44 UTC
So, it is fixed now, right?

