Bug 37029 - [SNB] LVDS link failure
Summary: [SNB] LVDS link failure
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium normal
Assignee: Daniel Vetter
Reported: 2011-05-09 10:40 UTC by Vivek Periaraj
Modified: 2012-07-01 03:52 UTC

Attachments
Report failure from _MUX query (2.04 KB, patch), 2011-05-09 11:03 UTC, Chris Wilson
Full dmesg output (57.36 KB, text/plain), 2011-05-09 13:18 UTC, Vivek Periaraj
intel_reg_dumper output (11.08 KB, text/plain), 2011-05-11 01:58 UTC, Vivek Periaraj
Latest dmesg (6.08 KB, application/octet-stream), 2011-06-08 15:06 UTC, Vivek Periaraj
Reattaching the dmesg log (6.08 KB, text/plain), 2011-06-08 15:19 UTC, Vivek Periaraj
Latest intel_reg_dumper output (11.18 KB, text/plain), 2011-06-13 11:39 UTC, Vivek Periaraj

Description Vivek Periaraj 2011-05-09 10:40:18 UTC
Chipset:

00:02.0 VGA compatible controller: Intel Corporation Sandy Bridge Integrated Graphics Controller (rev 09)

System architecture: 32 bit

xf86-video-intel: 2.15.0 as well as code compiled from git.

xserver: 7.6

mesa: 7.10.2

libdrm: 2.4.24

kernel: 2.6.38-2

Linux distribution: Debian 6.0

Machine: Dell Vostro 3750

Display connector: Dell Monitor


The problem:

The i915 kernel driver fails to load at boot time and throws the following error. This happens with kernels 2.6.38 and 2.6.39.

[drm:intel_dsm_platform_mux_info] *ERROR* MUX INFO call failed

And it hangs there until I hard reboot.

I am not able to bring up the KDM. The error I get is 'No devices detected'.

Thanks,
Vivek.
Comment 1 Chris Wilson 2011-05-09 11:03:45 UTC
Created attachment 46495
Report failure from _MUX query

Hmm, does this help?

Otherwise can you please append drm.debug=0xf to your kernel boot parameters and attach the full dmesg?
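To confirm that the parameter actually reached the kernel after a reboot, the command line can be inspected. The following is a minimal Python sketch, not part of the original thread; the helper name `drm_debug_mask` is hypothetical, and it assumes the command line text has been read from /proc/cmdline.

```python
from typing import Optional


def drm_debug_mask(cmdline: str) -> Optional[int]:
    """Return the drm.debug value found in a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "drm.debug" and sep:
            # Base 0 accepts both hexadecimal ("0xf") and decimal ("15").
            return int(value, 0)
    return None


# Example command line (illustrative values only).
example = "root=/dev/sda1 ro quiet drm.debug=0xf"
mask = drm_debug_mask(example)
```

A result of `None` means the parameter was not passed, in which case the dmesg will not contain the extra DRM debug output.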
Comment 2 Vivek Periaraj 2011-05-09 13:18:45 UTC
Created attachment 46504
Full dmesg output
Comment 3 Vivek Periaraj 2011-05-09 13:19:30 UTC
That patch didn't seem to help. When I 'modprobe i915', the screen goes blank/black and I lose control of my keyboard, so I had to hard reboot. This is the same behaviour as before the patch.

I ran my current kernel - 2.6.38-2-686 - with 'drm.debug=0xf' as one of the boot parameters. Please find the dmesg attached.
Comment 4 Chris Wilson 2011-05-09 14:00:19 UTC
Ah, the screen goes blank after the _MUX warning? Ok, that's a panic during modesetting. We need to either grab the OOPS over the network (using netconsole) or we can try random patches...

The most likely candidate in that case is

commit 31acbcc408f412d1ba73765b846c38642be553c3
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Apr 17 06:38:35 2011 +0100

    drm/i915/dp: Be paranoid in case we disable a DP before it is attached
    
    Given that the hardware may be left in a random condition by the BIOS,
    it is conceivable that we then attempt to clear the DP_PIPEB_SELECT bit
    without us ever enabling/attaching the DP encoder to a pipe. Thus
    causing a NULL deference when we attempt to wait for a vblank on that
    crtc.
    
    Reported-and-tested-by: Bryan Christ <bryan.christ@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36314
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36456
    Reported-and-tested-by: Bo Wang <bo.b.wang@intel.com>
    Cc: stable@kernel.org
    Signed-off-by: Keith Packard <keithp@keithp.com>

available in  git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6 drm-intel-fixes
Comment 5 Vivek Periaraj 2011-05-09 15:12:48 UTC
I cloned the directory like:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6

And looked for 'drm-intel-fixes' and couldn't find it? I think I am missing something here :)

But do you think the error 'No devices detected' is related to i915 not working, or is it possible that I could get X working (at least the 2D driver) with just the xorg driver? I am not sure whether my particular variant of Sandy Bridge graphics is supported by the xorg driver yet.

Thanks,
Vivek.
Comment 6 Chris Wilson 2011-05-09 23:22:49 UTC
After cloning from Keith's tree, you will want to do "git checkout origin/drm-intel-fixes".

The first step in enabling chipset support is getting KMS (the Kernel Mode Setting) up and running. Until we detect the device, nothing will work.

And don't worry, all SNB variants are well supported by xf86-video-intel, mesa and va-api now.

You are just proving to be the exception. :)
Comment 7 Vivek Periaraj 2011-05-10 05:41:08 UTC
Chris,

I took the patch directly from the bug report in comment # 3 instead of from git (I was not able to get git working). But no luck with this patch either. I am still getting the "MUX INFO call failed" error before the screen goes blank.

I have another computer that I can use to ssh into this one. Will that help in catching the oops? If so, how should I do it?

Thanks,
Vivek.
Comment 8 Chris Wilson 2011-05-10 06:01:12 UTC
If the machine is panicking whilst bringing up the device, you are unlikely to be able to ssh in. Instead, we need to emit the panic to the network and capture it on the second system. To do this we need to set up netconsole.

You need to compile in netconsole and the driver for your ethernet card, delay loading of the i915.ko module until after the network (simply using a module for i915.ko is enough) and then we need to pass the configuration to netconsole on the kernel commandline. See Documentation/networking/netconsole.txt

For example I use  

  netconsole=@192.168.1.31/eth0,6666@192.168.1.10/00:17:f2:cb:f3:27

(local-port@local.addr/dev,remote-port@remote.addr/remote.mac.addr)
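To make the field layout above concrete, here is a small Python sketch that splits such a parameter into its parts. It is an illustration only: the names `NetconsoleConfig` and `parse_netconsole` are hypothetical, and the parser assumes the simple IPv4 form shown in the example, with an empty port field meaning "use the kernel default".

```python
from typing import NamedTuple, Optional


class NetconsoleConfig(NamedTuple):
    local_port: Optional[int]   # None when the field is left empty
    local_addr: str
    dev: str
    remote_port: Optional[int]  # None when the field is left empty
    remote_addr: str
    remote_mac: str


def parse_netconsole(param: str) -> NetconsoleConfig:
    """Split 'netconsole=port@addr/dev,port@addr/mac' into its fields."""
    prefix = "netconsole="
    value = param[len(prefix):] if param.startswith(prefix) else param
    local, remote = value.split(",", 1)
    local_port, local_rest = local.split("@", 1)
    local_addr, _, dev = local_rest.partition("/")
    remote_port, remote_rest = remote.split("@", 1)
    remote_addr, _, remote_mac = remote_rest.partition("/")
    return NetconsoleConfig(
        int(local_port) if local_port else None,
        local_addr,
        dev,
        int(remote_port) if remote_port else None,
        remote_addr,
        remote_mac,
    )


cfg = parse_netconsole(
    "netconsole=@192.168.1.31/eth0,6666@192.168.1.10/00:17:f2:cb:f3:27"
)
```

For the example line, the local port is empty, the local side is 192.168.1.31 on eth0, and messages go to port 6666 on 192.168.1.10.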

Then on the target machine you need to run netcat -u -l 6666 (or similar).
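If netcat is not available on the receiving machine, a plain UDP listener does the same job. The sketch below is an assumption-laden stand-in rather than anything from the original thread: `open_listener` and `read_messages` are hypothetical helper names, and port 6666 is simply the remote port from the example above.

```python
import socket
from typing import Iterator, Optional


def open_listener(port: int = 6666, bind_addr: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket on the port that netconsole sends to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def read_messages(sock: socket.socket,
                  max_packets: Optional[int] = None) -> Iterator[str]:
    """Yield each received datagram as text; run forever if max_packets is None."""
    received = 0
    while max_packets is None or received < max_packets:
        data, _sender = sock.recvfrom(65535)
        # Kernel messages are normally ASCII; replace anything undecodable.
        yield data.decode("utf-8", errors="replace")
        received += 1
```

Looping over `read_messages(open_listener())` and printing each message gives roughly what `netcat -u -l 6666` shows. If nothing arrives, the console log level on the sending machine is worth checking.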
Comment 9 Vivek Periaraj 2011-05-10 12:32:07 UTC
Ah well. I tried every which way and am not able to get netconsole working. It appears to be such a simple thing, but the kernel just doesn't send any packets to the remote machine. Is there any other way? Any other patches to try? Thanks for your patience.
Comment 10 Chris Wilson 2011-05-10 12:46:28 UTC
Hmm. Ok, we'll have to do this the hard way; trial-and-error. :(

If you are comfortable hacking the kernel, I would disable each of the calls to initialise each of the outputs until we find which is causing the hang (and then proceed down from there).

That is: comment out each of the intel_*_init() calls in intel_setup_outputs(), found in drivers/gpu/drm/i915/intel_display.c
Comment 11 Vivek Periaraj 2011-05-10 16:33:35 UTC
Somehow the netconsole started working. I believe it's because I hadn't set 'dmesg -n 8'. So I modprobed i915 and following is the output from that time to just before the screen going blank.

[  432.925269] [drm] Initialized drm 1.1.0 20060810
[  432.970939] i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[  432.970947] i915 0000:00:02.0: setting latency timer to 64
[  432.989926] i915 0000:00:02.0: irq 48 for MSI/MSI-X
[  432.989938] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[  432.989952] [drm] Driver supports precise vblank timestamp query.
[  433.022907] [drm:intel_dsm_platform_mux_info] *ERROR* MUX INFO call failed
[  433.022957] [drm:intel_dsm_platform_mux_info] *ERROR* MUX INFO call failed
[  433.153084] Console: switching to colour frame buffer device 200x56
[  433.153097] fb0: inteldrmfb frame buffer device
[  433.153100] drm: registered panic notifier
[  433.153678] [Firmware Bug]: ACPI(PEGP) defines _DOD but not _DOS
 
Any clues?

Thanks,
Vivek
Comment 12 Chris Wilson 2011-05-11 00:12:16 UTC
Ok, that's an important step forward, as we know that it is *not* panicking during output probing.

So as far as i915.ko is concerned everything is running... And you should be able to ssh in. Can you please do so, install intel_reg_dumper from http://cgit.freedesktop.org/xorg/app/intel-gpu-tools and attach its output?

The problem seems to be that we fail to drive an output correctly, which is a far less severe problem than I first anticipated. And usually amounts to an incorrect value in the configuration tables...
Comment 13 Vivek Periaraj 2011-05-11 01:43:29 UTC
There seems to be no 'intel_reg_dumper', there is 'intel_gpu_dump'. Is that the one?

I get "Couldn't map MMIO region: No such file or directory" when I run it.
Comment 14 Chris Wilson 2011-05-11 01:50:04 UTC
intel_reg_dumper is not in the tarball, but is in the git repo [http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/tree/tools/intel_reg_dumper.c]

But couldn't map MMIO region... That error is unexpected. Hmm, worth trying again with the git checkout just in case it is an old bug...
Comment 15 Vivek Periaraj 2011-05-11 01:58:22 UTC
Created attachment 46586
intel_reg_dumper output
Comment 16 Vivek Periaraj 2011-05-11 01:59:18 UTC
Attached is the output of intel_reg_dumper (from git)
Comment 17 Chris Wilson 2011-05-11 02:11:49 UTC
Out of curiosity, which laptop is this? (Interested in grabbing all such examples of LVDS failures...)
Comment 18 Vivek Periaraj 2011-05-11 02:15:17 UTC
This is Dell Vostro 3750, just bought 15 days back. :)
Comment 19 Chris Wilson 2011-05-11 02:31:23 UTC
This is wrong:

                TRANSACONF: 0xc0000000 (enable, active)

That should include the 6 bpc setting for your pipe (0x50), 0x00 implies 8bpc which does not match your panel or pipe.
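The check described here can be sketched in Python for anyone comparing their own intel_reg_dumper output. This is only an illustration: the helper names are hypothetical, and the 0x50 constant is taken verbatim from this comment rather than verified against the hardware documentation.

```python
import re
from typing import Tuple

# Value quoted in the comment above as the 6 bpc setting; an assumption here,
# not checked against the register specification.
EXPECTED_LOW_BITS = 0x50

_REG_LINE = re.compile(r"^\s*(\w+):\s*(0x[0-9a-fA-F]+)")


def parse_reg_line(line: str) -> Tuple[str, int]:
    """Extract (register name, value) from one intel_reg_dumper output line."""
    match = _REG_LINE.match(line)
    if match is None:
        raise ValueError("not a register line: %r" % line)
    return match.group(1), int(match.group(2), 16)


def low_byte(value: int) -> int:
    """Return the lowest eight bits of a register value."""
    return value & 0xFF


name, value = parse_reg_line("                TRANSACONF: 0xc0000000 (enable, active)")
matches_expected = low_byte(value) == EXPECTED_LOW_BITS
```

For the dumped value the low byte is 0x00, so `matches_expected` is false, which is the mismatch pointed out above.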

Hmm.
Comment 20 Vivek Periaraj 2011-05-11 07:11:44 UTC
I will be going home in another couple of hours. Anything to try from my side?
Comment 21 Chris Wilson 2011-05-11 07:31:55 UTC
If you get the opportunity, can you test drm-intel-next? I'd like to base the testing on the refactored code.
Comment 22 Vivek Periaraj 2011-05-11 08:59:36 UTC
Sure. I am not used to git, so if you could give me step-by-step instructions to clone and fetch the files, that would be great. Thanks.
Comment 23 Vivek Periaraj 2011-05-11 10:07:30 UTC
I got this from the net. I am using this, hope it's not outdated, as I see several repos with different user names.

git clone git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel.git
cd drm-intel
git checkout drm-intel-next
Comment 24 Vivek Periaraj 2011-05-11 11:23:14 UTC
I tried with the above repository. Still same issue with exact same results. :( 

If there is a later repository, let me know, then I will try with that too. Thanks.
Comment 25 Jesse Barnes 2011-05-11 13:41:14 UTC
Can you try the bpp-color branch of my git repo at git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/drm-intel.git?
Comment 26 Vivek Periaraj 2011-05-11 16:57:32 UTC
Same deal with 'bpp-color' too. One change, though, is the new 'fbcon' line, as shown below:

[  173.504728] [drm] Initialized drm 1.1.0 20060810
[  173.545918] i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[  173.545928] i915 0000:00:02.0: setting latency timer to 64
[  173.567845] i915 0000:00:02.0: irq 48 for MSI/MSI-X
[  173.567857] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[  173.567861] [drm] Driver supports precise vblank timestamp query.
[  173.600636] [drm:intel_dsm_platform_mux_info] *ERROR* MUX INFO call failed
[  173.600676] [drm:intel_dsm_platform_mux_info] *ERROR* MUX INFO call failed
[  173.877780] fbcon: inteldrmfb (fb0) is primary device
[  173.877941] Console: switching to colour frame buffer device 200x56
[  173.877951] fb0: inteldrmfb frame buffer device
[  173.877954] drm: registered panic notifier
[  173.878216] [Firmware Bug]: ACPI(PEGP) defines _DOD but not _DOS

I have seen reports elsewhere on the web from people with the exact same messages, except that it didn't panic for them. They appear to be harmless messages in those cases.
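When comparing logs like the ones in this report, it can help to pull out just the DRM error lines. The following Python sketch does that for the line format seen above; the function name `drm_errors` is hypothetical and the regular expression is only fitted to the lines quoted in this report.

```python
import re
from typing import List, Tuple

# Matches lines such as:
# [  173.600636] [drm:intel_dsm_platform_mux_info] *ERROR* MUX INFO call failed
_DRM_ERROR = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+\[drm:(\w+)\]\s+\*ERROR\*\s+(.*)$"
)


def drm_errors(log: str) -> List[Tuple[float, str, str]]:
    """Return (timestamp, function, message) for each DRM *ERROR* line."""
    found = []
    for line in log.splitlines():
        match = _DRM_ERROR.match(line)
        if match:
            found.append((float(match.group(1)), match.group(2), match.group(3)))
    return found


sample = """\
[  173.567861] [drm] Driver supports precise vblank timestamp query.
[  173.600636] [drm:intel_dsm_platform_mux_info] *ERROR* MUX INFO call failed
[  173.877780] fbcon: inteldrmfb (fb0) is primary device
"""
errors = drm_errors(sample)
```

Running it over two logs makes it easy to see whether both machines report the same errors from the same functions.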
Comment 27 Vivek Periaraj 2011-05-16 14:21:05 UTC
Just to not leave anything out, I also tried 'drm-intel-fixes' with the same results.

I think we have exhausted everything?
Comment 28 sanova 2011-05-19 11:30:37 UTC
I have the same model, a Dell Vostro 3750, and the same problems.
I have tried many distros, and the system stops while loading at the following point:

[Firmware Bug] ACPI(PEGP) defines _DOD but not _DOS

From what I have read, it seems to depend on the Intel graphics driver / Intel kernel modules.
Comment 29 Vivek Periaraj 2011-05-23 08:07:16 UTC
Hi Chris,

When can we expect this to be fixed? In the next kernel?

Thanks,
Vivek.
Comment 30 Vivek Periaraj 2011-06-08 15:04:25 UTC
Since I saw a few patches being released, I tried Jesse's 'bpp-color' branch again. I now get past the previous hang point, but the kernel still throws an oops message. At least this time there is more information. Please find the entire output in the attachment. Hope it helps.

[  139.797264] [drm] Initialized drm 1.1.0 20060810
[  139.833062] i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[  139.833140] i915 0000:00:02.0: setting latency timer to 64
[  139.848995] i915 0000:00:02.0: irq 48 for MSI/MSI-X
[  139.849069] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[  139.849151] [drm] Driver supports precise vblank timestamp query.
[  139.849404] [drm:intel_dsm_platform_mux_info] *ERROR* MUX INFO call failed
[  139.849534] [drm:intel_dsm_platform_mux_info] *ERROR* MUX INFO call failed
[  140.091411] fbcon: inteldrmfb (fb0) is primary device
[  140.264170] Console: switching to colour frame buffer device 200x56
[  140.268258] fb0: inteldrmfb frame buffer device
[  140.268287] drm: registered panic notifier
[  140.268496] [Firmware Bug]: ACPI(PEGP) defines _DOD but not _DOS
[  140.272123] acpi device:28: registered as cooling_device4
[  140.272646] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A08:00/device:27/LNXVIDEO:00/input/input11
[  140.272766] ACPI: Video Device [PEGP] (multi-head: yes  rom: yes  post: no)
[  140.290237] acpi device:2b: registered as cooling_device5
[  140.290570] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:01/input/input12
[  140.290705] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[  140.290903] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
Comment 31 Vivek Periaraj 2011-06-08 15:06:00 UTC
Created attachment 47740
Latest dmesg
Comment 32 Vivek Periaraj 2011-06-08 15:19:50 UTC
Created attachment 47741
Reattaching the dmesg log
Comment 33 Vivek Periaraj 2011-06-08 18:04:39 UTC
Surprise! 

I tried the latest kernel from debian experimental (2.6.39-rc7-686-pae) just to see if something got magically fixed. And it did. Now the i915 module loads fine without kernel oops and xorg driver correctly detects the card and sets the modes properly.

I am able to run with 1600x900 plus 1280x1024 external monitor.

I really don't know what got fixed. But I am glad I tried the latest kernel. :)

Thanks guys, for all the patience and support!

Please let me know if you need any info.
Comment 34 sanova 2011-06-13 11:13:14 UTC
It doesn't work on my PC. I'm on a 64-bit architecture; I don't know whether it depends on that. I compiled the kernel from git with the patch and nothing changed.
I have not solved it yet.
Comment 35 Vivek Periaraj 2011-06-13 11:39:37 UTC
Created attachment 47914
Latest intel_reg_dumper output

Attaching my reg dumper output from just after it all started working. Maybe it helps in finding the problem.
Comment 36 sanova 2011-06-15 06:18:44 UTC
I have to correct myself: I solved it too. The cause of the failure is a USB optical mouse. If I keep the USB mouse connected at startup, boot stops at that loading point; otherwise the Intel module loads without stopping.
Comment 37 Daniel Vetter 2012-05-11 05:51:10 UTC
Ok, both reporters say that things work now. If you still have issues, please file a new bug report (because this one here is rather old, so it is likely a new issue).

Thanks for reporting this.
Comment 38 Vivek Periaraj 2012-06-20 09:27:36 UTC
Hello There,

I am still getting this problem, in the sense that when USB devices are plugged into the ports,
the Intel xorg driver fails to load. I have to remove the devices for the driver to load
correctly. Should I create a new bug report for this?

Regards,
Vivek.
Comment 39 Daniel Vetter 2012-06-20 09:52:06 UTC
On Wed, Jun 20, 2012 at 6:27 PM,  <bugzilla-daemon@freedesktop.org> wrote:
> --- Comment #38 from Vivek Periaraj <Vivek.Periaraj@gmail.com> 2012-06-20 09:27:36 PDT ---
> I am getting this problem still. In the sense, that when USB devices are
> plugged into the ports,
> intel's xorg driver is failing to load. I had to remove the devices for the
> driver to load
> correctly. Should I have to create a new bug report for this?

Yep, please file a new bug. It's ridiculously hard to triage graphics
bugs, usually we can only mark them as duplicates once we've found the
patch that fixes things.

For your bug report please try out the latest kernels (i.e. 3.5-rc)
first just to check whether that issue is fixed already.

If not, please add drm.debug=0xe to your kernel cmdline and boot
once with the usb attached and once without and attach the full dmesg
to your bug report for both cases. That should hopefully tell us where
things fail in the usb case.
-Daniel
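The two-boot comparison requested above can be sketched as follows. This is a hypothetical helper, not part of the original thread: it strips the leading timestamps, which differ on every boot, and then produces a unified diff of the two dmesg logs using Python's difflib.

```python
import difflib
import re
from typing import List

# Leading "[  173.504728] " style timestamp at the start of a dmesg line.
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def strip_timestamps(log: str) -> List[str]:
    """Remove the per-line timestamp so that only the messages are compared."""
    return [_TIMESTAMP.sub("", line) for line in log.splitlines()]


def diff_dmesg(without_usb: str, with_usb: str) -> List[str]:
    """Unified diff of two dmesg logs, ignoring timestamps."""
    return list(
        difflib.unified_diff(
            strip_timestamps(without_usb),
            strip_timestamps(with_usb),
            fromfile="without-usb",
            tofile="with-usb",
            lineterm="",
            n=0,  # no context lines, only the differences
        )
    )
```

Lines prefixed with `+` appear only in the boot with the USB device attached. Both full logs should still be attached to the bug report; the diff is just a quick way to spot where the two boots diverge.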
Comment 40 Vivek Periaraj 2012-06-22 08:31:45 UTC
Hi Daniel,

I tried the latest kernel available from the Debian repository (3.4-trunk-686-pae), and the
Intel driver from this kernel is working correctly. That is, it proceeds without interruption even
when the USB devices are plugged in. Thanks. I will raise a bug report if I see the problem again.

Regards,
Vivek.
Comment 41 Florian Mickler 2012-07-01 03:34:07 UTC
A patch referencing this bug report has been merged in Linux v3.5-rc1:

commit e04c735029bc133466b89265a0745a226d0eac23
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed May 2 20:43:56 2012 +0100

    drm/i915: Wait for the clocks to stabilise before updating PLLs