Bug 50297 - [SNB regression] linux-image-3.2.0-2-amd64: Kernel crash when closing the lid
Status: CLOSED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Daniel Vetter
Reported: 2012-05-23 20:06 UTC by Sylvain Archenault
Modified: 2017-07-24 23:01 UTC

Attachments
syslog crash log (5.93 KB, text/plain)
2012-05-23 20:06 UTC, Sylvain Archenault
output of lspci -v (7.71 KB, text/plain)
2012-05-23 20:08 UTC, Sylvain Archenault
output of lspci -nn (1.90 KB, application/octet-stream)
2012-05-23 20:09 UTC, Sylvain Archenault
kern.log with drm.debug=0xe (148.33 KB, text/plain)
2012-05-25 18:59 UTC, Sylvain Archenault
dmesg with drm.debug=0xe (60.39 KB, text/plain)
2012-05-25 19:00 UTC, Sylvain Archenault

Description Sylvain Archenault 2012-05-23 20:06:51 UTC
Created attachment 62045
syslog crash log

Hello

On my Dell XPS 14z laptop running an up-to-date sid, I get a kernel oops whenever I close the lid. It's configured to suspend, but the system crashes instead. If I suspend from the menu, it works fine. The bug seems to have appeared in 3.1.0, because it works with 3.0.0. I also tried the kernel from experimental, 3.3.0, and I have the same problem.

As you can see in this thread, http://lists.debian.org/debian-user/2012/05/msg02078.html, it affects other laptops running similar hardware.

I installed kerneloops, but I'm not sure it's working correctly because I can't find the trace in syslog anymore... kerneloops.org also seems to be down at the moment.

I attached the part of the syslog concerned with the crash and the lspci information.

Let me know if you need something else.

Thanks
Sylvain
Comment 1 Sylvain Archenault 2012-05-23 20:08:41 UTC
Created attachment 62046
output of lspci -v
Comment 2 Sylvain Archenault 2012-05-23 20:09:00 UTC
Created attachment 62047
output of lspci -nn
Comment 3 Sylvain Archenault 2012-05-23 20:09:32 UTC
Bug report in the Debian BTS:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=674243
Comment 4 Daniel Vetter 2012-05-24 01:52:10 UTC
We need more context around that backtrace. Can you please attach the full dmesg? The system shouldn't die with it (only display), so ssh should still work. Otherwise please stitch it together from your logfiles. Also, please boot with drm.debug=0xe added to your kernel cmdline, that will dump much more information about drm/i915 into dmesg.
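
To confirm that the parameter actually reached the running kernel, the boot command line can be inspected. This is a minimal sketch, not part of the original request; the helper name drm_debug_value is made up for illustration, and it assumes a Linux system where /proc/cmdline is readable.

```shell
# drm_debug_value [FILE]: print the value of drm.debug from a kernel
# command line, or nothing if the parameter is absent.
# FILE defaults to /proc/cmdline; the helper name is hypothetical.
drm_debug_value() {
    # Put each boot parameter on its own line, then keep only the
    # value that follows "drm.debug=".
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^drm\.debug=//p'
}
```

On Debian the parameter is typically made persistent by adding it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running update-grub; after a reboot, drm_debug_value with no argument should then print the configured value.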
Comment 5 Daniel Vetter 2012-05-24 01:53:01 UTC
And please attach logfiles with mimetype "text/plain", much easier to handle that way.
Comment 6 Daniel Vetter 2012-05-24 01:56:48 UTC
And please test what happens with the latest 3.4 kernel.
Comment 7 Sylvain Archenault 2012-05-25 18:51:53 UTC
I tried with the latest kernel and have the same problem. When I close the lid, the laptop shuts down the network connection, so I can't use SSH; I would need to change the setup.

I also booted with drm.debug=0xe. dmesg doesn't really show anything during the crash. kern.log is better, so I attached it.

Let me know if I can provide more information.

Thanks
Sylvain
Comment 8 Sylvain Archenault 2012-05-25 18:59:51 UTC
Created attachment 62119
kern.log with drm.debug=0xe
Comment 9 Sylvain Archenault 2012-05-25 19:00:24 UTC
Created attachment 62120
dmesg with drm.debug=0xe
Comment 10 Sylvain Archenault 2012-05-31 17:38:17 UTC
I also tried the new intel driver 2.19 that was uploaded to sid a couple of days ago. I get a similar crash, but this time the screen didn't go black; it stayed the way it was before I closed the lid.
Comment 11 Daniel Vetter 2012-06-01 00:12:35 UTC
Hm, I don't see the NULL deref BUG in the latest set of logs any more. And it would be really good to see that in the context of the larger dmesg (so that we know what happened around that time wrt suspend/resume). With the latest upgrades, has anything changed with how the machine dies?
Comment 12 Sylvain Archenault 2012-06-04 08:47:39 UTC
The only thing that changed the last time I reproduced the bug was that the screen didn't go black.

As for the log, I was also surprised that the NULL deref BUG was not in the log files. Is there anything else that I can add to the kernel command line that could help?
Comment 13 Daniel Vetter 2012-06-04 08:50:02 UTC
Ok, I guess we're a bit stuck on this one. Can you please try to bisect where exactly this problem has been introduced between 3.0 and 3.1?
Comment 14 Sylvain Archenault 2012-06-04 08:54:35 UTC
I guess, but how could I do that? I know how to build a kernel, and I should be able to get the source from git for 3.0 and 3.1. I suppose you'd like me to build the kernel at different revisions?
Comment 15 Daniel Vetter 2012-06-04 09:05:59 UTC
1) Grab a git clone (any existing recent one should be good)

2) Check out 3.0

git checkout v3.0

3) Compile kernel and confirm that it's indeed good. Tell the git bisect machine so

git bisect good

(this will ask you whether you want to start the bisect)

4) Check out 3.1

git checkout v3.1

5) Compile kernel and confirm that it's indeed bad. Tell git bisect if that's the case.

git bisect bad

git will automatically check out a mid-point commit that you can then check. Depending upon the result, tell git so with

git bisect [bad|good]

You can check your progress with

git bisect visualize
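
The workflow above can be rehearsed on a throwaway repository before spending hours on kernel builds. The following is only a sketch under stated assumptions: the repository, the tags known-good and known-bad, and the "broken" marker file are all invented for the demonstration, and it uses git bisect run to automate the good/bad verdict, which is not how the manual steps above proceed.

```shell
#!/bin/sh
# Toy rehearsal of the bisect workflow. The repository, tags and the
# "broken" marker file are invented for this demonstration only.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Bisect Demo"

# Create ten commits; commit 7 introduces the "regression".
i=1
while [ "$i" -le 10 ]; do
    echo "$i" > version.txt
    if [ "$i" -eq 7 ]; then
        touch broken
    fi
    git add -A
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then
        git tag known-good
    fi
    i=$((i + 1))
done
git tag known-bad

# Start bisecting: the first revision is the bad one, the second the good one.
git bisect start known-bad known-good >/dev/null

# Let git drive the search. Exit status 0 marks a commit as good,
# any status from 1 to 127 (except 125) marks it as bad.
git bisect run sh -c 'test ! -e broken' >/dev/null

# refs/bisect/bad now points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $first_bad"

# Return to the branch that was checked out before the bisect.
git bisect reset >/dev/null 2>&1
```

For the kernel itself the verdict has to be given by hand after building and booting each candidate, so the git bisect run line would be replaced by manual git bisect good and git bisect bad calls as in the steps above.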
Comment 16 Sylvain Archenault 2012-06-07 06:32:09 UTC
I built kernels from git for three versions, 3.0, 3.1 and 3.2, and they all work correctly. The 3.2 version shipped by Debian crashes. I don't know exactly what the differences are.

Something that may also be of interest: when I tried 3.2, the wifi was not working, probably because I forgot to do something. So that may be something to look at.

I'll try different things.
Comment 17 Daniel Vetter 2012-06-07 06:37:29 UTC
Hm, can you please try the latest upstream kernel, i.e. 3.4.x? Just to check whether this might be a problem introduced later on and then brought to Debian's stable 3.2 via a backport.
Comment 18 Sylvain Archenault 2012-06-07 10:11:31 UTC
I already tried 3.4.0 at the end of May. I can try 3.4.1, but I don't think it would make a difference.
Comment 19 Daniel Vetter 2012-06-07 12:10:47 UTC
Ok, sorry, I've missed that you've tested 3.4 already. So it looks like your problem has been introduced in mainline between 3.2 and 3.4 somewhere (and the issue in debian's kernel might or might not be due to a backport). So can you please try to do the bisect with these two kernel versions from the mainline git?

i.e.

git bisect good v3.2
git bisect bad v3.4

Usually if we have the bad commit it's _much_ easier to fix the bug.
Comment 20 Sylvain Archenault 2012-06-25 19:31:22 UTC
I built kernels up to 3.4.4 without being able to reproduce the problem; it concerns only the kernels provided by Debian.

I'll ask the Debian kernel team for advice.

Thanks
Sylvain
Comment 21 Daniel Vetter 2012-06-26 01:03:56 UTC
Ok, I'll close this as a downstream issue, thanks for reporting this bug anyway. And if the investigation from the Debian team shows that this is indeed an upstream bug, please reopen so we can have a look at this again.

(One quick check: Have you used the exact debian kernel config to build the upstream kernel?)
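
One way to approach this check is to compare the distro config with the one used for the self-built kernel. This is a rough sketch rather than an established procedure; the helper name config_diff is made up, and the paths in the usage note are assumptions about a typical Debian setup.

```shell
# config_diff OLD NEW: print the CONFIG_ lines that differ between two
# kernel config files. Lines found only in OLD are prefixed with "-",
# lines found only in NEW with "+". The helper name is hypothetical.
config_diff() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # Keep both set options and "# CONFIG_FOO is not set" lines, and
    # sort them so that ordering differences between the files are ignored.
    grep -E '^(# )?CONFIG_' "$1" | LC_ALL=C sort > "$old_sorted"
    grep -E '^(# )?CONFIG_' "$2" | LC_ALL=C sort > "$new_sorted"
    # comm -23 keeps lines unique to the first file, comm -13 those
    # unique to the second file.
    LC_ALL=C comm -23 "$old_sorted" "$new_sorted" | sed 's/^/-/'
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted" | sed 's/^/+/'
    rm -f "$old_sorted" "$new_sorted"
}
```

Assuming the Debian config is installed as /boot/config-$(uname -r) and the self-built tree keeps its config in .config, a call could look like config_diff "/boot/config-$(uname -r)" .config. The kernel source tree also ships scripts/diffconfig for the same purpose.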
Comment 22 Jonathan Nieder 2012-08-29 16:29:42 UTC
Hi again,

You were right --- the distro-specific aspect is a .config difference.

Sylvain Archenault writes[1]:

> After a lot of tries, I found out which module causes the crash, it's 
> CONFIG_HOTPLUG_PCI_ACPI (module acpiphp).
>
> When it's built as a module, it's not loaded by default on my machine, 
> and closing the lid works. But if I load it, it crashes.

The crash is in gen6_write_entry; your help would still be much appreciated. Any ideas for tracking it down further?

Thanks,
Jonathan

[1] http://bugs.debian.org/674243#75
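
To see which of the two situations described in the quote applies on a given machine, the module list and the kernel config can be queried. A minimal sketch, assuming a Linux system with /proc/modules and a Debian-style config file under /boot; the helper names module_loaded and config_state are made up for illustration.

```shell
# module_loaded NAME [FILE]: succeed if NAME is in the loaded-module list.
# FILE defaults to /proc/modules, where the first field of each line is
# the module name. The helper name is hypothetical.
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# config_state SYMBOL FILE: print the value of a kernel config symbol
# (for example y or m), or n if the symbol is unset or missing.
config_state() {
    value=$(sed -n "s/^$1=//p" "$2")
    echo "${value:-n}"
}
```

For example, config_state CONFIG_HOTPLUG_PCI_ACPI "/boot/config-$(uname -r)" shows how the running kernel was configured, and module_loaded acpiphp && echo loaded shows whether the module is currently active. According to the quoted report, loading the module by hand (modprobe acpiphp) before closing the lid is enough to trigger the crash.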
Comment 23 Sylvain Archenault 2012-11-15 21:36:40 UTC
The bug has been fixed with this commit:
https://patchwork.kernel.org/patch/1562951/

