Bug 30352 - [gm45 regression 2.6.36-rc5] xserver hangs if I close the lid
Summary: [gm45 regression 2.6.36-rc5] xserver hangs if I close the lid
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: medium major
Assignee: Chris Wilson
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-23 10:51 UTC by darkbasic
Modified: 2017-07-24 23:06 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg (68.64 KB, text/plain)
2010-09-24 02:56 UTC, darkbasic
Xorg.0.log (33.19 KB, text/plain)
2010-09-24 02:57 UTC, darkbasic
intel_reg_dumper (10.47 KB, text/plain)
2010-09-24 02:57 UTC, darkbasic
ShowState (299.20 KB, text/plain)
2010-09-25 11:34 UTC, darkbasic
/var/log/messages with drm-intel-next and debugging on (63.32 KB, text/plain)
2010-09-29 13:26 UTC, darkbasic
dmesg with drm-intel-next and debugging on (124.06 KB, text/plain)
2010-09-29 13:27 UTC, darkbasic
A photo of the screen at the wrong resolution (1024x768) (135.29 KB, image/jpeg)
2010-09-29 13:28 UTC, darkbasic
Corruption after resuming from suspension. Had to reboot using magic SysRQ keys. (148.74 KB, image/jpeg)
2010-10-04 03:22 UTC, darkbasic

Description darkbasic 2010-09-23 10:51:50 UTC
It's a problem I already had in the past with 2.6.3[4 or 5, I don't remember] and I think it's related to this bug: https://bugs.freedesktop.org/show_bug.cgi?id=27169
Fortunately it had been solved in 2.6.36 (at least rc3 and rc4). Today I upgraded to 2.6.36-rc5-git and it came back again :-(

I use kde-4.5.1 and xorg-server-1.9. KDE locks the screen when I close the lid.

Let me know if you need logs or other information.

Darkbasic
Comment 1 Chris Wilson 2010-09-23 14:21:20 UTC
dmesg, Xorg.log, chicken feet, intel_reg_dumper.
Comment 2 darkbasic 2010-09-24 02:56:52 UTC
Created attachment 38925
dmesg
Comment 3 darkbasic 2010-09-24 02:57:21 UTC
Created attachment 38926
Xorg.0.log
Comment 4 darkbasic 2010-09-24 02:57:44 UTC
Created attachment 38927
intel_reg_dumper
Comment 5 darkbasic 2010-09-24 02:59:39 UTC
Here it is. I will post chicken feet as soon as you solve bug #27169 :P
Comment 6 Chris Wilson 2010-09-24 03:04:05 UTC
Just to check those logs were after the KDE hang? Don't see any evidence of either GPU or KMS errors, so I presume we have a soft-lockup.

echo t > /proc/sysrq-trigger
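
A minimal sketch of capturing that task dump, assuming a root shell on a virtual console while X is still hung. The function name `dump_task_states` and its optional path argument are hypothetical, added only so the write can be dry-run against an ordinary file; the `t` command and /proc/sysrq-trigger are the ones given above.

```shell
# Ask the kernel to log the state of every task (SysRq "t").
# $1 is a hypothetical override of the trigger path, for dry runs only;
# on a real system leave it unset and run as root.
dump_task_states() {
    trigger="${1:-/proc/sysrq-trigger}"
    printf 't\n' > "$trigger"
}

# Usage (as root): trigger the dump, then save the kernel log for attaching.
#   dump_task_states
#   dmesg > showstate.txt
```

Writes to /proc/sysrq-trigger by root are honoured regardless of the kernel.sysrq keyboard mask. If the dump is larger than the dmesg ring buffer, /var/log/messages may hold the complete output.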
Comment 7 darkbasic 2010-09-24 04:28:24 UTC
> Just to check those logs were after the KDE hang?

Yes, I switched to a virtual console while the (hung) X server was still running.

> echo t > /proc/sysrq-trigger

I will try as soon as it happens again. Fortunately it doesn't happen every time I close the lid.

Thank you.
Comment 8 darkbasic 2010-09-25 11:33:07 UTC
One more freeze, one more log :)
I attached the output of "echo t > /proc/sysrq-trigger" from /var/log/messages.
Comment 9 darkbasic 2010-09-25 11:34:40 UTC
Created attachment 38953
ShowState

List of current tasks and their information.
Comment 10 Chris Wilson 2010-09-25 11:55:06 UTC
Hmm, I am puzzled. Didn't see any of the usual patterns associated with an X freeze in that dump.

Let's try a fishing expedition. :(

Can you enable as much of the kernel debugging as you can tolerate (mutex debugging in particular) and see if that catches anything untoward?
Comment 11 darkbasic 2010-09-27 03:31:24 UTC
> Can you enable as much of the kernel debugging as you can tolerate (mutex
> debugging in particular) and see if that catches anything untoward?

Ok... Then what should I attach?
/var/log/messages?
/var/log/messages after 'echo t > /proc/sysrq-trigger'?
something from debug-fs?
Comment 12 Chris Wilson 2010-09-27 03:33:45 UTC
dmesg or /var/log/messages

Also a perf record -f -g -a across the lid hang and then attach the output of perf timechart. Might show something.
Comment 13 darkbasic 2010-09-28 04:30:49 UTC
> Also a perf record -f -g -a across the lid hang

Uhm... Do I have to start it and wait for the lid hang? It takes only a few seconds to write ~1MB of data, and I may have to wait half a day before a hang happens...
Comment 14 Chris Wilson 2010-09-28 05:05:04 UTC
Sorry, I thought this was highly reproducible.

git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel.git drm-intel-next contains some more hangchecks for catching missing interrupts. [There are currently a few regressions involving EDID probing and others which may bite instead.]
Comment 15 darkbasic 2010-09-28 05:46:54 UTC
I'm compiling... I hope it will help.

> [There are currently a few regressions involving EDID probing and others which
> may bite instead.]

Sweetness compared to bug #27169, but thanks for the advice :)
Comment 16 darkbasic 2010-09-29 02:07:21 UTC
Yesterday, resuming from suspend triggered a freeze that forced me to reboot using the magic SysRq keys. Unfortunately I was still using the mainline kernel.
Now I'm using drm-intel-next with ALL debugging on and no hang has happened yet. However, I noticed that KDE sometimes asks to start the external monitor configuration wizard when I open the lid. As I said, maybe it really is related to #27169.
Comment 17 darkbasic 2010-09-29 13:26:50 UTC
Created attachment 39052
/var/log/messages with drm-intel-next and debugging on
Comment 18 darkbasic 2010-09-29 13:27:37 UTC
Created attachment 39053
dmesg with drm-intel-next and debugging on
Comment 19 darkbasic 2010-09-29 13:28:20 UTC
Created attachment 39054
A photo of the screen at the wrong resolution (1024x768)
Comment 20 darkbasic 2010-09-29 13:34:10 UTC
Finally! Here are the logs with drm-intel-next and full debug on :)
Also, something *VERY* strange happened with drm-intel-next: after the hang I restarted the X server, but it detected the wrong resolution: 1024x768 instead of 1280x800.
It is nothing new (I already had this problem before, but it was very rare) apart from the fact that _after_ the hang it *ALWAYS* detects the wrong resolution until I reboot!
Comment 21 darkbasic 2010-09-29 16:08:38 UTC
# xrandr 
Screen 0: minimum 320 x 200, current 1280 x 800, maximum 8192 x 8192
LVDS1 connected 1280x800+0+0 (normal left inverted right x axis y axis) 303mm x 190mm
   1280x800       60.0*+
   1024x768       60.0  
   800x600        60.3     56.2  
   640x480        59.9  
VGA1 disconnected (normal left inverted right x axis y axis)
HDMI1 disconnected (normal left inverted right x axis y axis)
DP1 disconnected (normal left inverted right x axis y axis)
HDMI2 disconnected (normal left inverted right x axis y axis)
DP2 disconnected (normal left inverted right x axis y axis)
DP3 disconnected (normal left inverted right x axis y axis)
TV1 disconnected 1024x768+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
  1024x768 (0x50)   26.9MHz
        h: width  1024 start 1025 end 1088 total 1120 skew    0 clock   24.0KHz
        v: height  768 start  769 end  800 total  801           clock   30.0Hz
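
The listing above shows TV1 as disconnected yet still holding a 1024x768+0+0 geometry. A small sketch for spotting such outputs, assuming the xrandr text format shown above; the function name `stale_outputs` is hypothetical.

```shell
# Read `xrandr` output on stdin and print every output that is reported as
# "disconnected" but still has a geometry (WxH+X+Y) assigned to it.
stale_outputs() {
    awk '$2 == "disconnected" && $3 ~ /^[0-9]+x[0-9]+[+][0-9]+[+][0-9]+$/ {
        print $1, $3
    }'
}

# Usage:
#   xrandr | stale_outputs
```

For the listing above this would print `TV1 1024x768+0+0`; the other disconnected outputs have no geometry and are skipped.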
Comment 22 darkbasic 2010-10-02 23:59:53 UTC
Did you discover something?
Comment 23 Chris Wilson 2010-10-03 02:39:26 UTC
Well, you still have the spurious TV detection which is throwing everything off. I suspected that this is a render-whilst-modesetting hang, except that it bears none of the usual symptoms (a hung gpu). So I'm back at a soft-hang, only I don't have a good suggestion on how to track it down at the moment. Does switching between compositor modes (GL vs Render) make any difference on the occurrence of the hang?
Comment 24 darkbasic 2010-10-03 04:06:18 UTC
I cannot use KDE 4.5 desktop effects because they still don't work with intel and mesa 7.10-devel, even with blur disabled.
Comment 25 Chris Wilson 2010-10-03 04:24:41 UTC
Thanks, that rules out the more frequent occurrences of swap+interrupt hangs.
Comment 26 darkbasic 2010-10-04 03:22:29 UTC
Created attachment 39149
Corruption after resuming from suspension. Had to reboot using magic SysRQ keys.
Comment 27 darkbasic 2010-10-04 03:23:55 UTC
I don't know if it may be useful, but I experienced corruption after resuming from suspend to RAM. I had to reboot using the magic SysRq keys. I attached a photo.
Comment 28 darkbasic 2010-10-04 03:25:25 UTC
P.S. I already switched back to 2.6.36-rc6-20100930 because I had even more problems with drm-intel-next.
Comment 29 Chris Wilson 2010-10-04 04:05:30 UTC
(In reply to comment #27)
> I don't know if it may be useful, but I experienced corruption after resuming from
> suspend to ram. I had to reboot using magic sysrq keys. I attached a photo.

Just tiling corruption, a fence register being misprogrammed. Extremely surprising though on gm45.
Comment 30 darkbasic 2010-10-04 05:42:39 UTC
I'm not so surprised: 2.6.32 was the last working kernel, and starting with 2.6.33 (which introduced the spurious TV detection) I have had nothing but problems.
Comment 31 Jesse Barnes 2011-02-22 11:30:59 UTC
Is this still an issue with current bits?  Some of the X VT switch bugs come to mind with this issue, so an X server upgrade may also help?  Using gdb to attach to the hung X server and getting a backtrace would identify that issue.
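
A sketch of the backtrace capture suggested above, assuming gdb is installed and the command is run as root from a virtual console or an ssh session. The function name `x_backtrace` and the `GDB` environment override are hypothetical conveniences; `-p`, `-batch` and `-ex` are standard gdb options.

```shell
# Attach to a running process, print a backtrace of every thread, then exit.
# $1  : PID of the hung X server (required)
# GDB : hypothetical override for the debugger binary, defaults to "gdb"
x_backtrace() {
    pid="${1:?usage: x_backtrace PID}"
    "${GDB:-gdb}" -batch -ex 'thread apply all bt' -p "$pid"
}

# Usage (as root); the server process may be named X or Xorg:
#   x_backtrace "$(pidof X)" > x-backtrace.txt 2>&1
```

The backtrace is only readable if debugging symbols for the X server and the intel driver are installed.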
Comment 32 darkbasic 2011-02-22 11:41:38 UTC
Fixed with 2.6.38-drm-fixes + xorg-server-1.9.4.

