Bug 50545 - [snb rc6] system hang after idling for some time
Summary: [snb rc6] system hang after idling for some time
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Daniel Vetter
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-05-31 08:26 UTC by Wen-chien Jesse Sung
Modified: 2017-07-24 23:01 UTC
CC List: 9 users

See Also:
i915 platform:
i915 features:


Attachments
netconsole output (43.42 KB, text/plain)
2012-06-15 02:22 UTC, Wen-chien Jesse Sung
dmesg (163.03 KB, text/plain)
2012-06-15 02:36 UTC, Wen-chien Jesse Sung
disable rc6 for some models (2.56 KB, patch)
2012-08-22 09:19 UTC, Wen-chien Jesse Sung
implement Hiz w/a for msaa (1.58 KB, patch)
2012-12-14 22:19 UTC, Daniel Vetter

Description Wen-chien Jesse Sung 2012-05-31 08:26:32 UTC
Ref: https://bugs.launchpad.net/bugs/1002170

Hi,

Lenovo ThinkCentre S510 (SandyBridge i5-2500S) may hang after idling for some period of time. This may be related to rc6: if I add i915.i915_enable_rc6=0 to the boot parameters, the issue goes away.

This issue also happens on kernel 3.4.

A screenshot taken when the system hangs:
https://launchpadlibrarian.net/105661463/Photo%2012-5-18%2017%2038%2038.jpg

If any other information is needed, please let me know.

Thank you.
Comment 1 Daniel Vetter 2012-06-01 00:16:38 UTC
Please attach dmesg with drm.debug=0xe added to your kernel command line. Also, how dead is the system? I.e. does network/ssh still work, does a magic SysRq reboot still work, or is it a true hard hang? And can you try to wire up netconsole so that we could have a peek at the last breaths of the system before it goes down?
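For reference, drm.debug is a bitmask of DRM debug categories. The sketch below decodes which categories a given mask enables, assuming the category values used in the kernel's DRM headers (CORE=0x01, DRIVER=0x02, KMS=0x04, PRIME=0x08); decode_drm_debug is a hypothetical helper for illustration, not a kernel or DRM API.

```python
# Assumed DRM debug category bits, taken from the kernel's DRM headers.
# decode_drm_debug() is a hypothetical helper, not part of any real API.
DRM_DEBUG_CATEGORIES = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
}


def decode_drm_debug(mask: int) -> list[str]:
    """Return the names of the debug categories whose bits are set in mask."""
    return [name for bit, name in sorted(DRM_DEBUG_CATEGORIES.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0xE))  # ['DRIVER', 'KMS', 'PRIME']
```

Under these assumptions, 0xe enables driver, KMS and PRIME messages but leaves out the very chatty core category.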
Comment 2 Wen-chien Jesse Sung 2012-06-15 02:22:27 UTC
Created attachment 63062
netconsole output

Hi Daniel,

Please find the netconsole output attached.

When it hangs, neither magic SysRq nor network/ssh works. From an ssh terminal I can tell that it died after 57 minutes, but the last log entry is at 1787.315652 (about 30 minutes after boot), so nothing was logged when the system went down.
Comment 3 Daniel Vetter 2012-06-15 02:28:12 UTC
Can you also attach dmesg so that we have all the interesting lines from boot-up with drm.debug=0xe, too?
Comment 4 Wen-chien Jesse Sung 2012-06-15 02:36:51 UTC
Created attachment 63064
dmesg

dmesg is attached.

Thank you.
Comment 5 Wen-chien Jesse Sung 2012-07-02 18:55:07 UTC
Hi Daniel,

Is there anything I can do to get more info about this issue?
Comment 6 Daniel Vetter 2012-07-03 01:30:49 UTC
I'm running a bit low on ideas, but one thing to try would be to stop all DRM clients (i.e. X) and check whether it still hangs. We still need to load the drm/i915.ko driver, because the CPU die can only reach its lowest power state once the driver is loaded and rc6 is enabled. I want to check whether this might be an issue outside the GPU that only comes to light because of the low power state.
Comment 7 Wen-chien Jesse Sung 2012-07-06 02:07:05 UTC
Daniel,

I tried a normal boot and stopped all X-related processes. The system hung after 15 hours.
Comment 8 Chris Wilson 2012-07-06 02:23:57 UTC
Ok, what happens if i915 is never loaded at all? Try something like adding "blacklist i915" to modprobe.conf, or appending i915.noload to your kernel command line.
Comment 9 Wen-chien Jesse Sung 2012-07-23 05:19:51 UTC
Hi Chris,

With i915 blacklisted and running in text mode, the system runs without any problem and has an uptime of 2 days, 19:15 so far.
Comment 10 Wen-chien Jesse Sung 2012-08-22 09:19:35 UTC
Created attachment 65942
disable rc6 for some models

Hi Daniel and Chris,

Since there's another SNB machine that does not work well with rc6 enabled (https://launchpad.net/bugs/1008867), maybe we could just disable rc6 on these machines so that they at least work?
Comment 11 Ben Widawsky 2012-08-22 23:12:07 UTC
By any chance, does this patch help?
https://patchwork.kernel.org/patch/1363021/
Comment 12 Wen-chien Jesse Sung 2012-08-27 08:26:02 UTC
Hi Ben,

No, this patch does not help. The system hung after 2 days and 6 hours.
Comment 13 Wen-chien Jesse Sung 2012-09-10 08:23:02 UTC
Hi,

What do you think of the patch in comment 10? Should I also send it to the mailing list?
Comment 14 Daniel Vetter 2012-09-10 15:29:08 UTC
I think it'd be much better to figure out the root cause and fix it. These rc6 issues most likely have nothing to do with these specific models; we just haven't figured out yet what the real problem is.
Comment 15 Wen-chien Jesse Sung 2012-09-10 16:26:27 UTC
Hi Daniel,

Then I guess it's better to have a new bug entry for lp1008867. :)
https://launchpad.net/bugs/1008867
I'll create one later.

Also, could you please suggest what I can do to gather useful information for finding the root cause?

Thank you.
Comment 16 Chris Wilson 2012-12-09 14:03:20 UTC
*** Bug 53626 has been marked as a duplicate of this bug. ***
Comment 17 Daniel Vetter 2012-12-14 22:19:04 UTC
Created attachment 71524
implement Hiz w/a for msaa

Kernel patch, please test.
Comment 18 Daniel Vetter 2012-12-14 22:20:17 UTC
Also: Is this an SNB GT1? Please specify the exact model and the PCI ID of the VGA device.
Comment 19 Jani Nikula 2013-01-08 15:07:09 UTC
Please test and provide the requested info.
Comment 20 Jakub Luzny 2013-01-09 18:59:21 UTC
Hello, I have a similar issue. My system occasionally hangs with the same screen corruption; audio plays in a loop for about a second and then the laptop fan revs up.

It happens more often when playing some Flash videos. Sometimes the system hangs twice a day, sometimes only after a week. I'm going to try disabling rc6 after the next crash.

It's an MSI CR640 Sandy Bridge laptop with an i3-2310M and:
00:02.0 VGA compatible controller [0300]: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0116] (rev 09)

Thank you
Comment 21 Wen-chien Jesse Sung 2013-01-10 08:00:58 UTC
It is an Intel(R) Core(TM) i5-2500S CPU @ 2.70GHz, and the VGA device is
00:02.0 VGA compatible controller [0300]: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0102] (rev 09) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device [17aa:307b]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 45
	Region 0: Memory at fe000000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at f000 [size=64]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: <access denied>
	Kernel driver in use: i915
	Kernel modules: i915

The contents of /proc/cpuinfo and the full lspci output can be found at:
https://launchpadlibrarian.net/105660635/ProcCpuinfo.txt
https://launchpadlibrarian.net/105660632/Lspci.txt

I'll test the patch next week and report the result.

Thank you.
Comment 22 Wen-chien Jesse Sung 2013-01-24 07:37:16 UTC
The patch in comment 17 should be the right one. With it applied, the system is still alive after two days.

Thanks!
Comment 23 Daniel Vetter 2013-01-24 09:56:40 UTC
Awesome that this worked out. The patch is merged into 3.8-rc2 as

commit 4283908ef7f11a72c3b80dd4cf026f1a86429f82
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Dec 14 23:38:28 2012 +0100

    drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled

I'm writing the mail to the stable kernel team right now so that it gets applied to older kernels. Thanks for reporting this issue.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.