Summary: | [gm45] hotplug storm after S3 resume (udev spins on drm device after wakeup) | |
---|---|---|---
Product: | DRI | Reporter: | Tony Mantler <nicoya>
Component: | DRM/Intel | Assignee: | Daniel Vetter <daniel>
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | normal | |
Priority: | medium | CC: | jani.nikula, jrnieder
Version: | unspecified | |
Hardware: | x86-64 (AMD64) | |
OS: | Linux (All) | |
Description
Tony Mantler
2010-05-02 11:13:06 UTC
*** This bug has been marked as a duplicate of bug 25259 ***

Tony writes[1]:

> Yes, I'm still getting the issue, sort of. My understanding is that
> some parts of the hotplug pipeline have moved around, but I still
> get a spam of hotplug events from the graphics that nearly kills the
> system after waking up. The most reliable way to trigger it is to
> wake from sleep, open firefox (iceweasel), and play a youtube video
> (flash). The symptoms will often occur without going through these
> exact steps though.
>
> Executing the command "intel_reg_write 0x61110 0x0" as root stops
> the hotplug spam and restores system functionality, though this also
> apparently stops all hotplug events so the system won't detect
> attaching an external monitor or something to the VGA port.
>
> I'm currently running linux-image-3.2.0-2-amd64 package version 3.2.12-1.
>
> I could certainly test patched kernels, as I can very reliably
> reproduce the problem.

Since 3.2.12 is newer than 2.6.35, it sounds like the fix from bug 25259 doesn't take care of these symptoms. Any hints for tracking this down?

[1] http://bugs.debian.org/572537

Ok, to dig into this one we need full dmesg with drm.debug=0xe added to the kernel cmdline. Please also grab the dmesg with the added debug options while the problem is happening.

Created attachment 61900 [details]
kern.log with drm.debug=0xe
Hotplug storm occurs upon wake on May 20th, after approx 23hrs in S3 sleep.
It's worth noting that the storm does *not* occur after a few much shorter S3 sleeps on May 19th. Makes me wonder if some register isn't getting reinitialized upon wake and is resuming with a decayed value after being powered down.
Timeout. Please do reopen if you can still reproduce the issue and help us diagnose the problem, thanks.

Um, if I understand correctly then Tony replied with the log Daniel requested and then there was no reply. What did I miss?

(In reply to comment #6)
> Um, if I understand correctly then Tony replied with the log Daniel
> requested and then there was no reply. What did I miss?

Just left in NEEDINFO and I did a mass-close of unchanged bug reports in that state...

All indications point towards flaky hardware, as it alternates on suspend&resume cycles between different HDMI/DP ports. I'm not aware of any particular erratum concerning gm45 hotplug detection that hasn't already been implemented, so unless this is widespread across many different manufacturers I would say it was a model, even machine, specific defect.

If the exact cause can't be narrowed down, is there any sort of mitigation that might be appropriate? Rate limiting duplicate hotplug events maybe?

(In reply to comment #9)
> If the exact cause can't be narrowed down, is there any sort of mitigation
> that might be appropriate? Rate limiting duplicate hotplug events maybe?

Have you tried recent kernels? In comment #1, this has already been resolved dupe of bug 25259, which in turn has been resolved dupe of bug 25327, which has been fixed. There are also plenty of other irq/hotplug related changes in recent kernels; 2.6.32.9 isn't exactly new.

(In reply to comment #10)
> Have you tried recent kernels? In comment #1, this has already been resolved
> dupe of bug 25259,

Trying newer kernels is generally good advice for the restless, but I also want to point your attention to existing data in this same bug:

| Since 3.2.12 is newer than 2.6.35, it sounds like the fix from bug 25259
| doesn't take care of these symptoms.

Thanks and hope that helps,
Jonathan

(In reply to comment #11)
> Trying newer kernels is generally good advice for the restless, but I also
> want to point your attention to existing data in this same bug:
>
> | Since 3.2.12 is newer than 2.6.35, it sounds like the fix from bug 25259
> | doesn't take care of these symptoms.

Thanks, I missed that somehow. Even so, the kernel is a fast moving target, and IMHO trying, say, current upstream master is much more productive than trying to go through all the changes since 3.2 in our irq/hotplug/suspend code that might have affected this bug. Also, if the problem still persists, I think debugging on current kernels that we work on is more likely to lead to correct conclusions anyway.

Tasking to Daniel, far out on his hotplug todo list is interrupt mitigation for naughty hardware.

Bug/behaviour is still present as of Debian kernel package 3.2.0-4, which corresponds to 3.2.35 mainline apparently.

This should be fixed by the interrupt storm detection in 3.10.

Presumed now fixed.

Sorry for the delay, 3.10 hasn't filtered its way down to my laptop just yet. When it does I'll check the behaviour and reopen if it's still being annoying.

As of 3.8 at least the problem still existed, and the intel_reg_write command also stopped working (just reported invalid argument when trying to run or some such) so that was a bit of a disaster. Not sure if I should open a bug for the command not working, as I theoretically won't need it once this bug is fixed.

(In reply to comment #17)
> Sorry for the delay, 3.10 hasn't filtered its way down to my laptop just
> yet. When it does I'll check the behaviour and reopen if it's still being
> annoying.

It's in Debian's experimental suite.

Ok, looks like this is working now. I've seen a few kernel messages indicating the interrupt storm code is triggering, and the system appears to be remaining responsive with low CPU usage. Thanks guys!
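For readers following the mitigation discussed above (rate limiting duplicate hotplug events, and the interrupt storm detection credited with the fix), the general idea can be sketched roughly as below. This is only an illustration in Python, not the i915 driver's code: the class name `HotplugStormDetector`, the method `on_event`, and the default values for `threshold`, `period_s` and `reenable_delay_s` are all hypothetical and chosen for the example. The sketch counts events in a sliding time window, suppresses further events once the count exceeds a threshold, and accepts them again after a cool-down.

```python
from collections import deque


class HotplugStormDetector:
    """Illustrative sliding-window storm detector for one hotplug source.

    All names and default values here are assumptions made for this
    sketch; they are not taken from the kernel driver.
    """

    def __init__(self, threshold=5, period_s=1.0, reenable_delay_s=120.0):
        self.threshold = threshold              # events tolerated per window
        self.period_s = period_s                # window length in seconds
        self.reenable_delay_s = reenable_delay_s  # cool-down after a storm
        self._events = deque()                  # timestamps inside the window
        self._disabled_until = None             # end of cool-down, if any

    def on_event(self, now):
        """Return True if the event should be handled, False if suppressed."""
        if self._disabled_until is not None:
            if now < self._disabled_until:
                return False                    # still cooling down
            # Cool-down is over: start counting from scratch.
            self._disabled_until = None
            self._events.clear()

        # Forget timestamps that have fallen out of the window.
        while self._events and now - self._events[0] > self.period_s:
            self._events.popleft()

        self._events.append(now)
        if len(self._events) > self.threshold:
            # Too many events in one window: treat it as a storm.
            self._disabled_until = now + self.reenable_delay_s
            self._events.clear()
            return False
        return True
```

With these assumed defaults, ten events arriving 10 ms apart would have the first five handled and the rest suppressed until the cool-down expires, while events arriving seconds apart are never suppressed. The trade-off is the same one noted for the `intel_reg_write` workaround: while detection is suppressed, a genuine plug or unplug on that source goes unnoticed.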