It seems that a few minutes after waking my laptop from sleep, I'll often notice a sudden jump in system activity. I've traced this to udev looping on the graphics DRM device for some reason. Running udevd --debug produces the following messages, looped over and over:

1267729775.453592 [13925] event_queue_insert: seq 189967 queued, 'change' 'drm'
1267729775.453639 [13925] udev_monitor_send_device: passed 200 bytes to monitor 0x23d72d0
1267729775.453725 [13926] worker_new: seq 189967 running
1267729775.453782 [13926] udev_device_new_from_syspath: device 0x23e4950 has devpath '/devices/pci0000:00/0000:00:02.0/drm/card0'
1267729775.453894 [13926] udev_device_read_db: device 0x23e4950 filled with db file data
1267729775.453923 [13926] udev_rules_apply_to_event: LINK 'char/226:0' /lib/udev/rules.d/50-udev-default.rules:2
1267729775.453951 [13926] udev_rules_apply_to_event: NAME 'dri/card0' /lib/udev/rules.d/50-udev-default.rules:48
1267729775.454015 [13926] udev_rules_apply_to_event: RUN 'udev-acl --action=$env{ACTION} --device=$env{DEVNAME}' /lib/udev/rules.d/70-acl.rules:81
1267729775.454040 [13926] udev_rules_apply_to_event: RUN 'socket:@/org/freedesktop/hal/udev_event' /lib/udev/rules.d/90-hal.rules:2
1267729775.454061 [13926] udev_rules_apply_to_event: GROUP 44 /lib/udev/rules.d/91-permissions.rules:61
1267729775.454180 [13926] udev_device_update_db: created db file for '/devices/pci0000:00/0000:00:02.0/drm/card0' in '/dev/.udev/db/drm:card0'
1267729775.454201 [13926] udev_node_add: creating device node '/dev/dri/card0', devnum=226:0, mode=0660, uid=0, gid=44
1267729775.454221 [13926] udev_node_mknod: preserve file '/dev/dri/card0', because it has correct dev_t
1267729775.454254 [13926] node_symlink: preserve already existing symlink '/dev/char/226:0' to '../dri/card0'
1267729775.454289 [13926] util_run_program: 'udev-acl --action=change --device=/dev/dri/card0' started
1267729775.458003 [13926] util_run_program: 'udev-acl --action=change --device=/dev/dri/card0' returned with exitcode 0
1267729775.458074 [13926] udev_monitor_send_device: passed 261 bytes to monitor 0x23e4950
1267729775.458128 [13926] udev_monitor_send_device: passed -1 bytes to monitor 0x23e4f00
1267729775.458151 [13926] worker_new: seq 189967 processed with 0
1267729775.458221 [13925] event_queue_delete: seq 189967 done with 0

Putting the laptop to sleep and then waking it up again "warm" seems to clear the problem, so that it does not recur until the next time I wake the system "cold". Clearly this activity has an adverse effect on battery life.

Hardware is a Lenovo X200 laptop:

penelope:/home/nicoya# lspci
00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
00:02.1 Display controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
00:03.0 Communication controller: Intel Corporation Mobile 4 Series Chipset MEI Controller (rev 07)
00:03.2 IDE interface: Intel Corporation Mobile 4 Series Chipset PT IDER Controller (rev 07)
00:03.3 Serial controller: Intel Corporation Mobile 4 Series Chipset AMT SOL Redirection (rev 07)
00:19.0 Ethernet controller: Intel Corporation 82567LM Gigabit Network Connection (rev 03)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93)
00:1f.0 ISA bridge: Intel Corporation ICH9M-E LPC Interface Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03)
03:00.0 Network controller: Intel Corporation Wireless WiFi Link 5300

The kernel is the stock Debian amd64 kernel, version 2.6.32-9, based on upstream 2.6.32.9.

This bug is reported in Debian's bug tracking system here: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=572537
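To put a number on the loop shown in the udevd --debug output above, one rough approach is to save that output one entry per line and count how many 'change' events for the drm subsystem get queued each second. The Python sketch below assumes exactly the event_queue_insert line format seen in the log; the function name and the log file name mentioned afterwards are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches event_queue_insert lines like the one in the log above, e.g.
# "1267729775.453592 [13925] event_queue_insert: seq 189967 queued, 'change' 'drm'"
INSERT_RE = re.compile(
    r"^(?P<ts>\d+)\.\d+ \[\d+\] event_queue_insert: seq \d+ queued, "
    r"'(?P<action>[^']*)' '(?P<subsystem>[^']*)'"
)


def events_per_second(lines: Iterable[str], action: str = "change",
                      subsystem: str = "drm") -> Counter:
    """Count queued udev events of one action/subsystem, keyed by whole second."""
    counts: Counter = Counter()
    for line in lines:
        m = INSERT_RE.match(line.strip())
        if m and m.group("action") == action and m.group("subsystem") == subsystem:
            counts[int(m.group("ts"))] += 1
    return counts
```

For example, iterating over a file such as udevd-debug.log with this function yields per-second counts; a sustained rate far above what a real monitor plug or unplug would produce points at the storm rather than genuine hotplug activity.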
*** This bug has been marked as a duplicate of bug 25259 ***
Tony writes[1]:

> Yes, I'm still getting the issue, sort of. My understanding is that
> some parts of the hotplug pipeline have moved around, but I still
> get a spam of hotplug events from the graphics that nearly kills the
> system after waking up. The most reliable way to trigger it is to
> wake from sleep, open firefox (iceweasel), and play a youtube video
> (flash). The symptoms will often occur without going through these
> exact steps though.
>
> Executing the command "intel_reg_write 0x61110 0x0" as root stops
> the hotplug spam and restores system functionality, though this also
> apparently stops all hotplug events so the system won't detect
> attaching an external monitor or something to the VGA port.
>
> I'm currently running linux-image-3.2.0-2-amd64 package version 3.2.12-1.
>
> I could certainly test patched kernels, as I can very reliably
> reproduce the problem.

Since 3.2.12 is newer than 2.6.35, it sounds like the fix from bug 25259 doesn't take care of these symptoms. Any hints for tracking this down?

[1] http://bugs.debian.org/572537
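The intel_reg_write workaround writes 0x0 to MMIO offset 0x61110 (named PORT_HOTPLUG_EN in the i915 driver's register definitions), which clears every hotplug enable bit at once; that is why external monitor detection stops working too. The Python sketch below only illustrates the bitmask arithmetic. The bit positions are hypothetical placeholders, not the real GM45 register layout, and the helper names are made up for illustration.

```python
from typing import List

PORT_HOTPLUG_EN = 0x61110  # offset written by the workaround above

# Hypothetical bit positions for illustration only; NOT the real GM45 layout.
PORT_A_BIT = 29
PORT_B_BIT = 28
CRT_BIT = 9


def enabled_bits(value: int) -> List[int]:
    """Return the positions of the set bits in a 32-bit register value."""
    return [bit for bit in range(32) if (value >> bit) & 1]


def clear_bits(value: int, *bits: int) -> int:
    """Clear the given bits and leave every other bit untouched."""
    for bit in bits:
        value &= ~(1 << bit)
    return value


if __name__ == "__main__":
    current = (1 << PORT_A_BIT) | (1 << PORT_B_BIT) | (1 << CRT_BIT)
    # Writing 0x0 is equivalent to clearing every enabled bit at once.
    print(hex(clear_bits(current, *enabled_bits(current))))  # 0x0
    # Clearing only one (hypothetically misbehaving) port keeps the others enabled.
    print(enabled_bits(clear_bits(current, PORT_A_BIT)))  # [9, 28]
```

A selective clear like this would require a read-modify-write of the current register value; the workaround in the thread skips that and simply zeroes everything.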
Ok, to dig into this one we need the full dmesg output with drm.debug=0xe added to the kernel command line. Please also capture dmesg, with the same debug option, while the problem is happening.
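drm.debug is a bitmask of debug-message categories. The Python sketch below decodes a mask such as 0xe, assuming the lower category values defined in include/drm/drm_print.h in current kernels (DRM_UT_CORE = 0x01, DRM_UT_DRIVER = 0x02, DRM_UT_KMS = 0x04, DRM_UT_PRIME = 0x08). Older kernels such as the 3.2 series may define fewer categories, and the helper name is hypothetical.

```python
from typing import List

# Lower DRM debug categories as defined in include/drm/drm_print.h in current
# kernels; older kernels may define fewer of them.
DRM_UT_CATEGORIES = {
    0x01: "DRM_UT_CORE",
    0x02: "DRM_UT_DRIVER",
    0x04: "DRM_UT_KMS",
    0x08: "DRM_UT_PRIME",
}


def decode_drm_debug(mask: int) -> List[str]:
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_UT_CATEGORIES.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0xE))  # ['DRM_UT_DRIVER', 'DRM_UT_KMS', 'DRM_UT_PRIME']
```

So 0xe leaves out the noisy core messages while enabling driver and modesetting (KMS) output, which is presumably where the i915 hotplug handling is logged.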
Created attachment 61900
kern.log with drm.debug=0xe

The hotplug storm occurred on wake on May 20th, after approximately 23 hours in S3 sleep. It's worth noting that the storm did *not* occur after several much shorter S3 sleeps on May 19th. This makes me wonder whether some register isn't being reinitialized on wake and is instead resuming with a decayed value after being powered down.
Timeout. Please do reopen if you can still reproduce the issue and help us diagnose the problem, thanks.
Um, if I understand correctly then Tony replied with the log Daniel requested and then there was no reply. What did I miss?
(In reply to comment #6)
> Um, if I understand correctly then Tony replied with the log Daniel
> requested and then there was no reply. What did I miss?

It was just left in NEEDINFO, and I did a mass-close of unchanged bug reports in that state...
All indications point towards flaky hardware, as the spurious hotplug interrupts alternate between different HDMI/DP ports across suspend/resume cycles. I'm not aware of any GM45 hotplug-detection erratum whose workaround hasn't already been implemented, so unless this is widespread across many different manufacturers, I would say it is a model-specific, or even machine-specific, defect.
If the exact cause can't be narrowed down, is there any sort of mitigation that might be appropriate? Rate-limiting duplicate hotplug events, maybe?
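One way to read the rate-limiting suggestion is as a per-device debounce: accept the first change event for a given device, then drop identical events that arrive within a short window. The Python sketch below shows that idea in isolation; the class name, the key format and the window length are hypothetical and do not correspond to any existing udev or kernel option.

```python
import time
from typing import Callable, Dict, Hashable


class DuplicateEventLimiter:
    """Suppress repeats of the same event key that arrive within `window` seconds."""

    def __init__(self, window: float, clock: Callable[[], float] = time.monotonic):
        self.window = window
        self.clock = clock
        self.last_accepted: Dict[Hashable, float] = {}

    def accept(self, key: Hashable) -> bool:
        """Return True if the event should be handled, False if it is a duplicate."""
        now = self.clock()
        last = self.last_accepted.get(key)
        if last is not None and now - last < self.window:
            return False  # same key seen too recently: drop it
        self.last_accepted[key] = now
        return True


if __name__ == "__main__":
    limiter = DuplicateEventLimiter(window=1.0)
    key = ("/devices/pci0000:00/0000:00:02.0/drm/card0", "change")
    print(limiter.accept(key))  # True: first event is handled
    print(limiter.accept(key))  # False: immediate repeat is suppressed
```

The obvious drawback is that a genuine plug event arriving during a storm is dropped along with the spurious ones, and debouncing in userspace presumably does nothing about the interrupt load inside the kernel itself.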
(In reply to comment #9)
> If the exact cause can't be narrowed down, is there any sort of mitigation
> that might be appropriate? Rate limiting duplicate hotplug events maybe?

Have you tried recent kernels? As noted in comment #1, this has already been resolved as a duplicate of bug 25259, which in turn was resolved as a duplicate of bug 25327, which has been fixed. There are also plenty of other irq/hotplug-related changes in recent kernels; 2.6.32.9 isn't exactly new.
(In reply to comment #10)
> Have you tried recent kernels? In comment #1, this has already been resolved
> dupe of bug 25259,

Trying newer kernels is generally good advice for the restless, but I also want to point your attention to existing data in this same bug:

| Since 3.2.12 is newer than 2.6.35, it sounds like the fix from bug 25259
| doesn't take care of these symptoms.

Thanks and hope that helps,
Jonathan
(In reply to comment #11)
> Trying newer kernels is generally good advice for the restless, but I also
> want to point your attention to existing data in this same bug:
>
> | Since 3.2.12 is newer than 2.6.35, it sounds like the fix from bug 25259
> | doesn't take care of these symptoms.

Thanks, I missed that somehow. Even so, the kernel is a fast-moving target, and IMHO trying, say, current upstream master is much more productive than going through all the changes since 3.2 in our irq/hotplug/suspend code that might have affected this bug. Also, if the problem still persists, I think debugging on the current kernels we work on is more likely to lead to correct conclusions anyway.
Tasking to Daniel; far out on his hotplug todo list is interrupt mitigation for naughty hardware.
The bug/behaviour is still present as of Debian kernel package 3.2.0-4, which apparently corresponds to mainline 3.2.35.
This should be fixed by the interrupt storm detection in 3.10.
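The general idea behind hotplug interrupt storm detection, as I understand the i915 change, is to count hotplug interrupts per pin over a short period and, once a pin exceeds a threshold, stop trusting its interrupt and fall back to polling that connector. The Python sketch below models only that counting logic; the period and threshold values, the class name and the pin labels are hypothetical and are not taken from the kernel source.

```python
import time
from typing import Callable, Dict, Hashable, Set


class HotplugStormDetector:
    """Mark a hotplug pin as stormy when it fires too often within one period."""

    def __init__(self, period: float, threshold: int,
                 clock: Callable[[], float] = time.monotonic):
        self.period = period
        self.threshold = threshold
        self.clock = clock
        self.window_start: Dict[Hashable, float] = {}
        self.count: Dict[Hashable, int] = {}
        self.disabled: Set[Hashable] = set()

    def on_interrupt(self, pin: Hashable) -> bool:
        """Return True to process the interrupt, False once the pin is disabled."""
        if pin in self.disabled:
            return False
        now = self.clock()
        start = self.window_start.get(pin)
        if start is None or now - start > self.period:
            # Start a new counting window for this pin.
            self.window_start[pin] = now
            self.count[pin] = 0
        self.count[pin] += 1
        if self.count[pin] > self.threshold:
            # Storm: ignore this pin's interrupts; a driver would switch to polling here.
            self.disabled.add(pin)
            return False
        return True


if __name__ == "__main__":
    detector = HotplugStormDetector(period=1.0, threshold=5)
    results = [detector.on_interrupt("port-b") for _ in range(8)]
    print(results)  # [True, True, True, True, True, False, False, False]
```

A real driver would presumably also want a way to re-enable the pin later, so that a connector that misbehaved once is not left on polling forever; this sketch leaves that out.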
Presumed now fixed.
Sorry for the delay; 3.10 hasn't filtered its way down to my laptop just yet. When it does, I'll check the behaviour and reopen if it's still being annoying. As of 3.8, at least, the problem still existed, and the intel_reg_write command also stopped working (it just reported an invalid argument error when run, or some such), so that was a bit of a disaster. I'm not sure whether I should open a bug for the command not working, as I theoretically won't need it once this bug is fixed.
(In reply to comment #17)
> Sorry for the delay, 3.10 hasn't filtered its way down to my laptop just
> yet. When it does I'll check the behaviour and reopen if it's still being
> annoying.

It's in Debian's experimental suite.
Ok, looks like this is working now. I've seen a few kernel messages indicating the interrupt storm code is triggering, and the system appears to be remaining responsive with low CPU usage. Thanks guys!
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.