Bug 25327

Summary: [g45] driver floods udev change events: system unusable
Product: xorg
Reporter: Todd Brunhoff <toddb>
Component: Driver/intel
Assignee: Eric Anholt <eric>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: critical
Priority: medium
CC: cbm, erecio, gronslet, james.ausmus, john.ruemker, mcepl, William.Hanlon
Version: unspecified
Hardware: IA64 (Itanium)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments (description, flags):
 - proposed driver patch (flags: none)
 - lspci -vv (flags: none)
 - frequent udev change event (flags: none)
 - bios dump (flags: none)
 - usual stack trace during change event flood (flags: none)
 - dmesg after coming out of suspend from ram (flags: none)
 - Output of udevadm monitor --property of a *working* suspend/resume cycle (flags: none)
 - udevadm monitor --property while lockup for about one minute (flags: none)
 - Corresponding dmesg of previous attachment (flags: none)

Description Todd Brunhoff 2009-11-27 22:28:42 UTC
Created attachment 31523 [details] [review]
proposed driver patch

Symptom: when X starts up (even during the Fedora 12 DVD install), system performance is terrible, and the UI is unresponsive for seconds at a time. Xorg usually consumes 100% of the CPU.

Analysis: Even with X dead, running 'udevadm monitor --property', I get hundreds of 'change' events/sec on /devices/pci0000:00/0000:00:02.0/drm/card0 where SEQNUM increments by 1. The device is a component of the VGA chipset. With X running, the server reconfigures itself several times per second.
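For anyone who wants to put a number on the flood, here is a minimal Python sketch that counts 'change' events per second from saved 'udevadm monitor' output. It assumes each event starts with a header line of the form "KERNEL[seconds.micros] action devpath (subsystem)"; the function name, the SAMPLE lines, and the "KERNEL" source label are illustrative assumptions, so adjust them to match what your udev version prints.

```python
import re
from collections import Counter

# Assumed header format of `udevadm monitor` output, for example:
#   KERNEL[1259360922.123456] change   /devices/.../drm/card0 (drm)
HEADER = re.compile(r"^(\w+)\s*\[(\d+)\.\d+\]\s+(\w+)\s+(\S+)")

def change_events_per_second(lines, devpath, source="KERNEL"):
    """Return a Counter mapping whole seconds to the number of 'change'
    events seen for devpath from the given event source."""
    per_second = Counter()
    for line in lines:
        match = HEADER.match(line)
        if not match:
            continue  # property lines (KEY=VALUE) and blank lines
        src, second, action, path = match.groups()
        if src == source and action == "change" and path == devpath:
            per_second[int(second)] += 1
    return per_second

DEVPATH = "/devices/pci0000:00/0000:00:02.0/drm/card0"

# Made-up sample lines, not taken from the attachments on this bug.
SAMPLE = [
    "KERNEL[100.000100] change   " + DEVPATH + " (drm)",
    "ACTION=change",
    "SEQNUM=1500",
    "",
    "KERNEL[100.004200] change   " + DEVPATH + " (drm)",
    "UDEV  [100.009000] change   " + DEVPATH + " (drm)",
    "KERNEL[101.000300] add      /devices/pci0000:00/0000:00:1d.0/usb1 (usb)",
    "KERNEL[101.500000] change   " + DEVPATH + " (drm)",
]

if __name__ == "__main__":
    for second, count in sorted(change_events_per_second(SAMPLE, DEVPATH).items()):
        print(second, count)
```

To use it on a real capture, redirect 'udevadm monitor --property' to a file and pass the file's lines in place of SAMPLE. Only one event source is counted so that the kernel event and its udev echo are not counted twice.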

Using the Fedora 12 kernel (2.6.31.5-127) and a hint from Mr. Packard, I narrowed the change events down to two bits in the hotplug mask: the HDMI B and HDMI D interrupt status bits. If these are removed from the mask, the system appears to work normally. I don't know whether this is the final solution or something that only works on this Foxconn board with a TV plugged into the HDMI port; i.e., works for me.
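As an illustration of the masking idea only (this is not the attached patch, which changes the kernel driver in C), the following Python sketch shows how dropping two bits from a hotplug mask stops those status bits from triggering a hotplug notification. The constant names and bit positions are hypothetical stand-ins; the real definitions live in the kernel's i915_reg.h.

```python
# Hypothetical names and bit positions, chosen for illustration only.
CRT_HOTPLUG = 1 << 11
HDMIB_HOTPLUG = 1 << 29
HDMIC_HOTPLUG = 1 << 28
HDMID_HOTPLUG = 1 << 27

# Mask with every hotplug source enabled.
FULL_MASK = CRT_HOTPLUG | HDMIB_HOTPLUG | HDMIC_HOTPLUG | HDMID_HOTPLUG

# Workaround mask: the same sources minus HDMI B and HDMI D.
WORKAROUND_MASK = FULL_MASK & ~(HDMIB_HOTPLUG | HDMID_HOTPLUG)

def should_signal_hotplug(status, mask):
    """Return True if any status bit that the mask lets through is set."""
    return (status & mask) != 0

if __name__ == "__main__":
    # A spurious HDMI B status bit fires with the full mask but not
    # with the workaround mask; CRT hotplug still gets through.
    print(should_signal_hotplug(HDMIB_HOTPLUG, FULL_MASK))
    print(should_signal_hotplug(HDMIB_HOTPLUG, WORKAROUND_MASK))
    print(should_signal_hotplug(CRT_HOTPLUG, WORKAROUND_MASK))
```

The trade-off is the one noted above: with those bits masked, genuine hotplug events on the masked ports would presumably be ignored as well.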

System: motherboard is Foxconn G45M-S LGA 775 Intel G45 HDMI Micro ATX Intel; CPU is E5200. 2GB mem. The display is a Samsung 46" TV connected to the HDMI port. Other details attached:
 - lspci -vv output
 - the repeated udev change event
 - the common X stack trace when the change events are flooding the system
 - the bios dump

The patch that works for me is also attached.
Comment 1 Todd Brunhoff 2009-11-27 22:29:42 UTC
Created attachment 31524 [details]
lspci -vv
Comment 2 Todd Brunhoff 2009-11-27 22:30:15 UTC
Created attachment 31525 [details]
frequent udev change event
Comment 3 Todd Brunhoff 2009-11-27 22:30:46 UTC
Created attachment 31526 [details]
bios dump
Comment 4 Todd Brunhoff 2009-11-27 22:31:34 UTC
Created attachment 31527 [details]
usual stack trace during change event flood
Comment 5 Julien Cristau 2009-11-28 04:37:54 UTC

*** This bug has been marked as a duplicate of bug 25259 ***
Comment 6 Todd Brunhoff 2009-11-28 09:35:36 UTC
With all due respect, the symptom may be a duplicate, but the fix does not work for my motherboard. This patch (http://cvs.fedoraproject.org/viewvc/F-12/xorg-x11-drv-intel/uevent.patch?revision=1.1) addresses the i830 driver. This patch (http://cvs.fedoraproject.org/viewvc/F-12/kernel/drm-intel-no-tv-hotplug.patch?revision=1.1) removes the TV_HOTPLUG_INT_EN from i915_reg.h, and the kernel/driver I have includes that patch. Hence, it is not applicable to whatever my hardware does.
Comment 7 Eric Anholt 2009-12-03 10:46:40 UTC
It seems to me that we need to disable hotplug detect across modesets, as load-based detection probably triggers the hotplug bits.  Has anyone tried that?
Comment 8 Elmo R 2009-12-31 12:24:56 UTC
Please note after applying the patch listed (commenting out HDMI_ lines in i915_irq.c) for my HDMI/udevd issue, I get the following every second in my syslog:

Dec 31 15:19:40 pcsca65 kernel: DRHD: handling fault status reg 3
Dec 31 15:19:40 pcsca65 kernel: DMAR:[DMA Write] Request device [00:02.0] fault addr b08003000
Dec 31 15:19:40 pcsca65 kernel: DMAR:[fault reason 05] PTE Write access is not set
Dec 31 15:19:40 pcsca65 kernel: DRHD: handling fault status reg 3
Dec 31 15:19:40 pcsca65 kernel: DMAR:[DMA Write] Request device [00:02.0] fault addr b08003000
Dec 31 15:19:40 pcsca65 kernel: DMAR:[fault reason 05] PTE Write access is not set

I tried commenting out just two of the bits and then three of them, but I get the same error message.
Comment 9 Carl Worth 2010-02-17 09:50:50 UTC
Eric seems to have a theory for a fix here, so assigning to him.

-Carl
Comment 10 Eric Anholt 2010-02-22 07:11:43 UTC
Could you test with a v2.6.33-rc7 or newer kernel?  It might be fixed now.
Comment 11 Todd Brunhoff 2010-02-22 09:53:08 UTC
I can try this weekend, I think. I did my initial patch on 2.6.31.5-127 based on some notes I found on building the kernel from the current fedora release (fc12 rawhide, at the time). Looks like kernel.org has http://www.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.33-rc8.tar.bz2. We'll see how that goes.
Comment 12 MartinG 2010-02-26 23:04:48 UTC
I still have this problem in Fedora Rawhide, kernel-2.6.33-0.48.rc8.git1.fc14.x86_64. On every startup and on every resume from suspend, the system partially freezes for several seconds, sometimes more than a minute.

Right after resume from suspend (from ram), the system seems normal, but after ~30 seconds, or after some graphics events (typically pressing alt-F2 in KDE 4), the system locks up, and when I move the mouse, the pointer jumps non-continuously.

I'd be happy to test anything from Fedora/koji.

00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07) (prog-if 00 [VGA controller])
libdrm-2.4.18-1.fc14.i686
libdrm-2.4.18-1.fc14.x86_64
xorg-x11-server-Xorg-1.7.99.901-6.20100215.fc14.x86_64
xorg-x11-drv-intel-2.10.0-4.fc13.x86_64
xorg-x11-drv-intel-devel-2.10.0-4.fc13.x86_64
intel-gpu-tools-2.10.0-4.fc13.x86_64
libdrm-devel-2.4.18-1.fc14.x86_64
kernel-2.6.33-0.48.rc8.git1.fc14.x86_64

Comment 13 MartinG 2010-02-26 23:06:09 UTC
Created attachment 33612 [details]
dmesg after coming out of suspend from ram
Comment 14 Todd Brunhoff 2010-02-26 23:12:34 UTC
Martin, thanks for testing this (I'm not finding the time to build/test). Could you run 'udevadm monitor --property' as described above? It is the driver transitions that cause the performance problems. Thanks.
Comment 15 MartinG 2010-02-26 23:50:18 UTC
Created attachment 33613 [details]
Output of udevadm monitor --property of a *working* suspend/resume cycle

Very strange - after a reboot (making recent updates active), I could not reproduce this bug. Attached the output of udevadm monitor --property of a suspend to ram/resume cycle that went fine without lockup.

I will report back if the bug comes back. Current setup is:
libdrm-2.4.18-1.fc14.i686
libdrm-2.4.18-1.fc14.x86_64
xorg-x11-drv-intel-2.10.0-4.fc13.x86_64
xorg-x11-drv-intel-devel-2.10.0-4.fc13.x86_64
intel-gpu-tools-2.10.0-4.fc13.x86_64
libdrm-devel-2.4.18-1.fc14.x86_64
kernel-2.6.33-0.48.rc8.git1.fc14.x86_64
Comment 16 MartinG 2010-02-27 03:51:53 UTC
Created attachment 33614 [details]
udevadm monitor --property while lockup for about one minute

Now it happened again. I recorded "udevadm monitor --property > file" for a while, including a couple of working suspend/resume cycles, but this time, when coming out of suspend (to ram), the system was unresponsive for about one minute.
Comment 17 MartinG 2010-02-27 03:52:54 UTC
Created attachment 33615 [details]
Corresponding dmesg of previous attachment
Comment 18 Jesse Barnes 2010-07-01 15:44:30 UTC
*** Bug 25259 has been marked as a duplicate of this bug. ***
Comment 19 Chris Wilson 2010-07-15 05:12:34 UTC
There have been a few recent patches to prevent interrupt/hotplug storms:

commit 2d1c9752eaa4c0b38f6fb1ab79a6addc146cd64e
Author: Andy Lutomirski <luto@MIT.EDU>
Date:   Sat Jun 12 05:21:18 2010 -0400

    drm/i915: Fix CRT hotplug regression in 2.6.35-rc1
    
    Commit 7a772c492fcfffae812ffca78a628e76fa57fe58 has two bugs which
    made the hotplug problems on my laptop worse instead of better.
    
    First, it did not, in fact, disable the CRT plug interrupt -- it
    disabled all the other hotplug interrupts.  It seems rather doubtful
    that that bit of the patch fixed anything, so let's just remove it.
    (If you want to add it back, you probably meant ~CRT_HOTPLUG_INT_EN.)
    
    Second, on at least my GM45, setting CRT_HOTPLUG_ACTIVATION_PERIOD_64
    and CRT_HOTPLUG_VOLTAGE_COMPARE_50 (when they were previously unset)
    causes a hotplug interrupt about three seconds later.  The old code
    never restored PORT_HOTPLUG_EN so this could only happen once, but
    they new code restores those registers.  So just set those bits when
    we set up the interrupt in the first place.
    
    Signed-off-by: Andy Lutomirski <luto@mit.edu>
    Signed-off-by: Eric Anholt <eric@anholt.net>


commit 7a772c492fcfffae812ffca78a628e76fa57fe58
Author: Adam Jackson <ajax@redhat.com>
Date:   Mon May 24 16:46:29 2010 -0400

    drm/i915/gen4: Extra CRT hotplug paranoia
    
    Disable the CRT plug interrupt while doing the force cycle, explicitly
    clear any CRT interrupt we may have generated, and restore when done.
    Should mitigate interrupt storms from hotplug detection.
    
    Signed-off-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Eric Anholt <eric@anholt.net>
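
The first bug described in the 2d1c9752 commit message above looks like the usual slip of AND-ing with a flag instead of with its complement. A minimal Python sketch of the difference follows, assuming a 32-bit register. CRT_HOTPLUG_INT_EN is the name used in the commit message, but the bit positions and the two OTHER_* names are hypothetical and the functions are not the driver's code.

```python
REG_MASK = 0xFFFFFFFF  # assume a 32-bit register

# Bit positions are illustrative, not the hardware's.
CRT_HOTPLUG_INT_EN = 1 << 9
OTHER_HOTPLUG_A = 1 << 29  # hypothetical stand-in for another hotplug enable
OTHER_HOTPLUG_B = 1 << 27  # hypothetical stand-in for another hotplug enable

def buggy_disable_crt(reg):
    """AND with the flag itself: keeps only the CRT bit, clears the rest."""
    return reg & CRT_HOTPLUG_INT_EN

def intended_disable_crt(reg):
    """AND with the complement: clears only the CRT bit."""
    return reg & ~CRT_HOTPLUG_INT_EN & REG_MASK

if __name__ == "__main__":
    reg = CRT_HOTPLUG_INT_EN | OTHER_HOTPLUG_A | OTHER_HOTPLUG_B
    print(hex(buggy_disable_crt(reg)))     # CRT still enabled, others lost
    print(hex(intended_disable_crt(reg)))  # CRT cleared, others kept
```

The extra "& REG_MASK" is needed only because Python integers are unbounded, so "~" yields a negative number; in C on a 32-bit value the complement alone would do.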
Comment 20 Chris Wilson 2010-08-08 07:03:40 UTC
2.6.35 has the required patches, I think the hotplug storm has now passed.
Comment 21 Ben Hutchings 2010-08-10 19:46:10 UTC
(In reply to comment #20)
> 2.6.35 has the required patches, I think the hotplug storm has now passed.

Can you send these on to stable, please?
