Summary: | i810 Driver and/or DRI cause hang on suspend under 2.6.12, xorg 6.8.2 | ||
---|---|---|---|
Product: | DRI | Reporter: | Michael Paik <mpaik> |
Component: | DRM/other | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | high | CC: | cbm |
Version: | DRI git | ||
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: |
Description
Michael Paik
2005-06-23 23:44:52 UTC
*** This bug has been marked as a duplicate of 3612 ***

Still an open bug, just wanted the cross-reference.

Fetch from CVS HEAD of i915 source for kernel module has same problem. DRI does not have to be active; starting Xorg with Accel false and doing:
modprobe i915
and trying to suspend produces same behavior.

Created attachment 2946
Backtrace of the crash.

Here's a patch which should fix the problem.
Basically, if the i915 private hadn't been initialized the suspend would crash
accessing a NULL pointer.
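
The fix described above amounts to a guard before the driver-private pointer is dereferenced. A minimal user-space sketch of that idea follows; the attached patch itself is not reproduced in this report, so the struct and function names here (`fake_drm_device`, `fake_i915_private`, `fake_i915_suspend`) are hypothetical stand-ins and not the real DRM definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver-private state; not the real i915 type. */
struct fake_i915_private {
    int dpms_state; /* 0 = on, 3 = off, loosely mirroring DPMS levels */
};

/* Hypothetical stand-in for the DRM device. dev_private stays NULL until
 * the driver has been initialized (for example, before X has started). */
struct fake_drm_device {
    struct fake_i915_private *dev_private;
};

/* Suspend hook with the guard: if the private data was never set up there
 * is nothing to power down, so return success instead of dereferencing a
 * NULL pointer. */
int fake_i915_suspend(struct fake_drm_device *dev)
{
    if (dev == NULL)
        return -ENODEV;

    if (dev->dev_private == NULL)
        return 0; /* module loaded but never initialized */

    dev->dev_private->dpms_state = 3; /* power the device down */
    return 0;
}
```

Calling `fake_i915_suspend()` on a device whose `dev_private` is still NULL returns 0 without touching memory, which corresponds to the case in this report where the module is loaded with `modprobe i915` but acceleration is off.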
*** Bug 3612 has been marked as a duplicate of this bug. ***

Patch causes machine to suspend okay, but doesn't come out of suspend. Attempting to resume causes machine to wake to console, begin resume, but hang at resume of Xorg. Kernel oops stacktrace is seen briefly (~.5 sec) before screen blanks. /var/log/messages is clean. Is there some way to force syslogd to catch these panic messages?

syslogd isn't running at the point when suspend/resume happens, so it can't. Resume worked here o.k., so whatever info you've got post it here.

After much tinkering and disabling direct rendering and trying to load the i915/drm modules without actually enabling direct rendering, it appears that the resume hangs right after drm tries to enable the device at 0000:00:02.1 (0000 -> 0002), which is /dev/dri/card1. I notice that at that point ACPI hasn't gotten around to restarting that interrupt yet (drm kicks in after 0000:00:02.0[A], /dev/dri/card0).

From /var/log/messages:
[Last couple lines from the hard reboot]
Jun 24 12:16:24 eleios kernel: Stopping tasks: =============================================================================== =========================================================|
Jun 24 12:16:24 eleios kernel: Debug: sleeping function called from invalid context at mm/slab.c:2126
Jun 24 12:16:24 eleios kernel: in_atomic():0, irqs_disabled():1
Jun 24 12:16:24 eleios kernel: [<c015d212>] kmem_cache_alloc+0x63/0x78
Jun 24 12:16:24 eleios kernel: [<c0249bf6>] acpi_pci_link_set+0x3f/0x17f
Jun 24 12:16:24 eleios kernel: [<c024a040>] irqrouter_resume+0x14/0x28
Jun 24 12:16:24 eleios kernel: [<c0289344>] sysdev_resume+0x3d/0xb5
Jun 24 12:16:24 eleios kernel: [<c028d593>] device_power_up+0x5/0xa
Jun 24 12:16:24 eleios kernel: [<c014a82b>] suspend_enter+0x44/0x46
Jun 24 12:16:24 eleios kernel: [<c014a7b9>] suspend_prepare+0x57/0x85
Jun 24 12:16:24 eleios kernel: [<c014a89e>] enter_state+0x49/0x54
Jun 24 12:16:24 eleios kernel: [<c02470ba>] acpi_system_write_sleep+0x5a/0x6c
Jun 24 12:16:23 eleios kernel: [<c0247060>] acpi_system_write_sleep+0x0/0x6c
Jun 24 12:16:23 eleios kernel: [<c017cbf4>] vfs_write+0x9e/0x110
Jun 24 12:16:23 eleios kernel: [<c017cd11>] sys_write+0x41/0x6a
Jun 24 12:16:23 eleios kernel: [<c0103a51>] syscall_call+0x7/0xb
Jun 24 12:16:23 eleios kernel: ACPI: PCI Interrupt 0000:00:1f.1[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:23 eleios kernel: ACPI: PCI Interrupt 0000:00:1f.5[B] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:23 eleios kernel: ACPI: PCI Interrupt 0000:00:1f.6[B] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:23 eleios kernel: ACPI: PCI Interrupt 0000:02:01.0[A] -> Link [LNKE] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:23 eleios kernel: Restarting tasks... done
Jun 24 12:16:29 eleios kernel: ACPI: PCI Interrupt 0000:00:02.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:29 eleios kernel: [drm] Initialized i915 1.2.0 20041217 on minor 0:
Jun 24 12:16:29 eleios kernel: PCI: Enabling device 0000:00:02.1 (0000 -> 0002)
Jun 24 12:16:29 eleios kernel: [drm] Initialized i915 1.2.0 20041217 on minor 1:
Jun 24 12:18:17 eleios syslogd 1.4.1: restart.
Jun 24 12:18:18 eleios kernel: klogd 1.4.1, log source = /proc/kmsg started.

Relevant bit of Xorg log:
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is -1, (Unknown error 999)
drmOpenDevice: open result is -1, (Unknown error 999)
drmOpenDevice: Open failed
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is -1, (Unknown error 999)
drmOpenDevice: open result is -1, (Unknown error 999)
drmOpenDevice: Open failed
drmOpenByBusid: Searching for BusID pci:0000:00:02.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 7, (OK)
drmOpenByBusid: drmOpenMinor returns 7
drmOpenByBusid: drmGetBusid reports pci:0000:00:02.0
(II) I810(0): [drm] loaded kernel module for "i915" driver
(II) I810(0): [drm] DRM interface version 1.2
(II) I810(0): [drm] created "i915" driver at busid "pci:0000:00:02.0"
(II) I810(0): [drm] added 8192 byte SAREA at 0xf8997000
(II) I810(0): [drm] mapped SAREA 0xf8997000 to 0xb7e84000
(II) I810(0): [drm] framebuffer handle = 0xe0020000
(II) I810(0): [drm] added 1 reserved context for kernel
(II) I810(0): Allocated 3072 kB for the back buffer at 0x7800000.
(II) I810(0): Allocated 3072 kB for the depth buffer at 0x7400000.
(II) I810(0): Allocated 32 kB for the logical context at 0x73f8000.
(II) I810(0): Allocated 54016 kB for textures at 0x520000
(II) I810(0): Updated framebuffer allocation size from 5120 to 5128 kByte
(II) I810(0): Updated pixmap cache from 512 scanlines to 514 scanlines
(II) I810(0): 0x9da97ec: Memory at offset 0x00020000, size 5128 kBytes
(II) I810(0): 0x9cd3920: Memory at offset 0x07fff000, size 4 kBytes
(II) I810(0): 0x9ea9318: Memory at offset 0x07ffb000, size 16 kBytes
(II) I810(0): 0x9ea92e4: Memory at offset 0x00000000, size 128 kBytes
(II) I810(0): 0x9da982c: Memory at offset 0x07fea000, size 64 kBytes
(II) I810(0): 0x9f2d280: Memory at offset 0x07ffa000, size 4 kBytes
(II) I810(0): 0x9da987c: Memory at offset 0x07800000, size 3072 kBytes
(II) I810(0): 0x9da989c: Memory at offset 0x07400000, size 3072 kBytes
(II) I810(0): 0x9da98dc: Memory at offset 0x073f8000, size 32 kBytes
(II) I810(0): 0x9da98bc: Memory at offset 0x00520000, size 54016 kBytes
(II) I810(0): Activating tiled memory for the back buffer.
(II) I810(0): Activating tiled memory for the depth buffer.
(II) I810(0): [drm] Registers = 0xd0000000
(II) I810(0): [drm] Back Buffer = 0xe7800000
(II) I810(0): [drm] Depth Buffer = 0xe7400000
(II) I810(0): [drm] ring buffer = 0xe0000000
(II) I810(0): [drm] textures = 0xe0520000
(II) I810(0): [drm] dma control initialized, using IRQ 11
(II) I810(0): [drm] Initialized kernel agp heap manager, 55312384
(II) I810(0): [dri] visual configs initialized
(==) I810(0): Write-combining range (0xe0000000,0x8000000)
(II) I810(0): vgaHWGetIOBase: hwp->IOBase is 0x03d0, hwp->PIOOffset is 0x0000
(WW) I810(0): Extended BIOS function 0x5f05 failed.
(II) I810(0): xf86BindGARTMemory: bind key 12 at 0x007df000 (pgoffset 2015)
(II) I810(0): xf86BindGARTMemory: bind key 5 at 0x07fff000 (pgoffset 32767)
(II) I810(0): xf86BindGARTMemory: bind key 6 at 0x07ffb000 (pgoffset 32763)
(II) I810(0): xf86BindGARTMemory: bind key 8 at 0x07fea000 (pgoffset 32746)
(II) I810(0): xf86BindGARTMemory: bind key 7 at 0x07ffa000 (pgoffset 32762)
(II) I810(0): xf86BindGARTMemory: bind key 9 at 0x07800000 (pgoffset 30720)
(II) I810(0): xf86BindGARTMemory: bind key 10 at 0x07400000 (pgoffset 29696)
(II) I810(0): xf86BindGARTMemory: bind key 11 at 0x073f8000 (pgoffset 29688)
(II) I810(0): Display plane A is disabled and connected to Pipe A.
(II) I810(0): Display plane B is enabled and connected to Pipe B.
(II) I810(0): Enabling plane B.
(II) I810(0): Display plane A is now disabled and connected to Pipe A.
(II) I810(0): Display plane B is now enabled and connected to Pipe B.
(II) I810(0): PIPEACONF is 0x80000000
(II) I810(0): PIPEBCONF is 0x80000000
(II) I810(0): Mode bandwidth is 47 Mpixel/s
(II) I810(0): maxBandwidth is 1440 Mbyte/s, pipe bandwidths are 252 Mbyte/s, 0 Mbyte/s
(II) I810(0): Using XFree86 Acceleration Architecture (XAA)
	Screen to screen bit blits
	Solid filled rectangles
	8x8 mono pattern filled rectangles
	Indirect CPU to Screen color expansion
	Solid Horizontal and Vertical Lines
	Offscreen Pixmaps
	Setting up tile and stipple cache:
		16 128x128 slots
		4 256x256 slots
(==) I810(0): Backing store disabled
(==) I810(0): Silken mouse enabled
(II) I810(0): Initializing HW Cursor
(**) Option "dpms"
(**) I810(0): DPMS enabled
(II) I810(0): X context handle = 0x1
(II) I810(0): [drm] installed DRM signal handler
(II) I810(0): [DRI] installation complete
(II) I810(0): direct rendering: Enabled

To be clear, the Xorg log in the prior post is from boot, not from crash. No entries are recorded by X at that time.

In the patch I attached before, I added some pci_disable_device() and pci_enable_device() function calls.
Can you remove them and try again?

Comes out of suspend (suspend indicator light turns off) but module fails to reload. No artifacts in message logs. Tinkering with the source in i915_pm.c fails to produce a working kludge.

More to the point, why does this happen regardless of the order of pci_set_power_state, pci_enable_device, and i915_set_dpms in i915_pm? One would think that subsequent statements (pci_set_power_state, pci_enable_device) would produce kernel messages for the same device/minor rather than the subsequent ones. Do these calls increment the minor internally or something? Kernel documentation of these is relatively vague.

Jun 24 12:16:29 eleios kernel: ACPI: PCI Interrupt 0000:00:02.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:29 eleios kernel: PCI: Enabling device 0000:00:02.1 (0000 -> 0002)

I think you are misreading the messages. The module doesn't reload, as it's already loaded; it just re-initializes. That's all. You should find that lsmod shows that i915 is still loaded, and you should be able to start X just fine.

Sorry, I suppose I misspoke. When I say the module fails to reload, I mean to say that the machine comes out of suspend, but hangs during the re-initialization of the i915 drm module. That is to say, the suspend indicator light turns off, but the screen remains blank (if I start X with hardware accel on; it simply hangs at the console if I start with Accel false but the i915 module loaded) and the machine is entirely unresponsive.

You might want to comment out the pci_set_power_state() too. Both the pci_enable/disable_device and pci_set_power_state() calls I added probably shouldn't have been added, so I'll probably remove them again.

I have already tried removing them. Still hangs on resume with a black screen. Is there a straightforward kludge I can put in to only have it try to activate /dev/dri/card0?

Does resume work without the i915 driver loaded?
Oh, and if you are not running the Xserver and trying to resume, no wonder you have a black screen. You either need to rePOST your BIOS with appropriate tools, or have the Xserver running when you resume so the Xserver will rePOST the video BIOS for you.

On a dozen Celeron 4xx with i810e chipset and 2.6.11.3 / 2.6.12-rc5:
Recent DRI i810 snapshots all resume fine when using APM in the following ways:
apm -s
apmsleep ...
Suspending via APM using the power button fails and hangs on resume, so enable "Ignore user suspend" in .config and use apm -s.
Confirmed that ACPI S3 as well as swsusp-2 hang on resume or shortly thereafter (after X repaints the screen).
Note some systems have ATI GDC and use mach64.
FB/DRM drivers tested:
i810fb/i810
vesafb/mach64
Note: Lockup on resume when using netconsole with 2.6 is due to netconsole. A fix is underway. Use a serial console in the meantime.

Resume works fine without the i915 driver loaded, even when the drm module is loaded. I am running the Xserver.

In i915_drv.c, on line 107, you should find code that says this:
.resume = i915_resume,
.suspend = i915_suspend,
Just delete those two lines, which will keep the i915 suspend/resume code from running. Does that make resume work now?

Works beautifully.

O.k. great. I've added some more code to the DRM CVS to hopefully fix it for you. Can you 'cvs update' and re-try, ensuring that any previously applied patches are reversed?

This patch doesn't work against the 2.6.12 from the Fedora Project because the snapshot doesn't have the pm_message_t change to make it a struct... I don't know if this is in the current stable branch, but I recall it failed to patch against the -mm tree earlier this year.

So, if you modify the ifdef around the pm_message_t, can you try it with FC4 and report back?

pm_message_t in the FC4 (well, FC5-beta really, since FC4 is using a 2.6.11 kernel which breaks agp support) 2.6.12 kernel is a u32.
So setting int event = state, the module compiles, but again doesn't resume after suspend. For the time being I am running with .resume and .suspend commented out. Is there any functionality loss other than the smooth fading out of the screen on suspend?

The i915 suspend/resume functions just ensure that the chip has powered down the devices. No functionality is really lost, but it may be taking more power in S3 than intended without them. It'd be good to find out which I915_WRITE call is hanging the resume operation. Could you sprinkle a few printk()'s around after each I915_WRITE to find out which one on your hardware is causing the lockup? I did find that LVDS caused a problem before, and that's why it's still commented out. Maybe another register causes similar issues. The part of the code you are concerned with is in i915_set_dpms(), under the case 0: section.

Doing I915_WRITE to the panel causes it to blank, making kernel messages unreadable. However, even with that and the LVDS bit commented out, the card refuses to come up correctly. It gets through all the I915_WRITEs, but doesn't resume. Last visible kernel messages are:
ACPI: PCI Interrupt 0000:00:02.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
PCI: Enabling device 0000:00:02.1 (0000 -> 0002)

O.k., I'm going to comment out just the resume function from comment #21, which should be enough to fix things for now, leaving the suspend function enabled. Having thought about it, it's the right thing to do. Turning these registers back on without programming the other dependent registers is not correct. The re-POSTing of the BIOS will fix things up for us anyway. Let me know if this still doesn't work.

I tried commenting out just the .resume earlier, and it didn't work. I'll try it again and attach a stacktrace.

.resume & .suspend functions have been ripped out now. So closing this.
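
For readers wondering why deleting the `.resume` and `.suspend` entries is enough to bypass the driver's power-management code: a driver's hooks are usually function pointers in an ops table, and the core only calls the ones that are set. The sketch below illustrates that pattern in user-space C under that assumption; the names (`fake_driver_ops`, `fake_core_resume`, and so on) are hypothetical and are not the real kernel or DRM structures.

```c
#include <stddef.h>

/* Hypothetical device state used only for this illustration. */
struct fake_device {
    int powered;
    int resume_calls;
};

/* Hypothetical ops table: each hook is optional and may be NULL. */
struct fake_driver_ops {
    int (*suspend)(struct fake_device *dev);
    int (*resume)(struct fake_device *dev);
};

static int fake_suspend(struct fake_device *dev)
{
    dev->powered = 0;
    return 0;
}

static int fake_resume(struct fake_device *dev)
{
    dev->powered = 1;
    dev->resume_calls++;
    return 0;
}

/* Table with both hooks registered. */
const struct fake_driver_ops ops_with_hooks = {
    .suspend = fake_suspend,
    .resume  = fake_resume,
};

/* Table with the two entries removed, as in the workaround: the unset
 * function pointers are NULL. */
const struct fake_driver_ops ops_without_hooks = {
    .suspend = NULL,
    .resume  = NULL,
};

/* The "core" skips a hook that is not provided, so the driver-specific
 * resume path never runs and the device state is left alone. */
int fake_core_resume(const struct fake_driver_ops *ops, struct fake_device *dev)
{
    if (ops->resume == NULL)
        return 0;
    return ops->resume(dev);
}
```

With `ops_without_hooks`, `fake_core_resume()` returns 0 and leaves the device untouched, which matches the observed behaviour once the entries were deleted: the machine resumes and the Xserver restores the video state itself.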