|Summary:||i810 Driver and/or DRI cause hang on suspend under 2.6.12, xorg 6.8.2|
|Product:||DRI||Reporter:||Michael Paik <mpaik>|
|Component:||DRM/other||Assignee:||Default DRI bug account <dri-devel>|
|Status:||RESOLVED FIXED||QA Contact:|
|i915 platform:||i915 features:|
Description Michael Paik 2005-06-23 23:44:52 UTC
Kernel: 2.6.12
Platform: x86 Intel Centrino
Model: IBM ThinkPad X40
Xorg: 6.8.2 [FC4 RPM]
DRI-Common: 20050621 snapshot
DRI-i915: 20050621 snapshot
i810 Xorg driver: included with the DRI-i915 snapshot

Upon suspend (ACPI suspend to RAM), the system hangs. If I revert to the i810 driver included in the xorg 6.8.2 RPM, which deactivates DRI due to a libdri version mismatch (4.3.0 needed, 5.0.0 found), the system suspends cleanly. Various other snapshots of the i810 xorg driver from the last 3 months show the same problem. I will cross-post this bug under DRI as well.

For some reason, /var/log/messages isn't logging the oops (it logs a different call trace from a warning), so I've typed this in manually from what was left on the screen at the hang:

EIP is at i915_set_dpms+0x1c/0x1b2 [i915]
eax: 00000000   ebx: 00000003   ecx: f89d34a2   edx: 00000003
esi: 00000000   edi: 00000003   ebp: 00000003   esp: f7d68ea4
ds: 007b   es: 007b   ss: 0068
Process x40-suspend.sh (pid: 4615, threadinfo=f7d68000 task=f6722550)
Stack: badc0ded f64cc89c c1d8519c 00000000 c021ae4a 00000001 0001c89c 00000000
       00000003 f4488000 00000000 00000003 f89d34e7 00000246 00000003 c1d851e0
       c1d851e0 c1d855ec c1d855ec c021ce7c c028d0ca f7ec8ddc f6360e34 f1e73888
Call Trace:
 [<c021ae4a>] pci_disable_device+0x57/0x63
 [<f89d34e7>] i915_suspend+0x45/0x94 [i915]
 [<c021ce7c>] pci_device_suspend+0x13/0x20
 [<c028d0ca>] suspend_device+0xda/0xe2
 [<c028d190>] device_suspend+0xbe/0x1c2
 [<c014a7b9>] suspend_prepare+0x57/0x85
 [<c014a880>] enter_state+0x2b/0x54
 [<c02470ba>] acpi_system_write_sleep+0x5a/0x6c
 [<c0247060>] acpi_system_write_sleep+0x0/0x6c
 [<c017cbf4>] vfs_write+0x9e/0x110
 [<c017cd11>] sys_write+0x41/0x6a
 [<c0103a51>] syscall_call+0x7/0xb
Code: ff ff 83 c4 18 5b c3 90 90 90 90 90 90 90 90 55 57 56 53 83 ec 1c 89 d7 8b b0 00 08 00 00 a1 08 45 db f8 85 c0 0f 85 8d 00 00 00 <8b> 46 04 8b 40 10 c7 80 c4 03 00 00 01 00 00 00 8b 4e 04 8b 51
Comment 1 Michael Paik 2005-06-23 23:45:07 UTC
*** This bug has been marked as a duplicate of 3612 ***
Comment 2 Michael Paik 2005-06-23 23:49:41 UTC
Still an open bug, just wanted the cross-reference.
Comment 3 Michael Paik 2005-06-24 00:15:57 UTC
A fetch of the i915 kernel module source from CVS HEAD has the same problem. DRI does not have to be active: starting Xorg with Accel false, doing modprobe i915, and then trying to suspend produces the same behavior.
Comment 4 Alan Hourihane 2005-06-24 02:00:09 UTC
Created attachment 2946 (Backtrace of the crash). Here's a patch which should fix the problem. Basically, if the i915 private data hadn't been initialized, suspend would crash dereferencing a NULL pointer.
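A minimal userspace sketch of the kind of guard described, assuming a simplified device layout. drm_device_sketch, i915_private_sketch, set_dpms_sketch and suspend_sketch are hypothetical stand-ins, not the real DRM structures or the code in attachment 2946. The point is only that the suspend hook returns early when the private data was never allocated, instead of dereferencing NULL inside the DPMS helper.

```c
#include <stddef.h>

/* Hypothetical stand-ins; field names are illustrative, not the real
 * drm_device or i915 private layout. */
struct i915_private_sketch {
    int mmio_mapped;            /* nonzero once registers are mapped */
};

struct drm_device_sketch {
    struct i915_private_sketch *dev_private;  /* NULL until init runs */
};

/* Hypothetical DPMS helper: it dereferences the private data, so it must
 * never be reached with dev_private == NULL. */
static int set_dpms_sketch(struct drm_device_sketch *dev, int mode)
{
    struct i915_private_sketch *priv = dev->dev_private;
    (void)mode;
    return priv->mmio_mapped ? 0 : -1;
}

/* Suspend hook with the guard: if the module was loaded but the driver
 * was never initialized (e.g. X started without DRI), skip the hardware
 * work instead of crashing. */
static int suspend_sketch(struct drm_device_sketch *dev)
{
    if (dev == NULL || dev->dev_private == NULL)
        return 0;                     /* nothing to power down */
    return set_dpms_sketch(dev, 3);   /* 3 = DPMS off */
}
```

Returning 0 in the uninitialized case lets the rest of the suspend sequence proceed; whether a real driver should report an error there instead is a choice this sketch does not settle.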
Comment 5 Alan Hourihane 2005-06-24 02:00:56 UTC
*** Bug 3612 has been marked as a duplicate of this bug. ***
Comment 6 Michael Paik 2005-06-24 05:22:34 UTC
With the patch, the machine suspends okay but doesn't come out of suspend. On resume, the machine wakes to the console and begins resuming, but hangs when Xorg resumes. A kernel oops stack trace is visible briefly (~0.5 s) before the screen blanks.
Comment 7 Michael Paik 2005-06-24 05:23:56 UTC
/var/log/messages is clean. Is there some way to force syslogd to catch these panic messages?
Comment 8 Alan Hourihane 2005-06-24 05:42:22 UTC
syslogd isn't running at the point when suspend/resume happens, so it can't. Resume worked here o.k., so post whatever info you've got here.
Comment 9 Michael Paik 2005-06-24 09:45:44 UTC
After much tinkering, including disabling direct rendering and loading the i915/drm modules without actually enabling direct rendering, it appears that resume hangs right after drm tries to enable the device at 0000:00:02.1 (0000 -> 0002), which is /dev/dri/card1. I notice that at that point ACPI hasn't yet gotten around to restarting that interrupt (drm kicks in after 0000:00:02.0[A], /dev/dri/card0).

From /var/log/messages (last couple of lines around the hard reboot):

Jun 24 12:16:24 eleios kernel: Stopping tasks: =============================================================================== =========================================================|
Jun 24 12:16:24 eleios kernel: Debug: sleeping function called from invalid context at mm/slab.c:2126
Jun 24 12:16:24 eleios kernel: in_atomic():0, irqs_disabled():1
Jun 24 12:16:24 eleios kernel: [<c015d212>] kmem_cache_alloc+0x63/0x78
Jun 24 12:16:24 eleios kernel: [<c0249bf6>] acpi_pci_link_set+0x3f/0x17f
Jun 24 12:16:24 eleios kernel: [<c024a040>] irqrouter_resume+0x14/0x28
Jun 24 12:16:24 eleios kernel: [<c0289344>] sysdev_resume+0x3d/0xb5
Jun 24 12:16:24 eleios kernel: [<c028d593>] device_power_up+0x5/0xa
Jun 24 12:16:24 eleios kernel: [<c014a82b>] suspend_enter+0x44/0x46
Jun 24 12:16:24 eleios kernel: [<c014a7b9>] suspend_prepare+0x57/0x85
Jun 24 12:16:24 eleios kernel: [<c014a89e>] enter_state+0x49/0x54
Jun 24 12:16:24 eleios kernel: [<c02470ba>] acpi_system_write_sleep+0x5a/0x6c
Jun 24 12:16:23 eleios kernel: [<c0247060>] acpi_system_write_sleep+0x0/0x6c
Jun 24 12:16:23 eleios kernel: [<c017cbf4>] vfs_write+0x9e/0x110
Jun 24 12:16:23 eleios kernel: [<c017cd11>] sys_write+0x41/0x6a
Jun 24 12:16:23 eleios kernel: [<c0103a51>] syscall_call+0x7/0xb
Jun 24 12:16:23 eleios kernel: ACPI: PCI Interrupt 0000:00:1f.1[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:23 eleios kernel: ACPI: PCI Interrupt 0000:00:1f.5[B] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:23 eleios kernel: ACPI: PCI Interrupt 0000:00:1f.6[B] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:23 eleios kernel: ACPI: PCI Interrupt 0000:02:01.0[A] -> Link [LNKE] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:23 eleios kernel: Restarting tasks... done
Jun 24 12:16:29 eleios kernel: ACPI: PCI Interrupt 0000:00:02.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:29 eleios kernel: [drm] Initialized i915 1.2.0 20041217 on minor 0:
Jun 24 12:16:29 eleios kernel: PCI: Enabling device 0000:00:02.1 (0000 -> 0002)
Jun 24 12:16:29 eleios kernel: [drm] Initialized i915 1.2.0 20041217 on minor 1:
Jun 24 12:18:17 eleios syslogd 1.4.1: restart.
Jun 24 12:18:18 eleios kernel: klogd 1.4.1, log source = /proc/kmsg started.

Relevant bit of Xorg log:

drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is -1, (Unknown error 999)
drmOpenDevice: open result is -1, (Unknown error 999)
drmOpenDevice: Open failed
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is -1, (Unknown error 999)
drmOpenDevice: open result is -1, (Unknown error 999)
drmOpenDevice: Open failed
drmOpenByBusid: Searching for BusID pci:0000:00:02.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 7, (OK)
drmOpenByBusid: drmOpenMinor returns 7
drmOpenByBusid: drmGetBusid reports pci:0000:00:02.0
(II) I810(0): [drm] loaded kernel module for "i915" driver
(II) I810(0): [drm] DRM interface version 1.2
(II) I810(0): [drm] created "i915" driver at busid "pci:0000:00:02.0"
(II) I810(0): [drm] added 8192 byte SAREA at 0xf8997000
(II) I810(0): [drm] mapped SAREA 0xf8997000 to 0xb7e84000
(II) I810(0): [drm] framebuffer handle = 0xe0020000
(II) I810(0): [drm] added 1 reserved context for kernel
(II) I810(0): Allocated 3072 kB for the back buffer at 0x7800000.
(II) I810(0): Allocated 3072 kB for the depth buffer at 0x7400000.
(II) I810(0): Allocated 32 kB for the logical context at 0x73f8000.
(II) I810(0): Allocated 54016 kB for textures at 0x520000
(II) I810(0): Updated framebuffer allocation size from 5120 to 5128 kByte
(II) I810(0): Updated pixmap cache from 512 scanlines to 514 scanlines
(II) I810(0): 0x9da97ec: Memory at offset 0x00020000, size 5128 kBytes
(II) I810(0): 0x9cd3920: Memory at offset 0x07fff000, size 4 kBytes
(II) I810(0): 0x9ea9318: Memory at offset 0x07ffb000, size 16 kBytes
(II) I810(0): 0x9ea92e4: Memory at offset 0x00000000, size 128 kBytes
(II) I810(0): 0x9da982c: Memory at offset 0x07fea000, size 64 kBytes
(II) I810(0): 0x9f2d280: Memory at offset 0x07ffa000, size 4 kBytes
(II) I810(0): 0x9da987c: Memory at offset 0x07800000, size 3072 kBytes
(II) I810(0): 0x9da989c: Memory at offset 0x07400000, size 3072 kBytes
(II) I810(0): 0x9da98dc: Memory at offset 0x073f8000, size 32 kBytes
(II) I810(0): 0x9da98bc: Memory at offset 0x00520000, size 54016 kBytes
(II) I810(0): Activating tiled memory for the back buffer.
(II) I810(0): Activating tiled memory for the depth buffer.
(II) I810(0): [drm] Registers = 0xd0000000
(II) I810(0): [drm] Back Buffer = 0xe7800000
(II) I810(0): [drm] Depth Buffer = 0xe7400000
(II) I810(0): [drm] ring buffer = 0xe0000000
(II) I810(0): [drm] textures = 0xe0520000
(II) I810(0): [drm] dma control initialized, using IRQ 11
(II) I810(0): [drm] Initialized kernel agp heap manager, 55312384
(II) I810(0): [dri] visual configs initialized
(==) I810(0): Write-combining range (0xe0000000,0x8000000)
(II) I810(0): vgaHWGetIOBase: hwp->IOBase is 0x03d0, hwp->PIOOffset is 0x0000
(WW) I810(0): Extended BIOS function 0x5f05 failed.
(II) I810(0): xf86BindGARTMemory: bind key 12 at 0x007df000 (pgoffset 2015)
(II) I810(0): xf86BindGARTMemory: bind key 5 at 0x07fff000 (pgoffset 32767)
(II) I810(0): xf86BindGARTMemory: bind key 6 at 0x07ffb000 (pgoffset 32763)
(II) I810(0): xf86BindGARTMemory: bind key 8 at 0x07fea000 (pgoffset 32746)
(II) I810(0): xf86BindGARTMemory: bind key 7 at 0x07ffa000 (pgoffset 32762)
(II) I810(0): xf86BindGARTMemory: bind key 9 at 0x07800000 (pgoffset 30720)
(II) I810(0): xf86BindGARTMemory: bind key 10 at 0x07400000 (pgoffset 29696)
(II) I810(0): xf86BindGARTMemory: bind key 11 at 0x073f8000 (pgoffset 29688)
(II) I810(0): Display plane A is disabled and connected to Pipe A.
(II) I810(0): Display plane B is enabled and connected to Pipe B.
(II) I810(0): Enabling plane B.
(II) I810(0): Display plane A is now disabled and connected to Pipe A.
(II) I810(0): Display plane B is now enabled and connected to Pipe B.
(II) I810(0): PIPEACONF is 0x80000000
(II) I810(0): PIPEBCONF is 0x80000000
(II) I810(0): Mode bandwidth is 47 Mpixel/s
(II) I810(0): maxBandwidth is 1440 Mbyte/s, pipe bandwidths are 252 Mbyte/s, 0 Mbyte/s
(II) I810(0): Using XFree86 Acceleration Architecture (XAA)
        Screen to screen bit blits
        Solid filled rectangles
        8x8 mono pattern filled rectangles
        Indirect CPU to Screen color expansion
        Solid Horizontal and Vertical Lines
        Offscreen Pixmaps
        Setting up tile and stipple cache:
                16 128x128 slots
                4 256x256 slots
(==) I810(0): Backing store disabled
(==) I810(0): Silken mouse enabled
(II) I810(0): Initializing HW Cursor
(**) Option "dpms"
(**) I810(0): DPMS enabled
(II) I810(0): X context handle = 0x1
(II) I810(0): [drm] installed DRM signal handler
(II) I810(0): [DRI] installation complete
(II) I810(0): direct rendering: Enabled
Comment 10 Michael Paik 2005-06-24 09:55:22 UTC
To be clear, the Xorg log in the prior post is from boot, not from crash. No entries are recorded by X at that time.
Comment 11 Alan Hourihane 2005-06-24 11:18:00 UTC
In the patch I attached before, I added some pci_disable_device() and pci_enable_device() function calls. Can you remove them and try again?
Comment 12 Michael Paik 2005-06-24 12:53:39 UTC
It comes out of suspend (the suspend indicator light turns off), but the module fails to reload. No artifacts in the message logs. Tinkering with the source in i915_pm.c fails to produce a working kludge. More to the point, why does this happen regardless of the order of pci_set_power_state, pci_enable_device, and i915_set_dpms in i915_pm.c? One would think that these calls (pci_set_power_state, pci_enable_device) would produce kernel messages for the same device/minor rather than for the next one. Do these calls increment the minor internally or something? The kernel documentation for these is relatively vague.

Jun 24 12:16:29 eleios kernel: ACPI: PCI Interrupt 0000:00:02.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:29 eleios kernel: PCI: Enabling device 0000:00:02.1 (0000 -> 0002)
Comment 13 Alan Hourihane 2005-06-24 13:53:11 UTC
I think you are misreading the messages. The module doesn't reload, as it's already loaded; it just re-initializes. That's all. You should find that lsmod shows i915 is still loaded, and you should be able to start X just fine.
Comment 14 Michael Paik 2005-06-24 13:56:53 UTC
Sorry, I suppose I misspoke. When I say the module fails to reload, I mean that the machine comes out of suspend but hangs during the re-initialization of the i915 drm module. That is to say, the suspend indicator light turns off, but the screen remains blank (if I start X with hardware acceleration on; it simply hangs at the console if I start with Accel false but the i915 module loaded), and the machine is entirely unresponsive.
Comment 15 Alan Hourihane 2005-06-24 14:26:58 UTC
You might want to comment out the pci_set_power_state() too. The pci_enable/disable_device() and pci_set_power_state() calls I added probably shouldn't have been there, so I'll probably remove them again.
Comment 16 Michael Paik 2005-06-24 22:41:36 UTC
I have already tried removing them. Still hangs on resume with a black screen. Is there a straightforward kludge I can put in to only have it try to activate /dev/dri/card0?
Comment 17 Alan Hourihane 2005-06-25 04:22:39 UTC
Does resume work without the i915 driver loaded?
Comment 18 Alan Hourihane 2005-06-25 04:24:00 UTC
Oh, and if you are not running the Xserver and trying to resume, no wonder you have a black screen. You either need to rePOST your BIOS with appropriate tools, or have the Xserver running when you resume so the Xserver will rePOST the video BIOS for you.
Comment 19 FreeDesktop Bugzilla Database Corruption Fix User 2005-06-25 05:38:12 UTC
On a dozen Celeron 4xx machines with the i810e chipset and 2.6.11.3 / 2.6.12-rc5: recent DRI i810 snapshots all resume fine when using APM in the following ways:
apm -s
apmsleep ...
Suspending via APM with the power button fails and hangs on resume, so enable "Ignore USER SUSPEND" in .config and use apm -s.
I can confirm that ACPI S3 as well as swsusp-2 hang on resume or shortly thereafter (after X repaints the screen). Note that some systems have an ATI GDC and use mach64.
FB/DRM drivers tested: i810fb/i810, vesafb/mach64
Note: the lockup on resume when using netconsole with 2.6 is due to netconsole. A fix is underway. Use a serial console in the meantime.
Comment 20 Michael Paik 2005-06-25 21:33:48 UTC
Resume works fine without the i915 driver loaded, even when the drm module is loaded. I am running the Xserver.
Comment 21 Alan Hourihane 2005-06-27 01:39:41 UTC
In i915_drv.c, on line 107, you should find code that says this:
    .resume = i915_resume,
    .suspend = i915_suspend,
Just delete those two lines, which will keep the i915's suspend/resume code from running. Does that make resume work now?
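To see why deleting those two initializers disables the driver's hooks: members omitted from a designated initializer are zero, so the callbacks become NULL, and the bus core only calls a hook that is set. The sketch below models this with a hypothetical pci_driver_sketch type and hypothetical core_suspend/core_resume functions; in kernels of this era the real PCI core generally falls back to generic state save/restore when a driver has no hook, and that fallback is left out here.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct pci_driver: only the two
 * power-management hooks discussed here. */
struct pci_driver_sketch {
    const char *name;
    int (*suspend)(int state);
    int (*resume)(void);
};

static int calls;   /* counts how many driver hooks actually ran */

static int hook_suspend(int state) { (void)state; calls++; return 0; }
static int hook_resume(void) { calls++; return 0; }

/* Mimics the bus core: a hook is invoked only if the driver provides it. */
static int core_suspend(const struct pci_driver_sketch *drv, int state)
{
    if (drv && drv->suspend)
        return drv->suspend(state);
    return 0;
}

static int core_resume(const struct pci_driver_sketch *drv)
{
    if (drv && drv->resume)
        return drv->resume();
    return 0;
}

static const struct pci_driver_sketch with_hooks = {
    .name = "i915-sketch", .suspend = hook_suspend, .resume = hook_resume,
};

/* Same driver with the two lines deleted: omitted members are NULL. */
static const struct pci_driver_sketch without_hooks = {
    .name = "i915-sketch",
};
```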
Comment 22 Michael Paik 2005-06-27 04:27:36 UTC
Comment 23 Alan Hourihane 2005-06-27 04:44:17 UTC
O.k. great. I've added some more code to the DRM CVS to hopefully fix it for you. Can you 'cvs update' and retry, making sure any previously applied patches are reverted?
Comment 24 Michael Paik 2005-06-27 07:16:37 UTC
This patch doesn't work against the Fedora Project's 2.6.12 because that snapshot doesn't have the change that makes pm_message_t a struct. I don't know if that change is in the current stable branch, but I recall it failed to patch against the -mm tree earlier this year.
Comment 25 Alan Hourihane 2005-06-27 07:34:32 UTC
So, if you modify the #ifdef around the pm_message_t, can you try it with FC4 and report back?
Comment 26 Michael Paik 2005-06-27 13:49:46 UTC
In the FC4 2.6.12 kernel (well, really the FC5-beta one, since FC4 uses a 2.6.11 kernel that breaks AGP support), pm_message_t is a u32. With int event = state, the module compiles, but again the machine doesn't resume after suspend. For the time being I am running with .resume and .suspend commented out. Is there any loss of functionality other than the smooth fade-out of the screen on suspend?
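One way to keep a single source file building against both definitions is to read the event through a macro selected at build time. This is a sketch under stated assumptions: HAVE_PM_MESSAGE_STRUCT, pm_message_sketch_t, PM_EVENT_OF and suspend_event are hypothetical names, not the condition or types the DRM CVS tree actually uses. As written (macro left undefined), it models the u32 variant found in the FC 2.6.12 kernel.

```c
#include <stdint.h>

/* Hypothetical build switch: define it when the target kernel's
 * pm_message_t is a struct with an .event member. */
/* #define HAVE_PM_MESSAGE_STRUCT 1 */

#ifdef HAVE_PM_MESSAGE_STRUCT
typedef struct { int event; } pm_message_sketch_t;
#define PM_EVENT_OF(state) ((state).event)
#else
typedef uint32_t pm_message_sketch_t;   /* older kernels: plain u32 */
#define PM_EVENT_OF(state) ((int)(state))
#endif

/* Driver code only touches the accessor, so it compiles either way. */
static int suspend_event(pm_message_sketch_t state)
{
    int event = PM_EVENT_OF(state);
    return event;
}
```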
Comment 27 Alan Hourihane 2005-06-27 13:59:35 UTC
The i915 suspend/resume functions just ensure that the chip has powered down the devices. No functionality is really lost, but without them the chip may draw more power in S3 than intended. It'd be good to find out which I915_WRITE call is hanging the resume operation. Could you add a printk() after each I915_WRITE to find out which one is causing the lockup on your hardware? I did find that LVDS caused a problem before, and that's why it's still commented out. Maybe another register causes similar issues.
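A sketch of that instrumentation, assuming the writes can be wrapped in a macro. SKETCH_WRITE stands in for I915_WRITE, fake_mmio for the register aperture, and fprintf for printk; none of these are the driver's own names, and the register indices and values are placeholders. Each write is followed by a numbered checkpoint message, so after a lockup the last checkpoint seen marks the last write that completed. If a write blanks the panel, on-screen messages may be lost, so a serial console is a safer place to read them.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register file standing in for the MMIO aperture. */
static uint32_t fake_mmio[16];
static int last_checkpoint;   /* number of the last write that returned */

/* Stand-in for I915_WRITE: a plain store into the fake register file. */
#define SKETCH_WRITE(reg, val) (fake_mmio[(reg)] = (uint32_t)(val))

/* Write, then log a numbered checkpoint (printk in the kernel). */
#define TRACED_WRITE(n, reg, val)                                   \
    do {                                                            \
        SKETCH_WRITE(reg, val);                                     \
        last_checkpoint = (n);                                      \
        fprintf(stderr, "dpms: write %d (reg %d = 0x%08x) done\n",  \
                (n), (reg), (unsigned)(val));                       \
    } while (0)

static void power_up_sketch(void)
{
    /* Placeholder registers and values, not real i915 registers. */
    TRACED_WRITE(1, 0, 0x80000000u);
    TRACED_WRITE(2, 1, 0x80000000u);
    TRACED_WRITE(3, 2, 0x00000001u);
}
```

The do { ... } while (0) wrapper keeps the macro safe to use as a single statement, for example inside an unbraced if.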
Comment 28 Alan Hourihane 2005-06-27 14:04:23 UTC
The part of the code you are concerned with is in i915_set_dpms(), under the case 0: section.
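For orientation, a hypothetical skeleton of a DPMS setter keyed on the standard DPMS mode numbers (0 = On, 1 = Standby, 2 = Suspend, 3 = Off, as in the X11 DPMS extension). It assumes the resume path requests mode 0, which is where the power-up writes would live. dpms_switch_sketch and powered_up are illustrative names, and grouping all three low-power modes in one branch is an assumption, not a description of i915_set_dpms().

```c
/* Standard DPMS mode numbers. */
enum dpms_mode_sketch {
    DPMS_ON = 0,        /* the "case 0:" branch */
    DPMS_STANDBY = 1,
    DPMS_SUSPEND = 2,
    DPMS_OFF = 3
};

static int powered_up = -1;   /* -1 = untouched, 1 = on, 0 = low power */

static int dpms_switch_sketch(int mode)
{
    switch (mode) {
    case DPMS_ON:
        /* Resume path: power-up register writes would go here. */
        powered_up = 1;
        break;
    case DPMS_STANDBY:
    case DPMS_SUSPEND:
    case DPMS_OFF:
        /* Suspend path: power-down register writes would go here. */
        powered_up = 0;
        break;
    default:
        return -1;    /* not a DPMS mode */
    }
    return 0;
}
```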
Comment 29 Michael Paik 2005-06-27 15:41:49 UTC
Doing an I915_WRITE to the panel causes it to blank, making kernel messages unreadable. However, even with that and the LVDS bit commented out, the card refuses to come up correctly. It gets through all the I915_WRITEs, but doesn't resume. The last visible kernel messages are:

ACPI: PCI Interrupt 0000:00:02.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
PCI: Enabling device 0000:00:02.1 (0000 -> 0002)
Comment 30 Alan Hourihane 2005-06-28 00:33:32 UTC
O.k., I'm going to comment out just the resume function from comment #21, which should be enough to fix things for now, leaving the suspend function enabled. Having thought about it, it's the right thing to do. Turning these registers back on without programming the other dependent registers is not correct. The re-POSTing of the BIOS will fix things up for us anyway. Let me know if this still doesn't work.
Comment 31 Michael Paik 2005-06-28 14:38:38 UTC
I tried commenting out just the .resume earlier, and it didn't work. I'll try it again and attach a stacktrace.
Comment 32 Alan Hourihane 2005-10-27 09:33:00 UTC
The .resume and .suspend functions have been ripped out now, so I'm closing this.