Bug 3613

Summary: i810 Driver and/or DRI cause hang on suspend under 2.6.12, xorg 6.8.2
Product: DRI
Reporter: Michael Paik <mpaik>
Component: DRM/other
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: high
CC: cbm
Version: DRI git
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:

Description Michael Paik 2005-06-23 23:44:52 UTC
Kernel: 2.6.12
Platform: x86 Intel Centrino
Model: IBM Thinkpad X40
Xorg: 6.8.2 [FC4 RPM]
DRI-Common: 20050621 Snapshot
DRI-i915: 20050621 Snapshot
i810 Xorg Driver: included with DRI-i915 snapshot

Upon suspend (ACPI suspend to RAM), system hangs. If I revert to the i810 driver
included in the xorg 6.8.2 RPM, which deactivates DRI due to libdri version
mismatch (4.3.0 needed, 5.0.0 found) the system suspends cleanly. Various other
snapshots from the last 3 months of the i810 xorg driver show the same problem.
I will cross-post this bug under DRI as well.

For some reason, /var/log/messages isn't logging the oops (it logs some other
call trace from a warning) so I'll have to type this manually from the artifact
left on the screen on hang:

EIP is at i915_set_dpms+0x1c/0x1b2 [i915]
eax: 00000000   ebx: 00000003   ecx: f89d34a2   edx: 00000003
esi: 00000000   edi: 00000003   ebp: 00000003   esp: f7d68ea4
ds: 007b    es: 007b    ss: 0068
Process x40-suspend.sh (pid: 4615, threadinfo=f7d68000 task=f6722550)
Stack: badc0ded f64cc89c c1d8519c 00000000 c021ae4a 00000001 0001c89c 00000000
       00000003 f4488000 00000000 00000003 f89d34e7 00000246 00000003 c1d851e0
       c1d851e0 c1d855ec c1d855ec c021ce7c c028d0ca f7ec8ddc f6360e34 f1e73888
Call Trace:
 [<c021ae4a>] pci_disable_device+0x57/0x63
 [<f89d34e7>] i915_suspend+0x45/0x94 [i915]
 [<c021ce7c>] pci_device_suspend+0x13/0x20
 [<c028d0ca>] suspend_device+0xda/0xe2
 [<c028d190>] device_suspend+0xbe/0x1c2
 [<c014a7b9>] suspend_prepare+0x57/0x85
 [<c014a880>] enter_state+0x2b/0x54
 [<c02470ba>] acpi_system_write_sleep+0x5a/0x6c
 [<c0247060>] acpi_system_write_sleep+0x0/0x6c
 [<c017cbf4>] vfs_write+0x9e/0x110
 [<c017cd11>] sys_write+0x41/0x6a
 [<c0103a51>] syscall_call+0x7/0xb
Code: ff ff 83 c4 18 5b c3 90 90 90 90 90 90 90 90 55 57 56 53 83 ec 1c 89 d7 8b
 b0 00 08 00 00 a1 08 45 db f8 85 c0 0f 85 8d 00 00 00 <8b> 46 04 8b 40 10 c7 80
 c4 03 00 00 01 00 00 00 8b 4e 04 8b 51
Comment 1 Michael Paik 2005-06-23 23:45:07 UTC

*** This bug has been marked as a duplicate of 3612 ***
Comment 2 Michael Paik 2005-06-23 23:49:41 UTC
Still an open bug, just wanted the cross-reference.
Comment 3 Michael Paik 2005-06-24 00:15:57 UTC
The i915 kernel module built from CVS HEAD has the same problem.

DRI does not have to be active; starting Xorg with Accel false and doing:

modprobe i915

and trying to suspend produces same behavior.
Comment 4 Alan Hourihane 2005-06-24 02:00:09 UTC
Created attachment 2946
Backtrace of the crash.

Here's a patch which should fix the problem.

Basically, if the i915 private data hadn't been initialized, suspend would crash
dereferencing a NULL pointer.
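
The kind of guard described here can be sketched in userspace C. This is only
an illustration, not the attached patch: the mock_ types and mock_i915_suspend()
are hypothetical stand-ins for the real DRM structures, and the value 3 for
"off" is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real DRM types. */
struct mock_i915_private {
    int dpms_mode;
};

struct mock_drm_device {
    /* Stays NULL until something (e.g. the X server) initializes the driver. */
    struct mock_i915_private *dev_private;
};

/* Assumed value for "display off" in this sketch only. */
#define MOCK_DPMS_OFF 3

/*
 * Suspend hook with the guard: if the private data was never set up,
 * there is no hardware state to touch, so return early instead of
 * dereferencing a NULL pointer.
 */
int mock_i915_suspend(struct mock_drm_device *dev)
{
    assert(dev != NULL); /* the caller always passes a device */

    if (dev->dev_private == NULL)
        return 0; /* module loaded but never initialized: nothing to do */

    dev->dev_private->dpms_mode = MOCK_DPMS_OFF;
    return 0;
}
```

This matches the case from comment 3, where the module was loaded with
modprobe but never initialized: the guarded hook returns without touching
the device.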
Comment 5 Alan Hourihane 2005-06-24 02:00:56 UTC
*** Bug 3612 has been marked as a duplicate of this bug. ***
Comment 6 Michael Paik 2005-06-24 05:22:34 UTC
With the patch the machine suspends okay, but doesn't come out of suspend.
Attempting to resume causes the machine to wake to console and begin resume,
but it hangs at resume of Xorg.

Kernel oops stacktrace is seen briefly (~.5 sec) before screen blanks.
Comment 7 Michael Paik 2005-06-24 05:23:56 UTC
/var/log/messages is clean. Is there some way to force syslogd to catch these
panic messages?
Comment 8 Alan Hourihane 2005-06-24 05:42:22 UTC
syslogd isn't running at the point when suspend/resume happens, so it can't.

Resume worked o.k. here, so post whatever info you've got.
Comment 9 Michael Paik 2005-06-24 09:45:44 UTC
After much tinkering and disabling direct rendering and trying to load the 
i915/drm modules without actually enabling direct rendering, it appears that 
the resume hangs right after drm tries to enable the device at 0000:00:02.1 
(0000 -> 0002), which is /dev/dri/card1. I notice that at that point ACPI 
hasn't gotten around to restarting that interrupt yet (drm kicks in after 
0000:00:02.0[A], /dev/dri/card0).

From /var/log/messages: [Last couple lines from the hard reboot]
Jun 24 12:16:24 eleios kernel: Stopping tasks: 
===============================================================================
=========================================================|
Jun 24 12:16:24 eleios kernel: Debug: sleeping function called from invalid 
context at mm/slab.c:2126
Jun 24 12:16:24 eleios kernel: in_atomic():0, irqs_disabled():1
Jun 24 12:16:24 eleios kernel:  [<c015d212>] kmem_cache_alloc+0x63/0x78
Jun 24 12:16:24 eleios kernel:  [<c0249bf6>] acpi_pci_link_set+0x3f/0x17f
Jun 24 12:16:24 eleios kernel:  [<c024a040>] irqrouter_resume+0x14/0x28
Jun 24 12:16:24 eleios kernel:  [<c0289344>] sysdev_resume+0x3d/0xb5
Jun 24 12:16:24 eleios kernel:  [<c028d593>] device_power_up+0x5/0xa
Jun 24 12:16:24 eleios kernel:  [<c014a82b>] suspend_enter+0x44/0x46
Jun 24 12:16:24 eleios kernel:  [<c014a7b9>] suspend_prepare+0x57/0x85
Jun 24 12:16:24 eleios kernel:  [<c014a89e>] enter_state+0x49/0x54
Jun 24 12:16:24 eleios kernel:  [<c02470ba>] acpi_system_write_sleep+0x5a/0x6c
Jun 24 12:16:23 eleios kernel:  [<c0247060>] acpi_system_write_sleep+0x0/0x6c
Jun 24 12:16:23 eleios kernel:  [<c017cbf4>] vfs_write+0x9e/0x110
Jun 24 12:16:23 eleios kernel:  [<c017cd11>] sys_write+0x41/0x6a
Jun 24 12:16:23 eleios kernel:  [<c0103a51>] syscall_call+0x7/0xb
Jun 24 12:16:23 eleios kernel: ACPI: PCI Interrupt 0000:00:1f.1[A] -> Link 
[LNKC] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:23 eleios kernel: ACPI: PCI Interrupt 0000:00:1f.5[B] -> Link 
[LNKB] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:23 eleios kernel: ACPI: PCI Interrupt 0000:00:1f.6[B] -> Link 
[LNKB] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:23 eleios kernel: ACPI: PCI Interrupt 0000:02:01.0[A] -> Link 
[LNKE] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:23 eleios kernel: Restarting tasks... done
Jun 24 12:16:29 eleios kernel: ACPI: PCI Interrupt 0000:00:02.0[A] -> Link 
[LNKA] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:29 eleios kernel: [drm] Initialized i915 1.2.0 20041217 on minor 
0: 
Jun 24 12:16:29 eleios kernel: PCI: Enabling device 0000:00:02.1 (0000 -> 0002)
Jun 24 12:16:29 eleios kernel: [drm] Initialized i915 1.2.0 20041217 on minor 
1: 
Jun 24 12:18:17 eleios syslogd 1.4.1: restart.
Jun 24 12:18:18 eleios kernel: klogd 1.4.1, log source = /proc/kmsg started.



Relevant bit of Xorg log:

drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is -1, (Unknown error 999)
drmOpenDevice: open result is -1, (Unknown error 999)
drmOpenDevice: Open failed
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is -1, (Unknown error 999)
drmOpenDevice: open result is -1, (Unknown error 999)
drmOpenDevice: Open failed
drmOpenByBusid: Searching for BusID pci:0000:00:02.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 7, (OK)
drmOpenByBusid: drmOpenMinor returns 7
drmOpenByBusid: drmGetBusid reports pci:0000:00:02.0
(II) I810(0): [drm] loaded kernel module for "i915" driver
(II) I810(0): [drm] DRM interface version 1.2
(II) I810(0): [drm] created "i915" driver at busid "pci:0000:00:02.0"
(II) I810(0): [drm] added 8192 byte SAREA at 0xf8997000
(II) I810(0): [drm] mapped SAREA 0xf8997000 to 0xb7e84000
(II) I810(0): [drm] framebuffer handle = 0xe0020000
(II) I810(0): [drm] added 1 reserved context for kernel
(II) I810(0): Allocated 3072 kB for the back buffer at 0x7800000.
(II) I810(0): Allocated 3072 kB for the depth buffer at 0x7400000.
(II) I810(0): Allocated 32 kB for the logical context at 0x73f8000.
(II) I810(0): Allocated 54016 kB for textures at 0x520000
(II) I810(0): Updated framebuffer allocation size from 5120 to 5128 kByte
(II) I810(0): Updated pixmap cache from 512 scanlines to 514 scanlines
(II) I810(0): 0x9da97ec: Memory at offset 0x00020000, size 5128 kBytes
(II) I810(0): 0x9cd3920: Memory at offset 0x07fff000, size 4 kBytes
(II) I810(0): 0x9ea9318: Memory at offset 0x07ffb000, size 16 kBytes
(II) I810(0): 0x9ea92e4: Memory at offset 0x00000000, size 128 kBytes
(II) I810(0): 0x9da982c: Memory at offset 0x07fea000, size 64 kBytes
(II) I810(0): 0x9f2d280: Memory at offset 0x07ffa000, size 4 kBytes
(II) I810(0): 0x9da987c: Memory at offset 0x07800000, size 3072 kBytes
(II) I810(0): 0x9da989c: Memory at offset 0x07400000, size 3072 kBytes
(II) I810(0): 0x9da98dc: Memory at offset 0x073f8000, size 32 kBytes
(II) I810(0): 0x9da98bc: Memory at offset 0x00520000, size 54016 kBytes
(II) I810(0): Activating tiled memory for the back buffer.
(II) I810(0): Activating tiled memory for the depth buffer.
(II) I810(0): [drm] Registers = 0xd0000000
(II) I810(0): [drm] Back Buffer = 0xe7800000
(II) I810(0): [drm] Depth Buffer = 0xe7400000
(II) I810(0): [drm] ring buffer = 0xe0000000
(II) I810(0): [drm] textures = 0xe0520000
(II) I810(0): [drm] dma control initialized, using IRQ 11
(II) I810(0): [drm] Initialized kernel agp heap manager, 55312384
(II) I810(0): [dri] visual configs initialized
(==) I810(0): Write-combining range (0xe0000000,0x8000000)
(II) I810(0): vgaHWGetIOBase: hwp->IOBase is 0x03d0, hwp->PIOOffset is 0x0000
(WW) I810(0): Extended BIOS function 0x5f05 failed.
(II) I810(0): xf86BindGARTMemory: bind key 12 at 0x007df000 (pgoffset 2015)
(II) I810(0): xf86BindGARTMemory: bind key 5 at 0x07fff000 (pgoffset 32767)
(II) I810(0): xf86BindGARTMemory: bind key 6 at 0x07ffb000 (pgoffset 32763)
(II) I810(0): xf86BindGARTMemory: bind key 8 at 0x07fea000 (pgoffset 32746)
(II) I810(0): xf86BindGARTMemory: bind key 7 at 0x07ffa000 (pgoffset 32762)
(II) I810(0): xf86BindGARTMemory: bind key 9 at 0x07800000 (pgoffset 30720)
(II) I810(0): xf86BindGARTMemory: bind key 10 at 0x07400000 (pgoffset 29696)
(II) I810(0): xf86BindGARTMemory: bind key 11 at 0x073f8000 (pgoffset 29688)
(II) I810(0): Display plane A is disabled and connected to Pipe A.
(II) I810(0): Display plane B is enabled and connected to Pipe B.
(II) I810(0): Enabling plane B.
(II) I810(0): Display plane A is now disabled and connected to Pipe A.
(II) I810(0): Display plane B is now enabled and connected to Pipe B.
(II) I810(0): PIPEACONF is 0x80000000
(II) I810(0): PIPEBCONF is 0x80000000
(II) I810(0): Mode bandwidth is 47 Mpixel/s
(II) I810(0): maxBandwidth is 1440 Mbyte/s, pipe bandwidths are 252 Mbyte/s, 0 
Mbyte/s
(II) I810(0): Using XFree86 Acceleration Architecture (XAA)
	Screen to screen bit blits
	Solid filled rectangles
	8x8 mono pattern filled rectangles
	Indirect CPU to Screen color expansion
	Solid Horizontal and Vertical Lines
	Offscreen Pixmaps
	Setting up tile and stipple cache:
		16 128x128 slots
		4 256x256 slots
(==) I810(0): Backing store disabled
(==) I810(0): Silken mouse enabled
(II) I810(0): Initializing HW Cursor
(**) Option "dpms"
(**) I810(0): DPMS enabled
(II) I810(0): X context handle = 0x1
(II) I810(0): [drm] installed DRM signal handler
(II) I810(0): [DRI] installation complete
(II) I810(0): direct rendering: Enabled
Comment 10 Michael Paik 2005-06-24 09:55:22 UTC
To be clear, the Xorg log in the prior post is from boot, not from crash. No 
entries are recorded by X at that time.
Comment 11 Alan Hourihane 2005-06-24 11:18:00 UTC
In the patch I attached before, I added some pci_disable_device() and
pci_enable_device() function calls. Can you remove them and try again?
Comment 12 Michael Paik 2005-06-24 12:53:39 UTC
Comes out of suspend (suspend indicator light turns off) but module fails to 
reload. No artifacts in message logs.

Tinkering with the source in i915_pm.c fails to produce a working kludge. More 
to the point, why does this happen regardless of the order of 
pci_set_power_state, pci_enable_device, and i915_set_dpms in i915_pm? One 
would think that subsequent statements (pci_set_power_state, 
pci_enable_device) would produce kernel messages for the same device/minor 
rather than the subsequent ones. Do these calls increment the minor internally 
or something? Kernel documentation of these is relatively vague.

Jun 24 12:16:29 eleios kernel: ACPI: PCI Interrupt 0000:00:02.0[A] -> Link 
[LNKA] -> GSI 11 (level, low) -> IRQ 11
Jun 24 12:16:29 eleios kernel: PCI: Enabling device 0000:00:02.1 (0000 -> 0002)
Comment 13 Alan Hourihane 2005-06-24 13:53:11 UTC
I think you are mis-reading the messages.

The module doesn't reload as it's already loaded, it just re-initializes. That's
all.

You should find that lsmod shows that i915 is still loaded, and you should
be able to start X just fine.
Comment 14 Michael Paik 2005-06-24 13:56:53 UTC
Sorry, I suppose I misspoke. When I say the module fails to reload, I mean
that the machine comes out of suspend, but hangs during the
re-initialization of the i915 drm module. That is to say, the suspend indicator
light turns off, but the screen remains blank (if I start X with hardware
accel on; it simply hangs at the console if I start with Accel false but the
i915 module loaded) and the machine is entirely unresponsive.
Comment 15 Alan Hourihane 2005-06-24 14:26:58 UTC
You might want to comment out the pci_set_power_state() too. 

The pci_enable/disable_device() and pci_set_power_state() calls I added
probably shouldn't have been there. So I'll probably remove them again.
Comment 16 Michael Paik 2005-06-24 22:41:36 UTC
I have already tried removing them. Still hangs on resume with a black screen.

Is there a straightforward kludge I can put in to only have it try to activate
/dev/dri/card0?
Comment 17 Alan Hourihane 2005-06-25 04:22:39 UTC
Does resume work without the i915 driver loaded ?
Comment 18 Alan Hourihane 2005-06-25 04:24:00 UTC
Oh, and if you are not running the Xserver and trying to resume, no wonder you
have a black screen.

You either need to rePOST your BIOS with appropriate tools, or have the Xserver
running when you resume so the Xserver will rePOST the video BIOS for you.
Comment 19 FreeDesktop Bugzilla Database Corruption Fix User 2005-06-25 05:38:12 UTC
On a dozen Celeron 4xx with i810e chipset and 2.6.11.3 / 2.6.12-rc5:

Recent DRI i810 snapshots all resume fine when using APM in the following ways:
  apm -s
  apmsleep ...
Suspending via APM using the power button fails and hangs on resume, so enable "Ignore user suspend" in .config
and use apm -s.

I can confirm that ACPI S3 as well as swsusp-2 hang on resume or shortly thereafter (after X repaints the screen).

Note some systems have ATI GDC and use mach64.
FB/DRM Drivers tested:
  i810fb/i810
  vesafb/mach64

Note: Lockup on resume when using netconsole with 2.6 is due to netconsole. A fix is underway. Use
serial console in the meantime.

Comment 20 Michael Paik 2005-06-25 21:33:48 UTC
Resume works fine without the i915 driver loaded, even when the drm module is
loaded.

I am running the Xserver.
Comment 21 Alan Hourihane 2005-06-27 01:39:41 UTC
In i915_drv.c, on line 107, you should find code that says this....
   
   .resume = i915_resume,
   .suspend = i915_suspend,

Just delete those two lines, which will prevent the i915's suspend/resume code
from activating.

Does that make resume work now?
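
Why deleting the two initializer lines disables the hooks can be sketched in
plain C: members left out of a designated initializer are set to NULL, and a
caller that checks for NULL simply skips them. Everything named mock_ below is
hypothetical, and the NULL check in the caller is an assumption about how the
core dispatches to the driver, not a quote of the DRM or PCI code.

```c
#include <assert.h>
#include <stddef.h>

struct mock_device {
    int powered;
};

/* Hypothetical driver operations table with optional power hooks. */
struct mock_driver_ops {
    const char *name;
    int (*suspend)(struct mock_device *dev);
    int (*resume)(struct mock_device *dev);
};

int mock_suspend(struct mock_device *dev) { dev->powered = 0; return 0; }
int mock_resume(struct mock_device *dev)  { dev->powered = 1; return 0; }

/* Both hooks registered, as in the lines quoted above. */
const struct mock_driver_ops ops_with_pm = {
    .name    = "with-pm",
    .resume  = mock_resume,
    .suspend = mock_suspend,
};

/* The two lines deleted: C initializes the omitted members to NULL. */
const struct mock_driver_ops ops_without_pm = {
    .name = "without-pm",
};

/* Assumed dispatch: call the hook only if the driver provided one. */
int mock_core_suspend(const struct mock_driver_ops *ops, struct mock_device *dev)
{
    assert(ops != NULL);
    if (ops->suspend == NULL)
        return 0; /* no hook registered: the driver code never runs */
    return ops->suspend(dev);
}
```

With ops_without_pm the device state is left untouched on suspend, which is
the behavior the test in this comment is meant to isolate.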
Comment 22 Michael Paik 2005-06-27 04:27:36 UTC
Works beautifully.
Comment 23 Alan Hourihane 2005-06-27 04:44:17 UTC
O.k. great.

I've added some more code to the DRM CVS to hopefully fix it for you. Can you
'cvs update' and re-try, ensuring that any previous patches applied are reversed.
Comment 24 Michael Paik 2005-06-27 07:16:37 UTC
This patch doesn't work against the 2.6.12 from the Fedora Project because the 
snapshot doesn't have the pm_message_t change to make it a struct... I don't 
know if this is in the current stable branch, but I recall it failed to patch 
against the -mm tree earlier this year.
Comment 25 Alan Hourihane 2005-06-27 07:34:32 UTC
So, if you modify the ifdef around the pm_message_t, can you try it with FC4 and
report back?
Comment 26 Michael Paik 2005-06-27 13:49:46 UTC
pm_message_t in the FC4 (well, FC5-beta really, since FC4 is using a 2.6.11
kernel which breaks agp support) 2.6.12 kernel is a u32. With int event
= state, the module compiles, but again the machine doesn't resume after suspend.
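
The u32-versus-struct difference can be hidden behind one accessor so the
suspend code reads the event the same way in both cases. This is a userspace
sketch under stated assumptions: MOCK_PM_MESSAGE_IS_STRUCT, the mock_ type and
the MOCK_PM_ macros are all hypothetical names, and the struct member name
"event" is assumed for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical switch: define MOCK_PM_MESSAGE_IS_STRUCT to model a kernel
 * where the suspend argument is a struct; leave it undefined to model the
 * u32 typedef reported above. Neither macro exists in the real kernel.
 */
#ifdef MOCK_PM_MESSAGE_IS_STRUCT
typedef struct { int event; } mock_pm_message_t;
#define MOCK_PM_EVENT(state) ((state).event)
#define MOCK_PM_MAKE(ev)     ((mock_pm_message_t){ (ev) })
#else
typedef uint32_t mock_pm_message_t;
#define MOCK_PM_EVENT(state) ((int)(state))
#define MOCK_PM_MAKE(ev)     ((mock_pm_message_t)(ev))
#endif

/* The suspend path only ever needs the integer event, whichever form it gets. */
int mock_suspend_event(mock_pm_message_t state)
{
    int event = MOCK_PM_EVENT(state);
    return event;
}
```

Only the #ifdef block changes between the two kernel variants; the body of
the suspend function stays the same.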

For the time being I am running with .resume and .suspend commented out.

Is there any functionality loss other than the smooth fading out of the screen 
on suspend?
Comment 27 Alan Hourihane 2005-06-27 13:59:35 UTC
The i915 suspend/resume functions just ensure that the chip has powered down the
devices. No functionality is really lost, but the chip may draw more power in
S3 than intended without them.

It'd be good to find out which I915_WRITE call is hanging the resume operation.

Could you sprinkle a few printk()'s around after each I915_WRITE to find out
which one on your hardware is causing the lockup?

I did find that LVDS caused a problem before, and that's why it's still
commented out. Maybe another register causes similar issues.
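
The tracing idea can be sketched as a wrapper macro that logs after every
register write, so the last message seen identifies the last write that
completed and the next one in the sequence becomes the suspect. This is a
userspace mock-up: mock_write(), TRACED_WRITE and the register indices are
hypothetical, and fprintf() stands in for printk(), which is what the driver
itself would call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MOCK_REG_COUNT 4

static uint32_t mock_regs[MOCK_REG_COUNT]; /* fake register file */
static int mock_last_traced = -1;          /* index of the last completed write */

/* Stand-in for the real register write; here it just stores the value. */
void mock_write(unsigned idx, uint32_t val)
{
    assert(idx < MOCK_REG_COUNT);
    mock_regs[idx] = val;
}

/* Write, then report file and line so each call site is identifiable. */
#define TRACED_WRITE(idx, val) do {                               \
    mock_write((idx), (val));                                     \
    fprintf(stderr, "%s:%d: write to reg %u done\n",              \
            __FILE__, __LINE__, (unsigned)(idx));                 \
    mock_last_traced = (int)(idx);                                \
} while (0)

/* Hypothetical power-up sequence: three writes, each traced. */
void mock_set_dpms_on(void)
{
    TRACED_WRITE(0, 1);
    TRACED_WRITE(1, 1);
    TRACED_WRITE(2, 1);
}
```

If the machine locked up during the second write, only the message for
register 0 would appear. Note that this only helps while the console is still
visible or captured some other way, for example over a serial console.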

Comment 28 Alan Hourihane 2005-06-27 14:04:23 UTC
The part of the code you are concerned with is in i915_set_dpms(), under the
case 0: section.

Comment 29 Michael Paik 2005-06-27 15:41:49 UTC
Doing I915_WRITE to the panel causes it to blank, making kernel messages 
unreadable.

However, even with that and the LVDS bit commented out, the card refuses to 
come up correctly. It gets through all the I915_WRITEs, but doesn't resume. 
Last visible kernel messages are:

ACPI: PCI Interrupt 0000:00:02.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> 
IRQ 11
PCI: Enabling device 0000:00:02.1 (0000 -> 0002)
Comment 30 Alan Hourihane 2005-06-28 00:33:32 UTC
O.k, I'm going to comment out just the resume function from comment #21 which
should be enough to fix things for now, leaving the suspend function enabled.

Having thought about it, it's the right thing to do. Turning these registers
back on without programming the other dependent registers is not correct. The
re-POSTing of the BIOS will fix things up for us anyway.

Let me know if this still doesn't work.
Comment 31 Michael Paik 2005-06-28 14:38:38 UTC
I tried commenting out just the .resume earlier, and it didn't work.

I'll try it again and attach a stacktrace.
Comment 32 Alan Hourihane 2005-10-27 09:33:00 UTC
.resume & .suspend functions have been ripped out now. So closing this.
