Bug 18879 - [GEM] VT->X (and resuming from suspend) with compiz running locks machine, with git driver
Status: VERIFIED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: Other All
Importance: high critical
Assigned To: Jesse Barnes
QA Contact: Xorg Project Team
Duplicates: 18940 19202
Depends on:
Blocks: 18098 18841
 
Reported: 2008-12-03 18:52 UTC by Ben Gamari
Modified: 2009-01-12 10:28 UTC
CC List: 5 users



Attachments
xorg.0.log (70.36 KB, text/plain)
2008-12-22 18:40 UTC, liuhaien
xorg conf file (3.67 KB, text/plain)
2008-12-22 18:41 UTC, liuhaien
don't uninstall irq handler (898 bytes, patch)
2008-12-30 16:55 UTC, Jesse Barnes
dmesg after oops (34.53 KB, text/plain)
2008-12-30 23:09 UTC, liuhaien
don't remove irq handler (take #2) (1.36 KB, patch)
2008-12-31 11:31 UTC, Jesse Barnes
clear vblank enabled on irq uninstall (1.31 KB, patch)
2009-01-06 10:04 UTC, Jesse Barnes
clear vblank enabled on irq uninstall (849 bytes, patch)
2009-01-07 16:34 UTC, Jesse Barnes

Description Ben Gamari 2008-12-03 18:52:49 UTC
After upgrading my xorg stack today to test some patches, I found that suspend and resume (S3) are now broken. When I attempt to resume the machine after suspending with compiz running, xorg comes back with a black screen and an unresponsive cursor.

mesa: 264cba6f70eacd9e04646104d10ba63c248d7b83
libdrm: b0d93c74d884b40bd94469a5ef75fdb2fef17680
xserver: f841d4e3cccbde02e91c948f5ffb9e32c8c3b3cc
xf86-video-intel: b662ecccb5c036fcc4aa19026642bde0a1ca2ac8
kernel: 2.6.27.7-132.fc10.x86_64
Comment 1 Ben Gamari 2008-12-03 23:22:39 UTC
As expected, VT switching is also now broken with compiz. The symptoms closely match bug #18062, yet before I upgraded to git today, that bug had stayed fixed ever since Keith's vblank counter fix was merged into the Fedora kernel. Below is a backtrace from a frozen server.

#0  0x00000033b6addff7 in ioctl () from /lib64/libc.so.6
#1  0x0000003f57c02753 in drmIoctl (fd=12, request=3222823994, arg=0x7fff35c893f0) at xf86drm.c:183
#2  0x0000003f57c02bf0 in drmWaitVBlank (fd=12, vbl=0x7fff35c893f0) at xf86drm.c:1895
#3  0x0000000001063b3e in do_wait (vbl=0x7fff35c893f0, vbl_seq=0x7f921b4f19a0, fd=902337520) at ../common/vblank.c:255
#4  0x0000000001063d53 in driWaitForVBlank (priv=0x7f921b4f1940, missed_deadline=0x7fff35c8947f "")
    at ../common/vblank.c:406
#5  0x000000000106ba05 in intelSwapBuffers (dPriv=0x7f921b4f1940) at intel_buffers.c:740
#6  0x0000000001063f43 in driSwapBuffers (dPriv=0xc) at ../common/dri_util.c:321
#7  0x0000000000c049bf in __glXDRIdrawableSwapBuffers (basePrivate=0x7f92190abf20) at glxdri.c:251
#8  0x0000000000bf8c46 in __glXDisp_SwapBuffers (cl=<value optimized out>, pc=<value optimized out>) at glxcmds.c:1436
#9  0x0000000000bfbf5f in __glXDispatch (client=0x7f921b8a2620) at glxext.c:523
#10 0x000000000043e1b4 in Dispatch () at dispatch.c:437
#11 0x0000000000423fad in main (argc=8, argv=0x7fff35c89708, envp=<value optimized out>) at main.c:383
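The trace shows the server parked inside the drmWaitVBlank() ioctl. Roughly, a vblank wait turns a relative request ("N vblanks from now") into an absolute target on the pipe's vblank counter and sleeps until the counter reaches it; if vblank interrupts stop arriving, the counter never advances and the ioctl never returns. A minimal sketch of that bookkeeping, assuming hypothetical function names and an illustrative 2^23 wraparound window (this is not the DRM source):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed window: a target counts as reached once the counter is at most
 * this far past it, which keeps the test correct across 32-bit wraparound. */
#define VBLANK_PASSED_WINDOW (UINT32_C(1) << 23)

/* Turn a relative request ("N vblanks from now") into an absolute target.
 * Unsigned arithmetic wraps modulo 2^32, like a hardware counter. */
static uint32_t vblank_target(uint32_t current, uint32_t relative)
{
    return current + relative;
}

/* True once the vblank counter has reached or passed the target. */
static bool vblank_reached(uint32_t current, uint32_t target)
{
    return (uint32_t)(current - target) <= VBLANK_PASSED_WINDOW;
}
```

If the counter is frozen (no interrupts), vblank_reached() stays false forever, which matches a server stuck in ioctl().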
Comment 2 Ben Gamari 2008-12-04 00:32:31 UTC
At anholt's request, I've spent a bit of time attempting to bisect this. I first tried rolling xf86-video-intel back to b662ecccb5c036fcc4aa19026642bde0a1ca2ac8, but this had no impact. I then went back to master and tried rolling Mesa back to MESA_7_2. Unfortunately, this broke GLX:

[0331 ben@mercury /opt/exp/xorg/mesa] $ glxinfo
name of display: :1.0
Failed to initialize TTM buffer manager.  Falling back to classic.
[intel_init_bufmgr:500] Error initializing buffer manager.
brwCreateContext: failed to init intel context
X Error of failed request:  BadAlloc (insufficient resources for operation)
  Major opcode of failed request:  152 (GLX)
  Minor opcode of failed request:  3 (X_GLXCreateContext)
  Serial number of failed request:  24
  Current serial number in output stream:  27

Going to try a mesa commit from Sep 17 next (39d29fe7fec304fa3638db15b868ebbcb8292167).
Comment 3 Ben Gamari 2008-12-04 00:40:23 UTC
That Mesa commit didn't configure, so I tried 72c914805b8b3b37bf8f44d94bc25ca3d146ac66 from Nov 1. It wouldn't compile due to DRI2 changes:

dri2.c: In function ‘DRI2CopyRegion’:
dri2.c:297: error: ‘xDRI2CopyRegionReq’ has no member named ‘bitmask’

Tried 4c167f8fc1e56b6c82d8917c237e70531e3d57b9 from Nov 13. Same issue. This is futile. I'm going to give the hell up and get some sleep.
Comment 4 Eric Anholt 2008-12-05 12:51:05 UTC
The Mesa 7.2 attempt failed because you presumably have Legacy3D set to FALSE, since you'd moved to a GEM environment, but Mesa 7.2 needs the classic memory manager.
Comment 5 Ian Romanick 2008-12-05 18:05:09 UTC
I am able to reproduce this on the following bits:

Mesa:             f18880038b46c253d8689c9f6f7b77fca261e702
xf86-video-intel: 8d7cbab267e8fbcb2fcf90b18346b60607277266
libdrm:           b0d93c74d884b40bd94469a5ef75fdb2fef17680
xserver:          027ff97a1354ab4c83fecb615f6bc2a6b739b871
kernel:           66647dc60d16fae9f6963fd98b6d9baa1a8dac69

Starting the X server with compiz, doing 'chvt 1', waiting a few seconds, and then doing 'chvt 7' results in X hanging with an identical stack trace.
Comment 6 Gordon Jin 2008-12-21 17:14:06 UTC
This seems to be the real blocker for the Q4 release. Can any of you confirm whether it still exists?
Comment 7 liuhaien 2008-12-21 18:00:42 UTC
The issue is fixed with:
Libdrm:		(master)0243c9f801a35de3465a0321c02f18a4d07ce5b8
Mesa_stable:		(intel-2008-q4)f96baeaac3ef41260ac3975750627ece073fdce0
Xserver_stable:	(server-1.6-branch)32e81074b967716865aef08b66ec29caf0fec2c5
Xf86_video_intel_stable:(xf86-video-intel-2.6-branch)
                      83f3c376b5942e134047a220e6e5f2432ffc492c
GEM_kernel:       (for-airlied)0fbdb7c9455a05eb89f358f0eb66fb8ab094a0c5
Comment 8 Gordon Jin 2008-12-21 19:47:18 UTC
*** Bug 19202 has been marked as a duplicate of this bug. ***
Comment 9 liuhaien 2008-12-21 22:19:17 UTC
It works on GM965 but is broken on Q965 with the same code, so I'm reopening this bug for Q965. Xorg comes back with a black screen and an unresponsive cursor. We can still reach the machine remotely, but it doesn't respond when we run any application. The only thing we can do is reboot.


Comment 10 liuhaien 2008-12-21 23:44:36 UTC
The same issue happens on 945GM and G45.
Comment 11 Caleb Cushing 2008-12-22 03:50:03 UTC
Any chance of a backported patch against 2.5?
Comment 12 Michael Fu 2008-12-22 18:18:14 UTC
(In reply to comment #9)
> it works on gm965 ,but broken on q965 with the same code. So I reopen this bug
> for q965.xorg comes back with a black screen and an unresponsive cursor. we can
> access it by remote but no response when run any applications. the only thing
> we can do is just reboot.
> 

Can you grab the log via ssh and attach it here?
Comment 13 liuhaien 2008-12-22 18:40:06 UTC
Created attachment 21423
xorg.0.log
Comment 14 liuhaien 2008-12-22 18:41:12 UTC
Created attachment 21424
xorg conf file
Comment 15 liuhaien 2008-12-22 18:43:01 UTC
(In reply to comment #12)
> (In reply to comment #9)
> > it works on gm965 ,but broken on q965 with the same code. So I reopen this bug
> > for q965.xorg comes back with a black screen and an unresponsive cursor. we can
> > access it by remote but no response when run any applications. the only thing
> > we can do is just reboot.
> > 
> 
> can you grab the log via ssh and attach here?
> 

xorg.0.log is attached.
Comment 16 Jesse Barnes 2008-12-30 14:18:30 UTC
I can't reproduce this particular problem with the drm-intel-next branch, mesa, xserver and xf86-video-intel from today.  However, compiz does give me a black screen when I VT switch back to it; only the mouse cursor is visible.  It changes when I move across a window though, so the window manager is running and doesn't appear to be stuck waiting for vblank at least... 
Comment 17 Jesse Barnes 2008-12-30 14:59:41 UTC
Oops, spoke too soon, looks like I am seeing this.  So far, I see what look like a couple of problems:
  - the vblank refcount is 0 even while X & compiz are running
    this shouldn't happen since X is constantly doing vblank sync'd buffer swaps
    (or at least it appears to be)
  - glxgears properly causes the refcount to be increased, but doesn't prevent
    the problem 

It looks like interrupts aren't coming in after the VT switch...
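The first observation above matters because vblank interrupts are reference counted; in the DRM core, drm_vblank_get() and drm_vblank_put() play roughly this role. A toy model, assuming the first user enables the interrupt and the last user lets it be turned off (the real code defers the disable with a timer). All names here are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-pipe state; a simplified model, not the DRM structures. */
struct vblank_pipe {
    int refcount;      /* users currently relying on vblank events */
    bool irq_enabled;  /* whether the vblank interrupt is unmasked */
};

/* Take a reference: the first user turns the interrupt on. */
static void vblank_get(struct vblank_pipe *p)
{
    if (p->refcount++ == 0)
        p->irq_enabled = true;
}

/* Drop a reference: the last user lets the interrupt go off.
 * (Modeled as immediate; the real code defers this with a timer.) */
static void vblank_put(struct vblank_pipe *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        p->irq_enabled = false;
}
```

In this model, a refcount of 0 while clients are doing synced swaps means nobody is holding the interrupt on, which fits interrupts not arriving after the VT switch.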
Comment 18 Jesse Barnes 2008-12-30 16:55:12 UTC
Created attachment 21571
don't uninstall irq handler

This patch fixes the problem for me.  Looks like the server was calling into the DRM's vblank wait routine before the 2D driver had called in to re-enable interrupts.  We probably shouldn't be disabling interrupts at all though...
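The ordering problem described here can be reduced to a toy model: if leaving the VT uninstalls the IRQ handler, and on return the server issues a vblank wait before the 2D driver reinstalls it, the wait cannot complete. Keeping the handler installed, as the patch does, closes that window. Everything below is hypothetical and simplified; the real wait blocks indefinitely, while this one gives up after max_ticks so it terminates:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model for illustration only. */
struct device_model {
    bool irq_installed;     /* is the interrupt handler hooked up? */
    unsigned vblank_count;  /* advanced by the handler on each vblank */
};

/* One refresh period passes; only an installed handler counts it. */
static void vblank_tick(struct device_model *d)
{
    if (d->irq_installed)
        d->vblank_count++;
}

/* Switching away from X: the old behavior also tore down the handler. */
static void leave_vt(struct device_model *d, bool uninstall_irq)
{
    if (uninstall_irq)
        d->irq_installed = false;
}

/* Wait for the next vblank, giving up after max_ticks periods.
 * Returns true if the vblank was observed. */
static bool wait_next_vblank(struct device_model *d, int max_ticks)
{
    unsigned target = d->vblank_count + 1;

    for (int i = 0; i < max_ticks; i++) {
        vblank_tick(d);
        if (d->vblank_count >= target)
            return true;
    }
    return false;
}
```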
Comment 19 liuhaien 2008-12-30 22:18:02 UTC
(In reply to comment #18)
> Created an attachment (id=21571)
> don't uninstall irq handler
> 
> This patch fixes the problem for me.  Looks like the server was calling into
> the DRM's vblank wait routine before the 2D driver had called in to re-enable
> interrupts.  We probably shouldn't be disabling interrupts at all though...
> 
Hi Jesse,
your patch works for me with the drm-intel-next branch, but it causes an oops with the drm-intel-2.6.28 branch and the 2.6.28 release kernel when starting X.
Comment 20 liuhaien 2008-12-30 23:07:40 UTC
Following is the oops output:
Message from syslogd@x-q965 at Dec 31 14:50:44 ...
 kernel: Oops: 0000 [#1] SMP
 kernel: last sysfs file: /sys/class/drm/card0/dev
 kernel: Process X (pid: 2627, ti=f6110000 task=f60679e0 task.ti=f6110000)
 kernel: Stack:
 kernel:  00000000 f60679e0 c041f8a0 00000000 00000000 f805da6c f6403e88 40046445
 kernel:  f6076400 f7e05648 f606bde0 fffffff4 f607642c f8067854 f615b600 bf8b8344
 kernel: Call Trace:
 kernel:  [<c0435763>] add_wait_queue+0x1f/0x2b
 kernel:  [<f805db34>] i915_irq_wait+0xc8/0x190 [i915]
 kernel:  [<c041f8a0>] default_wake_function+0x0/0x8
 kernel:  [<f805da6c>] i915_irq_wait+0x0/0x190 [i915]
 kernel:  [<f7e05648>] drm_ioctl+0x1a7/0x22f [drm]
 kernel:  [<c04831bc>] vfs_ioctl+0x47/0x5d
 kernel:  [<c048366f>] do_vfs_ioctl+0x3c5/0x40f
 kernel:  [<c0464c45>] handle_mm_fault+0x560/0x5bb
 kernel:  [<c040440b>] common_interrupt+0x23/0x28
 kernel:  [<c04836fa>] sys_ioctl+0x41/0x58
 kernel:  [<c040386d>] sysenter_do_call+0x12/0x21
 kernel:  [<c0630000>] generic_processor_info+0x83/0x103
 kernel: Code: 63 71 c0 e8 b3 24 f3 ff 83 c4 14 8b 13 8b 43 04 89 42 04 89 10 c7 43 04 00 02 20 00 c7 03 00 01 10 00 5b c3 57 89 c7 56 89 d6 53 <8b> 41 04 89 cb 39 d0 74 17 51 50 52 68 7a 63 71 c0 6a 1a 68 2f
 kernel: EIP: [<c04f3dc9>] __list_add+0x7/0x52 SS:ESP 0068:f6110ebc

[1]+  Done                    xinit
Comment 21 liuhaien 2008-12-30 23:09:17 UTC
Created attachment 21582
dmesg after oops
Comment 22 Jesse Barnes 2008-12-31 11:31:39 UTC
Created attachment 21601
don't remove irq handler (take #2)

This one works for me against the 2.6.28 branch.
Comment 23 liuhaien 2009-01-03 18:14:45 UTC
(In reply to comment #22)
> Created an attachment (id=21601)
> don't remove irq handler (take #2)
> 
> This one works for me against the 2.6.28 branch.
> 

Thanks, this patch also works for me against the 2.6.28 branch.
Comment 24 Jesse Barnes 2009-01-06 10:04:03 UTC
Created attachment 21728
clear vblank enabled on irq uninstall

Can you try this one too?  It's best to try VT switching back both before and after 5s or so have elapsed, that way you'll test the disable timer too.
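Testing both before and after the ~5 s mark exercises two different paths, because the interrupt is not turned off the moment its last user drops it: a timer disables it later, and a new user arriving first cancels that. A sketch of the two paths, with hypothetical names and a polled millisecond clock standing in for the kernel timer:

```c
#include <assert.h>
#include <stdbool.h>

#define VBLANK_OFF_DELAY_MS 5000L /* assumed ~5 s, per the comment above */

/* Hypothetical per-pipe state with a deferred disable. */
struct vblank_pipe {
    int refcount;
    bool enabled;
    bool timer_armed;
    long deadline_ms;
};

/* A new user cancels any pending disable and makes sure vblanks are on. */
static void vblank_get(struct vblank_pipe *p)
{
    p->timer_armed = false;
    p->refcount++;
    p->enabled = true;
}

/* The last user arms the disable timer instead of disabling right away. */
static void vblank_put(struct vblank_pipe *p, long now_ms)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        p->timer_armed = true;
        p->deadline_ms = now_ms + VBLANK_OFF_DELAY_MS;
    }
}

/* Stand-in for the timer firing: disable once the deadline has passed. */
static void vblank_timer_poll(struct vblank_pipe *p, long now_ms)
{
    if (p->timer_armed && now_ms >= p->deadline_ms) {
        p->timer_armed = false;
        if (p->refcount == 0)
            p->enabled = false;
    }
}
```

Switching back within the window finds vblanks still enabled; switching back after it requires a fresh enable, which is the path a stale "enabled" flag would break.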
Comment 25 liuhaien 2009-01-06 18:38:00 UTC
(In reply to comment #24)
> Created an attachment (id=21728)
> clear vblank enabled on irq uninstall
> 
> Can you try this one too?  It's best to try VT switching back both before and
> after 5s or so have elapsed, that way you'll test the disable timer too.
> 

Hi Jesse,
with your two patches, VT switching is fine but S3 is still broken.
I'm sorry I forgot to test S3 with your first patch.
Comment 26 Gordon Jin 2009-01-06 21:52:22 UTC
*** Bug 18940 has been marked as a duplicate of this bug. ***
Comment 27 Jesse Barnes 2009-01-07 16:34:22 UTC
Created attachment 21774
clear vblank enabled on irq uninstall

This one also works for me with kwin & compiz (though with compiz I can't always reproduce the problem now that a separate libdrm fix has been committed).  Suspend/resume works as well.  Are you sure this patch was causing suspend/resume to fail?  There were some problems with upstream kernels there...
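Judging from the patch title, the failure mode being fixed is a stale flag: uninstalling the IRQ handler masks the vblank interrupt in hardware, but if the software "enabled" flag stays set, the next enable request after the VT switch is skipped and no vblanks arrive. A toy model of that interpretation, with hypothetical names (not the i915 code):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: software bookkeeping vs. actual hardware state. */
struct vblank_state {
    bool sw_enabled;  /* what the vblank code believes */
    bool hw_enabled;  /* whether the interrupt is really unmasked */
};

/* Enable only if we think it is off; this is where a stale flag bites. */
static void vblank_enable(struct vblank_state *s)
{
    if (!s->sw_enabled) {
        s->hw_enabled = true;
        s->sw_enabled = true;
    }
}

/* Tearing down the IRQ handler masks the hardware interrupt.
 * clear_flag models the fix: also reset the software flag. */
static void irq_uninstall(struct vblank_state *s, bool clear_flag)
{
    s->hw_enabled = false;
    if (clear_flag)
        s->sw_enabled = false;
}
```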
Comment 28 liuhaien 2009-01-07 20:51:48 UTC
(In reply to comment #27)
> Created an attachment (id=21774)
> clear vblank enabled on irq uninstall
> 
> This one also works for me with kwin & compiz (though with compiz I can't
> always reproduce the problem now that a separate libdrm fix has been
> committed).  Suspend/resume works as well.  Are you sure this patch was causing
> suspend/resume to fail?  There were some problems with upstream kernels
> there...
> 

I found the cause: we had recently pinned libdrm to 2.4.3, so we didn't have the libdrm fix you mentioned above. With that libdrm fix, your patch now works for me with both VT switching and S3. Thanks.
Comment 29 Daniel Drake 2009-01-08 10:47:01 UTC
Jesse, another success report here:
https://bugs.gentoo.org/show_bug.cgi?id=253813
Would be great to see that patch upstream soon. Thanks!
Comment 30 Caleb Cushing 2009-01-08 12:49:41 UTC
I only tested the take #2 patch, but it didn't work long term. I need to test the new patch and clarify this in my downstream bug.
Comment 31 Eric Anholt 2009-01-09 17:09:11 UTC
Two new commits in drm-intel-next and drm-intel-2.6.28:

commit e1a6fcee467556a7e955fe1f7ccc134dd2f974e7
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Tue Jan 6 10:21:24 2009 -0800

    drm/i915: set vblank enabled flag correctly across IRQ install/uninstall

commit 9f4f07ceb1716d8796089fcef91621c5f07c872a
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Thu Jan 8 10:42:15 2009 -0800

    drm/i915: don't enable vblanks on disabled pipes

along with libdrm:
commit f4f76a6894b40abd77f0ffbf52972127608b9bca
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Wed Jan 7 10:18:08 2009 -0800

    libdrm: add timeout handling to drmWaitVBlank
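The second kernel commit's title suggests a guard of the following kind: a pipe that is not scanning out never generates vblank events, so enabling (and then waiting on) its vblank interrupt can only block. A sketch assuming two pipes and an -EINVAL return; the names and error code are assumptions, not taken from the commit:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NUM_PIPES 2 /* hypothetical: pipes A and B */

/* Hypothetical display model for illustration only. */
struct display_model {
    bool pipe_enabled[NUM_PIPES];   /* is the pipe scanning out? */
    bool vblank_irq_on[NUM_PIPES];  /* is its vblank interrupt unmasked? */
};

/* Refuse to turn on vblank interrupts for a pipe that is not running:
 * such a pipe never produces vblank events, so a waiter would block forever. */
static int enable_vblank(struct display_model *d, int pipe)
{
    assert(pipe >= 0 && pipe < NUM_PIPES);
    if (!d->pipe_enabled[pipe])
        return -EINVAL;
    d->vblank_irq_on[pipe] = true;
    return 0;
}
```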
Comment 32 Caleb Cushing 2009-01-11 16:05:20 UTC
I'm trying to test all 3 commits (I was just going to grab them via the web), but I can't seem to find the last patch. The first 2 alone don't cut it.
Comment 33 Gordon Jin 2009-01-11 17:26:57 UTC
(In reply to comment #32)
> trying to test the all 3 commits (was just going to grab them via web) but I
> can't seem to find the last patch. first 2 alone don't cut it.

You should be able to find the last patch at http://cgit.freedesktop.org/mesa/drm/commit/?id=f4f76a6894b40abd77f0ffbf52972127608b9bca

Comment 34 liuhaien 2009-01-11 20:58:37 UTC
Verified against:
xf86_video_intel:  xf86-video-intel-2.6-branch
                   commit 4447973345a2a7af20ba1d6cb18c5f1ed8949d00 (2.5.99.2)
mesa:              intel-2008-q4 branch
                   commit eef0dcc298f65158dc750a09f80317ded1101dc7 (before and close to 7.3)
kernel:            drm-intel-2.6.28 branch
                   commit e1a6fcee467556a7e955fe1f7ccc134dd2f974e7 (2.6.28 + 5 patches)
libdrm:            master branch
                   commit ac8b3308b9432edef5cabe30559004314d42d98c (after 2.4.3)
xserver:           server-1.6-branch
                   commit 8cfb353078d9b5d03a9633304038141a60adc970
Comment 35 Caleb Cushing 2009-01-12 01:35:56 UTC
Are the X11 fixes required for this to work?
Comment 36 Jesse Barnes 2009-01-12 09:39:34 UTC
No, just kernel and libdrm fixes.  For the kernel:
drm/i915: set vblank enabled flag correctly across install/uninstall
drm/i915: don't enable vblanks on disabled pipes

and for libdrm:
libdrm: add timeout handling to drmWaitVBlank
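The libdrm change is described only as timeout handling for drmWaitVBlank(). One plausible shape, sketched here with a stubbed wait in place of the real ioctl, is to keep retrying while the call is interrupted (EINTR) but give up with an error once a deadline passes, so the X server gets a failure instead of hanging. The function names, the EBUSY error and the timeout value are assumptions; see the commit linked in comment 33 for the actual code.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime / CLOCK_MONOTONIC */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the vblank ioctl: 0 on success, -1 + errno on failure. */
typedef int (*wait_fn)(void *ctx);

static long long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    long long ns = (long long)(now.tv_sec - start->tv_sec) * 1000000000LL
                 + (now.tv_nsec - start->tv_nsec);
    return ns / 1000000LL;
}

/* Retry an interruptible wait, but fail with EBUSY once timeout_ms has
 * elapsed instead of looping forever. */
static int wait_with_timeout(wait_fn wait, void *ctx, long timeout_ms)
{
    struct timespec start;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (;;) {
        int ret = wait(ctx);
        if (ret == 0 || errno != EINTR)
            return ret;              /* success, or a real error */
        if (elapsed_ms(&start) >= timeout_ms) {
            errno = EBUSY;           /* give up: vblanks are not arriving */
            return -1;
        }
    }
}

/* Example stand-in: always interrupted, like a vblank that never comes. */
static int never_completes(void *ctx)
{
    (void)ctx;
    errno = EINTR;
    return -1;
}

/* Example stand-in: interrupted *ctx times, then succeeds. */
static int succeeds_after(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        errno = EINTR;
        return -1;
    }
    return 0;
}
```

For example, wait_with_timeout(never_completes, NULL, 20) returns -1 with errno set to EBUSY after roughly 20 ms, while a wait that eventually succeeds returns 0.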
Comment 37 Caleb Cushing 2009-01-12 10:08:07 UTC
Just making sure I'm not being stupid: libdrm isn't part of the kernel. (My comment about X was because Gentoo packages libdrm as an X package.)

I couldn't get the libdrm patch to apply to 2.4.3. It's possible I'm doing something wrong, but I want to make sure it should apply.
Comment 38 Jesse Barnes 2009-01-12 10:28:45 UTC
Oh I'm not sure about applying it to 2.4.3, I was developing against git master.  You'd have to take a look at what else went into git master that might affect the context of the patch (or just replace your package tarball with a new tarball of git master to make things easy).