Bug 29454

Summary: hang with "AIGLX: Suspending AIGLX clients for VT switch"
Product: xorg
Reporter: Oleksij Rempel <linux>
Component: Driver/intel
Assignee: Carl Worth <cworth>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
CC: hramrach
Version: unspecified
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments (description, flags):
xorg.log (flags: none)
reg_dump on video hang (flags: none)
dmesg on video hang (flags: none)

Description Oleksij Rempel 2010-08-09 00:06:39 UTC
Hello,

I have multiple bugs/problems with my current setup. Maybe it is just one bug in different forms.

The bug in this report (AIGLX: Suspending AIGLX clients for VT switch) is tricky to reproduce after a fresh start, but it happens once my workstation has been running for some time.

It normally happens when I watch a Flash video in the browser in full screen.
For example this one: http://www.zdf.de/ZDFmediathek/kanaluebersicht/aktuellste/166#/beitrag/video/1109198/ZDF-heute-Sendung-vom-08-August-2010

I could not reproduce it after a fresh start, even when I ran this video several times.
So I use this sequence to reproduce it:
do it 3 times {
   - start openarena
   - then exit it
}
- after that, start this video; the screen will freeze after about 5 minutes.

There is nothing unusual in dmesg, but in xorg.log I get
AIGLX: Suspending AIGLX clients for VT switch
after it freezes.
Comment 1 Oleksij Rempel 2010-08-09 00:08:08 UTC
uname -a 2.6.35-04812-g32d4379 (from intel_drm_next)

apt-cache policy xserver-xorg-video-intel
xserver-xorg-video-intel:
  Installed: 2:2.12.0+git20100806.6304cb04-0ubuntu0sarvatt
  Candidate: 2:2.12.0+git20100806.6304cb04-0ubuntu0sarvatt
  Version table:
 *** 2:2.12.0+git20100806.6304cb04-0ubuntu0sarvatt 0
        500 http://ppa.launchpad.net/xorg-edgers/ppa/ubuntu/ maverick/main amd64 Packages
        100 /var/lib/dpkg/status
     2:2.12.0-1ubuntu2 0
        500 http://de.archive.ubuntu.com/ubuntu/ maverick/main amd64 Packages
Comment 2 Oleksij Rempel 2010-08-09 00:08:39 UTC
Created attachment 37719 [details]
xorg.log
Comment 3 Chris Wilson 2010-08-09 00:55:34 UTC
What's the content of /sys/kernel/debug/dri/0/i915_gem_interrupt after a hang? Is the "AIGLX: Suspending..." message a reliable precursor to the hang (i.e. do you ever see it when it doesn't hang, or may it hang and not print the message)?
Comment 4 Oleksij Rempel 2010-08-09 05:21:37 UTC
I get this message only on the hang. The problem is that there is one more hang issue in exactly this reproduction case. Starting openarena may cause a hang without this message.

When it hangs with openarena I get a dark screen with a big mouse pointer (I can move it). Currently it looks like I have 8 openarena hangs vs 1 "Suspending AIGLX". I am still working on reproducing "Suspending AIGLX".

Here is the output of i915_gem_interrupt after an openarena hang:
Interrupt enable:    02028c53
Interrupt identity:  00000000
Interrupt mask:      fffc53ae
Pipe A stat:         00000306
Pipe B stat:         00000000
Interrupts received: 461645
Current sequence:    429563
Waiter sequence:     0
IRQ sequence:        0

If openarena started normally, it looks like this:
Interrupt enable:    02028c53
Interrupt identity:  00000000
Interrupt mask:      fffc53ae
Pipe A stat:         00040100
Pipe B stat:         00000000
Interrupts received: 15232
Current sequence:    31249
Waiter sequence:     0
IRQ sequence:        0
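
A minimal Python sketch for comparing two such dumps field by field. The function names are made up for illustration, and the dump text is copied from the two outputs above; it only reports which fields differ and does not interpret the register bits.

```python
def parse_gem_interrupt(text):
    """Parse 'Name: value' lines of an i915_gem_interrupt dump into a dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[name.strip()] = value.strip()
    return fields


def diff_fields(a, b):
    """Return {name: (value_in_a, value_in_b)} for every field that differs."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Dump taken after the openarena hang (from this comment).
HANG_DUMP = """\
Interrupt enable:    02028c53
Interrupt identity:  00000000
Interrupt mask:      fffc53ae
Pipe A stat:         00000306
Pipe B stat:         00000000
Interrupts received: 461645
Current sequence:    429563
Waiter sequence:     0
IRQ sequence:        0
"""

# Dump taken when openarena started normally (from this comment).
NORMAL_DUMP = """\
Interrupt enable:    02028c53
Interrupt identity:  00000000
Interrupt mask:      fffc53ae
Pipe A stat:         00040100
Pipe B stat:         00000000
Interrupts received: 15232
Current sequence:    31249
Waiter sequence:     0
IRQ sequence:        0
"""

if __name__ == "__main__":
    changed = diff_fields(parse_gem_interrupt(HANG_DUMP),
                          parse_gem_interrupt(NORMAL_DUMP))
    for name, (hung, normal) in changed.items():
        print(f"{name}: hang={hung} normal={normal}")
```

For these two dumps only "Pipe A stat", "Interrupts received" and "Current sequence" differ; the enable, identity and mask registers are identical.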
Comment 5 Oleksij Rempel 2010-08-09 10:12:18 UTC
OK, I got it. This time it hung on the video, but instead of ".. Suspending AIGLX.." I get:
[ 13271.554] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[ 13271.554] 
Backtrace:
[ 13271.569] 0: /usr/bin/X (xorg_backtrace+0x28) [0x4a0c68]
[ 13271.569] 1: /usr/bin/X (mieqEnqueue+0x1f4) [0x49c1c4]
[ 13271.569] 2: /usr/bin/X (xf86PostMotionEventP+0xc4) [0x47d7f4]
[ 13271.569] 3: /usr/lib/xorg/modules/input/evdev_drv.so (0x7fe466ad4000+0x423f) [0x7fe466ad823f]
[ 13271.569] 4: /usr/bin/X (0x400000+0x67937) [0x467937]
[ 13271.569] 5: /usr/bin/X (0x400000+0x115ec3) [0x515ec3]
[ 13271.569] 6: /lib/libpthread.so.0 (0x7fe46ad11000+0xfb50) [0x7fe46ad20b50]
[ 13271.569] 7: /lib/libc.so.6 (ioctl+0x7) [0x7fe469d39877]
[ 13271.569] 8: /lib/libdrm_intel.so.1 (0x7fe467c87000+0x545d) [0x7fe467c8c45d]
[ 13271.569] 9: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fe467e92000+0x9f17) [0x7fe467e9bf17]
[ 13271.569] 10: /usr/bin/X (CallCallbacks+0x4c) [0x431eac]
[ 13271.570] 11: /usr/bin/X (FlushAllOutput+0x2c) [0x466d0c]
[ 13271.570] 12: /usr/bin/X (0x400000+0x28ecd) [0x428ecd]
[ 13271.570] 13: /usr/bin/X (0x400000+0x217cb) [0x4217cb]

Now it really looks like the hang with openarena, which is easier to reproduce.
Here is i915_gem_interrupt:
Interrupt enable:    02028c53
Interrupt identity:  00000000
Interrupt mask:      fffc53ae
Pipe A stat:         00040202
Pipe B stat:         00000000
Interrupts received: 733626
Current sequence:    1159032
Waiter sequence:     0
IRQ sequence:        0

But even this is not constant. I ran a loop every second:
cat gem_interr_3 | grep "Pipe A"
Pipe A stat:         00040202
Pipe A stat:         00040202
Pipe A stat:         00040000
Pipe A stat:         00040202
Pipe A stat:         00040202
Pipe A stat:         00040000
Pipe A stat:         00040202
Pipe A stat:         00040202
Pipe A stat:         00040202
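
A once-per-second poll like the loop above could be sketched in Python as follows. This is only an illustration: the path is the debugfs file named in comment 3 (reading it normally requires root and a mounted debugfs), and the function names and default sample count are assumptions.

```python
import time

# debugfs file named in comment 3; assumed to be what the loop above reads.
GEM_INTERRUPT = "/sys/kernel/debug/dri/0/i915_gem_interrupt"


def pipe_stat(text, pipe="A"):
    """Return the 'Pipe <pipe> stat' value from a dump, or None if absent."""
    prefix = f"Pipe {pipe} stat:"
    for line in text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def poll(path=GEM_INTERRUPT, samples=10, interval=1.0):
    """Sample the Pipe A status once per interval and return all values."""
    values = []
    for _ in range(samples):
        with open(path) as f:
            values.append(pipe_stat(f.read()))
        print(f"Pipe A stat: {values[-1]}")
        time.sleep(interval)
    return values


if __name__ == "__main__":
    seen = poll()
    print("distinct values:", sorted(set(v for v in seen if v is not None)))
```

Collecting the distinct values makes it easier to see that the register alternates between a small set of states, as in the output above (00040202 and 00040000).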

I forgot to post my hardware:
intel_stepping 
Vendor: 0x8086, Device: 0x2e22, Revision: 0x03 (A3)
Intel DG45ID board, 4G RAM, Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz

xrandr 
Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 8192 x 8192
VGA1 disconnected (normal left inverted right x axis y axis)
HDMI1 disconnected (normal left inverted right x axis y axis)
DP1 disconnected (normal left inverted right x axis y axis)
HDMI2 connected 1920x1080+0+0 (normal left inverted right x axis y axis) 521mm x 293mm
   1920x1080      60.0*+
   1680x1050      60.0  
   1280x1024      75.0     60.0  
   1440x900       75.0     59.9  
   1280x960       60.0  
   1152x864       75.0  
   1280x720       50.0     60.0  
   1024x768       75.1     70.1     60.0  
   832x624        74.6  
   800x600        72.2     75.0     60.3     56.2  
   720x576        50.0  
   720x480        59.9  
   640x480        72.8     75.0     66.7     60.0  
   720x400        70.1  
DP2 disconnected (normal left inverted right x axis y axis)
Comment 6 Oleksij Rempel 2010-08-09 10:13:03 UTC
Created attachment 37734 [details]
reg_dump on video hang
Comment 7 Oleksij Rempel 2010-08-09 10:13:54 UTC
Created attachment 37735 [details]
dmesg on video hang
Comment 8 Oleksij Rempel 2010-08-09 10:16:40 UTC
I posted a question to the mailing list about some flickering I sometimes see. Jesse pointed me to possible FIFO underruns. Is it possible that these bugs are connected?
Comment 9 Chris Wilson 2010-08-09 10:30:38 UTC
Hangs are highly unlikely to result from FIFO underruns; these are two quite distinct functions of the chip. The FIFO is part of the display engine, a chunk of memory it uses to feed the encoders. If it underruns you just see a flicker (or worse, the output may lose sync), but the render engine should be unaffected.
Comment 10 Chris Wilson 2010-08-09 10:38:07 UTC
I think it is a page-flipping hang; the last set of fixes went into 2.6.35-rc6, and so should be included in your kernel. Hmm.
Comment 11 Chris Wilson 2010-09-12 02:05:01 UTC
(In reply to comment #5)
> Ok, i get it. This time it hang on video but instead of ".. Suspending AIGLX.."
> i get:
> [ 13271.554] [mi] EQ overflowing. The server is probably stuck in an infinite
> loop.
> [ 13271.554] 
> Backtrace:
> [ 13271.569] 0: /usr/bin/X (xorg_backtrace+0x28) [0x4a0c68]
> [ 13271.569] 1: /usr/bin/X (mieqEnqueue+0x1f4) [0x49c1c4]
> [ 13271.569] 2: /usr/bin/X (xf86PostMotionEventP+0xc4) [0x47d7f4]
> [ 13271.569] 3: /usr/lib/xorg/modules/input/evdev_drv.so
> (0x7fe466ad4000+0x423f) [0x7fe466ad823f]
> [ 13271.569] 4: /usr/bin/X (0x400000+0x67937) [0x467937]
> [ 13271.569] 5: /usr/bin/X (0x400000+0x115ec3) [0x515ec3]
> [ 13271.569] 6: /lib/libpthread.so.0 (0x7fe46ad11000+0xfb50) [0x7fe46ad20b50]
> [ 13271.569] 7: /lib/libc.so.6 (ioctl+0x7) [0x7fe469d39877]
> [ 13271.569] 8: /lib/libdrm_intel.so.1 (0x7fe467c87000+0x545d) [0x7fe467c8c45d]

If you do see this again, can you check dmesg for any warnings? Often these are accompanied by a kernel OOPS. (Though it might also be the page-fault-of-doom which is fixed in 2.6.36.)

Can you scan back through /var/log/messages* for any oopses?
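
One way to do that scan, sketched in Python. The marker strings are common kernel oops/BUG messages but the list is an assumption and not exhaustive; the function names are illustrative. On Ubuntu the kernel log may live in /var/log/kern.log or /var/log/syslog rather than /var/log/messages, so the glob pattern may need adjusting.

```python
import glob
import gzip
import re

# Assumed, non-exhaustive set of markers that usually accompany a kernel oops.
OOPS_PATTERN = re.compile(
    r"Oops:|BUG: unable to handle kernel|kernel BUG at|general protection fault"
)


def find_oops_lines(lines):
    """Return the lines (without trailing newline) that look like an oops."""
    return [line.rstrip("\n") for line in lines if OOPS_PATTERN.search(line)]


def scan_logs(pattern="/var/log/messages*"):
    """Scan every log matching the glob, including rotated .gz files."""
    hits = {}
    for path in sorted(glob.glob(pattern)):
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as f:
            found = find_oops_lines(f)
        if found:
            hits[path] = found
    return hits


if __name__ == "__main__":
    for path, lines in scan_logs().items():
        print(f"{path}: {len(lines)} suspicious line(s)")
        for line in lines:
            print("   ", line)
```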
Comment 12 Chris Wilson 2012-05-09 02:31:53 UTC
Ah hah! Should be fixed in 2.19.0

commit b817200371bfe16f44b879a793cf4a75ad17bc5c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 17 17:54:58 2012 +0100

    Don't issue a scanline wait while VT switched
    
    Be paranoid and check that we own the VT before emitting a scanline
    wait. If we attempt to wait on a fb/pipe that we do not own, we may
    issue an illegal command and cause a lockup.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
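
The guard described in the commit message can be modelled roughly as follows. This is a toy Python model, not the driver's C code: the `Screen` class, the `owns_vt` flag and the command tuple are hypothetical names chosen for illustration.

```python
class Screen:
    """Toy stand-in for per-screen driver state (hypothetical)."""

    def __init__(self):
        self.owns_vt = True   # False while the X server is VT-switched away
        self.batch = []       # commands queued for the GPU


def emit_scanline_wait(screen, pipe, y_start, y_end):
    """Queue a scanline wait only if we currently own the VT.

    Returns True if the wait was queued, False if it was skipped.
    """
    if not screen.owns_vt:
        # Waiting on a pipe we do not own could be an illegal command
        # and lock up the GPU, so skip the wait entirely.
        return False
    screen.batch.append(("WAIT_FOR_SCANLINES", pipe, y_start, y_end))
    return True


if __name__ == "__main__":
    screen = Screen()
    print(emit_scanline_wait(screen, 0, 0, 1079))  # queued
    screen.owns_vt = False                         # simulate a VT switch
    print(emit_scanline_wait(screen, 0, 0, 1079))  # skipped
```

Skipping the wait costs at most a tear on a frame nobody can see, which is why dropping the command is a safe fallback in this model.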
