Bug 16292

Summary: "Idle timed out, resetting engine" when running compiz.
Product: xorg
Reporter: Peter Hutterer <peter.hutterer>
Component: Server/Ext/DRI
Assignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: git   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform:
i915 features:
Bug Depends on:    
Bug Blocks: 10101    
Attachments (description; flags):
- syslog with debug enabled (tail -f /var/log/syslog > syslog.out); flags: none
- Switch to X server DRI context for GetImage; flags: none

Description Peter Hutterer 2008-06-10 03:44:18 UTC
[Moved from Bug 4989 to here]
Xorg.log 
http://bugs.freedesktop.org/attachment.cgi?id=17025

I see this bug on git master when running compiz and either starting a new
window or resizing a window. Doesn't happen when moving a window around.

lspci says:
00:08.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 9200 PRO]
(rev 01)

The last messages are loads of the following, requiring a hard reset.

(EE) RADEON(0): RADEONWaitForIdleCP: CP idle -22
(EE) RADEON(0): Idle timed out, resetting engine...

I added a return statement into RADEONWaitForIdle so the machine doesn't lock
up anymore, but the timeout message is of course still there.

whot@hyena:~/X11R7/driver/xf86-video-ati$> git diff
diff --git a/src/radeon_commonfuncs.c b/src/radeon_commonfuncs.c
index 58fe306..dbdaaf5 100644
--- a/src/radeon_commonfuncs.c
+++ b/src/radeon_commonfuncs.c
@@ -705,6 +705,7 @@ void FUNC_NAME(RADEONWaitForIdle)(ScrnInfoPtr pScrn)

            xf86DrvMsg(pScrn->scrnIndex, X_ERROR,
                       "Idle timed out, resetting engine...\n");
+            return;
            RADEONEngineReset(pScrn);
            RADEONEngineRestore(pScrn);


Comment from Michel Daenzer in Bug 4989:
> You're saying this change keeps things working despite the timeout messages?
> If so, the timeout is probably just too short and needs to be extended.
> Ideally, something like radeon_init_timeout()/radeon_timedout() should be 
> used, to make the actual timeout duration independent of how quickly the 
> various loops are processed.

Multiplying the timeout by 100 doesn't really change the situation; I haven't looked at radeon_init_timeout yet.
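
The radeon_init_timeout()/radeon_timedout() suggestion quoted above amounts to bounding the wait by elapsed time rather than by a loop iteration count. A minimal sketch of that idea in plain C, assuming POSIX clock_gettime(); wait_timeout, wait_for_idle() and the two stub callbacks are hypothetical names made up for illustration and are not the driver's actual functions:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper type: an absolute deadline on the monotonic clock. */
typedef struct {
    struct timespec deadline;
} wait_timeout;

/* Record "now + timeout_ms" as the deadline. */
static void wait_timeout_init(wait_timeout *t, long timeout_ms)
{
    assert(timeout_ms >= 0);
    clock_gettime(CLOCK_MONOTONIC, &t->deadline);
    t->deadline.tv_sec += timeout_ms / 1000;
    t->deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (t->deadline.tv_nsec >= 1000000000L) {
        t->deadline.tv_sec += 1;
        t->deadline.tv_nsec -= 1000000000L;
    }
}

/* True once the monotonic clock has reached the deadline. */
static bool wait_timeout_expired(const wait_timeout *t)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    if (now.tv_sec != t->deadline.tv_sec)
        return now.tv_sec > t->deadline.tv_sec;
    return now.tv_nsec >= t->deadline.tv_nsec;
}

/* Poll is_idle() until it succeeds or timeout_ms has elapsed.
 * Returns 0 when idle, -1 on timeout, however fast the loop spins. */
static int wait_for_idle(bool (*is_idle)(void), long timeout_ms)
{
    wait_timeout t;
    wait_timeout_init(&t, timeout_ms);
    do {
        if (is_idle())
            return 0;
    } while (!wait_timeout_expired(&t));
    return -1;
}

/* Stub engines for demonstration only. */
static bool always_idle(void) { return true; }
static bool never_idle(void)  { return false; }
```

With these stubs, wait_for_idle(always_idle, 50) returns immediately and wait_for_idle(never_idle, 20) gives up after roughly 20 ms. As the following comments show, the timeout length was not the actual cause of this bug.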
Comment 1 Michel Dänzer 2008-06-11 02:50:26 UTC
Actually the timeout doesn't matter, the problem is that the ioctl returns EINVAL... is there any corresponding kernel output? If not, is there any if you write 1 to /sys/module/drm/parameters/debug before reproducing the problem? Can you get a backtrace of the X server when the ioctl fails?
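
For reference, the -22 in "CP idle -22" is a negated errno value, and on Linux errno 22 is EINVAL. A small sketch of decoding such a return value with the standard strerror(); describe_ret() is a hypothetical helper name, not part of the driver:

```c
#include <errno.h>
#include <string.h>

/* Map a negative-errno style return value to a readable message. */
static const char *describe_ret(int ret)
{
    return ret < 0 ? strerror(-ret) : "no error";
}
```

Calling describe_ret(-22) on Linux yields the system's message for EINVAL.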
Comment 2 Michel Dänzer 2008-06-11 03:12:28 UTC
Oh, and are you using the DRM kernel modules from drm Git or those included in the kernel?
Comment 3 Michel Dänzer 2008-06-11 03:43:53 UTC
BTW, is MPX making sure software cursor rendering is never done from a SIGIO handler? That's not supported and could cause these symptoms.
Comment 4 Peter Hutterer 2008-06-11 22:21:28 UTC
On Wed, Jun 11, 2008 at 03:43:54AM -0700, bugzilla-daemon@freedesktop.org wrote:
> --- Comment #3 from Michel Dänzer <michel@tungstengraphics.com>  2008-06-11 03:43:53 PST ---
> BTW, is MPX making sure software cursor rendering is never done from a SIGIO
> handler? That's not supported and could cause these symptoms.

No, shouldn't happen. If we sw-render the cursors during SIGIO, timeouts are
the least of our problems.
Comment 5 Peter Hutterer 2008-06-11 23:32:16 UTC
Created attachment 17070
syslog with debug enabled (tail -f /var/log/syslog > syslog.out)

drm/drivers, server, everything from git.

without debug enabled, the only message appearing is:
[ 4893.520000] [drm:radeon_cp_idle] *ERROR* radeon_cp_idle called without lock held, held  0 owner da1d9100 da1d9100
Comment 6 Peter Hutterer 2008-06-11 23:38:32 UTC
Breakpoint 1, RADEONWaitForIdleCP (pScrn=0x826a948) at radeon_commonfuncs.c:706
706                 xf86DrvMsg(pScrn->scrnIndex, X_ERROR,
(gdb) bt
#0  RADEONWaitForIdleCP (pScrn=0x826a948) at radeon_commonfuncs.c:706
#1  0xb77727e8 in XAAGetImage (pDraw=0x89cc7f8, sx=0, sy=0, w=486, h=383, 
    format=2, planemask=4294967295, pdstLine=0xa5cca008 "") at xaaInit.c:262
#2  0xb77c527b in cwGetImage (pSrc=0x89cc7f8, x=0, y=0, w=486, h=383, 
    format=2, planemask=4294967295, pdstLine=0xa5cca008 "") at cw.c:354
#3  0x08152de2 in miSpriteGetImage (pDrawable=0x89cc7f8, sx=0, sy=0, w=486, 
    h=383, format=2, planemask=4294967295, pdstLine=0xa5cca008 "")
    at misprite.c:321
#4  0xb7a2c257 in __glXDRIbindTexImage (baseContext=0x84a8030, buffer=8414, 
    glxPixmap=0x89ccd60) at glxdri.c:488
#5  0xb7a17a27 in __glXDisp_BindTexImageEXT (cl=0x84aeb4c, pc=0x89c5518 "F")
    at glxcmds.c:1540
#6  0xb7a19168 in __glXDisp_VendorPrivate (cl=0x84aeb4c, 
    pc=0x89c550c "\233\020\006") at glxcmds.c:2255
#7  0xb7a1e1aa in __glXDispatch (client=0x84a8fc8) at glxext.c:492
#8  0x0808c712 in Dispatch () at dispatch.c:448
#9  0x08071ed4 in main (argc=1, argv=0xbfb40cf4, envp=0xbfb40cfc) at main.c:415


Full backtrace:

(gdb) bt full
#0  RADEONWaitForIdleCP (pScrn=0x826a948) at radeon_commonfuncs.c:706
        ret = -22
        info = (RADEONInfoPtr) 0x826ae88
        RADEONMMIO = (
    unsigned char *) 0xb78c1000 <Address 0xb78c1000 out of bounds>
        i = 0
        __FUNCTION__ = "RADEONWaitForIdleCP"
#1  0xb77727e8 in XAAGetImage (pDraw=0x89cc7f8, sx=0, sy=0, w=486, h=383, 
    format=2, planemask=4294967295, pdstLine=0xa5cca008 "") at xaaInit.c:262
        infoRec = (XAAInfoRecPtr) 0x829fa20
        pScreen = (ScreenPtr) 0x8275f58
        infoRec = (XAAInfoRecPtr) 0x829fa20
        pScrn = (ScrnInfoPtr) 0x826a948
#2  0xb77c527b in cwGetImage (pSrc=0x89cc7f8, x=0, y=0, w=486, h=383, 
    format=2, planemask=4294967295, pdstLine=0xa5cca008 "") at cw.c:354
        pScreen = (ScreenPtr) 0x8275f58
        pBackingDrawable = (DrawablePtr) 0x89cc7f8
        src_off_x = 0
        src_off_y = 0
#3  0x08152de2 in miSpriteGetImage (pDrawable=0x89cc7f8, sx=0, sy=0, w=486, 
    h=383, format=2, planemask=4294967295, pdstLine=0xa5cca008 "")
    at misprite.c:321
        pScreen = (ScreenPtr) 0x8275f58
        pScreenPriv = (miSpriteScreenPtr) 0x829f670
        pDev = (DeviceIntPtr) 0x0
        pCursorInfo = (miCursorInfoPtr) 0x84d38c8
#4  0xb7a2c257 in __glXDRIbindTexImage (baseContext=0x84a8030, buffer=8414, 
    glxPixmap=0x89ccd60) at glxdri.c:488
        pitch = 1944
        data = (void *) 0xa5cca008
        pRegion = (RegionPtr) 0x0
        pixmap = (PixmapPtr) 0x89cc7f8
        bpp = 4
        override = 0
        texname = 8
        format = 32993
        type = 5121
        pScreen = (ScreenPtr) 0x8275f58
        driDraw = (__GLXDRIdrawable *) 0x89ccd60
        screen = (__GLXDRIscreen * const) 0x83a4ed0
        __func__ = "__glXDRIbindTexImage"
#5  0xb7a17a27 in __glXDisp_BindTexImageEXT (cl=0x84aeb4c, pc=0x89c5518 "F")
    at glxcmds.c:1540
        req = (xGLXVendorPrivateReq *) 0x89c550c
        client = (ClientPtr) 0x84a8fc8
        context = (__GLXcontext *) 0x84a8030
        pGlxDraw = (__GLXdrawable *) 0x89ccd60
        drawId = 4194374
        buffer = 8414
        error = -1214169736
#6  0xb7a19168 in __glXDisp_VendorPrivate (cl=0x84aeb4c, 
    pc=0x89c550c "\233\020\006") at glxcmds.c:2255
        req = (xGLXVendorPrivateReq *) 0x89c550c
        vendorcode = 1330
        proc = (
    __GLXdispatchVendorPrivProcPtr) 0xb7a1791a <__glXDisp_BindTexImageEXT>
#7  0xb7a1e1aa in __glXDispatch (client=0x84a8fc8) at glxext.c:492
        rendering = 0 '\0'
        stuff = (xGLXSingleReq *) 0x89c550c
        opcode = 16 '\020'
        proc = (
    __GLXdispatchSingleProcPtr) 0xb7a1910f <__glXDisp_VendorPrivate>
        cl = (__GLXclientState *) 0x84aeb4c
        retval = 0
#8  0x0808c712 in Dispatch () at dispatch.c:448
        clientReady = (int *) 0x85bdff8
        result = 0
        client = (ClientPtr) 0x84a8fc8
        nready = 0
        icheck = (HWEventQueuePtr *) 0x824f3f0
        start_tick = 780
#9  0x08071ed4 in main (argc=1, argv=0xbfb40cf4, envp=0xbfb40cfc) at main.c:415
        i = 1
        j = 2
        k = 2
        xauthfile = 0x0
        alwaysCheckForInput = {0, 1}
Comment 7 Michel Dänzer 2008-06-11 23:51:31 UTC
Created attachment 17071
Switch to X server DRI context for GetImage

Thanks for the information. Does this xserver patch fix it?
Comment 8 Peter Hutterer 2008-06-12 00:24:51 UTC
Yep, the patch fixes it. I still get the timeouts, but it doesn't lock up anymore.
Thanks.
Comment 9 Michel Dänzer 2008-06-12 00:29:21 UTC
(In reply to comment #8)
> yep, patch fixes it. I still get the timeouts, but it doesn't lock up anymore.

Weird. Did you revert your driver workaround before testing? Are you still getting messages like

[drm:radeon_cp_idle] *ERROR* radeon_cp_idle called without lock held, held  0 owner da1d9100 da1d9100

in the kernel output?
Comment 10 Peter Hutterer 2008-06-12 00:41:06 UTC
> --- Comment #9 from Michel Dänzer <michel@tungstengraphics.com>  2008-06-12 00:29:21 PST ---
> Weird. Did you revert your driver workaround before testing? Are you still
> getting messages like
> 
> [drm:radeon_cp_idle] *ERROR* radeon_cp_idle called without lock held, held  0
> owner da1d9100 da1d9100
> 
> in the kernel output?

Sorry, I should have been more clear: I still notice about 1 sec between
releasing the button and the window updating, but no error is printed.
My hack is reverted. With it I get the printed error.
Comment 11 Michel Dänzer 2008-06-12 01:14:34 UTC
(In reply to comment #10)
> Sorry, I should have been more clear: I still notice about 1 sec between
> releasing the button and the window updating, but no error is printed.

Could be XAA suckiness, is it better with EXA?

> My hack is reverted. With it I get the printed error.

Huh, I'm still confused then. Which error exactly?
Comment 12 Michel Dänzer 2008-06-12 01:16:28 UTC
Also, does the ioctl still fail with XAA? If so, is the backtrace still the same?
Comment 13 Peter Hutterer 2008-06-12 01:25:42 UTC
> Could be XAA suckiness, is it better with EXA?
yes, works fine now. thanks.

> Huh, I'm still confused then. Which error exactly?
Never mind, I can't reproduce it now.
Comment 14 Peter Hutterer 2008-06-12 17:51:17 UTC
EXA:
- without your patch: no problems, no errors in Xorg.log, no perceived
timeouts.
- with your patch: no problems, no errors in Xorg.log, no perceived
timeouts.

XAA:
- without your patch: the infinite loop as reported in comment #1.
- without your patch, with my "return;": error messages as reported in comment #1

- with your patch: no error messages in the Xorg.log, but a perceived timeout
  of ~1 sec between resizing and the actual update. Since no error is printed,
  this timeout is not fatal like without the patch.
- with your patch, with my "return;": no error messages, but a perceived
  timeout of ~1 sec between resizing and the actual update. Since no error is
  printed, this timeout is not fatal like without the patch.

One thing worth noting is that with XAA, gnome applications don't display much
more beyond the background colours until resized. gnome-calculator for example
only shows a grey and a white rectangle, no buttons etc. xterm doesn't always
render properly either, leaving half the window white. gnome-calculator
doesn't accept mouse clicks on the buttons. 
with EXA, that seems to be fine.

Do you want a syslog for any of those situations?
Comment 15 Michel Dänzer 2008-06-12 23:16:58 UTC
What matters is that the ioctl no longer fails with the patch with XAA.

XAA is known to be broken with compiz without Option "XaaNoOffscreenPixmaps".
Comment 16 Michel Dänzer 2008-06-13 02:35:12 UTC
Based on your previous comment, I'll assume that the patch does indeed fix the problem; I have pushed it, see below. I'm leaving the bug open, though, and making it a blocker for xserver 1.5, as this is a regression from 1.4.

commit 23b55a61f89f69454a3b0e3413b1f07d5fdf43aa
Author: Michel Dänzer <michel@tungstengraphics.com>
Date:   Fri Jun 13 11:13:56 2008 +0200

    AIGLX/DRI1: Switch to server context for calling pScreen->GetImage.
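
The idea behind the commit can be illustrated with a toy model: an idle call that fails with -EINVAL when the caller does not hold the hardware lock, and a wrapper that switches to the server context around the GetImage call. This is only a sketch under simplifying assumptions; every name below (lock_holder, cp_idle, get_image, get_image_in_server_context) is hypothetical and none of it is the real DRI/DRM or xserver API:

```c
#include <errno.h>

/* Toy model only: hypothetical names, not the real DRI/DRM interfaces. */
enum { NO_CONTEXT = 0, SERVER_CONTEXT = 1, CLIENT_CONTEXT = 2 };

/* Which context currently holds the hardware lock. */
static int lock_holder = NO_CONTEXT;

/* Stand-in for the CP idle ioctl: rejects callers that do not hold the lock. */
static int cp_idle(int caller)
{
    if (lock_holder != caller)
        return -EINVAL;
    return 0;
}

/* Stand-in for a GetImage path that waits for idle on behalf of the server. */
static int get_image(void)
{
    return cp_idle(SERVER_CONTEXT);
}

/* Run get_image() with the server context holding the lock,
 * then restore whatever state was in place before. */
static int get_image_in_server_context(void)
{
    int saved = lock_holder;
    int ret;

    lock_holder = SERVER_CONTEXT;  /* enter server context */
    ret = get_image();
    lock_holder = saved;           /* leave server context */
    return ret;
}
```

In this model, calling get_image() while the server does not hold the lock returns -EINVAL, which is the -22 seen in the log, while get_image_in_server_context() returns 0 and leaves the previous lock state untouched.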
Comment 17 Adam Jackson 2008-06-19 13:33:06 UTC
Cherry picked (more or less) into server-1.5-branch, closing.
