Bug 16648 - [RFE] need automatically reset driver after X crash (vbetool post doesn't work sometimes)
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: Other All
Importance: low enhancement
Assigned To: Eric Anholt
QA Contact: Xorg Project Team
Depends on:
Blocks:
Reported: 2008-07-09 02:37 UTC by Zdenek Kabelac
Modified: 2009-10-23 14:24 UTC (History)

Attachments
Working startup of Xorg (480.89 KB, text/plain)
2008-07-09 02:37 UTC, Zdenek Kabelac
Failing startup log (37.30 KB, text/plain)
2008-07-09 02:39 UTC, Zdenek Kabelac

Description Zdenek Kabelac 2008-07-09 02:37:57 UTC
Created attachment 17601 [details]
Working startup of Xorg

Hi

My X server often crashes when running OpenGL applications.
When I press SysRq+K and run vbetool post as root,
I get back to the console. However, I still have to reboot, because Xorg doesn't start again.


I'm attaching my two Xorg logs - one where the chip is initialized and works properly. The second is when the chip is blocked.
Comment 1 Zdenek Kabelac 2008-07-09 02:39:46 UTC
Created attachment 17602 [details]
Failing startup log 

This log is from startx  executed after vbetool post
Comment 2 Gordon Jin 2008-07-09 06:25:33 UTC
Why do you want to use vbetool? It is not recommended now since suspend/resume should work with the latest drm.

Comment 3 Zdenek Kabelac 2008-07-09 07:26:57 UTC
I use vbetool for the 'post mortem' case - when an OpenGL application basically deadlocks the X server - as you can see from the endlessly repeating lines at the end of the first attachment:

[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Of course, fixing vbetool won't fix the X server deadlock bug - but it would at least make debugging faster, since I wouldn't have to reboot the whole machine between retries.

So basically, when my X server deadlocks, the only 'safe' way I know so far is to press "MagicSysRq + K" and then run 'vbetool post' to get back a usable console screen - however, the X server cannot be restarted after that. I believe there should be a way to 'unlock' the GPU in software without a reboot - or am I wrong?
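
For readers unfamiliar with the "EQ overflowing" message quoted above: it appears when input events keep arriving while the server's main loop has stopped consuming them. The following is only a minimal sketch of that condition, not the X server's code; `EventQueue`, its capacity, and the event tuples are hypothetical names chosen for illustration.

```python
from collections import deque


class EventQueue:
    """Fixed-capacity input queue (hypothetical stand-in, not the server's mieq)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0

    def enqueue(self, event):
        """Called from the input side; returns False when the event is dropped."""
        if len(self.events) >= self.capacity:
            # The consumer has stopped draining: count the drop instead of blocking.
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def drain(self):
        """Called from the main loop; returns and clears all pending events."""
        pending = list(self.events)
        self.events.clear()
        return pending


queue = EventQueue(capacity=4)
# Simulate a main loop that never calls drain() while input keeps arriving.
results = [queue.enqueue(("motion", i)) for i in range(6)]
print(results)
print(queue.dropped)
```

Once the main loop resumes and calls `drain()`, new events are accepted again; a queue that stays full indicates that the consumer is stuck, which is what the log message reports.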
Comment 4 Zdenek Kabelac 2008-07-11 14:37:59 UTC
Here is the backtrace of the X server (using EXA) started after the crash.

I also tried reloading the drm and i915 kernel modules, without any difference.


Program terminated with signal 11, Segmentation fault.
[New process 3987]
#0  DRILock (pScreen=0x9ffc50, flags=0) at dri.c:2181
2181        if (!*pDRIPriv->pLockRefCount) {


#0  DRILock (pScreen=0x9ffc50, flags=0) at dri.c:2181
#1  0x00007f4a9f40ae17 in I830LeaveVT (scrnIndex=<value optimized out>, 
    flags=<value optimized out>) at i830_driver.c:3207
#2  0x00000000004611dd in AbortDDX () at xf86Init.c:1302
#3  0x00000000004f1188 in AbortServer () at log.c:406
#4  0x00000000004f1855 in FatalError (f=0x7f4a9f437024 "lockup\n") at log.c:552
#5  0x00007f4a9f400f71 in I830WaitLpRing (pScrn=0x9f75f0, n=131064, 
    timeout_millis=0) at i830_accel.c:150
#6  0x00007f4a9f40134b in i830_wait_ring_idle () at i830.h:862
#7  I830Sync (pScrn=0x9f75f0) at i830_accel.c:204
#8  0x00007f4a9e13633c in exaWaitSync (pScreen=0x9ffc50) at exa.c:1045
#9  0x00007f4a9f405cfc in i830_crtc_lock (crtc=0x9fc790) at i830_display.c:886
#10 0x00000000004a193c in xf86CrtcSetMode (crtc=0x9fc790, mode=0x7f4a9f64c3e0, 
    rotation=1, x=0, y=0) at xf86Crtc.c:253
#11 0x00007f4a9f40697c in i830GetLoadDetectPipe (output=0x9fd270, 
    mode=0x7fffa8522280, dpms_mode=0x7fffa8522694) at i830_display.c:1639
#12 0x00007f4a9f401c9c in i830_crt_detect (output=0xa63910) at i830_crt.c:355
#13 0x00000000004a22a5 in xf86ProbeOutputModes (scrn=0x9f75f0, maxX=1680, 
    maxY=1680) at xf86Crtc.c:1378
#14 0x00000000004a92bc in xf86RandR12GetInfo12 (pScreen=0x9ffc50, 
    rotations=<value optimized out>) at xf86RandR12.c:1011
#15 0x000000000051f873 in RRGetInfo (pScreen=0x9ffc50) at rrinfo.c:196
#16 0x0000000000523ad9 in ProcRRGetScreenResources (client=0xc71090)
    at rrscreen.c:345
#17 0x00000000004467d4 in Dispatch () at dispatch.c:454
#18 0x000000000042cc4d in main (argc=4, argv=0x7fffa8522ae8, 
    envp=<value optimized out>) at main.c:441





I also tried the same thing with XAA:



#0  DRIUnlock (pScreen=0x9ffbe0) at dri.c:2202
#1  0x00007ff13de4ff3a in I830WaitLpRing (pScrn=0x9f75c0, n=131064, 
    timeout_millis=0) at i830_accel.c:140
#2  0x00007ff13de5034b in i830_wait_ring_idle () at i830.h:862
#3  I830Sync (pScrn=0x9f75c0) at i830_accel.c:204
#4  0x00007ff13cb3a577 in XAALeaveVT (index=0, flags=0) at xaaInit.c:524
#5  0x00000000004611dd in AbortDDX () at xf86Init.c:1302
#6  0x00000000004f1188 in AbortServer () at log.c:406
#7  0x00000000004f1855 in FatalError (f=0x7ff13de86024 "lockup\n") at log.c:552
#8  0x00007ff13de4ff71 in I830WaitLpRing (pScrn=0x9f75c0, n=131064, 
    timeout_millis=0) at i830_accel.c:150
#9  0x00007ff13de5034b in i830_wait_ring_idle () at i830.h:862
#10 I830Sync (pScrn=0x9f75c0) at i830_accel.c:204
#11 0x00007ff13de5a153 in i830WaitSync (pScrn=0x9f75c0) at i830_driver.c:3683
#12 0x00007ff13de54cfc in i830_crtc_lock (crtc=0x9fc720) at i830_display.c:886
#13 0x00000000004a193c in xf86CrtcSetMode (crtc=0x9fc720, mode=0x7ff13e09b3e0, 
    rotation=1, x=0, y=0) at xf86Crtc.c:253
#14 0x00007ff13de5597c in i830GetLoadDetectPipe (output=0x9fd200, 
    mode=0x7fff46f71c70, dpms_mode=0x7fff46f720e4) at i830_display.c:1639
#15 0x00007ff13de50c9c in i830_crt_detect (output=0xa78c80) at i830_crt.c:355
#16 0x00000000004a22a5 in xf86ProbeOutputModes (scrn=0x9f75c0, maxX=1680, 
    maxY=1680) at xf86Crtc.c:1378
#17 0x00000000004a92bc in xf86RandR12GetInfo12 (pScreen=0x9ffbe0, 
    rotations=<value optimized out>) at xf86RandR12.c:1011
#18 0x000000000051f873 in RRGetInfo (pScreen=0x9ffbe0) at rrinfo.c:196
#19 0x0000000000523ad9 in ProcRRGetScreenResources (client=0xc86a40)
    at rrscreen.c:345
#20 0x00000000004467d4 in Dispatch () at dispatch.c:454
#21 0x000000000042cc4d in main (argc=4, argv=0x7fff46f72538, 
    envp=<value optimized out>) at main.c:441
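
Both traces share one pattern: I830WaitLpRing gives up and calls FatalError("lockup"), and the abort path (AbortDDX, then LeaveVT) goes back into the driver. In the XAA trace it re-enters I830Sync and I830WaitLpRing on the same hung ring. The sketch below models that control flow and one possible way to avoid the re-entry. It is an illustration under assumptions, not the driver's code: `FakeRing`, `wait_ring_idle`, `leave_vt`, `set_mode`, and the timeout value are all hypothetical.

```python
import time


class RingLockup(Exception):
    """Raised when the ring does not drain within the timeout."""


class FakeRing:
    """Hypothetical ring model: `pending` counts unprocessed commands."""

    def __init__(self, pending, hung=False):
        self.pending = pending
        self.hung = hung

    def poll(self):
        # A healthy GPU retires one command per poll; a hung one retires none.
        if not self.hung and self.pending > 0:
            self.pending -= 1
        return self.pending


def wait_ring_idle(ring, timeout_s=0.05, clock=time.monotonic):
    """Poll until the ring is empty, or raise RingLockup after timeout_s."""
    deadline = clock() + timeout_s
    while ring.poll() > 0:
        if clock() >= deadline:
            raise RingLockup("lockup")
    return True


def leave_vt(ring, aborting=False):
    """Teardown path. When already aborting because of a lockup, skip the
    sync so the abort handler does not wait on the same hung ring again."""
    if not aborting:
        wait_ring_idle(ring)
    return "left VT"


def set_mode(ring):
    """Mode-set path: sync first, and on lockup tear down without syncing."""
    try:
        wait_ring_idle(ring)
        return "mode set"
    except RingLockup:
        return leave_vt(ring, aborting=True)
```

With a healthy ring, `set_mode(FakeRing(3))` returns "mode set". With `FakeRing(3, hung=True)` the wait times out and the teardown completes without touching the ring a second time.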
Comment 5 Gordon Jin 2008-07-16 01:38:41 UTC
vbetool post generally works, but we don't want to rely on it, so we won't fix the vbetool issue.

But we should definitely add a new driver feature so that it can reset after an X crash (without vbetool). This work will probably happen in the kernel drm, but not very soon.
Comment 6 Gordon Jin 2008-12-14 22:31:39 UTC
Reassigning, since Eric has been working on error recovery recently.
Comment 7 Eric Anholt 2009-10-23 14:24:31 UTC
With Ben Gamari's changes, we now reset the GPU on hang, which can often get things running again, without even crashing the server.

In some cases the hardware gets so wedged we can't reset, though.  Either way, we need to fix the problems instead of relying on reset.
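
As a rough illustration of the "reset on hang" idea described above, the sketch below checks a progress counter periodically and attempts a reset after several checks without progress. It is a simplified model, not the kernel implementation: `FakeGpu`, `HangCheck`, the threshold, and the return strings are hypothetical, and the model assumes work is always queued, so lack of progress means a hang rather than an idle GPU.

```python
class FakeGpu:
    """Hypothetical GPU model exposing a progress counter and a reset hook."""

    def __init__(self, resettable=True):
        self.progress = 0
        self.hung = False
        self.resettable = resettable
        self.resets = 0

    def tick(self):
        # A healthy GPU makes progress on every tick; a hung one makes none.
        if not self.hung:
            self.progress += 1

    def reset(self):
        """Try to recover; returns False if the hardware is too wedged."""
        if not self.resettable:
            return False
        self.hung = False
        self.resets += 1
        return True


class HangCheck:
    """Declare a hang after `threshold` consecutive checks without progress."""

    def __init__(self, gpu, threshold=3):
        self.gpu = gpu
        self.threshold = threshold
        self.last_progress = gpu.progress
        self.stalled_checks = 0

    def check(self):
        """Run one periodic check; returns 'ok', 'reset', or 'wedged'."""
        if self.gpu.progress != self.last_progress:
            self.last_progress = self.gpu.progress
            self.stalled_checks = 0
            return "ok"
        self.stalled_checks += 1
        if self.stalled_checks < self.threshold:
            return "ok"
        # Stalled for too long: attempt recovery and report the outcome.
        self.stalled_checks = 0
        return "reset" if self.gpu.reset() else "wedged"
```

The "wedged" result corresponds to the case where a reset is not possible; a caller would then still need a reboot.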