Bug 27497

Summary: xorg crashes after update to 1.8.0
Product: xorg
Reporter: Fryderyk Dziarmagowski <fdziarmagowski>
Component: Driver/intel
Assignee: Kristian Høgsberg <krh>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: major
Priority: medium
CC: chris, mrgrim, remi, rmcauley, tsdh, vcunat
Version: 7.5 (2009.10)
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments (description: flags):
  Xorg.0.log: none
  simple setup: none
  Check if !clientGone before writing swap event: none
  Check for null client->osPrivate in DRI2: none

Description Fryderyk Dziarmagowski 2010-04-06 10:33:21 UTC
Created attachment 34721 [details]
Xorg.0.log

How to reproduce:
Switch between 3D screensavers in the gnome-screensaver preferences dialog.

Program received signal SIGSEGV, Segmentation fault.
WriteToClient (who=0xa63e6b8, count=32, __buf=0xbf944a1c) at io.c:702
702	    ConnectionOutputPtr oco = oc->output;
(gdb) thread apply all bt

Thread 1 (Thread 0xb76ec9d0 (LWP 2861)):
#0  WriteToClient (who=0xa63e6b8, count=32, __buf=0xbf944a1c) at io.c:702
#1  0x0807bff8 in WriteEventsToClient (pClient=0xa63e6b8, count=1, events=0xbf944a1c) at events.c:5770
#2  0xb7687897 in DRI2SwapEvent (client=0xa63e6b8, data=0xa31eb48, type=2, ust=1270574518783976, msc=27450, sbc=4658) at dri2ext.c:369
#3  0xb7686d69 in DRI2SwapComplete (client=0xa63e6b8, pDraw=0xa31eb48, frame=27450, tv_sec=1270574518, tv_usec=783976, type=2, swap_complete=0xa63e6b8, 
    swap_data=0xa31eb48) at dri2.c:546
#4  0xb7659247 in I830DRI2FrameEventHandler (frame=27450, tv_sec=1270574518, tv_usec=783976, event_data=0xa8b6820) at i830_dri.c:562
#5  0xb7654a5a in drmmode_vblank_handler (fd=8, frame=27450, tv_sec=1270574518, tv_usec=783976, event_data=0xa8b6820) at drmmode_display.c:1400
#6  0xb76769d6 in drmHandleEvent (fd=8, evctx=0x8c2a140) at xf86drmMode.c:776
#7  0xb76549be in drm_wakeup_handler (data=0x8c2a130, err=1, p=0x81e6120) at drmmode_display.c:1425
#8  0x0807980c in WakeupHandler (result=1, pReadmask=0x81e6120) at dixutils.c:403
#9  0x080a4c17 in WaitForSomething (pClientsReady=0xa2f36a0) at WaitFor.c:232
#10 0x0806d3de in Dispatch () at dispatch.c:375
#11 0x08066795 in main (argc=9, argv=0xbf945414, envp=0xbf94543c) at main.c:286

setup:
Integrated Graphics Chipset: Intel(R) G45/G43
xserver 1.8.0
Mesa 7.8.0
intel driver 2.11.0
linux 2.6.33.2

There is nothing unusual in Xorg.0.log or in the kernel log, and i915_error_state stays calm.

I'm not sure it is related, but since the upgrade I have been getting hard lockups :(
Hard days for Intel users... (well, see my other bugs for details ;)
Comment 1 Fryderyk Dziarmagowski 2010-04-06 10:33:51 UTC
Created attachment 34722 [details]
simple setup
Comment 2 Fryderyk Dziarmagowski 2010-04-08 08:12:43 UTC
Removing the Mesa DRI driver (i965_dri.so) brings stability back to my system (the Xorg server crash no longer occurs). Even the hard lockup I mentioned goes away (a deep one; even the NMI watchdog does not help).
Comment 3 Fryderyk Dziarmagowski 2010-04-13 06:55:01 UTC
The easiest way to reproduce the crash is the "full screen preview" of a 3D screensaver in gnome-screensaver.
Comment 4 Tassilo Horn 2010-04-13 22:46:39 UTC
I have suffered from the same problem since updating to the xf86-video-intel-2.11.0 driver.  After downgrading to 2.10.0, the system is stable again.

The bug has occurred for some Gentoo users.  See these bugs:

  http://bugs.gentoo.org/show_bug.cgi?id=314935
  http://bugs.gentoo.org/show_bug.cgi?id=310829
Comment 5 Rémi Cardona 2010-04-13 23:30:20 UTC
Please also attach the output of dmesg after the first crash.

Thanks
Comment 6 Fryderyk Dziarmagowski 2010-04-14 09:30:07 UTC
(In reply to comment #4)
> I have suffered from the same problem since updating to the
> xf86-video-intel-2.11.0 driver.  After downgrading to 2.10.0, the system is
> stable again.
> 
> The bug has occurred for some Gentoo users.  See these bugs:
> 
>   http://bugs.gentoo.org/show_bug.cgi?id=314935
>   http://bugs.gentoo.org/show_bug.cgi?id=310829

The first one is a GPU lockup and seems unrelated to this bug (an X server crash).

The second one perfectly matches the "hard lockup" I mentioned before, but it should be a separate bug (I'm about to open a new one soon).
Comment 7 Fryderyk Dziarmagowski 2010-04-14 13:14:17 UTC
Reported the freeze: https://bugs.freedesktop.org/show_bug.cgi?id=27647
Comment 8 Fryderyk Dziarmagowski 2010-04-18 02:40:26 UTC
Downgrading to xf86-video-intel-2.10.x solves the problem
Comment 9 Fryderyk Dziarmagowski 2010-05-04 02:47:42 UTC
Unfortunately 1.8.0.902 still crashes with xf86-video-intel-2.11.0

Some related changes are already present in Fedora 13
(mentioned in #27767#c8), but I am not sure what they fixed (driver? xorg?)
Comment 10 Chris Wilson 2010-05-15 11:26:59 UTC
Created attachment 35677 [details] [review]
Check if !clientGone before writing swap event
Comment 11 Fryderyk Dziarmagowski 2010-05-16 01:38:27 UTC
With the patch from comment #10 the issue is still present. This time I was able to catch two different traces:

first catch:
Thread 1 (Thread 0xb77969d0 (LWP 7632)):
#0  0xffffe424 in __kernel_vsyscall ()
#1  0x4feae225 in __libc_writev (fd=<value optimized out>, vector=<value optimized out>, count=<value optimized out>)
    at ../sysdeps/unix/sysv/linux/writev.c:51
#2  0x0809e216 in _XSERVTransSocketWritev (ciptr=0xa639ac0, buf=0xbfb9b108, size=1) at /usr/include/X11/Xtrans/Xtranssock.c:2153
#3  0x0809da7c in _XSERVTransWritev (ciptr=0xa639ac0, buf=0xbfb9b108, size=1) at /usr/include/X11/Xtrans/Xtrans.c:912
#4  0x080a5963 in FlushClient (who=0xa8e6870, oc=0xa783188, __extraBuf=0x0, extraCount=0) at io.c:898
#5  0x0809c885 in CloseDownConnection (client=0xa8e6870) at connection.c:1037
#6  0x08068057 in CloseDownClient (client=0xa8e6870) at dispatch.c:3602
#7  0x0806d585 in Dispatch () at dispatch.c:450
#8  0x080667e5 in main (argc=9, argv=0xbfb9b334, envp=0xbfb9b35c) at main.c:286

(gdb) thread apply all bt

second catch
Thread 1 (Thread 0xb77979d0 (LWP 23996)):
#0  WriteToClient (who=0xb6113b8, count=32, __buf=0xbfbd824c) at io.c:702
#1  0x0807c0ce in WriteEventsToClient (pClient=0xb6113b8, count=1, events=0xbfbd824c) at events.c:5774
#2  0xb779dc04 in DRI2SwapEvent (client=0xb6113b8, data=0xb613bb8, type=2, ust=1273994865087416, msc=218032, sbc=182) at dri2ext.c:372
#3  0xb779d136 in DRI2SwapComplete (client=0xb6113b8, pDraw=0xb613bb8, frame=218032, tv_sec=1273998860, tv_usec=909609, type=2, 
    swap_complete=0xb779db6f <DRI2SwapEvent>, swap_data=0xb613bb8) at dri2.c:573
#4  0xb77153d7 in I830DRI2FrameEventHandler (frame=218032, tv_sec=1273998860, tv_usec=909609, event_data=0xb513530) at i830_dri.c:562
#5  0xb7710bfe in drmmode_vblank_handler (fd=8, frame=218032, tv_sec=1273998860, tv_usec=909609, event_data=0xb513530) at drmmode_display.c:1400
#6  0x4f3659d6 in drmHandleEvent (fd=<value optimized out>, evctx=<value optimized out>) at xf86drmMode.c:776
#7  0xb7710b62 in drm_wakeup_handler (data=0x98c1140, err=2, p=0x81e61a0) at drmmode_display.c:1425
#8  0x080798bc in WakeupHandler (result=2, pReadmask=0x81e61a0) at dixutils.c:403
#9  0x080a4d77 in WaitForSomething (pClientsReady=0xb1f0910) at WaitFor.c:232
#10 0x0806d42e in Dispatch () at dispatch.c:375
#11 0x080667e5 in main (argc=9, argv=0xbfbd8c64, envp=0xbfbd8c8c) at main.c:286
Comment 12 Fryderyk Dziarmagowski 2010-05-16 01:44:11 UTC
And this is the bitter end of the Xorg server (1.8.1):

(gdb) continue 
Continuing.

Program received signal SIGABRT, Aborted.
0xffffe424 in __kernel_vsyscall ()
(gdb) bt
#0  0xffffe424 in __kernel_vsyscall ()
#1  0x4fe09e19 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0x4fe0b48c in *__GI_abort () at abort.c:92
#3  0x080a1e66 in OsAbort () at utils.c:1321
#4  0x080aa767 in ddxGiveUp () at xf86Init.c:1238
#5  0x080aa835 in AbortDDX () at xf86Init.c:1284
#6  0x0809b45e in AbortServer () at log.c:418
#7  0x0809ba7e in FatalError (f=0x81b15f4 "Caught signal %d (%s). Server aborting\n") at log.c:546
#8  0x0809b0b8 in OsSigHandler (signo=11, sip=0xbfbd7dcc, unused=0xbfbd7e4c) at osinit.c:156
#9  <signal handler called>
#10 WriteToClient (who=0xb6113b8, count=32, __buf=0xbfbd824c) at io.c:702
#11 0x0807c0ce in WriteEventsToClient (pClient=0xb6113b8, count=1, events=0xbfbd824c) at events.c:5774
#12 0xb779dc04 in DRI2SwapEvent (client=0xb6113b8, data=0xb613bb8, type=2, ust=1273994865087416, msc=218032, sbc=182) at dri2ext.c:372
#13 0xb779d136 in DRI2SwapComplete (client=0xb6113b8, pDraw=0xb613bb8, frame=218032, tv_sec=1273998860, tv_usec=909609, type=2, 
    swap_complete=0xb779db6f <DRI2SwapEvent>, swap_data=0xb613bb8) at dri2.c:573
#14 0xb77153d7 in I830DRI2FrameEventHandler (frame=218032, tv_sec=1273998860, tv_usec=909609, event_data=0xb513530) at i830_dri.c:562
#15 0xb7710bfe in drmmode_vblank_handler (fd=8, frame=218032, tv_sec=1273998860, tv_usec=909609, event_data=0xb513530) at drmmode_display.c:1400
#16 0x4f3659d6 in drmHandleEvent (fd=<value optimized out>, evctx=<value optimized out>) at xf86drmMode.c:776
#17 0xb7710b62 in drm_wakeup_handler (data=0x98c1140, err=2, p=0x81e61a0) at drmmode_display.c:1425
#18 0x080798bc in WakeupHandler (result=2, pReadmask=0x81e61a0) at dixutils.c:403
#19 0x080a4d77 in WaitForSomething (pClientsReady=0xb1f0910) at WaitFor.c:232
#20 0x0806d42e in Dispatch () at dispatch.c:375
#21 0x080667e5 in main (argc=9, argv=0xbfbd8c64, envp=0xbfbd8c8c) at main.c:286
Comment 13 Chris Wilson 2010-06-05 12:01:38 UTC
*** Bug 28391 has been marked as a duplicate of this bug. ***
Comment 14 Christopher James Halse Rogers 2010-06-17 18:53:08 UTC
Created attachment 36353 [details] [review]
Check for null client->osPrivate in DRI2

Here's an almost certainly wrong patch that works for me.  It's based on the observation that WriteToClient segfaults when accessing osPrivate->output, because who->osPrivate is NULL in my backtraces.

I'm not yet sure where this bad ClientPtr comes from.
Comment 15 Christopher James Halse Rogers 2010-06-17 19:52:59 UTC
Bah.  I should read the bug more thoroughly before wandering through code.

The !clientGone patch fixes the crash here.

Tested-By: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Comment 16 Chris Wilson 2010-07-03 01:08:21 UTC
Is the patch still required after the following commit? (On consistency grounds, I think either all DRI2 functions should check for clientGone or none should.)

commit 660f6ab5494a728c3ca7ba00c305e9ff06c8ecb2
Author: Simon Farnsworth <simon.farnsworth@onelan.com>
Date:   Tue Jun 22 10:13:30 2010 +0100

    Don't crash when asked if a client that has disconnected was local
    
    ProcDRI2Dispatch uses LocalClient to determine if it's safe to respond
    to a client that has made DRI2 requests which aren't sensible for
    remote clients (anything but version). When the client has disappeared
    mid-request stream (e.g. as a result of a kill -9, or a client-side
    bug), LocalClient causes the X server to follow suit, as
    ((OsCommPtr)client->osPrivate)->trans_conn is NULL at this point.
    
    The simple and obvious fix is to just return "not local" when
    trans_conn is NULL, which fixes the crash I was seeing; however Keith
    Packard pointed out that just checking trans_conn isn't enough;
    quoting Keith:
    
    "This looks almost right to me -- I reviewed the os code to see when
    _XSERVTransClose is called (which is what frees the trans_conn data) and
    found that every place which called that immediately set trans_conn to
    NULL, except for the call in CloseDownFileDescriptor which is only
    called from CloseDownConnection and which is immediately followed by
    freeing the OsCommRec and setting client->osPrivate to NULL. So, I'd
    suggest checking client->osPrivate in addition to the above check."
Comment 17 Christopher James Halse Rogers 2010-07-05 22:10:42 UTC
It looks like that commit should obsolete the patches.  I'll build 1.9RC4 and check.
Comment 18 Christopher James Halse Rogers 2010-07-14 22:14:35 UTC
The commit 660f6ab5494a728c3ca7ba00c305e9ff06c8ecb2 does fix this without the need for any further patch.
