Bug 28361

Summary:

"glresize" causes server segfault with single buffering.

Product:

xorg

Reporter:

Nick Bowler <nbowler>

Component:

Driver/intel

Assignee:

Kristian Høgsberg <krh>

Status:

RESOLVED FIXED

QA Contact:

Xorg Project Team <xorg-team>

Severity:

normal

Priority:

medium

CC:

chris

Version:

unspecified

Hardware:

Other

OS:

All

Whiteboard:

i915 platform:

i915 features:

Attachments:

Description	Flags
Full server log.	none
Full server log, crash #2	none

Description Nick Bowler 2010-06-02 11:57:03 UTC

Created attachment 36014 [details]
Full server log.

So here's yet another problem related to my favourite test case, glresize,
attached to bug 27922.  I'm fairly certain this worked when I originally wrote
the test case, but I haven't yet been able to identify a specific component
which might have regressed.

Anyway, the scenario today is the --single option to glresize, which makes it
use single buffering.  After a few moments, the server inevitably segfaults
and prints the following:

  [ 14528.767] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
  [ 14528.767] (EE) intel(0): Disabling acceleration.
  [ 14528.788] 
  Backtrace:
  [ 14528.858] 0: /usr/bin/X (xorg_backtrace+0x28) [0x491818]
  [ 14528.858] 1: /usr/bin/X (0x400000+0x65ca9) [0x465ca9]
  [ 14528.858] 2: /lib/libpthread.so.0 (0x7f9df2dc9000+0xedf0) [0x7f9df2dd7df0]
  [ 14528.858] 3: /usr/local/lib/libdrm_intel.so.1 (drm_intel_bo_flink+0x0) [0x7f9defd60c60]
  [ 14528.858] 4: /usr/local/lib/xorg/modules/drivers/intel_drv.so (0x7f9deff6a000+0x2fdfd) [0x7f9deff99dfd]
  [ 14528.858] 5: /usr/lib/xorg/modules/extensions/libdri2.so (0x7f9df01b8000+0x19e7) [0x7f9df01b99e7]
  [ 14528.858] 6: /usr/lib/xorg/modules/extensions/libdri2.so (0x7f9df01b8000+0x1fdb) [0x7f9df01b9fdb]
  [ 14528.858] 7: /usr/lib/xorg/modules/extensions/libdri2.so (DRI2GetBuffersWithFormat+0x10) [0x7f9df01ba250]
  [ 14528.858] 8: /usr/lib/xorg/modules/extensions/libdri2.so (0x7f9df01b8000+0x3834) [0x7f9df01bb834]
  [ 14528.858] 9: /usr/bin/X (0x400000+0x2fc2c) [0x42fc2c]
  [ 14528.858] 10: /usr/bin/X (0x400000+0x24da5) [0x424da5]
  [ 14528.858] 11: /lib/libc.so.6 (__libc_start_main+0xe6) [0x7f9df1d60a26]
  [ 14528.858] 12: /usr/bin/X (0x400000+0x24959) [0x424959]
  [ 14528.858] Segmentation fault at address 0x20
  [ 14528.858] 
  Fatal server error:
  [ 14528.858] Caught signal 11 (Segmentation fault). Server aborting

After this occurs, the console works fine but there is significant graphical
corruption and log noise if I try to restart X.  A reboot is required to fully
recover.

The attached log is from a machine with a G45, 2.6.34 kernel, xserver 1.8.1,
git libdrm/mesa/xf86-video-intel.  I've also reproduced it on a T500 laptop
with a GM45, running git kernel/xserver/libdrm/mesa/xf86-video-intel.

Comment 1 Chris Wilson 2010-06-02 12:34:48 UTC

* renames glresize to crashme.

Comment 2 Chris Wilson 2010-06-02 15:10:09 UTC

Not seeing a GPU hang from simply running glresize --single. Are you sure that it was the trigger, and not another application?

Comment 3 Nick Bowler 2010-06-02 18:14:25 UTC

Created attachment 36022 [details]
Full server log, crash #2

OK, I'm less sure about what the problem is (or was) now.  I can't reproduce
some of the things I remember seeing at all anymore.  However, it seems the
hang behind the original log was actually my fault: I was accidentally
running with the wrong mesa.  The bonus is that your commit, 6db1e523 ("dri:
Protect against NULL dereference following GPU hang."), has fixed this segfault
anyway.

But while we're at it, I can now produce a different segfault (this time I'm
*definitely* using the right mesa git master) by repeatedly running glresize
and pressing ctrl+C enough times.  When it finally goes, X crashes the moment
I press ctrl+C.  It occurs with both server 1.8.1 and git master:

  Backtrace:
  [   653.281] 0: /usr/bin/X (xorg_backtrace+0x28) [0x4675e8]
  [   653.282] 1: /usr/bin/X (0x400000+0x67549) [0x467549]
  [   653.282] 2: /lib/libpthread.so.0 (0x7fc0d9f6e000+0xedf0) [0x7fc0d9f7cdf0]
  [   653.282] 3: /usr/bin/X (0x400000+0x5c6ac) [0x45c6ac]
  [   653.282] 4: /usr/bin/X (LocalClient+0x2d) [0x46848d]
  [   653.282] 5: /usr/lib/xorg/modules/extensions/libdri2.so (0x7fc0d735d000+0x3@
  [   653.282] 6: /usr/bin/X (0x400000+0x522c9) [0x4522c9]
  [   653.282] 7: /usr/bin/X (0x400000+0x24bf5) [0x424bf5]
  [   653.282] 8: /lib/libc.so.6 (__libc_start_main+0xe6) [0x7fc0d8f05a26]
  [   653.283] 9: /usr/bin/X (0x400000+0x247b9) [0x4247b9]
  [   653.283] Segmentation fault at address 0x28
  [   653.283]
  Fatal server error:
  [   653.283] Caught signal 11 (Segmentation fault). Server aborting

Comment 4 Nick Bowler 2010-06-03 11:51:14 UTC

Perhaps more useful than the X log backtrace, here's the trace from the core
dump, featuring actual debugging symbols.  Taken from latest X git.

The fault occurs because ciptr (which is 0) is dereferenced.

  [snip]
  #8  <signal handler called>
  #9  0x00000000004c523b in _XSERVTransGetPeerAddr (ciptr=0x0, 
      familyp=0x7fff222d9294, addrlenp=0x7fff222d9298, addrp=0x7fff222d9288)
      at /usr/include/X11/Xtrans/Xtrans.c:987
  #10 0x0000000000482032 in LocalClient (client=0x3bbcad0) at access.c:1126
  #11 0x00007fa498b1ce62 in ProcDRI2Dispatch (client=0x3bbcad0) at dri2ext.c:559
  #12 0x000000000042d0aa in Dispatch () at dispatch.c:432
  #13 0x0000000000424ca6 in main (argc=3, argv=0x7fff222d9458, 
      envp=0x7fff222d9478) at main.c:283
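
One way to read the small fault address: when a struct pointer is NULL, reading
a member faults at that member's offset from zero.  The sketch below is only an
illustration.  struct conn_sketch and both helper functions are invented here
and are not the real Xtrans layout; the padding is chosen so that a "family"
member lands at offset 0x28, the address reported in the X log backtrace from
comment 3.  Whether the real structure is laid out this way is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a connection-info struct.
 * NOT the real Xtrans definition; the layout is assumed for illustration. */
struct conn_sketch {
    uint64_t earlier_members[5];  /* 40 bytes standing in for earlier fields */
    int32_t  family;              /* therefore at offset 40 == 0x28 */
};

/* Compile-time check that the sketch layout really puts family at 0x28. */
static_assert(offsetof(struct conn_sketch, family) == 0x28,
              "family is expected at offset 0x28 in this sketch");

/* Address touched when reading a member at `offset` through pointer `base`. */
uintptr_t fault_address(uintptr_t base, size_t offset)
{
    return base + offset;
}

/* Address a read of ->family would touch if the struct pointer were NULL. */
uintptr_t null_family_read_address(void)
{
    return fault_address((uintptr_t)0, offsetof(struct conn_sketch, family));
}
```

No pointer is dereferenced here; offsetof only computes where such a read
would land.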

Comment 5 Chris Wilson 2010-06-07 02:49:42 UTC

Hmm, this looks like another racy termination condition. I suspect that this is sufficient to fix this instance:

diff --git a/os/access.c b/os/access.c
index 36e1b81..ed20e07 100644
--- a/os/access.c
+++ b/os/access.c
@@ -1123,6 +1123,9 @@ Bool LocalClient(ClientPtr client)
     pointer            addr;
     register HOST      *host;
 
+    if (client->clientGone)
+       return FALSE;
+
     if (!_XSERVTransGetPeerAddr (((OsCommPtr)client->osPrivate)->trans_conn,
        &notused, &alen, &from))
     {

Nick, can you try this and, if it happens again, print *client in gdb (p *client)?

Kristian, smells like more dri2 fun, over to you. ;-)

Comment 6 Nick Bowler 2010-06-07 12:58:16 UTC

I applied that patch on top of xserver git master, and the server still crashes
in exactly the same place with an identical trace (modulo line number changes).

In case it's helpful, here's the client structure at the call site of
_XSERVTransGetPeerAddr (frame 10 in the backtrace).  Note that clientGone is
zero.

  (gdb) print *client
  $1 = {index = 9, clientAsMask = 18874368, requestBuffer = 0x2d7a2d4, 
    osPrivate = 0x2bd12b0, swapped = 0, pSwapReplyFunc = 0, 
    errorValue = 18874370, sequence = 43, closeDownMode = 0, clientGone = 0, 
    noClientException = -1, saveSet = 0x0, numSaved = 0, 
    requestVector = 0x862f80, req_len = 5, big_requests = 1, priority = 0, 
    clientState = ClientStateRunning, devPrivates = 0x2cf2eb0, 
    xkbClientFlags = 32768, mapNotifyMask = 0, newKeyboardNotifyMask = 0, 
    vMajor = 1, vMinor = 0, minKC = 8 '\b', maxKC = 255 '\377', 
    replyBytesRemaining = 0, smart_priority = 0, smart_start_tick = 3920, 
    smart_stop_tick = 3920, smart_check_tick = 3920, clientPtr = 0x2a7e480}

Also, here's the osPrivate structure, of which the trans_conn member is
passed to _XSERVTransGetPeerAddr.

  (gdb) print *(OsCommPtr)client->osPrivate
  $4 = {fd = 22, input = 0x2cf4e30, output = 0x2ca97b0, auth_id = 0, 
    conn_time = 0, trans_conn = 0x0}
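
The two dumps above suggest why the clientGone test alone does not help:
clientGone is still 0 while trans_conn is already NULL.  A minimal sketch of a
guard that checks the transport pointer itself might look like the following.
The struct and function names (os_comm_sketch, client_sketch,
client_has_transport) are hypothetical stand-ins, not the xserver types, and
this is not a copy of any actual commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; the real xserver structures have many more fields. */
struct os_comm_sketch {
    int   fd;
    void *trans_conn;   /* NULL once the transport has been torn down */
};

struct client_sketch {
    int   clientGone;
    void *osPrivate;    /* points at a struct os_comm_sketch, or NULL */
};

/* Returns true only when it would be safe to query the peer address. */
bool client_has_transport(const struct client_sketch *client)
{
    assert(client != NULL);

    if (client->clientGone)          /* the check from the proposed patch */
        return false;
    if (client->osPrivate == NULL)   /* no OS-level state at all */
        return false;

    const struct os_comm_sketch *oc = client->osPrivate;
    return oc->trans_conn != NULL;   /* the case seen in the dump above */
}
```

With values like those in the dump (clientGone = 0, trans_conn = 0x0), this
guard returns false instead of handing a NULL pointer to the peer-address
lookup.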

Comment 7 Nick Bowler 2010-06-22 13:22:32 UTC

I think the second crash is fixed by xserver commit 660f6ab5494a72 ("Don't crash when asked if a client that has disconnected was local"), so I'm closing this.
