28361 – "glresize" causes server segfault with single buffering.

Summary: "glresize" causes server segfault with single buffering.

Status:	RESOLVED FIXED

Alias:	None

Product:	xorg
Classification:	Unclassified
Component:	Driver/intel
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	Kristian Høgsberg
QA Contact:	Xorg Project Team

Reported:	2010-06-02 11:57 UTC by Nick Bowler
Modified:	2010-06-22 13:22 UTC
CC List:	1 user

Attachments
Full server log. (31.42 KB, text/plain) 2010-06-02 11:57 UTC, Nick Bowler	no flags
Full server log, crash #2 (25.78 KB, text/plain) 2010-06-02 18:14 UTC, Nick Bowler	no flags

Description Nick Bowler 2010-06-02 11:57:03 UTC

Created attachment 36014
Full server log.

So here's yet another problem related to my favourite test case, glresize,
attached to bug 27922.  I'm fairly certain this worked when I originally wrote
the test case, but I haven't yet been able to identify a specific component
which might have regressed.

Anyway, the scenario today is the --single option to glresize, which makes it
use single buffering.  After a few moments, the server inevitably segfaults
and prints the following:

  [ 14528.767] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
  [ 14528.767] (EE) intel(0): Disabling acceleration.
  [ 14528.788] 
  Backtrace:
  [ 14528.858] 0: /usr/bin/X (xorg_backtrace+0x28) [0x491818]
  [ 14528.858] 1: /usr/bin/X (0x400000+0x65ca9) [0x465ca9]
  [ 14528.858] 2: /lib/libpthread.so.0 (0x7f9df2dc9000+0xedf0) [0x7f9df2dd7df0]
  [ 14528.858] 3: /usr/local/lib/libdrm_intel.so.1 (drm_intel_bo_flink+0x0) [0x7f9defd60c60]
  [ 14528.858] 4: /usr/local/lib/xorg/modules/drivers/intel_drv.so (0x7f9deff6a000+0x2fdfd) [0x7f9deff99dfd]
  [ 14528.858] 5: /usr/lib/xorg/modules/extensions/libdri2.so (0x7f9df01b8000+0x19e7) [0x7f9df01b99e7]
  [ 14528.858] 6: /usr/lib/xorg/modules/extensions/libdri2.so (0x7f9df01b8000+0x1fdb) [0x7f9df01b9fdb]
  [ 14528.858] 7: /usr/lib/xorg/modules/extensions/libdri2.so (DRI2GetBuffersWithFormat+0x10) [0x7f9df01ba250]
  [ 14528.858] 8: /usr/lib/xorg/modules/extensions/libdri2.so (0x7f9df01b8000+0x3834) [0x7f9df01bb834]
  [ 14528.858] 9: /usr/bin/X (0x400000+0x2fc2c) [0x42fc2c]
  [ 14528.858] 10: /usr/bin/X (0x400000+0x24da5) [0x424da5]
  [ 14528.858] 11: /lib/libc.so.6 (__libc_start_main+0xe6) [0x7f9df1d60a26]
  [ 14528.858] 12: /usr/bin/X (0x400000+0x24959) [0x424959]
  [ 14528.858] Segmentation fault at address 0x20
  [ 14528.858] 
  Fatal server error:
  [ 14528.858] Caught signal 11 (Segmentation fault). Server aborting

After this occurs, the console works fine but there is significant graphical
corruption and log noise if I try to restart X.  A reboot is required to fully
recover.

The attached log is from a machine with a G45, 2.6.34 kernel, xserver 1.8.1,
git libdrm/mesa/xf86-video-intel.  I've also reproduced it on a T500 laptop
with a GM45, running git kernel/xserver/libdrm/mesa/xf86-video-intel.

Comment 1 Chris Wilson 2010-06-02 12:34:48 UTC

* renames glresize to crashme.

Comment 2 Chris Wilson 2010-06-02 15:10:09 UTC

Not seeing a GPU hang from simply running glresize --single. Are you sure that it was the trigger, and not another application?

Comment 3 Nick Bowler 2010-06-02 18:14:25 UTC

Created attachment 36022
Full server log, crash #2

OK, I'm less sure about what the problem is (or was) now.  I can't reproduce
some of the things I remember seeing at all anymore.  However, it seems that
the hang which produced the original log was actually my fault: I was
accidentally running with the wrong mesa.  The bonus is that your commit
6db1e523 ("dri: Protect against NULL dereference following GPU hang.") has
fixed this segfault anyway.
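
The commit above is cited by ID and title only; its diff isn't reproduced in this thread. As a rough illustration of the pattern the title describes, the sketch below checks for a missing buffer object before exporting it, instead of handing NULL to a flink-style call (drm_intel_bo_flink appears at +0x0 in the first backtrace, and the fault address was 0x20). All types and functions here (fake_bo, fake_pixmap, fake_bo_flink, export_buffer_name) are hypothetical stand-ins, not the driver's or libdrm's code, and the idea that the bo is NULL after a hang is an assumption drawn only from the commit title.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: NOT the real libdrm or xf86-video-intel
 * types, only enough structure to show the guard. */
typedef struct {
    uint32_t handle;
} fake_bo;

typedef struct {
    fake_bo *bo; /* assumed NULL when no buffer could be allocated,
                  * e.g. after a GPU hang disabled acceleration */
} fake_pixmap;

/* Stand-in for a flink-style export that, like a typical C API,
 * assumes its bo argument is non-NULL and dereferences it. */
static int fake_bo_flink(fake_bo *bo, uint32_t *name)
{
    *name = bo->handle; /* would fault near address 0 if bo were NULL */
    return 0;
}

/* Guarded caller: report failure instead of passing a NULL bo to
 * fake_bo_flink(). Returns 0 on success, -1 if there is no buffer. */
static int export_buffer_name(const fake_pixmap *pixmap, uint32_t *name)
{
    if (pixmap == NULL || pixmap->bo == NULL)
        return -1;
    return fake_bo_flink(pixmap->bo, name);
}
```

For example, export_buffer_name() on a pixmap whose bo field is NULL returns -1 and leaves the output name untouched, where the unguarded call would segfault.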

But while we're at it, I can now produce a different segfault, and this time
I'm *definitely* using the right mesa (git master).  All it takes is
repeatedly running and ctrl+C'ing glresize; after enough attempts, X crashes
the moment I press ctrl+C.  This occurs with both server 1.8.1 and git master...

  Backtrace:
  [   653.281] 0: /usr/bin/X (xorg_backtrace+0x28) [0x4675e8]
  [   653.282] 1: /usr/bin/X (0x400000+0x67549) [0x467549]
  [   653.282] 2: /lib/libpthread.so.0 (0x7fc0d9f6e000+0xedf0) [0x7fc0d9f7cdf0]
  [   653.282] 3: /usr/bin/X (0x400000+0x5c6ac) [0x45c6ac]
  [   653.282] 4: /usr/bin/X (LocalClient+0x2d) [0x46848d]
  [   653.282] 5: /usr/lib/xorg/modules/extensions/libdri2.so (0x7fc0d735d000+0x3@
  [   653.282] 6: /usr/bin/X (0x400000+0x522c9) [0x4522c9]
  [   653.282] 7: /usr/bin/X (0x400000+0x24bf5) [0x424bf5]
  [   653.282] 8: /lib/libc.so.6 (__libc_start_main+0xe6) [0x7fc0d8f05a26]
  [   653.283] 9: /usr/bin/X (0x400000+0x247b9) [0x4247b9]
  [   653.283] Segmentation fault at address 0x28
  [   653.283]
  Fatal server error:
  [   653.283] Caught signal 11 (Segmentation fault). Server aborting

Comment 4 Nick Bowler 2010-06-03 11:51:14 UTC

Perhaps more useful than the X log backtrace, here's the trace from the core
dump, featuring actual debugging symbols.  Taken from latest X git.

The fault occurs because ciptr (which is 0) is dereferenced.

  [snip]
  #8  <signal handler called>
  #9  0x00000000004c523b in _XSERVTransGetPeerAddr (ciptr=0x0, 
      familyp=0x7fff222d9294, addrlenp=0x7fff222d9298, addrp=0x7fff222d9288)
      at /usr/include/X11/Xtrans/Xtrans.c:987
  #10 0x0000000000482032 in LocalClient (client=0x3bbcad0) at access.c:1126
  #11 0x00007fa498b1ce62 in ProcDRI2Dispatch (client=0x3bbcad0) at dri2ext.c:559
  #12 0x000000000042d0aa in Dispatch () at dispatch.c:432
  #13 0x0000000000424ca6 in main (argc=3, argv=0x7fff222d9458, 
      envp=0x7fff222d9478) at main.c:283

Comment 5 Chris Wilson 2010-06-07 02:49:42 UTC

Hmm, this looks like another racy termination condition. I suspect that this is sufficient to fix up this instance:

diff --git a/os/access.c b/os/access.c
index 36e1b81..ed20e07 100644
--- a/os/access.c
+++ b/os/access.c
@@ -1123,6 +1123,9 @@ Bool LocalClient(ClientPtr client)
     pointer            addr;
     register HOST      *host;
 
+    if (client->clientGone)
+       return FALSE;
+
     if (!_XSERVTransGetPeerAddr (((OsCommPtr)client->osPrivate)->trans_conn,
        &notused, &alen, &from))
     {

Nick, can you try this and, if it happens again, run p *client in gdb?

Kristian, smells like more dri2 fun, over to you. ;-)

Comment 6 Nick Bowler 2010-06-07 12:58:16 UTC

I applied that patch on top of xserver git master, and the server still crashes
in exactly the same place with an identical trace (modulo line number changes).

In case it's helpful, here's the client structure at the call site of
_XSERVTransGetPeerAddr (frame 10 in the backtrace).  Note that clientGone is
zero.

  (gdb) print *client
  $1 = {index = 9, clientAsMask = 18874368, requestBuffer = 0x2d7a2d4, 
    osPrivate = 0x2bd12b0, swapped = 0, pSwapReplyFunc = 0, 
    errorValue = 18874370, sequence = 43, closeDownMode = 0, clientGone = 0, 
    noClientException = -1, saveSet = 0x0, numSaved = 0, 
    requestVector = 0x862f80, req_len = 5, big_requests = 1, priority = 0, 
    clientState = ClientStateRunning, devPrivates = 0x2cf2eb0, 
    xkbClientFlags = 32768, mapNotifyMask = 0, newKeyboardNotifyMask = 0, 
    vMajor = 1, vMinor = 0, minKC = 8 '\b', maxKC = 255 '\377', 
    replyBytesRemaining = 0, smart_priority = 0, smart_start_tick = 3920, 
    smart_stop_tick = 3920, smart_check_tick = 3920, clientPtr = 0x2a7e480}

Also, here's the osPrivate structure, whose trans_conn member is passed to
_XSERVTransGetPeerAddr.

  (gdb) print *(OsCommPtr)client->osPrivate
  $4 = {fd = 22, input = 0x2cf4e30, output = 0x2ca97b0, auth_id = 0, 
    conn_time = 0, trans_conn = 0x0}
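
The two dumps above explain why the Comment 5 patch made no difference: the client is not yet marked as gone (clientGone = 0), but its transport connection is already NULL (trans_conn = 0x0). The sketch below encodes just those two fields in hypothetical, trimmed-down structs (client_snapshot and os_comm_snapshot are illustrative, not the server's real ClientRec or OsCommRec) and evaluates the proposed early-return condition against that state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down mirrors of the two structures printed in
 * gdb; only the fields relevant to the crash are kept. */
typedef struct {
    int   fd;
    void *trans_conn; /* XtransConnInfo in the real server */
} os_comm_snapshot;

typedef struct {
    int               clientGone;
    os_comm_snapshot *osPrivate;
} client_snapshot;

/* The early-return condition added by the Comment 5 patch. */
static bool gone_guard_returns_early(const client_snapshot *c)
{
    return c->clientGone != 0;
}

/* True if, despite that guard, execution would go on to pass a NULL
 * connection to the peer-address lookup (the crash in frame #9). */
static bool reaches_null_peer_lookup(const client_snapshot *c)
{
    return !gone_guard_returns_early(c) && c->osPrivate->trans_conn == NULL;
}
```

With the values from the dumps (fd = 22, trans_conn = NULL, clientGone = 0), gone_guard_returns_early() is false and reaches_null_peer_lookup() is true, matching the identical trace reported above.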

Comment 7 Nick Bowler 2010-06-22 13:22:32 UTC

I think the second crash is fixed by xserver commit 660f6ab5494a72 ("Don't crash when asked if a client that has disconnected was local"), so I'm closing this.
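
The fixing commit is only cited by ID and title here. A guard consistent with that title and with the state shown in Comment 6 is to treat a client whose transport connection is already gone as non-local, returning before any peer-address lookup. This is a hypothetical sketch, not the text of commit 660f6ab5494a72: is_local_client(), fake_get_peer_family() and the FAKE_FAMILY_* values are illustrative stand-ins for LocalClient() and the Xtrans API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical address families for this sketch only. */
enum { FAKE_FAMILY_LOCAL = 0, FAKE_FAMILY_TCP = 1 };

/* Hypothetical stand-in for an Xtrans connection. */
typedef struct {
    int family; /* one of the FAKE_FAMILY_* values */
} fake_conn;

/* Hypothetical stand-in for the per-client OS data (OsCommRec). */
typedef struct {
    fake_conn *trans_conn; /* assumed NULL once the connection is torn down */
} fake_os_comm;

/* Stand-in for a peer-address query; like _XSERVTransGetPeerAddr() in
 * the backtrace, it dereferences its argument without checking it. */
static int fake_get_peer_family(const fake_conn *conn)
{
    return conn->family;
}

/* Guarded check: a client with no transport connection cannot be
 * local, so answer false instead of dereferencing NULL. */
static bool is_local_client(const fake_os_comm *oc)
{
    if (oc == NULL || oc->trans_conn == NULL)
        return false;
    return fake_get_peer_family(oc->trans_conn) == FAKE_FAMILY_LOCAL;
}
```

In this sketch a disconnected client is simply reported as non-local, so a caller that only serves local clients would reject its request rather than crash.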
