Created attachment 36014: Full server log.

So here's yet another problem related to my favourite test case, glresize, attached to bug 27922. I'm fairly certain this worked when I originally wrote the test case, but I haven't yet been able to identify a specific component that might have regressed.

Anyway, today's scenario is the --single option to glresize, which makes it use single buffering. After a few moments, the server inevitably segfaults and prints the following:

[ 14528.767] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
[ 14528.767] (EE) intel(0): Disabling acceleration.
[ 14528.788] Backtrace:
[ 14528.858] 0: /usr/bin/X (xorg_backtrace+0x28) [0x491818]
[ 14528.858] 1: /usr/bin/X (0x400000+0x65ca9) [0x465ca9]
[ 14528.858] 2: /lib/libpthread.so.0 (0x7f9df2dc9000+0xedf0) [0x7f9df2dd7df0]
[ 14528.858] 3: /usr/local/lib/libdrm_intel.so.1 (drm_intel_bo_flink+0x0) [0x7f9defd60c60]
[ 14528.858] 4: /usr/local/lib/xorg/modules/drivers/intel_drv.so (0x7f9deff6a000+0x2fdfd) [0x7f9deff99dfd]
[ 14528.858] 5: /usr/lib/xorg/modules/extensions/libdri2.so (0x7f9df01b8000+0x19e7) [0x7f9df01b99e7]
[ 14528.858] 6: /usr/lib/xorg/modules/extensions/libdri2.so (0x7f9df01b8000+0x1fdb) [0x7f9df01b9fdb]
[ 14528.858] 7: /usr/lib/xorg/modules/extensions/libdri2.so (DRI2GetBuffersWithFormat+0x10) [0x7f9df01ba250]
[ 14528.858] 8: /usr/lib/xorg/modules/extensions/libdri2.so (0x7f9df01b8000+0x3834) [0x7f9df01bb834]
[ 14528.858] 9: /usr/bin/X (0x400000+0x2fc2c) [0x42fc2c]
[ 14528.858] 10: /usr/bin/X (0x400000+0x24da5) [0x424da5]
[ 14528.858] 11: /lib/libc.so.6 (__libc_start_main+0xe6) [0x7f9df1d60a26]
[ 14528.858] 12: /usr/bin/X (0x400000+0x24959) [0x424959]
[ 14528.858] Segmentation fault at address 0x20
[ 14528.858] Fatal server error:
[ 14528.858] Caught signal 11 (Segmentation fault). Server aborting

After this occurs, the console works fine, but there is significant graphical corruption and log noise if I try to restart X. A reboot is required to recover fully. The attached log is from a machine with a G45, a 2.6.34 kernel, xserver 1.8.1 and git libdrm/mesa/xf86-video-intel. I've also reproduced it on a T500 laptop with a GM45, running git kernel/xserver/libdrm/mesa/xf86-video-intel.
* renames glresize to crashme.
Not seeing a GPU hang from simply running glresize --single. Are you sure that it was the trigger, and not another application?
Created attachment 36022: Full server log, crash #2.

OK, I'm less sure about what the problem is (or was) now. I can't reproduce some of the things I remember seeing at all anymore. However, it seems the hang that caused the original log was actually my fault: I accidentally ran with the wrong mesa. The bonus is that your commit 6db1e523 ("dri: Protect against NULL dereference following GPU hang.") has fixed this segfault anyway.

But while we're at it, I can now produce a different segfault (this time I'm *definitely* using the right mesa, git master) just by repeatedly running glresize and stopping it with ctrl+C enough times. When it finally goes, X crashes the moment I press ctrl+C. This occurs with both server 1.8.1 and git master:

Backtrace:
[ 653.281] 0: /usr/bin/X (xorg_backtrace+0x28) [0x4675e8]
[ 653.282] 1: /usr/bin/X (0x400000+0x67549) [0x467549]
[ 653.282] 2: /lib/libpthread.so.0 (0x7fc0d9f6e000+0xedf0) [0x7fc0d9f7cdf0]
[ 653.282] 3: /usr/bin/X (0x400000+0x5c6ac) [0x45c6ac]
[ 653.282] 4: /usr/bin/X (LocalClient+0x2d) [0x46848d]
[ 653.282] 5: /usr/lib/xorg/modules/extensions/libdri2.so (0x7fc0d735d000+0x3@
[ 653.282] 6: /usr/bin/X (0x400000+0x522c9) [0x4522c9]
[ 653.282] 7: /usr/bin/X (0x400000+0x24bf5) [0x424bf5]
[ 653.282] 8: /lib/libc.so.6 (__libc_start_main+0xe6) [0x7fc0d8f05a26]
[ 653.283] 9: /usr/bin/X (0x400000+0x247b9) [0x4247b9]
[ 653.283] Segmentation fault at address 0x28
[ 653.283] Fatal server error:
[ 653.283] Caught signal 11 (Segmentation fault). Server aborting
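The manual reproduction loop (start glresize, interrupt it with ctrl+C, repeat) can be scripted. The following is a minimal POSIX sketch, not something from this report, under these assumptions: glresize is on PATH, and sending SIGINT to the child with kill() stands in for pressing ctrl+C on a single foreground process. The helper name run_and_interrupt is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] (looked up in PATH), let it run for
 * delay_ms milliseconds, then send SIGINT, roughly as ctrl+C would.
 * Returns true only if the child was terminated by SIGINT; a program that
 * catches SIGINT and exits normally makes this return false. */
bool run_and_interrupt(char *const argv[], long delay_ms)
{
    pid_t pid = fork();
    if (pid < 0)
        return false;

    if (pid == 0) {
        /* An ignored SIGINT survives exec, so restore the default action
         * in case this harness itself was started with SIGINT ignored. */
        signal(SIGINT, SIG_DFL);
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }

    struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);

    kill(pid, SIGINT);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return false;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGINT;
}
```

Calling it in a loop, e.g. with `char *args[] = { "glresize", NULL };` and an arbitrary delay of a few hundred milliseconds, until the X server exits would approximate the manual procedure. The delay is a guess; the report does not say how long glresize ran before each ctrl+C.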
Perhaps more useful than the X log backtrace, here's the trace from the core dump, with actual debugging symbols, taken from the latest X git. The fault occurs because ciptr (which is NULL) is dereferenced.

[snip]
#8 <signal handler called>
#9 0x00000000004c523b in _XSERVTransGetPeerAddr (ciptr=0x0, familyp=0x7fff222d9294, addrlenp=0x7fff222d9298, addrp=0x7fff222d9288) at /usr/include/X11/Xtrans/Xtrans.c:987
#10 0x0000000000482032 in LocalClient (client=0x3bbcad0) at access.c:1126
#11 0x00007fa498b1ce62 in ProcDRI2Dispatch (client=0x3bbcad0) at dri2ext.c:559
#12 0x000000000042d0aa in Dispatch () at dispatch.c:432
#13 0x0000000000424ca6 in main (argc=3, argv=0x7fff222d9458, envp=0x7fff222d9478) at main.c:283
Hmm, this looks like another racy termination condition. I suspect this is sufficient to fix up this instance:

diff --git a/os/access.c b/os/access.c
index 36e1b81..ed20e07 100644
--- a/os/access.c
+++ b/os/access.c
@@ -1123,6 +1123,9 @@ Bool LocalClient(ClientPtr client)
     pointer addr;
     register HOST *host;
 
+    if (client->clientGone)
+        return FALSE;
+
     if (!_XSERVTransGetPeerAddr (((OsCommPtr)client->osPrivate)->trans_conn,
                                  &notused, &alen, &from))
     {

Nick, can you try this, and if it happens again, p *client? Kristian, smells like more dri2 fun, over to you. ;-)
I applied that patch on top of xserver git master, and the server still crashes in exactly the same place with an identical trace (modulo line-number changes).

In case it's helpful, here's the client structure at the call site of _XSERVTransGetPeerAddr (frame 10 in the backtrace). Note that clientGone is zero.

(gdb) print *client
$1 = {index = 9, clientAsMask = 18874368, requestBuffer = 0x2d7a2d4, osPrivate = 0x2bd12b0, swapped = 0, pSwapReplyFunc = 0, errorValue = 18874370, sequence = 43, closeDownMode = 0, clientGone = 0, noClientException = -1, saveSet = 0x0, numSaved = 0, requestVector = 0x862f80, req_len = 5, big_requests = 1, priority = 0, clientState = ClientStateRunning, devPrivates = 0x2cf2eb0, xkbClientFlags = 32768, mapNotifyMask = 0, newKeyboardNotifyMask = 0, vMajor = 1, vMinor = 0, minKC = 8 '\b', maxKC = 255 '\377', replyBytesRemaining = 0, smart_priority = 0, smart_start_tick = 3920, smart_stop_tick = 3920, smart_check_tick = 3920, clientPtr = 0x2a7e480}

Also, here's the osPrivate structure, whose trans_conn member is passed to _XSERVTransGetPeerAddr. Note that trans_conn is NULL.

(gdb) print *(OsCommPtr)client->osPrivate
$4 = {fd = 22, input = 0x2cf4e30, output = 0x2ca97b0, auth_id = 0, conn_time = 0, trans_conn = 0x0}
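To make the failure mode in these dumps concrete, here is a minimal model in C. OsCommModel, ClientModel, passes_client_gone_guard and can_query_peer are hypothetical stand-ins, not the server's real OsCommRec/ClientRec or the code of any commit; only the fields visible in the gdb output are modelled. It illustrates why a check on clientGone alone does not catch this state, while an additional NULL check on trans_conn would. This is a sketch of the idea, not necessarily how the eventual fix is written.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the server's OsCommRec. */
typedef struct {
    int fd;
    void *trans_conn;          /* an XtransConnInfo in the real server */
} OsCommModel;

/* Hypothetical, heavily simplified stand-in for ClientRec. */
typedef struct {
    int clientGone;
    OsCommModel *osPrivate;
} ClientModel;

/* Mirrors the patch proposed above: proceed unless clientGone is set. */
bool passes_client_gone_guard(const ClientModel *client)
{
    return !client->clientGone;
}

/* Stricter check: also refuse when the per-client OS data or its transport
 * connection is missing, which is the state in the dump
 * (clientGone == 0, trans_conn == 0x0). Returns true only when asking the
 * transport layer for the peer address would not pass a NULL connection. */
bool can_query_peer(const ClientModel *client)
{
    if (client->clientGone)
        return false;
    if (client->osPrivate == NULL)
        return false;
    if (client->osPrivate->trans_conn == NULL)
        return false;
    return true;
}
```

With the values from the dump (clientGone = 0, trans_conn = 0x0), the clientGone-only guard lets the call proceed, whereas the stricter check refuses it.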
I think the second crash is fixed by xserver commit 660f6ab5494a72 ("Don't crash when asked if a client that has disconnected was local"), so I'm closing this.