Bug 36930 - xorg-server crashes in record when DRI2 is sending asynchronous reply to WaitMSC because client->requestBuffer has been already freed
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/Ext/DRI (show other bugs)
Version: 7.6 (2010.12)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
URL: http://bugs.debian.org/cgi-bin/bugrep...
Whiteboard: 2011BRB_Reviewed
Keywords: have-backtrace
Duplicates: 42475
Depends on:
Blocks: xserver-1.12
Reported: 2011-05-07 02:37 UTC by Hannu Leinonen
Modified: 2011-12-21 10:08 UTC
CC List: 4 users

See Also:
i915 platform:
i915 features:


Attachments
Various logs hopefully helping to solve the issue (25.20 KB, application/x-gzip)
2011-05-07 02:37 UTC, Hannu Leinonen
no flags
Failed attempt to write piglit test case for the crash (6.54 KB, text/plain)
2011-11-06 01:52 UTC, Pauli
no flags

Description Hannu Leinonen 2011-05-07 02:37:41 UTC
Created attachment 46416
Various logs hopefully helping to solve the issue

I'm using Ubuntu 11.04 64-bit. The crash usually happens immediately or shortly after login, and I have to log in again. Once it has crashed, it doesn't seem to happen again until the next reboot. It has been discussed on Ubuntu Launchpad here: https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/768159 and here: https://bugs.launchpad.net/ubuntu/+source/gnome-session/+bug/763313. As I'm not an expert in Linux bug reporting, please forgive my mistakes. The instructions in the former bug report led me here.

From Xorg.0.log.old I can see:

Backtrace:
[    34.123] 0: /usr/bin/X (xorg_backtrace+0x26) [0x4a2626]
[    34.123] 1: /usr/bin/X (0x400000+0x6219a) [0x46219a]
[    34.123] 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fc6179c0000+0xfc60) [0x7fc6179cfc60]
[    34.123] 3: /usr/lib/xorg/modules/extensions/librecord.so (0x7fc615375000+0x2920) [0x7fc615377920]
[    34.123] 4: /usr/bin/X (_CallCallbacks+0x34) [0x432af4]
[    34.123] 5: /usr/bin/X (WriteToClient+0x21a) [0x461c9a]
[    34.123] 6: /usr/lib/xorg/modules/extensions/libdri2.so (ProcDRI2WaitMSCReply+0x52) [0x7fc614d5cd82]
[    34.123] 7: /usr/lib/xorg/modules/extensions/libdri2.so (DRI2WaitMSCComplete+0x59) [0x7fc614d5b479]
[    34.123] 8: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fc614b08000+0x25030) [0x7fc614b2d030]
[    34.123] 9: /lib/x86_64-linux-gnu/libdrm.so.2 (drmHandleEvent+0x108) [0x7fc614f66478]
[    34.123] 10: /usr/bin/X (WakeupHandler+0x4b) [0x4322fb]
[    34.123] 11: /usr/bin/X (WaitForSomething+0x1b6) [0x45c786]
[    34.123] 12: /usr/bin/X (0x400000+0x2e032) [0x42e032]
[    34.123] 13: /usr/bin/X (0x400000+0x21a7e) [0x421a7e]
[    34.123] 14: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xff) [0x7fc616909eff]
[    34.123] 15: /usr/bin/X (0x400000+0x21629) [0x421629]
[    34.124] Segmentation fault at address 0x7fc6185a2010
[    34.124] 
Caught signal 11 (Segmentation fault). Server aborting
[    34.124] 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help.
Comment 1 Pauli 2011-10-28 23:45:42 UTC
static void
RecordAReply(CallbackListPtr *pcbl, pointer nulldata, pointer calldata)
{
	RecordContextPtr pContext;
	RecordClientsAndProtocolPtr pRCAP;
	int eci;
	int majorop;
	ReplyInfoRec *pri = (ReplyInfoRec *)calldata;
	ClientPtr client = pri->client;
	REQUEST(xReq); // <- gets stuff from freed pointer

	majorop = stuff->reqType; // <- crash

Same in asm:
    2820:       41 57                   push   %r15
    2822:       41 56                   push   %r14
    2824:       41 55                   push   %r13
    2826:       49 89 d5                mov    %rdx,%r13
    2829:       41 54                   push   %r12
    282b:       55                      push   %rbp
    282c:       53                      push   %rbx
    282d:       48 83 ec 28             sub    $0x28,%rsp
    2831:       4c 8b 3a                mov    (%rdx),%r15
    // client = pri->client
    2834:       49 8b 47 08             mov    0x8(%r15),%rax
    // stuff = client->requestBuffer
    2838:       44 0f b6 30             movzbl (%rax),%r14d
    // majorop = stuff->reqType


client->requestBuffer no longer holds the original request, because the
reply is only sent from the WakeupHandler that handles the drmWaitVBlank
event for DRI2WaitMSC.

I don't know the Record and callback systems well enough to figure out a
quick fix for the crash. It feels like it needs a larger refactoring.
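The sequence described above can be modelled in a small, self-contained C sketch. It is not X server code: `FakeReq`, `FakeClient`, `read_request`, `recycle_request_storage` and `record_reply_majorop` are hypothetical stand-ins for `xReq`, `ClientRec`, the OS layer that fills and later releases `client->requestBuffer`, and the start of `RecordAReply`. The opcode values are arbitrary. Because the real buffer had been freed (hence the segfault), the sketch instead overwrites it with a junk pattern so that the stale read stays well defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for xReq, the common header of every X request. */
typedef struct {
    uint8_t  reqType;   /* major opcode */
    uint8_t  data;      /* minor opcode for extension requests */
    uint16_t length;    /* request length in 4-byte units */
} FakeReq;

/* Hypothetical stand-in for ClientRec, reduced to the field that matters. */
typedef struct {
    void *requestBuffer;  /* the request currently being processed */
} FakeClient;

/* Storage that this toy "OS layer" reuses for incoming requests. */
static FakeReq request_storage;

/* Read a request into the storage and make it the current request. */
static void read_request(FakeClient *client, const FakeReq *incoming)
{
    memcpy(&request_storage, incoming, sizeof request_storage);
    client->requestBuffer = &request_storage;
}

/* After dispatch returns, the storage is recycled. The real server freed it;
 * overwriting it with a junk pattern keeps this model free of undefined behavior. */
static void recycle_request_storage(void)
{
    memset(&request_storage, 0xAB, sizeof request_storage);
}

/* Mirrors the start of RecordAReply: REQUEST(xReq) declares
 * stuff = (xReq *)client->requestBuffer, then majorop = stuff->reqType. */
static uint8_t record_reply_majorop(const FakeClient *client)
{
    /* A NULL check does not catch the bug: the pointer is stale, not NULL. */
    assert(client->requestBuffer != NULL);
    const FakeReq *stuff = client->requestBuffer;
    return stuff->reqType;
}
```

Read while the request is still being dispatched, the callback sees the right major opcode; read later, as happens when DRI2 sends the WaitMSC reply from a wakeup handler, it sees whatever now occupies that memory.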
Comment 2 Jeremy Huddleston Sequoia 2011-10-31 17:13:18 UTC
Is this a regression?
Comment 3 Julien Cristau 2011-11-01 08:26:29 UTC
*** Bug 42475 has been marked as a duplicate of this bug. ***
Comment 4 Pauli 2011-11-06 01:33:51 UTC
(In reply to comment #2)
> Is this a regression?

No. It is an old bug that was exposed by loosely related changes.
Comment 5 Pauli 2011-11-06 01:41:14 UTC
The bug is in the Record callback, which happens to be hit by DRI2 because DRI2 uses IgnoreClient and later sends the reply from a WakeupHandler.

I guess a simple fix for the DRI2-triggered crash would be to change WaitMSC to reset the current request. Later on, the wakeup handler would simply attend the client (AttendClient) so that the same WaitMSC request is handled again.

But that would still leave Record crashing on any other asynchronous reply.
Comment 6 Pauli 2011-11-06 01:52:57 UTC
Created attachment 53208
Failed attempt to write piglit test case for the crash
Comment 7 Rami Ylimaki 2011-11-07 01:54:23 UTC
(In reply to comment #5)
> But that would still leave record crashing for any other asynchronous reply.

In my opinion, option 1 from http://lists.x.org/archives/xorg-devel/2011-October/026017.html is the best way to fix this problem.

+ very easy to implement (just store the opcodes and length in ClientRec and use them in RecordAReply)
+ protects all problematic requests automatically (RecordEnableContext, ListFontsWithInfo, DRI2WaitMSC, ...)
- ClientRec is amended with data that is only needed by Record extension

I have spent some time thinking about alternative approaches, but they are usually clumsy and require a lot more code to fix the problem. I haven't provided a v2 patch yet because I've been busy with other tasks lately.
Comment 8 Jeremy Huddleston Sequoia 2011-12-21 10:08:12 UTC
Based on an email from Julien, this should now be fixed in master and server-1.11-branch with fb22a408c69a84f81905147de9e82cf66ffb6eb2

