Bug 55260

Summary: [sna ZaphodHeads] X crashes when displayport adapter unplugged/plugged in
Product: xorg Reporter: Stephen Liang <inteldriver>
Component: Driver/intel    Assignee: Chris Wilson <chris>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description                                         Flags
Xorg logs, syslog, dmesg, and xorg conf             none
lightdm logs                                        none
GDB Backtrace                                       none
Full xorg debug logs                                none
Full xorg debug logs updated                        none
Watchpoint on rrPrivKeyRec.initialized              none
Second watch, this time while unplugged then plug   none
Test run 2 with more info                           none
Step through of rrScrPriv.initialized               none
Disable RandR hotplug events with Xinerama          none
Disable RandR hotplug events with Xinerama          none

Description Stephen Liang 2012-09-23 22:43:19 UTC
Created attachment 67596
Xorg logs, syslog, dmesg, and xorg conf

X seems to restart whenever I unplug or plug in the DisplayPort adapter. This has occurred on the latest git commit, c6008068372709c73034163eddc902b47bf87d24, and on the latest release, 2.20.8.

Each time X crashes, it successfully restarts. Upon X restarting, everything is working fine. However, it is a bit annoying since all my running applications are killed when it restarts and I have to log in again.

Xorg.1.freshboot contains the logging information from a fresh start of X
Xorg.1.unplug contains the logging information from fresh start all the way to its crash after I unplug the adapter
Xorg.2.log is the logging information from the restarted X server

Please let me know if you need any more information. There aren't any segfaults, and syslog/dmesg don't show anything out of the ordinary.
Comment 1 Chris Wilson 2012-09-24 07:28:26 UTC
There's no crash recorded. Look for a capture of stderr, perhaps.
Comment 2 Stephen Liang 2012-09-24 13:52:15 UTC
Created attachment 67632
lightdm logs

Found something in the lightdm logs for X. 

X: ../../include/privates.h:116: dixGetPrivateAddr: Assertion `key->initialized' failed.

Here's privates.h for my machine:

113 static inline void *
114 dixGetPrivateAddr(PrivatePtr *privates, const DevPrivateKey key)
115 {
116     assert(key->initialized);
117     return (char *) (*privates) + key->offset;
118 }

I attached the lightdm logs as well.
Comment 3 Chris Wilson 2012-09-24 14:41:37 UTC
Hmm, can you attach gdb and get a full bt? Hopefully I'll be able to test tomorrow but if you can grab that full backtrace I should be able to find the bug. Thanks.
Comment 4 Stephen Liang 2012-09-25 23:07:33 UTC
Created attachment 67698
GDB Backtrace

Here you go, the full backtrace. Let me know if you need anything else!
Comment 5 Stephen Liang 2012-09-25 23:10:18 UTC
Forgot to mention, this particular backtrace is for the act of plugging in the displayport adapter, although I think the action may be irrelevant.
Comment 6 Chris Wilson 2012-09-26 09:22:53 UTC
Odd, can't spot a means for that assertion to be triggered. Tried reproducing locally, with no success so far, with and without server regeneration (and confirmed that sna_handle_uevents is being invoked). Is there a required step before it crashes?
Comment 7 Stephen Liang 2012-09-26 12:23:58 UTC
There are no additional steps. It will crash no matter where I am in X. For that matter, this backtrace was done while I was at the login screen. 

Here's some additional information:

Ubuntu 12.04
Linux tigger 3.5.0-15-generic #20-Ubuntu SMP Mon Sep 17 21:37:03 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Display manager: Lightdm

Intel driver commit: 74f930fd80c3f97a1b6213e9e79e02f8f51c64b9

Hardware:
Dell XPS 13
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
Intel HD Graphics 3000
Comment 8 Stephen Liang 2012-09-29 23:33:27 UTC
I ended up upgrading to xorg-server 1.13, and it still crashes with SIGABRT (the assertion is now reported at a different line, of course):

Xorg: ../../include/privates.h:123: dixGetPrivateAddr: Assertion `key->initialized' failed.
Comment 9 Stephen Liang 2012-10-22 13:45:51 UTC
Are there any updates to this bug? The issue still occurs with 2.20.12. Thanks!
Comment 10 Chris Wilson 2012-10-22 14:31:29 UTC
No, I haven't seen this locally at all. Can you please try to reproduce the issue with --enable-debug=full and attach the Xorg.log?
Comment 11 Stephen Liang 2012-10-22 17:03:25 UTC
Created attachment 68919
Full xorg debug logs

I attached the debug logs, let me know if you need anything else.

Xorg.7.log contains everything from start to the hotplug event. Xorg.8.log is the restarted session.

Everything was done on the Ubuntu login screen.
Comment 12 Stephen Liang 2012-10-22 17:11:24 UTC
Created attachment 68920
Full xorg debug logs updated

Sorry about that, just noticed there are more debug statements in git. I pulled in the commit and reran the steps. Here are the updated logs.

Xorg.1.log contains everything from start to hotplug event
Xorg.2.log contains the restarted session

Again, everything was done on the Ubuntu login screen, with the DisplayPort adapter already plugged in and then removed to trigger the restart.
Comment 13 Chris Wilson 2012-10-22 17:36:15 UTC
Hmm, everything points towards rrPrivKeyRec being initialised -- it is used in the function that is called before the assert (and the initialisation code itself looks solid). So presumably we then have some corruption inside RRGetInfo(). In particular the function calls pScrPriv->rrGetInfo() between passing its own assertion on rrPrivKeyRec and just before calling RRScanOldConfig() and failing.

Can you attach gdb and set a watchpoint on *&rrPrivKeyRec.initialized?
Comment 14 Stephen Liang 2012-10-23 02:59:35 UTC
Created attachment 68940
Watchpoint on rrPrivKeyRec.initialized

Attached the logs, hope that helps.
Comment 15 Stephen Liang 2012-10-23 03:26:15 UTC
Created attachment 68941
Second watch, this time while unplugged then plug

I just observed something very interesting. The watch log I attached above was captured following this scenario:

1. Have displayport already plugged in
2. Launch gdb for Xorg
3. Set watchpoint & run
4. No watchpoints encountered until I unplugged the displayport adapter

For the log attached here, I followed these steps instead:

1. Have displayport not already plugged in
2. Launch gdb for Xorg
3. Set watchpoint & run
4. I encounter a watchpoint almost immediately (I followed this up with a bt full)
5. Plug in the displayport adapter
6. Nothing happens

I then repeated both scenarios while running X as usual. In the first scenario, X crashes and restarts. In the second scenario, X does not crash, but the second monitor does not display anything. After restarting X (which effectively turns this into the first scenario), the second monitor works as expected, but X then crashes again just as in the first scenario.
Comment 16 Chris Wilson 2012-10-23 08:20:13 UTC
This is the one that is obviously an issue:

Hardware access (read/write) watchpoint 1: *&rrPrivKeyRec.initialized
Num     Type           Disp Enb Address    What
1       acc watchpoint keep y              *&rrPrivKeyRec.initialized
Starting program: /usr/bin/Xorg 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[tcsetpgrp failed in terminal_inferior: Operation not permitted]
Hardware access (read/write) watchpoint 1: *&rrPrivKeyRec.initialized

Value = 0
dixGetPrivate (privates=0x5555559ccec0, key=<optimized out>)
    at ../../randr/rrinfo.c:323
323	../../randr/rrinfo.c: No such file or directory.
Continuing.


Line 323 of rrinfo.c is bogus. Can you attach the complete transcript, starting X under gdb? I expect to see the bt as we initialize the Screens and Crtcs, and never this second one. I just want to verify that we get the first bt and then nothing in between before it reads 0; every time it hits the watchpoint, grab the bt. It would also be worth checking 'p &rrPrivKeyRec.initialized'; I suspect that is not constant...
Comment 17 Chris Wilson 2012-10-23 10:35:58 UTC
Alternatively, you could 'p &rrPrivKeyRec.initialized' and then 'watch *<address>'
Comment 18 Chris Wilson 2012-10-23 10:36:37 UTC
Hmm, 'p &rrPrivKeyRec.initialized; watch *$'
Comment 19 Stephen Liang 2012-10-23 14:35:46 UTC
Created attachment 68953
Test run 2 with more info

Reran the test scenarios with the steps provided. 

Some additional info:

In scenario 1 (already plugged in then unplug):

I noticed it would not break until I pulled the DisplayPort adapter. I set a new watchpoint on (Bool *) 0x55555598d388 <rrPrivKeyRec+8>, but it never tripped.

In scenario 2 (not plugged in then plug in):

I noticed it would break on the watchpoint almost immediately, just like above. I printed the address and set a new watchpoint on it, but that watchpoint was never hit. Neither plugging in nor unplugging the DP adapter had any effect.
Comment 20 Chris Wilson 2012-10-23 14:59:24 UTC
Can you please keep experimenting? My fear is that this is a build issue.

How about if you break on RRGetInfo(), step through it manually, and run 'p rrScrPriv.initialized' at each point?
Comment 21 Stephen Liang 2012-10-23 19:31:21 UTC
Created attachment 68966
Step through of rrScrPriv.initialized

I ended up breaking on RRGetInfo() and printed out the value of rrPrivKeyRec.initialized step-by-step using:

p rrPrivKeyRec.initialized

Hopefully I'm doing this correctly. There is no rrScrPriv but there is a rrScrPrivPtr (so I just assumed it was rrPrivKeyRec.initialized, which is what I'm printing out here)

This is using only Scenario 1.
Comment 22 Chris Wilson 2012-10-23 20:00:19 UTC
Is there a chance you can come onto IRC (irc.freenode.net #intel-gfx) and I can ask you check a few more things in gdb? (I'm ickle on #intel-gfx)
Comment 23 Stephen Liang 2012-10-23 20:19:01 UTC
Unfortunately, not for another couple of hours. 3-4 hours. If that's not too good for you, just let me know what to run for now and I can grab the data later today.
Comment 24 Chris Wilson 2012-10-23 20:27:54 UTC
(In reply to comment #23)
> Unfortunately, not for another couple of hours. 3-4 hours. 

Heh, had been planning on an early night.

> If that's not too
> good for you, just let me know what to run for now and I can grab the data
> later today.

For scenario 1, I really want to sanity check that RRInit() is being called during sna_pre_init() [that's right at the very start of X loading]. That initializes rrPrivKeyRec and so should prevent the assertions. If it is being called, then the question is where the flag gets cleared, triggering the assertion. That is partly why I'm wondering if that memory address is being clobbered elsewhere.

If you could build X with -O0 -g3, that would be a big help when running it through gdb. Running it under valgrind would help too.
Comment 25 Chris Wilson 2012-10-23 22:49:34 UTC
Created attachment 68970
Disable RandR hotplug events with Xinerama
Comment 26 Chris Wilson 2012-10-23 23:05:12 UTC
Created attachment 68971
Disable RandR hotplug events with Xinerama

Better patch, no ifdef required.
Comment 27 Stephen Liang 2012-10-23 23:50:25 UTC
Patch works. I'm noticing some other bugs, but I will open new reports for those. Thanks for your patience while working through this! :)
Comment 28 Chris Wilson 2012-10-24 07:56:55 UTC
commit 1a489142c8e6a4828348cc9afbd0f430d3b1e2d8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Oct 23 23:43:50 2012 +0100

    sna: Disable RandR hotplug events if Xinerama is enabled
    
    Since RandR itself is disabled if Xinerama is enabled, for example with
    ZaphodHeads, calling RRGetInfo() upon a hotplug event generates an
    assertion.
    
    Reported-by: Stephen Liang <inteldriver@angrywalls.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55260
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
