Bug 56608

Summary: xorg-server-1.13.0: core dump on regen
Product: xorg Reporter: Martin Mokrejs <mmokrejs>
Component: Driver/intel Assignee: Chris Wilson <chris>
Status: RESOLVED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: blocker    
Priority: medium CC: alexandre.nunes
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
Xorg.0.log none
gdb stacktrace (xorg-server-1.13.0, xf86-video-intel-2.20.12) none
.config none
Xorg.0.log with --enable-debug=full none
Xorg.0.log (no intel driver, only fbdev) none
Xorg.0.log none
xdm.log none
Xorg.0.log none
xdm.log none
Xorg.0.log with '-7 SELECTIONs' none
Xorg.0.log (working, patched driver) none
Xorg.0.log (still 2.20.15 crashing) none
gdb stacktrace (xorg-server-1.13.0-r1, xf86-video-intel-2.20.15) none
Xorg.0.log (working, 1.12.4 server, xf86-video-intel-2.20.15) none
Xorg.0.log (crash, xorg-server-1.12.4+xf86-video-intel-2.20.15) none
gdb stacktrace (xorg-server-1.12.4, xf86-video-intel-2.20.15) none
Valgrind log from a session run none
Valgrind on the non-stripped version Xorg binary none
Xorg.log from the --enable-debug=full scenario none
valgrind log from the latest run none
Xorg.log from the d4164de5ccb82068e2858a90b2cd44eef82b6037 revision none
startx.log none
.xsession-errors none

Description Martin Mokrejs 2012-10-31 10:13:04 UTC
I experience the following problem on Gentoo Linux with:

x11-base/xorg-server-1.13.0  USE="nptl udev xorg -dmx -doc -ipv6 -kdrive -minimal (-selinux) -static-libs -tslib -xnest -xvfb"

Since I moved to linux-3.6.4, X has been crashing after xdm login. I think xfce4-session, or whichever application starts first, calls something in the X server that makes it crash. In other words, I can run "startx" with fvwm fine, and the xdm login screen also works fine. The crash happens only after a successful login.

I will attach the gdb stack trace from the core dump and the Xorg.0.log file. I have no /etc/X11/xorg.conf file anymore, so this is a fully automatic configuration.


[  9912.125] (II) Open ACPI successful (/var/run/acpid.socket)
[  9912.125] (II) APM registered successfully
[  9912.125] (II) intel(0): SNA initialized with SandyBridge backend
[  9912.125] (II) intel(0): HW Cursor enabled
[  9912.125] (II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.
[  9912.126] (==) intel(0): DPMS enabled
[  9912.126] (II) intel(0): Overlay video not supported on this hardware
[  9912.126] (II) intel(0): [DRI2] Setup complete
[  9912.126] (II) intel(0): [DRI2]   DRI driver: i965
[  9912.126] (II) intel(0): direct rendering: DRI2 Enabled
[  9912.126] (==) intel(0): hotplug detection: "enabled"
[  9912.126] (--) RandR disabled
[  9912.136] (II) AIGLX: enabled GLX_MESA_copy_sub_buffer
[  9912.136] (II) AIGLX: enabled GLX_INTEL_swap_event
[  9912.136] (II) AIGLX: enabled GLX_ARB_create_context
[  9912.136] (II) AIGLX: enabled GLX_ARB_create_context_profile
[  9912.136] (II) AIGLX: enabled GLX_EXT_create_context_es2_profile
[  9912.136] (II) AIGLX: enabled GLX_SGI_swap_control and GLX_MESA_swap_control
[  9912.136] (II) AIGLX: GLX_EXT_texture_from_pixmap backed by buffer objects
[  9912.137] (II) AIGLX: Loaded and initialized i965
[  9912.137] (II) GLX: Initialized DRI2 GL provider for screen 0
[  9912.137] (EE) 
[  9912.137] (EE) Backtrace:
[  9912.137] (EE) 0: /usr/bin/X (xorg_backtrace+0x41) [0x60ebb9]
[  9912.137] (EE) 1: /usr/bin/X (0x400000+0x214c10) [0x614c10]
[  9912.137] (EE) 2: /lib64/libpthread.so.0 (0x7f021f48a000+0x10bf0) [0x7f021f49abf0]
[  9912.137] (EE) 3: /usr/bin/X (0x400000+0x158711) [0x558711]
[  9912.137] (EE) 4: /usr/bin/X (0x400000+0x1587d8) [0x5587d8]
[  9912.137] (EE) 5: /usr/bin/X (CreatePicture+0x3f) [0x55aa84]
[  9912.137] (EE) 6: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f021ce5f000+0x66130) [0x7f021cec5130]
[  9912.137] (EE) 7: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f021ce5f000+0x4ea5f) [0x7f021ceada5f]
[  9912.137] (EE) 8: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f021ce5f000+0x63756) [0x7f021cec2756]
[  9912.138] (EE) 9: /usr/bin/X (0x400000+0xde545) [0x4de545]
[  9912.138] (EE) 10: /usr/bin/X (0x400000+0x27ee6) [0x427ee6]
[  9912.138] (EE) 11: /lib64/libc.so.6 (__libc_start_main+0xed) [0x7f021e13160d]
[  9912.138] (EE) 12: /usr/bin/X (0x400000+0x27a49) [0x427a49]
[  9912.138] (EE) 
[  9912.138] (EE) Segmentation fault at address 0x0
[  9912.138] 


(gdb) where
#0  0x00007f021e144b65 in raise () from /lib64/libc.so.6
#1  0x00007f021e145fdb in abort () from /lib64/libc.so.6
#2  0x0000000000617e86 in OsAbort () at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/os/utils.c:1266
#3  0x000000000049adc8 in ddxGiveUp (error=EXIT_ERR_ABORT) at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/hw/xfree86/common/xf86Init.c:1060
#4  0x000000000049aef0 in AbortDDX (error=EXIT_ERR_ABORT) at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/hw/xfree86/common/xf86Init.c:1104
#5  0x000000000061f555 in AbortServer () at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/os/log.c:652
#6  0x000000000061fa2b in FatalError (f=0x64aca8 "Caught signal %d (%s). Server aborting\n")
    at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/os/log.c:793
#7  0x0000000000614cb2 in OsSigHandler (signo=11, sip=0x7fffca762e70, unused=0x7fffca762d40)
    at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/os/osinit.c:146
#8  <signal handler called>
#9  0x0000000000558711 in dixGetPrivate (privates=0x218d5b0, key=0x8ae560 <PictureScreenPrivateKeyRec>)
    at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/include/privates.h:138
#10 0x00000000005587d8 in dixLookupPrivate (privates=0x218d5b0, key=0x8ae560 <PictureScreenPrivateKeyRec>)
    at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/include/privates.h:168
#11 0x000000000055aa84 in CreatePicture (pid=0, pDrawable=0x24a62b0, pFormat=0x21cb388, vmask=4096, vlist=0x7fffca7633b8, client=0x2172a60, error=0x7fffca7633bc)
    at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/render/picture.c:764
#12 0x00007f021cec5130 in sna_glyphs_create (sna=0x7f021c7e3010)
    at /var/tmp/portage/x11-drivers/xf86-video-intel-2.20.12/work/xf86-video-intel-2.20.12/src/sna/sna_glyphs.c:228
#13 0x00007f021ceada5f in sna_accel_create (sna=0x7f021c7e3010)
    at /var/tmp/portage/x11-drivers/xf86-video-intel-2.20.12/work/xf86-video-intel-2.20.12/src/sna/sna_accel.c:14448
#14 0x00007f021cec2756 in sna_create_screen_resources (screen=0x21ae930)
    at /var/tmp/portage/x11-drivers/xf86-video-intel-2.20.12/work/xf86-video-intel-2.20.12/src/sna/sna_driver.c:170
#15 0x00000000004de545 in xf86CrtcCreateScreenResources (screen=0x21ae930)
    at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/hw/xfree86/modes/xf86Crtc.c:706
#16 0x0000000000427ee6 in main (argc=5, argv=0x7fffca7635e8, envp=0x7fffca763618) at /var/tmp/portage/x11-base/xorg-server-1.13.0/work/xorg-server-1.13.0/dix/main.c:222
(gdb)
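
The Xorg.0.log backtrace and the gdb output above describe the same call chain. Each unsymbolised log frame is printed as "(base+offset) [address]", and base plus offset is the absolute address that gdb reports. Below is a minimal sketch of that arithmetic in Python; the regular expression and the decode_frame helper are illustrative assumptions, not part of any Xorg tool.

```python
import re

# Matches unsymbolised frames such as
# "(EE) 3: /usr/bin/X (0x400000+0x158711) [0x558711]".
FRAME_RE = re.compile(
    r"\(EE\) (?P<n>\d+): (?P<obj>\S+) "
    r"\((?P<base>0x[0-9a-f]+)\+(?P<off>0x[0-9a-f]+)\) "
    r"\[(?P<addr>0x[0-9a-f]+)\]"
)


def decode_frame(line):
    """Return (frame number, object, base + offset, logged address), or None."""
    m = FRAME_RE.search(line)
    if m is None:
        # Symbolised frames like "(CreatePicture+0x3f)" carry no load base.
        return None
    computed = int(m["base"], 16) + int(m["off"], 16)
    return int(m["n"]), m["obj"], computed, int(m["addr"], 16)


if __name__ == "__main__":
    log = [
        "[  9912.137] (EE) 3: /usr/bin/X (0x400000+0x158711) [0x558711]",
        "[  9912.137] (EE) 4: /usr/bin/X (0x400000+0x1587d8) [0x5587d8]",
        "[  9912.137] (EE) 5: /usr/bin/X (CreatePicture+0x3f) [0x55aa84]",
    ]
    for line in log:
        frame = decode_frame(line)
        if frame:
            n, obj, computed, logged = frame
            print(f"frame {n}: {obj} {computed:#x} (log says {logged:#x})")
```

For log frames 3 and 4 the computed addresses are 0x558711 and 0x5587d8, which are the addresses gdb resolves to dixGetPrivate and dixLookupPrivate in frames #9 and #10.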
Comment 1 Martin Mokrejs 2012-10-31 10:14:01 UTC
Created attachment 69344 [details]
Xorg.0.log
Comment 2 Martin Mokrejs 2012-10-31 10:14:18 UTC
Created attachment 69345 [details]
gdb stacktrace (xorg-server-1.13.0, xf86-video-intel-2.20.12)
Comment 3 Martin Mokrejs 2012-10-31 10:22:02 UTC
Created attachment 69346 [details]
.config

This is an Intel i7-based laptop with just the built-in graphics card from Intel. I suspect this could be related to a HW misconfiguration, because it seems that to get a framebuffer on the console I have to have all of these enabled or disabled (as shown; notably, KMS must be enabled) :-((:

CONFIG_AGP_INTEL=y
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=2
# CONFIG_VGA_SWITCHEROO is not set
CONFIG_DRM=y
CONFIG_DRM_KMS_HELPER=y
CONFIG_DRM_TTM=m
# CONFIG_DRM_NOUVEAU is not set
# CONFIG_DRM_I810 is not set
CONFIG_DRM_I915=y
CONFIG_DRM_I915_KMS=y
# CONFIG_DRM_GMA500 is not set
# CONFIG_STUB_POULSBO is not set
Comment 4 Chris Wilson 2012-11-01 09:35:23 UTC
Can't see a reason for this, everything looks like it should be reinitialised in the right order. Can you try compiling with --enable-debug=full and attaching the voluminous logfile after the crash?
Comment 5 Martin Mokrejs 2012-11-01 11:10:51 UTC
Created attachment 69390 [details]
Xorg.0.log with --enable-debug=full

I also added "-DDEBUG -DNDEBUG" to CFLAGS as in https://bugs.gentoo.org/show_bug.cgi?id=256034. The Xorg.0.log file does not seem to be much longer, but I hope it helps you. I don't see any difference in the gdb stacktrace from the core dump file generated by the DEBUG binary.

There are no kernel messages in dmesg, as before, but it looks like I overlooked that after each xdm login I get the following in /var/log/messages:

Nov  1 11:34:04 vostro xdm[3086]: pam_lastlog(xdm:session): conversation failed
Nov  1 11:34:04 vostro xdm[3086]: pam_unix(xdm:session): session opened for user mmokrejs by (uid=0)
Nov  1 11:34:04 vostro xdm[3086]: pam_ck_connector(xdm:session): nox11 mode, ignoring PAM_TTY :0
Nov  1 11:34:04 vostro xdm[3086]: pam_motd(xdm:session): conversation failed
Nov  1 11:34:04 vostro xdm[3086]: pam_unix(xdm:session): session closed for user mmokrejs
Nov  1 11:34:04 vostro acpid: client 3080[0:0] has disconnected
Nov  1 11:34:04 vostro acpid: client connected from 3080[0:0]
Nov  1 11:34:04 vostro acpid: 1 client rule loaded
Nov  1 11:34:05 vostro acpid: client 3080[0:0] has disconnected
Nov  1 11:34:05 vostro acpid: client connected from 3147[0:0]
Nov  1 11:34:05 vostro acpid: 1 client rule loaded
Comment 6 Chris Wilson 2012-11-01 11:13:15 UTC
Apologies, I was referring to -intel. But that did add some useful information to the Xorg.log as well. Can I have both? And a pony?
Comment 7 Martin Mokrejs 2012-11-01 12:46:26 UTC
Created attachment 69391 [details]
Xorg.0.log (no intel driver, only fbdev)

So I uninstalled the xf86-video-intel package and re-compiled the server to be sure it does not look for the driver (this was probably not necessary):

root# VIDEO_CARDS="vesa vmware fbdev" USE=debug CFLAGS="-ggdb  -pipe -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx -maes -march=native" CXXFLAGS="${CFLAGS}" emerge xorg-server

For completeness, let me emphasize that I had so far:

root# grep VIDEO_CARDS /etc/make.conf
VIDEO_CARDS="vesa intel i915 vmware fbdev i965 nouveau"
root#
Comment 8 Martin Mokrejs 2012-11-01 12:50:41 UTC
One more comment regarding the "Xorg.0.log (no intel driver, only fbdev)". A successful login into xdm as before restarts X but the Xorg.0.log file is not renamed. I believe this confirms that X did NOT crash. So maybe I really have two issues? One with intel driver and the other with PAM module? Then some new error messages in Xorg.0.log should be introduced.
Comment 9 Chris Wilson 2012-11-01 13:05:25 UTC
As expected, the fbdev log demonstrates that the crash is due to some interaction between xf86-video-intel and Xorg. I don't believe you have any other bug here yet...
Comment 10 Martin Mokrejs 2012-11-04 01:04:17 UTC
Created attachment 69508 [details]
Xorg.0.log

I still do NOT understand what is going on. I tried downgrading to the xorg-server I used previously (1.12.3), to the previous xf86-video-intel-2.20.9, and to many older mesa and other x11-drivers/ packages. No luck, although once the xdm login worked for me after a fresh bootup using an older kernel. I thought that maybe once an xdm-related crash happens something in the kernel breaks and no subsequent attempts can succeed. But this does NOT look to be the case.

I still wonder why I can start my X session as a normal user, also with DRI (confirmed by glxinfo), both via startx and xinit. So the only way I get the crash is through xdm login screen -> correct username/password -> enter -> blink to console - crash -> new xdm login window.
Comment 11 Martin Mokrejs 2012-11-04 01:05:08 UTC
Created attachment 69509 [details]
xdm.log
Comment 12 Martin Mokrejs 2012-11-04 01:22:19 UTC
Created attachment 69510 [details]
Xorg.0.log

So, now I tried another, older kernel, 3.4.17, and rebooted. During bootup the external HDMI display was NOT detected in fb console mode. The xdm login window started up. I unplugged the HDMI cable from the external LCD panel and re-plugged it. Now the screen also came up on the external display. But hey, I could log in using xdm, no crash! I am attaching the logfile. It shows that maybe some "stale" device for the video card was removed from the xfce4 ^H^H^H^H^H Xserver settings? Or was that related to the external monitor being unplugged?

For completeness, glxinfo shows DRI works, so I am happy, but who knows whether the problem will re-appear on the next restart? ;)
Comment 13 Martin Mokrejs 2012-11-04 01:24:36 UTC
Created attachment 69511 [details]
xdm.log

And xdm.log for the running instance.
Comment 14 Martin Mokrejs 2012-11-04 11:27:04 UTC
Created attachment 69512 [details]
Xorg.0.log with '-7 SELECTIONs'

So, with more tests using the 3.4.17 kernel I think I can conclude that the "xdm issue" I have has to do with some broken exit status of some module. That is something different from the core dump, of course. Upon fresh bootup I can log in using xdm. It starts xfce4-session in my case. After logout, I get the login screen back but all attempts to log in fail. If I look into the Xorg.0.log files I see

[    60.153] -7 SELECTIONs still allocated at reset
[    60.153] WINDOW: 0 objects of 40 bytes = 0 total bytes 0 private allocs

I recompiled the drm/agpgart/fbcon drivers as modules but I cannot unload them to test which one is being left in a wrong state. Would you help?

# lsmod
Module                  Size  Used by
ppp_async               6113  1 
ppp_generic            22288  5 ppp_async
slhc                    4898  1 ppp_generic
i915                  361339  2 
fbcon                  33482  76 
cfbfillrect             3125  1 i915
cfbimgblt               2164  1 i915
bitblit                 4548  1 fbcon
i2c_algo_bit            5055  1 i915
softcursor              1285  1 bitblit
cfbcopyarea             3129  1 i915
font                    7885  1 fbcon
drm_kms_helper         28642  1 i915
drm                   218563  2 i915,drm_kms_helper
iwlwifi               232294  0 
option                 20769  1 
usb_wwan               11266  1 option
usbserial              34653  4 option,usb_wwan
fb                     40138  7 i915,fbcon,drm_kms_helper,softcursor,bitblit
fbdev                    921  2 fb,fbcon
intel_agp              11204  1 i915
intel_gtt              14038  3 i915,intel_agp
Comment 15 Chris Wilson 2012-11-26 15:01:51 UTC
We went off on a tangent because the intel_drv.so is corrupt and so caused the xf86-video-fbdev driver to be used instead, with a completely different set of behaviour and bugs.
Comment 16 Chris Wilson 2012-11-26 15:50:58 UTC
commit 1e06d19a00f5a5a05369deeb3c5ae15b282c0f92
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Nov 26 15:30:09 2012 +0000

    sna: Disable shadow tracking upon regen
Comment 17 Martin Mokrejs 2012-12-03 10:24:47 UTC
(In reply to comment #16)
> commit 1e06d19a00f5a5a05369deeb3c5ae15b282c0f92
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Nov 26 15:30:09 2012 +0000
> 
>     sna: Disable shadow tracking upon regen

Confirming the patch helps when applied to 2.20.13 which I had so far. Will attach an Xorg.0.log from the running instance.

# emerge -pv xf86-video-intel

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild     U  ] x11-drivers/xf86-video-intel-2.20.14 [2.20.13] USE="dri sna udev -glamor -uxa -xvmc" 1,625 kB

# grep VIDEO /etc/make.conf
VIDEO_CARDS="vesa vmware fbdev intel i915 i965 nouveau"
#

Per Chris's answer, the fix should appear in 2.20.15.
Comment 18 Martin Mokrejs 2012-12-03 10:25:36 UTC
Created attachment 70962 [details]
Xorg.0.log (working, patched driver)
Comment 19 Martin Mokrejs 2012-12-10 09:08:16 UTC
I was puzzled initially why I had to reboot my laptop sometimes to be able to login through xdm even after the patch "sna: Disable shadow tracking upon regen". I still had issues last week and kept starting up X manually from a user account through 'startx'. I tried to patch 2.20.14, recompiled xorg-server, then 2.20.15 was out but that also seemed broken.

I am re-opening the issue and providing X server log and new gdb for 2.20.15.
Comment 20 Martin Mokrejs 2012-12-10 09:09:25 UTC
Created attachment 71258 [details]
Xorg.0.log (still 2.20.15 crashing)
Comment 21 Martin Mokrejs 2012-12-10 09:10:36 UTC
Created attachment 71259 [details]
gdb stacktrace (xorg-server-1.13.0-r1, xf86-video-intel-2.20.15)
Comment 22 Chris Wilson 2012-12-10 09:53:48 UTC
Have you tried xorg-1.12?
Comment 23 Martin Mokrejs 2012-12-10 10:19:06 UTC
(In reply to comment #22)
> Have you tried xorg-1.12?

Actually the problem appeared for me during a larger system-wide upgrade, so ... now after I downgraded to 1.12.4 and recompiled all x11-drivers I can log in through xdm (no recompilation of x11 libs, apps like xdm etc.).
Comment 24 Martin Mokrejs 2012-12-10 10:21:51 UTC
Created attachment 71260 [details]
Xorg.0.log (working, 1.12.4 server, xf86-video-intel-2.20.15)

I am attaching the logfile of the running instance of the now-downgraded 1.12.4 xorg-server.
Comment 25 Martin Mokrejs 2012-12-11 11:24:00 UTC
(In reply to comment #23)
> (In reply to comment #22)
> > Have you tried xorg-1.12?
> 
> Actually the problem appeared for me during a larger system-wide upgrade, so ...
> now after I downgraded to 1.12.4 and recompiled all x11-drivers I can log in
> through xdm (no recompilation of x11 libs, apps like xdm etc.).

No. After a fresh bootup, even with xorg-server-1.12.4, I have the same xdm crash. Will attach stacktrace and log. I really don't know what the cause is.
Comment 26 Martin Mokrejs 2012-12-11 11:26:06 UTC
Created attachment 71328 [details]
Xorg.0.log (crash, xorg-server-1.12.4+xf86-video-intel-2.20.15)

Logfile of 1.12.4 server crash with xf86-video-intel-2.20.15
Comment 27 Martin Mokrejs 2012-12-11 11:26:50 UTC
Created attachment 71329 [details]
gdb stacktrace (xorg-server-1.12.4, xf86-video-intel-2.20.15)
Comment 28 Chris Wilson 2013-02-10 14:58:09 UTC
Did you ever try valgrind to see if that helps identify the cause? Otherwise it looks peculiar to your setup.
Comment 29 Martin Mokrejs 2013-02-20 15:29:53 UTC
(In reply to comment #28)
> Did you ever try valgrind to see if that helps identify the cause? Otherwise
> it looks peculiar to your setup.

No, I would need guidance on what to do and how to use valgrind. What still puzzles me is that sometimes I get a crash and sometimes I do not (same kernel, same apps/libs, just another fresh boot or xdm restart). You never explained whether that is related to the automatic configuration (I have no xorg.conf file). The only difference I ever saw is that the order in which drivers are loaded during startup differs (from comparing Xorg.0.log files from when things worked with those from when the server crashed).
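
The log comparison described above can be made repeatable. The sketch below, in Python, pulls the module names out of two Xorg.0.log files in the order their `(II) LoadModule: "..."` lines appear and reports the first position where the orders differ. The helper names and the two sample log excerpts are made up for illustration; they are not taken from the attachments on this bug.

```python
import re

# Xorg logs record each module request as: (II) LoadModule: "name"
LOAD_RE = re.compile(r'\(II\) LoadModule: "([^"]+)"')


def load_order(log_text):
    """Return module names in the order their LoadModule lines appear."""
    return LOAD_RE.findall(log_text)


def first_difference(a, b):
    """Index of the first position where two load orders differ, else None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # Same common prefix: they differ only if one list is longer.
    return None if len(a) == len(b) else min(len(a), len(b))


if __name__ == "__main__":
    # Hypothetical excerpts standing in for a working and a crashing log.
    working = '[ 1.0] (II) LoadModule: "glx"\n[ 1.1] (II) LoadModule: "intel"\n'
    crashing = '[ 1.0] (II) LoadModule: "glx"\n[ 1.1] (II) LoadModule: "vesa"\n'
    a, b = load_order(working), load_order(crashing)
    i = first_difference(a, b)
    if i is None:
        print("same load order")
    else:
        print(f"orders diverge at position {i}: {a[i:i+1]} vs {b[i:i+1]}")
```

In practice the two strings would be read from a saved working log and a saved crashing log.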
Comment 30 Chris Wilson 2013-02-20 15:37:59 UTC
Hmm, the order in which the drivers load should be stable on a system already up and running. But at any rate, it should be immaterial...

To run X under valgrind, first make sure you have the complete set of debugging symbols. Then:

# cp /usr/bin/Xorg /tmp/Xorg # remove the setuid bit
# valgrind --trace-children=yes /tmp/Xorg -ac

$ DISPLAY=:0 gnome-session # or your favourite DE
Comment 31 Chris Wilson 2013-02-28 12:09:42 UTC
*** Bug 61613 has been marked as a duplicate of this bug. ***
Comment 32 Alexandre 2013-02-28 12:28:33 UTC
Created attachment 75688 [details]
Valgrind log from a session run

I reproduced the crash again, this time under valgrind.
Comment 33 Chris Wilson 2013-02-28 13:07:03 UTC
(In reply to comment #32)
> Created attachment 75688 [details]
> Valgrind log from a session run
> 
> I reproduced the crash again, this time under valgrind.

Hmm. Can you try again with an unstripped Xorg (i.e. so that we can see the debug symbols and the line numbers where the dangling pointers were freed)? You can also compile xf86-video-intel with --enable-debug to reduce the number of false positives from valgrind.
Comment 34 Alexandre 2013-02-28 19:01:01 UTC
Created attachment 75705 [details]
Valgrind on the non-stripped version Xorg binary
Comment 35 Alexandre 2013-02-28 19:02:20 UTC
(In reply to comment #33)
> 
> Hmm. Can you try again with an unstripped Xorg (i.e. so that we can see the
> debug symbols and the line numbers where the dangling pointers were freed)?
> You can also compile xf86-video-intel with --enable-debug to reduce the
> number of false positives from valgrind.

Done, had to compile the server just to get the symbols, hope it was worth it :P

I didn't recompile the xf86-video-intel with --enable-debug, though.
Comment 36 Chris Wilson 2013-02-28 21:26:11 UTC
Alexandre, can I ask you to do one last task: recompile xf86-video-intel with --enable-debug=full and attach the Xorg.log?
Comment 37 Alexandre 2013-03-01 11:24:37 UTC
Yep, I'll do that.
Comment 38 Alexandre 2013-03-01 11:41:40 UTC
Created attachment 75734 [details]
Xorg.log from the --enable-debug=full scenario
Comment 39 Alexandre 2013-03-01 11:42:30 UTC
Created attachment 75735 [details]
valgrind log from the latest run
Comment 40 Chris Wilson 2013-03-01 13:52:02 UTC
The only way I can see it crash like that is if ScrnInfo->pScreen is stale. I've reviewed all the code and cannot see how that would be possible. Anyway, I've thrown in a couple more assertions to test that hypothesis:

commit d4164de5ccb82068e2858a90b2cd44eef82b6037
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Mar 1 12:13:47 2013 +0000

    sna: Assert that the ScrnInfo and ScreenPtr relationship is correct
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=56608

Can you please try to reproduce again with --enable-debug[=full] and see if it hits those assertions?
Comment 41 Alexandre 2013-03-01 14:58:33 UTC
Created attachment 75742 [details]
Xorg.log from the d4164de5ccb82068e2858a90b2cd44eef82b6037 revision

This is from the extra asserts added by Chris.

It seems they weren't hit.
Comment 42 Alexandre 2013-03-01 15:00:46 UTC
Chris,

I haven't poked around in X server code for quite a few years. Last time it was XFree86 on an i486, so please enlighten me: is Xorg (or the xf86-video-intel driver) multi-threaded these days? Because this behavior seems consistent with multiple threads losing sync.
Comment 43 Chris Wilson 2013-03-01 15:38:59 UTC
> I haven't poked around in X server code for quite a few years. Last time it
> was XFree86 on an i486, so please enlighten me: is Xorg (or the
> xf86-video-intel driver) multi-threaded these days? Because this behavior
> seems consistent with multiple threads losing sync.

There is the occasional use of threads, but the phase you are in is explicitly single-threaded (teardown/setup). Yes, the symptoms would appear to be the same, as in both cases we are chasing stale pointers.

Another theory to test:

commit 904af323914e05830f17621a78b6e55b371ae5fc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Mar 1 15:34:45 2013 +0000

    sna: Assert that we do not resurrect stale pixmap across a server regen
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=56608
Comment 44 Alexandre 2013-03-01 15:51:57 UTC
Chris,

It exited w/ this assertion:


Xorg: sna_driver.c:924: sna_screen_init: Assertion `sna->freed_pixmap == ((void *)0)' failed.
Comment 45 Chris Wilson 2013-03-01 16:02:12 UTC
And now that I know what I am looking for, the lack of sna_late_close_screen() in your logs is now obvious!

commit fef6cdae9ebd217843bab2f65c87b59f8a9f782e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Mar 1 15:58:42 2013 +0000

    sna: Chain up CloseScreen
    
    Remember to call into the chained CloseScreen destructors!
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56608
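
Why a missing chain-up produces exactly the assertion seen in comment 44 can be shown with a small model. The sketch below is written in Python rather than the driver's C and is not the actual patch: the Screen and Driver classes and the two install_* helpers are hypothetical stand-ins. It only illustrates the general wrapping pattern, in which each CloseScreen hook saves the previously installed hook and must call it, otherwise destructors installed earlier never run and their state survives into the next server generation.

```python
class Screen:
    """Stand-in for the server's screen object with a wrappable hook."""

    def __init__(self):
        self.close_screen = lambda screen: True  # innermost default hook


class Driver:
    """Hypothetical driver state that must be cleared before a regen."""

    def __init__(self):
        self.freed_pixmap = None


def install_late_close(screen, driver):
    saved = screen.close_screen  # remember whatever was installed before us

    def late_close(s):
        driver.freed_pixmap = None  # drop state tied to the old generation
        s.close_screen = saved      # unwrap
        return saved(s)             # chain up

    screen.close_screen = late_close


def install_early_close(screen, chain_up):
    saved = screen.close_screen

    def early_close(s):
        s.close_screen = saved
        if chain_up:
            return saved(s)  # fixed behaviour: call the wrapped destructor
        return True          # buggy behaviour: the rest of the chain never runs

    screen.close_screen = early_close


def regen(chain_up):
    """Run one teardown and report whether stale state survives it."""
    screen, driver = Screen(), Driver()
    install_late_close(screen, driver)
    install_early_close(screen, chain_up)
    driver.freed_pixmap = object()  # state cached during the session
    screen.close_screen(screen)     # teardown before the server regenerates
    return driver.freed_pixmap is not None  # True: next init would assert


if __name__ == "__main__":
    print("stale state without chain-up:", regen(chain_up=False))
    print("stale state with chain-up:   ", regen(chain_up=True))
```

With chain_up=False the late hook is never reached, which mirrors the missing sna_late_close_screen() in the logs; with chain_up=True the cached pointer is cleared before the next initialisation.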
Comment 46 Alexandre 2013-03-01 16:06:16 UTC
Works for me! Thanks! :D
Comment 47 Martin Mokrejs 2013-07-30 20:41:41 UTC
For the whole year I have lived with logging in as a user and running startx manually.

I tried to see whether I could finally log in through the xdm window like in previous years. No, still not. I upgraded from x11-drivers/xf86-video-intel-2.21.6 to x11-drivers/xf86-video-intel-2.21.12. I do NOT get core dumps anymore, but my xdm window still disappears after I enter the Password[enter] and then reappears with Login. I am not arguing whether the issue (core dumps) is fixed or not.

So, what I want to report here is that startx at least now gives some more interesting messages (to its STDOUT and STDERR, so they are in the VT console):


<quote>
(xfce4-session:3876): GLib-WARNING **: GError set over the top of a previous GError or uninitialized memory.
This indicates a bug in someone's code. You must ensure an error is NULL before it's set.
The overwriting error message was: Could not get owner of name 'org.freedesktop.ConsoleKit': no such name

** (xfce4-session:3876): CRITICAL **: dbus_set_g_error: assertion `gerror == NULL || *gerror == NULL' failed
xfce4-session: Querying suspend failed: Could not get owner of name 'org.freedesktop.ConsoleKit': no such name

xinit: connection to X server lost

waiting for X server to shut down .Server terminated successfully (0). Closing log file.
</quote>

I will attach this as a full file.

Second, I see that at this very moment I do NOT have consolekit installed. I fear that X/xdm/pam/whoever is looking for consolekit and because of that /var/log/xdm.log says:

<quote>
xdm info (pid 3255): sourcing /usr/lib64/X11/xdm/TakeConsole
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 286 requests (209 known processed) with 0 events remaining.
xdm info (pid 3202): Starting X server on :0

...
xdm info (pid 3553): sourcing /usr/lib64/X11/xdm/Xsetup_0
xdm error (pid 3553): pam_authenticate failure: Error in service module
xdm error (pid 3553): pam_authenticate failure: Error in service module
xdm info (pid 3553): sourcing /usr/lib64/X11/xdm/GiveConsole
xdm info (pid 3682): executing session /usr/lib64/X11/xdm/Xsession
xdm info (pid 3553): sourcing /usr/lib64/X11/xdm/TakeConsole
xdm info (pid 3202): Starting X server on :0
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 118 requests (111 known processed) with 0 events remaining.
</quote>


Here is what the package handling tool on Gentoo asks me to do now:

<quote>
[ebuild     U  ] xfce-base/libxfce4util-4.10.1 [4.10.0]
[ebuild  N     ] x11-misc/xscreensaver-5.21  USE="jpeg opengl pam perl -gdm -new-login (-selinux) -suid -xinerama"
[ebuild  N     ] sys-auth/polkit-0.111  USE="gtk introspection nls pam -examples -kde (-selinux) -systemd"
[ebuild  N     ] gnome-extra/polkit-gnome-0.105
[ebuild  N     ] sys-auth/consolekit-0.4.5_p20120320-r2  USE="acl pam policykit -debug -doc (-selinux) {-test}"
[ebuild     U  ] sys-auth/pambase-20120417-r2 [20120417-r1] USE="consolekit*"
[ebuild  N     ] sys-power/upower-0.9.20-r2  USE="deprecated introspection -doc -ios -systemd"
[ebuild     U  ] xfce-base/xfce4-session-4.10.1 [4.10.0-r1] USE="-systemd%"

The following USE changes are necessary to proceed:
 (see "package.use" in the portage(5) man page for more details)
# required by sys-auth/polkit-0.111[-systemd,pam]
# required by sys-auth/consolekit-0.4.5_p20120320-r2[policykit]
>=sys-auth/pambase-20120417-r2 consolekit
# required by sys-auth/polkit-0.111[-systemd]
# required by sys-power/upower-0.9.20-r2
# required by xfce-base/xfce4-session-4.10.1[udev]
# required by @selected
# required by @world (argument)
>=sys-auth/consolekit-0.4.5_p20120320-r2 policykit
</quote>

So, could anything be done so that xdm would NOT even start if it cannot use PAM/consolekit? Or at least give a useful error message back to the XDM login window? Or xorg-server could complain during its startup?


Possibly similar issues:
http://lists.freebsd.org/pipermail/freebsd-xfce/2012-November/000599.html
http://lists.freebsd.org/pipermail/freebsd-questions/2013-June/251744.html
https://bbs.archlinux.org/viewtopic.php?id=132922
Comment 48 Martin Mokrejs 2013-07-30 20:42:50 UTC
Created attachment 83332 [details]
startx.log
Comment 49 Martin Mokrejs 2013-07-30 20:43:50 UTC
Created attachment 83333 [details]
.xsession-errors
