Description
Martin Mokrejs
2012-10-31 10:13:04 UTC
Created attachment 69344 [details]
Xorg.0.log
Created attachment 69345 [details]
gdb stacktrace (xorg-server-1.13.0, xf86-video-intel-2.20.12)
Created attachment 69346 [details]
.config
This is an Intel i7-based laptop with only the built-in Intel graphics card. I suspect this could be related to a HW misconfiguration: to get a framebuffer on the console I have to have all of these enabled or disabled exactly as shown (notably, KMS must be enabled) :-((
CONFIG_AGP_INTEL=y
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=2
# CONFIG_VGA_SWITCHEROO is not set
CONFIG_DRM=y
CONFIG_DRM_KMS_HELPER=y
CONFIG_DRM_TTM=m
# CONFIG_DRM_NOUVEAU is not set
# CONFIG_DRM_I810 is not set
CONFIG_DRM_I915=y
CONFIG_DRM_I915_KMS=y
# CONFIG_DRM_GMA500 is not set
# CONFIG_STUB_POULSBO is not set
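
A minimal sketch, in Python, of how the option list above could be checked against a kernel .config. The function names and the EXPECTED table are illustrative assumptions; only the option names and values are taken from the list above, and treating an absent option as "not set" is a simplification.

```python
# Values copied from the list in this report; "n" stands for "is not set".
EXPECTED = {
    "CONFIG_AGP_INTEL": "y",
    "CONFIG_VGA_ARB": "y",
    "CONFIG_VGA_SWITCHEROO": "n",
    "CONFIG_DRM": "y",
    "CONFIG_DRM_KMS_HELPER": "y",
    "CONFIG_DRM_I915": "y",
    "CONFIG_DRM_I915_KMS": "y",
}

NOT_SET = " is not set"


def parse_kconfig(text):
    """Map option name -> value ('y', 'm', ...), or 'n' for '# X is not set'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(NOT_SET):
            # Drop the leading "# " and the trailing " is not set".
            options[line[2:-len(NOT_SET)]] = "n"
    return options


def mismatches(options, expected=EXPECTED):
    """Return {name: (wanted, found)} for every option that differs."""
    result = {}
    for name, wanted in expected.items():
        found = options.get(name, "n")  # assumption: absent counts as not set
        if found != wanted:
            result[name] = (wanted, found)
    return result
```

Reading the file and calling mismatches(parse_kconfig(text)) would list any option that deviates from the combination described above, for example after switching a driver from built-in to module.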
Can't see a reason for this; everything looks like it should be reinitialised in the right order. Can you try compiling with --enable-debug=full and attaching the voluminous logfile after the crash?

Created attachment 69390 [details]
Xorg.0.log with --enable-debug=full

I also added "-DDEBUG -DNDEBUG" CFLAGS as in https://bugs.gentoo.org/show_bug.cgi?id=256034. The Xorg.0.log file is not much longer, but I hope it helps you. I don't see any difference in the gdb stacktrace from the core dump file generated by the DEBUG binary. There are no kernel messages in dmesg, as before, but it looks like I overlooked that after each xdm login I get the following in /var/log/messages:

Nov 1 11:34:04 vostro xdm[3086]: pam_lastlog(xdm:session): conversation failed
Nov 1 11:34:04 vostro xdm[3086]: pam_unix(xdm:session): session opened for user mmokrejs by (uid=0)
Nov 1 11:34:04 vostro xdm[3086]: pam_ck_connector(xdm:session): nox11 mode, ignoring PAM_TTY :0
Nov 1 11:34:04 vostro xdm[3086]: pam_motd(xdm:session): conversation failed
Nov 1 11:34:04 vostro xdm[3086]: pam_unix(xdm:session): session closed for user mmokrejs
Nov 1 11:34:04 vostro acpid: client 3080[0:0] has disconnected
Nov 1 11:34:04 vostro acpid: client connected from 3080[0:0]
Nov 1 11:34:04 vostro acpid: 1 client rule loaded
Nov 1 11:34:05 vostro acpid: client 3080[0:0] has disconnected
Nov 1 11:34:05 vostro acpid: client connected from 3147[0:0]
Nov 1 11:34:05 vostro acpid: 1 client rule loaded

Apologies, I was referring to -intel. But that did add some useful information to the Xorg.log as well. Can I have both? And a pony?

Created attachment 69391 [details]
Xorg.0.log (no intel driver, only fbdev)
So I uninstalled the xf86-video-intel package and recompiled the server to be sure it does not look for the driver (this was probably not necessary):
root# VIDEO_CARDS="vesa vmware fbdev" USE=debug CFLAGS="-ggdb -pipe -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx -maes -march=native" CXXFLAGS="${CFLAGS}" emerge xorg-server
For completeness, let me emphasize that I had so far:
root# grep VIDEO_CARDS /etc/make.conf
VIDEO_CARDS="vesa intel i915 vmware fbdev i965 nouveau"
root#
One more comment regarding the "Xorg.0.log (no intel driver, only fbdev)". A successful login into xdm restarts X as before, but the Xorg.0.log file is not renamed. I believe this confirms that X did NOT crash. So maybe I really have two issues? One with the intel driver and the other with a PAM module? Then some new error messages should be introduced in Xorg.0.log.

As expected, the fbdev log demonstrates that the crash is due to some interaction between xf86-video-intel and Xorg. I don't believe you have any other bug here yet...

Created attachment 69508 [details]
Xorg.0.log
I still do NOT understand what is going on. I tried downgrading to the xorg-server I used to have (1.12.3), to the previous xf86-video-intel-2.20.9, and downgrading many mesa and other x11-drivers/ packages. No luck, although xdm login worked for me once after a fresh bootup using an older kernel. I thought that maybe once an xdm-related crash happens, something in the kernel breaks and no subsequent attempts can succeed. But this does NOT look to be the case.
I still wonder why I can start my X session as a normal user, also with DRI (confirmed by glxinfo), both via startx and xinit. So the only way I get the crash is through the xdm login screen -> correct username/password -> enter -> blink to console - crash -> new xdm login window.
Created attachment 69509 [details]
xdm.log
Created attachment 69510 [details]
Xorg.0.log
So, now I tried another, older kernel, 3.4.17, and rebooted. During bootup the external HDMI display was NOT detected in framebuffer console mode. The xdm login window started up. I unplugged the HDMI cable from the external LCD panel and re-plugged it. Now the screen also came up on the external display. But hey, I could log in using xdm, no crash! I am attaching the logfile. It shows that maybe some "stale" device for the video card was removed from xfce4 ^H^H^H^H^H Xserver settings? Or was that related to the external monitor being unplugged?
For completeness, glxinfo shows DRI works, so I am happy, but who knows whether the problem will reappear on the next restart? ;)
Created attachment 69511 [details]
xdm.log
And xdm.log for the running instance.
Created attachment 69512 [details]
Xorg.0.log with '-7 SELECTIONs'
So, after more tests using the 3.4.17 kernel I think I can conclude that the "xdm issue" I have has to do with a broken exit status of some module. That is something different than the core dump, of course. Upon a fresh bootup I can log in using xdm. It starts xfce4-session in my case. After logout I get back the login screen, but all attempts to log in fail. If I look into the Xorg.0.log files I see
[ 60.153] -7 SELECTIONs still allocated at reset
[ 60.153] WINDOW: 0 objects of 40 bytes = 0 total bytes 0 private allocs
I recompiled the drm/agpgart/fbcon drivers as modules, but I cannot unload them to test which one is being left in a wrong state. Would you help?
# lsmod
Module                  Size  Used by
ppp_async               6113  1
ppp_generic            22288  5 ppp_async
slhc                    4898  1 ppp_generic
i915                  361339  2
fbcon                  33482  76
cfbfillrect             3125  1 i915
cfbimgblt               2164  1 i915
bitblit                 4548  1 fbcon
i2c_algo_bit            5055  1 i915
softcursor              1285  1 bitblit
cfbcopyarea             3129  1 i915
font                    7885  1 fbcon
drm_kms_helper         28642  1 i915
drm                   218563  2 i915,drm_kms_helper
iwlwifi               232294  0
option                 20769  1
usb_wwan               11266  1 option
usbserial              34653  4 option,usb_wwan
fb                     40138  7 i915,fbcon,drm_kms_helper,softcursor,bitblit
fbdev                    921  2 fb,fbcon
intel_agp              11204  1 i915
intel_gtt              14038  3 i915,intel_agp
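
On the question of which module can be unloaded first: a module can only be removed once nothing else holds a reference to it. A rough Python sketch that derives a possible rmmod order from lsmod output follows. The helper names are hypothetical, and reading "use count minus the number of modules listed under Used by" as references held by something other than a module (a bound console, an open device node) is a heuristic assumption, not something lsmod states.

```python
def parse_lsmod(text):
    """Parse `lsmod` output into {module: (use_count, [modules using it])}."""
    modules = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the prompt line, the header row and anything that is not
        # "name size count [users]".
        if len(fields) < 3 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        users = [u for u in fields[3].split(",") if u] if len(fields) > 3 else []
        modules[fields[0]] = (int(fields[2]), users)
    return modules


def unload_order(modules):
    """Order modules so that every module comes after the modules using it."""
    remaining = set(modules)
    order = []
    while remaining:
        ready = sorted(
            m for m in remaining
            if not any(u in remaining for u in modules[m][1])
        )
        if not ready:  # circular dependency: give up rather than loop forever
            break
        order.extend(ready)
        remaining.difference_update(ready)
    return order


def other_references(modules):
    """Heuristic: references not explained by the 'Used by' module list."""
    extra = {}
    for name, (count, users) in modules.items():
        if count > len(users):
            extra[name] = count - len(users)
    return extra
```

Applied to the listing above, this sketch would place i915 before drm_kms_helper and drm, and would flag i915 (use count 2, no modules listed) as held by something other than a module, which would have to be released before rmmod can succeed.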
We went off on a tangent because the intel_drv.so is corrupt, which caused the xf86-video-fbdev driver to be used instead, with a completely different set of behaviour and bugs.

commit 1e06d19a00f5a5a05369deeb3c5ae15b282c0f92
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Nov 26 15:30:09 2012 +0000

    sna: Disable shadow tracking upon regen

(In reply to comment #16)
> commit 1e06d19a00f5a5a05369deeb3c5ae15b282c0f92
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Mon Nov 26 15:30:09 2012 +0000
>
> sna: Disable shadow tracking upon regen

Confirming the patch helps when applied to 2.20.13, which I had so far. Will attach an Xorg.0.log from the running instance.

# emerge -pv xf86-video-intel
These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild U ] x11-drivers/xf86-video-intel-2.20.14 [2.20.13] USE="dri sna udev -glamor -uxa -xvmc" 1,625 kB
# grep VIDEO /etc/make.conf
VIDEO_CARDS="vesa vmware fbdev intel i915 i965 nouveau"
#

Per the answer from Chris, the fix should appear in 2.20.15.

Created attachment 70962 [details]
Xorg.0.log (working, patched driver)
Initially I was puzzled why I sometimes had to reboot my laptop to be able to log in through xdm even after the patch "sna: Disable shadow tracking upon regen". I still had issues last week and kept starting X manually from a user account through 'startx'. I tried to patch 2.20.14 and recompiled xorg-server; then 2.20.15 came out, but that also seemed broken. I am re-opening the issue and providing the X server log and a new gdb stacktrace for 2.20.15.

Created attachment 71258 [details]
Xorg.0.log (still 2.20.15 crashing)
Created attachment 71259 [details]
gdb stacktrace (xorg-server-1.13.0-r1, xf86-video-intel-2.20.15)
Have you tried xorg-1.12?

(In reply to comment #22)
> Have you tried xorg-1.12?

Actually the problem appeared for me at a larger system-wide upgrade, so ... now, after I downgraded to 1.12.4 and recompiled all x11-drivers, I can log in through xdm (no recompilation of x11 libs or apps like xdm etc.).

Created attachment 71260 [details]
Xorg.0.log (working, 1.12.4 server, xf86-video-intel-2.20.15)
I am attaching logfile of the running instance of now downgraded 1.12.4 xorg-server.
(In reply to comment #23)
> (In reply to comment #22)
> > Have you tried xorg-1.12?
>
> Actually the problem appeared for me at a larger system-wide upgrade, so ...
> now after I downgraded to 1.12.4 and recompiled all x11-drivers I can login
> through xdm (no recompilation of x11 libs, apps like xdm etc.).

No. After a fresh bootup, even with xorg-server-1.12.4, I get the same xdm crash. Will attach a stacktrace and log. I really don't know what the cause is.

Created attachment 71328 [details]
Xorg.0.log (crash, xorg-server-1.12.4+xf86-video-intel-2.20.15)
Logfile of 1.12.4 server crash with xf86-video-intel-2.20.15
Created attachment 71329 [details]
gdb stacktrace (xorg-server-1.12.4, xf86-video-intel-2.20.15)
Did you ever try valgrind to see if that helps identify the cause? Otherwise it looks peculiar to your setup.

(In reply to comment #28)
> Did you ever try valgrind to see if that helps identify the cause? Otherwise
> it looks peculiar to your setup.

No, I would need guidance on what to do and how to use valgrind. What puzzles me is that sometimes I still get a crash and sometimes I do not (same kernel, same apps/libs, just another fresh boot or xdm restart). You never explained whether that is related to the automatic configuration (I have no xorg.conf file). The only thing I ever saw is that the order of drivers loaded during startup is different (from comparing Xorg.0.log files from when things worked versus when the server crashed).

Hmm, the order in which the drivers load should be stable on a system already up and running. But at any rate, it should be immaterial...

To run X under valgrind, first make sure you have the complete set of debugging symbols. Then:

# cp /usr/bin/Xorg /tmp/Xorg # remove the setuid bit
# valgrind --trace-children=yes /usr/bin/Xorg -ac
$ DISPLAY=:0 gnome-session # or your favourite DE

*** Bug 61613 has been marked as a duplicate of this bug. ***

Created attachment 75688 [details]
Valgrind log from a session run
I had simulated the crash again, this time with valgrind on it.
(In reply to comment #32)
> Created attachment 75688 [details]
> Valgrind log from a session run
>
> I had simulated the crash again, this time with valgrind on it.

Hmm. Can you try again with an unstripped Xorg (i.e. so that we can see the debug symbols and the line numbers where the dangling pointers were freed)? You can also compile xf86-video-intel with --enable-debug to reduce the number of false positives from valgrind.

Created attachment 75705 [details]
Valgrind on the non-stripped version Xorg binary
(In reply to comment #33)
>
> Hmm. Can you try again with an unstripped Xorg (i.e. so that we can see the
> debug symbols and the line numbers where the dangling pointers were freed)?
> You can also compile xf86-video-intel with --enable-debug to reduce the
> number of false positives from valgrind.

Done. I had to compile the server just to get the symbols, hope it was worth it :P
I didn't recompile xf86-video-intel with --enable-debug, though.

Alexandre, can I ask you to do one last task: recompile xf86-video-intel with --enable-debug=full and attach the Xorg.log?

Yep, I'll do that.

Created attachment 75734 [details]
Xorg.log from the --enable-debug=full scenario
Created attachment 75735 [details]
valgrind log from the latest run
The only way I can see it crash like that is if ScrnInfo->pScreen is stale. I've reviewed all the code and cannot see how that would be possible. Anyway, I've thrown in a couple more assertions to test that hypothesis:

commit d4164de5ccb82068e2858a90b2cd44eef82b6037
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Mar 1 12:13:47 2013 +0000

    sna: Assert that the ScrnInfo and ScreenPtr relationship is correct

    References: https://bugs.freedesktop.org/show_bug.cgi?id=56608

Can you please try to reproduce again with --enable-debug[=full] and see if it hits those assertions?

Created attachment 75742 [details]
Xorg.log from the d4164de5ccb82068e2858a90b2cd44eef82b6037 revision
This is from the extra asserts added by Chris.
It seems they weren't hit.
Chris, I haven't poked around in X server code for quite a few years. Last time it was on xfree86 on an i486, so please enlighten me: is Xorg (or the xf86-video-intel driver) multi-threaded these days? Because this behavior seems consistent with multiple threads losing sync.

> I haven't poked around in X server code for quite a few years. Last time it was
> on xfree86 on an i486, so please enlighten me: is Xorg (or the xf86-video-intel
> driver) multi-threaded these days? Because this behavior seems consistent
> with multiple threads losing sync.

There is the occasional use of threads, but the phase you are in is explicitly single-threaded (teardown/setup). Yes, the symptoms would appear to be the same, as in both cases we are chasing stale pointers.

Another theory to test:

commit 904af323914e05830f17621a78b6e55b371ae5fc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Mar 1 15:34:45 2013 +0000

    sna: Assert that we do not resurrect stale pixmap across a server regen

    References: https://bugs.freedesktop.org/show_bug.cgi?id=56608

Chris,

It exited w/ this assertion:

Xorg: sna_driver.c:924: sna_screen_init: Assertion `sna->freed_pixmap == ((void *)0)' failed.

And now that I know what I am looking for, the lack of sna_late_close_screen() in your logs is now obvious!

commit fef6cdae9ebd217843bab2f65c87b59f8a9f782e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Mar 1 15:58:42 2013 +0000

    sna: Chain up CloseScreen

    Remember to call into the chained CloseScreen destructors!

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56608

Works for me! Thanks! :D

For the whole year I have lived with logging in as a user and running startx manually. I tried to see whether I could finally log in through the xdm window like years before. No, still not. I upgraded from x11-drivers/xf86-video-intel-2.21.6 to x11-drivers/xf86-video-intel-2.21.12. I do NOT get core dumps anymore, but my xdm window still disappears after entering Password[enter] and reappears again with Login.
I am not arguing whether the issue (core dumps) is fixed or not. What I want to report here is that startx now at least gives some more interesting messages (to its STDOUT and STDERR, so they are in the VT console):

<quote>
(xfce4-session:3876): GLib-WARNING **: GError set over the top of a previous GError or uninitialized memory.
This indicates a bug in someone's code. You must ensure an error is NULL before it's set.
The overwriting error message was: Could not get owner of name 'org.freedesktop.ConsoleKit': no such name

** (xfce4-session:3876): CRITICAL **: dbus_set_g_error: assertion `gerror == NULL || *gerror == NULL' failed
xfce4-session: Querying suspend failed: Could not get owner of name 'org.freedesktop.ConsoleKit': no such name

xinit: connection to X server lost

waiting for X server to shut down .Server terminated successfully (0). Closing log file.
</quote>

I will attach this as a full file. Second, I see that, at least at this very moment, I do NOT have consolekit installed. I fear that X/xdm/pam/whoever is looking for consolekit and because of that /var/log/xdm.log says:

<quote>
xdm info (pid 3255): sourcing /usr/lib64/X11/xdm/TakeConsole
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
after 286 requests (209 known processed) with 0 events remaining.
xdm info (pid 3202): Starting X server on :0
...
xdm info (pid 3553): sourcing /usr/lib64/X11/xdm/Xsetup_0
xdm error (pid 3553): pam_authenticate failure: Error in service module
xdm error (pid 3553): pam_authenticate failure: Error in service module
xdm info (pid 3553): sourcing /usr/lib64/X11/xdm/GiveConsole
xdm info (pid 3682): executing session /usr/lib64/X11/xdm/Xsession
xdm info (pid 3553): sourcing /usr/lib64/X11/xdm/TakeConsole
xdm info (pid 3202): Starting X server on :0
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
after 118 requests (111 known processed) with 0 events remaining.
</quote>

Here is what the package handling tool on Gentoo asks me to do now:

<quote>
[ebuild U ] xfce-base/libxfce4util-4.10.1 [4.10.0]
[ebuild N ] x11-misc/xscreensaver-5.21 USE="jpeg opengl pam perl -gdm -new-login (-selinux) -suid -xinerama"
[ebuild N ] sys-auth/polkit-0.111 USE="gtk introspection nls pam -examples -kde (-selinux) -systemd"
[ebuild N ] gnome-extra/polkit-gnome-0.105
[ebuild N ] sys-auth/consolekit-0.4.5_p20120320-r2 USE="acl pam policykit -debug -doc (-selinux) {-test}"
[ebuild U ] sys-auth/pambase-20120417-r2 [20120417-r1] USE="consolekit*"
[ebuild N ] sys-power/upower-0.9.20-r2 USE="deprecated introspection -doc -ios -systemd"
[ebuild U ] xfce-base/xfce4-session-4.10.1 [4.10.0-r1] USE="-systemd%"

The following USE changes are necessary to proceed:
(see "package.use" in the portage(5) man page for more details)
# required by sys-auth/polkit-0.111[-systemd,pam]
# required by sys-auth/consolekit-0.4.5_p20120320-r2[policykit]
>=sys-auth/pambase-20120417-r2 consolekit
# required by sys-auth/polkit-0.111[-systemd]
# required by sys-power/upower-0.9.20-r2
# required by xfce-base/xfce4-session-4.10.1[udev]
# required by @selected
# required by @world (argument)
>=sys-auth/consolekit-0.4.5_p20120320-r2 policykit
</quote>

So, could anything be done so that xdm would NOT even start if it cannot use PAM/consolekit? Or at least give a useful error message back to the XDM login window? Or xorg-server could complain during its startup?

Possibly similar issues:
http://lists.freebsd.org/pipermail/freebsd-xfce/2012-November/000599.html
http://lists.freebsd.org/pipermail/freebsd-questions/2013-June/251744.html
https://bbs.archlinux.org/viewtopic.php?id=132922

Created attachment 83332 [details]
startx.log
Created attachment 83333 [details]
.xsession-errors
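
For readers following the core-dump part of this thread: the fix in commit fef6cdae ("sna: Chain up CloseScreen") concerns wrapped close hooks, where each layer saves the hook it replaces and is expected to call it when its own teardown is done. The following is only a toy model in Python, not the X server's C API and not the driver's code; the Screen class, wrap_close() and the layer names are hypothetical. It illustrates why a layer that does not chain up can leave an earlier layer's cleanup undone, in the spirit of the failed `sna->freed_pixmap == NULL` assertion reported above.

```python
class Screen:
    """Toy stand-in for a screen object whose close hook can be wrapped."""

    def __init__(self):
        self.close_screen = lambda: True  # innermost (core) close hook
        self.log = []                     # names of layers whose hook ran


def wrap_close(screen, name, cleanup, chain_up=True):
    """Install a close hook for layer `name`, saving the hook it replaces."""
    saved = screen.close_screen

    def close():
        cleanup()
        screen.log.append(name)
        screen.close_screen = saved  # unwrap: restore the hook we replaced
        # Chaining up: hand control to the hook installed before this one.
        return saved() if chain_up else True

    screen.close_screen = close


def run(chain_up):
    """Close a screen with two layers; return (log, cached pixmap state)."""
    screen = Screen()
    state = {"freed_pixmap": "stale"}
    # Layer installed first: would drop the cached pixmap at close time.
    wrap_close(screen, "late_close", lambda: state.update(freed_pixmap=None))
    # Layer installed on top: decides whether the earlier layer ever runs.
    wrap_close(screen, "close", lambda: None, chain_up=chain_up)
    screen.close_screen()
    return screen.log, state["freed_pixmap"]
```

In this model, run(False) never reaches the "late_close" layer, so the cached object survives the teardown, while run(True) runs both layers in order.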