Bug 49211 - v0.6.16 and later cause gdm to hang on start
v0.6.16 and later cause gdm to hang on start
Status: RESOLVED NOTOURBUG
Product: accountsservice
Classification: Unclassified
Component: general
unspecified
All Linux (All)
: medium major
Assigned To: Matthias Clasen
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-04-27 03:09 UTC by Mantas Mikulėnas
Modified: 2013-01-02 14:49 UTC (History)
6 users (show)

See Also:


Attachments
/var/log/gdm/:0-slave.log in debug mode (21.30 KB, text/plain)
2012-05-11 07:10 UTC, Mantas Mikulėnas
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mantas Mikulėnas 2012-04-27 03:09:21 UTC
After upgrading to accountsservice 0.6.18, `gdm` hangs at black screen before showing the user list, and I'm getting the following in syslog:

accounts-daemon[14618]: (accounts-daemon:14618): GLib-GObject-WARNING **: g_object_notify: object class `User' has no property named `x-session'

accountsservice 0.6.18
(accountsservice 0.6.15 does not have this problem.)
gdm 3.4.1-2
Comment 1 Mantas Mikulėnas 2012-05-04 13:31:02 UTC
The GLib warning message is apparently irrelevant -- it has been fixed in git, but the hang problem persists even in 0.6.20. Playing around with `git bisect` points at commits 5fc9a51 and 2c51bd8 (conversion to GDBus). Another user has suggested that this is caused by systemd-logind...

consolekit 0.4.6
gdm 3.4.1
systemd-git a26336da
Comment 2 Ray Strode [halfline] 2012-05-07 07:27:10 UTC
what's pam configuration look like in gdm-welcome ?  it's probably some sort of gdm/pam issue.  Looking at the GDM logs would be useful, but maybe we should do this in gnome bugzilla ?
Comment 3 Mantas Mikulėnas 2012-05-09 02:14:26 UTC
(In reply to comment #2)
> what's pam configuration look like in gdm-welcome ?  it's probably some sort of
> gdm/pam issue.

It's fairly simple:

<<EOF

#%PAM-1.0
auth		required	pam_env.so
auth		required	pam_permit.so
account		required	pam_nologin.so
account		required	pam_unix.so
session		required	pam_loginuid.so
session		required	pam_keyinit.so		force revoke
-session	optional	pam_systemd.so		kill-session-processes=1
-session	optional	pam_ck_connector.so
password	required	pam_deny.so
EOF


> Looking at the GDM logs would be useful, but maybe we should do
> this in gnome bugzilla ?

I could resubmit the bug report, but I'm still not sure it is a GDM problem.
Comment 4 Ray Strode [halfline] 2012-05-09 07:46:46 UTC
Well, I guess if you can narrow it down to specific accountsservice commits then it must be in some way accountsservice related.

Can you attach /var/log/gdm/\:0-slave.log  and /var/log/gdm/\:0-greeter.log ?
Comment 5 Mantas Mikulėnas 2012-05-09 11:36:15 UTC
(In reply to comment #4)
> Well, I guess if you can narrow it down to specific accountsservice commits
> then it must be in some way accountsservice related.
> 
> Can you attach /var/log/gdm/\:0-slave.log  and /var/log/gdm/\:0-greeter.log ?

:0-slave.log is empty.
:0-greeter.log does not exist; no greeter processes seem to be started. Only the following are running:

12460 /usr/sbin/gdm-binary -nodaemon
12465 /usr/lib/gdm/gdm-simple-slave --display-id /org/gnome/DisplayManager/Display1
12467 /usr/bin/Xorg :0 -br -verbose -auth /var/run/gdm/auth-for-gdm-66Ho1I/database -nolisten tcp vt7
12490 /usr/bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
12491 gdm-session-worker [pam/gdm-welcome]
Comment 6 Ray Strode [halfline] 2012-05-09 12:10:01 UTC
you should get more gdm messages if you set Enable=true in the [debug] section of /etc/gdm/custom.conf
Comment 7 Mantas Mikulėnas 2012-05-11 07:10:25 UTC
Created attachment 61453 [details]
/var/log/gdm/:0-slave.log in debug mode

Log attached.

---

I don't know how much the GetAll() errors are related to this, but just in case, I tried calling the method from command line and it appears to work:

^ gdbus call -y -d org.freedesktop.Accounts -o /org/freedesktop/Accounts/User120 -m org.freedesktop.DBus.Properties.GetAll 'org.freedesktop.Accounts.User'
({'Uid': <uint64 120>, 'UserName': <'gdm'>, 'RealName': <'Gnome Display Manager'>, 'AccountType': <0>, 'HomeDirectory': <'/var/lib/gdm'>, 'Shell': <'/sbin/nologin'>, 'Email': <''>, 'Language': <''>, 'XSession': <''>, 'Location': <''>, 'LoginFrequency': <uint64 0>, 'IconFile': <'/var/lib/gdm/.face'>, 'Locked': <true>, 'PasswordMode': <0>, 'PasswordHint': <''>, 'AutomaticLogin': <false>, 'SystemAccount': <true>},)
Comment 8 Ray Strode [halfline] 2012-05-11 09:14:50 UTC
ah you're getting hit by:

http://cgit.freedesktop.org/accountsservice/commit/?id=52544a619102ca9f7ea4ef633271dc935f5d47fd

need to update to 0.6.20
Comment 9 Mantas Mikulėnas 2012-05-11 10:47:08 UTC
(In reply to comment #8)
> ah you're getting hit by:
> 
> http://cgit.freedesktop.org/accountsservice/commit/?id=52544a619102ca9f7ea4ef633271dc935f5d47fd
> 
> need to update to 0.6.20

That's strange – I *am* running 0.6.20...
Comment 10 Ray Strode [halfline] 2012-05-11 11:29:46 UTC
Oh, I guess the error message still happens, but it's not fatal in 0.6.20.

Okay, so looking closer I see this:
attempting to change state to SESSION_OPENED
GdmSessionWorker: uninitializing PAM
Emitting 'session-open-failed' signal

So that means the session stack is failing.  You have two required lines in your pam configuration:

session        required    pam_loginuid.so
session        required    pam_keyinit.so        force revoke

pam_loginuid seems like the most likely candidate.

Did you boot a kernel that lacks loginuid support?

if you log in at a terminal, what does cat /proc/self/loginuid say?
Comment 11 Mantas Mikulėnas 2012-05-11 11:45:26 UTC
(In reply to comment #10)
> Oh, I guess the error message still happens, but it's not fatal in 0.6.20.
> 
> Okay, so looking closer I see this:
> attempting to change state to SESSION_OPENED
> GdmSessionWorker: uninitializing PAM
> Emitting 'session-open-failed' signal
> 
> So that means the session stack is failing.  You have two required lines in
> your pam configuration:
> 
> session        required    pam_loginuid.so
> session        required    pam_keyinit.so        force revoke
> 
> pam_loginuid seems like the most likely candidate.
> 
> Did you boot a kernel that lacks loginuid support?

> if you log in at a terminal, what does cat /proc/self/loginuid say?

No, all Arch Linux kernels have loginuid and keyrings enabled, and these features are used by systemd-logind, consolekit, ecryptfs, etc.

    CONFIG_AUDIT=y
    CONFIG_KEYS=y

Right now, /proc/self/loginuid contains "1000", `keyctl show` shows my keyring with ecryptfs keys.
Comment 12 Ray Strode [halfline] 2012-05-11 12:00:55 UTC
is auditd running?

You might try adding *.* /var/log/all_messages to /etc/rsyslog.conf (or equivalent), adding "debug" to the two pam modules command lines and seeing if /var/log/all_messages has more useful information.
Comment 13 Mantas Mikulėnas 2012-05-11 12:01:34 UTC
Well, this is interesting – it seems that pam_keyinit.so causes the failure,  I changed the pam_keyinit line to "optional" and GDM started working:

    session optional pam_keyinit.so force revoke debug

However, pam_keyinit.so now becomes ineffective -- both for the GDM greeter and for user sessions...

    pam_keyinit(gdm-welcome:session): OPEN 1
    pam_keyinit(gdm-welcome:session): UID:120 [0]  GID:120 [120]
    pam_keyinit(gdm-welcome:session): JOIN = -1
    pam_keyinit(gdm-password:session): OPEN 1
    pam_keyinit(gdm-password:session): UID:1000 [0]  GID:1000 [1000]
    pam_keyinit(gdm-password:session): JOIN = -1

Maybe it's a kernel bug, or a pam_keyinit bug, I don't know. But I'm still very curious why does this only happen with accountsservice 0.6.18 and later.
Comment 14 Mantas Mikulėnas 2012-05-11 12:02:25 UTC
(In reply to comment #12)
> is auditd running?

No, auditd has never been necessary for the loginuid feature.

> You might try adding *.* /var/log/all_messages to /etc/rsyslog.conf (or
> equivalent), adding "debug" to the two pam modules command lines and seeing if
> /var/log/all_messages has more useful information.

Just tried that -- see #13.
Comment 15 Ray Strode [halfline] 2012-05-11 12:19:35 UTC
Hey,

glad you figured things out.

the pam_loginuid man page mentions it has a mode where it will fail unconditionally if auditd isn't running. I thought maybe a bug in the module may have caused it to enter that mode erroneously. Since, pam_loginuid isn't to blame, but pam_keyinit is, that stab in the dark was obviously wrong.

I'm not sure why older accountsservices would work and newer accountsservices doesn't. Looking at the code i see:

→       /* create a session keyring, discarding the old one */•
→       ret = syscall(__NR_keyctl,•
→       →             KEYCTL_JOIN_SESSION_KEYRING,•
→       →             NULL);•
→       debug(pamh, "JOIN = %d", ret);•
→       if (ret < 0)•
→       →       return PAM_SESSION_ERR;•

so that syscall is failing.  Does your kernel have keys support ? if you mess around with keyctl, can you get it to work with the kernel keyrings ?
Comment 16 Ray Strode [halfline] 2012-05-11 12:25:02 UTC
Note, i do believe keyrings get created implicitly in response to certain posix calls (for instance setuid, not sure what other calls have an effect), so it could be the new accountsservice is implicitly triggering the kernel to set up keyring environment differently in a way that's making JOIN fail.
Comment 17 Ray Strode [halfline] 2012-05-11 12:29:03 UTC
I just noticed you preemptively answered some of my questions in comment 15 in comment 11.
Comment 18 Ray Strode [halfline] 2012-05-11 12:36:46 UTC
If you're up for rebuilding pam, one idea would be to throw a 

 if (ret < 0) 
 debug(pamh, "%m");

in there
Comment 19 Mantas Mikulėnas 2012-05-11 12:46:27 UTC
(In reply to comment #15)
> I'm not sure why older accountsservices would work and newer accountsservices
> doesn't. Looking at the code i see:
> 
> →       /* create a session keyring, discarding the old one */•
> →       ret = syscall(__NR_keyctl,•
> →       →             KEYCTL_JOIN_SESSION_KEYRING,•
> →       →             NULL);•
> →       debug(pamh, "JOIN = %d", ret);•
> →       if (ret < 0)•
> →       →       return PAM_SESSION_ERR;•
> 
> so that syscall is failing.  Does your kernel have keys support ? if you mess
> around with keyctl, can you get it to work with the kernel keyrings ?

Yes, it has CONFIG_KEYS=y, and `keyctl show` shows my session keyring if present . (When keyinit fails, it shows the "user default session keyring", which isn't terrible, but doesn't provide the session isolation).

With older accountsservice, pam_keyinit logs the following:

    pam_keyinit(gdm-welcome:session): OPEN 1
    pam_keyinit(gdm-welcome:session): UID:120 [0]  GID:120 [120]
    pam_keyinit(gdm-welcome:session): JOIN = 724785089
    [later]
    pam_keyinit(gdm-welcome:session): CLOSE 1,724785089,1
    pam_keyinit(gdm-welcome:session): REVOKE 724785089

(In reply to comment #16)
> Note, i do believe keyrings get created implicitly in response to certain posix
> calls (for instance setuid, not sure what other calls have an effect), so it
> could be the new accountsservice is implicitly triggering the kernel to set up
> keyring environment differently in a way that's making JOIN fail.

This still shouldn't prevent a process from explicitly creating a new session keyring; for example, `keyctl session` still works for me.

(In reply to comment #18)
> If you're up for rebuilding pam, one idea would be to throw a 
> 
>  if (ret < 0) 
>  debug(pamh, "%m");
> 
> in there

Currently doing so, although it seems to me that errno is set only when using the keyutils.h functions, and here -1 (EPERM) is the actual error code returned by the syscall(). I'll try %m, though.

(Only a wild guess, but: Does libaccountsservice use threads before and after the GDBus port? I'm currently browsing security/keys/ and the kernel appears to be returning -EPERM in certain cases when the parent process is multi-threaded... Not sure if this could be related.)
Comment 20 Mantas Mikulėnas 2012-05-11 12:56:08 UTC
Ah, apparently errno *is* being set by syscall(), I didn't know that. The error is 31 (EMLINK, "Too many links").

In linux/security/keys/process_keys.c:773, join_session_keyring() returns -EMLINK if the process has more than one thread.

	/* only permit this if there's a single thread in the thread group -
	 * this avoids us having to adjust the creds on all threads and risking
	 * ENOMEM */
	if (!current_is_single_threaded())
		return -EMLINK;
Comment 21 Ray Strode [halfline] 2012-05-14 12:36:51 UTC
I talked to dhowells about this today.  In summary:

- Mantas, you're absolutely correct with your analysis in comment 20.  The kernel only allows manipulating the session keyring in single threaded processes.
- This means we've probably had this problem for a long time for some PAM module combinations (some pam modules are multithreaded), but never noticed
- It's very difficult, technically, to implement.  Some of the reasons for this are:
    1) Every thread in a keyring shares the same session keyring.  This is by the original design (which has thread keyrings, session keyrings, and user keyrings)
    2) This means when changing the session keyring, the kernel needs to ensure all threads get subscribed to the new keyring.
    3) This implies creating new credentials structures, and going through a subscription process for every thread.  This is error prone; in the event of insufficient memory, scheduling stalls, or threads running in different security contexts, the process could end up in an inconsistent state with no path to recover session-keyring consistency
- While the original design says every thread in a keyring shares the same session keyring (1 above), David would be okay with lightening that constraint, and making the session keyring manipulation calls only apply to the calling thread, with a big "buyer beware" caveat. This would solve the problem for us specifically, since we're going to be execing anyway.
Comment 22 Ray Strode [halfline] 2012-05-14 12:40:22 UTC
(In reply to comment #21)
>     1) Every thread in a keyring shares the same session keyring.
Should read "Every thread in a process shares the same session keyring."
Comment 23 David Howells 2012-05-17 01:39:01 UTC
There may be a way to handle this in userspace, depending on where the extra threads stem from.  If the extra threads are created by PAM modules, and not created by the caller prior to PAM being called, then as long as the session keyring is created prior to the new thread(s) being spawned, it should work.

However, this does have two downsides:

 (1) The new keyring will be owned by root upon creation.

This is awkward, but not really a problem now that possessor permissions exist.  Provided a process is subscribed to that keyring, and provided the keyring grants KEY_POS_VIEW/READ/WRITE/SEARCH, the keyring will be usable.  KEYCTL_CHOWN can then be used to move ownership to the possessor when the ownership is known.  Possibly all root-owned keys in the session keyring should be chown'd at the same time.

 (2) The keyring will have the wrong security label.

This is tricky.  Normally, pam_keyinit has to be run *after* pam_selinux so that the keyring will pick up the correct security label.  As the keyring already exists, it will be labelled incorrectly.

We can probably get around this by making the KEYCTL_CHOWN attempt a label transition.  The label used by gdm or whatever the login process is will probably be special, and we can probably use that and the current process label to work out where we need to go.
Comment 24 Mantas Mikulėnas 2012-05-17 02:30:29 UTC
(In reply to comment #23)
> There may be a way to handle this in userspace, depending on where the extra
> threads stem from.  If the extra threads are created by PAM modules, and not
> created by the caller prior to PAM being called, then as long as the session
> keyring is created prior to the new thread(s) being spawned, it should work.

It seems that the threads are created by GDM itself; more specifically, by the accountsservice library, which seems to be using a separate "gdbus" thread after the dbus-glib -> GDBus rewrite. (My PAM configuration is in comment #3.)
Comment 25 Ray Strode [halfline] 2012-05-17 10:01:30 UTC
Right, we could work around the issue in multithreaded pam modules by having pam_keyinit get runs twice in the session stack, once to create the keyring, and later to fix up the ownership of the keyring.

That doesn't fix the problem that glib itself creates multiple threads now. The only ways around that problem would be:

1) fix glib's gdbus apis to work without spawning a worker thread
2) stop using gdbus is in libaccountservice and use libdbus-glib again
3) fix the kernel to allow the session keyrings in a process to be temporarily different among different threads in a process until exec
4) fix the kernel to allow changing the session keyring of every thread in a multithreaded process.

1 and 4 are hard.  2 isn't great either, since dbus-glib is deprecated and GDM is about to be ported to gdbus as well.  The problem will just resurface (see https://bugzilla.gnome.org/show_bug.cgi?id=622888 ).  So 3 does really seem like the best way forward, if it's feasbile.

FWIW, in fedora, we already make pam_keyinit optional.  That's probably an okay near term workaround, since there aren't that many things leveraging the session keyring, i think (just kerberos in a non-default configuration is all i know about)
Comment 26 David Howells 2012-05-21 05:33:28 UTC
Here's a patch that should work:

     http://linux-nfs.org/pipermail/keyrings/2012-May/000911.html

As discussed in comment 21, it makes session (and process) keyring subscriptions per-thread.  With this patch, the difference between thread, process and session keyrings is a matter of how inheritance works over fork, clone and execve and nothing about automatically sharing changes to the keyring subscription.
Comment 27 Ray Strode [halfline] 2012-07-24 20:29:49 UTC
David, did this patch end up going in?  If so, I'll probably close this bug.
Comment 28 Ray Strode [halfline] 2012-10-02 19:34:38 UTC
So I ended up giving the patch referenced in comment 26 a try today.

With an unpatched kernel I was able to recreate the problem by making pam_keyinit a required module in my pam stack.  It would fail with the message:

Cannot make/remove an entry for the specified session

Then with a patched kernel, it silently succeeded.
Comment 29 Florian Mickler 2012-12-22 09:20:20 UTC
A patch referencing this bug report has been merged in Linux v3.8-rc1:

commit 3a50597de8635cd05133bd12c95681c82fe7b878
Author: David Howells <dhowells@redhat.com>
Date:   Tue Oct 2 19:24:29 2012 +0100

    KEYS: Make the session and process keyrings per-thread
Comment 30 Ray Strode [halfline] 2013-01-02 14:47:25 UTC
Great, thanks David and Florian