Bug 36529 - /home on LVM that isn't on the boot disk causes bootup to hang
Status: RESOLVED NOTOURBUG
Product: systemd
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Lennart Poettering
Reported: 2011-04-23 10:50 UTC by Andy Lutomirski
Modified: 2011-04-26 15:42 UTC

Description Andy Lutomirski 2011-04-23 10:50:25 UTC
I have /home on an LVM volume in a volume group that isn't necessary to mount /.  This means that dracut doesn't activate the volume group.

Since there aren't any udev rules that automatically activate LVM volume groups as they appear, systemd sits and waits forever for /home to appear.  This causes the system to fail to boot.

I'm not sure that this is a bug so much as a missing feature, although it would be nice if there was at least a sensible error message to indicate what's wrong.
Comment 1 Lennart Poettering 2011-04-26 12:55:49 UTC
Which distro is this? 

This should work fine if you run an up-to-date version of LVM with udev rules enabled and a service file that does "vgchange -ay" or a similar command at bootup.
Comment 2 Andy Lutomirski 2011-04-26 14:07:01 UTC
This is Fedora 15 beta, fully up to date.

I have a similarly configured F14 system that works fine.  AFAICT it works because /etc/rc.sysinit explicitly handles LVM.  It does:

if [ -x /sbin/lvm ]; then
        action $"Setting up Logical Volume Management:" /sbin/lvm vgchange -a y --sysinit
fi

and then starts to mount filesystems.

In F15 with systemd, the only thing that might even try to start LVM is /etc/init.d/netfs, which is After sysinit.target, which is After local-fs.target, which will never finish starting if LVM doesn't get started.

So, unless I'm missing something, this is a regression that prevents systems that used to boot from booting.  Maybe the bug is in Fedora 15 for not providing an LVM systemd unit or udev rule, but the regression is caused by systemd not having a feature that rc.sysinit had.

I "fixed" it with a custom udev rule like:
ACTION=="change", ENV{UDISKS_LVM2_PV_VG_NAME}=="vg_midnight", RUN="/sbin/vgchange -ay vg_midnight"
but I shouldn't have had to do that.  Or at the very least systemd should have printed a nice message telling me that it's getting bored waiting for /dev/vg_midnight/home to show up, which would have saved a long time diagnosing it.
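
The kind of message asked for above can be sketched in a few lines of shell. This is only an illustration of waiting for a device node with a timeout and a clear error; it is not how systemd handles device timeouts, and the function name wait_for_device and its arguments are hypothetical.

```shell
# Hypothetical helper: wait until a device node exists, or give up
# with a readable message after a number of seconds.
wait_for_device() {
    dev=$1        # path to wait for, e.g. /dev/vg_midnight/home
    timeout=$2    # maximum number of seconds to wait
    elapsed=0
    while [ ! -e "$dev" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "Timed out after ${timeout}s waiting for $dev" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example (commented out, since the device exists only on the affected system):
# wait_for_device /dev/vg_midnight/home 30
```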
Comment 3 Lennart Poettering 2011-04-26 15:04:10 UTC
On F15 /lib/systemd/system/fedora-storage-init.service and /lib/systemd/system/fedora-storage-init-late.service are responsible for setting up LVM. They run during normal boot, pull in udev-settle.service, and run vgchange.

I really don't get what this bug is about?
Comment 4 Andy Lutomirski 2011-04-26 15:15:09 UTC
(In reply to comment #3)
> On F15 /lib/systemd/system/fedora-storage-init.service and
> /lib/systemd/system/fedora-storage-init-late.service are responsible to set up
> LVM. That is run during normal boot, pulls in udev-settle.service and runs
> vgchange.

I don't have exact logs of the startup order handy, but fedora-storage-init.service doesn't seem to be ordered after cryptsetup.target, and since cryptsetup is slow (it needs user interaction) and my LVM PV is a LUKS device, fedora-storage-init.service will most likely not notice my LVM device.

My configuration is:

sda -> LUKS -> PV for LVM volume group 'a'.
sdb -> LUKS -> PV for LVM volume group 'b'.

All of my filesystems except /home are on LVs on VG a.  /home is on an LV on VG b.

On bootup, dracut (I think) initializes LUKS on sda and volume group a on the LUKS device.  Then, unless I have the hacked-up udev rule, systemd gets stuck trying to mount /home, times out after a while, and drops me to an emergency shell.

> 
> I really don't get what this bug is about?

It's about the fact that I couldn't boot my computer without a hacked-up udev rule.  I'm not presently near the affected computer, but I'll try to re-test things later tonight and get a real log of the startup ordering.
Comment 5 Lennart Poettering 2011-04-26 15:24:10 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > On F15 /lib/systemd/system/fedora-storage-init.service and
> > /lib/systemd/system/fedora-storage-init-late.service are responsible to set up
> > LVM. That is run during normal boot, pulls in udev-settle.service and runs
> > vgchange.
> 
> I don't have exact logs of the startup order handy, but
> fedora-storage-init.service doesn't seem to be ordered after cryptsetup.target,
> and since cryptsetup is slow (it needs user interaction) and my LVM PV is a
> LUKS device, fedora-storage-init.service will most likely not notice my LVM
> device.

No, it won't. But fedora-storage-init-late.service will.

LVM is a pretty broken piece of software. To support both LUKS-on-LVM and LVM-on-LUKS we have to start LVM once before and once after cryptsetup.target. Which is why there is -init and -init-late. If you look into these fails you'll see that the former has no ordering on cryptsetup.target, but the latter has.
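
One way to check the ordering difference described here is to look at the After= lines of the two unit files. The following is a rough shell sketch; the function name unit_orders_after is hypothetical, and it only inspects plain After= lines in a single file (it does not follow drop-ins or implicit dependencies).

```shell
# Hypothetical helper: succeed if the unit file lists the given unit
# in one of its After= lines.
unit_orders_after() {
    # $1: path to a unit file, $2: unit name to look for
    for word in $(sed -n 's/^After=//p' "$1"); do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# Example (commented out, paths as given in this thread):
# unit_orders_after /lib/systemd/system/fedora-storage-init-late.service \
#     cryptsetup.target && echo "ordered after cryptsetup.target"
```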

> On bootup, dracut (I think) initializes LUKS on sda and volume group a on the
> LUKS device.  Then, unless I have the hacked-up udev rule, systemd gets stuck
> trying to mount /home, times out after awhile, and drops me to an emergency
> shell.

Sounds as if the LVM entries in the udev db are lost on their way to the main system. If you run "dracut -f" to build a new initrd, does that fix the problem?
Comment 6 Lennart Poettering 2011-04-26 15:25:29 UTC
s/fails/files
Comment 7 Andy Lutomirski 2011-04-26 15:29:22 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > On F15 /lib/systemd/system/fedora-storage-init.service and
> > > /lib/systemd/system/fedora-storage-init-late.service are responsible to set up
> > > LVM. That is run during normal boot, pulls in udev-settle.service and runs
> > > vgchange.
> > 
> > I don't have exact logs of the startup order handy, but
> > fedora-storage-init.service doesn't seem to be ordered after cryptsetup.target,
> > and since cryptsetup is slow (it needs user interaction) and my LVM PV is a
> > LUKS device, fedora-storage-init.service will most likely not notice my LVM
> > device.
> 
> No, it won't. But fedora-storage-init-late.service will.

$ systemctl status fedora-storage-init-late.service 
fedora-storage-init-late.service - Initialize storage subsystems (RAID, LVM, etc.)
	  Loaded: loaded (/lib/systemd/system/fedora-storage-init-late.service)
	  Active: inactive (dead)
	  CGroup: name=systemd:/system/fedora-storage-init-late.service

local-fs.target doesn't seem to want fedora-storage-init-late.service.  When I get home I'll see whether adding the symlink makes my system boot without the funny udev rule.
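
For reference, "adding the symlink" would look roughly like the sketch below, which uses systemd's .wants directory convention. The helper name enable_unit_for_target is hypothetical, and hooking the unit into local-fs.target is an assumption based on this comment; the packaged fix may attach it elsewhere.

```shell
# Hypothetical helper: make a target "want" a unit by creating a symlink
# in <root>/<target>.wants/, which is how units are enabled by hand.
enable_unit_for_target() {
    unit=$1                           # full path to the unit file
    target=$2                         # e.g. local-fs.target (assumed)
    root=${3:-/etc/systemd/system}    # administrator configuration directory
    mkdir -p "$root/$target.wants" &&
        ln -sf "$unit" "$root/$target.wants/$(basename "$unit")"
}

# Example (commented out, needs root on the affected system):
# enable_unit_for_target \
#     /lib/systemd/system/fedora-storage-init-late.service local-fs.target
```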

> 
> LVM is a pretty broken piece of software. To support both LUKS-on-LVM and
> LVM-on-LUKS we have to start LVM once before and once after cryptsetup.target.
> Which is why there is -init and -init-late. If you look into these fails you'll
> see that the former has no ordering on cryptsetup.target, but the latter has.

Agreed.

Maybe in some ideal future universe, something would make a database of all the storage things that need to be initialized and dynamically generate systemd (and maybe udev) configuration for them.
Comment 8 Lennart Poettering 2011-04-26 15:41:35 UTC
(In reply to comment #7)

> > No, it won't. But fedora-storage-init-late.service will.
> 
> $ systemctl status fedora-storage-init-late.service 
> fedora-storage-init-late.service - Initialize storage subsystems (RAID, LVM,
> etc.)
>       Loaded: loaded (/lib/systemd/system/fedora-storage-init-late.service)
>       Active: inactive (dead)
>       CGroup: name=systemd:/system/fedora-storage-init-late.service
> 
> local-fs.target doesn't seem to want fedora-storage-init-late.service.  When I
> get home I'll see whether adding the symlink makes my system boot without the
> funny udev rule.
> 

Oha, that's the bug. The initscripts package currently doesn't enable fedora-storage-init-late.service, it just installs it.

I have now filed a bug about this.

https://bugzilla.redhat.com/show_bug.cgi?id=699918

> Maybe in some ideal future universe, something would make a database of all the
> storage things that need to be initialized and dynamically generate systemd
> (and maybe udev) configuration for them.

The plan is to fix LVM and make it listen to devices popping up, so that we only have to wait for the devices we really need to boot, and the "wait for everything to settle" idea goes away.

I think we tracked down the bug, so tentatively closing this now.

Thanks for tracking this down, even though I dickishly closed the bug early. Thanks!
Comment 9 Lennart Poettering 2011-04-26 15:42:11 UTC
Further discussion on the rhbz bug.

