Bug 85200 - No way to repair root FS in emergency mode
Status: NEW
Alias: None
Product: systemd
Classification: Unclassified
Component: general
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: systemd-bugs
QA Contact: systemd-bugs
Reported: 2014-10-19 15:00 UTC by Milan Bouchet-Valat
Modified: 2015-05-30 05:17 UTC (History)

Attachments
journalctl --all --full -b (44.56 KB, text/plain)
2014-12-08 17:43 UTC, Milan Bouchet-Valat
systemctl --full (16.50 KB, text/plain)
2014-12-08 17:43 UTC, Milan Bouchet-Valat
ps -aux (10.85 KB, text/plain)
2014-12-08 17:44 UTC, Milan Bouchet-Valat
lsof / (27.41 KB, text/plain)
2014-12-08 17:44 UTC, Milan Bouchet-Valat

Description Milan Bouchet-Valat 2014-10-19 15:00:11 UTC
My F20 system crashed twice last week (once because of a kernel bug, and the second time because of a broken battery), and each time it wouldn't boot afterwards because of errors on the ext4 root. I was greeted with an emergency boot screen like this:
http://foobaring.blogspot.fr/2014/01/howto-run-fsck-on-emergency-mode-on.html

The problem is, there's no (easy) way to run fsck on the unmounted root partition to actually fix the errors. I had to create a Live USB and run fsck manually from there. That's quite painful, and a non-technical user would be completely stuck, thinking Linux is quite a fragile OS.

Apart from the blog post above, I've found other complaints or help requests about this (and sometimes of course it triggers the usual rants about systemd, which is a bit sad):
http://forums.fedoraforum.org/archive/index.php/t-299624.html (Fedora)
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=754340 (Debian, there's an interesting discussion there about potential explanations/solutions)
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=697962 (Debian)
https://bbs.archlinux.org/viewtopic.php?id=186201 (Arch)
http://bitsofmymind.com/2014/03/14/how-to-fix-fsck-your-root-file-system-that-you-have-to-boot-into-on-linux/ (Fedora)
http://forums.fedora-fr.org/viewtopic.php?id=62374 (Fedora, in French)

One common remark is that unmounting / does not work (at least not without a trick). But beyond that, I think it would be very useful to offer a simple option to repair the FS automatically, for users who would run 'fsck -y' anyway (most users, probably).


(There's also the problem that when Plymouth is enabled, you don't see the fsck error message, only the "Welcome to emergency mode", without any explanation of why normal boot failed. And journalctl -xb does not contain any information about that either.)
Comment 1 Lennart Poettering 2014-10-23 23:54:39 UTC
The emergency mode is unfortunately not easy to use, that kinda lies in the nature of it though. 

However, it's not that difficult to run fsck... Try "umount /sysroot", followed by fsck on the desired device. That should really suffice, no?
Comment 2 Milan Bouchet-Valat 2014-10-28 14:02:50 UTC
(In reply to Lennart Poettering from comment #1)
> The emergency mode is unfortunately not easy to use, that kinda lies in the
> nature of it though. 
> 
> However, it's not that difficult to run fsck... Try "umount /sysroot",
> followed by fsck on the desired device. That should really suffice, no?
Not sure, apparently something is using / (haven't tried /sysroot though). This is discussed here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=754340

But if there's an easy way, offering this option to the user would be great.
Comment 3 Milan Bouchet-Valat 2014-12-08 17:42:56 UTC
I had the "chance" to reproduce the problem due to a corrupt ext4 /. Actually, with rescue.target / cannot be unmounted (busy), but rebooting under emergency.target works fine. It looks like some services are started when they shouldn't be. I'm attaching a series of log files which may be useful for understanding what's going on.

My problem looks really similar to that Debian bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=754340
Comment 4 Milan Bouchet-Valat 2014-12-08 17:43:30 UTC
Created attachment 110578 [details]
journalctl --all --full -b
Comment 5 Milan Bouchet-Valat 2014-12-08 17:43:54 UTC
Created attachment 110579 [details]
systemctl --full
Comment 6 Milan Bouchet-Valat 2014-12-08 17:44:07 UTC
Created attachment 110580 [details]
ps -aux
Comment 7 Milan Bouchet-Valat 2014-12-08 17:44:48 UTC
Created attachment 110581 [details]
lsof /
Comment 8 Milan Bouchet-Valat 2014-12-08 17:45:50 UTC
And FWIW the logs are from rescue.target from systemd 217 on Fedora 21.
Comment 9 Lennart Poettering 2015-02-04 17:30:12 UTC
/ can obviously not be unmounted while you are booted from it: all your binaries keep it busy. This has been this way since about forever, and there's no way around it.

Not following what the issue really is here?

If you are in the emergency prompt in the initrd, then the future root is mounted to /sysroot, and you can unmount it there and then fsck the root device as you like.

If you are in the emergency prompt on the host, then I'd recommend rebooting and entering the emergency mode in the initrd, so that you don't keep the device busy.

Anyway, really not getting what the issue is supposed to be here... Care to elaborate?
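The initrd procedure described in this comment (unmount /sysroot, then fsck the root device) can be sketched as follows. This is only an illustration: the device name /dev/sdXN is a placeholder, the helper names run and repair_sysroot are made up here, and the sketch defaults to printing the commands rather than executing them.

```shell
#!/bin/sh
# Sketch only: repair the future root from the initrd emergency shell.
# /dev/sdXN below is a placeholder; substitute the real root device.

# run CMD...: print the command when DRY_RUN is unset or 1, else execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# repair_sysroot DEVICE: unmount the future root, then check DEVICE.
# Stops before fsck if the unmount fails, so a mounted filesystem
# is never checked by accident.
repair_sysroot() {
    dev="$1"
    run umount /sysroot || return 1
    run fsck "$dev"
}
```

Calling `repair_sysroot /dev/sdXN` prints the two commands; `DRY_RUN=0 repair_sysroot /dev/sdXN` would execute them.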
Comment 10 Milan Bouchet-Valat 2015-02-04 17:58:13 UTC
My general issue is simply that when / gets corrupted for some reason, people get a console screen without any clear way of fixing the problem unless they are experts in the boot process. That is a pity, since simply running fsck before mounting / would fix the problem.

Now, going into the technical details, I'd say that the #1 problem is that when / is corrupt, systemd goes to rescue.target instead of emergency.target. In the former mode, / is already mounted, which means you cannot run fsck on it. Not everybody knows that you're supposed to reboot into emergency.target (even more so since the prompt says "Welcome to emergency mode" and not "rescue mode").

So I think it would be more useful to boot/reboot into emergency.target when / fails to mount. Even better, the prompt could say "Failed to mount root partition. Do you want to automatically try to repair it? [Y/n]" I suspect 90% of people simply run 'fsck -y' in those cases anyway.
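A minimal sketch of what such a prompt could look like. This is a hypothetical illustration of the proposal, not something systemd implements: the function names should_repair and prompt_and_repair are made up, /dev/sdXN is a placeholder, and the sketch only prints the fsck command it would run.

```shell
#!/bin/sh
# Sketch only: the [Y/n] prompt proposed above. Nothing here is part of systemd.

# should_repair ANSWER: succeed for an empty answer (the default), "y" or "yes".
should_repair() {
    case "$1" in
        ""|[Yy]|[Yy][Ee][Ss]) return 0 ;;
        *) return 1 ;;
    esac
}

# prompt_and_repair DEVICE: ask the question, then print the command
# that would be run. A real version would execute fsck instead of echoing.
prompt_and_repair() {
    dev="$1"
    printf 'Failed to mount root partition. Do you want to automatically try to repair it? [Y/n] '
    answer=
    # On end-of-file with no input at all, err on the side of not repairing.
    read -r answer || [ -n "$answer" ] || answer=n
    if should_repair "$answer"; then
        echo "would run: fsck -y $dev"
    else
        echo "Skipping repair; staying in the emergency shell."
    fi
}
```

Defaulting to "yes" on an empty answer follows the [Y/n] convention in the proposed prompt text.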

IMHO this is really required if we want Linux to be as reliable as other OSes for users of average technical skill, for whom systemd is only a vague name and boot targets are something they have never heard of.
Comment 11 Lennart Poettering 2015-02-04 18:27:54 UTC
fscking of the root fs is something that needs to be done before the initrd transitions into the host OS; it is a job of the initrd. The Fedora initrd actually runs fsck before mounting the root fs, hence I am a bit puzzled by what you are saying... I mean we already *are* invoking fsck implicitly before mounting things.

Moreover, your attachments suggest that your root is btrfs? btrfs turns off fsck before mounting; the kernel basically does all the fscking that's necessary on its own when mounting. Userspace is not involved there anymore...

So, not sure what this bug is about...
Comment 12 Milan Bouchet-Valat 2015-02-04 18:58:09 UTC
No, my / is ext4 (the default).

But I'm 100% sure I've had to run fsck manually several times to get my system to start (either by using a live USB or by rebooting into emergency.target). This was after a kernel crash had corrupted the filesystem.

I guess when the corruption is bad enough, fsck refuses to do anything automatically? That's the case I'm talking about: the one where you want to ask for confirmation from the user before trying to repair the disk.
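The guess above is consistent with the documented behaviour of e2fsck: in preen mode (-p) it only fixes problems that are safe to fix without human intervention, and otherwise exits with the value 4 ("errors left uncorrected") or'ed into its status. A minimal sketch for decoding an fsck exit status, using the bit values listed in the fsck(8) man page; the function names describe_fsck_status and needs_manual_repair are made up for illustration.

```shell
#!/bin/sh
# Sketch only: decode the exit status of fsck, which is a sum of bit flags
# as documented in fsck(8).

# describe_fsck_status STATUS: print one line per condition set in STATUS.
describe_fsck_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for entry in \
        "1:filesystem errors corrected" \
        "2:system should be rebooted" \
        "4:filesystem errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:checking canceled by user request" \
        "128:shared-library error"
    do
        bit=${entry%%:*}     # number before the colon
        text=${entry#*:}     # description after the colon
        if [ $((status & bit)) -ne 0 ]; then
            echo "$text"
        fi
    done
    return 0
}

# needs_manual_repair STATUS: succeed when errors were left uncorrected,
# i.e. the case where asking the user for confirmation would make sense.
needs_manual_repair() {
    [ $(($1 & 4)) -ne 0 ]
}
```

A caller would run fsck, capture `$?`, and pass it to `needs_manual_repair` to decide whether to offer an interactive or `-y` run.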
Comment 13 Konstantin Svist 2015-05-30 05:17:39 UTC
FWIW, I've been dealing with fsck in emergency mode on a few of my machines.
Note: Fedora 20, NOT using LVM; sda1 ext4 /boot, sda2 swap, sda3 ext4 /

From graphical.target: 
"systemctl isolate emergency.target" switches to "emergency mode", but / is mounted rw and can't trivially be remounted ro. Can't run fsck on a rw filesystem, obviously.

Editing the grub entry to add "emergency" to the kernel line boots with / mounted ro, which allows me to run fsck without any problem. As soon as I remount rw, however, it's locked to rw.
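A small sketch of the check implied above: only run fsck on / when it is currently mounted read-only. The function names are made up for illustration, the mounts file is a parameter (defaulting to /proc/mounts) so the logic can be tried on sample data, and the sketch prints the fsck command rather than running it.

```shell
#!/bin/sh
# Sketch only: refuse to fsck / unless it is mounted read-only.
# Lines in /proc/mounts look like: device mountpoint fstype options dump pass

# root_device [MOUNTS_FILE]: print the device of the last entry mounted on /.
root_device() {
    awk '$2 == "/" { d = $1 } END { print d }' "${1:-/proc/mounts}"
}

# root_is_readonly [MOUNTS_FILE]: succeed if the last entry for / has "ro".
root_is_readonly() {
    opts=$(awk '$2 == "/" { o = $4 } END { print o }' "${1:-/proc/mounts}")
    case ",$opts," in
        *,ro,*) return 0 ;;
        *) return 1 ;;
    esac
}

# fsck_root_if_readonly [MOUNTS_FILE]: print the fsck command for a
# read-only root, or explain why it is not safe to proceed.
fsck_root_if_readonly() {
    mounts="${1:-/proc/mounts}"
    if root_is_readonly "$mounts"; then
        echo "would run: fsck $(root_device "$mounts")"
    else
        echo "/ is mounted read-write; reboot with 'emergency' on the kernel line first" >&2
        return 1
    fi
}
```

The last matching entry is used because /proc/mounts can list more than one mount on /, with later ones shadowing earlier ones.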

