Bug 85477 - Creating new partition and filesystem sometimes only creates the partition
Summary: Creating new partition and filesystem sometimes only creates the partition
Status: RESOLVED FIXED
Alias: None
Product: udisks
Classification: Unclassified
Component: operations (show other bugs)
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: David Zeuthen (not reading bugmail)
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-26 10:10 UTC by intrigeri
Modified: 2016-11-23 15:19 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments

Description intrigeri 2014-10-26 10:10:54 UTC
Hi,

I'm using GNOME Disks 3.12.1-1+b1 on current Debian unstable, with udisks2 (2.1.3-5), gvfs (1.22.1-1) and libatasmart4 0.19-3.

Steps to reproduce:

1. Blank a USB drive (dd if=/dev/zero)

2. Format that USB drive with GNOME Disks (default options: quick and MBR)

3. Attempt to create a new partition+filesystem on that USB drive using the "+" icon in GNOME Disks (default options, again: don't overwrite, FAT, no label)

* I always get this error in a dialog:

  Error creating partition
  Error wiping newly created partition /dev/sdb1: Command-line `wipefs -a "/dev/sdb1"' exited with non-zero exit status 1:
  wipefs: error: /dev/sdb1: probing initialization failed: No such file or directory
  (udisks-error-quark, 0)

* The sdb1 partition is always created.

* Sometimes the FAT filesystem is created on this partition, sometimes not.

* In any case, I can then reliably "Format" the newly created partition with GNOME Disks and get a filesystem created as expected.

Seems like udisks is trying to act on a device before the device node appears. The affected code seems to live in src/udiskslinuxpartitiontable.c, that pretends to "sit and wait for the partition to show up" before wiping the newly created partition, but apparently doesn't reliably wait for the device node to appear in /dev.
Comment 1 intrigeri 2014-10-26 10:31:16 UTC
I retried the same when running udisksd without --no-debug. The journal (for the partition+FS creation) says:

Oct 26 11:29:02 sid-desktop kernel:  sda: sda1
Oct 26 11:29:03 sid-desktop udisksd[1790]: 11:29:03.286:[1790]:[DEBUG]: uevent change /sys/devices/pci0000:00/0000:00:01.2/usb1/1-2/1-2.2/1-2.2:1.0/host8/target8:0:0/8:0:0:0/block/sda [udiskslinuxprovider.c:902, udisks_linux_provider_handle_uevent()]
Oct 26 11:29:03 sid-desktop udisksd[1790]: 11:29:03.287:[1795]:[INFO]: Cleanup check start [udisksstate.c:419, udisks_state_check_in_thread()]
Oct 26 11:29:03 sid-desktop udisksd[1790]: 11:29:03.287:[1795]:[INFO]: Cleanup check end [udisksstate.c:457, udisks_state_check_in_thread()]
Oct 26 11:29:03 sid-desktop udisksd[1790]: 11:29:03.771:[1790]:[DEBUG]: uevent add /sys/devices/pci0000:00/0000:00:01.2/usb1/1-2/1-2.2/1-2.2:1.0/host8/target8:0:0/8:0:0:0/block/sda/sda1 [udiskslinuxprovider.c:902, udisks_linux_provider_handle_uevent()]
Oct 26 11:29:03 sid-desktop kernel:  sda: sda1
Oct 26 11:29:03 sid-desktop udisksd[1790]: 11:29:03.817:[1790]:[DEBUG]: uevent remove /sys/devices/pci0000:00/0000:00:01.2/usb1/1-2/1-2.2/1-2.2:1.0/host8/target8:0:0/8:0:0:0/block/sda/sda1 [udiskslinuxprovider.c:902, udisks_linux_provider_handle_uevent()]
Oct 26 11:29:03 sid-desktop udisksd[1790]: 11:29:03.817:[1795]:[INFO]: Cleanup check start [udisksstate.c:419, udisks_state_check_in_thread()]
Oct 26 11:29:03 sid-desktop udisksd[1790]: 11:29:03.818:[1795]:[INFO]: Cleanup check end [udisksstate.c:457, udisks_state_check_in_thread()]
Oct 26 11:29:04 sid-desktop udisksd[1790]: 11:29:04.416:[1790]:[DEBUG]: uevent change /sys/devices/pci0000:00/0000:00:01.2/usb1/1-2/1-2.2/1-2.2:1.0/host8/target8:0:0/8:0:0:0/block/sda [udiskslinuxprovider.c:902, udisks_linux_provider_handle_uevent()]
Oct 26 11:29:04 sid-desktop udisksd[1790]: 11:29:04.416:[1795]:[INFO]: Cleanup check start [udisksstate.c:419, udisks_state_check_in_thread()]
Oct 26 11:29:04 sid-desktop udisksd[1790]: 11:29:04.416:[1795]:[INFO]: Cleanup check end [udisksstate.c:457, udisks_state_check_in_thread()]
Oct 26 11:29:05 sid-desktop udisksd[1790]: 11:29:05.004:[1790]:[DEBUG]: uevent change /sys/devices/pci0000:00/0000:00:01.2/usb1/1-2/1-2.2/1-2.2:1.0/host8/target8:0:0/8:0:0:0/block/sda [udiskslinuxprovider.c:902, udisks_linux_provider_handle_uevent()]
Oct 26 11:29:05 sid-desktop udisksd[1790]: 11:29:05.004:[1795]:[INFO]: Cleanup check start [udisksstate.c:419, udisks_state_check_in_thread()]
Oct 26 11:29:05 sid-desktop udisksd[1790]: 11:29:05.004:[1795]:[INFO]: Cleanup check end [udisksstate.c:457, udisks_state_check_in_thread()]
Oct 26 11:29:05 sid-desktop udisksd[1790]: 11:29:05.482:[1790]:[DEBUG]: uevent add /sys/devices/pci0000:00/0000:00:01.2/usb1/1-2/1-2.2/1-2.2:1.0/host8/target8:0:0/8:0:0:0/block/sda/sda1 [udiskslinuxprovider.c:902, udisks_linux_provider_handle_uevent()]
Comment 2 intrigeri 2015-08-21 14:25:06 UTC
Reproduced with udisks 2.1.6-2 on current Debian sid, this time in Python, with any of direct D-Bus calls (both sync and async), libudisks (both sync and async) and gdbus. This tends to confirm that the bug is in UDisks2 itself.
Comment 3 Marc Deslauriers 2016-03-23 11:12:18 UTC
Here's my understanding of this issue:

In src/udiskslinuxpartitiontable.c, handle_create_partition():

1- New partition is created by calling parted
2- udisks_linux_block_object_trigger_uevent() is called to trigger udev
3- Wait for partition to show up by using wait_for_partition()
4- Wipe new partition by calling wipefs
5- udisks_linux_block_object_trigger_uevent() is called to trigger udev

The problem arises because #2 and #5 aren't done synchronously. When udev processes the device change request, it removes the newly created partition device (ie: sdb1), and then adds it back again. This happens in a race with #4, and results in wipefs being called while the device is removed.

This issue can be solved in a few ways, from best to worst:

1- Create a sync version of udisks_linux_block_object_trigger_uevent() that waits until the device is actually handled in #2 and #5
2- Make sure there are no pending events in the udev queue after #2 and #5
3- Adding an artificial delay after #2 and #5 to wait until the device has settled down (some places in the code add a udisks_linux_block_object_reread_partition_table(), which I assume results in enough delay to not hit the race most of the time.)

I've tried removing #2 and #5 altogether, and that appeared to work also, but then gnome-disks crashes because udev hasn't added the device back fast enough after the wipefs.
Comment 4 Martin Pitt 2016-04-04 09:36:34 UTC
I applied Mathieu's patch: https://cgit.freedesktop.org/udisks/commit/?id=5c859c99dfcd3625

Thank you!
Comment 5 Marius Vollmer 2016-05-12 14:10:32 UTC
I think the patch is not fully effective.  We still see failures related to unexpected remove/add uevent pairs with storaged and Cockpit.

Storaged can create a partition and format it with a single method call, and the remove/add pair caused by the partition creation interferes frequently with the immediately following formatting, so I guess it will still interfere with wipefs as well.

The events come from udevd itself, when it calls BLKRRPART as part of synthesizing the change event for the parent device of a partition.  The change event is synthesized when the parent device is closed after having been open for writing (udev listens for that via inotify).  I don't know whether there are any ordering guarantees between uevents and inotify, so I don't know whether udisks/storaged can do anything to make sure that the removed/add pair has definitely happened by now.

My ideal fix would probably be in the kernel: BLKRRPART should not emit remove/add pairs in the first place, but be more accurate about what has really changed.

I have a patch proposal for storaged that tries to deal with this situation: https://github.com/storaged-project/storaged/pull/61

This helps a lot, but I guess wipefs or other operations might still hit a missing block device.  Hmm...
Comment 6 Marius Vollmer 2016-05-24 12:39:17 UTC
> I think the patch is not fully effective. 

In fact, the patch seems to make things worse. :-/

One way to prevent the rogue BLKRRPART ioctl by udevd is to take a shared lock on the parent device.  Everything becomes sane with that, except that now udisks2 (or storaged) will issue random BLKRRPARTs themselves, which just make things worse.

I vote to revert 5c859c99d.

I'll work on a patch for storaged to put a shared lock around parted/wipefs/cryptsetup/mkfs.
Comment 7 Marius Vollmer 2016-05-25 12:24:12 UTC
I think this is better:

   https://github.com/storaged-project/storaged/pull/64

The current patch 5c859c99dfcd3625 makes our tests fail reliably, so I'll reopen this.
Comment 8 Martin Pitt 2016-11-23 15:19:57 UTC
Thanks Marius! I finally got around to this, and ported the storaged patches to udisks:
 
  https://cgit.freedesktop.org/udisks/commit/?id=e168e59f
  https://cgit.freedesktop.org/udisks/commit/?id=554daa4b

I tried this with an USB stick; I do get the wipefs failure pretty consistently, and with this creating the partition and FS seems to work reliably.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.