Bug 15922 - readdir_r buffer has invalid size on fuse file system and maybe others. was: dbus-daemon: free(): invalid next size (fast):
Summary: readdir_r buffer has invalid size on fuse file system and maybe others. was: ...
Status: RESOLVED FIXED
Alias: None
Product: dbus
Classification: Unclassified
Component: core (show other bugs)
Version: 1.2.x
Hardware: Other All
: medium blocker
Assignee: Simon McVittie
QA Contact: John (J5) Palmieri
URL:
Whiteboard: see #8284
Keywords:
Depends on:
Blocks:
 
Reported: 2008-05-13 12:00 UTC by Maximilian Mehnert
Modified: 2011-01-25 05:21 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
valgrind output of running dbus-daemon --session (25.56 KB, text/plain)
2008-05-13 12:00 UTC, Maximilian Mehnert
Details
changes readdir_r buffer for fuse file system (and maybe others) (855 bytes, patch)
2008-05-26 13:31 UTC, Maximilian Mehnert
Details | Splinter Review
[PATCH 1/3] dirent_buf_size: always assume readdir_r needs at least 255 bytes (1.42 KB, patch)
2011-01-06 07:09 UTC, Simon McVittie
Details | Splinter Review
[PATCH 2/3] dirent_buf_size: always assume that readdir_r needs at least NAME_MAX bytes (1.20 KB, patch)
2011-01-06 07:10 UTC, Simon McVittie
Details | Splinter Review
[PATCH 3/3] dirent_buf_size: always demand at least sizeof(struct dirent) bytes (1.06 KB, patch)
2011-01-06 07:10 UTC, Simon McVittie
Details | Splinter Review

Description Maximilian Mehnert 2008-05-13 12:00:37 UTC
Created attachment 16511 [details]
valgrind output of running dbus-daemon --session

For me, running dbus-daemon fails. 

The summary line of the report is the error received.

I recompiled the current debian package 1.2.1-2 with debugging symbols and ran valgrind.

The output is attached.

I have a feeling that the cause for this is the fact that my root file system is on zfs-fuse (http://zfs-on-fuse.blogspot.com/), since after rebuilding my system this way the problem started to occur. 
Further system details: Kernel 2.6.26-rc2

Shall I take any action to debug this further?
Comment 1 John (J5) Palmieri 2008-05-14 08:49:20 UTC
I have a feeling this has to do with the non-caching userdb code path.  Please look at bug #15588 and compile with caching on to see if this issue is fixed.  For reference bug #15589 has the noncaching issue.  Basically userinfo structures need to become objects with references. 
Comment 2 Maximilian Mehnert 2008-05-17 09:31:41 UTC
I recompiled with --enable-userdb-cache but that did not solve the problem. :-(
Comment 3 John (J5) Palmieri 2008-05-19 06:17:53 UTC
did you apply the patch also? If not then that flag won't actually do anythin.  That was part of the bug.
Comment 4 Maximilian Mehnert 2008-05-19 07:03:17 UTC
Yes, I applied that patch :-(
Comment 5 Maximilian Mehnert 2008-05-26 07:08:26 UTC
is there anything I can do to help finding the problem here?
Comment 6 Maximilian Mehnert 2008-05-26 09:01:51 UTC
Having a look at valgrind's
--snip--
Thread 1: status = VgTs_Runnable
==3603==    at 0x4822B8E: realloc (vg_replace_malloc.c:429)
==3603==    by 0x32641: dbus_realloc (dbus-memory.c:601)
==3603==    by 0x32E69: reallocate_for_length (dbus-string.c:364)
==3603==    by 0x32F14: set_length (dbus-string.c:404)
==3603==    by 0x332B2: _dbus_string_lengthen (dbus-string.c:872)
==3603==    by 0x334DC: append (dbus-string.c:1020)
==3603==    by 0x33572: _dbus_string_append (dbus-string.c:1050)
==3603==    by 0x3C8B2: _dbus_directory_get_next_file (dbus-sysdeps-util-unix.c:761)
==3603==    by 0x3E63: update_directory (activation.c:577)
==3603==    by 0x41E0: bus_activation_new (activation.c:771)
==3603==    by 0x6602: process_config_every_time (bus.c:497)
==3603==    by 0x68E2: bus_context_new (bus.c:615)
==3603==    by 0x1B2A1: main (main.c:443)
--snap--

I did a 

--snip--

$ gdb /usr/bin/dbus-daemon 
GNU gdb 6.7.1-debian
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) break _dbus_directory_open
Breakpoint 1 at 0x3c5ee: file dbus-sysdeps-util-unix.c, line 637.
(gdb) break _dbus_directory_get_next_file
Breakpoint 2 at 0x3c756: file dbus-sysdeps-util-unix.c, line 727.
(gdb) run --session --nofork
Starting program: /usr/bin/dbus-daemon --session --nofork
Breakpoint 1 at 0x3c5db: file dbus-sysdeps-util-unix.c, line 630.
Breakpoint 2 at 0x3c743: file dbus-sysdeps-util-unix.c, line 720.
Warning:
Cannot insert breakpoint 1.
Error accessing memory address 0x3c5db: Input/output error.
Cannot insert breakpoint 2.
Error accessing memory address 0x3c743: Input/output error.

--snap--

Any idea what that is?
Comment 7 Maximilian Mehnert 2008-05-26 13:31:14 UTC
Created attachment 16746 [details] [review]
changes readdir_r buffer for fuse file system (and maybe others)
Comment 8 Maximilian Mehnert 2008-05-26 13:34:06 UTC
With my latest attachment / patch, dbus works for me with the root file system on fuse. 
It was a bit odd to track down - it turned out that in the calculation of the buffer for readdir_r the maximum length for a filename had a value of 0.
Comment 9 Maximilian Mehnert 2008-06-10 03:19:12 UTC
hmm. any opinion on the patch and/or the problem?
Feedback appreciated.
Comment 10 David Abrahams 2009-02-28 20:07:15 UTC
I can confirm this problem and that the patch fixes it.

FWIW, *my* only FUSE filesystem is mounted at /usr

Comment 11 Havoc Pennington 2009-03-01 08:42:50 UTC
This code requires some care:
http://womble.decadentplace.org.uk/readdir_r-advisory.html

There is no indication in the standard that fpathconf should ever be returning 0:
http://www.opengroup.org/onlinepubs/009695399/functions/pathconf.html

Surely this is a bug in fuse.

Not sure falling back to PATH_MAX is safe. In fact I think it isn't. readdir_r will smash the stack if the filename is longer than the buffer...
Comment 12 Simon McVittie 2011-01-06 06:09:37 UTC
Falling back to 0 is certainly unsafe (*any* filename will overflow it and smash the stack), so I don't think we can really make the situation any worse by replacing 0 with NAME_MAX... I'm very tempted to just make the buffer size be NAME_MAX if fpathconf() returns less than that. It can't hurt.

For comparison, GLib's g_dir_read_name() calls readdir() (without holding a lock, which doesn't seem ideal). An alternative approach would be to take a lock around readdir().

I've asked Ben (author of that readdir_r advisory) for input.
Comment 13 Simon McVittie 2011-01-06 06:18:26 UTC
For further comparison, Qt calls pathconf, with a comment explaining the race condition, and doesn't handle pathconf returning 0 either.
Comment 14 Thiago Macieira 2011-01-06 06:24:57 UTC
(In reply to comment #13)
> For further comparison, Qt calls pathconf, with a comment explaining the race
> condition, and doesn't handle pathconf returning 0 either.

Because, like Havoc said, pathconf is not allowed to return 0 unless the limit is *really* a zero, meaning zero bytes.
Comment 15 Simon McVittie 2011-01-06 07:09:14 UTC
Created attachment 41708 [details] [review]
[PATCH 1/3] dirent_buf_size: always assume readdir_r needs at least 255 bytes

HP-UX apparently defines NAME_MAX too small. Sigh.
Comment 16 Simon McVittie 2011-01-06 07:10:01 UTC
Created attachment 41709 [details] [review]
[PATCH 2/3] dirent_buf_size: always assume that readdir_r needs at least NAME_MAX bytes

fpathconf() can apparently return 0 for the max length of a filename,
at least on zfs-on-fuse. Sigh.

(This patch hopefully fixes the bug stated here.)
Comment 17 Simon McVittie 2011-01-06 07:10:26 UTC
Created attachment 41710 [details] [review]
[PATCH 3/3] dirent_buf_size: always demand at least sizeof(struct dirent) bytes

As per http://womble.decadent.org.uk/readdir_r-advisory.html rev 4,
d_name might not be the last member in struct dirent.
Comment 18 Simon McVittie 2011-01-06 09:01:28 UTC
(In reply to comment #14)
> Because, like Havoc said, pathconf is not allowed to return 0 unless the limit
> is *really* a zero, meaning zero bytes.

As far as I can tell, glibc on Linux basically returns f_namelen from struct statfs, which is a copy of f_namemax from struct statvfs, so if a filesystem fails to set that, we lose. Being a bit defensive by allocating more memory than we strictly needed to seems like a better user experience than smashing our own stack in protest at a filesystem bug, particularly when userland filesystems exist.
Comment 19 Simon McVittie 2011-01-07 08:50:43 UTC
On Bug #8284, Havoc points out that we don't really need thread-safety at all, because we only call readdir_r() from the bus daemon and the tests; so we could probably just use readdir() and bypass this whole mess.
Comment 20 Colin Walters 2011-01-07 12:21:45 UTC
(In reply to comment #19)
> On Bug #8284, Havoc points out that we don't really need thread-safety at all,
> because we only call readdir_r() from the bus daemon and the tests; so we could
> probably just use readdir() and bypass this whole mess.

On Linux/glibc readdir() is threadsafe already, because the name is stored in DIR, not in struct dirent, which really, is the only sane thing to do.

It's not reentrant, but libdbus isn't giving out its DIR * pointers, so it doesn't matter.
Comment 21 Simon McVittie 2011-01-25 05:21:20 UTC
Fixed in git for 1.4.4, 1.5.0 by merging the patch from Bug #8284, so we no longer use readdir_r at all. Thanks to Will Thompson for code review.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.