Bug 55213

Summary: Early boot vsnprintf/vasprintf segfault
Product: systemd Reporter: jpsinthemix
Component: generalAssignee: systemd-bugs
Status: RESOLVED FIXED QA Contact: systemd-bugs
Severity: blocker    
Priority: high CC: cottrell, freedesktop, gkulyk, public.avatar, stuffcorpse, thomas
Version: unspecified   
Hardware: x86 (IA32)   
OS: Linux (All)   
i915 platform: i915 features:
Attachments: Working (but ugly) fix of early boot vsnprintf/vasprintf segfault
Working (less ugly) fix of early boot vsnprintf/vasprintf segfault

Description jpsinthemix 2012-09-22 07:28:15 UTC
Created attachment 67533 [details]
Working (but ugly) fix of early boot vsnprintf/vasprintf segfault


Beginning in post-systemd-189 git, and still present in systemd-191, there is a nasty problem in function log_struct_internal() (src/shared/log.c), which renders my PC unbootable, giving an early boot systemd[1] segfault in /lib/libc-2.16.90.so, and 'Freezing execution.'

My system: i686-pc-linux-gnu, gcc-4.7.1, glibc-2.16.0, linux-3.6.0-rc6, systemd-191 (all software built from source).

The segfault is due to a misuse of the vsnprintf() and vasprintf() functions. Specifically, one must not use va_arg() after calls to vsnprintf() or vasprintf() as these function/macros internally use va_arg() and may modify it undefined ways. In particular, from the vsnprintf manpage:

The functions vprintf(), vfprintf(), vsprintf(), vsnprintf() are equivalent to the functions printf(), fprintf(), sprintf(), snprintf(), respectively, except that they are called with a va_list instead of a variable number of arguments.  These  functions  do  not  call  the va_end macro.  Because they invoke the va_arg macro, the value of ap is undefined after the call.  See stdarg(3),

and from the stdarg manpage:

    If ap is passed to a function that uses va_arg(ap,type) then the value of ap is undefined after the return of that function.

In systemd-191/src/shared/log.c, we have the following problematic code in log_struct_internal():

705                 va_start(ap, format);
706                 while (format && n + 1 < ELEMENTSOF(iovec)) {
707                         char *buf;
709                         if (vasprintf(&buf, format, ap) < 0) {
710                                 r = -ENOMEM;
711                                 goto finish;
712                         }
714                         IOVEC_SET_STRING(iovec[n++], buf);
716                         iovec[n].iov_base = (char*) &nl;
717                         iovec[n].iov_len = 1;
718                         n++;
720                         format = va_arg(ap, char *); <---- use of ap here is invalid/non-portable
721                 }
722                 va_end(ap);

741                 /* Fallback if journal logging is not available */
743                 va_start(ap, format);
744                 while (format) {
746                         vsnprintf(buf, sizeof(buf), format, ap);
747                         char_array_0(buf);
749                         if (startswith(buf, "MESSAGE=")) {
750                                 found = true;
751                                 break;
752                         }
754                         format = va_arg(ap, char *); <---- use of ap here is invalid/non-portable
755                 }
756                 va_end(ap);

It appears that the assumption has been made that ap is internally incremented as values are formatted into buf, but there is no guarantee that this will occur. Apparently it does occur for some architecture/compiler/libcs, while not for others. This is truly unfortunate as the vsnprintf/vasprintf-relaled code now in log_struct_internal() is, in my judgement, quite elegant.

One way around this would be to write code to parse the format strings enough to figure out the number and types of args associated with each format (tricky and counter to the use of vsnprintf/vasprintf in the first place, and probably a pain to maintain and port).

I have attached a patch which 'fixes' this issue in the sense that I'm now up and running with systemd-191, and all appears to be well.. but it's a bit of a hack, and admittedly rather ugly. What I've done is to pass each format-vararglist pair in now occuring in log_struct_internal() calls to a new function log_struct_entry() which produces the desired string. Then, log_struct_internal() calls simply have to deal with a vararglist of strings.

hope this helps, and thanks for your time,

PS: I've been using systemd since v188 (w/o sysvinit) for desktop/laptops (and even in ramroots) and love it. The unit file syntax is nice and intuitive and things mostly work great out-of-the-box. This, is after several years of using Upstart, which worked adequately, but still was not as flexible as it should've been, and a was bit fragile..
Comment 1 Oleksii Shevchuk 2012-09-23 08:47:27 UTC
*** Bug 55231 has been marked as a duplicate of this bug. ***
Comment 2 Zbigniew Jedrzejewski-Szmek 2012-09-24 10:31:46 UTC
I think that the analysis is correct. I don't like the posted patch though. It is intrusive, and the formatting is done early, and the message might be skipped afterwards.

I have a patch in the works, will post soon, taking a different approach:
va_copy(aq, ap)
vasprintf(aq, ...)
update original ap using parse_printf_format
continue with va_arg(ap, ...)

The downsides are: slight ugliness of having to update ap manually, C99 requirement for parse_printf_format.
Comment 3 Lennart Poettering 2012-09-24 11:00:29 UTC
It really sucks that we have to go though the format string twice. But well, I guess the nice syntax does justify this. 

I wasn't aware of the existance of parse_printf_format(). Sounds like the best possible choice to fix this! Thanks for finding that!
Comment 4 jpsinthemix 2012-09-24 11:20:13 UTC
Wholeheartedly agree.

I have a nicer patch which I've been using for several days w/o problems which does enough format string parsing to allow manual ap updating (turns out to be not that difficult). On the other hand, it'd probably be better/safer to go with parse_printf_format (didn't know about that one). I'll post it for your amusement..
thanks for your time,
Comment 5 jpsinthemix 2012-09-24 11:22:19 UTC
Created attachment 67625 [details]
Working (less ugly) fix of early boot vsnprintf/vasprintf segfault
Comment 6 Lennart Poettering 2012-09-24 21:27:56 UTC
OK, I have now commited a fix for this, using Zbigniew's idea. (Sorry for being impatient, I'd really like 192 out of the door, fixing this).
Comment 7 Lennart Poettering 2012-09-24 21:29:11 UTC
Oh, I'd be really thankful if somebody could test this fix on his machine, since I actually never ran into it myself!

Comment 8 jpsinthemix 2012-09-24 22:57:26 UTC
Thumbs up here.
Comment 9 Gert Michael Kulyk 2012-09-25 05:32:48 UTC
Seems like your commit is fixing this. Thanks.
Comment 10 Lennart Poettering 2012-09-25 10:40:37 UTC
OK, closing then.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.