Bug 11411

Summary: ParseComposeStringFile() is dubious
Product: UIM Reporter: Christian Biere <christianbiere>
Component: bridge: GTK+Assignee: uim-bugs
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: All   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: Handle st.st_size correctly

Description Christian Biere 2007-06-28 13:19:52 UTC
First of all it ignores that st.st_size is off_t which is usually wider than "long" which can result in allocating to little memory as the cast will truncate the value. If the file is 4 GiB + 1 byte large, only 1 byte will be allocated resulting in a buffer overflow.

However, even with proper checks in place the file can grow after fstat() and the initial memory allocation might be insufficient resulting in a buffer overflow again. The parsing code should be changed to take the size of the allocated buffer into account.
Comment 1 Christian Biere 2007-06-28 13:21:37 UTC
Created attachment 10491 [details] [review]
Handle st.st_size correctly

The attached patch is only a partial fix for the first part of the issue to correct the handling of st.st_size.
Comment 2 Etsushi Kato 2007-07-09 05:54:33 UTC
Fixed in r4670 trunk.
Comment 3 Etsushi Kato 2007-07-11 05:41:34 UTC
I've noticed the your assumption about the buffer is incorrect.  There is no such a buffer overflow case as the allocated buffer is only used for a token.

Anyway thank for your report. The code you pointed out was totally brain damaged to allocate huge unused space.
Comment 4 Christian Biere 2007-07-11 06:50:51 UTC
No, it is really possible to cause a buffer overflow, if the file changes after you've determined its filesize because the code assumes no token will be larger than the file itself. This is a reasonable assumption, if and only if you're working with atomic objects or using atomic operations. That's not the case here.

If you look at nexttoken() in gtk/compose.c (the old version before your edit) you can see that there are two loops which only terminate if certain characters are found and otherwise keep on writing to the pre-allocated token buffer. As you basically you use getc() to fetch characters from the file, the originally determined filesize is nowhere taken into account. Thus in case of a growing file, you may very well run over the last element of the token buffer.

Note, I'm very well aware that this scenario is far from being a realistic security issue or the like. This was not the point of my report. The point was simply that this parsing algorithm was fundamentally flawed. If and only if you read the whole file into memory for parsing and ensured that the last character is a terminator, then this algorithm could be applied.
Comment 5 Etsushi Kato 2007-07-11 08:26:11 UTC
> No, it is really possible to cause a buffer overflow, if the file changes after
> you've determined its filesize because the code assumes no token will be larger
> than the file itself.

Ah.  That is possible.  Now, current code (r4706) checks the token size itself, so the problem is fixed I think.

Thanks for the comment.  It helps very much.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.