Bug 30292

Summary: openssl backend does not handle fatal errors during async writes correctly
Product: Telepathy Reporter: Vivek Dasmohapatra <vivek>
Component: gabbleAssignee: Vivek Dasmohapatra <vivek>
Status: RESOLVED FIXED QA Contact: Telepathy bugs list <telepathy-bugs>
Severity: normal    
Priority: medium Keywords: patch
Version: git master   
Hardware: Other   
OS: All   
URL: http://git.collabora.co.uk/?p=user/vivek/wocky.git;a=shortlog;h=refs/heads/async-tls-write-error-handling
Whiteboard: review+
i915 platform: i915 features:

Description Vivek Dasmohapatra 2010-09-20 16:18:03 UTC
Currently the openssl error retries partial writes on all errors, it
should only do so on EWOULDBLOCK/G_IO_ERROR_WOULD_BLOCK.
Comment 1 Will Thompson 2010-09-21 03:23:19 UTC
The commit message doesn't really describe the bug or the fix very well. How about something like:

  Don't retry indefinitely on failed OpenSSL writes

  Previously, the response to any partial write was to try again later, even if the error returned indicated that it would never succeed. Instead, we should only retry later if WOULD_BLOCK is returned (indicating only that the write buffer is full, as opposed to an actual error).

  This error was only in the OpenSSL backend; gnutls does not require us to implement this logic at all.

+      else
+        {
+          /* should never happen - GIO bug? */
+          g_warning ("Incomplete GIO operation did not return an error");
+        }

Company on #gnome-hackers suggests that a partial write with no error is reasonable behaviour, as does man 2 write. And the documentation for g_output_stream_write_async() suggests the same:

> On success, the number of bytes written will be passed to the
> callback. It is not an error if this is not the same as the requested
> size, as it can happen e.g. on a partial I/O error, but generally we
> try to write as many bytes as requested.

So it seems like this should indeed fall through to the same retry path as WOULD_BLOCK, but that a warning is overly alarmist.

+          /* EWOULDBLOCK - expected for async operations. Anything else *
+           * is actually a real error an indicates we should abort      */

                                   and ^^

I'm kind of allergic to this shape of comment block, preferring

  /* Foo bar
   * and also baz
   */

but that could just be me. That style seems more common in these codebases, though.

Maybe the whole block for written != buffered could be made easier to follow with a bit of reordering. With your patch, the logic is:

 • check error, and frob it a bit
 • update the buffer
 • and then check the error again

It would be clearer if the buffer update came before all the error-related code. 

+        wocky_tls_session_try_operation (session, WOCKY_TLS_OP_WRITE);

Supposedly this invocation makes us bail out. It's not clear to me how this works. But maybe this is me being unfamiliar with the code.
Comment 2 Will Thompson 2010-09-21 03:25:46 UTC
+      /* scrub the data we did manage to write frm our buffer */

                                           from ^^^
Comment 3 Will Thompson 2010-09-21 03:26:52 UTC
1525             DEBUG ("Incomplete async write [%d/%d bytes]: %s"

The size parameters are gssize. The format string should use G_GSSIZE_FORMAT rather than making unportable assumptions.
Comment 4 Vivek Dasmohapatra 2010-09-21 17:20:36 UTC
Updated branch.

wocky_tls_session_try_operation inspects the state of the session,
which includes the error resulting from the last actual write/read
attempt (if we didn't scrub it), and completes the simple async 
result appropriately, it's the same exit point as for the 
"write completed successfully" block above the incomplete/error
block. (In fact, it's the exit point for the async read, write and 
handshake op circuits)
Comment 5 Will Thompson 2010-09-22 02:18:19 UTC
While we're here:


+          gint ignore_warning;
+          gchar *pending = g_memdup (buffer + written, psize);
+
+          ignore_warning = BIO_reset (session->wbio);
+          ignore_warning = BIO_write (session->wbio, pending, psize);

Why is it okay to ignore the return value of BIO_reset and BIO_write here? I think ignore_warning was added to squash a Coverity warning. It might be nice to add a comment explaining why it's there and why it's kosher.
Comment 6 Vivek Dasmohapatra 2010-09-22 04:17:18 UTC
Comment added. Basically, memory BIO ops can't fail (at least not the ones we're using here). The warnings are from the compiler as well as coverity type tools.
(cast in void context iirc).
Comment 7 Will Thompson 2010-09-22 04:42:43 UTC
ship it!
Comment 8 Will Thompson 2010-10-25 02:51:49 UTC
this was shipped.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.