[ccan] RFC: draft rfc822 module

David Gibson david at gibson.dropbear.id.au
Mon May 21 20:38:41 EST 2012


On Sun, May 20, 2012 at 09:38:09PM -0500, Jonathan Nieder wrote:
> Hi David,
> 
> David Gibson wrote:
> 
> > Here's a draft of a module for parsing RFC822(/RFC2822/RFC5322)
> > messages.  It doesn't do a heap at the moment, and it needs some work
> > on stuff to match the RFC better, but here's what I have now for
> > comment.
> 
> Thanks for writing this.
> 
> Quick comments follow, mostly nitpicky stuff.
> 
> [...]
> > --- /dev/null
> > +++ b/ccan/rfc822/_info
> > @@ -0,0 +1,54 @@
> > +#include "config.h"
> > +#include <stdio.h>
> > +#include <string.h>
> > +
> > +/**
> > + * rfc822 - Parsing of RFC822 emails
> > + *
> > + * This code allows easy processing of RFC822/RFC2822/RFC5322
> > + * formatted email messages.  For now only read-only operation is
> > + * supported.
> > + *
> > + * The important design goals are these:
> > + * - Be lazy.  Don't compute immediately compute fancy indexes for the
> > + *   mesage.  Just reading messages into the system, then sending
> > + *   them out again should not incur a serious performance hit.
> 
> sed -i -e '
> 	s/mesage/message/
> 	s/system, then/system and then/
> ' ccan/rfc822/_info

Thanks

[snip]
> > +static void allocation_failure(const char *s)
> > +{
> > +	if (allocation_failure_hook)
> > +		(*allocation_failure_hook)(s);
> > +	else
> > +		default_allocation_failure(s);
> > +}
> > +
> > +void rfc822_set_allocation_failure_handler(void (*h)(const char *))
> > +{
> > +	allocation_failure_hook = h;
> > +}
> > +
> > +#define ALLOC_CHECK(p, r) \
> > +	do { \
> > +		if (!(p)) { \
> > +			allocation_failure(__FILE__ ":" stringify(__LINE__)); \
> > +			return (r); \
> > +		} \
> > +	} while (0)
> 
> For the future: would this make sense as its own ccan module?

Hrm.  I'm not quite sure how that would work.

> Since the failure handler is not allowed to free up some memory and
> ask the caller to try again, this is basically acting like a second
> return value.  But unlike providing a second return value, this method
> saves callers from the trouble of checking the return value at all
> call sites.

That's the idea.

I can see some value in allowing for a 'retry' option, but it does
increase the complexity.

> The callback doesn't need a context cookie because it is global
> (static).  Any context it needs can sit in a static variable alongside
> it.
> 
> The downside is that if two independent callers want to use the rfc822
> module at the same time, they must use the same failure handler.

Or be in separate processes.  I don't think that's a serious
limitation, really.  Especially since, frankly, I think almost no-one
will ever actually use the override - cases in which you can usefully
do something other than abort() on allocation are pretty rare.  And I
expect that the majority of users who do override will just do some
sort of custom error logging followed by an abort or exit.

[snip]
> Just as clear to say
> 
> 	struct rfc822_msg {
> 		const char *data, *end;
> 		const char *remainder;
> 		struct list_head headers;
> 		const char *body;
> 	};
> 
> (less vertical whitespace).

Yeah, fair enough.

[snip]
> [...]
> > +static const struct rfc822_header *next_header_cached(struct rfc822_msg *msg,
> > +						      const struct rfc822_header *hdr)
> > +{
> > +	struct list_node *h = &msg->headers.n;
> > +	const struct list_node *n = h;
> > +
> > +	CHECK(msg);
> > +
> > +	if (hdr)
> > +		n = &hdr->list;
> > +
> > +	if (n->next == h)
> > +		return NULL;
> > +
> > +	CHECK(msg);
> > +
> > +	return list_entry(n->next, struct rfc822_header, list);
> > +}
> 
> What is this function about?

The idea is that we parse the boundaries between headers only once -
if the caller traverses the headers again we just retrieve the already
parsed header list again.

[snip]
> > +struct bytestring rfc822_body(struct rfc822_msg *msg)
> > +{
> > +	const char *p = msg->body;
> > +
> > +	CHECK(msg);
> > +
> > +	if (!msg->body && msg->remainder) {
> > +		p = memmem(msg->remainder, msg->end - msg->remainder,
> > +			   "\r\n\r\n", 4);
> > +		if (p) {
> > +			p += 4;
> > +		} else {
> > +			p = memmem(msg->remainder, msg->end - msg->remainder,
> > +				   "\n\n", 2);
> 
> What happens if my message contains \n\n before \r\n\r\n?  (For
> example, imagine a message that uses Unix line-endings with an
> attached DOS-style text file with Content-Transfer-Encoding: 8bit.)
> Not sure what the right thing to do is in that case.  Whatever choice
> you make, I imagine documenting it would help avoid surprises.

Ah.  That is an excellent point, I'll need to revise what I do here.
Of course, a strict reading of the RFC says *only* CRLF should be
recognized, but LF-only messages are so common on Unix systems that
that's not really practical.  So my plan was to basically treat CRLF
and LF as equivalent - but as you've pointed out, what I have above
doesn't quite do that.

[snip]
> > +/**
> > + * rfc822_free - get pointer to enclosing structure
> > + * @msg:
> > + *
> > + * Given a pointer to a member of a structure, this macro does pointer
> > + * subtraction to return the pointer to the enclosing type.
> > + */
> > +void rfc822_free(struct rfc822_msg *msg);
> 
> Looks like a copy+paste went awry.

Oops, yes.

> [...]
> > +int rfc822_header_valid(const struct rfc822_header *hdr);
> 
> I would be tempted to call a function with this name when I want to
> lint a header field to make sure it uses the right encoding, complies
> with any relevant standards, etc, but I would be wrong.  How is this
> function intended to be used?

That is actually the idea, at least linting the properties that aren't
specific to a particular header type (which isn't much).  It's just
that not much of that is implemented yet.

I was actually planning to invert this and have an
rfc822_header_errors() which would return a bitmap of possible errors
with the header (no colon, bad characters in name, ...).


> [...]
> > --- /dev/null
> > +++ b/ccan/rfc822/test/run-allocation-failure.c
> > @@ -0,0 +1,56 @@
> [...]
> > +	talloc_set_allocator(failing_malloc, free, realloc);
> 
> By the way, as an alternative to the rfc822 allocation failure
> callback, I can set up exit-cleanly-on-failed-allocation semantics
> here.  I can even use talloc_add_external to make my handler's
> behavior depend on the talloc context.

Um.. I'm not sure what you mean by this.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


More information about the ccan mailing list