[ccan] RFC: draft rfc822 module

Jonathan Nieder jrnieder at gmail.com
Mon May 21 12:38:09 EST 2012


Hi David,

David Gibson wrote:

> Here's a draft of a module for parsing RFC822(/RFC2822/RFC5322)
> messages.  It doesn't do a heap at the moment, and it needs some work
> on stuff to match the RFC better, but here's what I have now for
> comment.

Thanks for writing this.

Quick comments follow, mostly nitpicky stuff.

[...]
> --- /dev/null
> +++ b/ccan/rfc822/_info
> @@ -0,0 +1,54 @@
> +#include "config.h"
> +#include <stdio.h>
> +#include <string.h>
> +
> +/**
> + * rfc822 - Parsing of RFC822 emails
> + *
> + * This code allows easy processing of RFC822/RFC2822/RFC5322
> + * formatted email messages.  For now only read-only operation is
> + * supported.
> + *
> + * The important design goals are these:
> + * - Be lazy.  Don't compute immediately compute fancy indexes for the
> + *   mesage.  Just reading messages into the system, then sending
> + *   them out again should not incur a serious performance hit.

sed -i -e '
	s/mesage/message/
	s/system, then/system and then/
' ccan/rfc822/_info

> + * - But cache.  Once the user does request data that needs parsing,
> + *   cache the results in suitable data structures so that if lots
> + *   more lookups are done they're then fast.
> + * - Cope with ill-formatted messages.  Even if the input is not
> + *   RFC822 compliant, don't SEGV and try to return as much useful
> + *   data as possible.

sed -i -e "
	s/don't SEGV and try/don't SEGV but try/
" ccan/rfc822/_info

> + *
> + * Example:
> + *	char buf[] = "From: <from at example.com>\n"
> + *		     "To: <to at example.com>\n\n"
> + *                   "body\n";
> + *	struct rfc822_msg *msg;
> + *	struct bytestring body;
> + *
> + *	msg = rfc822_start(NULL, buf, sizeof(buf));
> + *	body = rfc822_body(msg);
> + *	fwrite(body.ptr, 1, body.len, stdout);

Nice.

[...]
> --- /dev/null
> +++ b/ccan/rfc822/rfc822.c
> @@ -0,0 +1,257 @@
[...]
> +static void allocation_failure(const char *s)
> +{
> +	if (allocation_failure_hook)
> +		(*allocation_failure_hook)(s);
> +	else
> +		default_allocation_failure(s);
> +}
> +
> +void rfc822_set_allocation_failure_handler(void (*h)(const char *))
> +{
> +	allocation_failure_hook = h;
> +}
> +
> +#define ALLOC_CHECK(p, r) \
> +	do { \
> +		if (!(p)) { \
> +			allocation_failure(__FILE__ ":" stringify(__LINE__)); \
> +			return (r); \
> +		} \
> +	} while (0)

For the future: would this make sense as its own ccan module?

Since the failure handler is not allowed to free up some memory and
ask the caller to try again, this is basically acting like a second
return value.  But unlike providing a second return value, this method
saves callers from the trouble of checking the return value at all
call sites.

The callback doesn't need a context cookie because it is global
(static).  Any context it needs can sit in a static variable alongside
it.

The downside is that if two independent callers want to use the rfc822
module at the same time, they must use the same failure handler.

> +
> +struct rfc822_msg {
> +	const char *data, *end;
> +
> +	const char *remainder;
> +
> +	struct list_head headers;
> +
> +	const char *body;
> +};

Just as clear to say

	struct rfc822_msg {
		const char *data, *end;
		const char *remainder;
		struct list_head headers;
		const char *body;
	};

(less vertical whitespace).

[...]
> +static const struct rfc822_header *next_header_cached(struct rfc822_msg *msg,
> +						      const struct rfc822_header *hdr)
> +{
> +	struct list_node *h = &msg->headers.n;
> +	const struct list_node *n = h;
> +
> +	CHECK(msg);
> +
> +	if (hdr)
> +		n = &hdr->list;
> +
> +	if (n->next == h)
> +		return NULL;
> +
> +	CHECK(msg);
> +
> +	return list_entry(n->next, struct rfc822_header, list);
> +}

What is this function about?

[...]
> +struct bytestring rfc822_body(struct rfc822_msg *msg)
> +{
> +	const char *p = msg->body;
> +
> +	CHECK(msg);
> +
> +	if (!msg->body && msg->remainder) {
> +		p = memmem(msg->remainder, msg->end - msg->remainder,
> +			   "\r\n\r\n", 4);
> +		if (p) {
> +			p += 4;
> +		} else {
> +			p = memmem(msg->remainder, msg->end - msg->remainder,
> +				   "\n\n", 2);

What happens if my message contains \n\n before \r\n\r\n?  (For
example, imagine a message that uses Unix line-endings with an
attached DOS-style text file with Content-Transfer-Encoding: 8bit.)
Not sure what the right thing to do is in that case.  Whatever choice
you make, I imagine documenting it would help avoid surprises.

[...]
> --- /dev/null
> +++ b/ccan/rfc822/rfc822.h
> @@ -0,0 +1,81 @@
[...]
> +/**
> + * rfc822_free - get pointer to enclosing structure
> + * @msg:
> + *
> + * Given a pointer to a member of a structure, this macro does pointer
> + * subtraction to return the pointer to the enclosing type.
> + */
> +void rfc822_free(struct rfc822_msg *msg);

Looks like a copy+paste went awry.

[...]
> +int rfc822_header_valid(const struct rfc822_header *hdr);

I would be tempted to call a function with this name when I want to
lint a header field to make sure it uses the right encoding, complies
with any relevant standards, etc, but I would be wrong.  How is this
function intended to be used?

[...]
> --- /dev/null
> +++ b/ccan/rfc822/test/run-allocation-failure.c
> @@ -0,0 +1,56 @@
[...]
> +	talloc_set_allocator(failing_malloc, free, realloc);

By the way, as an alternative to the rfc822 allocation failure
callback, I can set up exit-cleanly-on-failed-allocation semantics
here.  I can even use talloc_add_external to make my handler's
behavior depend on the talloc context.

Thanks for a pleasant read.

Hope that helps,
Jonathan


More information about the ccan mailing list