[ccan] tok_itr pull request

Wed Apr 3 11:03:25 EST 2013

anto canto <ar.cantor at gmail.com> writes:
> Hello Rusty,

Hi Anto!

> I created a pull request on github for a small module which iterates over
> token strings. If you don't want to pull it that's fine, but since I
> haven't gotten a response from you on github I wanted to point it out here
> on the mailing list.

Hmm, I missed another one.  Ah, github doesn't put my email address in
the to: header, which was defeating my filters.  Fixed, thanks.

Now, as to your module.  Since it's your first contribution, I'll give
as much advice as I can.  You don't need to follow *any* of it if you
don't want to, though.

> +++ b/ccan/tok_itr/_info
> @@ -0,0 +1,41 @@
> +#include <string.h>
> +#include "config.h"
> +
> +/**
> + * tok_itr - iterate over tokens in a string
> + *
> + * The list header contains routines for manipulating double linked lists.
> + * It defines two types: struct list_head used for anchoring lists, and
> + * struct list_node which is usually embedded in the structure which
> is place> d
> + * in the list.

This is the wrong description :)

> diff --git a/ccan/tok_itr/tok_itr.c b/ccan/tok_itr/tok_itr.c
> new file mode 100644
> index 0000000..63db8f6
> --- /dev/null
> +++ b/ccan/tok_itr/tok_itr.c
> @@ -0,0 +1,49 @@
> +/* Licensed under LGPL - see LICENSE file for details */
> +#include "tok_itr.h"
> +
> +void tok_itr_init(struct tok_itr *itr, const char *list, char delim) {
> +
> +	itr->delim = delim;
> +	itr->curr = list;
> +}
> +
> +int tok_itr_end(const struct tok_itr *itr) {
> +	return (itr->curr == NULL);
> +}

Shouldn't this return bool?

> +size_t tok_itr_val(const struct tok_itr *itr, char *val, size_t size) {
> +	char *next;
> +	size_t amt, len;
> +
> +	if(size < 1)
> +		return 0;

I thought this function was supposed to return the length of the token,
but here it's returning 0.

> +	next = strchr(itr->curr, itr->delim);
> +	if(next == NULL)
> +		len = strlen(itr->curr);
> +	else if(next == itr->curr)
> +		len = 0;
> +	else  /* next > itr->curr */
> +		len = next - itr->curr;

Note that the "if(next == itr->curr)" is redundant, as the else
statement does the right thing.

> +	/* only want to copy at most size - 1 bytes because
> +	 * we need to save the last spot for '\0' */
> +	size--;
> +	amt = (len > size)? size : len;
> +
> +	if(amt > 0)
> +		strncpy(val, itr->curr, amt);

This can be a memcpy, since you know amt is < strlen(itr->curr).

> +
> +	/* just copied bytes 0->(amt-1) for total of *amt* bytes.
> +	 * now add the null byte to pos at *amt*       */
> +	val[amt] = '\0';
> +
> +	return len;

I'm uncomfortable with this, because (as your examples show) it usually
means people will truncate tokens.  I think we can do better.

How about we allocate if the token doesn't fit in the buffer, and cache
the pointer in struct tok_itr?

struct tok_itr {
        char delim;
        const char *curr;
        size_t alloc_len;
        char *alloc_buf;
};

static inline void tok_itr_init(struct tok_itr *itr,
                                const char *list, char delim)
{
        itr->delim = delim;
        itr->curr = list;
        itr->alloc_len = 0;
        itr->alloc_buf = NULL;
}

Now we can make tok_itr the fundamental function:

/* Returns buf, or an allocated buffer if it doesn't fit.  Returns NULL
 * on no-more-tokens or malloc failure. */
char *tok_itr(struct tok_itr *itr, char *buf, size_t buflen)
{
	char *next;
        size_t toklen;

        if (!itr->curr)
                goto cleanup;

        next = strchr(itr->curr, itr->delim);
	if (next == NULL)
                toklen = strlen(itr->curr);
        else
                toklen = next - itr->curr;

        /* Can't fit in their buffer (incl nul terminator)? */
	if (buflen < toklen + 1) {
                /* Resize/allocate our internal buffer */
                if (itr->alloc_len < toklen + 1) {
                        char *a = realloc(itr->alloc, toklen + 1);
                        if (!a)
                                goto cleanup;
                        itr->alloc = a;
                        itr->alloc_len = toklen + 1;
                }
                buf = itr->alloc;
        }

        memcpy(buf, itr->curr, toklen);
        buf[toklen] = '\0';
        itr->curr = next;
        return buf;

cleanup:
        free(itr->alloc);
        return NULL;
}

If you really want a foreach macro, it becomes:

#define TOK_ITR_FOREACH(buf, buf_size, ptr, list, delim, tok_itr_ptr)   \
	for (tok_itr_init(tok_itr_ptr, list, delim),                    \
                ptr = tok_itr(tok_itr_ptr, buf, buf_size);              \
                ptr;                                                    \
                ptr = tok_itr(tok_itr_ptr, buf, buf_size))

There need to be more brackets in here to protect complex expressions,
though, at minimum around (ptr) (which, admittedly, is unlikely to be a
complex expression):

	for (tok_itr_init(tok_itr_ptr, list, delim),                    \
                (ptr) = tok_itr(tok_itr_ptr, buf, buf_size);            \
                (ptr);                                                  \
                (ptr) = tok_itr(tok_itr_ptr, buf, buf_size))

Though I generally prefer my arguments in "context first" order, with
trivia at the end, which means tok_itr_ptr and list are probably needed
at the front:

        TOK_ITR_FOREACH(tok_itr_ptr, list, delim, ptr, buf, buf_size)

Example usage:
        TOK_ITR_FOREACH(&itr, "this is a test", ' ', p, NULL, 0) {

This is still a bit messy, so maybe the buffer should be in token setup,
not in the iteration.  We have space in the structure, so this is easy:

struct tok_itr {
        char delim;
        bool allocated;
        const char *curr;
        size_t buf_len;
        char *buf;
};

static inline void tok_itr_init(struct tok_itr *itr, char *buf, size_t buflen,
                                const char *list, char delim)
{
        itr->delim = delim;
        itr->allocated = false;
        itr->curr = list;
        itr->buf_len = buflen;
        itr->buf = buf;
}

This makes usage pretty simple, with no looping macro required.

static void print_split_by_colon(const char *string)
{
        struct tok_itr itr;
        char buf[32], *p;

        tok_itr_init(&itr, buf, sizeof(buf), string, ';');
        while (p = tok_itr(&itr))
                printf("%s\n", p);
}

Cheers,
Rusty.
PS.  Completely untested, of course!