[ccan] [PATCH] stringbuilder: Functions for joining strings.

Stuart Longland stuartl at longlandclan.yi.org
Sat Aug 23 09:28:20 EST 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 18/08/14 13:26, David Gibson wrote:
> On Sun, Aug 17, 2014 at 09:16:05PM +1000, 
> stuartl at longlandclan.yi.org wrote:
>> From: Stuart Longland <stuartl at longlandclan.yi.org>
>> 
>> This is a small couple of functions for joining lists of strings 
>> together.  The lists can either be varadic arguments or arrays, 
>> and delimiters are optional.
> 
> So, I'm a bit torn as to whether this makes more sense as its own 
> module or an extension to the 'str' module.  I'd also suggest 
> putting "join" somewhere in the title, since that's the name the 
> operation you're performing often goes by (including in the tal/str
> module).

I did think about this too… I thought I'd play it safe by putting it
by itself, with a view that it could be merged with the 'str' module
if needed.

As to naming, I was thinking of "building up a string", since delim
can be NULL indicating no delimiter, and then it's like repeatedly
calling strcat, except it'll take care of how long the destination
buffer is.

> [snip]
>> diff --git a/ccan/stringbuilder/stringbuilder.c 
>> b/ccan/stringbuilder/stringbuilder.c new file mode 100644 index 
>> 0000000..cb2bc2d --- /dev/null +++ 
>> b/ccan/stringbuilder/stringbuilder.c @@ -0,0 +1,71 @@ +/* CC0 
>> (Public domain) - see LICENSE file for details */ +#include 
>> <ccan/stringbuilder/stringbuilder.h> +#include <string.h> 
>> +#include <errno.h> + +int stringbuilder(const char* delim, char*
>> str, size_t str_sz, ...)
> 
> I'd suggest putting the destination arguments (str and str_sz) 
> first in all these functions, to match sprintf(), strcat() and 
> others.

A fair point, I'll look into this.

>> +{ +	int res; +	va_list ap; +	va_start(ap, str_sz); +	res = 
>> stringbuilder_va(delim, str, str_sz, ap); +	va_end(ap); +	return
>>  res; +} + +static inline int stringbuilder_cpy( +		char** str, 
>> size_t* str_sz, const char* s, size_t s_len)
> 
> Because this isn't in a header, you shouldn't specify "inline" here
> - the compiler can work out for itself whether it's best to 
> inline.

Ahh okay, my intent was to hint to the compiler that the function
*should* be inlined since it'll be called a lot.

>> +{ +	if (*str != s) { +		if (!s_len) +			s_len = strlen(s); + if
>> (s_len > *str_sz) +			return EMSGSIZE; +		strcpy(*str, s); +	} +
>> *str += s_len; +	*str_sz -= s_len; +	return 0; +} + +int 
>> stringbuilder_va(const char* delim, char* str, size_t str_sz, 
>> va_list ap) +{ +	int res = 0; +	size_t delim_len = 0; +	const 
>> char* s = va_arg(ap, const char*); + +	if (delim) +		delim_len =
>>  strlen(delim); + +	if (s) +		res = stringbuilder_cpy(&str, 
>> &str_sz, s, 0); +	while(s && !res) { +		s = va_arg(ap, const 
>> char*);
> 
> I think you can simplify the while/if logic here a bit, if you put
>  the va_arg assignment into the loop condition.

I'll admit I'm not used to doing assignments inside a conditions test,
something about it looks unclear and awkward to me so I tend not to do
it, but I'll have a look and see.

>> +		if (s && delim) +			res = stringbuilder_cpy(&str, &str_sz, + 
>> delim, delim_len);
> 
> And you can simplify it further, if you make stringbuilder_cpy 
> treat NULL as an empty string.

Fair point, I'll look into that.

>> +		if (s && !res) +			res = stringbuilder_cpy(&str, &str_sz, +
>> s, 0); +	} +	return res; +} + +int stringbuilder_array(const
>> char* delim, char* str, size_t str_sz, +		const char** strings,
>> size_t strings_sz)
> 
> I'd suggest renaming strings_sz to nstrings or something, since 
> it's not a bytelength like the other _sz parameters.  I'd also 
> suggest putting it before strings, since (argc, argv) establishes a
> pretty strong precedent that the count goes first when dealing with
> count/array pairs.

Well, I consider the _sz argument as an element count; if it's a char*
then the "element" is the individual char, so that's where that naming
comes from.  It's the size of the array, which happens to be an array
of pointers.

Fair point about the ordering though: I guess I was mimicking what
existing string functions do where the buffer comes before its size
(e.g. snprintf).

>> +{ +	int res = 0; +	size_t delim_len = 0; + +	if (delim) + 
>> delim_len = strlen(delim); + +	while(strings_sz-- && !res) { + 
>> res = stringbuilder_cpy(&str, &str_sz, *(strings++), 0); +		if 
>> (!res && delim) +			res = stringbuilder_cpy(&str, &str_sz, + 
>> delim, delim_len);
> 
> IIUC correctly, if you just have  a single string in the array, 
> this will put a "delim" at the end, which I don't think is what you
> want, and doesn't see to match the varargs variants.

Ahh okay, another test case for that.  I had seen some cases where it
did put a extraneous delimiter at the end of a string, but haven't
considered the single string case.


>> +/** + * stringbuilder - Join strings from a varadic list.  The 
>> list of arguments + * are all assumed to be of type const char* 
>> and must end with a NULL.  If the + * first argument is str, then
>> the contents of str are preserved and appended + * to. + * + *
>> @delim:	A delimiter to separate the strings with, or NULL. + * 
>> @str:	A pointer to a string buffer that will receive the result. 
>> + * @str_sz:	Size of the buffer pointed to by str. + * + * 
>> Returns:	0 on success + * 		EMSGSIZE if the resulting string 
>> would overflow the buffer. + * 		If an overflow condition is 
>> detected, the buffer content is + * 		NOT defined.
> 
> Returning an errno code is not a normal error interface.  Usually 
> you'd return -1 (or something negative), and set errno to 
> EMSGSIZE.

My preference for doing the former is that it's more thread safe,
something errno is definitely not.

I recognise that it's not conventional practice but if I'm going to
return one thing for "all okay" and another for "error", then why not
return the error code via a route where it's less likely to get
splattered by another system call?

>> + * + * Example: + * 	int res; + * 	char file_name[80]; + * 	res
>>  = stringbuilder("/", file_name, sizeof(file_name), + * 
>> "/var/lib/foo", "bar", "baz", + * 		NULL); + * 	if (res) + * 
>> printf("Failed to determine file name: %s", + * strerror(res));
>> + * 	else + * 		printf("File is at %s", file_name); + */ +int 
>> stringbuilder(const char* delim, char* str, size_t str_sz, ...);
> 
> It would be nice to use a macro wrapper around this to avoid having
> to explicitly include the terminating NULL.

Ahh okay, I hadn't thought of that option.  I'll do so.

>> + +/** + * stringbuilder_va - Join strings from a varadic list. 
>> The list of arguments + * are all assumed to be of type const 
>> char* and must end with a NULL.  If the + * first argument is 
>> str, then the contents of str are preserved and appended + * to. 
>> + * + * @delim:	A delimiter to separate the strings with, or 
>> NULL. + * @str:	A pointer to a string buffer that will receive 
>> the result. + * @str_sz:	Size of the buffer pointed to by str. +
>>  * + * Returns:	0 on success + * 		EMSGSIZE if the resulting 
>> string would overflow the buffer. + * 		If an overflow condition
>>  is detected, the buffer content is + * 		NOT defined. + * + * 
>> Example: + * 	#include <ccan/stringbuilder/stringbuilder.h> + * 
>> #include <stdarg.h> + * 	#include <stdio.h> + * 	#include 
>> <string.h> + * 	#include <errno.h> + * + *	int 
>> my_stringbuilder(const char* delim, char* str, + *		size_t 
>> str_sz, ...); + * + *	int my_stringbuilder(const char* delim, 
>> char* str, + *		size_t str_sz, ...) + *	{
> 
> Somewhat contrived example, but ok..

Well, stringbuilder_va is the basis for the stringbuilder function.
I'm not sure that it's valid to just point to the source code as an
example of usage.

So I just used the same function body, giving it a different name.


>> + +/* + * Even though stringbuilder is implemented using + * 
>> stringbuilder_va, we need to test it explicitly.
> 
> Really?  Why?

I honestly don't know, but it wasn't happy until I did so.
Or maybe I was misinterpreting ccanlint.  It'll give a score, but then
leaves me to figure out what I missed.

Perhaps I haven't fully figured out how to drive it yet. :-)

>> +int main(int argc, char *argv[]) +{ +	char string[20]; +	const 
>> char* str_array[] = { +		"xxx", "yyy" +	}; +	int res; + +	res = 
>> test_stringbuilder_va(NULL, string, sizeof(string), +			"aaa", 
>> "bbb", NULL); +	printf("res: %s, string: %s\n", + strerror(res), 
>> string); +	ok1(res == 0); +	ok1(!strcmp(string, "aaabbb"));
> 
> I suggest using streq() from ccan/str, rather than the potentially 
> confusing !strcmp construction.

I did try this, but had all kinds of issues with getting the tests to
compile.  I don't recall the exact errors I got.

>> +	res = stringbuilder_array(NULL, string, sizeof(string), + 
>> str_array, sizeof(str_array)/sizeof(str_array[0])); + 
>> printf("res: %s, string: %s\n", +			strerror(res), string); + 
>> ok1(res == 0); +	ok1(!strcmp(string, "xxxyyy"));
> 
> You don't have a testcase for stringbuilder_array with a
> delimiter, which I think would find a bug as noted above.

No problems, I'll look into that too.

Thanks for the feedback.
Regards,
- -- 
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCgAGBQJT99IUAAoJEE36GRQQveO3wEwQALNl7/JrA2dhg5QxenOmu1R5
LLts+Tz9pYrwjq8TjrUtkjE9tJUfYdcFHZekkw3JXblpxs4CpXNn0nsECEzGHH6y
KLw8CvSBp2hUjVTz3T3ukgPxR41t12RnHqVvPf5aKsZFnojuhnc9svad2872Yuzm
w/U7NaXleXxGH4oFsSjUvJS//I7AB5fh105ooo4mLvGhZOh7wELmr6/LHmMvrpAU
QUGq0DJSwuE3pT57uG86RqNOCf3/O7t1dPf8GJZf0ZlT17rmZKzPM2yiPS5Bh4rB
lollUHL7aie9KF/kFZX2gmmSHIV7r95LTBFGte7r+rt07Flk+dGcoH41Rh64ZfOy
tV0n0oXlbY9fjvrtByb7xVqR+b+IJAD9Q1X4vR32ywaWZ1LT1ty7HyuUv75OhLJd
fMM4NFy7lDogeUkgGwDYZqmLFKkV/6YeR5aCjmHNeSOY0R9trApzWvq3IqqGrvdD
dbQ8xlJYV5Rfu554DTXTh1qbRIozD3H8lDAbnhVT9bW9nddyMWqjWP53HLGdO/g9
1i5xqGtYsOAxUR5Y2BYEhM5shHV9clQQLDvqar32Ss7cvUMvDlbZCpmJqcw5/3pz
YuFBdyEogyroBPy/ITaRu//VINpVhs7U544w9RJOYTzON4yYOyvADxL3ADWtB2ju
xAp3w92mtGOc5zn9rR3a
=OVKO
-----END PGP SIGNATURE-----


More information about the ccan mailing list