[ccan] New module: bdelta
Rusty Russell
rusty at rustcorp.com.au
Sun Aug 21 21:12:31 EST 2011
On Fri, 19 Aug 2011 10:41:33 -0400, Joey Adams <joeyadams3.14159 at gmail.com> wrote:
> On Fri, Aug 19, 2011 at 2:17 AM, Rusty Russell <rusty at rustcorp.com.au> wrote:
> > I like it! It's a perfect example of a ccan module: it does one thing,
> > and does it well.
>
> Thanks!
>
> > A few minor nitpicks from reading the patch (I haven't applied it yet!)
> >
> > 1) I'm always tempted to reuse errno rather than invent my
> > own error codes (ENOMEM, EINVAL and EIO?), but otherwise I would do an
> > enum for the error codes; it just makes the API more explicit.
>
> My error strings are more descriptive. "Patch applied to wrong
> string" is far more descriptive than "Invalid argument".
Sure, but that's partially a documentation question.
> Besides, I
> prefer to be portable in a library, and ENOMEM/EINVAL/EIO are defined
> by POSIX, not by the C standards.
Well, we don't avoid write() and read() :)
> I was going to use an enum, but what would I call it? Remember that
> C++ would treat enum bdelta {...} as a type definition (rather than
> simply an enum tag), meaning "bdelta" could no longer be used as a
> variable name.
I don't really care about C++, but enum bdelta seems wrong anyway:
bdelta_err maybe?
> > 2) Maybe use the term 'char arrays' rather than 'byte strings' as
> > string has that 0-terminated sense for C coders.
>
> I wasn't sure what to call them. PostgreSQL calls them "byte arrays
> (BYTEA)". Haskell calls it ByteString. char almost implies plain
> text, while "byte" or "octet" suggests the library can be used with
> binary data. A "string" in many other programming languages can have
> null bytes.
"byte arrays"? I just did a double-take at the term 'strings'...
> Another naming consideration: I'm not sure whether to call it a
> "patch", "diff", or "delta". I called the library bdelta rather than
> bdiff mainly because bdiff_diff looks silly. I should probably use
> all of those terms so people using search engines can find what
> they're looking for.
>
> Note that I shunned the word "binary" in my documentation because the
> diff algorithm shortens at the byte level, not the bit level. An
> insertion of one bit would change all the bytes that follow.
Yeah, bdelta = byte delta?
> > 3) I wouldn't implement bdelta_perror personally, since I prefer
> > err/errx/warn/warnx.
>
> I'm following the lead of GnuTLS here, which has a function called
> gnutls_perror. It's really convenient to have, if you want to follow
> the good practice of handling all error codes but haven't become
> interested in the errors themselves yet.
Your implementation though, is *worse* than someone implementing it
themselves. I'm not arguing against bdelta_strerror(), just
bdelta_perror(). It's stupid.
If you're a library, you shouldn't be printing to stderr unless you're
about to die (cf. err()). Otherwise, you often want a format string
rather than a fixed string. I've found that generally when coders try
to use perror it makes their code worse.
> err/errx/warn/warnx aren't defined by C or POSIX, but are BSD extensions.
Absolutely! And when we find a platform which doesn't implement them,
I'll write ccan/err as a wrapper.
> > 4) We really need a good bitstring library: it'd be great for this!
>
> You just said "string"! :-)
Yeah, I suck :)
> > I think testing could be improved a bit. For example, I'd test the
> > diff results directly, before testing patching. eg. that "aaaa" vs
> > "aaaa" produced a non-diff, then "aaaa" vs "aaaab", etc. Then once the
> > diffs pass, test those patches.
>
> Indeed, bdelta could use more trivial tests. Someone on freenode/#c
> recently asked if anyone had any tests for longest common subsequence,
> and I pointed to bdelta. Having more tests would have been nice.
>
> Also, it would be nice to have some tests that test the backend
> algorithm itself (diff_myers), checking the result against expected,
> hard-coded patch strings.
>
> > As far as I can tell, if your diff never generated anything but whole
> > new strings, you'd pass?
>
> Indeed. Otherwise, it would be a performance test, and I'm not sure
> CCAN supports those yet :-)
Well, hence manual testing.
I have an idea for performance stuff, but it's a while away...
> > And I don't think your random test actually tests patching?
>
> Actually, they do. bdelta_diff verifies the patch before returning
> it. I was originally going to make this an assertion check, but then
> users (myself included) would have to manually check the result
> anyway. One bad diff can corrupt hundreds of datums.
Ah, I missed that. I would have been tempted to put it under !NDEBUG or
CCAN_BDELTA_DEBUG; once you're confident in the algorithm the checking
seems questionable.
> Thanks for the review!
No worries, now I have to actually *understand* the code to give
feedback on the implementation :)
Thanks!
Rusty.
More information about the ccan
mailing list