[ccan] New module: bdelta

Adam Kennedy adamkennedybackup at gmail.com
Sun Aug 21 00:18:07 EST 2011


When your tests can be run in arbitrary environments under conditions
not under your control, performance testing can be challenging, to say
the least.

CPAN hasn't cracked it yet. I've made some proprietary (because I'm
too embarrased to open source them) attempts to integrate performance
testing into TAP-outputing test collections at work, but never come up
with anything that wasn't horribly fragile.

Biggest problem is that you can't control isolation. If you do
performance testing and someone is mid-install and fires up come other
compile job which hammers the cpu, then your performance test might
fail. And since failing tests can block installation, you've screwed
the user.

Adam

On 20 August 2011 00:41, Joey Adams <joeyadams3.14159 at gmail.com> wrote:
> On Fri, Aug 19, 2011 at 2:17 AM, Rusty Russell <rusty at rustcorp.com.au> wrote:
>> I like it!  It's a perfect example of a ccan module: it does one thing,
>> and does it well.
>
> Thanks!
>
>> A few minor nitpicks from reading the patch (I haven't applied it yet!)
>>
>> 1) I'm always tempted to reuse errno rather than invent my
>> own error codes (ENOMEM, EINVAL and EIO?), but otherwise I would do an
>> enum for the error codes; it just makes the API more explicit.
>
> My error strings are more descriptive.  "Patch applied to wrong
> string" is far more descriptive than "Invalid argument".  Besides, I
> prefer to be portable in a library, and ENOMEM/EINVAL/EIO are defined
> by POSIX, not by the C standards.
>
> I was going to use an enum, but what would I call it?  Remember that
> C++ would treat enum bdelta {...} as a type definition (rather than
> simply an enum tag), meaning "bdelta" could no longer be used as a
> variable name.
>
> I should probably use a typedef enum of some name, if only to annotate
> which functions return error codes and which ones don't.
>
>> 2) Maybe use the term 'char arrays' rather than 'byte strings' as
>>   string has that 0-terminated sense for C coders.
>
> I wasn't sure what to call them.  PostgreSQL calls them "byte arrays
> (BYTEA)".  Haskell calls it ByteString.  char almost implies plain
> text, while "byte" or "octet" suggests the library can be used with
> binary data.  A "string" in many other programming languages can have
> null bytes.
>
> Another naming consideration: I'm not sure whether to call it a
> "patch", "diff", or "delta".  I called the library bdelta rather than
> bdiff mainly because bdiff_diff looks silly.  I should probably use
> all of those terms so people using search engines can find what
> they're looking for.
>
> Note that I shunned the word "binary" in my documentation because the
> diff algorithm shortens at the byte level, not the bit level.  An
> insertion of one bit would change all the bytes that follow.
>
>> 3) I wouldn't implement bdelta_perror personally, since I prefer
>>   err/errx/warn/warnx.
>
> I'm following the lead of GnuTLS here, which has a function called
> gnutls_perror.  It's really convenient to have, if you want to follow
> the good practice of handling all error codes but haven't become
> interested in the errors themselves yet.
>
> err/errx/warn/warnx aren't defined by C or POSIX, but are BSD extensions.
>
>> 4) We really need a good bitstring library: it'd be great for this!
>
> You just said "string"!  :-)
>
>> I think testing could be improved a bit.  For example, I'd test the
>> diff results directly, before testing patching.  eg. that "aaaa" vs
>> "aaaa" produced a non-diff, then "aaaa" vs "aaaab", etc.  Then once the
>> diffs pass, test those patches.
>
> Indeed, bdelta could use more trivial tests.  Someone on freenode/#c
> recently asked if anyone had any tests for longest common subsequence,
> and I pointed to bdelta.  Having more tests would have been nice.
>
> Also, it would be nice to have some tests that test the backend
> algorithm itself (diff_myers), checking the result against expected,
> hard-coded patch strings.
>
>> As far as I can tell, if your diff never generated anything but whole
>> new strings, you'd pass?
>
> Indeed.  Otherwise, it would be a performance test, and I'm not sure
> CCAN supports those yet :-)
>
>> And I don't think your random test actually tests patching?
>
> Actually, they do.  bdelta_diff verifies the patch before returning
> it.  I was originally going to make this an assertion check, but then
> users (myself included) would have to manually check the result
> anyway.  One bad diff can corrupt hundreds of datums.
>
> Thanks for the review!
>
> - Joey
> _______________________________________________
> ccan mailing list
> ccan at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/ccan
>


More information about the ccan mailing list