[Simplicity] Fuzz testing vectors

Keagan McClelland keagan.mcclelland at gmail.com
Wed Nov 13 10:54:52 AEDT 2019


Hi Gregory,

Quick question of clarification: Are the contents of these files the
serialized witness program input to the evaluator?

I am currently (although infrequently, sadly) working on a test harness
that compares the Jet implementations against the Haskell EDSL spec for
verification purposes. It seems like I could also pull these vectors in,
run them against the Haskell tools, and see whether they succeed/fail in
the same ways. That said, the segfault failures will not be reproducible
in the Haskell tools, though perhaps they will be eliminated from the C
implementation in the future. My naive first take is that you are right
about the "failing cleanly" assessment, but perhaps Russell has a reason
for the current implementation being the way it is.

If we want to do more intelligent fuzzing (well-typed programs), it may be
easier to set those kinds of tests up with Haskell's QuickCheck and send
the generated programs over the FFI boundary to the C evaluator.
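Roughly, I have something like the following sketch in mind. The foreign
import here is hypothetical (the real C entry point, e.g.
elements_simplicity_execSimplicity, takes a transaction context and more
arguments), and the generator just produces random bytes as a stand-in
for serializing well-typed terms built with the EDSL:

{-# LANGUAGE ForeignFunctionInterface #-}

import qualified Data.ByteString as BS
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Foreign.C.Types (CChar, CInt (..), CSize (..))
import Foreign.Ptr (Ptr)
import Test.QuickCheck
import Test.QuickCheck.Monadic (assert, monadicIO, run)

-- Hypothetical binding; the real evaluator entry point has a different
-- name and signature and requires a transaction context.
foreign import ccall "run_simplicity"
  c_run :: Ptr CChar -> CSize -> IO CInt

-- Stand-in generator: a real harness would build well-typed terms with
-- the Haskell EDSL and serialize them rather than emit raw random bytes.
genSerializedProgram :: Gen BS.ByteString
genSerializedProgram = BS.pack <$> listOf arbitrary

-- Property: the Haskell reference and the C evaluator agree on whether a
-- given serialized program succeeds.
prop_cMatchesHaskell :: (BS.ByteString -> Bool) -> Property
prop_cMatchesHaskell haskellAccepts =
  forAll genSerializedProgram $ \prog -> monadicIO $ do
    rc <- run $ unsafeUseAsCStringLen prog $ \(buf, len) ->
            c_run buf (fromIntegral len)
    assert ((rc == 0) == haskellAccepts prog)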

Cool to see more people interested in this project! Thanks for the data!


On Mon, Nov 11, 2019 at 8:30 PM Gregory Maxwell <gmaxwell at gmail.com> wrote:

> Greetings,
>
> Now that there is a fast C implementation I thought I would throw some
> coverage-guided fuzzing at it.
>
> I made a trivial test harness that uses
> elements_simplicity_execSimplicity and testTx1 from test.c as the
> context, just like the ordinary tests in test.c.
>
> I grinded on it with a 1 second execution upper time bound starting
> from a few hundred thousand randomly generated short inputs for about
> a cpu-core-year.
>
> The result is a collection of 7371 inputs, extracted from a few billion
> attempts, that collectively hit as broad a coverage footprint as I
> observed in this test run:
>
> https://people.xiph.org/~greg/simplicityC_tests.tar.xz (78kb, expands to
> 7.3mb)
>
> The files are named based on the result of the execution, e.g.
> successes vs various failures and roughly where they failed.
>
> Collections like this are useful for comparing other implementations,
> quickly testing new changes, or as a starting point for new
> coverage-directed fuzzing runs.
>
> While setting up testing I found that the non-elements harness (e.g.
> test.c:test_program) segfaults when elements operations are used and
> no transaction context is provided, which is kind of surprising. I
> would have expected that either a null tx pointer should always fail
> (api misuse) or the elements operations should become cleanly
> unavailable.  This might just be me misunderstanding the intended
> interface.  But I would have guessed that there would be different
> entry points with different required arguments for different kinds of
> contexts, and use of the wrong opcodes for your context would just
> make execution fail like an executed assertLR node.
>
> I also found two distinct assertion failures which probably trace back
> to a single faulty elements operation. See issues #21 and #22.
>
> I was surprised by the rather large percentage of inputs that made it
> through type inference. I have no idea how efficient the serialization
> will be for actual programs people want to use, but it's at least
> efficient for something. :)
>
> All in all I was pretty impressed by how robust the implementation
> appears to be even at this early stage of development.
>
> I don't, however, think this fuzzing effort is particularly good at
> this point. I'm aware of the following shortcomings of my efforts:
>
> * The initial starting points would ideally include a large number of
> well formed programs. It would be good to start from a manually
> constructed set of tiny tests that check each opcode and combinator.
> Unfortunately the provided examples are mostly currently too slow to
> use as good fuzzing vectors, so I just used random ones.
>
> * The bulk of the fuzzing should take care to avoid making stupid
> serialization syntax errors that waste time without testing anything
> useful: for example, not emitting reserved or unimplemented opcodes
> or running the reader past the end of the file.
>
> * The implementation really needs a static runtime bound for this
> testing. With that, I could set it reasonably low, testing would run
> much faster, and any test case which still took a long time would be
> worth investigating.
>
> * There are probably a number of trivial optimizations that might make
> sense for the sake of fuzzing performance that wouldn't otherwise be
> worthwhile for real code, e.g. observing during type inference that
> the program is trivial in various ways and skipping execution. Though
> it would be good to also fuzz with those optimizations off, right now
> it seems to waste a lot of time on various trivial programs that
> aren't too fuzzing-interesting.
>
> * I didn't fuzz with additional tools like ubsan or by adding any
> additional instrumentation beyond the existing asserts.
>
> * My collection doesn't include any of the test cases that don't
> complete in a second. Manually inspecting some of them suggests that
> many would run fine but would just take an eternity to complete.
>
> * The success/failure state of each of the vectors in the collection
> should at least be checked against another implementation, which I
> didn't do (but someone could do it for me, e.g. with the Haskell
> tools?).
>
> I'll be happy to give it another go later (e.g. once static runtime
> analysis is implemented), but I thought even with the above
> limitations this collection might be useful to people.
>
> Cheers,
> --
> Simplicity mailing list
> Simplicity at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/simplicity
>