[Simplicity] Fuzz testing vectors

Tue Nov 12 14:29:24 AEDT 2019

Greetings,

Now that there is a fast C implementation I thought I would throw some
coverage-guided fuzzing at it.

I made a trivial test harness that uses
elements_simplicity_execSimplicity and testTx1 from test.c as the
context, just like the ordinary tests in test.c.

I grinded on it with a 1 second execution upper time bound starting
from a few hundred thousand randomly generated short inputs for about
a cpu-core-year.

The result is a collection of 7371 inputs extracted from a few billion
attempts that collectively hit the as broad a coverage footprint as I
observed in this test run:

https://people.xiph.org/~greg/simplicityC_tests.tar.xz (78kb, expands to 7.3mb)

The files are named based on the result of the execution, e.g.
successes vs various failures and roughly where they failed.

Collections like this are useful for comparing other implementations,
quickly testing new changes, or as a starting point for new
coverage-directed fuzzing runs.

While setting up testing I found that the non-elements harness (e.g.
test.c:test_program) segfaults when elements operations are used and
no transaction context is provided, which is kind of surprising. I
would have expected that either a null tx pointer should always fail
(api misuse) or the elements operations should become cleanly
unavailable.  This might just be me misunderstanding the intended
interface.  But I would have guessed that there would be different
entry points with different required arguments for different kinds of
contexts, and use of the wrong opcodes for your context would just
make execution fail like an executed assertLR node.

I also found two distinct assertion failures which probably trace back
to a single faulty elements operation. See issues #21 and #22.

I was surprised by the rather large percentage of inputs that made it
through type inference. I have no idea how efficient the serialization
will be for actual programs people want to use, but it's at least
efficient for something. :)

All in all I was pretty impressed by how robust the implementation
appears to be even at this early stage of development.

I don't, however, think this fuzzing effort is particularly good at
this point. I'm aware of the following shortcomings of my efforts:

* The initial starting points would ideally include a large number of
well formed programs. It would be good to start from a manually
constructed set of tiny tests that check each opcode and combinator.
Unfortunately the provided examples are mostly currently too slow to
use as good fuzzing vectors, so I just used random ones.

* The bulk of the fuzzing should take care to avoid making stupid
serialization syntax errors that waste time without testing anything
useful.
Including, for example, not emitting reserved or unimplemented opcodes
or running the reader past the end of the file.

* The implementation really needs a static runtime bound for this
testing. With that, I could set it reasonably low, testing would run
much faster, and any test case which still took a long time would be
worth investigating.

* There are probably a number of trivial optimizations that might make
sense for the sake of fuzzing performance that wouldn't otherwise be
worthwhile for real code, e.g. observing during type inference that
the program is trivial in various ways and skipping execution. Though
it would be good to also fuzz with those optimizations off, right now
it seems to waste a lot of time on various trivial programs that
aren't too fuzzing-interesting.

* I didn't fuzz with additional tools like ubsan or by adding any
additional instrumentation beyond the existing asserts.

* My collection doesn't include any of the test cases that don't
complete in a second. Manually inspecting some of them suggests that
they many would run fine but would just take an eternity to complete.

* The success/failure state of each of the vectors in the collection
should be at least checked against another implementation, which I
didn't do. (but someone could do for me, e.g. with the haskell
tools?).

I'll be happy to give it another go later (e.g. once static runtime
analysis is implemented), but I thought even with the above
limitations this collection might be useful to people.

Cheers,