<div dir="ltr">Hi Gregory,<div><br></div><div>Quick question of clarification: Are the contents of these files the serialized witness program input to the evaluator?</div><div><br></div><div>I am currently (although infrequently, sadly) working on a test harness that compares the Jet implementations and the Haskell EDSL spec for the purposes of verification. It seems like I could also pull these in and test them against the Haskell tools and see if they succeed/fail in the same ways. That said, the segfault failures will not be replicable in the Haskell tools, but perhaps the segmentation faults will be eliminated in the C implementation in the future. My naive first take is that you are right about the "failing cleanly" assessment, but perhaps Russell has a reason for the current implementation being the way it is.</div><div><br></div><div>If we want to do more intelligent fuzzing (well-typed programs), it may be easier to set those kinds of tests up in Haskell's QuickCheck and send them over the FFI boundary to the C evaluator.</div><div><br></div><div>Cool to see more people interested in this project! Thanks for the data!</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 11, 2019 at 8:30 PM Gregory Maxwell <<a href="mailto:gmaxwell@gmail.com">gmaxwell@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Greetings,<br>
<br>
Now that there is a fast C implementation I thought I would throw some<br>
coverage-guided fuzzing at it.<br>
<br>
I made a trivial test harness that uses<br>
elements_simplicity_execSimplicity and testTx1 from test.c as the<br>
context, just like the ordinary tests in test.c.<br>
<br>
I grinded on it with a 1 second execution upper time bound starting<br>
from a few hundred thousand randomly generated short inputs for about<br>
a cpu-core-year.<br>
<br>
The result is a collection of 7371 inputs, extracted from a few billion<br>
attempts, that collectively hit as broad a coverage footprint as I<br>
observed in this test run:<br>
<br>
<a href="https://people.xiph.org/~greg/simplicityC_tests.tar.xz" rel="noreferrer" target="_blank">https://people.xiph.org/~greg/simplicityC_tests.tar.xz</a> (78kb, expands to 7.3mb)<br>
<br>
The files are named based on the result of the execution, e.g.<br>
successes vs various failures and roughly where they failed.<br>
<br>
Collections like this are useful for comparing other implementations,<br>
quickly testing new changes, or as a starting point for new<br>
coverage-directed fuzzing runs.<br>
<br>
While setting up testing I found that the non-elements harness (e.g.<br>
test.c:test_program) segfaults when elements operations are used and<br>
no transaction context is provided, which is kind of surprising. I<br>
would have expected that either a null tx pointer should always fail<br>
(api misuse) or the elements operations should become cleanly<br>
unavailable. This might just be me misunderstanding the intended<br>
interface. But I would have guessed that there would be different<br>
entry points with different required arguments for different kinds of<br>
contexts, and use of the wrong opcodes for your context would just<br>
make execution fail like an executed assertLR node.<br>
<br>
I also found two distinct assertion failures which probably trace back<br>
to a single faulty elements operation. See issues #21 and #22.<br>
<br>
I was surprised by the rather large percentage of inputs that made it<br>
through type inference. I have no idea how efficient the serialization<br>
will be for actual programs people want to use, but it's at least<br>
efficient for something. :)<br>
<br>
All in all I was pretty impressed by how robust the implementation<br>
appears to be even at this early stage of development.<br>
<br>
I don't, however, think this fuzzing effort is particularly good at<br>
this point. I'm aware of the following shortcomings of my efforts:<br>
<br>
* The initial starting points would ideally include a large number of<br>
well formed programs. It would be good to start from a manually<br>
constructed set of tiny tests that check each opcode and combinator.<br>
Unfortunately the provided examples are currently mostly too slow to<br>
use as good fuzzing vectors, so I just used random ones.<br>
<br>
* The bulk of the fuzzing should take care to avoid making stupid<br>
serialization syntax errors that waste time without testing anything<br>
useful. This includes, for example, not emitting reserved or<br>
unimplemented opcodes and not running the reader past the end of the<br>
file.<br>
<br>
* The implementation really needs a static runtime bound for this<br>
testing. With that, I could set it reasonably low, testing would run<br>
much faster, and any test case which still took a long time would be<br>
worth investigating.<br>
<br>
* There are probably a number of trivial optimizations that might make<br>
sense for the sake of fuzzing performance that wouldn't otherwise be<br>
worthwhile for real code, e.g. observing during type inference that<br>
the program is trivial in various ways and skipping execution. Though<br>
it would be good to also fuzz with those optimizations off, right now<br>
the fuzzer seems to waste a lot of time on various trivial programs<br>
that aren't very interesting from a fuzzing perspective.<br>
<br>
* I didn't fuzz with additional tools like ubsan or by adding any<br>
additional instrumentation beyond the existing asserts.<br>
<br>
* My collection doesn't include any of the test cases that don't<br>
complete in a second. Manually inspecting some of them suggests that<br>
many would run fine but would just take an eternity to complete.<br>
<br>
* The success/failure state of each of the vectors in the collection<br>
should be at least checked against another implementation, which I<br>
didn't do (though someone could do it for me, e.g. with the Haskell<br>
tools?).<br>
<br>
I'll be happy to give it another go later (e.g. once static runtime<br>
analysis is implemented), but I thought even with the above<br>
limitations this collection might be useful to people.<br>
<br>
Cheers,<br>
-- <br>
Simplicity mailing list<br>
<a href="mailto:Simplicity@lists.ozlabs.org" target="_blank">Simplicity@lists.ozlabs.org</a><br>
<a href="https://lists.ozlabs.org/listinfo/simplicity" rel="noreferrer" target="_blank">https://lists.ozlabs.org/listinfo/simplicity</a><br>
</blockquote></div>