[PATCH v2 0/9] Tools and fixes for parallel parsing
Daniel Axtens
dja at axtens.net
Mon Feb 26 16:42:27 AEDT 2018
Stephen Finucane <stephen at that.guru> writes:
> On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote:
>> Thomas Petazzoni reported that Patchwork would occasionally lose
>> Buildroot email. Andrew - having talked to jk and sfr - suggested that
>> this may be race-condition related.
>>
>> I investigated and found some bugs. I first had to develop some tools.
>> Along the way I found other unrelated bugs too.
>>
>> Patches 1-4 are tooling - ways to do parallel parsing of messages and
>> get and compare the output. (Patch 1 fixes an issue I found when
>> running the tool from patch 2)
>>
>> Patch 5 is an unrelated fix that came up along the way and
>> demonstrates that humans remain the best fuzzers, and that Python's
>> email module is still adorably* quirky.
>>
>> Patch 6 is a bug that came up very quickly in testing but is unlikely
>> to be the actual bug Buildroot is hitting, as it can only occur the
>> first time an email address is seen.
>>
>> Patch 7 is a related tidy-up/optimisation.
>>
>> Patch 8 fixes up a MySQL-only bug, but also adds some robustness.
>>
>> I think patch 9 closes the most likely issue for Buildroot patches.
>>
>> Pending review, patches 5, 6, 8 and 9 should go to stable.
>>
>> V2: Address review comments from Andrew, see patches.
>> Added the new python script to the automatic pep8 testing.
>> Minor improvement to parallel parse script to allow different
>> pythons to be used.
>
> Well, this is all rather unfortunate :( When I was developing this, I
> was testing using 'parsearchive' with a single archive, which naturally
> meant everything was running in series. It seems the use of an MTA
> means the 'parsemail' script, which 'parsearchive' harnesses, gets
> called multiple times here and highlights all these races.
Likewise, and yes, I think that's how we get here.
> I've reviewed most of the patches (bar one, which I'm still working on)
> at this point, and think most of these should be merged. However, I've
> noted that some of the patches leave FIXMEs in place. I'm pretty sure
> all of these occur where we apply a "lookup now and save later _if_
> these preconditions are met" pattern, which we do for things like
> series references, authors, etc. This isn't going to be easy to fix,
> IMO.
>
> That brings me to the real question here: what do we need to do to
> ultimately fix this? Far as I can see, there are two options:
>
> * Save stuff unconditionally. If we're able to find an email, save
> that even if it's not a patch/cover letter/comment. If we find an
> author that we haven't seen before, save that too even if they
> haven't touched a patch, cover letter or comment. This greatly
> simplifies things, but would result in a more congested database and
> API, and would require some background work (saving non-patch/cover
> letter emails).
I don't think we need to do this - and I'm not sure the problem is the
lookup-now-save-later pattern itself; I think it's just our
implementation of it.
The big thing is series references, and I think that (apart from
unthreaded series and possibly deeply threaded series?) we can solve
that pretty easily by holding the 'head email' (patch 0 or patch 1) in
the SeriesReference. But I will think about it more; I don't have
certain answers just yet.
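To sketch the idea - and this is only a rough, Django-flavoured sketch
in a hypothetical models.py, with invented field names, not the current
Patchwork schema or a finished design:

    from django.db import models

    class SeriesReference(models.Model):
        # Message-ID of this mail.
        msgid = models.CharField(max_length=255)
        # Message-ID of the series' head email (patch 0/n or 1/n).
        head_msgid = models.CharField(max_length=255)

        class Meta:
            unique_together = [('msgid', 'head_msgid')]

    # Parallel parsers could then race on something like
    #     SeriesReference.objects.get_or_create(msgid=msgid,
    #                                           head_msgid=head_msgid)
    # and the database's uniqueness constraint picks a single winner,
    # rather than two workers silently creating divergent references.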
> * Use locks. I recall a patch from Damien Lespiau some years ago
> around making this a parallel process [1]. Couldn't we just do this
> on the critical sections of code (anything that involves DB
> accesses, I guess)? This would require minimal changes but could
> have scalability issues.
So I have Feelings about introducing locking mechanisms.
In particular, looking at [1], I worry we're going to land in the
territory of bizarre and irritating issues that robust futexes / pthread
mutexes were invented to deal with.
In short, a futex is a fast userspace mutex facilitated by the kernel.
It's used to implement higher-level locks/mutexes/semaphores.
It works over shared memory - basically it twiddles bits in memory to
use as a lock.
Now, consider two processes, A and B, using shared memory and futexes.
A takes the futex, and B sits waiting for the futex so it can proceed.
While holding the futex, A segfaults. A is still holding the lock and
can no longer clear it, so B is now deadlocked.
It's easy to go "oh, we should catch segfaults", and with the right
handler functions you can. So let's say A does that. Now if it
segfaults, the handler unlocks and B continues. This is all well and
good, but now consider the situation where an impatient sysadmin (or the
kernel) SIGKILLs A. In this case no handler can help you: B will always
deadlock.
Robust futexes solve this problem with kernel magic - that's what makes
them "robust". (Basically, the kernel deals with the case of the process
going away with the lock held.)
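To make that concrete, here's a toy Python demonstration - nothing to
do with Patchwork itself, and the lock type and timings are arbitrary -
of what happens to a waiter when the holder of a non-robust lock is
SIGKILLed:

    import multiprocessing
    import os
    import signal
    import time

    def holder(lock):
        lock.acquire()              # A takes the lock...
        time.sleep(60)              # ...and never gets to release it

    def waiter(lock):
        lock.acquire()              # B waits here
        print("B got the lock")     # never reached in this scenario
        lock.release()

    if __name__ == '__main__':
        lock = multiprocessing.Lock()      # not robust: nobody cleans it up
        a = multiprocessing.Process(target=holder, args=(lock,))
        a.start()
        time.sleep(1)                      # let A grab the lock
        b = multiprocessing.Process(target=waiter, args=(lock,))
        b.start()
        os.kill(a.pid, signal.SIGKILL)     # the impatient sysadmin
        b.join(timeout=5)
        print("B still waiting?", b.is_alive())   # True: B is deadlocked
        b.terminate()                      # clean up the stuck waiter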
So, in Damien's case:
- we could deadlock if there is a Python exception
- so we catch this with a finally: and unlock
- but we're still stuffed if the Python process is killed in a way that
  stops Python from running the finally block
So we will have introduced a new locking mechanism with new, fragile,
tricky-to-debug failure modes*. There is no mechanism we could
conceivably hook into to become truly "robust". Now, Damien's code does
try: the underlying implementation in
https://lists.ozlabs.org/pipermail/patchwork/2015-October/001957.html
creates the lockfile (/tmp/whatever) as a symlink pointing to
"hostname:pid", because symlink creation is atomic. Good! But, to avoid
the robustness issue, if you try to take the lock and it's owned by a
process whose PID does not exist, it guesses that the process has died
and steals the lock. (There's a rough sketch of this pattern after the
list below.) This doesn't work if any of the following apply:
- you're in different PID namespaces but the same hostname namespace
  (e.g. docker --net=none).
- there have been enough processes that the PID counter has rolled over
  and the PID has been reused.
- the code tests for the existence of a process by sending signal 0 to
  the PID and seeing what happens. It's very easy to imagine a paranoid
  sysadmin using seccomp to block parsemail from sending signals, as
  there's no obvious reason why it should need to. (I checked with
  strace; it doesn't appear to send any at the moment.)
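For concreteness, the pattern is roughly this - a hand-written sketch,
not Damien's actual code, with the lockfile name and details invented
for illustration - and the comments mark where the failure modes above
bite:

    import errno
    import os
    import socket

    LOCKFILE = '/tmp/patchwork.parsemail.lock'   # name is made up

    def try_lock():
        me = '%s:%d' % (socket.gethostname(), os.getpid())
        try:
            os.symlink(me, LOCKFILE)   # symlink creation is atomic: good!
            return True
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
        # Someone else holds the lock: guess whether they're still alive.
        host, pid = os.readlink(LOCKFILE).rsplit(':', 1)
        try:
            os.kill(int(pid), 0)   # signal 0: "does this PID exist?"
                                   # ...in *our* PID namespace, and only if
                                   # seccomp lets us send signals at all
            return False           # looks alive - or is a reused PID
        except ProcessLookupError:
            # The real owner may be alive in another container with the
            # same hostname; stealing here runs two parsers at once.
            os.unlink(LOCKFILE)    # "steal" the apparently-stale lock
            return try_lock()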
To add insult to injury, we would have added this concurrency control
mechanism to solve a problem in how we use an underlying system that
already has (literally!!) decades of work on correct concurrent
behaviour.
I would rather not do this.
Regards,
Daniel
* More interesting failure modes:
- Each parsemail process runs in a docker container with a unique /tmp
- PW runs on a RO filesystem
- PW runs distributed across machines - have to share /tmp (and w/
something that supports atomic symlinks e.g. NFS)
- can another unrelated process delete the lock file?
>
> Thoughts?
>
> [1] http://patchwork.ozlabs.org/patch/532930/