patchwork series bug

Stephen Finucane stephen at that.guru
Sat Nov 7 01:45:16 AEDT 2020


On Sat, 2020-11-07 at 00:49 +1100, Daniel Axtens wrote:
> Bin Meng <bmeng.cn at gmail.com> writes:
>
> > On Wed, Oct 28, 2020 at 1:58 PM Bin Meng <bmeng.cn at gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > Please see below two series:
> > >
> > > Series #1:
> > > http://patchwork.ozlabs.org/project/qemu-devel/list/?series=210330
> > > Series #2:
> > > http://patchwork.ozlabs.org/project/qemu-devel/list/?series=210336
> > >
> > > The following patch
> > > http://patchwork.ozlabs.org/project/qemu-devel/patch/20201027141740.18336-7-bmeng.cn@gmail.com/
> > >
> > > should really be put in series #2, not series #1.
> > >
> > > Not sure why this bug happens, as you can see clearly from the
> > > message id, the patch should belong to series #2.
> >
> > Ping?
>
> Sorry, patchwork is maintained by two hugely busy people without much
> official support from our employers or any other funding. We will look
> into it when we get a moment.
>
> Kind regards,
> Daniel

+1

I did have a chance to look at this this afternoon. What you're seeing
is unfortunately expected behavior, based on how the series detection
algorithm currently works. We attempt to find a series for a new patch
first by checking the email threading headers (In-Reply-To, References)
[1] and then by falling back to searching by other markers (version,
number of patches, author) [2]. The latter is necessary to handle
unthreaded emails but it is prone to false positives since it is by
design quite a broad search for matches. To minimise these false
positives, we timebox the lookup [3].
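
The two-stage lookup described above can be sketched roughly as follows.
This is an illustration only, not the code in parser.py: the Series class,
the find_series function and the TIMEBOX constant are hypothetical names,
and the 10-minute window is assumed from the timing mentioned in this
thread.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumed timebox for the fallback search (hypothetical constant).
TIMEBOX = timedelta(minutes=10)


@dataclass
class Series:
    """Hypothetical stand-in for a stored series."""
    author: str
    version: int
    total: int                      # number of patches in the series
    date: datetime                  # when the series was first seen
    msgids: set = field(default_factory=set)


def find_series(known, refs, author, version, total, date):
    """Return the series a new patch is assigned to, or None."""
    # Stage 1: threading headers. Any Message-ID listed in the patch's
    # In-Reply-To/References headers that belongs to a known series wins.
    for series in known:
        if series.msgids & set(refs):
            return series

    # Stage 2: fallback for unthreaded mail. Match on the broad markers
    # (author, version, patch count), but only inside the timebox.
    for series in known:
        if (series.author == author
                and series.version == version
                and series.total == total
                and abs(date - series.date) <= TIMEBOX):
            return series

    return None
```

The second stage is deliberately broad, which is why the time window is
the only thing keeping unrelated series from the same author apart.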

Because your references don't match up, you've fallen into the fallback
check, and because both series appear to Patchwork to be the same
"version" (version 1, since they don't have series version markers),
have the same number of patches (9), and the same author (you),
Patchwork is identifying them as the same thing. Crucially, both emails
were sent within 10 minutes of each other, which means they're passing
the timebox check.
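
To make the failure mode concrete, here is a small sketch of the fallback
comparison on its own. The Cover tuple, the fallback_matches function and
the timestamps are made up for illustration, and the 10-minute default is
again an assumption taken from this thread rather than from the Patchwork
source.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Cover(NamedTuple):
    """Hypothetical summary of the markers the fallback check compares."""
    author: str
    version: int
    total: int
    sent: datetime


def fallback_matches(a, b, timebox=timedelta(minutes=10)):
    """True if two series look identical to the fallback check."""
    same_markers = (a.author, a.version, a.total) == (b.author, b.version, b.total)
    return same_markers and abs(a.sent - b.sent) <= timebox


# Illustrative timestamps only; the real send times are not in this thread.
first = Cover("Bin Meng", 1, 9, datetime(2020, 10, 27, 14, 17))
close = first._replace(sent=first.sent + timedelta(minutes=8))
later = first._replace(sent=first.sent + timedelta(minutes=20))

print(fallback_matches(first, close))  # True: treated as one series
print(fallback_matches(first, later))  # False: outside the timebox
```

With identical author, version and patch count, the send time is the only
distinguishing input left, so a gap larger than the window is what
separates the two series.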

There isn't a whole lot we can do to handle this without breaking
series detection for unthreaded series. We could possibly only do the
fallback check if there are no threading headers but there are likely
to be more unintended consequences to this. You could ask an admin to
move the patches manually but it's probably not worth the trouble. It
may seem silly, but simply waiting 20 minutes before resending a series
would likely be the easiest way to avoid this.

Sorry I don't have a better answer. Parsing arbitrary email and turning
it into structured data is hard :(

Cheers,
Stephen

[1] https://github.com/getpatchwork/patchwork/blob/bdb049c7939b/patchwork/parser.py#L204-L263
[2] https://github.com/getpatchwork/patchwork/blob/bdb049c7939b/patchwork/parser.py#L266-L300
[3] https://github.com/getpatchwork/patchwork/blob/bdb049c7939b/patchwork/parser.py#L43


