PaReD: a patch relations detector for patchwork

Lukas Bulwahn lukas.bulwahn at gmail.com
Sat Sep 4 16:04:48 AEST 2021


Also CC'ing Ralf and Konstantin, as two other implementors of "patch
relation detectors", and Rohit, who explored the patchwork export and
import functionality.

Hi Daniel,

On Fri, Sep 3, 2021 at 2:32 AM Daniel Axtens <dja at axtens.net> wrote:
>
> Hi all,
>
> I have written the simplest patch relation detector that might possibly
> work as an API client. It is running against the patchwork and
> linuxppc-dev projects on patchwork.ozlabs.org.
>
> It currently detects mails with identical subjects (after prefixes are
> removed) within a 180 day window. This is not a very sophisticated
> matching system, but given that it's an API client and not in the core,
> I'm much happier to experiment and build up sophistication as and when
> it's needed.
>

Simplicity is certainly valuable.
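For illustration, the matching rule you describe (identical subjects
after prefix removal, within a 180-day window) could be sketched in
Python roughly as follows. This is not PaReD's code: the prefix regex,
the Patch record, normalize_subject() and find_relations() are
hypothetical, and chaining each patch to the previous one with the same
subject is my assumption about how the window is applied.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical prefix rule: strip any leading "Re:" and "[...]" groups,
# e.g. "[PATCH v2 3/7]" or "[RFC PATCH]".
_PREFIX_RE = re.compile(r"^\s*(?:re:\s*|\[[^\]]*\]\s*)+", re.IGNORECASE)

# The 180-day window from the description above.
WINDOW = timedelta(days=180)


@dataclass(frozen=True)
class Patch:
    id: int
    subject: str
    date: datetime


def normalize_subject(subject: str) -> str:
    """Drop leading prefixes and collapse runs of whitespace."""
    stripped = _PREFIX_RE.sub("", subject)
    return " ".join(stripped.split())


def find_relations(patches):
    """Group patches with identical normalized subjects.

    Assumption: a patch joins a group if it arrived within WINDOW of the
    previous patch with the same subject; a larger gap starts a new group.
    Groups with a single patch are dropped, since they relate nothing.
    """
    by_subject = {}
    for p in sorted(patches, key=lambda p: p.date):
        by_subject.setdefault(normalize_subject(p.subject), []).append(p)

    groups = []
    for same in by_subject.values():
        current = [same[0]]
        for prev, p in zip(same, same[1:]):
            if p.date - prev.date <= WINDOW:
                current.append(p)
            else:
                if len(current) > 1:
                    groups.append(current)
                current = [p]
        if len(current) > 1:
            groups.append(current)
    return groups
```

For example, "[PATCH] powerpc: fix foo" and "[PATCH v2] powerpc: fix foo"
sent a month apart end up in one group, while a v3 sent a year later
starts a new one.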

Ralf and I envisioned integrating the much more sophisticated algorithm
for similar patch detection from pasta (https://github.com/lfd/PaStA)
into a workflow with patchwork.

Daniel, you have seen the small steps we have taken:

- Mete (an intern at BMW, my employer at the time) implemented the
"related patches" feature for patchwork in 2019.
- Rohit (a Google Summer of Code student in 2020, mentored by Ralf and
me) implemented an "export, compute, import" toolchain between
patchwork and pasta, some more details are described in
https://github.com/lfd/PaStA/blob/master/documentation/pasta-patchwork.md.

Unfortunately, IMHO, we hit two challenging implementation problems with this work:
1. Performance issues when computing relations with pasta.
2. The inability to limit the computation to new incoming patches:
pasta was designed as a run-once offline analysis tool, not as a
continuously running online analysis; changing that is possible, but
touches on various internal aspects throughout the whole tool.

So far, we have not continued the work, and I personally believe that
exploring solutions simpler than the complex pasta heuristics is worth
a try (if only to save power consumption of servers in the long
run...).

For completeness, I need to mention that Konstantin's b4 tool also
detects the "latest patch series" when you ask it to pick a patch
series from a kernel mailing list. I do not know how it determines
that (and I hope that Konstantin can comment here), but it is probably
also a simple heuristic searching for similar or identical subject lines of
the patch series cover letter. It would be nice if that functionality
could be invoked as some kind of library function/separate client tool
for patchwork as well.

I hope that others can also come up with simple PaReD variants, such
as parsing lore.kernel.org links in the 'patch comment section' (i.e.,
below the "---"), which was once named in a ksummit-discuss email
thread as the best way for developers to refer to previous versions.
I always hope that once a tool provides a significant benefit for
tracking and managing previous versions, more developers will pick up
the conventions that patches need to follow to benefit from such a
tool.
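A minimal sketch of such a lore-link variant, assuming that links take
the form https://lore.kernel.org/<list>/<message-id>/ (or
https://lore.kernel.org/r/<message-id>) and that the comment section
runs from the first "---" line up to the first "diff --git" line. The
function name previous_version_msgids() is hypothetical; the extracted
Message-IDs would then be matched against the Message-IDs of patches
that patchwork already knows.

```python
import re
from urllib.parse import unquote

# Assumed lore URL shape: https://lore.kernel.org/<list or "r">/<msgid>[/...]
_LORE_RE = re.compile(r"https?://lore\.kernel\.org/[\w.-]+/([^/\s<>()]+)")


def previous_version_msgids(body: str) -> list[str]:
    """Return Message-IDs of lore links found in the patch comment section.

    The comment section is assumed to start after the first line that is
    exactly "---" and to end where the diff begins.
    """
    lines = body.splitlines()
    try:
        start = lines.index("---") + 1
    except ValueError:
        return []  # no "---" separator, so no comment section
    msgids = []
    for line in lines[start:]:
        if line.startswith("diff --git"):
            break  # links inside the diff itself are not version references
        for match in _LORE_RE.finditer(line):
            # Undo URL escaping and drop sentence punctuation after the link.
            msgid = unquote(match.group(1)).rstrip(".,;")
            if msgid not in msgids:
                msgids.append(msgid)
    return msgids
```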

> You can get the code at https://github.com/daxtens/pw-pared . I'm using
> the same license as Patchwork, for a number of reasons, but in part
> because we may one day want to migrate the functionality into the
> patchwork core. Patches are welcome.
>
> You can see some examples of where PaReD has set up meaningful relations
> at:
>
>  - https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210802073929.907431-2-kjain@linux.ibm.com/
>  - https://patchwork.ozlabs.org/project/patchwork/patch/20210823182833.3976100-6-raxel@google.com/
>
> Some very obvious things that doing this has exposed:
>
>  - the relations display should show the status of each related patch
>    (e.g. New, Superseded, Accepted)
>
>  - Series relations would make a lot of sense - probably even more sense
>    from a human point of view - and we should probably build those at
>    some point.
>

Agreed. This is something Ralf, Mete, Rohit and I discussed as well.

Extending a patch relation to a patch series relation is conceptually simple:

If two patch series S1, with patches p1, ..., pn, and S2, with patches
r1, ..., rm, share a critical amount of related patches, i.e., the set
I of index pairs (i, j) for which pi and rj are related to each other
is large enough, then the series S1 and S2 are related to each other.
Further, one could come up with a separate similarity relation among
cover letters and weigh that into the measure for related patch
series. Fine-tune the weights and thresholds, evaluate it on a
representative dataset, and you are done. Conceptually clear, but this
involves quite some work.
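The series relation above could be sketched as follows, under several
assumptions: patch relations are given as a set of unordered id pairs,
the patch measure is the fraction of patches in either series that have
at least one related partner, and difflib's SequenceMatcher ratio is
swapped in as a stand-in for the cover letter similarity relation. The
weights and threshold are placeholders, not tuned values, and all
function names are hypothetical.

```python
from difflib import SequenceMatcher

# Placeholder knobs; these would have to be fine-tuned on a dataset.
PATCH_WEIGHT = 0.8
COVER_WEIGHT = 0.2
THRESHOLD = 0.5


def related_index_pairs(s1, s2, related):
    """The index set I: all (i, j) such that s1[i] and s2[j] are related.

    s1 and s2 are lists of patch ids; `related` is a set of frozensets
    {a, b} of patch ids. Indices are 0-based here, unlike p1..pn above.
    """
    return {(i, j)
            for i, p in enumerate(s1)
            for j, r in enumerate(s2)
            if frozenset((p, r)) in related}


def series_score(s1, s2, related, cover1=None, cover2=None):
    pairs = related_index_pairs(s1, s2, related)
    # Count patches on either side that have at least one related partner.
    covered = len({i for i, _ in pairs}) + len({j for _, j in pairs})
    patch_score = covered / (len(s1) + len(s2))
    if cover1 is None or cover2 is None:
        return patch_score  # no cover letters: use patch relations only
    cover_score = SequenceMatcher(None, cover1, cover2).ratio()
    return PATCH_WEIGHT * patch_score + COVER_WEIGHT * cover_score


def series_related(s1, s2, related, cover1=None, cover2=None):
    return series_score(s1, s2, related, cover1, cover2) >= THRESHOLD
```

Weighting coverage by both sides, rather than by |I| alone, keeps one
heavily related patch from making two otherwise unrelated series look
similar.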

>  - PaReD requires an API token for a maintainer account (much like for
>    pushing checks) which is annoying and one day we should sort out
>    fine-grained permissions.
>
> Ask your patchwork instance admin if a maintainer account for PaReD is
> right for you!
>

I am looking forward to more implementations and more instances
running and trying out this feature.

Daniel, thanks for moving this feature one step further.


Lukas

