Patch stack analysis

Tue Jun 4 10:58:04 AEST 2019

Hi,

> we (Ralf Ramsauer, Lukas Bulwahn and me) are currently working on
> extending the capabilities of Patchwork by combining it with a tool
> called PaStA [1] (Patch Stack Analysis). PaStA is the outcome of a
> research project [2] by the Technical University of Applied Sciences
> Regensburg. It analyses and compares all mails in a mailing list to
> find related ones (e.g former versions of the patch, see [3]). Ralf
> compared PaStA's results for the Linux kernel mailing list with a
> manually created ground truth and achieved an accuracy of 91%. This
> motivated us to integrate PaStA into Patchwork.

Cool, always interesting to see what people build on top of Patchwork!

We did consider having a feature like this and from memory we might even
have some infrastructure for it. (I get it confused with the feature
allowing a patch to belong to multiple series which we ripped out a
while ago.)

One bit of relevant Patchwork history: there's a long-running fork
run by the freedesktop.org people: patchwork.freedesktop.org,
https://gitlab.freedesktop.org/patchwork-fdo/patchwork-fdo/ . They took
a different approach to series than we did: we focused on patches as the
key 'unit' of patchworking, they focused on series as the key unit. They
already have some support for multiple revisions of a series. I don't
know how they've implemented their feature for detecting multiple
revisions, but I'm guessing it's not based on analysis of (commit
message, diff) tuples. There's an example here:
https://patchwork.freedesktop.org/series/49692/

> Showing related patches (beside ones in the current series) allows
> developers to understand the patch's evolution better. We have
> adjusted the patch details view and renamed the series patch links
> from "related" to "series". Our new related row shows the patches
> related to each other by PaStA [3][4]. The relations between the
> patches in the screenshot were made manually and the next steps will
> be to automate this procedure with PaStA.

I'm really wary about incorporating something with so many dependencies
(and with presumably higher resource usage) into the core of
patchwork.

I'd want to know a few things:

 - what is the accuracy of the FDO Patchwork approach (which I assume is
   100% metadata based)? Does it require that patch submitters do
   particular things (e.g. use the same cover letter title)? Sometimes
   we can train users to be helpful in how they submit things to the
   lists in order to have them work properly in simpler systems.

 - one key use case is the Linux kernel, where we have stable trees, and
   patches getting picked up for those trees. Sometimes those patches
   are identical and sometimes they need backporting. Some care would
   need to be taken around this.

   An example would be:
    - I send this patch to the mailing list: http://patchwork.ozlabs.org/patch/1099934/
    - It is merged into mainline
    - It is proposed for stable trees. This involves multiple threads of
      over 100 emails each, including:
      * https://lkml.org/lkml/2019/5/29/1655
      * https://lkml.org/lkml/2019/5/30/361
      * (plus 3 others)

   In this case, the original patch is related to the stable patches,
   (despite being sent by someone different), and it is interesting and
   useful to know what stable series a patch landed in. However, the
   patch is not really related to the entire stable patch _series_, and
   if you include all the hundreds of patches in your 'related' view in
   [3], you will drown out all the potentially useful signal in a bunch
   of noise.

   It does get more complicated than this too, for example when there is
   a need to backport a patch for stable. (See
   e.g. http://patchwork.ozlabs.org/patch/1109024/ and friends)

 - what's the resource usage, and how long does matching take?
   kernel.org has a patchwork instance that is hooked up to LKML, so
   this is a deeply practical concern for them!
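
To make the metadata question in the first point concrete: a purely
metadata-based matcher could be as simple as grouping mails by a
normalised subject line. The sketch below is only an illustration in
Python, not how FDO Patchwork actually does it (I haven't checked), and
the function names are made up for the example.

```python
import re
from collections import defaultdict

# Leading bracketed tags such as "[PATCH v2 1/3]" or "[RFC]".
_TAG = re.compile(r"^\s*\[[^\]]*\]\s*")
# Leading reply/forward markers such as "Re:" or "Fwd:".
_REPLY = re.compile(r"^(re|fwd?):\s*", re.IGNORECASE)


def normalise_subject(subject):
    """Strip reply markers and bracketed tags, collapse spaces, lower-case."""
    s = subject.strip()
    while True:
        stripped = _TAG.sub("", _REPLY.sub("", s))
        if stripped == s:
            break
        s = stripped
    return " ".join(s.split()).lower()


def group_by_subject(subjects):
    """Map each normalised subject to the original subjects sharing it."""
    groups = defaultdict(list)
    for subject in subjects:
        groups[normalise_subject(subject)].append(subject)
    return dict(groups)
```

Something like this only works if submitters keep the title stable
between revisions, and it would also lump a mainline patch together
with its stable copies, which is exactly the kind of case I'd want to
see handled carefully.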

I think a really good place to start would be to hook PaStA
up as an API consumer like Snowpatch. It wouldn't be able to report the
results back to patchwork just yet, but you'd be able to try it with
live data and demonstrate its value.
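
Very roughly, such a consumer could look like the Python sketch below.
It assumes the instance exposes the REST API under /api/patches/ with a
"project" filter, paginates via the HTTP Link header, and returns patch
objects with an "id" field; please check those details against the API
docs for the Patchwork version you target. The are_related callback is
a hypothetical stand-in for whatever PaStA would provide.

```python
import json
import re
import urllib.request


def next_page_url(link_header):
    """Return the rel="next" URL from an HTTP Link header, or None.

    Naive parsing: assumes the URLs themselves contain no commas.
    """
    for part in (link_header or "").split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="next"', part)
        if match:
            return match.group(1)
    return None


def iter_patches(base_url, project):
    """Yield patch dicts for one project, following pagination.

    The endpoint path and the "project" parameter are assumptions.
    """
    url = f"{base_url}/api/patches/?project={project}"
    while url:
        with urllib.request.urlopen(url) as resp:
            yield from json.load(resp)
            url = next_page_url(resp.headers.get("Link"))


def related_pairs(patches, are_related):
    """Return (id, id) pairs for patches the callback considers related.

    This is a plain pairwise comparison, quadratic in the number of
    patches, so it is only suitable for small experiments.
    """
    pairs = []
    for i, first in enumerate(patches):
        for second in patches[i + 1:]:
            if are_related(first, second):
                pairs.append((first["id"], second["id"]))
    return pairs
```

You would call it along the lines of
related_pairs(list(iter_patches(url, "some-project")), my_check) and
publish the resulting pairs somewhere outside Patchwork for now.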

Thanks for letting us know about your research!

Regards,
Daniel

>
> Stephen, what's your opinion about this?
>
> Greetings,
>
> Mete Polat
>
> [1] https://github.com/lfd/pasta
> [2] https://arxiv.org/pdf/1902.03147.pdf
> [3] https://drive.google.com/drive/folders/18s9FzJUKnIUBp7FBL7dV8dqlGPXqTemq?usp=sharing
> [4] https://github.com/Honeybyte/patchwork/tree/pasta