[rfc] Extending Patchwork as a GSoC project

Stephen Finucane stephen at that.guru
Fri May 8 02:11:17 AEST 2020


On Thu, 2020-05-07 at 16:48 +0200, Ralf Ramsauer wrote:
> Hi Stephen,
> 
> On 07/05/2020 15:46, Stephen Finucane wrote:
> > Welcome, Rohit.
> > 
> > On Thu, 2020-05-07 at 09:00 +0530, Rohit Sarkar wrote:
> > > > Message IDs and patch IDs should also be stable/immutable. Message IDs,
> > > > being a property of _mails_, will be the same across different patchwork
> > > > instances that consume the same mail. Patch IDs, being a property of the
> > > > specific database that ingested the patch, will vary from patchwork
> > > > instance to patchwork instance.
> > > > 
> > > > > Daniel, I have in mind that there is already some kind of infrastructure
> > > > > in patchwork for receiving raw patches... AFAIR, Mete implemented an
> > > > > export routine that eases the first initial import. Is there a
> > > > > possibility to reliably "receive all new patches since my last pull"?
> > > > 
> > > > I struggle a little bit to follow who's importing and exporting from
> > > > whom, but:
> > > > 
> > > >   - There is now code to extract patches in one go from a patchwork
> > > >     instance. I'd caution you that there are gigabytes of patches in the
> > > >     databases of production instances going back over a decade, so you
> > > >     might find that a challenging data set to acquire and work with.
> > > > 
> > > >   - In terms of 'catching up': I think you're asking if Patchwork will
> > > >     let you _export_ all patches since your last pull, rather than asking
> > > >     if patchwork will let you import patches? I think that makes the most
> > > >     sense in context. If that's the case, then the way I would do that
> > > >     is:
> > > > 
> > > >     a) observe the highest patch ID in the project you are tracking, as
> > > >        patch IDs are always increasing. Note that the same cannot be said
> > > >        about dates - patchwork instances, due to the quirks of email,
> > > >        often get mail out-of-order. You probably want something like:
> > > > 
> > > >        http://patchwork.ozlabs.org/api/patches/?order=-id&project=linuxppc-dev
> > > > 
> > > >     b) Retrieve all email from your last pull to that patch ID. Bear in
> > > >        mind that it is likely that more email will arrive while you are
> > > >        doing this - hence why I suggest fetching the patch ID first! Be
> > > >        careful also of pagination as that can also change if new patches
> > > >        come in. One day we will fix this by adding cursor-based
> > > >        pagination as well but we haven't done it yet. As such you
> > > >        probably want to do this with a different query with the opposite
> > > >        ordering, something like:
> > > > 
> > > >        http://patchwork.ozlabs.org/api/patches/?since=2020-05-01T00%3A00%3A00&project=linuxppc-dev
> > > > 
> > > >        (order=id is implied but wouldn't hurt to specify it, and an API
> > > >        version, in your final code)
> > > 
> > > I might be missing something, but why does it matter if more patches
> > > arrive while pulling? PaStA can pull all patches since its last pull as
> > > you mentioned. 
> > 
> > I'll also point out the events API [1]. This would be a lighter way to
> > probe for new patches. In particular, you probably care about the
> > 'patch-created' event, which occurs every time we receive a new patch.
> > You can poll for these like so:
> > 
> >     http://patchwork.ozlabs.org/api/events/?category=patch-created&since=2020-05-01T00%3A00%3A00&project=linuxppc-dev
> > 
> > Also, this doesn't exist yet, but it would be quite easy to add the
> > concept of webhooks. With a webhook infrastructure, you'd be able to
> > configure Patchwork to POST a JSON payload to an arbitrary URL every
> > time we e.g. receive a new patch. This would allow Patchwork to push
> 
> Uh, does that scale?

TBD. We could integrate something like Redis if we wanted to decouple
things/make it optional.

> > things to you instead of having to poll. You would have to wait for a
> > future 3.0 release for this though, assuming you wanted to run against
> > a public instance.
> 
> Both approaches, webhooks and events, are synchronous methods that only
> work if there are no interruptions. I'd prefer asynchronous methods.

Fair.
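
For reference, the catch-up procedure quoted above (steps a and b) could be
sketched roughly as below. This is a minimal Python sketch under stated
assumptions, not something run against a live instance: the helper names,
the injected fetch_json callable and the page size of 100 are all
hypothetical. The 'order', 'project' and 'since' parameters come from the
URLs in this thread; 'per_page' and 'page' are assumed to be the pagination
parameters.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "http://patchwork.ozlabs.org/api"  # instance used in this thread
PER_PAGE = 100  # assumed page size, not taken from the thread


def patches_url(**params):
    """Build a patch-list URL from query parameters."""
    return f"{API_ROOT}/patches/?{urlencode(params)}"


def fetch_json(url):
    """Hypothetical fetcher: decoded JSON list, or [] for a page past the end."""
    try:
        with urlopen(url) as resp:
            return json.load(resp)
    except HTTPError as err:
        if err.code == 404:  # assumption: out-of-range pages answer 404
            return []
        raise


def highest_patch_id(fetch, project):
    """Step (a): list newest-first and read the ID of the first entry."""
    page = fetch(patches_url(project=project, order="-id", per_page=1))
    return page[0]["id"] if page else None


def patches_since(fetch, project, since, last_seen_id, high_water_id):
    """Step (b): walk pages in ascending ID order and keep the patches
    whose ID lies in (last_seen_id, high_water_id]."""
    collected = []
    page_no = 1
    while True:
        page = fetch(patches_url(project=project, since=since, order="id",
                                 per_page=PER_PAGE, page=page_no))
        for patch in page:
            if patch["id"] > high_water_id:
                # Arrived after step (a); leave it for the next pull.
                return collected
            if patch["id"] > last_seen_id:
                collected.append(patch)
        if len(page) < PER_PAGE:
            return collected  # short page means we reached the end
        page_no += 1
```

Taking the fetcher as an argument keeps the paging logic testable without
network access. Walking in ascending ID order means patches that arrive
mid-pull land on the last page rather than shifting the earlier ones, which
is why the high-water mark from step (a) is fetched first.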

Stephen

> Thanks
>   Ralf
> 
> > Stephen
> > 
> > [1] https://patchwork.readthedocs.io/en/latest/api/rest/schemas/v1.2/#get--api-1.2-events-
> > 


