[rfc] Extending Patchwork as a GSoC project
Rohit Sarkar
rohitsarkar5398 at gmail.com
Sat May 9 14:15:46 AEST 2020
Hi Daniel,
[snip]
>
> >> - In terms of 'catching up': I think you're asking if Patchwork will
> >> let you _export_ all patches since your last pull, rather than asking
> >> if patchwork will let you import patches? I think that makes the most
> >> sense in context. If that's the case, then the way I would do that
> >> is:
> >>
> >> a) observe the highest patch ID in the project you are tracking, as
> >> patch IDs are always increasing. Note that the same cannot be said
> >> about dates - patchwork instances, due to the quirks of email,
> >> often get mail out-of-order. You probably want something like:
> >>
> >> http://patchwork.ozlabs.org/api/patches/?order=-id&project=linuxppc-dev
> >
> > Excellent! So... If I got everything from [0..100] and JSON reports that
> > the latest ID is 130, then [101..130] will _definitely_ exist and form
> > the exact set of patches that I miss?
>
> It's not _quite_ that simple! Both the set [0..100] and the set
> [101..130] will likely contain patches that do not belong to your
> project. I suspect you do not want to gather patches for every project!
>
> But if you are following linuxppc, and you have gathered
> {linuxppc patches with id <= 100},
> and the latest ID for linuxppc is 130, then I believe
> {linuxppc patches where 100 < id <= 130}
> is the exact set of patches you've missed. We don't support sharding of
> PK space for multi-master writes or anything else that might mess with
> this.
This is exactly how I am thinking of going about this, although with a
slightly different approach, described below.
> Sadly, we also don't currently support a filter predicate that would
> allow you to neatly express 'patches with IDs between 100 and 130' in a
> query, but I'd be happy to consider such a patch.
This would certainly be the most elegant solution.
> (In the mean time, you can store what page of
> http://patchwork.ozlabs.org/api/patches/?project=linuxppc-dev contained
> patch 100 and read all subsequent pages until you hit patch 130. As I
> alluded to but didn't state clearly, pagination when sorted by
> increasing ID is stable*.)
I am storing the highest patch ID among all the patches that PaStA has
received. Then I fetch all patches from Patchwork, or from a particular
project, ordered by descending ID, and read them until I reach the
patch whose ID equals the highest patch ID in PaStA.
In the worst case I will fetch one extra page of patches (when the
first patch on a page is one that PaStA already has).
Is this an efficient way to go about things? In particular, is fetching
all patches for a project efficient, considering that the response is
paginated?
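A rough sketch of the loop described above, to make the question concrete.
It is not PaStA's actual code: `fetch_new_patches`, `fetch_page` and
`fake_fetch_page` are hypothetical names, and the page fetcher is a stand-in
for an HTTP GET on the `?order=-id&project=...` query quoted earlier. Patches
are keyed by ID because pages can shift if new mail arrives between requests,
so the same patch may show up twice.

```python
from typing import Callable, Dict, List

Patch = Dict[str, int]


def fetch_new_patches(fetch_page: Callable[[int], List[Patch]],
                      last_seen_id: int) -> List[Patch]:
    """Read pages sorted by descending ID until a known patch is reached.

    fetch_page(n) is assumed to return page n (1-based) of the project's
    patches, newest ID first, and an empty list past the last page.
    """
    new_by_id: Dict[int, Patch] = {}
    page_no = 1
    done = False
    while not done:
        page = fetch_page(page_no)
        if not page:
            break  # ran out of pages before meeting a known patch
        for patch in page:
            if patch["id"] <= last_seen_id:
                done = True  # everything from here on is already stored
                break
            new_by_id[patch["id"]] = patch  # dict drops duplicates
        page_no += 1
    return sorted(new_by_id.values(), key=lambda p: p["id"])


# In-memory stand-in for the server: project patches 95..130, 10 per page.
PER_PAGE = 10
all_patches = [{"id": i} for i in range(130, 94, -1)]
requested_pages: List[int] = []


def fake_fetch_page(page_no: int) -> List[Patch]:
    requested_pages.append(page_no)
    start = (page_no - 1) * PER_PAGE
    return all_patches[start:start + PER_PAGE]


new = fetch_new_patches(fake_fetch_page, last_seen_id=100)
```

In this made-up data set patch 100 happens to be the first entry on page 4,
so the loop requests four pages to collect three pages' worth of new patches,
which is the "one extra page" worst case mentioned above.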
>
> >>
> >> b) Retrieve all email from your last pull to that patch ID. Bear in
> >> mind that it is likely that more email will arrive while you are
> >> doing this - hence why I suggest fetching the patch ID first! Be
> >
> > Ack.
> >
> >> careful also of pagination as that can also change if new patches
> >> come in. One day we will fix this by adding cursor-based
> >> pagination as well but we haven't done it yet. As such you
> >> probably want to do this with a different query with the opposite
> >> ordering, something like:
> >>
> >> http://patchwork.ozlabs.org/api/patches/?since=2020-05-01T00%3A00%3A00&project=linuxppc-dev
> >>
> >> (order=id is implied but wouldn't hurt to specify it, and an API
> >> version, in your final code)
> >>
In part b above: doesn't this suffer from the same issue, namely that
there is no guarantee patches arrive in the order given by their
dates? E.g.:
The last pull was at date (timestamp) x. A patch with date y, y < x,
arrives after my last pull. On my next pull from Patchwork, when I
fetch patches arriving since date x, I will miss the patch with date y.
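A tiny illustration of the concern, using made-up data. It assumes, as the
question does, that the `since` filter compares against each patch's date
field; whether Patchwork actually behaves this way is exactly what is being
asked. The IDs, dates and variable names are invented for the example.

```python
from datetime import datetime

# Patch 101 carries an older date than the last pull, but it only reached
# the server after that pull (delayed mail), so it got a higher ID.
patches = [
    {"id": 100, "date": datetime(2020, 5, 1, 10, 0)},
    {"id": 101, "date": datetime(2020, 5, 1, 9, 0)},
    {"id": 102, "date": datetime(2020, 5, 1, 12, 0)},
]

last_pull_date = datetime(2020, 5, 1, 11, 0)  # "x" in the text above
last_seen_id = 100                            # highest ID stored locally

# Date-based catch-up: patch 101 is skipped because its date y < x.
by_date = [p["id"] for p in patches if p["date"] >= last_pull_date]

# ID-based catch-up: patch 101 is picked up because IDs only increase.
by_id = [p["id"] for p in patches if p["id"] > last_seen_id]
```

Here the date filter selects only patch 102, while the ID filter selects
101 and 102.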
Thanks,
Rohit