[RFC PATCH v2 0/1] add parserelations management command

Tue Jun 30 18:31:43 AEST 2020

On Tue, Jun 30, 2020 at 11:59:52AM +1000, Daniel Axtens wrote:
> Rohit Sarkar <rohitsarkar5398 at gmail.com> writes:
> 
> > Hey,
> > On Wed, Jun 17, 2020 at 07:55:37AM +1000, Daniel Axtens wrote:
> >> Rohit Sarkar <rohitsarkar5398 at gmail.com> writes:
> >> 
> >> > Hey,
> >> > On Fri, Jun 12, 2020 at 12:04:54AM +0530, Rohit Sarkar wrote:
> >> >> The parserelations command will be used to bulk import patch relations into
> >> >> Patchwork using a "patch groups" file.
> >> >> The patch groups file is structured such that each line contains a
> >> >> relation.
> >> >> Eg patch groups file contents.
> >> >> 1 3 4
> >> >> 2
> >> >> 5 9 10
> >> >> 
> >> >> In this case 2 relations will be ingested into Patchwork, (1,3,4) and
> >> >> (5,9,10). Further group (5,9,10) also points to two upstream commit
> >> >> references.
> >> >> Note that before ingesting the relations the existing relations in the
> >> >> DB are removed.
> >> >> 
> >> >> v1..v2
> >> >> - remove commit references from the patch groups file.
> >> >> - Update documentation
> >> >> - rename some variables to remove the overloading.
> >> >> - use filter and update instead of get and save to reduce the db calls.
> >> >>   (Visible performance enhancement)
> >> >> 
> >> >> Leaving the copyright untouched till Ralf and Lukas comment on how to
> >> >> proceed.
> >> >> 
> >> >> Rohit Sarkar (1):
> >> >>   management: introduce parserelations command
> >> >> 
> >> >>  docs/deployment/management.rst                | 26 +++++++
> >> >>  .../management/commands/parserelations.py     | 71 +++++++++++++++++++
> >> >>  patchwork/tests/test_management.py            |  7 ++
> >> >>  3 files changed, 104 insertions(+)
> >> >>  create mode 100644 patchwork/management/commands/parserelations.py
> >> >> 
> >> >> -- 
> >> >> 2.23.0.385.gbc12974a89
> >> >> 
> >> >
> >> > Just wanted to follow up on this. Does this look good?
> >> 
> >> I was thinking about this as I washed the dishes last night.
> >> 
> >> Purging all relations in the database means that if any API-based user
> >> of relations emerges before the big public servers adopt a release with
> >> this, then they'll never be able to use it.
> >> 
> >> What's the speed impact of doing two passes through the data, with pass
> >> 1 collecting all the projects that this touches and pass 2 actually
> >> installing the relations?
> >> 
> >> Regards,
> >> Daniel
> > I wanted to reopen the discussion on this with the aim being to get
> > everyone on the same page, and either go with this approach or decide on
> > an alternate approach.
> >
> > The primary goal is to have a "parserelations" command that can parse a
> > "patch-groups" file that contains a relation on each line. There are 2
> > approaches:
> >
> > a. Refresh the relations table. The parserelations command will remove
> > the existing relations before inserting the newly parsed relations. This
> > avoids any conflicts between existing and new relations. However other
> > users of the API might be affected.
> >
> > b. Insert relations in a non destructive way. We need a way to handle
> > conflicts between existing and incoming relations. So the solution is
> > not really clear in this case. What should have precedence in the case
> > of conflicts? The parserelations script or the existing relations.
> >
> 
> 
> Right, sorry for dropping this, work got a bit crazy.
> 
> It occurs to me that I have an embedded assumption that may not be true
> here. Are you trying to build a set of relations that includes
> cross-project relations, or are you just trying to build a set of
> relations entirely or mostly within a single patchwork project?
> 
> I've been assuming that your relations are entirely or predominantly
> intra-project, but I'm starting to suspect that maybe my understanding
> is not right.
The relations can be inter project too. PaStA is configured with a list
of Patchwork projects for which it will pull all the corresponding
patches and perform it's analyses. PaStA can relate patches across
projects too.

> If you are expecting a non-trivial quantity of cross-project relations,
> then my suggested approach earlier (caller of the script specifies the
> projects that patches may belong to) doesn't make sense. It probably
> makes most sense to go with approach (a) in that case - it's really
> difficult to figure out how to do (b) in a way that preserves the
> meaning that any other API user, or the PaStA tool, is trying to create.
> 
> If we do go down route (a), I want to make it really hard for users to
> accidentally shoot themselves in the foot here and wipe out data. So
> probably I would say that parserelations should only insert relations if
> the table is empty when the script begins, and there should be a
> separate command that with a suitably scary name that purges all
> existing relations.

I agree that we should make parserelations actions very explicit to
prevent any loss of data. However, if we use parserelations to
differentially import relations we will need to be able to insert
relations with a non empty table. Let me explain the complete flow
between PaStA and Patchwork so that things are a bit clear.
We plan to run a script, maybe as a cron job at regular time intervals
which does the following:
1. Exports patches from Patchwork into PaStA
2. Runs PaStA's analyses
3. Runs the parserelations script to parse the patch-groups file
generated by PaStA and inserts the relations into Patchwork. In this
step we refresh the db and insert all the relations afresh. A non
destructive approach would involve thinking of a way to resolve
conflicts between existing and new relations.

> 
> Have I correctly understood things here?
> 
> Regards,
> Daniel
> 
Thanks,
Rohit