[RFC PATCH v2 0/1] add parserelations management command

Thu Jul 2 04:15:38 AEST 2020

On Wed, Jul 01, 2020 at 05:33:03PM +1000, Daniel Axtens wrote:
> Rohit Sarkar <rohitsarkar5398 at gmail.com> writes:
> 
> > On Tue, Jun 30, 2020 at 11:59:52AM +1000, Daniel Axtens wrote:
> >> Rohit Sarkar <rohitsarkar5398 at gmail.com> writes:
> >> 
> >> > Hey,
> >> > On Wed, Jun 17, 2020 at 07:55:37AM +1000, Daniel Axtens wrote:
> >> >> Rohit Sarkar <rohitsarkar5398 at gmail.com> writes:
> >> >> 
> >> >> > Hey,
> >> >> > On Fri, Jun 12, 2020 at 12:04:54AM +0530, Rohit Sarkar wrote:
> >> >> >> The parserelations command will be used to bulk import patch relations into
> >> >> >> Patchwork using a "patch groups" file.
> >> >> >> The patch groups file is structured such that each line contains a
> >> >> >> relation.
> >> >> >> Eg patch groups file contents.
> >> >> >> 1 3 4
> >> >> >> 2
> >> >> >> 5 9 10
> >> >> >> 
> >> >> >> In this case 2 relations will be ingested into Patchwork, (1,3,4) and
> >> >> >> (5,9,10). Further group (5,9,10) also points to two upstream commit
> >> >> >> references.
> >> >> >> Note that before ingesting the relations the existing relations in the
> >> >> >> DB are removed.
> >> >> >> 
> >> >> >> v1..v2
> >> >> >> - remove commit references from the patch groups file.
> >> >> >> - Update documentation
> >> >> >> - rename some variables to remove the overloading.
> >> >> >> - use filter and update instead of get and save to reduce the db calls.
> >> >> >>   (Visible performance enhancement)
> >> >> >> 
> >> >> >> Leaving the copyright untouched till Ralf and Lukas comment on how to
> >> >> >> proceed.
> >> >> >> 
> >> >> >> Rohit Sarkar (1):
> >> >> >>   management: introduce parserelations command
> >> >> >> 
> >> >> >>  docs/deployment/management.rst                | 26 +++++++
> >> >> >>  .../management/commands/parserelations.py     | 71 +++++++++++++++++++
> >> >> >>  patchwork/tests/test_management.py            |  7 ++
> >> >> >>  3 files changed, 104 insertions(+)
> >> >> >>  create mode 100644 patchwork/management/commands/parserelations.py
> >> >> >> 
> >> >> >> -- 
> >> >> >> 2.23.0.385.gbc12974a89
> >> >> >> 
> >> >> >
> >> >> > Just wanted to follow up on this. Does this look good?
> >> >> 
> >> >> I was thinking about this as I washed the dishes last night.
> >> >> 
> >> >> Purging all relations in the database means that if any API-based user
> >> >> of relations emerges before the big public servers adopt a release with
> >> >> this, then they'll never be able to use it.
> >> >> 
> >> >> What's the speed impact of doing two passes through the data, with pass
> >> >> 1 collecting all the projects that this touches and pass 2 actually
> >> >> installing the relations?
> >> >> 
> >> >> Regards,
> >> >> Daniel
> >> > I wanted to reopen the discussion on this with the aim being to get
> >> > everyone on the same page, and either go with this approach or decide on
> >> > an alternate approach.
> >> >
> >> > The primary goal is to have a "parserelations" command that can parse a
> >> > "patch-groups" file that contains a relation on each line. There are 2
> >> > approaches:
> >> >
> >> > a. Refresh the relations table. The parserelations command will remove
> >> > the existing relations before inserting the newly parsed relations. This
> >> > avoids any conflicts between existing and new relations. However other
> >> > users of the API might be affected.
> >> >
> >> > b. Insert relations in a non destructive way. We need a way to handle
> >> > conflicts between existing and incoming relations. So the solution is
> >> > not really clear in this case. What should have precedence in the case
> >> > of conflicts? The parserelations script or the existing relations.
> >> >
> >> 
> >> 
> >> Right, sorry for dropping this, work got a bit crazy.
> >> 
> >> It occurs to me that I have an embedded assumption that may not be true
> >> here. Are you trying to build a set of relations that includes
> >> cross-project relations, or are you just trying to build a set of
> >> relations entirely or mostly within a single patchwork project?
> >> 
> >> I've been assuming that your relations are entirely or predominantly
> >> intra-project, but I'm starting to suspect that maybe my understanding
> >> is not right.
> > The relations can be inter project too. PaStA is configured with a list
> > of Patchwork projects for which it will pull all the corresponding
> > patches and perform it's analyses. PaStA can relate patches across
> > projects too.
> >
> >> If you are expecting a non-trivial quantity of cross-project relations,
> >> then my suggested approach earlier (caller of the script specifies the
> >> projects that patches may belong to) doesn't make sense. It probably
> >> makes most sense to go with approach (a) in that case - it's really
> >> difficult to figure out how to do (b) in a way that preserves the
> >> meaning that any other API user, or the PaStA tool, is trying to create.
> >> 
> >> If we do go down route (a), I want to make it really hard for users to
> >> accidentally shoot themselves in the foot here and wipe out data. So
> >> probably I would say that parserelations should only insert relations if
> >> the table is empty when the script begins, and there should be a
> >> separate command that with a suitably scary name that purges all
> >> existing relations.
> >
> > I agree that we should make parserelations actions very explicit to
> > prevent any loss of data. However, if we use parserelations to
> > differentially import relations we will need to be able to insert
> > relations with a non empty table. Let me explain the complete flow
> > between PaStA and Patchwork so that things are a bit clear.
> > We plan to run a script, maybe as a cron job at regular time intervals
> > which does the following:
> > 1. Exports patches from Patchwork into PaStA
> > 2. Runs PaStA's analyses
> > 3. Runs the parserelations script to parse the patch-groups file
> > generated by PaStA and inserts the relations into Patchwork. In this
> > step we refresh the db and insert all the relations afresh. A non
> > destructive approach would involve thinking of a way to resolve
> > conflicts between existing and new relations.
> 
> OK.
> 
> I am happy for you to stick with a destructive approach for the
> managment command, so long as it's clear to the user that data is
> potentially about to be destroyed. I'm happy for that to be:
> 
>  - two commands: a command that purges relations, and then a parse
>    command that checks if the relations table is empty and refuses to
>    run if it is not empty.
> 
>  - a single command that requires you to pass a command line option like
>    '--really-delete-old-relations'
> 
>  - a single command with a name that captures the idea that data will be
>    destroyed. I don't know what that would be, maybe 'replace...'.
> 
> I'm not fussy which option you go with, I just don't want a command
> named 'parse...' to delete data without the user being properly warned.
Yeah, I agree. The name "parserelations" doesnt make it explicit enough.
I am fine with replacerelations.

> (You will at some point need to consider how to do incremental updates
> over the API for a publicly usable version, but that can definitely wait
> and might be out of scope for your GSoC project.)
Agreed.

> Regards,
> Daniel
> 
> >
> >> 
> >> Have I correctly understood things here?
> >> 
> >> Regards,
> >> Daniel
> >> 
> > Thanks,
> > Rohit
Thanks
Rohit