[RFC PATCH v2 0/1] add parserelations management command

Thu Jul 2 08:48:36 AEST 2020

Rohit Sarkar <rohitsarkar5398 at gmail.com> writes:

> On Wed, Jul 01, 2020 at 05:33:03PM +1000, Daniel Axtens wrote:
>> Rohit Sarkar <rohitsarkar5398 at gmail.com> writes:
>> 
>> > On Tue, Jun 30, 2020 at 11:59:52AM +1000, Daniel Axtens wrote:
>> >> Rohit Sarkar <rohitsarkar5398 at gmail.com> writes:
>> >> 
>> >> > Hey,
>> >> > On Wed, Jun 17, 2020 at 07:55:37AM +1000, Daniel Axtens wrote:
>> >> >> Rohit Sarkar <rohitsarkar5398 at gmail.com> writes:
>> >> >> 
>> >> >> > Hey,
>> >> >> > On Fri, Jun 12, 2020 at 12:04:54AM +0530, Rohit Sarkar wrote:
>> >> >> >> The parserelations command will be used to bulk import patch relations into
>> >> >> >> Patchwork using a "patch groups" file.
>> >> >> >> The patch groups file is structured such that each line contains a
>> >> >> >> relation.
>> >> >> >> Eg patch groups file contents.
>> >> >> >> 1 3 4
>> >> >> >> 2
>> >> >> >> 5 9 10
>> >> >> >> 
>> >> >> >> In this case 2 relations will be ingested into Patchwork, (1,3,4) and
>> >> >> >> (5,9,10). Further group (5,9,10) also points to two upstream commit
>> >> >> >> references.
>> >> >> >> Note that before ingesting the relations the existing relations in the
>> >> >> >> DB are removed.
>> >> >> >> 
>> >> >> >> v1..v2
>> >> >> >> - remove commit references from the patch groups file.
>> >> >> >> - Update documentation
>> >> >> >> - rename some variables to remove the overloading.
>> >> >> >> - use filter and update instead of get and save to reduce the db calls.
>> >> >> >>   (Visible performance enhancement)
>> >> >> >> 
>> >> >> >> Leaving the copyright untouched till Ralf and Lukas comment on how to
>> >> >> >> proceed.
>> >> >> >> 
>> >> >> >> Rohit Sarkar (1):
>> >> >> >>   management: introduce parserelations command
>> >> >> >> 
>> >> >> >>  docs/deployment/management.rst                | 26 +++++++
>> >> >> >>  .../management/commands/parserelations.py     | 71 +++++++++++++++++++
>> >> >> >>  patchwork/tests/test_management.py            |  7 ++
>> >> >> >>  3 files changed, 104 insertions(+)
>> >> >> >>  create mode 100644 patchwork/management/commands/parserelations.py
>> >> >> >> 
>> >> >> >> -- 
>> >> >> >> 2.23.0.385.gbc12974a89
>> >> >> >> 
>> >> >> >
>> >> >> > Just wanted to follow up on this. Does this look good?
>> >> >> 
>> >> >> I was thinking about this as I washed the dishes last night.
>> >> >> 
>> >> >> Purging all relations in the database means that if any API-based user
>> >> >> of relations emerges before the big public servers adopt a release with
>> >> >> this, then they'll never be able to use it.
>> >> >> 
>> >> >> What's the speed impact of doing two passes through the data, with pass
>> >> >> 1 collecting all the projects that this touches and pass 2 actually
>> >> >> installing the relations?
>> >> >> 
>> >> >> Regards,
>> >> >> Daniel
>> >> > I wanted to reopen the discussion on this with the aim being to get
>> >> > everyone on the same page, and either go with this approach or decide on
>> >> > an alternate approach.
>> >> >
>> >> > The primary goal is to have a "parserelations" command that can parse a
>> >> > "patch-groups" file that contains a relation on each line. There are 2
>> >> > approaches:
>> >> >
>> >> > a. Refresh the relations table. The parserelations command will remove
>> >> > the existing relations before inserting the newly parsed relations. This
>> >> > avoids any conflicts between existing and new relations. However other
>> >> > users of the API might be affected.
>> >> >
>> >> > b. Insert relations in a non destructive way. We need a way to handle
>> >> > conflicts between existing and incoming relations. So the solution is
>> >> > not really clear in this case. What should have precedence in the case
>> >> > of conflicts? The parserelations script or the existing relations.
>> >> >
>> >> 
>> >> 
>> >> Right, sorry for dropping this, work got a bit crazy.
>> >> 
>> >> It occurs to me that I have an embedded assumption that may not be true
>> >> here. Are you trying to build a set of relations that includes
>> >> cross-project relations, or are you just trying to build a set of
>> >> relations entirely or mostly within a single patchwork project?
>> >> 
>> >> I've been assuming that your relations are entirely or predominantly
>> >> intra-project, but I'm starting to suspect that maybe my understanding
>> >> is not right.
>> > The relations can be inter project too. PaStA is configured with a list
>> > of Patchwork projects for which it will pull all the corresponding
>> > patches and perform it's analyses. PaStA can relate patches across
>> > projects too.
>> >
>> >> If you are expecting a non-trivial quantity of cross-project relations,
>> >> then my suggested approach earlier (caller of the script specifies the
>> >> projects that patches may belong to) doesn't make sense. It probably
>> >> makes most sense to go with approach (a) in that case - it's really
>> >> difficult to figure out how to do (b) in a way that preserves the
>> >> meaning that any other API user, or the PaStA tool, is trying to create.
>> >> 
>> >> If we do go down route (a), I want to make it really hard for users to
>> >> accidentally shoot themselves in the foot here and wipe out data. So
>> >> probably I would say that parserelations should only insert relations if
>> >> the table is empty when the script begins, and there should be a
>> >> separate command that with a suitably scary name that purges all
>> >> existing relations.
>> >
>> > I agree that we should make parserelations actions very explicit to
>> > prevent any loss of data. However, if we use parserelations to
>> > differentially import relations we will need to be able to insert
>> > relations with a non empty table. Let me explain the complete flow
>> > between PaStA and Patchwork so that things are a bit clear.
>> > We plan to run a script, maybe as a cron job at regular time intervals
>> > which does the following:
>> > 1. Exports patches from Patchwork into PaStA
>> > 2. Runs PaStA's analyses
>> > 3. Runs the parserelations script to parse the patch-groups file
>> > generated by PaStA and inserts the relations into Patchwork. In this
>> > step we refresh the db and insert all the relations afresh. A non
>> > destructive approach would involve thinking of a way to resolve
>> > conflicts between existing and new relations.
>> 
>> OK.
>> 
>> I am happy for you to stick with a destructive approach for the
>> managment command, so long as it's clear to the user that data is
>> potentially about to be destroyed. I'm happy for that to be:
>> 
>>  - two commands: a command that purges relations, and then a parse
>>    command that checks if the relations table is empty and refuses to
>>    run if it is not empty.
>> 
>>  - a single command that requires you to pass a command line option like
>>    '--really-delete-old-relations'
>> 
>>  - a single command with a name that captures the idea that data will be
>>    destroyed. I don't know what that would be, maybe 'replace...'.
>> 
>> I'm not fussy which option you go with, I just don't want a command
>> named 'parse...' to delete data without the user being properly warned.
> Yeah, I agree. The name "parserelations" doesnt make it explicit enough.
> I am fine with replacerelations.
>

Ok cool! If you send a patch with this name and with tests, then -
assuming I don't have any code-quality nits - I'd be happy to take it.

Regards,
Daneil

>> (You will at some point need to consider how to do incremental updates
>> over the API for a publicly usable version, but that can definitely wait
>> and might be out of scope for your GSoC project.)
> Agreed.
>
>> Regards,
>> Daniel
>> 
>> >
>> >> 
>> >> Have I correctly understood things here?
>> >> 
>> >> Regards,
>> >> Daniel
>> >> 
>> > Thanks,
>> > Rohit
> Thanks
> Rohit