[PATCH RESEND 1/2] Rework tagging infrastructure

Veronika Kabatova vkabatov at redhat.com
Sat Sep 22 06:13:55 AEST 2018



----- Original Message -----
> From: "Stephen Finucane" <stephen at that.guru>
> To: "Veronika Kabatova" <vkabatov at redhat.com>
> Cc: patchwork at lists.ozlabs.org
> Sent: Wednesday, September 19, 2018 11:38:32 PM
> Subject: Re: [PATCH RESEND 1/2] Rework tagging infrastructure
> 
> On Wed, 2018-09-12 at 04:57 -0400, Veronika Kabatova wrote:
> > 
> > ----- Original Message -----
> > > From: "Stephen Finucane" <stephen at that.guru>
> > > To: vkabatov at redhat.com, patchwork at lists.ozlabs.org
> > > Sent: Tuesday, September 11, 2018 7:55:46 PM
> > > Subject: Re: [PATCH RESEND 1/2] Rework tagging infrastructure
> > > 
> > > On Mon, 2018-07-09 at 18:10 +0200, vkabatov at redhat.com wrote:
> > > > From: Veronika Kabatova <vkabatov at redhat.com>
> > > > 
> > > > Solve #113 and #57 GitHub issues, fix up returning tags in the API,
> > > > keep track of tag origin to be able to add tags to comments in the API.
> > > > 
> > > > Use relations Tag-Patch and Tag-CoverLetter to avoid duplication of
> > > > tags for each patch in series, and use `series` attribute of
> > > > SubmissionTag as a notion of a tag which is related to each patch in
> > > > the
> > > > series (because it comes from cover letter or it's comments)
> > > > 
> > > > Signed-off-by: Veronika Kabatova <vkabatov at redhat.com>
> > > 
> > > To start, apologies if I'm repeating myself on any of the comments
> > > below: it's been a long time since I first reviewed this (sorry!).
> > > Also, for the next revision of this, this can definitely be broken up
> > > on purpose :) Given that tags aren't currently shown in the API, I'd
> > > move the API changes into a separate patch. Similarly, docs (though not
> > > release notes) aren't needed and can be broken up and moved into a
> > > separate patch. Now to the review...
> > > 
> > 
> > Sounds good!
> > 
> > /me goes on to comment on everything
> 
> I've been playing with the new revision all evening and, unfortunately,
> have been hitting some pretty dire performance issues. I've outlined
> what I think to be the cause of the issues below, along with some steps
> that might help resolve these issues. Also, many of the comments I have
> on the new revision are follow ups to comments from the earlier
> revision (this one) so I'm replying here to avoid repeating myself :)
> 
> From what I can see, there are two issues here:
> 
>    1. The main issue we still have here is that fetching and combining
>       tags for a patch is really expensive. This is because we need to
>       fetch tags for the patch, its comments, the patch's series and this
>       series' comments. We do this in 'Patch.all_tags'.
>    2. Related to (1), because Tags are mapped to Submission, we suddenly
>       need to join on the Submission table again, something which Daniel
>       fixed in commit c97dad1cd.
> 
> I realized I've been pushing you in this direction but I didn't forsee
> either of these issues upfront. However, I do have an option that might
> resolve this.
> 
> Currently, SubmissionTag has three foreign keys:
> 
>  * submission
>  * series, NULLABLE
>  * comment, NULLABLE
> 
> As noted above, this means getting all tags for a given patch means
> fetching tags for the patch, its comments, the patch's series and this
> series' comments. From what I can tell, we can minimize the cost of
> only the first of these (using prefetch_related). The rest involve a
> query _per patch_ in the list which is really, really expensive.
> 
> I propose changing SubmissionTag to SeriesTag. This will have three
> foreign keys:
> 
>  * series
>  * patch, NULLABLE
>  * comment, NULLABLE
> 
> These would be set like so:
> 
>  * When you receive a cover letter, you set just the series field
>  * When you receive a patch, you set the series and patch fields
>  * When you receive a cover letter comment, you set the series and
>    comment fields
>  * When you receive a patch comment, you set all fields
> 

Interesting idea. I managed to create a working prototype here:
https://github.com/veruu/patchwork/tree/tags_changes

The tests are only partially fixed, but I'll finish that part later when
we agree on the general approach.

> Once we do this, we should be able to something like the below to get
> all tags for a given patch:
> 
>    SELECT * FROM patchwork_patch p
>    INNER JOIN patchwork_seriestag t ON p.submission_ptr_id = t.patch_id
>    UNION
>    SELECT * FROM patchwork_patch p
>    INNER JOIN patchwork_seriestag t ON p.series_id = t.series_id AND
>    t.patch_id IS NULL;
> 
> Apologies for it being in SQL - I still haven't figured out how to do
> this in Django's ORM. However, in brief that means:
> 
>    Join the two tables where the tag belongs to either that patch or
>    the patches series.
> 
> If we wanted to get tags for a cover letter, we could then do this:
> 
>    SELECT * patchwork_coverletter c
>    INNER JOIN patchwork_seriestag t ON c.series_id = c.series_id AND
>    c.patch_id IS NULL;
> 
> I _think_ that would work, but the question is figuring out how to
> convert that to something the ORM will like. I'll work on this myself
> over the coming days, but this is where I got this evening.
> 

Oh yeah, this part is fun. For a single patch (the all_tags property)
it's doable. For the tag filtering with the whole query set, I wasn't
able to create a single request. I also have a feeling that Django's
lazy evaluation for query sets in this case isn't exactly helpful either.
I got something that works, but didn't try to inspect the underlying
query created by Django in either case.

> More comments below.
> 
> > > > ---
> > > >  patchwork/api/comment.py        |  18 ++++-
> > > >  patchwork/api/cover.py          |  19 ++++-
> > > >  patchwork/api/filters.py        |  68 +++++++++++++++-

[snip]

> > 
> > I vaguely remember discussing this with someone before, but don't know
> > whom.
> > What if the submission is edited via the admin interface, and the tags,
> > being extracted only on the entry creation, will not match the real values
> > in
> > the database anymore? While it's not the intended workflow, it is possible
> > to
> > do it, and I think consistency between the data is important.
> > 
> > If there's an easy way to detect if the content of the submission was
> > changed
> > and not trigger the recounts otherwise, I'm all for it. Unfortunately, the
> > only ways I found were to keep a hidden copy of the field or to fetch the
> > data from the DB before saving, and neither of them sounds cool. Or, at
> > least
> > to trigger only on the changes coming from the admin interface, which would
> > reduce the trigger count a tons already (but still not move it to parser).
> 
> This is a very good point BUT I think you're overthinking this :) Is
> there a good reason the 'content' field and 'diff' fields of a patch
> should be writable? Personally, I don't think they should. I realize
> this is a unrelated change but I'd just make those fields immutable and
> avoid this whole thing.
> 
> (Note: I do realize people could still edit this through SQL or by
> making the fields writeable in the admin again, but once you do that,
> all guarantees are out the window).
> 

Some of our maintainers do edit the fields. For example, if the commit
message needs to be edited, instead of having the author resend the patch
or forgetting to fix it when applying the patch they change it in PW when
they see it.

I did double check with them regarding this feature though, and it should
be safe to move tag extraction to parser (which I did for the new linked
version). I can't speak for other people though.


Anyways, let me know if you have other ideas on what to include!
Veronika



More information about the Patchwork mailing list