[RFC 0/2 REBASE] Rework tagging infrastructure

Wed Apr 18 01:46:12 AEST 2018

On Tue, 2018-04-17 at 11:35 -0400, Veronika Kabatova wrote:
> > > Unless I'm overlooking something, we'd need to have the link from Tag to
> > > both Patch and CoverLetter. This should still have much better performance
> > > than my original solution (and will get rid of the duplication of yours).
> > > 
> > > Does this proposal make sense, or am I missing something?
> > 
> > That mostly makes sense. My main concern is what happens when you want
> > to show tags for a patch when those tags were created against the cover
> > letter. If that's the case, are we going to have to query on
> > 'patch.series.cover_letter.tags'? I imagine that's going to be slow
> > (lots of JOINs). We could store it on the series instead, but I'm not
> > sure how much that would improve things. Any ideas how to work around
> > this?
> > 
> 
> I was thinking about filtering on the SubmissionTag (or whatever the
> intermediate model will be named) based on submission IDs of the patch
> and cover (or comment IDs in case of comments API), instead of going
> through the relations. That said, my database knowledge is very...
> abstract... so I have no idea how much it helps with the underlying
> queries.
> 
> If you (or whoever else) can offer any insight that would be great!

We'd still need to get information about the cover letter though, and
that requires going through the series (one join). Maybe we already
have that JOIN though, so this warrants some validation.

Another idea I've had is to store a series attribute in addition to the
cover letter, comment and patch attributes. That way we could do
something like this for patches:

   tags = Tag.objects.filter(Q(patch=patch) | Q(patch=None),
                             series=patch.series)

i.e. if the tag is part of our series and doesn't belong to _another_
patch, it must be a series-wide tag? You'd need to do additional
filtering on this for duplicates, of course, but I imagine that's easy
enough. You'd also want to make liberal use of the 'only' and 'defer'
functions to make sure we avoid as many joins as possible. However, I
don't think this would require a join on the 'patchwork_series' table
as we only use the ID column (which we'd have).
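
To make the idea concrete, here is a rough sketch of what that lookup
could boil down to at the SQL level, using sqlite3 rather than the
Django ORM so it runs standalone. The 'tag' table, its columns and the
tags_for_patch() helper are all hypothetical (this is not the actual
Patchwork schema), and comments are left out for brevity. A NULL
patch_id is assumed to mean "tag given against the cover letter".

```python
import sqlite3

# Hypothetical schema: one tag table carrying both a series ID and an
# optional patch ID. Names are illustrative only, not Patchwork's schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tag ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " value TEXT NOT NULL,"
    " series_id INTEGER NOT NULL,"
    " patch_id INTEGER)"  # NULL is assumed to mean a cover letter tag
)
conn.executemany(
    "INSERT INTO tag (name, value, series_id, patch_id) VALUES (?, ?, ?, ?)",
    [
        ("Acked-by", "alice", 1, None),   # series-wide (cover letter)
        ("Reviewed-by", "bob", 1, 10),    # patch 10 only
        ("Acked-by", "alice", 1, 10),     # duplicate of the series-wide ack
        ("Tested-by", "carol", 1, 11),    # a different patch in the series
        ("Acked-by", "dave", 2, None),    # a different series
    ],
)


def tags_for_patch(conn, series_id, patch_id):
    """Return de-duplicated (name, value) tags that apply to one patch."""
    rows = conn.execute(
        # Only the tag table is touched: series_id is compared directly,
        # so no join on a series table is needed.
        "SELECT DISTINCT name, value FROM tag"
        " WHERE series_id = ? AND (patch_id = ? OR patch_id IS NULL)"
        " ORDER BY name, value",
        (series_id, patch_id),
    )
    return rows.fetchall()


result = tags_for_patch(conn, 1, 10)
print(result)
```

The DISTINCT here stands in for the duplicate filtering mentioned
above. Whether the ORM would generate something this simple is exactly
the part that needs validating.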

Thoughts?
Stephen