[RFC 0/2] Rework tagging infrastructure

Thu Apr 12 03:50:22 AEST 2018

----- Original Message -----
> From: "Stephen Finucane" <stephen at that.guru>
> To: vkabatov at redhat.com, patchwork at lists.ozlabs.org
> Sent: Wednesday, April 11, 2018 6:43:25 PM
> Subject: Re: [RFC 0/2] Rework tagging infrastructure
> 
> On Fri, 2018-03-16 at 15:38 +0100, vkabatov at redhat.com wrote:
> > From: Veronika Kabatova <vkabatov at redhat.com>
> 
> I'm going to review this this week. However, this doesn't apply cleanly
> to head of master any more (sorry :(). Any chance you could send
> updated versions of these?
> 

Will rebase and resend shortly.

Veronika

> Stephen
> 
> > (TL;DR at the end)
> > 
> > This RFC describes an approach to rework tagging. It attempts to solve
> > GitHub
> > issues #57 [1] and #113 [2] as well as some other things we encountered.
> > I'm
> > sending the incomplete version (eg I haven't fixed the tests) to discuss
> > the
> > approach first.
> > 
> > Right now, tags are extracted from all the comments on the patch and the
> > patch
> > itself, and they are reextracted from all the sources every time a comment
> > is
> > added or removed. It makes saving slower, and might contribute to races
> > with
> > writes to database when we are parsing multiple emails at the same time.
> > This
> > gets even more prevalent if we want to solve the issue #113 (tags on cover
> > letter should increment counters on every patch in series) -- for each
> > added
> > comment on the cover letter, we would reextract tags from all the other
> > sources,
> > for each patch in series; and for a change on comments related to the patch
> > directly, we would need to take the tags on the cover letter and it's
> > comments
> > into account as well (I implemented this solution in my fork but I really
> > don't like it).
> > 
> > The current approach has several other issues as well, some of which are
> > mentioned in the issue #57, eg duplicate tags are counted more times.
> > Taking
> > into account the tags on cover letters would also be easier if we could
> > store
> > tags against them and just query them on demand, instead of reextracting
> > everything all over again.
> > 
> > Solutions for some other things we found missing solve the issues mentioned
> > above too. If we want to determine if the tag is duplicate, we need to save
> > the
> > associated value. Having the value would help us to use arbitrary strings
> > as
> > tags (for example links to issue trackers, like `Bugzilla: <link>` if the
> > patch
> > solves a known bug). The key-values approach to storing tags is mentioned
> > [3],
> > this email additionally mentions a currently nonexistent /comments REST API
> > which we would like to implement some time in the future. For the comments
> > API
> > we would also find it very useful to have the tags extracted from the
> > comments
> > available directly so we can query for them, which means we would either
> > need to
> > reextract the tags on every API call, or we could store the tags against
> > the
> > comments as they are extracted and only query them as needed. Keeping tags
> > related to comments where they originated from avoids the need to reextract
> > tags with addition of new comments, as well as gets rid of the
> > `patch_responses`
> > property used when converting comments to mbox (we finally get all the
> > custom
> > tags there instead of only the few hardcoded ones).
> > 
> > TL;DR:
> > Our goals:
> > - Avoid tag reextraction with each added comment
> > - Fix issues #57 and #113
> > - Prepare things for addition of /comments API
> > - Add tags to patch (currently returns {}) and cover letter APIs
> > 
> > If you agree with the general idea, I'd like to get some help with DB query
> > optimizations. Tags are more distributed, so counting them in PatchQuerySet
> > is
> > harder. I wanted to use annotation with Django's conditional selection, but
> > it
> > doesn't seem to work very well and generated invalid SQL in my attempts
> > (hence
> > my temporary solution with counting the tags on the fly only for the
> > patches
> > that are being displayed). That said, I'm not a DB expert so any faster
> > working
> > solutions are more than welcome.
> > 
> > 
> > Thoughts? Ideas?
> > 
> > 
> > [1] https://github.com/getpatchwork/patchwork/issues/57
> > [2] https://github.com/getpatchwork/patchwork/issues/113
> > [3] https://lists.ozlabs.org/pipermail/patchwork/2018-January/004741.html
> > 
> > Veronika Kabatova (2):
> >   Rework tagging infrastructure
> >   Add migration for tagging changes
> > 
> >  docs/deployment/management.rst              |  16 +--
> >  patchwork/api/cover.py                      |  32 ++++-
> >  patchwork/api/patch.py                      |  41 +++++-
> >  patchwork/management/commands/retag.py      |  14 +-
> >  patchwork/migrations/0024_rework_tagging.py | 137 ++++++++++++++++++
> >  patchwork/models.py                         | 210
> >  ++++++++++++++--------------
> >  patchwork/templatetags/patch.py             |   3 +-
> >  patchwork/views/__init__.py                 |   3 -
> >  patchwork/views/utils.py                    |   8 +-
> >  9 files changed, 331 insertions(+), 133 deletions(-)
> >  create mode 100644 patchwork/migrations/0024_rework_tagging.py
> > 
> 
>