[RFC 0/2] Rework tagging infrastructure
Veronika Kabatova
vkabatov at redhat.com
Thu Apr 12 03:50:22 AEST 2018
----- Original Message -----
> From: "Stephen Finucane" <stephen at that.guru>
> To: vkabatov at redhat.com, patchwork at lists.ozlabs.org
> Sent: Wednesday, April 11, 2018 6:43:25 PM
> Subject: Re: [RFC 0/2] Rework tagging infrastructure
>
> On Fri, 2018-03-16 at 15:38 +0100, vkabatov at redhat.com wrote:
> > From: Veronika Kabatova <vkabatov at redhat.com>
>
> I'm going to review this this week. However, this doesn't apply cleanly
> to head of master any more (sorry :(). Any chance you could send
> updated versions of these?
>
Will rebase and resend shortly.
Veronika
> Stephen
>
> > (TL;DR at the end)
> >
> > This RFC describes an approach to rework tagging. It attempts to solve
> > GitHub
> > issues #57 [1] and #113 [2] as well as some other things we encountered.
> > I'm
> > sending the incomplete version (eg I haven't fixed the tests) to discuss
> > the
> > approach first.
> >
> > Right now, tags are extracted from all the comments on the patch and the
> > patch
> > itself, and they are reextracted from all the sources every time a comment
> > is
> > added or removed. It makes saving slower, and might contribute to races
> > with
> > writes to database when we are parsing multiple emails at the same time.
> > This
> > gets even more prevalent if we want to solve the issue #113 (tags on cover
> > letter should increment counters on every patch in series) -- for each
> > added
> > comment on the cover letter, we would reextract tags from all the other
> > sources,
> > for each patch in series; and for a change on comments related to the patch
> > directly, we would need to take the tags on the cover letter and it's
> > comments
> > into account as well (I implemented this solution in my fork but I really
> > don't like it).
> >
> > The current approach has several other issues as well, some of which are
> > mentioned in the issue #57, eg duplicate tags are counted more times.
> > Taking
> > into account the tags on cover letters would also be easier if we could
> > store
> > tags against them and just query them on demand, instead of reextracting
> > everything all over again.
> >
> > Solutions for some other things we found missing solve the issues mentioned
> > above too. If we want to determine if the tag is duplicate, we need to save
> > the
> > associated value. Having the value would help us to use arbitrary strings
> > as
> > tags (for example links to issue trackers, like `Bugzilla: <link>` if the
> > patch
> > solves a known bug). The key-values approach to storing tags is mentioned
> > [3],
> > this email additionally mentions a currently nonexistent /comments REST API
> > which we would like to implement some time in the future. For the comments
> > API
> > we would also find it very useful to have the tags extracted from the
> > comments
> > available directly so we can query for them, which means we would either
> > need to
> > reextract the tags on every API call, or we could store the tags against
> > the
> > comments as they are extracted and only query them as needed. Keeping tags
> > related to comments where they originated from avoids the need to reextract
> > tags with addition of new comments, as well as gets rid of the
> > `patch_responses`
> > property used when converting comments to mbox (we finally get all the
> > custom
> > tags there instead of only the few hardcoded ones).
> >
> > TL;DR:
> > Our goals:
> > - Avoid tag reextraction with each added comment
> > - Fix issues #57 and #113
> > - Prepare things for addition of /comments API
> > - Add tags to patch (currently returns {}) and cover letter APIs
> >
> > If you agree with the general idea, I'd like to get some help with DB query
> > optimizations. Tags are more distributed, so counting them in PatchQuerySet
> > is
> > harder. I wanted to use annotation with Django's conditional selection, but
> > it
> > doesn't seem to work very well and generated invalid SQL in my attempts
> > (hence
> > my temporary solution with counting the tags on the fly only for the
> > patches
> > that are being displayed). That said, I'm not a DB expert so any faster
> > working
> > solutions are more than welcome.
> >
> >
> > Thoughts? Ideas?
> >
> >
> > [1] https://github.com/getpatchwork/patchwork/issues/57
> > [2] https://github.com/getpatchwork/patchwork/issues/113
> > [3] https://lists.ozlabs.org/pipermail/patchwork/2018-January/004741.html
> >
> > Veronika Kabatova (2):
> > Rework tagging infrastructure
> > Add migration for tagging changes
> >
> > docs/deployment/management.rst | 16 +--
> > patchwork/api/cover.py | 32 ++++-
> > patchwork/api/patch.py | 41 +++++-
> > patchwork/management/commands/retag.py | 14 +-
> > patchwork/migrations/0024_rework_tagging.py | 137 ++++++++++++++++++
> > patchwork/models.py | 210
> > ++++++++++++++--------------
> > patchwork/templatetags/patch.py | 3 +-
> > patchwork/views/__init__.py | 3 -
> > patchwork/views/utils.py | 8 +-
> > 9 files changed, 331 insertions(+), 133 deletions(-)
> > create mode 100644 patchwork/migrations/0024_rework_tagging.py
> >
>
>
More information about the Patchwork
mailing list