[RFC v2 0/1] Rework tagging infrastructure

Fri Apr 20 02:36:55 AEST 2018

From: Veronika Kabatova <vkabatov at redhat.com>

(TL;DR at the end)

This RFC describes an approach to reworking tagging. It attempts to solve GitHub
issues #57 [1] and #113 [2] as well as some other problems we encountered. I'm
sending an incomplete version (e.g. I haven't fixed the tests) to discuss the
approach first.

Right now, tags are extracted from the patch itself and from all of its
comments, and they are reextracted from all of these sources every time a
comment is added or removed. This makes saving slower and might contribute to
races between database writes when we parse multiple emails at the same time.
The problem gets worse if we want to solve issue #113 (tags on a cover letter
should increment the counters on every patch in the series): for each comment
added to the cover letter, we would reextract tags from all the other sources
for each patch in the series, and for any change to the comments on a patch
itself, we would also need to take the tags on the cover letter and its
comments into account. (I implemented this solution in my fork, but I really
don't like it.)

The current approach has several other issues as well, some of which are
mentioned in issue #57, e.g. duplicate tags are counted multiple times. Taking
the tags on cover letters into account would also be easier if we could store
tags against them and just query them on demand, instead of reextracting
everything all over again.
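
To illustrate the duplicate-counting problem from issue #57, here is a minimal
Python sketch. It is not the Patchwork code: the regular expression, the
hardcoded tag names and the helper names `extract_tags` and `count_tags` are
assumptions made up for this example. It only shows that keeping the value
next to the tag name lets us count each distinct (name, value) pair once.

```python
import re
from collections import Counter

# Hypothetical pattern for "Name: value" lines; the real tag patterns are
# configurable and are not reproduced here.
TAG_RE = re.compile(
    r"^(Acked-by|Reviewed-by|Tested-by):[ \t]*(.+?)[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def extract_tags(text):
    """Return the set of (lowercased tag name, value) pairs found in text."""
    return {(m.group(1).lower(), m.group(2)) for m in TAG_RE.finditer(text)}


def count_tags(sources):
    """Count every distinct (name, value) pair once across all sources."""
    seen = set()
    for text in sources:
        seen |= extract_tags(text)
    return Counter(name for name, _ in seen)


sources = [
    "Acked-by: Alice <alice@example.com>\n",
    "Looks good.\nAcked-by: Alice <alice@example.com>\n"
    "Tested-by: Bob <bob@example.com>",
]
counts = count_tags(sources)
```

Here the same Acked-by from Alice appears in two sources but contributes only
once to the counter, which is not possible when only the tag name is kept.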

Solutions for some other things we found missing solve the issues mentioned
above too. To determine whether a tag is a duplicate, we need to save its
associated value. Having the value would also let us use arbitrary strings as
tags (for example links to issue trackers, like `Bugzilla: <link>` if the patch
fixes a known bug). The key-value approach to storing tags is mentioned in [3];
that email also mentions a comments REST API (currently being worked on). For
the comments API we would also find it very useful to have the tags extracted
from the comments available directly so we can query for them. That means we
would either need to reextract the tags on every API call, or store the tags
with the comment data as they are extracted and only query them as needed.
With this in place, we would get rid of the `patch_responses` property used
when converting comments to mbox (and the mbox would finally include all the
custom tags instead of only the few hardcoded ones).
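
A rough sketch of the "store once, query on demand" idea, in plain Python
rather than Django models. The `TagStore` class, the `TagRow` fields and the
source labels are hypothetical names for illustration only, and counting a tag
that appears on both the cover letter and a patch only once is an assumption,
not a decision made in this RFC.

```python
from collections import Counter, namedtuple

# One stored tag: where it applies (a patch, or the whole series when
# patch_id is None), which email it came from, and its name and value.
TagRow = namedtuple("TagRow", "patch_id series_id source name value")


class TagStore:
    def __init__(self):
        self.rows = set()

    def add(self, source, name, value, patch_id=None, series_id=None):
        """Store a tag extracted from a single source email."""
        self.rows.add(TagRow(patch_id, series_id, source, name, value))

    def remove_source(self, source):
        """Drop only the tags of a removed comment; nothing is reextracted."""
        self.rows = {r for r in self.rows if r.source != source}

    def counts_for_patch(self, patch_id, series_id):
        """Count distinct tags on the patch plus the series-wide ones."""
        pairs = {
            (r.name, r.value)
            for r in self.rows
            if r.patch_id == patch_id
            or (r.patch_id is None and r.series_id == series_id)
        }
        return Counter(name for name, _ in pairs)


store = TagStore()
# Tag from a comment on the cover letter: applies to the whole series.
store.add("cover-comment-1", "Acked-by", "Alice", series_id=10)
store.add("patch-1", "Reviewed-by", "Bob", patch_id=1, series_id=10)
# Same Acked-by repeated on the patch itself: stored, but counted once.
store.add("comment-5", "Acked-by", "Alice", patch_id=1, series_id=10)
store.add("comment-6", "Bugzilla", "https://bugzilla.example.com/1",
          patch_id=1, series_id=10)
```

Adding or removing a comment touches only that comment's rows, and the
per-patch counters, including the cover letter tags from issue #113, are
computed by a query instead of a full reextraction.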

TL;DR:
Our goals:
- Avoid tag reextraction with each added comment
- Fix issues #57 and #113
- Prepare for adding tags to comments in the API
- Add tags to patch (currently returns {}) and cover letter APIs

[1] https://github.com/getpatchwork/patchwork/issues/57
[2] https://github.com/getpatchwork/patchwork/issues/113
[3] https://lists.ozlabs.org/pipermail/patchwork/2018-January/004741.html

Veronika Kabatova (1):
  Rework tagging infrastructure

-- 
2.13.6