[PATCH] Implement list filtering
Don Zickus
dzickus at redhat.com
Tue Feb 6 04:14:18 AEDT 2018
On Tue, Feb 06, 2018 at 02:24:21AM +1100, Daniel Axtens wrote:
> Hi Veronika,
>
> So, having been convinced of the merits of this approach, here's my code
> review.
>
> > Sometimes, multiple projects reside at the same mailing list. So far,
> > Patchwork only allowed a single project per mailing list, which made it
> > impossible for these projects to use Patchwork (unless they did some
> > dirty hacks).
> >
> > Add a new property `subject_match` to projects and implement filtering
> > on (list_id, subject_match) match instead of solely list_id. Instance
> > admin can specify a regex on a per-project basis when the project is
> > created.
>
> Ok, so let me see if I have the logic of this patch right.
>
> A message comes in.
>
> A set of possible projects is found based on the list-id.
>
> Then, starting from the first one returned by the db (with no intrinsic
> ordering), the subject_match field is examined:
> - if subject_match is empty, this project is returned as a match
> - if subject_match is non-empty, the project is returned as a match if
> the subject matches
>
> If this understanding is incorrect, bail out here :)
>
> Consider a list like netdev. It has at least a couple of userspace
> projects (iproute2, ethtool) that could conceivably be spun off into
> their own lists, and which could use subject matches, but most patches
> are for the linux kernel and do not have usable subject matches.
>
> As I understand it, you would need to have 3 projects:
> - for iproute2, subject-match \[[^\]]*iproute[^\]]*\]
> - for ethtool, subject-match \[[^\]]*ethtool[^\]]*\]
> Then how would you match the rest of the list? You could have an empty
> subject-match, but then if that is loaded first out of the database, it
> would match first and patches would never make it to the iproute2 or
> ethtool projects.
>
> I think you need some sort of default or fall-back or last-resort - the
> name is not important: just that it will match *only* if no other
> project matches.
Good point. Internally in Red Hat we had a default bucket that caught stuff
like this because regex's can't capture 100% of human behaviour (humans are
great at doing dumb things :-) ).
Periodically we just reviewed the bucket and manually moved patches to the
right project. I think that is what you are thinking, right?
Cheers,
Don
>
> Does that sound right?
>
> I have some other minor comments which I have included throughout:
>
> > diff --git a/docs/api.yaml b/docs/api.yaml
> > index 3e79f0b..3373226 100644
> > --- a/docs/api.yaml
> > +++ b/docs/api.yaml
> > @@ -374,6 +374,9 @@ definitions:
> > list_id:
> > type: string
> > description: Mailing list identifier for project.
> > + subject_match:
> > + type: string
> > + description: Regex used for email filtering.
> This is good.
>
> > diff --git a/docs/deployment/management.rst b/docs/deployment/management.rst
> > index c50b7b6..870d7ee 100644
> > --- a/docs/deployment/management.rst
> > +++ b/docs/deployment/management.rst
> > @@ -63,7 +63,7 @@ due to, for example, an outage.
> > .. option:: --list-id <list-id>
> >
> > mailing list ID. If not supplied, this will be extracted from the mail
> > - headers.
> > + headers. Don't use this option if you require filtering based on subjects.
> >
> > .. option:: infile
>
> I don't understand why this would interfere with subject
> filtering. I notice below that you've added a new method that matches by
> list-id and subject and kept the old list-id matching one - does the
> command-line option use the old method?
>
> I think I would prefer that there only be one system of mapping mails to
> projects and for this restriction to go away, but if there's a good
> reason for it to stay I am always open to persuasion.
>
> > diff --git a/patchwork/api/project.py b/patchwork/api/project.py
> > index 446c473..597f605 100644
> > --- a/patchwork/api/project.py
> > +++ b/patchwork/api/project.py
> > @@ -39,8 +39,9 @@ class ProjectSerializer(HyperlinkedModelSerializer):
> > class Meta:
> > model = Project
> > fields = ('id', 'url', 'name', 'link_name', 'list_id', 'list_email',
> > - 'web_url', 'scm_url', 'webscm_url', 'maintainers')
> > - read_only_fields = ('name', 'maintainers')
> > + 'web_url', 'scm_url', 'webscm_url', 'maintainers',
> > + 'subject_match')
> > + read_only_fields = ('name', 'maintainers', 'subject_match')
> This looks good - albeit I am not an API expert.
>
> > diff --git a/patchwork/models.py b/patchwork/models.py
> > index 11886f1..907707f 100644
> > --- a/patchwork/models.py
> > +++ b/patchwork/models.py
> > @@ -71,8 +71,14 @@ class Project(models.Model):
> >
> > linkname = models.CharField(max_length=255, unique=True)
> > name = models.CharField(max_length=255, unique=True)
> > - listid = models.CharField(max_length=255, unique=True)
> > + listid = models.CharField(max_length=255)
> > listemail = models.CharField(max_length=200)
> > + subject_match = models.CharField(
> > + max_length=64, blank=True, default='', help_text='Regex to match the '
> > + 'subject against if only part of emails sent to the list belongs to '
> > + 'this project. Will be used with IGNORECASE and MULTILINE flags. If '
> > + 'rules for more projects match the first one returned from DB is '
> > + 'chosen.')
>
> I don't want to bikeshed this too much, but it definitely needs to say
> that a blank string matches everything.
>
> > diff --git a/patchwork/parser.py b/patchwork/parser.py
> > index ac7dc5f..015f709 100644
> > --- a/patchwork/parser.py
> > +++ b/patchwork/parser.py
> > @@ -157,9 +157,25 @@ def find_project_by_id(list_id):
> > project = Project.objects.get(listid=list_id)
> > except Project.DoesNotExist:
> > logger.debug("'%s' if not a valid project list-id", list_id)
> > + except Project.MultipleObjectsReturned:
> > + logger.debug("Multiple projects with list-id '%s' found", list_id)
> > return project
> >
> >
> > +def find_project_by_id_and_subject(list_id, subject):
> > + """Find a `project` object based on `list_id` and subject prefix match."""
> > + projects = Project.objects.filter(listid=list_id)
> > + for project in projects:
> > + if (not project.subject_match or
> > + re.search(project.subject_match, subject,
> > + re.MULTILINE | re.IGNORECASE)):
> > + return project
> > +
> > + logger.debug("No project to match email with list-id '%s', subject '%s' "
> > + "found", list_id, subject)
> > + return None
> > +
> > +
> > def find_project_by_header(mail):
> > project = None
> > listid_res = [re.compile(r'.*<([^>]+)>.*', re.S),
> > @@ -181,7 +197,8 @@ def find_project_by_header(mail):
> >
> > listid = match.group(1)
> >
> > - project = find_project_by_id(listid)
> > + project = find_project_by_id_and_subject(listid,
> > + mail.get('Subject'))
> > if project:
> > break
> Again here there is the 2 ways to find a project. It does look like
> subject matching only happens if you don't specify --list-id and I think
> that is probably not the approach I want to take.
>
> Patchwork also extracts the bits of the subject between square brackets
> - would matching on that be more appropriate? (See subject_prefixes in
> parser.py.) That would mean the regex will automatically select for
> things in square brackets without having to encode that in the regex,
> which might avoid some user confusion and misdirected mail.
>
> Lastly, please could you also add a simple test case for this: doesn't need to
> be big, just enough to show that the feature works.
>
> Thanks again for your contribution to Patchwork!
>
> Regards,
> Daniel
> _______________________________________________
> Patchwork mailing list
> Patchwork at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
More information about the Patchwork
mailing list