[PATCH] Implement list filtering

Tue Feb 6 04:14:18 AEDT 2018

On Tue, Feb 06, 2018 at 02:24:21AM +1100, Daniel Axtens wrote:
> Hi Veronika,
> 
> So, having been convinced of the merits of this approach, here's my code
> review.
> 
> > Sometimes, multiple projects reside at the same mailing list. So far,
> > Patchwork only allowed a single project per mailing list, which made it
> > impossible for these projects to use Patchwork (unless they did some
> > dirty hacks).
> >
> > Add a new property `subject_match` to projects and implement filtering
> > on (list_id, subject_match) match instead of solely list_id. Instance
> > admin can specify a regex on a per-project basis when the project is
> > created.
> 
> Ok, so let me see if I have the logic of this patch right.
> 
> A message comes in.
> 
> A set of possible projects is found based on the list-id.
> 
> Then, starting from the first one returned by the db (with no intrinsic
> ordering), the subject_match field is examined:
>  - if subject_match is empty, this project is returned as a match
>  - if subject_match is non-empty, the project is returned as a match if
>    the subject matches
>  
> If this understanding is incorrect, bail out here :)
> 
> Consider a list like netdev. It has at least a couple of userspace
> projects (iproute2, ethtool) that could conceivably be spun off into
> their own lists, and which could use subject matches, but most patches
> are for the linux kernel and do not have usable subject matches.
> 
> As I understand it, you would need to have 3 projects:
>  - for iproute2, subject-match \[[^\]]*iproute[^\]]*\]
>  - for ethtool, subject-match \[[^\]]*ethtool[^\]]*\]
> Then how would you match the rest of the list? You could have an empty
> subject-match, but then if that is loaded first out of the database, it
> would match first and patches would never make it to the iproute2 or
> ethtool projects.
> 
> I think you need some sort of default or fall-back or last-resort - the
> name is not important: just that it will match *only* if no other
> project matches.

Good point.  Internally in Red Hat we had a default bucket that caught stuff
like this because regex's can't capture 100% of human behaviour (humans are
great at doing dumb things :-) ).

Periodically we just reviewed the bucket and manually moved patches to the
right project.  I think that is what you are thinking, right?

Cheers,
Don

> 
> Does that sound right?
> 
> I have some other minor comments which I have included throughout:
> 
> > diff --git a/docs/api.yaml b/docs/api.yaml
> > index 3e79f0b..3373226 100644
> > --- a/docs/api.yaml
> > +++ b/docs/api.yaml
> > @@ -374,6 +374,9 @@ definitions:
> >        list_id:
> >          type: string
> >          description: Mailing list identifier for project.
> > +      subject_match:
> > +        type: string
> > +        description: Regex used for email filtering.
> This is good.
> 
> > diff --git a/docs/deployment/management.rst b/docs/deployment/management.rst
> > index c50b7b6..870d7ee 100644
> > --- a/docs/deployment/management.rst
> > +++ b/docs/deployment/management.rst
> > @@ -63,7 +63,7 @@ due to, for example, an outage.
> >  .. option:: --list-id <list-id>
> >  
> >     mailing list ID. If not supplied, this will be extracted from the mail
> > -   headers.
> > +   headers. Don't use this option if you require filtering based on subjects.
> >  
> >  .. option:: infile
> 
> I don't understand why this would interfere with subject
> filtering. I notice below that you've added a new method that matches by
> list-id and subject and kept the old list-id matching one - does the
> command-line option use the old method?
> 
> I think I would prefer that there only be one system of mapping mails to
> projects and for this restriction to go away, but if there's a good
> reason for it to stay I am always open to persuasion.
> 
> > diff --git a/patchwork/api/project.py b/patchwork/api/project.py
> > index 446c473..597f605 100644
> > --- a/patchwork/api/project.py
> > +++ b/patchwork/api/project.py
> > @@ -39,8 +39,9 @@ class ProjectSerializer(HyperlinkedModelSerializer):
> >      class Meta:
> >          model = Project
> >          fields = ('id', 'url', 'name', 'link_name', 'list_id', 'list_email',
> > -                  'web_url', 'scm_url', 'webscm_url', 'maintainers')
> > -        read_only_fields = ('name', 'maintainers')
> > +                  'web_url', 'scm_url', 'webscm_url', 'maintainers',
> > +                  'subject_match')
> > +        read_only_fields = ('name', 'maintainers', 'subject_match')
> This looks good - albeit I am not an API expert.
> 
> > diff --git a/patchwork/models.py b/patchwork/models.py
> > index 11886f1..907707f 100644
> > --- a/patchwork/models.py
> > +++ b/patchwork/models.py
> > @@ -71,8 +71,14 @@ class Project(models.Model):
> >  
> >      linkname = models.CharField(max_length=255, unique=True)
> >      name = models.CharField(max_length=255, unique=True)
> > -    listid = models.CharField(max_length=255, unique=True)
> > +    listid = models.CharField(max_length=255)
> >      listemail = models.CharField(max_length=200)
> > +    subject_match = models.CharField(
> > +        max_length=64, blank=True, default='', help_text='Regex to match the '
> > +        'subject against if only part of emails sent to the list belongs to '
> > +        'this project. Will be used with IGNORECASE and MULTILINE flags. If '
> > +        'rules for more projects match the first one returned from DB is '
> > +        'chosen.')
> 
> I don't want to bikeshed this too much, but it definitely needs to say
> that a blank string matches everything.
> 
> > diff --git a/patchwork/parser.py b/patchwork/parser.py
> > index ac7dc5f..015f709 100644
> > --- a/patchwork/parser.py
> > +++ b/patchwork/parser.py
> > @@ -157,9 +157,25 @@ def find_project_by_id(list_id):
> >          project = Project.objects.get(listid=list_id)
> >      except Project.DoesNotExist:
> >          logger.debug("'%s' if not a valid project list-id", list_id)
> > +    except Project.MultipleObjectsReturned:
> > +        logger.debug("Multiple projects with list-id '%s' found", list_id)
> >      return project
> >  
> >  
> > +def find_project_by_id_and_subject(list_id, subject):
> > +    """Find a `project` object based on `list_id` and subject prefix match."""
> > +    projects = Project.objects.filter(listid=list_id)
> > +    for project in projects:
> > +        if (not project.subject_match or
> > +                re.search(project.subject_match, subject,
> > +                          re.MULTILINE | re.IGNORECASE)):
> > +            return project
> > +
> > +    logger.debug("No project to match email with list-id '%s', subject '%s' "
> > +                 "found", list_id, subject)
> > +    return None
> > +
> > +
> >  def find_project_by_header(mail):
> >      project = None
> >      listid_res = [re.compile(r'.*<([^>]+)>.*', re.S),
> > @@ -181,7 +197,8 @@ def find_project_by_header(mail):
> >  
> >              listid = match.group(1)
> >  
> > -            project = find_project_by_id(listid)
> > +            project = find_project_by_id_and_subject(listid,
> > +                                                     mail.get('Subject'))
> >              if project:
> >                  break
> Again here there is the 2 ways to find a project. It does look like
> subject matching only happens if you don't specify --list-id and I think
> that is probably not the approach I want to take.
> 
> Patchwork also extracts the bits of the subject between square brackets
> - would matching on that be more appropriate? (See subject_prefixes in
> parser.py.) That would mean the regex will automatically select for
> things in square brackets without having to encode that in the regex,
> which might avoid some user confusion and misdirected mail.
> 
> Lastly, please could you also add a simple test case for this: doesn't need to
> be big, just enough to show that the feature works.
> 
> Thanks again for your contribution to Patchwork!
> 
> Regards,
> Daniel
> _______________________________________________
> Patchwork mailing list
> Patchwork at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork