[PATCH 1/1] management: introduce parserelations command

Daniel Axtens dja at axtens.net
Fri Jun 12 00:50:27 AEST 2020


Rohit Sarkar <rohitsarkar5398 at gmail.com> writes:

> The parserelations script is used to ingest relations into Patchwork's
> patch database. A patch groups file is taken as input, which on each
> line contains a space separated list of patchwork ids denoting a
> relation. All the existing relations in Patchwork's database are removed
> and the relations read from the patch groups file are ingested.
>

This is a solid start, some comments below. I especially appreciate the
documentation update.

> Signed-off-by: Rohit Sarkar <rohitsarkar5398 at gmail.com>
> ---
>  docs/deployment/management.rst                | 27 +++++++
>  .../management/commands/parserelations.py     | 78 +++++++++++++++++++
>  patchwork/tests/test_management.py            |  7 ++
>  3 files changed, 112 insertions(+)
>  create mode 100644 patchwork/management/commands/parserelations.py
>
> diff --git a/docs/deployment/management.rst b/docs/deployment/management.rst
> index 9c57f19..bcdeae6 100644
> --- a/docs/deployment/management.rst
> +++ b/docs/deployment/management.rst
> @@ -116,6 +116,33 @@ the :ref:`deployment installation guide <deployment-parsemail>`.
>  
>     input mbox filename. If not supplied, a patch will be read from ``stdin``.
>  
> +parserelations
> +~~~~~~~~~~~~~~
> +
> +.. program:: manage.py parserelations
> +
> +Parse a patch groups file and store any relation found
> +
> +.. code-block:: shell
> +
> +    ./manage.py parserelations <infile>
> +
> +This is a script used to ingest relations into Patchwork.
> +A patch groups file contains on each line patchwork ids of patches that form a relation.

I think we would call them patch IDs, not 'patchwork' IDs - we also have
series IDs and event IDs etc.

Also (assuming the rest of the file does this!), please wrap your lines.

> +It can also optionally contain the upstream commit references of the patch group.
> +Eg contents of a patch groups file:
> +
> +    1 3 5
> +    2
> +    7 10 11 12 => <commit-hash-1> <commit-hash-2>
> +
> +In this case the script will identify 2 relations, (1, 3, 5) and (7, 10, 11, 12).
> +Running this script will remove all existing relations and replace them with the relations found in the file.
> +
> +.. option:: infile
> +
> +    input patch groups file.
> +
>  rehash
>  ~~~~~~
>  
> diff --git a/patchwork/management/commands/parserelations.py b/patchwork/management/commands/parserelations.py
> new file mode 100644
> index 0000000..467e504
> --- /dev/null
> +++ b/patchwork/management/commands/parserelations.py
> @@ -0,0 +1,78 @@
> +# Patchwork - automated patch tracking system
> +# Copyright (C) 2020 Rohit Sarkar <rohitsarkar5398 at gmail.com>

I'm not sure if GSoC requirements mean that you keep the copyright or if
it goes to some organisation. Something to check with your other mentors.
(I don't need any justification for the final answer you all land on, I
just want to know you all agree.)

> +#
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +import logging
> +import os
> +import sys
> +
> +from django.db import transaction
> +from django.core.management.base import BaseCommand
> +
> +from patchwork.models import Patch
> +from patchwork.models import PatchRelation
> +
> +logger = logging.getLogger(__name__)
> +
> +class Command(BaseCommand):
> +    help = 'Parse a relations file generated by PaStA and replace existing relations with the ones parsed'
> +
> +    def add_arguments(self, parser):
> +        parser.add_argument(
> +            'infile',
> +            help='input relations filename')
> +    
> +    def handle(self, *args, **options):   
> +        verbosity = int(options['verbosity'])
> +        if not verbosity:
> +            level = logging.CRITICAL
> +        elif verbosity == 1:
> +            level = logging.ERROR
> +        elif verbosity == 2:
> +            level = logging.INFO
> +        else:
> +            level = logging.DEBUG
> +
> +        logger.setLevel(level)
> +
> +        path = args and args[0] or options['infile']
> +        if not os.path.exists(path):
> +            logger.error('Invalid path: %s', path)
> +            sys.exit(1)
> +        
> +
> +        with open(path, 'r') as f:
> +            f = f.read().split('\n')

Please:

 - use readlines()

 - don't overload f: use different variables for the file, lines and
   fields.

> +        
> +        while len(f) and not f[-1]:
> +            f.pop()

I don't understand this bit (trim trailing blank lines?) but I hope it
will be clearer with the renamed variables

> +
> +        f = [x.split(' ') for x in f]
> +
> +        with transaction.atomic():
> +            PatchRelation.objects.all().delete()
> +            count = len(f)
> +            ingested = 0
> +            logger.info('Parsing %d relations' % count)
> +            for i, line in enumerate(f):
> +                related = list()
> +                while len(line) and line[0] != '=>':
> +                    try:
> +                        related.append(Patch.objects.get(id=line.pop(0)))
> +                    except Patch.DoesNotExist:
> +                        logger.error('Patch id not found')
> +                                
> +                if len(related) > 1:
> +                    relation = PatchRelation()
> +                    relation.save()
> +                    for patch in related:
> +                        patch.related = relation
> +                        patch.save()

Can you save outside the for block? I think it will make processing
relations of large numbers of patches quite a bit faster.

> +                    ingested += 1
> +                
> +                if i % 10 == 0:
> +                    self.stdout.write('%06d/%06d\r' % (i, count), ending='')
> +                    self.stdout.flush()
> +            
> +            self.stdout.write('Ingested %d relations' % ingested)
> diff --git a/patchwork/tests/test_management.py b/patchwork/tests/test_management.py
> index 66c6bad..186ea56 100644
> --- a/patchwork/tests/test_management.py
> +++ b/patchwork/tests/test_management.py
> @@ -124,3 +124,10 @@ class ParsearchiveTest(TestCase):
>  
>          self.assertIn('Processed 1 messages -->', out.getvalue())
>          self.assertIn('  1 dropped', out.getvalue())
> +
> +class ParserelationsTest(TestCase):
> +    def test_invalid_path(self):
> +        out = StringIO()
> +        with self.assertRaises(SystemExit) as exc:
> +            call_command('parserelations', 'xyz123random', stdout=out)
> +        self.assertEqual(exc.exception.code, 1)
> -- 
> 2.23.0.385.gbc12974a89


More information about the Patchwork mailing list