[PATCH v2 3/9] tools/scripts: split a mbox N ways

Daniel Axtens dja at axtens.net
Mon Feb 26 15:49:26 AEDT 2018


Stephen Finucane <stephen at that.guru> writes:

> On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote:
>> To test parallel loading of mail, it's handy to be able to split
>> an existing mbox file into N mbox files in an alternating pattern
>> (e.g. 1 2 1 2 or 1 2 3 4 1 2 3 4 etc)
>> 
>> Introduce tools/scripts as a place to put things like this.
>> 
>> Reviewed-by: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
>> Signed-off-by: Daniel Axtens <dja at axtens.net>
>> 
>> --
>> 
>> v2: address Andrew's review comments
>>     for full pep8 compliance, add to tox.ini testing
>> ---
>>  tools/scripts/split_mail.py | 80 +++++++++++++++++++++++++++++++++++++++++++++
>>  tox.ini                     |  2 +-
>>  2 files changed, 81 insertions(+), 1 deletion(-)
>>  create mode 100755 tools/scripts/split_mail.py
>> 
>> diff --git a/tools/scripts/split_mail.py b/tools/scripts/split_mail.py
>> new file mode 100755
>> index 000000000000..d1e3b06fdf85
>> --- /dev/null
>> +++ b/tools/scripts/split_mail.py
>> @@ -0,0 +1,80 @@
>> +#!/usr/bin/python3
>> +# Patchwork - automated patch tracking system
>> +# Copyright (C) 2018 Daniel Axtens <dja at axtens.net>
>> +#
>> +# This file is part of the Patchwork package.
>> +#
>> +# Patchwork is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 2 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# Patchwork is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +
>> +import sys
>> +import os
>> +import mailbox
>> +
>> +usage = """Split a maildir or mbox into N mboxes
>> +in an alternating pattern
>> +
>> +Usage: ./split_mail.py <input> <mbox prefix> <N>
>> +
>> + <input>: input mbox file or Maildir
>> + <mbox prefix>: output mbox
>> +    <mbox-prefix>-1... must not exist
>> + <N> N-way split"""
>> +
>> +
>> +if len(sys.argv) != 4:
>> +    print(usage)
>> +    exit(1)
>> +
>> +in_name = sys.argv[1]
>> +out_name = sys.argv[2]
>> +
>> +try:
>> +    n = int(sys.argv[3])
>> +except ValueError:
>> +    print("N must be an integer.")
>> +    print(" ")
>> +    print(usage)
>> +    exit(1)
>> +
>> +if n < 2:
>> +    print("N must be at least 2")
>> +    print(" ")
>> +    print(usage)
>> +    exit(1)
>> +
>> +if not os.path.exists(in_name):
>> +    print("No input at ", in_name)
>> +    print(" ")
>> +    print(usage)
>> +    exit(1)
>> +
>
> Can we just use argparse for this, please? It handles all these kinds
> of checks for us.

I really want to get the core of the series merged and backported and I
don't want to get too caught up in these otherwise perfectly valid
review comments.

How about I split the series in half: 1-4 and 5-9, and then we can
prioritise merging 5-9 while working out these and related quirks in
1-4?
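
For reference when this half gets respun, the argparse suggestion could look roughly like the sketch below. It is only an illustration: the helper names (at_least_two, existing_path, build_parser) are made up here and are not part of the patch, and the help text is lifted from the usage string above.

```python
import argparse
import os


def at_least_two(value):
    """argparse type (hypothetical helper): an integer that is >= 2."""
    # A ValueError from int() is turned into a usage error by argparse.
    n = int(value)
    if n < 2:
        raise argparse.ArgumentTypeError("N must be at least 2")
    return n


def existing_path(value):
    """argparse type (hypothetical helper): an input that already exists."""
    if not os.path.exists(value):
        raise argparse.ArgumentTypeError("No input at %s" % value)
    return value


def build_parser():
    """Build a parser mirroring the three positional arguments."""
    parser = argparse.ArgumentParser(
        description="Split a maildir or mbox into N mboxes "
                    "in an alternating pattern")
    parser.add_argument("input", type=existing_path,
                        help="input mbox file or Maildir")
    parser.add_argument("prefix", metavar="mbox-prefix",
                        help="output mbox prefix; "
                             "<mbox-prefix>-1... must not exist")
    parser.add_argument("n", metavar="N", type=at_least_two,
                        help="N-way split")
    return parser
```

Calling build_parser().parse_args() would then replace the argument-count, integer, range and input-exists checks; argparse prints the usage text and exits with status 2 when any of them fail. The check that the output mboxes do not already exist would still need to be done separately.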

>
>> +print("Opening", in_name)
>> +if os.path.isdir(in_name):
>> +    inmail = mailbox.Maildir(in_name)
>> +else:
>> +    inmail = mailbox.mbox(in_name)
>> +
>
> This needs to be closed once opened. You'll see warnings in Python 3.4
> (?) otherwise.

Oddly I haven't seen them, but I will fix this in the respin of this half.
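
One possible shape for that fix, as a rough sketch rather than what the respin will necessarily contain: wrap every mailbox in contextlib.closing() and collect them in an ExitStack so that close() runs even if copying fails. The function name split_mail is invented for this example, and ExitStack assumes Python 3.3 or later.

```python
import contextlib
import mailbox
import os


def split_mail(in_name, out_prefix, n):
    """Copy messages from in_name into n mboxes in an alternating pattern.

    Hypothetical helper: the name and signature are not from the patch.
    """
    if os.path.isdir(in_name):
        inmail = mailbox.Maildir(in_name, create=False)
    else:
        inmail = mailbox.mbox(in_name, create=False)

    with contextlib.ExitStack() as stack:
        # closing() calls .close() on exit, even if an exception is raised.
        stack.enter_context(contextlib.closing(inmail))
        out = [
            stack.enter_context(contextlib.closing(
                mailbox.mbox("%s-%d" % (out_prefix, i + 1))))
            for i in range(n)
        ]
        # Message i goes to output (i mod n): 1 2 1 2 ... for n == 2.
        for i, msg in enumerate(inmail):
            out[i % n].add(msg)
```

Closing the output mboxes also flushes them, so the written files are complete as soon as the function returns.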

Regards,
Daniel

>
>> +out = []
>> +for i in range(n):
>> +    if os.path.exists(out_name + "-" + str(i + 1)):
>> +        print("mbox already exists at ", out_name + "-" + str(i + 1))
>> +        print(" ")
>> +        print(usage)
>> +        exit(1)
>> +
>> +    out += [mailbox.mbox(out_name + '-' + str(i + 1))]
>> +
>> +print("Copying messages")
>> +
>> +for (i, msg) in enumerate(inmail):
>> +    out[i % n].add(msg)
>> +
>> +print("Done")
>> diff --git a/tox.ini b/tox.ini
>> index 09505f78e157..345f7fe2e15a 100644
>> --- a/tox.ini
>> +++ b/tox.ini
>> @@ -37,7 +37,7 @@ commands =
>>  [testenv:pep8]
>>  basepython = python2.7
>>  deps = flake8
>> -commands = flake8 {posargs} patchwork patchwork/bin/pwclient
>> +commands = flake8 {posargs} patchwork patchwork/bin/pwclient tools/scripts/split_mail.py
>>  
>>  [flake8]
>>  ignore = E129, F405

