[PATCH v2 3/9] tools/scripts: split a mbox N ways

Stephen Finucane stephen at that.guru
Tue Feb 27 22:18:01 AEDT 2018


On Mon, 2018-02-26 at 15:49 +1100, Daniel Axtens wrote:
> Stephen Finucane <stephen at that.guru> writes:
> 
> > On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote:
> > > To test parallel loading of mail, it's handy to be able to split
> > > an existing mbox file into N mbox files in an alternating pattern
> > > (e.g. 1 2 1 2 or 1 2 3 4 1 2 3 4 etc)
> > > 
> > > Introduce tools/scripts as a place to put things like this.
> > > 
> > > Reviewed-by: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
> > > Signed-off-by: Daniel Axtens <dja at axtens.net>
> > > 
> > > --
> > > 
> > > v2: address Andrew's review comments
> > >     for full pep8 compliance, add to tox.ini testing
> > > ---
> > > >  tools/scripts/split_mail.py | 80 +++++++++++++++++++++++++++++++++++++++++++++
> > >  tox.ini                     |  2 +-
> > >  2 files changed, 81 insertions(+), 1 deletion(-)
> > >  create mode 100755 tools/scripts/split_mail.py
> > > 
> > > > diff --git a/tools/scripts/split_mail.py b/tools/scripts/split_mail.py
> > > new file mode 100755
> > > index 000000000000..d1e3b06fdf85
> > > --- /dev/null
> > > +++ b/tools/scripts/split_mail.py
> > > @@ -0,0 +1,80 @@
> > > +#!/usr/bin/python3
> > > +# Patchwork - automated patch tracking system
> > > +# Copyright (C) 2018 Daniel Axtens <dja at axtens.net>
> > > +#
> > > +# This file is part of the Patchwork package.
> > > +#
> > > +# Patchwork is free software; you can redistribute it and/or modify
> > > > +# it under the terms of the GNU General Public License as published by
> > > +# the Free Software Foundation; either version 2 of the License, or
> > > +# (at your option) any later version.
> > > +#
> > > +# Patchwork is distributed in the hope that it will be useful,
> > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > +# GNU General Public License for more details.
> > > +
> > > +import sys
> > > +import os
> > > +import mailbox
> > > +
> > > +usage = """Split a maildir or mbox into N mboxes
> > > +in an alternating pattern
> > > +
> > > +Usage: ./split_mail.py <input> <mbox prefix> <N>
> > > +
> > > + <input>: input mbox file or Maildir
> > > + <mbox prefix>: output mbox
> > > +    <mbox-prefix>-1... must not exist
> > > + <N> N-way split"""
> > > +
> > > +
> > > +if len(sys.argv) != 4:
> > > +    print(usage)
> > > +    exit(1)
> > > +
> > > +in_name = sys.argv[1]
> > > +out_name = sys.argv[2]
> > > +
> > > +try:
> > > +    n = int(sys.argv[3])
> > > +except ValueError:
> > > +    print("N must be an integer.")
> > > +    print(" ")
> > > +    print(usage)
> > > +    exit(1)
> > > +
> > > +if n < 2:
> > > > +    print("N must be at least 2")
> > > +    print(" ")
> > > +    print(usage)
> > > +    exit(1)
> > > +
> > > +if not os.path.exists(in_name):
> > > +    print("No input at ", in_name)
> > > +    print(" ")
> > > +    print(usage)
> > > +    exit(1)
> > > +
> > 
> > Can we just use argparse for this, please? It handles all these kinds
> > of checks for us.
> 
> I really want to get the core of the series merged and backported and I
> don't want to get too caught up in these otherwise perfectly valid
> review comments.
> 
> How about I split the series in half: 1-4 and 5-9, and then we can
> prioritise merging 5-9 while working out these and related quirks in
> 1-4?

Sounds good to me.
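
For whenever 1-4 gets respun, here is a minimal sketch of the argparse
approach suggested above. It is not code from the patch: the function name
`parse_args` and the argument names `input`, `prefix` and `n` are
assumptions for illustration. argparse covers the argument count and the
integer conversion; the range and existence checks still have to be done by
hand, but `parser.error()` prints the usage text and exits for us.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse split_mail.py arguments (hypothetical helper, not in the patch)."""
    parser = argparse.ArgumentParser(
        description='Split a maildir or mbox into N mboxes in an '
                    'alternating pattern')
    parser.add_argument('input', help='input mbox file or Maildir')
    parser.add_argument('prefix',
                        help='output mbox prefix; <prefix>-1... must not exist')
    # type=int makes argparse reject non-integer values on its own
    parser.add_argument('n', type=int, help='N-way split (at least 2)')
    args = parser.parse_args(argv)

    # Checks argparse cannot express directly; parser.error() prints the
    # usage message to stderr and exits with status 2.
    if args.n < 2:
        parser.error('N must be at least 2')
    if not os.path.exists(args.input):
        parser.error('No input at %s' % args.input)
    return args
```

Something like `args = parse_args()` at the top of the script would then
replace the hand-written `sys.argv` handling.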

> > 
> > > +print("Opening", in_name)
> > > +if os.path.isdir(in_name):
> > > +    inmail = mailbox.Maildir(in_name)
> > > +else:
> > > +    inmail = mailbox.mbox(in_name)
> > > +
> > 
> > > This needs to be closed once opened. You'll see warnings in Python 3.4
> > > (?) otherwise.
> 
> Oddly I haven't seen them, but I will fix this in the respin of this half.

Also good.
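
One possible shape for the fix, as a rough sketch only: wrap the input and
output mailboxes in `contextlib.closing()` so they are closed even if the
copy fails part-way. The function name `split_mail` is an assumption, and
this assumes Python 3.3+ for `contextlib.ExitStack` (the script's shebang
is python3). The check that the output files do not already exist is left
out here for brevity.

```python
import contextlib
import mailbox
import os


def split_mail(in_name, out_name, n):
    """Copy messages from in_name into n mboxes, round-robin.

    Hypothetical refactoring of the loop in the patch; every mailbox is
    closed on exit, including on error.
    """
    if os.path.isdir(in_name):
        inmail = mailbox.Maildir(in_name, create=False)
    else:
        inmail = mailbox.mbox(in_name, create=False)

    with contextlib.closing(inmail), contextlib.ExitStack() as stack:
        # Register each output mbox with the stack so close() is called
        # (and pending writes are flushed) when the block exits.
        out = [
            stack.enter_context(contextlib.closing(
                mailbox.mbox('%s-%d' % (out_name, i + 1))))
            for i in range(n)
        ]
        for i, msg in enumerate(inmail):
            out[i % n].add(msg)
```
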

Cheers,
Stephen

> Regards,
> Daniel
> 
> > 
> > > +out = []
> > > +for i in range(n):
> > > +    if os.path.exists(out_name + "-" + str(i + 1)):
> > > > +        print("mbox already exists at ", out_name + "-" + str(i + 1))
> > > +        print(" ")
> > > +        print(usage)
> > > +        exit(1)
> > > +
> > > +    out += [mailbox.mbox(out_name + '-' + str(i + 1))]
> > > +
> > > +print("Copying messages")
> > > +
> > > +for (i, msg) in enumerate(inmail):
> > > +    out[i % n].add(msg)
> > > +
> > > +print("Done")
> > > diff --git a/tox.ini b/tox.ini
> > > index 09505f78e157..345f7fe2e15a 100644
> > > --- a/tox.ini
> > > +++ b/tox.ini
> > > @@ -37,7 +37,7 @@ commands =
> > >  [testenv:pep8]
> > >  basepython = python2.7
> > >  deps = flake8
> > > -commands = flake8 {posargs} patchwork patchwork/bin/pwclient
> > > > +commands = flake8 {posargs} patchwork patchwork/bin/pwclient tools/scripts/split_mail.py
> > >  
> > >  [flake8]
> > >  ignore = E129, F405


