Deduplication of patchwork mail content?
dja at axtens.net
Thu Oct 24 09:36:05 AEDT 2019
Konstantin Ryabitsev <konstantin at linuxfoundation.org> writes:
> On Wed, Oct 09, 2019 at 05:35:03PM +1100, Daniel Axtens wrote:
>> Hi sfr, jk, Konstantin and any other admins lurking,
>> I'm in the process of reworking the patchwork db schema to avoid one of
>> our very big and very annoying (and slow) JOINs.
>> While I'm at it, it occurred to me that for both the ozlabs and
>> kernel.org instances, there are a lot of mails that are sent across
>> multiple projects. ATM the entire contents of the mail (content,
>> headers, diff, what have you) are stored in full for each project.
>> Would it be of value for your deployments if I used this opportunity to
>> normalise the database and deduplicate emails? I was thinking of
>> splitting the big raw text fields (diff, content, headers) into their
>> own table and then indexing into that by message-id.
> I think space is pretty cheap, and it's going to be a lot of work for
> little savings. Adding some indexes would be a much more effective way
> of improving performance in my view.
Cool, thanks to both of you. I will keep things the way they are, and
look at what indexes can be added.
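
For the archive, here is a rough sketch of the normalisation I had in mind, in case anyone revisits it later. It is not Patchwork's actual schema or code: the table names (mail_body, submission), the helpers (open_db, store_mail) and the use of sqlite3 are all made up for illustration. The big text fields live once in their own table keyed by message-id, and each project gets a small row pointing at them.

```python
import sqlite3

# Hypothetical schema: NOT Patchwork's real tables, just an illustration.
SCHEMA = """
CREATE TABLE mail_body (
    id      INTEGER PRIMARY KEY,
    msgid   TEXT NOT NULL UNIQUE,   -- the UNIQUE constraint is also the lookup index
    headers TEXT NOT NULL,
    content TEXT,
    diff    TEXT
);
CREATE TABLE submission (
    id      INTEGER PRIMARY KEY,
    project TEXT NOT NULL,
    body_id INTEGER NOT NULL REFERENCES mail_body(id),
    UNIQUE (project, body_id)
);
"""


def open_db(path=":memory:"):
    """Open a database and create the illustrative tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_mail(conn, project, msgid, headers, content, diff):
    """Store one mail for one project, reusing the body if msgid is known.

    Returns the id of the shared mail_body row.
    """
    with conn:  # one transaction for all three statements
        # Only the first copy of a given message-id is kept.
        conn.execute(
            "INSERT OR IGNORE INTO mail_body (msgid, headers, content, diff) "
            "VALUES (?, ?, ?, ?)",
            (msgid, headers, content, diff),
        )
        (body_id,) = conn.execute(
            "SELECT id FROM mail_body WHERE msgid = ?", (msgid,)
        ).fetchone()
        # Per-project row; ignored if this project already has the mail.
        conn.execute(
            "INSERT OR IGNORE INTO submission (project, body_id) VALUES (?, ?)",
            (project, body_id),
        )
    return body_id
```

Storing the same message-id for two projects then yields one mail_body row and two submission rows. One wrinkle the sketch glosses over: the headers of the copies delivered via different lists are not necessarily identical, so keeping only the first copy loses the others.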