[PATCH 0/4] Add 'Event.payload' field

Daniel Axtens dja at axtens.net
Thu Mar 29 02:36:41 AEDT 2018


Stephen Finucane <stephen at that.guru> writes:

> On Wed, 2018-03-28 at 10:02 +1100, Daniel Axtens wrote:
>> Stephen Finucane <stephen at that.guru> writes:
>> 
>> Hi Stephen,
>> 
>> > The '/event' API endpoint is really slow due to the amount of JOINs
>> > necessary to retrieve records from the database. Resolve this by
>> > a static JSON representation of any embedded data in the database. This
>> > has some disadvantages, noted in the patches, but the performance
>> > improvement is huge and users will not notice a thing.
>> > 
>> > As a necessary side-effect of this change, events now require the REST
>> > API be enabled. To be honest, this should have been the case from day 1
>> > as events are currently only exposed over this API.
>> 
>> How does perfomance of this compare to
>> https://patchwork.ozlabs.org/project/patchwork/list/?series=35029 ?
>
> I've no idea. I know that it removes all but one (I think) JOIN so in
> theory it should massively improve performance. It's impossible for me
> to test this stuff as I simply can't get my hands on a big enough
> archive to test it. If you have an idea on how to test this, I'd
> appreciate the help.

What I did to test it was to download the Ubuntu kernel team archive -
https://lists.ubuntu.com/archives/kernel-team/ - the full raw archive is
available from the top of the page and is currently 608MB. You can
import it multiple times into different projects too, so you can scale
up 1.2 GB pretty easily. This is what I did for my postgres tests.

I then used Django Debug Toolbar to tell me how long the SQL queries
were taking and how much time was spent in processing. I just tested
this on my laptop but it was enough to see the big, measurable
differences I included in my mail.

>> This is a much bigger change than that series, so if the performance
>> impact is similar perhaps the smaller, migration-free change would be
>> preferable?
>
> From what I'm hearing, these large JOINs are generally killing us and
> only get worse the larger the instance is. I was envisioning the above
> series as something that would fix performance for 2.0, while this as a
> longer term solution than the above series that could be applied for
> 2.1. As noted above though, I need some way to test this so any help
> here would be appreciated.

On much reflection, my concern with this series is how we would make a
change to the events API. As I understand it, we're going to have one
static blob stored in the db, so if we change what constitues an event
we're going to get a discontinuity in what's stored in the database.

For example if we add a field to the event - say for the sake of the
argument the SHA1 the message contents - we'd need to figure something
out. I can think of a few options, none of which thrill me:

 - We unversion the events API and push the burden to consumers to deal
   with changes to time-series data.

 - post-process the json coming out of the database on every query. For
   v1 consumers of new data, strip out the new field. For v2 consumers
   of old data, we calculate the new data and supply it (possibly saving
   the combined data so we only have to do this once per event.)

 - Or we have to do scary migrations where we read-modify-write all of
   the many, many, previous events.

What would be your plan for dealing with a change to the event model?

Regards,
Daniel
> Stephen
>
>> Regards,
>> Daniel
>> 
>> > 
>> > Stephen Finucane (4):
>> >   REST: Support embedded serializers without context
>> >   signals: Only enable events when REST API enabled
>> >   models: Migrate event fields to JSON field
>> >   REST: Only fetch required fields event filtering
>> > 
>> >  patchwork/api/embedded.py                          |  60 ++++++++---
>> >  patchwork/api/event.py                             | 110 +++++++++++++--------
>> >  patchwork/api/filters.py                           |   7 ++
>> >  patchwork/fields.py                                |  32 ++++++
>> >  .../migrations/0025_add_event_payload_field.py     |  21 ++++
>> >  ...26_migrate_data_from_event_fields_to_payload.py |  63 ++++++++++++
>> >  .../migrations/0027_remove_old_event_fields.py     |  34 +++++++
>> >  patchwork/models.py                                |  28 +-----
>> >  patchwork/signals.py                               |  70 +++++++++++--
>> >  patchwork/tests/test_events.py                     | 110 +++++++++++++--------
>> >  .../events-require-rest-api-47eab4a3be745f75.yaml  |   5 +
>> >  11 files changed, 413 insertions(+), 127 deletions(-)
>> >  create mode 100644 patchwork/migrations/0025_add_event_payload_field.py
>> >  create mode 100644 patchwork/migrations/0026_migrate_data_from_event_fields_to_payload.py
>> >  create mode 100644 patchwork/migrations/0027_remove_old_event_fields.py
>> >  create mode 100644 releasenotes/notes/events-require-rest-api-47eab4a3be745f75.yaml
>> > 
>> > -- 
>> > 2.14.3
>> > 
>> > _______________________________________________
>> > Patchwork mailing list
>> > Patchwork at lists.ozlabs.org
>> > https://lists.ozlabs.org/listinfo/patchwork


More information about the Patchwork mailing list