[PATCH 0/4] Add 'Event.payload' field
Stephen Finucane
stephen at that.guru
Wed Apr 11 05:01:55 AEST 2018
On Thu, 2018-03-29 at 02:36 +1100, Daniel Axtens wrote:
> Stephen Finucane <stephen at that.guru> writes:
>
> > On Wed, 2018-03-28 at 10:02 +1100, Daniel Axtens wrote:
> > > Stephen Finucane <stephen at that.guru> writes:
> > >
> > > Hi Stephen,
> > >
> > > > The '/event' API endpoint is really slow due to the amount of JOINs
> > > > necessary to retrieve records from the database. Resolve this by
> > > > a static JSON representation of any embedded data in the database. This
> > > > has some disadvantages, noted in the patches, but the performance
> > > > improvement is huge and users will not notice a thing.
> > > >
> > > > As a necessary side-effect of this change, events now require the REST
> > > > API be enabled. To be honest, this should have been the case from day 1
> > > > as events are currently only exposed over this API.
> > >
> > > How does perfomance of this compare to
> > > https://patchwork.ozlabs.org/project/patchwork/list/?series=35029 ?
> >
> > I've no idea. I know that it removes all but one (I think) JOIN so in
> > theory it should massively improve performance. It's impossible for me
> > to test this stuff as I simply can't get my hands on a big enough
> > archive to test it. If you have an idea on how to test this, I'd
> > appreciate the help.
>
> What I did to test it was to download the Ubuntu kernel team archive -
> https://lists.ubuntu.com/archives/kernel-team/ - the full raw archive is
> available from the top of the page and is currently 608MB. You can
> import it multiple times into different projects too, so you can scale
> up 1.2 GB pretty easily. This is what I did for my postgres tests.
>
> I then used Django Debug Toolbar to tell me how long the SQL queries
> were taking and how much time was spent in processing. I just tested
> this on my laptop but it was enough to see the big, measurable
> differences I included in my mail.
>
> > > This is a much bigger change than that series, so if the performance
> > > impact is similar perhaps the smaller, migration-free change would be
> > > preferable?
> >
> > From what I'm hearing, these large JOINs are generally killing us and
> > only get worse the larger the instance is. I was envisioning the above
> > series as something that would fix performance for 2.0, while this as a
> > longer term solution than the above series that could be applied for
> > 2.1. As noted above though, I need some way to test this so any help
> > here would be appreciated.
>
> On much reflection, my concern with this series is how we would make a
> change to the events API. As I understand it, we're going to have one
> static blob stored in the db, so if we change what constitues an event
> we're going to get a discontinuity in what's stored in the database.
>
> For example if we add a field to the event - say for the sake of the
> argument the SHA1 the message contents - we'd need to figure something
> out. I can think of a few options, none of which thrill me:
>
> - We unversion the events API and push the burden to consumers to deal
> with changes to time-series data.
>
> - post-process the json coming out of the database on every query. For
> v1 consumers of new data, strip out the new field. For v2 consumers
> of old data, we calculate the new data and supply it (possibly saving
> the combined data so we only have to do this once per event.)
>
> - Or we have to do scary migrations where we read-modify-write all of
> the many, many, previous events.
>
> What would be your plan for dealing with a change to the event model?
I don't really have one, to be honest. However, I've avoided storing
things that will change (urls) and in v2 I am using JSON types for the
database column, where possible. This means we can use the various JSON
features of the database backend to do whatever migration steps are
necessary.
Personally, if we _did_ have to make changes to the events, I would add
a new column, 'payload_b' or similar, which either contain the slightly
modified version of the 'payload' field or would only be populated for
new events. This would mean we could tailor our serializers for various
different versions. If not this, then I'd go for option (a) above
(don't worry about versioning it).
Stephen
> Regards,
> Daniel
> > Stephen
> >
> > > Regards,
> > > Daniel
> > >
> > > >
> > > > Stephen Finucane (4):
> > > > REST: Support embedded serializers without context
> > > > signals: Only enable events when REST API enabled
> > > > models: Migrate event fields to JSON field
> > > > REST: Only fetch required fields event filtering
> > > >
> > > > patchwork/api/embedded.py | 60 ++++++++---
> > > > patchwork/api/event.py | 110 +++++++++++++--------
> > > > patchwork/api/filters.py | 7 ++
> > > > patchwork/fields.py | 32 ++++++
> > > > .../migrations/0025_add_event_payload_field.py | 21 ++++
> > > > ...26_migrate_data_from_event_fields_to_payload.py | 63 ++++++++++++
> > > > .../migrations/0027_remove_old_event_fields.py | 34 +++++++
> > > > patchwork/models.py | 28 +-----
> > > > patchwork/signals.py | 70 +++++++++++--
> > > > patchwork/tests/test_events.py | 110 +++++++++++++--------
> > > > .../events-require-rest-api-47eab4a3be745f75.yaml | 5 +
> > > > 11 files changed, 413 insertions(+), 127 deletions(-)
> > > > create mode 100644 patchwork/migrations/0025_add_event_payload_field.py
> > > > create mode 100644 patchwork/migrations/0026_migrate_data_from_event_fields_to_payload.py
> > > > create mode 100644 patchwork/migrations/0027_remove_old_event_fields.py
> > > > create mode 100644 releasenotes/notes/events-require-rest-api-47eab4a3be745f75.yaml
> > > >
> > > > --
> > > > 2.14.3
> > > >
> > > > _______________________________________________
> > > > Patchwork mailing list
> > > > Patchwork at lists.ozlabs.org
> > > > https://lists.ozlabs.org/listinfo/patchwork
More information about the Patchwork
mailing list