[PATCH] models: Add Series.is_open

Stephen Finucane stephen at that.guru
Thu Sep 23 21:46:16 AEST 2021


On Thu, 2021-09-23 at 12:43 +0100, Stephen Finucane wrote:
> On Thu, 2021-09-23 at 12:40 +0100, Stephen Finucane wrote:
> > On Sun, 2021-09-05 at 11:34 +1000, Daniel Axtens wrote:
> > > Stephen Finucane <stephen at that.guru> writes:
> > > 
> > > > Start making series more useful by tracking whether the state is open,
> > > > meaning one or more patches are in a state with action_required=True, or
> > > > not. This means that the 'git-pw series list' command will only list
> > > > series that are actually actionable.
> > > > 
> > > > Signed-off-by: Stephen Finucane <stephen at that.guru>
> > > > ---
> > > > Open questions:
> > > > - Is there a more efficient way to do the migration than I've done?
> > > > - Should we expose an 'is_open' boolean attribute of a 'state' (open,
> > > >   closed) attribute for the series views?
> > > > - Is there a race condition possible in how I've implemented the series
> > > >   state checks?
> > > 
> > > AIUI, series.is_open is a derived property (or I suppose in db terms a
> > > 'view') of series.patches.
> > > 
> > > Is the cost of computing it 'live' so bad that we need to cache it in
> > > the db? (a 'materialised view'[1])
> > > 
> > > Is there any chance we could just always compute on demand? I'm trying
> > > to think of all the ways we expose series:
> > > 
> > >  - on the patch list page, we already also fetch all the patches
> > > 
> > >  - on the API /series/ page, we also fetch all the patches even for just
> > >    the list view.
> > > 
> > > so it seems to me we probably shouldn't suffer a performance hit for
> > > doing it live.
> > 
> > The biggest impact I see is filtering. This is the main thing I want because
> > presently 'git pw series list' is pretty much useless without this. I don't
> > doing this dynamically would work because attempting to fetch, for example, the
> > first 100 "closed" series will require loading every series and every patch. You
> > couldn't just load the first 100 since they could be entirely/mostly "open",
> > meaning you'd need to load another 100, and another, and another, until you get
> > 100 closed. This will need to be stored _somewhere_ fwict.
> > 
> > > [1] https://www.datacamp.com/community/tutorials/materialized-views-postgresql
> > > and noting that materialised views are not supported natively in mysql -
> > > but you've basically done https://linuxhint.com/materialized-views-mysql
> > > with the refresh logic in the application layer rather than a stored
> > > procedure.
> > > 
> > > > diff --git patchwork/models.py patchwork/models.py
> > > > index 58e4c51e..7441510b 100644
> > > > --- patchwork/models.py
> > > > +++ patchwork/models.py
> > > > @@ -14,6 +14,7 @@ from django.conf import settings
> > > >  from django.contrib.auth.models import User
> > > >  from django.core.exceptions import ValidationError
> > > >  from django.db import models
> > > > +from django.db import transaction
> > > >  from django.urls import reverse
> > > >  from django.utils.functional import cached_property
> > > >  from django.core.validators import validate_unicode_slug
> > > > @@ -494,6 +495,19 @@ class Patch(SubmissionMixin):
> > > >          for tag in tags:
> > > >              self._set_tag(tag, counter[tag])
> > > >  
> > > > +    def refresh_series_state(self):
> > > > +        if not self.series:
> > > > +            return
> > > > +
> > > > +        # If any of the patches in the series is in an "action required" state,
> > > > +        # then the series is still open
> > > > +        # TODO(stephenfin): Can this still race?
> > > > +        with transaction.atomic():
> > > > +            self.series.is_open = self.series.patches.filter(
> > > > +                state__action_required=True
> > > > +            ).exists()
> > > > +            self.series.save()
> > > > +
> > > 
> > > AIUI based some experimentation and my experience dealing with our
> > > series race conditions in 2018, this can race and transactions won't
> > > save you here.  The transaction will roll back all the changes if any
> > > statement fails (e.g. if you had an integrityerror from a foreign key),
> > > and other users won't see intermediate writes from your
> > > transaction. What it won't do is preserve the data dependency you're
> > > creating in the way a traditional mutex would.
> > > 
> > > Basically, consider a concurrent parsemail and an API client changing a
> > > patch state and refreshing series state on the same series:
> > > 
> > > 
> > >  | parsemail                    | API client                    |
> > >  ----------------------------------------------------------------
> > >  |                              | series.patch[2].save()        | 
> > >  |                              | BEGIN TRANSACTION             |
> > >  |                              | is_open = SELECT(...) = False |
> > >  | series.patch[3].save()       |                               |
> > >  | BEGIN TRANSACTION            |                               |
> > >  | is_open = SELECT(...) = True |                               |
> > >  | UPDATE series...             |                               |
> > >  | COMMIT TRANSACTION           |                               |
> > >  |                              | UPDATE series...              |
> > >  |                              | COMMIT TRANSACTION            |
> > > 
> > > both transactions can complete successfully and the end result is
> > > wrong. (You can play with this sort of thing with two concurrent
> > > dbshells both updating a commit_ref in a transaction, which I found very
> > > helpful in getting the details right.)
> > > 
> > > Not materialising the view avoids persisting a result where we've lost
> > > the race.
> > 
> > Ack, yeah, this makes sense. I need to investigate these materialized views
> > things, but assuming they're not an option with MySQL and SQLite then I guess
> > I'll need to find a way to lock the whole thing in a mutex.
> 
> It looks like 'select_for_update()' [1] will potentially do the trick? So long
> as I grab this lock before I create or update a patch, I should be okay? Will
> experiment.
> 
> Stephen
> 
> [1] https://docs.djangoproject.com/en/2.2/ref/models/querysets/#select-for-update

...and both MySQL and Postgres support generated columns, which is what we
effectively have here, but at least MySQL only works only on columns of the same
row of the same table [1]

Stephen

[1] https://stackoverflow.com/q/38448583/




More information about the Patchwork mailing list