[PATCH 08/41] mm: introduce CONFIG_PER_VMA_LOCK

Suren Baghdasaryan surenb at google.com
Thu Jan 12 03:28:49 AEDT 2023


On Wed, Jan 11, 2023 at 2:03 AM David Laight <David.Laight at aculab.com> wrote:
>
> From: Ingo Molnar
> > Sent: 11 January 2023 09:54
> >
> > * Michal Hocko <mhocko at suse.com> wrote:
> >
> > > On Tue 10-01-23 16:44:42, Suren Baghdasaryan wrote:
> > > > On Tue, Jan 10, 2023 at 4:39 PM Davidlohr Bueso <dave at stgolabs.net> wrote:
> > > > >
> > > > > On Mon, 09 Jan 2023, Suren Baghdasaryan wrote:
> > > > >
> > > > > >This configuration variable will be used to build the support for VMA
> > > > > >locking during page fault handling.
> > > > > >
> > > > > >This is enabled by default on supported architectures with SMP and MMU
> > > > > >set.
> > > > > >
> > > > > >The architecture support is needed since the page fault handler is called
> > > > > >from the architecture's page faulting code which needs modifications to
> > > > > >handle faults under VMA lock.
> > > > >
> > > > > I don't think that per-vma locking should be something that is user-configurable.
> > > > > It should just be dependent on the arch. So maybe just remove CONFIG_PER_VMA_LOCK?
> > > >
> > > > Thanks for the suggestion! I would be happy to make that change if
> > > > there are no objections. I think the only pushback might have been the
> > > > vma size increase but with the latest optimization in the last patch
> > > > maybe that's less of an issue?
> > >
> > > Has vma size ever been a real problem? Sure, there might be a lot of
> > > those, but your patch increases it by an rwsem (without the last patch),
> > > which is something like 40B on top of a 136B vma, so we are talking
> > > about ~180B in total, which even with wild mapcount limits shouldn't
> > > really be prohibitive. With the default map count limit we are talking
> > > about a 2M increase at most (per address space).
> > >
> > > Or are you aware of any specific usecases where vma size is a real
> > > problem?

Well, when fixing the cacheline bouncing problem in the initial design,
I was adding 44 bytes to the 152-byte vm_area_struct (CONFIG_NUMA
enabled), pushing it just above 192 bytes, while allocating these
structures from a cache-aligned slab (keeping the lock in a separate
cacheline to prevent cacheline bouncing). That would use a whole 256
bytes per VMA, and it did make me nervous. The current design, with no
need to cache-align vm_area_structs and with the 44-byte overhead
trimmed down to 16 bytes, seems much more palatable.
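
For concreteness, a rough sketch of the two layouts being compared
(illustrative struct names, 64-bit kernel assumed; the "existing
fields" comments stand in for the rest of vm_area_struct):

  #include <linux/cache.h>
  #include <linux/rwsem.h>

  /*
   * Initial design: rwsem embedded and the whole structure
   * cacheline-aligned so the lock sits in its own cacheline.
   * 152B + 44B lands just above 192B, so the slab rounds every
   * VMA up to 256B.
   */
  struct vm_area_struct_initial {
  	/* ... ~152 bytes of existing fields ... */
  	int vm_lock_seq;
  	struct rw_semaphore lock;	/* ~40 bytes */
  } ____cacheline_aligned_in_smp;

  /*
   * Current design (last patch in the series): the rwsem moves
   * into a separately allocated object, so only the sequence
   * count and a pointer (~16 bytes with padding) stay in the VMA,
   * and no cacheline alignment is needed.
   */
  struct vma_lock {
  	struct rw_semaphore lock;
  };

  struct vm_area_struct_current {
  	/* ... existing fields ... */
  	int vm_lock_seq;
  	struct vma_lock *vm_lock;
  };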

> >
> > 40 bytes for the rwsem, plus the patch also adds a 32-bit sequence counter:
> >
> >   + int vm_lock_seq;
> >   + struct rw_semaphore lock;
> >
> > So it's +44 bytes.

Correct.

>
> Depending on whether vm_lock_seq goes into a padding hole or not,
> it will be 40 or 48 bytes.
>
> But if these structures are allocated individually (not an array),
> then it depends on how many items kmalloc() fits into a page (or 2, 4).

Yep. Depends on how we arrange the fields.
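To illustrate the padding-hole point with hypothetical field names
(64-bit alignment rules assumed, not the actual vm_area_struct layout):

  #include <linux/rwsem.h>

  /*
   * vm_lock_seq starts a fresh 8-byte slot: 4 bytes used plus
   * 4 bytes of padding before the 8-byte-aligned rwsem, so the
   * additions cost 48 bytes.
   */
  struct no_hole {
  	void *p1;
  	void *p2;
  	int vm_lock_seq;
  	struct rw_semaphore lock;
  };

  /*
   * vm_lock_seq drops into the 4-byte hole after an existing int,
   * so the additions cost only 40 bytes.
   */
  struct fills_hole {
  	void *p1;
  	int existing_count;
  	int vm_lock_seq;
  	struct rw_semaphore lock;
  };

Running pahole(1) on vm_area_struct is the easy way to check where the
holes actually are.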

Anyhow. Sounds like the overhead of the current design is small enough
to remove CONFIG_PER_VMA_LOCK and let it depend only on architecture
support?
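
For reference, a sketch of what the non-user-configurable variant could
look like in mm/Kconfig, assuming the series' per-arch opt-in symbol is
named ARCH_SUPPORTS_PER_VMA_LOCK; def_bool y with no prompt string keeps
the symbol around for #ifdefs but removes the user-visible knob:

  config ARCH_SUPPORTS_PER_VMA_LOCK
  	def_bool n

  config PER_VMA_LOCK
  	def_bool y
  	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP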
Thanks,
Suren.

>
>         David
>

