[PATCH 41/41] mm: replace rw_semaphore with atomic_t in vma_lock

Tue Jan 10 19:04:18 AEDT 2023

On 1/9/23 21:53, Suren Baghdasaryan wrote:
> rw_semaphore is a sizable structure of 40 bytes and consumes
> considerable space for each vm_area_struct. However vma_lock has
> two important specifics which can be used to replace rw_semaphore
> with a simpler structure:
> 1. Readers never wait. They try to take the vma_lock and fall back to
> mmap_lock if that fails.
> 2. Only one writer at a time will ever try to write-lock a vma_lock
> because writers first take mmap_lock in write mode.
> Because of these requirements, full rw_semaphore functionality is not
> needed and we can replace rw_semaphore with an atomic variable.
> When a reader takes read lock, it increments the atomic unless the
> value is negative. If that fails read-locking is aborted and mmap_lock
> is used instead.
> When writer takes write lock, it resets atomic value to -1 if the
> current value is 0 (no readers). Since all writers take mmap_lock in
> write mode first, there can be only one writer at a time. If there
> are readers, writer will place itself into a wait queue using new
> mm_struct.vma_writer_wait waitqueue head. The last reader to release
> the vma_lock will signal the writer to wake up.
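
To make sure I read the protocol correctly, here is a minimal userspace
sketch of the counter as described above. It is not the code from the patch:
the sketch_* names are made up, it uses C11 atomics rather than the kernel's
atomic_t, and the wait queue is left out (a failed write trylock is where the
writer would go to sleep). It also folds in the overflow check mentioned
further down by refusing the increment up front, rather than incrementing and
then restoring the previous value.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock counter; not the kernel structure. */
struct sketch_vma_lock {
    atomic_int count;   /* -1: write-locked, 0: free, >0: number of readers */
};

/* Reader side: increment unless write-locked. Never waits; on failure the
 * caller would fall back to mmap_lock. */
static bool sketch_read_trylock(struct sketch_vma_lock *l)
{
    int old = atomic_load(&l->count);

    do {
        if (old < 0)            /* a writer holds the lock */
            return false;
        if (old == INT_MAX)     /* one more reader would overflow */
            return false;
    } while (!atomic_compare_exchange_weak(&l->count, &old, old + 1));

    return true;
}

/* Returns true if the caller was the last reader, i.e. the one that would
 * have to wake a waiting writer. */
static bool sketch_read_unlock(struct sketch_vma_lock *l)
{
    return atomic_fetch_sub(&l->count, 1) == 1;
}

/* Writer side: 0 -> -1. Only one writer can get here at a time because
 * writers are assumed to hold mmap_lock in write mode already. */
static bool sketch_write_trylock(struct sketch_vma_lock *l)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&l->count, &expected, -1);
}

static void sketch_write_unlock(struct sketch_vma_lock *l)
{
    atomic_store(&l->count, 0);
}
```
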
> vm_lock_seq is also moved into vma_lock and along with atomic_t they
> are nicely packed and consume 8 bytes, bringing the overhead from
> vma_lock from 44 to 16 bytes:
> 
>     slabinfo before the changes:
>      <name>            ... <objsize> <objperslab> <pagesperslab> : ...
>     vm_area_struct    ...    152   53    2 : ...
> 
>     slabinfo with vma_lock:
>      <name>            ... <objsize> <objperslab> <pagesperslab> : ...
>     rw_semaphore      ...      8  512    1 : ...

I guess the cache is called vma_lock, not rw_semaphore?

>     vm_area_struct    ...    160   51    2 : ...
> 
> Assuming 40000 vm_area_structs, memory consumption would be:
> baseline: 6040kB
> vma_lock (vm_area_structs+vma_lock): 6280kB+316kB=6596kB
> Total increase: 556kB
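
FWIW these numbers can be reproduced from the slabinfo columns above if I
assume 4kB pages and that only whole slabs are counted. A small sketch of the
arithmetic (sketch_slab_kb is a made-up helper, not something from the patch):

```c
#include <stddef.h>

/* Hypothetical helper: kB consumed by a slab cache holding nr_objs objects,
 * assuming 4 kB pages and that partially filled slabs count in full. */
static size_t sketch_slab_kb(size_t nr_objs, size_t objs_per_slab,
                             size_t pages_per_slab)
{
    size_t slabs = (nr_objs + objs_per_slab - 1) / objs_per_slab; /* round up */

    return slabs * pages_per_slab * 4;
}
```

With 40000 objects, (53, 2) gives 6040, (51, 2) gives 6280 and (512, 1) gives
316, which matches the figures quoted above.
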
> 
> atomic_t might overflow if there are many competing readers, therefore
> vma_read_trylock() implements an overflow check and if that occurs it
> restores the previous value and exits with a failure to lock.
> 
> Signed-off-by: Suren Baghdasaryan <surenb at google.com>

This patch is indeed an interesting addition, but I can't help but
think it obsoletes the previous one :) We allocate an extra 8-byte slab
object for the lock, and the pointer to it is also 8 bytes, and requires an
indirection. The vma_lock cache is not cacheline aligned (otherwise it would
be a major waste), so we have potential false sharing with up to 7 other
vma_lock's.
I'd expect if the vma_lock was placed with the relatively cold fields of
vm_area_struct, it shouldn't cause much cache ping pong when working with
that vma. Even if we don't cache align the vma to save memory (would be 192
bytes instead of 160 when aligned) and place the vma_lock and the cold
fields at the end of the vma, it may share a cacheline with the next vma in
the slab and cause false sharing. But that's a single vma, not up to 7, so
it shouldn't be worse?
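
For reference, the "7" and "192" above come from simple arithmetic that
assumes 64-byte cache lines. A rough sketch (the sketch_* names and the
SKETCH_CACHELINE constant are mine, purely for illustration):

```c
#include <stddef.h>

#define SKETCH_CACHELINE 64     /* assumed cache line size in bytes */

/* How many other objects can sit in the same cache line as a given one.
 * Only meaningful for densely packed objects whose size divides the line. */
static size_t sketch_neighbours_per_line(size_t obj_size)
{
    if (obj_size >= SKETCH_CACHELINE)
        return 0;
    return SKETCH_CACHELINE / obj_size - 1;
}

/* Object size after rounding up to a whole number of cache lines. */
static size_t sketch_align_up(size_t size)
{
    return (size + SKETCH_CACHELINE - 1) / SKETCH_CACHELINE * SKETCH_CACHELINE;
}
```

So an 8-byte vma_lock has 7 potential neighbours in its cache line, and a
160-byte vm_area_struct grows to 192 bytes if it is cacheline aligned.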