[PATCH v5 0/4] shoot lazy tlbs
Nicholas Piggin
npiggin at gmail.com
Tue Nov 9 15:11:15 AEDT 2021
Since v4, this fixes a kthread_use_mm refcounting bug and adds comments
in code and changelogs around the kthread_use_mm change in patch 1
(prompted by akpm's comment -- thanks). It also adds and improves
comments in code, changelogs, and Kconfig options elsewhere. The
overall design is unchanged, though. Please merge.
This series has had some trouble getting agreement, so I would like to
address a few sticking points and misconceptions up front, which will
hopefully result in constructive disagreement and actionable feedback.
* That the lazy mm scheme is complicated or bug prone.
This is not true; the concept is trivial, and the core code is
extremely simple and basically unchanged since Linus' active_mm email
20 years ago, in the 2.3 days.
This series leaves the lazy tlb switching and ->active_mm semantics
entirely unchanged. It does change the refcounting, but the effects are
hidden under wrappers. Code outside those few places has nothing new to
think about, except that it must use the _lazy_mm helpers when taking
or dropping this particular type of reference. That is not much of a
problem, since lazy mm references never "escape" from the specific
switching sequences and become hard to track. Refs that go out into the
wider world are always normal ones (i.e., created by explicit mmgrab or
kthread_use_mm).
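To make that concrete, the helpers are conceptually along these lines
(an illustrative sketch only; the Kconfig and helper names here are
chosen for illustration, patch 1 has the real ones):

	/*
	 * Sketch: lazy tlb mm references go through their own grab/drop
	 * wrappers, so the refcount can be compiled away on archs that
	 * opt out of it.
	 */
	static inline void mmgrab_lazy_tlb(struct mm_struct *mm)
	{
		if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT))
			mmgrab(mm);
	}

	static inline void mmdrop_lazy_tlb(struct mm_struct *mm)
	{
		if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT))
			mmdrop(mm);
	}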
* That membarrier code is complicated.
This is true, but my series changes nothing about membarrier itself. My
series is entirely about the lazy mm, which had been virtually
unchanged for many years before membarrier existed.
The membarrier code takes advantage of memory ordering in the scheduler
switch code that lazy mm refcounting was providing, so this series adds
one commented smp_mb() under an ifdef there to replace the refcount op
being removed. That does not affect the ability to change the
membarrier code in future, because the refcounted path has to be
accounted for there anyway.
In other words, any change to the membarrier code already has to deal
with the refcounted lazy mm path that exists today; dealing with the
non-refcounted option on top of that is trivial.
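For reference, the ordering point is roughly the following (a sketch of
the idea, not the literal hunk in kernel/sched/core.c; it reuses the
illustrative wrapper names from the sketch above):

	/*
	 * Sketch: in the context switch path where the previous lazy mm
	 * is dropped, membarrier needs a full barrier between the
	 * rq->curr update and returning to userspace.  The lazy mm
	 * refcount atomic used to provide that, so keep an explicit
	 * barrier when the refcount is configured out.
	 */
	if (mm) {
	#ifndef CONFIG_MMU_LAZY_TLB_REFCOUNT
		smp_mb();	/* replaces the ordering from the refcount op */
	#endif
		mmdrop_lazy_tlb(mm);	/* refcount op only if refcounted */
	}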
* That active_mm should be removed from core code.
I don't know how to address this other than to say it is not a good or
well thought out idea. It is not happening, and it is certainly not
related to my series, which does not change ->active_mm semantics at
all.
* That this series provides an option for archs to enable which results
in stale ->active_mm pointers, whereby it is up to the arch to ensure
nothing dereferences those pointers.
This is FUD. It has always been false. Archs that enable
MMU_LAZY_TLB_SHOOTDOWN never have stale ->active_mm pointers, ever. If
->active_mm is non-NULL, then it gives exactly the same guarantees as
you have today.
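To spell out why (a conceptual sketch only; the function names are
invented for illustration, the real code is in patch 3): before an mm
can be freed, any CPU still using it as its lazy tlb mm is switched
over to init_mm, so no ->active_mm pointer to it can outlive it.

	/*
	 * Sketch: IPI the CPUs that may have picked up this mm as a lazy
	 * tlb mm and switch them to init_mm before the mm is torn down.
	 */
	static void shoot_lazy_tlb_cpu(void *arg)
	{
		struct mm_struct *mm = arg;

		if (current->active_mm == mm) {
			WARN_ON_ONCE(current->mm); /* lazy users are kernel threads */
			current->active_mm = &init_mm;
			switch_mm(mm, &init_mm, current);
		}
	}

	static void shoot_lazy_tlbs(struct mm_struct *mm)
	{
		/* wait=1: no lazy users remain when this returns */
		on_each_cpu_mask(mm_cpumask(mm), shoot_lazy_tlb_cpu, mm, 1);
	}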
* That performance of IPIs or other things is a problem.
I posted actual numbers showing this was not a concern, and listed some
options that could reduce the cost further if needed. No numbers were
ever posted to support the other side of the argument.
* That the series is a powerpc specific thing.
Untrue. I have trivial sparc and alpha conversions; those were the
first two architectures I looked at, because I have SMP qemu
environments for them.
* That this series somehow prevents future changes or improvements.
It doesn't.
* That the series is very complex, code is bad or has problems.
Look at the patches. They seem pretty small and simple to me. I am
happy to address specific issues that are pointed out, though, and have
done so.
* That x86 is relevant here.
This series does not touch or affect x86 in any way. x86 has gone off
and done its own horrendously complicated and under-documented thing
with active_mm and the lazy mm concept, but that has been entirely
hidden from core code by the arch context switching hooks. Core code
continues to operate on the concept of ->mm and ->active_mm, and this
series does not change that at all. x86 is no more or less divorced
from that after the series.
Nothing the series does constrains x86 or future changes to it. The
option cannot be used immediately by x86, but there is no reason x86
could not be adapted to use it, or to change its scheme to something
else entirely. Where code can be adapted to be shared with or made
usable by x86, I have no problem with doing that.
If I've missed something or I've got anything wrong with the above,
I'm happy to hear it.
Thanks,
Nick
Nicholas Piggin (4):
lazy tlb: introduce lazy mm refcount helper functions
lazy tlb: allow lazy tlb mm refcounting to be configurable
lazy tlb: shoot lazies, a non-refcounting lazy tlb option
powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN
Documentation/vm/active_mm.rst | 6 ++++
arch/Kconfig | 32 +++++++++++++++++
arch/arm/mach-rpc/ecard.c | 2 +-
arch/powerpc/Kconfig | 1 +
arch/powerpc/kernel/smp.c | 2 +-
arch/powerpc/mm/book3s64/radix_tlb.c | 4 +--
fs/exec.c | 2 +-
include/linux/sched/mm.h | 20 +++++++++++
kernel/cpu.c | 2 +-
kernel/exit.c | 2 +-
kernel/fork.c | 51 ++++++++++++++++++++++++++++
kernel/kthread.c | 21 +++++++-----
kernel/sched/core.c | 35 +++++++++++++------
kernel/sched/sched.h | 4 ++-
14 files changed, 158 insertions(+), 26 deletions(-)
--
2.23.0