[PATCH mm-unstable v1 3/5] kvm/arm64: add kvm_arch_test_clear_young()

Thu Feb 23 16:25:41 AEDT 2023

On Fri, Feb 17, 2023 at 9:00 AM Sean Christopherson <seanjc at google.com> wrote:
>
> On Fri, Feb 17, 2023, Oliver Upton wrote:
> > Hi Yu,
> >
> > scripts/get_maintainers.pl is your friend for getting the right set of
> > emails for a series :) Don't know about others, but generally I would
> > prefer to be Cc'ed on an entire series (to gather context) than just an
> > individual patch.
>
> +1
>
> >
> > On Thu, Feb 16, 2023 at 09:12:28PM -0700, Yu Zhao wrote:
> > > This patch adds kvm_arch_test_clear_young() for the vast majority of
> > > VMs that are not pKVM and run on hardware that sets the accessed bit
> > > in KVM page tables.
>
> At least for the x86 changes, please read Documentation/process/maintainer-tip.rst
> and rewrite the changelogs.

I see -- will remove "this patch".

> > > It relies on two techniques, RCU and cmpxchg, to safely test and clear
> > > the accessed bit without taking the MMU lock. The former protects KVM
> > > page tables from being freed while the latter clears the accessed bit
> > > atomically against both the hardware and other software page table
> > > walkers.
> > >
> > > Signed-off-by: Yu Zhao <yuzhao at google.com>
> > > ---
> > >  arch/arm64/include/asm/kvm_host.h       |  7 +++
> > >  arch/arm64/include/asm/kvm_pgtable.h    |  8 +++
> > >  arch/arm64/include/asm/stage2_pgtable.h | 43 ++++++++++++++
> > >  arch/arm64/kvm/arm.c                    |  1 +
> > >  arch/arm64/kvm/hyp/pgtable.c            | 51 ++--------------
> > >  arch/arm64/kvm/mmu.c                    | 77 ++++++++++++++++++++++++-
> > >  6 files changed, 141 insertions(+), 46 deletions(-)
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index 35a159d131b5..572bcd321586 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -1031,4 +1031,11 @@ static inline void kvm_hyp_reserve(void) { }
> > >  void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu);
> > >  bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu);
> > >
> > > +/* see the comments on the generic kvm_arch_has_test_clear_young() */
>
> Please eliminate all of these "see the comments on blah", in every case they do
> nothing more than redirect the reader to something they're likely already aware of.
>
> > > +#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young
> > > +static inline bool kvm_arch_has_test_clear_young(void)
> > > +{
> > > +   return IS_ENABLED(CONFIG_KVM) && cpu_has_hw_af() && !is_protected_kvm_enabled();
> > > +}
>
> ...
>
> > Also, I'm at a loss for why we'd need to test if CONFIG_KVM is enabled.
> > My expectation is that we should provide an implementation that returns
> > false if !CONFIG_KVM, avoiding the need to repeat that bit in every
> > single implementation of the function.
>
> mm/vmscan.c uses kvm_arch_has_test_clear_young().  I have opinions on that, but
> I'll hold off on expressing them until there's actual justification presented
> somewhere.
>
> Yu, this series and each patch needs a big pile of "why".  I get that the goal
> is to optimize memory oversubscribe, but there needs to be justification for
> why this is KVM only, why nested VMs and !A/D hardware are out of scope, why yet
> another mmu_notifier hook is being added, etc.

I totally agree.

This is an optimization, not a bug fix. It can't be justified without
performance numbers from some common use cases. That burden of proof
clearly rests on me -- I will follow up on that.

For now, I want to make sure the methodical part is clear:
1. We only have limited resources and we need to prioritize major use cases.
2. We can only improve one thing at a time and we can't cover
everything at the same time.
3. We need to focus on the return on investment and the future.

I hope everyone by now agrees with my "the vast majority of VMs ..."
assertion. If not, I'm happy to revisit that [1]. If so, the next step
would be whether we want to focus on the vast majority first. I think
this naturally answers why the nested VMs and !AD h/w are out of
scope, at the moment (I didn't spell this out; probably I should in
v2). After we have taken the first step, we probably can decide
whether there is enough resource and demand to cover the low return on
investment part (but complexity but less common use cases).

[1] https://lore.kernel.org/linux-mm/20230217041230.2417228-1-yuzhao@google.com/