[PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's
Nicholas Piggin
npiggin at gmail.com
Mon Jul 24 21:25:33 AEST 2017
On Mon, 24 Jul 2017 14:28:02 +1000
Benjamin Herrenschmidt <benh at kernel.crashing.org> wrote:
> Instead of comparing the whole CPU mask every time, let's
> keep a counter of how many bits are set in the mask. Thus
> testing for a local mm only requires testing if that counter
> is 1 and the current CPU bit is set in the mask.
>
> Signed-off-by: Benjamin Herrenschmidt <benh at kernel.crashing.org>
> ---
> arch/powerpc/include/asm/book3s/64/mmu.h | 3 +++
> arch/powerpc/include/asm/mmu_context.h | 9 +++++++++
> arch/powerpc/include/asm/tlb.h | 11 ++++++++++-
> arch/powerpc/mm/mmu_context_book3s64.c | 2 ++
> 4 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> index 1a220cdff923..c3b00e8ff791 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -83,6 +83,9 @@ typedef struct {
> mm_context_id_t id;
> u16 user_psize; /* page size index */
>
> + /* Number of bits in the mm_cpumask */
> + atomic_t active_cpus;
> +
> /* NPU NMMU context */
> struct npu_context *npu_context;
>
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index ff1aeb2cd19f..cf8f50cd4030 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -96,6 +96,14 @@ static inline void switch_mm_pgdir(struct task_struct *tsk,
> struct mm_struct *mm) { }
> #endif
>
> +#ifdef CONFIG_PPC_BOOK3S_64
> +static inline void inc_mm_active_cpus(struct mm_struct *mm)
> +{
> + atomic_inc(&mm->context.active_cpus);
> +}
> +#else
> +static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
> +#endif
This is a bit awkward. Can we move the whole sequence (test the cpumask
bit, set it, increment the counter) into a helper function and define it
together with mm_is_thread_local, so it's all in one place?
The extra atomic also does not need to be defined when it's not used.
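A minimal userspace sketch of that consolidation, for illustration only.
It assumes a single-word mask and C11 atomics in place of the kernel's
cpumask and atomic_t APIs. The names struct mm_sketch, mm_sketch_mark_cpu
and mm_sketch_is_thread_local are hypothetical, and memory ordering
against the context switch is not modelled:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mm context; not the real kernel layout. */
struct mm_sketch {
	atomic_ulong cpumask;     /* one bit per CPU, stand-in for mm_cpumask() */
	atomic_int   active_cpus; /* number of bits set in cpumask */
};

/* Mark @cpu as using @mm; count it only on a 0 -> 1 transition of its bit. */
static inline void mm_sketch_mark_cpu(struct mm_sketch *mm, unsigned int cpu)
{
	unsigned long bit = 1UL << cpu;

	/* Cheap read first, so the common "already set" case does no atomic RMW. */
	if (!(atomic_load(&mm->cpumask) & bit)) {
		unsigned long old = atomic_fetch_or(&mm->cpumask, bit);

		/* Only the caller that actually set the bit bumps the counter. */
		if (!(old & bit))
			atomic_fetch_add(&mm->active_cpus, 1);
	}
}

/* Local if exactly one CPU has used the mm and that CPU is @cpu. */
static inline bool mm_sketch_is_thread_local(struct mm_sketch *mm,
					     unsigned int cpu)
{
	return atomic_load(&mm->active_cpus) == 1 &&
	       (atomic_load(&mm->cpumask) & (1UL << cpu));
}
```

With both helpers next to each other, a configuration that does not want
the counter could stub them out in one place.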
Also, does it make sense to define it only when NR_CPUS > BITS_PER_LONG?
If NR_CPUS <= BITS_PER_LONG, the cpumask test should be a similar load
and compare, no?
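To illustrate the difference in cost, here is a small userspace sketch,
not kernel code. The function names are hypothetical and the masks are
plain unsigned long words rather than struct cpumask. When the mask fits
in one word the "only this CPU" test is a single load and compare; with
several words every word has to be checked:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-word mask: local iff the only bit set is this CPU's bit. */
static inline bool mask_is_only_cpu_word(unsigned long mask, unsigned int cpu)
{
	return mask == (1UL << cpu);
}

/* Multi-word mask: each word must match the expected single-bit pattern. */
static inline bool mask_is_only_cpu_multi(const unsigned long *mask,
					  size_t nwords, unsigned int cpu)
{
	const unsigned int bits = sizeof(unsigned long) * CHAR_BIT;

	for (size_t i = 0; i < nwords; i++) {
		/* The word holding @cpu must contain just that bit; others zero. */
		unsigned long want = (i == cpu / bits) ? 1UL << (cpu % bits) : 0;

		if (mask[i] != want)
			return false;
	}
	return true;
}
```

The counter would then mainly pay off in the multi-word case, where it
replaces the loop with one counter read plus one bit test.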
Looks like a good optimisation though.
Thanks,
Nick