[RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE

Nicholas Piggin npiggin at gmail.com
Wed Aug 4 15:14:22 AEST 2021


Excerpts from Aneesh Kumar K.V's message of August 4, 2021 12:37 am:
> With shared mappings, even though we are unmapping a large range, the kernel
> will force a TLB flush with the ptl lock held to avoid the race mentioned in
> commit 1cf35d47712d ("mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts").
> This results in the kernel issuing a high number of TLB flushes even for a large
> range. This can be improved by making sure the kernel switches to a PID-based
> flush when it is unmapping a 2M range.

It would be good to have a bit more description here.

In any patch that changes a heuristic like this, I would like to see
some justification or reasoning that could be refuted, or used as a
supporting argument, if we ever want to change the heuristic later,
ideally with some of the obvious downsides listed as well.

This "improves" things here, but what if it hurt things elsewhere, how 
would we come in later and decide to change it back?

THP flushes, for example, will now do PID flushes I think (if they
have to be broadcast, which they will tend to be when khugepaged does
them). That might increase jitter for THP and make this change a loss
for more workloads.

So where do you notice this? What's the benefit?
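
For reference, my reading of the arithmetic being tuned is roughly the
sketch below. It is a standalone userspace illustration, not the kernel
code: the 64K base page size is the case where the change actually
matters (with 4K pages a 2M unmap is already 512 pages, well past the
old ceiling), and the helper name upgrade_to_pid_flush is made up.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative values: radix with a 64K base page size; on radix
 * PMD_SIZE is 2M for both 4K and 64K base pages. */
#define PAGE_SHIFT	16
#define PMD_SIZE	(1UL << 21)

/* Made-up helper: the old heuristic is ceiling 33 with '>',
 * the patch makes it ceiling 32 with '>='. */
static bool upgrade_to_pid_flush(unsigned long nr_pages,
				 unsigned long ceiling, bool inclusive)
{
	return inclusive ? nr_pages >= ceiling : nr_pages > ceiling;
}

int main(void)
{
	unsigned long nr_pages = PMD_SIZE >> PAGE_SHIFT;	/* 32 pages */

	printf("2M unmap covers %lu pages\n", nr_pages);
	printf("old (>  33): %s\n", upgrade_to_pid_flush(nr_pages, 33, false) ?
	       "PID flush" : "32 individual va tlbies");
	printf("new (>= 32): %s\n", upgrade_to_pid_flush(nr_pages, 32, true) ?
	       "PID flush" : "32 individual va tlbies");
	return 0;
}

So under the old values a PMD-sized unmap of a 64K-page mapping sits
just below the cutoff and issues 32 individual va tlbies with the ptl
held, while after the patch it takes the single PID flush instead.
That, as I read it, is the case the commit message is describing.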

> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
> ---
>  arch/powerpc/mm/book3s64/radix_tlb.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
> index aefc100d79a7..21d0f098e43b 100644
> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
> @@ -1106,7 +1106,7 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
>   * invalidating a full PID, so it has a far lower threshold to change from
>   * individual page flushes to full-pid flushes.
>   */
> -static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
> +static unsigned long tlb_single_page_flush_ceiling __read_mostly = 32;
>  static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
>  
>  static inline void __radix__flush_tlb_range(struct mm_struct *mm,
> @@ -1133,7 +1133,7 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
>  	if (fullmm)
>  		flush_pid = true;
>  	else if (type == FLUSH_TYPE_GLOBAL)
> -		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
> +		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;

Arguably >= is nicer than > here, but this shouldn't be in the same 
patch as the value change.

>  	else
>  		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;

And it should change everything to be consistent. Although I'm not sure
it's worth changing at all, admittedly I highly doubt any administrator
would ever be tweaking this.
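
If it were changed everywhere, one way to keep the call sites consistent
would be to centralise the comparison, along the lines of the sketch
below (the helper name is made up; this is only to illustrate what
"consistent" would mean, not a proposal):

#include <stdbool.h>

/*
 * Sketch only: a single comparison shared by every call site, so they
 * all agree on whether hitting the ceiling exactly is enough to
 * upgrade to a PID flush.
 */
static inline bool nr_pages_reaches_ceiling(unsigned long nr_pages,
					    unsigned long ceiling)
{
	return nr_pages >= ceiling;
}

/*
 * The call sites in radix_tlb.c would then all read:
 *	flush_pid = nr_pages_reaches_ceiling(nr_pages,
 *			tlb_single_page_flush_ceiling);
 * and likewise for tlb_local_single_page_flush_ceiling.
 */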

Thanks,
Nick

>  	/*
> @@ -1335,7 +1335,7 @@ static void __radix__flush_tlb_range_psize(struct mm_struct *mm,
>  	if (fullmm)
>  		flush_pid = true;
>  	else if (type == FLUSH_TYPE_GLOBAL)
> -		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
> +		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
>  	else
>  		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
>  
> @@ -1505,7 +1505,7 @@ void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
>  			continue;
>  
>  		nr_pages = (end - start) >> def->shift;
> -		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
> +		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
>  
>  		/*
>  		 * If the number of pages spanning the range is above
> -- 
> 2.31.1
> 
> 

