[RFC v2 01/12] powerpc: Free up four 64K PTE bits in 4K backed hpte pages.

Anshuman Khandual khandual at linux.vnet.ibm.com
Tue Jun 20 20:20:25 AEST 2017


On 06/17/2017 09:22 AM, Ram Pai wrote:
> Rearrange 64K PTE bits to  free  up  bits 3, 4, 5  and  6
> in the 4K backed hpte pages. These bits continue to be used
> for 64K backed hpte pages in this patch, but will be freed
> up in the next patch.

The counting 3, 4, 5 and 6 are in BE format I believe, I was
initially trying to see that from right to left as we normally
do in the kernel and was getting confused. So basically these
bits (which are only applicable for 64K mapping IIUC) are going
to be freed up from the PTE format.

#define _RPAGE_RSV1		0x1000000000000000UL
#define _RPAGE_RSV2		0x0800000000000000UL
#define _RPAGE_RSV3		0x0400000000000000UL
#define _RPAGE_RSV4		0x0200000000000000UL

As you have mentioned before this feature is available for 64K
page size only and not for 4K mappings. So I assume we support
both the combinations.

* 64K mapping on 64K
* 64K mapping on 4K

These are the current users of the above bits

#define H_PAGE_BUSY		_RPAGE_RSV1 /* software: PTE & hash are busy */
#define H_PAGE_F_SECOND		_RPAGE_RSV2	/* HPTE is in 2ndary HPTEG */
#define H_PAGE_F_GIX		(_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
#define H_PAGE_HASHPTE		_RPAGE_RPN43	/* PTE has associated HPTE */

> 
> The patch does the following change to the 64K PTE format
> 
> H_PAGE_BUSY moves from bit 3 to bit 9

and what is in there on bit 9 now ? This ?

#define _RPAGE_SW2		0x00400

which is used as 

#define _PAGE_SPECIAL		_RPAGE_SW2 /* software: special page */

which will not be required any more ?

> H_PAGE_F_SECOND which occupied bit 4 moves to the second part
> 	of the pte.
> H_PAGE_F_GIX which  occupied bit 5, 6 and 7 also moves to the
> 	second part of the pte.
> 
> the four  bits((H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
> is  initialized  to  0xF  indicating  an invalid  slot.  If  a hpte
> gets cached in a 0xF  slot(i.e  7th  slot  of  secondary),  it   is
> released immediately. In  other  words, even  though   0xF   is   a

Release immediately means we attempt again for a new hash slot ?

> valid slot we discard  and consider it as an invalid
> slot;i.e hpte_soft_invalid(). This  gives  us  an opportunity to not
> depend on a bit in the primary PTE in order to determine the
> validity of a slot.

So we have to see the slot number in the second half for each PTE to
figure out if it has got a valid slot in the hash page table.

> 
> When  we  release  a    hpte   in the 0xF   slot we also   release a
> legitimate primary   slot  and    unmap    that  entry. This  is  to
> ensure  that we do get a   legimate   non-0xF  slot the next time we
> retry for a slot.

Okay.

> 
> Though treating 0xF slot as invalid reduces the number of available
> slots  and  may  have an effect  on the performance, the probabilty
> of hitting a 0xF is extermely low.

Why you say that ? I thought every slot number has the same probability
of hit from the hash function.

> 
> Compared  to the current scheme, the above described scheme reduces
> the number of false hash table updates  significantly  and  has the

How it reduces false hash table updates ?

> added  advantage  of  releasing  four  valuable  PTE bits for other
> purpose.
> 
> This idea was jointly developed by Paul Mackerras, Aneesh, Michael
> Ellermen and myself.
> 
> 4K PTE format remain unchanged currently.
> 
> Signed-off-by: Ram Pai <linuxram at us.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/hash-4k.h  | 20 +++++++
>  arch/powerpc/include/asm/book3s/64/hash-64k.h | 32 +++++++----
>  arch/powerpc/include/asm/book3s/64/hash.h     | 15 +++--
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |  5 ++
>  arch/powerpc/mm/dump_linuxpagetables.c        |  3 +-
>  arch/powerpc/mm/hash64_4k.c                   | 14 ++---
>  arch/powerpc/mm/hash64_64k.c                  | 81 ++++++++++++---------------
>  arch/powerpc/mm/hash_utils_64.c               | 30 +++++++---
>  8 files changed, 122 insertions(+), 78 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> index b4b5e6b..5ef1d81 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> @@ -16,6 +16,18 @@
>  #define H_PUD_TABLE_SIZE	(sizeof(pud_t) << H_PUD_INDEX_SIZE)
>  #define H_PGD_TABLE_SIZE	(sizeof(pgd_t) << H_PGD_INDEX_SIZE)
> 
> +
> +/*
> + * Only supported by 4k linux page size
> + */
> +#define H_PAGE_F_SECOND        _RPAGE_RSV2     /* HPTE is in 2ndary HPTEG */
> +#define H_PAGE_F_GIX           (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> +#define H_PAGE_F_GIX_SHIFT     56
> +
> +#define H_PAGE_BUSY	_RPAGE_RSV1     /* software: PTE & hash are busy */
> +#define H_PAGE_HASHPTE	_RPAGE_RPN43    /* PTE has associated HPTE */
> +
> +

So we moved the common 64K definitions here.


>  /* PTE flags to conserve for HPTE identification */
>  #define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | \
>  			 H_PAGE_F_SECOND | H_PAGE_F_GIX)
> @@ -48,6 +60,14 @@ static inline int hash__hugepd_ok(hugepd_t hpd)
>  }
>  #endif
> 
> +static inline unsigned long set_hidx_slot(pte_t *ptep, real_pte_t rpte,
> +			unsigned int subpg_index, unsigned long slot)
> +{
> +	return (slot << H_PAGE_F_GIX_SHIFT) &
> +		(H_PAGE_F_SECOND | H_PAGE_F_GIX);
> +}

Why we are passing the first 3 arguments of the function if we never
use it inside. Is the caller expected to take care of it ?

> +
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> 
>  static inline char *get_hpte_slot_array(pmd_t *pmdp)
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> index 9732837..0eb3c89 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> @@ -10,23 +10,25 @@
>   * 64k aligned address free up few of the lower bits of RPN for us
>   * We steal that here. For more deatils look at pte_pfn/pfn_pte()
>   */
> -#define H_PAGE_COMBO	_RPAGE_RPN0 /* this is a combo 4k page */
> -#define H_PAGE_4K_PFN	_RPAGE_RPN1 /* PFN is for a single 4k page */
> +#define H_PAGE_COMBO   _RPAGE_RPN0 /* this is a combo 4k page */
> +#define H_PAGE_4K_PFN  _RPAGE_RPN1 /* PFN is for a single 4k page */

Its the same thing, changes nothing.

> +#define H_PAGE_F_SECOND	_RPAGE_RSV2	/* HPTE is in 2ndary HPTEG */
> +#define H_PAGE_F_GIX	(_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> +#define H_PAGE_F_GIX_SHIFT	56
> +
> +
> +#define H_PAGE_BUSY	_RPAGE_RPN42     /* software: PTE & hash are busy */
> +#define H_PAGE_HASHPTE	_RPAGE_RPN43    /* PTE has associated HPTE */

H_PAGE_BUSY seems to be differently defined here.

> +
>  /*
>   * We need to differentiate between explicit huge page and THP huge
>   * page, since THP huge page also need to track real subpage details
>   */
>  #define H_PAGE_THP_HUGE  H_PAGE_4K_PFN
> 
> -/*
> - * Used to track subpage group valid if H_PAGE_COMBO is set
> - * This overloads H_PAGE_F_GIX and H_PAGE_F_SECOND
> - */
> -#define H_PAGE_COMBO_VALID	(H_PAGE_F_GIX | H_PAGE_F_SECOND)

H_PAGE_COMBO_VALID is not defined alternately ?

> -
>  /* PTE flags to conserve for HPTE identification */
> -#define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_F_SECOND | \
> -			 H_PAGE_F_GIX | H_PAGE_HASHPTE | H_PAGE_COMBO)
> +#define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | H_PAGE_COMBO)
> +

Slot information has moved to the second half, hence _PAGE_HPTEFLAGS
need not carry that.

>  /*
>   * we support 16 fragments per PTE page of 64K size.
>   */
> @@ -74,6 +76,16 @@ static inline unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long index)
>  	return (pte_val(rpte.pte) >> H_PAGE_F_GIX_SHIFT) & 0xf;
>  }
> 
> +static inline unsigned long set_hidx_slot(pte_t *ptep, real_pte_t rpte,
> +		unsigned int subpg_index, unsigned long slot)
> +{
> +	unsigned long *hidxp = (unsigned long *)(ptep + PTRS_PER_PTE);
> +
> +	rpte.hidx &= ~(0xfUL << (subpg_index << 2));
> +	*hidxp = rpte.hidx  | (slot << (subpg_index << 2));
> +	return 0x0UL;
> +}

New method to insert the slot information in the second half.

> +
>  #define __rpte_to_pte(r)	((r).pte)
>  extern bool __rpte_sub_valid(real_pte_t rpte, unsigned long index);
>  /*
> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
> index 4e957b0..e7cf03a 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> @@ -8,11 +8,8 @@
>   *
>   */
>  #define H_PTE_NONE_MASK		_PAGE_HPTEFLAGS
> -#define H_PAGE_F_GIX_SHIFT	56
> -#define H_PAGE_BUSY		_RPAGE_RSV1 /* software: PTE & hash are busy */
> -#define H_PAGE_F_SECOND		_RPAGE_RSV2	/* HPTE is in 2ndary HPTEG */
> -#define H_PAGE_F_GIX		(_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> -#define H_PAGE_HASHPTE		_RPAGE_RPN43	/* PTE has associated HPTE */

Removing the common definitions.

> +
> +#define INIT_HIDX (~0x0UL)
> 
>  #ifdef CONFIG_PPC_64K_PAGES
>  #include <asm/book3s/64/hash-64k.h>
> @@ -160,6 +157,14 @@ static inline int hash__pte_none(pte_t pte)
>  	return (pte_val(pte) & ~H_PTE_NONE_MASK) == 0;
>  }
> 
> +static inline bool hpte_soft_invalid(unsigned long slot)
> +{
> +	return ((slot & 0xfUL) == 0xfUL);
> +}
> +
> +unsigned long get_hidx_gslot(unsigned long vpn, unsigned long shift,
> +		int ssize, real_pte_t rpte, unsigned int subpg_index);
> +
>  /* This low level function performs the actual PTE insertion
>   * Setting the PTE depends on the MMU type and other factors. It's
>   * an horrible mess that I'm not going to try to clean up now but
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> index 6981a52..cfb8169 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> @@ -435,6 +435,11 @@ extern int __hash_page_4K(unsigned long ea, unsigned long access,
>  extern int __hash_page_64K(unsigned long ea, unsigned long access,
>  			   unsigned long vsid, pte_t *ptep, unsigned long trap,
>  			   unsigned long flags, int ssize);
> +extern unsigned long set_hidx_slot(pte_t *ptep, real_pte_t rpte,
> +			unsigned int subpg_index, unsigned long slot);
> +extern unsigned long get_hidx_slot(unsigned long vpn, unsigned long shift,
> +			int ssize, real_pte_t rpte, unsigned int subpg_index);

I wonder what purpose set_hidx_slot() defined previously, served.

> +
>  struct mm_struct;
>  unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap);
>  extern int hash_page_mm(struct mm_struct *mm, unsigned long ea,
> diff --git a/arch/powerpc/mm/dump_linuxpagetables.c b/arch/powerpc/mm/dump_linuxpagetables.c
> index 44fe483..b832ed3 100644
> --- a/arch/powerpc/mm/dump_linuxpagetables.c
> +++ b/arch/powerpc/mm/dump_linuxpagetables.c
> @@ -213,7 +213,7 @@ struct flag_info {
>  		.val	= H_PAGE_4K_PFN,
>  		.set	= "4K_pfn",
>  	}, {
> -#endif
> +#else
>  		.mask	= H_PAGE_F_GIX,
>  		.val	= H_PAGE_F_GIX,
>  		.set	= "f_gix",
> @@ -224,6 +224,7 @@ struct flag_info {
>  		.val	= H_PAGE_F_SECOND,
>  		.set	= "f_second",
>  	}, {
> +#endif /* CONFIG_PPC_64K_PAGES */

Are we adding H_PAGE_F_GIX as an element for 4K mapping ?

>  #endif
>  		.mask	= _PAGE_SPECIAL,
>  		.val	= _PAGE_SPECIAL,
> diff --git a/arch/powerpc/mm/hash64_4k.c b/arch/powerpc/mm/hash64_4k.c
> index 6fa450c..c673829 100644
> --- a/arch/powerpc/mm/hash64_4k.c
> +++ b/arch/powerpc/mm/hash64_4k.c
> @@ -20,6 +20,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
>  		   pte_t *ptep, unsigned long trap, unsigned long flags,
>  		   int ssize, int subpg_prot)
>  {
> +	real_pte_t rpte;
>  	unsigned long hpte_group;
>  	unsigned long rflags, pa;
>  	unsigned long old_pte, new_pte;
> @@ -54,6 +55,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
>  	 * need to add in 0x1 if it's a read-only user page
>  	 */
>  	rflags = htab_convert_pte_flags(new_pte);
> +	rpte = __real_pte(__pte(old_pte), ptep);
> 
>  	if (cpu_has_feature(CPU_FTR_NOEXECUTE) &&
>  	    !cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
> @@ -64,13 +66,10 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
>  		/*
>  		 * There MIGHT be an HPTE for this pte
>  		 */
> -		hash = hpt_hash(vpn, shift, ssize);
> -		if (old_pte & H_PAGE_F_SECOND)
> -			hash = ~hash;
> -		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
> -		slot += (old_pte & H_PAGE_F_GIX) >> H_PAGE_F_GIX_SHIFT;
> +		unsigned long gslot = get_hidx_gslot(vpn, shift,
> +						ssize, rpte, 0);

I am wondering why there is a 'g' before the slot in all these
functions.

Its already too much of changes in a single patch. Being a single
logical change it needs to be inside a single change but then we
need much more description in the commit message for some one to
understand what all changed and how.



More information about the Linuxppc-dev mailing list