[RFC PATCH v4 13/16] powerpc/e500: Use contiguous PMD instead of hugepd

Christophe Leroy christophe.leroy at csgroup.eu
Wed May 29 19:58:35 AEST 2024



On 29/05/2024 at 10:49, Oscar Salvador wrote:
> 
> On Mon, May 27, 2024 at 03:30:11PM +0200, Christophe Leroy wrote:
>> e500 supports many page sizes, among which the following sizes are
>> currently implemented in the kernel: 4M, 16M, 64M, 256M, 1G.
>>
>> On e500, TLB misses for hugepages are handled exclusively by SW, even
>> on e6500, which has HW assistance for 4k pages, so there are no
>> constraints like on the 8xx.
>>
>> On e500/32, all are at PGD/PMD level and can be handled as
>> cont-PMD.
>>
>> On e500/64, smaller ones are on PMD while bigger ones are on PUD.
>> Again, they can easily be handled as cont-PMD and cont-PUD instead
>> of hugepd.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy at csgroup.eu>
> 
> ...
> 
>> diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
>> index 90d6a0943b35..f7421d1a1693 100644
>> --- a/arch/powerpc/include/asm/nohash/pgtable.h
>> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
>> @@ -52,11 +52,36 @@ static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, p
>>   {
>>        pte_basic_t old = pte_val(*p);
>>        pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
>> +     unsigned long sz;
>> +     unsigned long pdsize;
>> +     int i;
>>
>>        if (new == old)
>>                return old;
>>
>> -     *p = __pte(new);
>> +#ifdef CONFIG_PPC_E500
>> +     if (huge)
>> +             sz = 1UL << (((old & _PAGE_HSIZE_MSK) >> _PAGE_HSIZE_SHIFT) + 20);
>> +     else
> 
> I think this will not compile when CONFIG_PPC_85xx && !CONFIG_PTE_64BIT.

Yes, I got feedback on this from the robots.

> 
> You have declared _PAGE_HSIZE_MSK and _PAGE_HSIZE_SHIFT in
> arch/powerpc/include/asm/nohash/hugetlb-e500.h.
> 
> But hugetlb-e500.h is only included if CONFIG_PPC_85xx && CONFIG_PTE_64BIT
> (see arch/powerpc/include/asm/nohash/32/pgtable.h).
> 
> 
> 
>> +#endif
>> +             sz = PAGE_SIZE;
>> +
>> +     if (!huge || sz < PMD_SIZE)
>> +             pdsize = PAGE_SIZE;
>> +     else if (sz < PUD_SIZE)
>> +             pdsize = PMD_SIZE;
>> +     else if (sz < P4D_SIZE)
>> +             pdsize = PUD_SIZE;
>> +     else if (sz < PGDIR_SIZE)
>> +             pdsize = P4D_SIZE;
>> +     else
>> +             pdsize = PGDIR_SIZE;
>> +
>> +     for (i = 0; i < sz / pdsize; i++, p++) {
>> +             *p = __pte(new);
>> +             if (new)
>> +                     new += (unsigned long long)(pdsize / PAGE_SIZE) << PTE_RPN_SHIFT;
> 
> I guess 'new' can be 0 if pte_update() is called on behalf of clearing the pte?

That's exactly it, and without that check I had pmd_bad() reporting bad
pmds after the page tables were freed.
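A minimal userspace sketch of the replication loop in pte_update(), assuming a 64-bit PTE type, 4K pages and an RPN shift of 24; the `sketch_*` names and constants are hypothetical stand-ins, not kernel definitions. It illustrates why the `if (new)` guard matters: a cleared entry has to stay zero in every slot rather than being advanced like a live mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PAGE_SIZE and PTE_RPN_SHIFT. */
#define SKETCH_PAGE_SIZE     4096ULL
#define SKETCH_PTE_RPN_SHIFT 24

typedef uint64_t sketch_pte_t;

/*
 * Write 'nr' consecutive entries for one huge mapping. A non-zero value
 * has its page number advanced by 'pdsize' for each following slot; a
 * zero value (clearing) is written unchanged to every slot.
 */
static void sketch_fill_entries(sketch_pte_t *p, size_t nr,
				uint64_t pdsize, sketch_pte_t val)
{
	size_t i;

	assert(pdsize >= SKETCH_PAGE_SIZE);
	for (i = 0; i < nr; i++, p++) {
		*p = val;
		if (val)
			val += (pdsize / SKETCH_PAGE_SIZE) << SKETCH_PTE_RPN_SHIFT;
	}
}
```

Without the guard, the second slot of a cleared 2M-per-entry range would receive `512 << 24`, a non-zero value that points to no page table, which is consistent with the bad pmds described above.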

> 
>> +static inline unsigned long pmd_leaf_size(pmd_t pmd)
>> +{
>> +     return 1UL << (((pmd_val(pmd) & _PAGE_HSIZE_MSK) >> _PAGE_HSIZE_SHIFT) + 20);
> 
> Can we have the '20' defined somewhere, with a comment on top explaining
> what it is, so it is not a magic number?
> Otherwise people might come across this and wonder why 20.

Yes, I now have:

+#define _PAGE_HSIZE_MSK (_PAGE_U0 | _PAGE_U1 | _PAGE_U2 | _PAGE_U3)
+#define _PAGE_HSIZE_SHIFT              14
+#define _PAGE_HSIZE_SHIFT_OFFSET       20

and I have added a helper to avoid doing the calculation in several places:

+static inline unsigned long pte_huge_size(pte_t pte)
+{
+       pte_basic_t val = pte_val(pte);
+
+       return 1UL << (((val & _PAGE_HSIZE_MSK) >> _PAGE_HSIZE_SHIFT) +
+                      _PAGE_HSIZE_SHIFT_OFFSET);
+}
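As a worked example of the formula, the size field holds log2(size) - 20, so the supported sizes encode as 4M → 2, 16M → 4, 64M → 6, 256M → 8 and 1G → 10. The sketch below assumes the field is four contiguous bits starting at bit 14 (consistent with `_PAGE_HSIZE_SHIFT` = 14 and a four-bit mask); the mask value and the `sketch_*` helpers, including the inverse encoder, are hypothetical and not taken from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 4-bit size field at bits 14..17 (hypothetical mask). */
#define SKETCH_HSIZE_SHIFT        14
#define SKETCH_HSIZE_MSK          (0xFULL << SKETCH_HSIZE_SHIFT)
/* The field stores log2(size) - 20, i.e. a power-of-two size in MiB. */
#define SKETCH_HSIZE_SHIFT_OFFSET 20

/* Decode the huge page size from a PTE value, mirroring pte_huge_size(). */
static uint64_t sketch_huge_size(uint64_t val)
{
	return 1ULL << (((val & SKETCH_HSIZE_MSK) >> SKETCH_HSIZE_SHIFT) +
			SKETCH_HSIZE_SHIFT_OFFSET);
}

/* Hypothetical inverse: encode a power-of-two size into the field. */
static uint64_t sketch_encode_size(uint64_t size)
{
	unsigned int order = 0;

	assert((size & (size - 1)) == 0);	/* power of two only */
	while ((1ULL << order) < size)
		order++;
	/* Only 1M..32G fit in a 4-bit field with a +20 offset. */
	assert(order >= SKETCH_HSIZE_SHIFT_OFFSET);
	assert(order - SKETCH_HSIZE_SHIFT_OFFSET <= 0xF);
	return (uint64_t)(order - SKETCH_HSIZE_SHIFT_OFFSET) << SKETCH_HSIZE_SHIFT;
}
```

Bits outside the mask are ignored by the decoder, so the other PTE flags do not affect the computed size.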


> 
>> --- a/arch/powerpc/mm/pgtable.c
>> +++ b/arch/powerpc/mm/pgtable.c
>> @@ -331,6 +331,37 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
>>                __set_huge_pte_at(pmdp, ptep, pte_val(pte));
>>        }
>>   }
>> +#elif defined(CONFIG_PPC_E500)
>> +void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
>> +                  pte_t pte, unsigned long sz)
>> +{
>> +     unsigned long pdsize;
>> +     int i;
>> +
>> +     pte = set_pte_filter(pte, addr);
>> +
>> +     /*
>> +      * Make sure hardware valid bit is not set. We don't do
>> +      * tlb flush for this update.
>> +      */
>> +     VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>> +
>> +     if (sz < PMD_SIZE)
>> +             pdsize = PAGE_SIZE;
>> +     else if (sz < PUD_SIZE)
>> +             pdsize = PMD_SIZE;
>> +     else if (sz < P4D_SIZE)
>> +             pdsize = PUD_SIZE;
>> +     else if (sz < PGDIR_SIZE)
>> +             pdsize = P4D_SIZE;
>> +     else
>> +             pdsize = PGDIR_SIZE;
>> +
>> +     for (i = 0; i < sz / pdsize; i++, ptep++, addr += pdsize) {
>> +             __set_pte_at(mm, addr, ptep, pte, 0);
>> +             pte = __pte(pte_val(pte) + ((unsigned long long)pdsize / PAGE_SIZE << PFN_PTE_SHIFT));
> 
> Could you use pte_advance_pfn() here? Just have:
> 
>   nr = (unsigned long long)pdsize / PAGE_SIZE;
>   pte = pte_advance_pfn(pte, nr);

That's what I did before, but it didn't work. The problem is that
pte_advance_pfn() takes an unsigned long, not an unsigned long long:

static inline pte_t pte_advance_pfn(pte_t pte, unsigned long nr)
{
	return __pte(pte_val(pte) + (nr << PFN_PTE_SHIFT));
}

And when I called it with nr = PMD_SIZE / PAGE_SIZE = 2M / 4k = 512, as
we have PFN_PTE_SHIFT = 24, I got 512 << 24 = 0, because the result,
2^33, does not fit in a 32-bit unsigned long.
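A small sketch of the truncation, modelling the 32-bit `unsigned long` of e500/32 with `uint32_t` (an assumption made so it reproduces on any host): 512 is 2^9, shifting it by 24 gives 2^33, and every set bit lands above bit 31, so the 32-bit shift yields 0 before it is ever added to the 64-bit PTE value. Widening `nr` first keeps the bits. The `sketch_*` functions are hypothetical and only shaped like pte_advance_pfn().

```c
#include <assert.h>
#include <stdint.h>

/* Model a 32-bit 'unsigned long', as on e500/32. */
typedef uint32_t ulong32;

#define SKETCH_PFN_PTE_SHIFT 24

/* Shaped like pte_advance_pfn(): the shift is done in 32-bit arithmetic. */
static uint64_t sketch_advance_32(uint64_t pteval, ulong32 nr)
{
	return pteval + (ulong32)(nr << SKETCH_PFN_PTE_SHIFT);
}

/* Widening 'nr' to 64 bits before the shift preserves the high bits. */
static uint64_t sketch_advance_64(uint64_t pteval, uint64_t nr)
{
	return pteval + (nr << SKETCH_PFN_PTE_SHIFT);
}
```

With nr = 512, the 32-bit variant returns the PTE value unchanged, while the 64-bit variant advances it by 0x200000000, which is what the open-coded `(unsigned long long)` cast in set_huge_pte_at() achieves.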

> 
> Which 'sz's can we have here? You mentioned that e500 supports:
> 
> 4M, 16M, 64M, 256M, 1G.
> 
> Which of these can be huge?

All are huge.
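To make the answer concrete, the sketch below runs the `pdsize` cascade from set_huge_pte_at() over those sizes and counts the contiguous entries written for each. It assumes 4K pages and, as on e500/32 with 64-bit PTEs (matching the 2M PMD_SIZE quoted above), a single 2M level with PUD, P4D and PGD folded onto it; these constants and the `sketch_*` names are assumptions for illustration, and e500/64 geometry would differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 4K pages, 2M per entry, upper levels folded. */
#define SK_PAGE_SIZE   (4ULL << 10)
#define SK_PMD_SIZE    (2ULL << 20)
#define SK_PUD_SIZE    SK_PMD_SIZE
#define SK_P4D_SIZE    SK_PMD_SIZE
#define SK_PGDIR_SIZE  SK_PMD_SIZE

/* Same cascade as set_huge_pte_at(): size covered by one entry for 'sz'. */
static uint64_t sketch_pdsize(uint64_t sz)
{
	if (sz < SK_PMD_SIZE)
		return SK_PAGE_SIZE;
	else if (sz < SK_PUD_SIZE)
		return SK_PMD_SIZE;
	else if (sz < SK_P4D_SIZE)
		return SK_PUD_SIZE;
	else if (sz < SK_PGDIR_SIZE)
		return SK_P4D_SIZE;
	return SK_PGDIR_SIZE;
}

/* Number of contiguous entries written for one huge page of size 'sz'. */
static uint64_t sketch_nr_entries(uint64_t sz)
{
	uint64_t pdsize = sketch_pdsize(sz);

	assert(sz % pdsize == 0);
	return sz / pdsize;
}
```

Under these assumptions, 4M, 16M, 64M, 256M and 1G map to 2, 8, 32, 128 and 512 consecutive 2M entries respectively.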

> 
> 
> --
> Oscar Salvador
> SUSE Labs

