[RFC PATCH 0/8] Reimplement huge pages without hugepd on powerpc 8xx

Christophe Leroy christophe.leroy at csgroup.eu
Tue Apr 16 05:12:29 AEST 2024



Le 12/04/2024 à 16:30, Peter Xu a écrit :
> On Fri, Apr 12, 2024 at 02:08:03PM +0000, Christophe Leroy wrote:
>>
>>
>> Le 11/04/2024 à 18:15, Peter Xu a écrit :
>>> On Mon, Mar 25, 2024 at 01:38:40PM -0300, Jason Gunthorpe wrote:
>>>> On Mon, Mar 25, 2024 at 03:55:53PM +0100, Christophe Leroy wrote:
>>>>> This series reimplements hugepages with hugepd on powerpc 8xx.
>>>>>
>>>>> Unlike most architectures, powerpc 8xx HW requires a two-level
>>>>> pagetable topology for all page sizes. So a leaf PMD-contig approach
>>>>> is not feasible as such.
>>>>>
>>>>> Possible sizes are 4k, 16k, 512k and 8M.
>>>>>
>>>>> First level (PGD/PMD) covers 4M per entry. For 8M pages, two PMD entries
>>>>> must point to a single entry level-2 page table. Until now that was
>>>>> done using hugepd. This series changes it to use standard page tables
>>>>> where the entry is replicated 1024 times on each of the two pagetables
>>>>> refered by the two associated PMD entries for that 8M page.
>>>>>
>>>>> At the moment it has to look into each helper to know if the
>>>>> hugepage ptep is a PTE or a PMD in order to know it is a 8M page or
>>>>> a lower size. I hope this can me handled by core-mm in the future.
>>>>>
>>>>> There are probably several ways to implement stuff, so feedback is
>>>>> very welcome.
>>>>
>>>> I thought it looks pretty good!
>>>
>>> I second it.
>>>
>>> I saw the discussions in patch 1.  Christophe, I suppose you're exploring
>>> the big hammer over hugepd, and perhaps went already with the 32bit pmd
>>> solution for nohash/32bit challenge you mentioned?
>>>
>>> I'm trying to position my next step; it seems like at least I should not
>>> adding any more hugepd code, then should I go with ARCH_HAS_HUGEPD checks,
>>> or you're going to have an RFC soon then I can base on top?
>>
>> Depends on what you expect by "soon".
>>
>> I sure won't be able to send any RFC before end of April.
>>
>> Should be possible to have something during May.
> 
> That's good enough, thanks.  I'll see what is the best I can do.
> 
> Then do you think I can leave p4d/pgd leaves alone?  Please check the other
> email where I'm not sure whether pgd leaves ever existed for any of
> PowerPC.  That's so far what I plan to do, on teaching pgtable walkers
> recognize pud and lower for all leaves.  Then if Power can switch from
> hugepd to this it should just work.

Well, if I understand correctly, something with no PMD will include 
<asm-generic/pgtable-nopmd.h> and will therefore only have .... pmd 
entries (hence no pgd/p4d/pud entries). Looks odd but that's what it is. 
pgd_populate(), p4d_populate(), pud_populate() are all "do { } while 
(0)" and only pmd_populate exists. So only pmd_leaf() will exist in that 
case.

And therefore including <asm-generic/pgtable-nop4d.h> means .... you 
have p4d entries. Doesn't mean you have p4d_leaf() but that needs to be 
checked.


> 
> Even if pgd exists (then something I overlooked..), I'm wondering whether
> we can push that downwards to be either pud/pmd (and looks like we all
> agree p4d is never used on Power).  That may involve some pgtable
> operations moving from pgd level to lower, e.g. my pure imagination would
> look like starting with:

Yes I think there is no doubt that p4d is never used:

arch/powerpc/include/asm/book3s/32/pgtable.h:#include 
<asm-generic/pgtable-nopmd.h>
arch/powerpc/include/asm/book3s/64/pgtable.h:#include 
<asm-generic/pgtable-nop4d.h>
arch/powerpc/include/asm/nohash/32/pgtable.h:#include 
<asm-generic/pgtable-nopmd.h>
arch/powerpc/include/asm/nohash/64/pgtable-4k.h:#include 
<asm-generic/pgtable-nop4d.h>

But that means that PPC32 have pmd entries and PPC64 have p4d entries ...

> 
> #define PTE_INDEX_SIZE	PTE_SHIFT
> #define PMD_INDEX_SIZE	0
> #define PUD_INDEX_SIZE	0
> #define PGD_INDEX_SIZE	(32 - PGDIR_SHIFT)
> 
> To:
> 
> #define PTE_INDEX_SIZE	PTE_SHIFT
> #define PMD_INDEX_SIZE	(32 - PMD_SHIFT)
> #define PUD_INDEX_SIZE	0
> #define PGD_INDEX_SIZE	0

But then you can't anymore have #define PTRS_PER_PMD 1 from 
<asm-generic/pgtable-nop4d.h>

> 
> And the rest will need care too.  I hope moving downward is easier
> (e.g. the walker should always exist for lower levels but not always for
> higher levels), but I actually have little idea on whether there's any
> other implications, so please bare with me on stupid mistakes.
> 
> I just hope pgd leaves don't exist already, then I think it'll be simpler.
> 
> Thanks,
> 


More information about the Linuxppc-dev mailing list