[v3 00/24] mm: thp: lazy PTE page table allocation at PMD split time

Usama Arif usama.arif at linux.dev
Thu Apr 9 22:48:14 AEST 2026



On 08/04/2026 20:49, Matthew Wilcox wrote:
> On Wed, Apr 08, 2026 at 04:06:29PM +0100, Usama Arif wrote:
>> On 06/04/2026 00:34, Hugh Dickins wrote:
>>> What would help a lot would be the implementation of swap entries
>>> at the PMD level.  Whether that would help enough, I'm sceptical:
>>> I do think it's foolish to depend upon the availability of huge
>>> contiguous swap extents, whatever the recent improvements there;
>>> but it would at least be an arguable justification.
>>>
>> Thanks for pointing this out. I should have thought of this as I
>> have been thinking about fork a lot for 1G THP and for this series.
>>
>> I am working on trying to make PMD level swap entries work. I hope
>> to have an RFC soon.
> 
> I think you may have missed Hugh's point a little bit.  If we do
> support PMD-level swap entries, that means we have to be able to find
> contiguous space in the swap space for 512 entries.  I don't know how
> hard that will be, but I can imagine it's not that easy.

Ah so my understanding is that with CONFIG_THP_SWAP enabled, the swap
allocator already tries to allocate 512 contiguous swap slots for a THP.
With CONFIG_THP_SWAP, each swap cluster is exactly SWAPFILE_CLUSTER (512)
entries in size, meaning 2M will fit perfectly. Clusters track their
allocation order (ci->order), and the swap allocator maintains per-order
free lists (nonfull_clusters[order]), so THP-order allocations are
directed to clusters already dedicated to that order rather than
competing with base-page allocations.
The per-CPU caching (percpu_swap_cluster.si[order] / offset[order])
should further ensure that consecutive THP swap-outs from the same CPU
reuse the same cluster efficiently.
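
As a rough illustration of the order-segregation idea described above,
here is a toy userspace model. It is only a sketch: the toy_* names are
hypothetical, it is not the kernel's swap allocator, and it ignores
freeing, fragmentation, locking and the per-CPU cache entirely.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CLUSTER_SLOTS 512	/* mirrors SWAPFILE_CLUSTER with THP_SWAP */
#define TOY_NR_CLUSTERS   4	/* arbitrary size for the toy swap device */
#define TOY_ORDER_NONE    (-1)	/* cluster not yet dedicated to any order */

/* Hypothetical, simplified stand-in for a swap cluster. */
struct toy_cluster {
	int order;		/* allocation order this cluster serves */
	unsigned int used;	/* slots handed out so far */
};

static struct toy_cluster toy_clusters[TOY_NR_CLUSTERS];

void toy_init(void)
{
	for (size_t i = 0; i < TOY_NR_CLUSTERS; i++) {
		toy_clusters[i].order = TOY_ORDER_NONE;
		toy_clusters[i].used = 0;
	}
}

/*
 * Return the first slot offset of a run of (1 << order) contiguous slots,
 * or -1 if no cluster can serve the request. Because a cluster only ever
 * serves one order, every run it hands out is naturally aligned.
 */
long toy_alloc(int order)
{
	unsigned int nr = 1u << order;

	assert(order >= 0 && nr <= TOY_CLUSTER_SLOTS);

	/* First pass: reuse a non-full cluster already dedicated to this order. */
	for (size_t i = 0; i < TOY_NR_CLUSTERS; i++) {
		struct toy_cluster *ci = &toy_clusters[i];

		if (ci->order == order && ci->used + nr <= TOY_CLUSTER_SLOTS) {
			long off = (long)(i * TOY_CLUSTER_SLOTS + ci->used);

			ci->used += nr;
			return off;
		}
	}

	/* Second pass: claim a completely free cluster for this order. */
	for (size_t i = 0; i < TOY_NR_CLUSTERS; i++) {
		struct toy_cluster *ci = &toy_clusters[i];

		if (ci->order == TOY_ORDER_NONE) {
			ci->order = order;
			ci->used = nr;
			return (long)(i * TOY_CLUSTER_SLOTS);
		}
	}

	return -1;
}
```

In this model an order-9 request always consumes one whole cluster, while
order-0 requests keep filling their own cluster and never break up a
cluster that an order-9 request could still use.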

With a PMD swap entry we would only change how the page table records
the swapped-out THP (1 PMD entry instead of 512 PTE entries). Hence we
won't need to allocate a PTE page table, which would help to address
Hugh's valid concern about having to allocate page tables when there is
no page table deposit.
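
To spell out the 1-vs-512 arithmetic, a small sketch. The shift and
entry-size values used in the example are assumptions matching x86-64
with 4K base pages (PAGE_SHIFT 12, PMD_SHIFT 21, 8-byte entries); the
toy_* helpers are hypothetical and other configurations will differ.

```c
#include <stddef.h>

/* Number of PTE entries needed to cover what one PMD entry maps. */
size_t toy_entries_per_pmd(unsigned int page_shift, unsigned int pmd_shift)
{
	return (size_t)1 << (pmd_shift - page_shift);
}

/* Size in bytes of the PTE page table that a PMD-level entry avoids. */
size_t toy_pte_table_bytes(unsigned int page_shift, unsigned int pmd_shift,
			   size_t entry_size)
{
	return toy_entries_per_pmd(page_shift, pmd_shift) * entry_size;
}
```

With those assumed values, toy_entries_per_pmd(12, 21) gives 512 and
toy_pte_table_bytes(12, 21, 8) gives 4096, i.e. one 4K page table page
per swapped-out 2M THP that would not need to be allocated.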



