[v3 05/24] mm: thp: handle split failure in zap_pmd_range()

David Hildenbrand (Arm) david at kernel.org
Tue Mar 31 02:09:35 AEDT 2026


On 3/30/26 16:13, Kiryl Shutsemau wrote:
> On Thu, Mar 26, 2026 at 07:08:47PM -0700, Usama Arif wrote:
>> zap_pmd_range() splits a huge PMD when the zap range doesn't cover the
>> full PMD (partial unmap).  If the split fails, the PMD stays huge.
>> Falling through to zap_pte_range() would dereference the huge PMD entry
>> as a PTE page table pointer.
>>
>> Skip the range covered by the PMD on split failure instead.
> 
> Ughh... This is hacky as hell.
> 
>> The skip is safe across all call paths into zap_pmd_range():
>>
>> - exit_mmap() and OOM reaper: the zap range covers entire VMAs, so
>>   every PMD is fully covered (next - addr == HPAGE_PMD_SIZE).  The
>>   zap_huge_pmd() branch handles these without splitting.  The split
>>   failure path is unreachable.
>>
>> - munmap / mmap overlay: vma_adjust_trans_huge() (called from
>>   __split_vma()) splits any PMD straddling the VMA boundary before the
>>   VMA is split.  If that PMD split fails, __split_vma() returns
>>   -ENOMEM and the munmap is aborted before reaching zap_pmd_range().
>>   The split failure path is unreachable.
>>
>> - MADV_DONTNEED: advisory hint; the kernel is allowed to ignore it.
>>   The pages remain valid and accessible.  A subsequent access returns
>>   existing data without faulting.
> 
> Em, no. MADV_DONTNEED users expect memory to be zeroed after the
> "advise" is complete. At the very least you need to zero the skipped range.

Fully agreed. This definitely needs more thought :)

-- 
Cheers,

David


More information about the Linuxppc-dev mailing list