[PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP
Yin, Fengwei
fengwei.yin at intel.com
Wed Jan 31 21:43:38 AEDT 2024
On 1/31/2024 6:30 PM, David Hildenbrand wrote:
> On 31.01.24 03:30, Yin Fengwei wrote:
>>
>>
>> On 1/29/24 22:32, David Hildenbrand wrote:
>>> +static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm,
>>> +		unsigned long addr, pte_t *ptep, unsigned int nr, int full)
>>> +{
>>> +	pte_t pte, tmp_pte;
>>> +
>>> +	pte = ptep_get_and_clear_full(mm, addr, ptep, full);
>>> +	while (--nr) {
>>> +		ptep++;
>>> +		addr += PAGE_SIZE;
>>> +		tmp_pte = ptep_get_and_clear_full(mm, addr, ptep, full);
>>> +		if (pte_dirty(tmp_pte))
>>> +			pte = pte_mkdirty(pte);
>>> +		if (pte_young(tmp_pte))
>>> +			pte = pte_mkyoung(pte);
>> I am wondering whether it's worth moving the pte_mkdirty() and pte_mkyoung()
>> calls out of the loop and doing them just once if needed. In the worst case
>> they are called nr - 1 times. Or is that just too micro?
>
> I also thought about just indicating "any_accessed" or "any_dirty" using
> flags to the caller, to avoid the PTE modifications completely. Felt a
> bit micro-optimized.
>
> Regarding your proposal: I thought about that as well, but my assumption
> was that dirty+young are "cheap" to be set.
>
> On x86, pte_mkyoung() is setting _PAGE_ACCESSED.
> pte_mkdirty() is setting _PAGE_DIRTY | _PAGE_SOFT_DIRTY, but it also has
> to handle the saveddirty handling, using some bit trickery.
>
> So at least for pte_mkyoung() there would be no real benefit as far as I
> can see (might be even worse). For pte_mkdirty() there might be a small
> benefit.
>
> Is it going to be measurable? Likely not.
Yeah. We can investigate further if performance profiling calls this out.
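
For reference, the two variants discussed above could look roughly like the
userspace sketch below. This is only an illustration, not the patch itself:
pte_t, the SKETCH_PAGE_* bit values and sketch_get_and_clear() are hypothetical
stand-ins for the kernel definitions and for ptep_get_and_clear_full(), and the
mm/addr/full arguments are dropped. The x86 saveddirty handling that makes the
real pte_mkdirty() more expensive is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel's pte_t and helpers. */
typedef struct { uint64_t val; } pte_t;

#define SKETCH_PAGE_ACCESSED	(1ULL << 5)	/* assumed bit, illustration only */
#define SKETCH_PAGE_DIRTY	(1ULL << 6)	/* assumed bit, illustration only */

static inline int pte_dirty(pte_t pte) { return !!(pte.val & SKETCH_PAGE_DIRTY); }
static inline int pte_young(pte_t pte) { return !!(pte.val & SKETCH_PAGE_ACCESSED); }
static inline pte_t pte_mkdirty(pte_t pte) { pte.val |= SKETCH_PAGE_DIRTY; return pte; }
static inline pte_t pte_mkyoung(pte_t pte) { pte.val |= SKETCH_PAGE_ACCESSED; return pte; }

/* Stand-in for ptep_get_and_clear_full(): return the old entry, zero the slot. */
static inline pte_t sketch_get_and_clear(pte_t *ptep)
{
	pte_t old = *ptep;

	ptep->val = 0;
	return old;
}

/* Shape of the loop in the patch: modify the returned PTE inside the loop. */
static inline pte_t clear_ptes_in_loop(pte_t *ptep, unsigned int nr)
{
	pte_t pte, tmp_pte;

	assert(nr >= 1);
	pte = sketch_get_and_clear(ptep);
	while (--nr) {
		ptep++;
		tmp_pte = sketch_get_and_clear(ptep);
		if (pte_dirty(tmp_pte))
			pte = pte_mkdirty(pte);
		if (pte_young(tmp_pte))
			pte = pte_mkyoung(pte);
	}
	return pte;
}

/* Proposed shape: only record the bits in the loop, apply them once after. */
static inline pte_t clear_ptes_hoisted(pte_t *ptep, unsigned int nr)
{
	pte_t pte, tmp_pte;
	int any_dirty = 0, any_young = 0;

	assert(nr >= 1);
	pte = sketch_get_and_clear(ptep);
	while (--nr) {
		ptep++;
		tmp_pte = sketch_get_and_clear(ptep);
		any_dirty |= pte_dirty(tmp_pte);
		any_young |= pte_young(tmp_pte);
	}
	if (any_dirty)
		pte = pte_mkdirty(pte);
	if (any_young)
		pte = pte_mkyoung(pte);
	return pte;
}
```

Both variants return the same PTE value; the second just bounds the number of
pte_mkdirty()/pte_mkyoung() calls to one each, at the cost of two extra local
variables. Whether that matters in practice would need to show up in a profile.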
Regards
Yin, Fengwei
>
> Am I missing something?
>
> Thanks!
>