[PATCH v2] powerpc/mm: Avoid calling arch_enter/leave_lazy_mmu() in set_ptes

Michael Ellerman mpe at ellerman.id.au
Thu Nov 2 22:39:44 AEDT 2023

Matthew Wilcox <willy at infradead.org> writes:
> On Tue, Oct 24, 2023 at 08:06:04PM +0530, Aneesh Kumar K.V wrote:
>>  		ptep++;
>> -		pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT));
>>  		addr += PAGE_SIZE;
>> +		/*
>> +		 * increment the pfn.
>> +		 */
>> +		pte = pfn_pte(pte_pfn(pte) + 1, pte_pgprot((pte)));
> when i looked at this, it generated shit code.  did you check?

I didn't look ...

<goes and looks>

It's not super clear cut. There's some difference because pfn_pte()
contains two extra VM_BUG_ONs.

But with DEBUG_VM *off* the version using pfn_pte() generates *better*
code, or at least less code, ~160 instructions vs ~200.

For some reason the version using PTE_RPN_SHIFT seems to be byte
swapping the pte an extra two times, each of which generates ~8
instructions. But I can't see why.
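
For reference, the two ways of stepping to the next pte can be modelled in
userspace as below. This is only a sketch, not the kernel code: it assumes
64K pages, a big-endian pte on a little-endian CPU, and a made-up
PTE_RPN_MASK. The helper names model_bswap64(), make_pte(),
next_pte_shift() and next_pte_pfn() are hypothetical, and the two assert()s
stand in for the VM_BUG_ONs, with conditions that are my assumption. It
shows that both forms produce the same pte value; it says nothing about
which one the compiler handles better.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the real ones depend on the kernel config. */
#define PAGE_SHIFT	16	/* assumption: 64K pages */
#define PTE_RPN_SHIFT	PAGE_SHIFT
#define PTE_RPN_MASK	(((UINT64_C(1) << 53) - 1) & \
			 ~((UINT64_C(1) << PAGE_SHIFT) - 1))

/* The pte is modelled as stored big-endian, so every access swaps. */
typedef struct { uint64_t raw; } pte_t;

static inline uint64_t model_bswap64(uint64_t v)
{
	v = ((v & 0x00ff00ff00ff00ffULL) << 8) |
	    ((v >> 8) & 0x00ff00ff00ff00ffULL);
	v = ((v & 0x0000ffff0000ffffULL) << 16) |
	    ((v >> 16) & 0x0000ffff0000ffffULL);
	return (v << 32) | (v >> 32);
}

/* Stand-ins for pte_val() and __pte(): one byte swap each. */
static inline uint64_t pte_val(pte_t pte)
{
	return model_bswap64(pte.raw);
}

static inline pte_t make_pte(uint64_t val)
{
	pte_t pte = { model_bswap64(val) };
	return pte;
}

static inline uint64_t pte_pfn(pte_t pte)
{
	return (pte_val(pte) & PTE_RPN_MASK) >> PAGE_SHIFT;
}

/* Everything outside the RPN field is treated as protection bits. */
static inline uint64_t pte_pgprot(pte_t pte)
{
	return pte_val(pte) & ~PTE_RPN_MASK;
}

static inline pte_t pfn_pte(uint64_t pfn, uint64_t pgprot)
{
	/* Stand-ins for the two VM_BUG_ONs (conditions assumed). */
	assert(!(pfn >> (64 - PAGE_SHIFT)));
	assert(!((pfn << PAGE_SHIFT) & ~PTE_RPN_MASK));
	return make_pte((pfn << PAGE_SHIFT) | pgprot);
}

/* Old form: add one page's worth directly to the pte value. */
static inline pte_t next_pte_shift(pte_t pte)
{
	return make_pte(pte_val(pte) + (UINT64_C(1) << PTE_RPN_SHIFT));
}

/* New form: pull out the pfn and pgprot, bump the pfn, rebuild. */
static inline pte_t next_pte_pfn(pte_t pte)
{
	return pfn_pte(pte_pfn(pte) + 1, pte_pgprot(pte));
}
```

In this model the new form calls pte_val() twice where the old form calls
it once, so any saving in the real build has to come from how the compiler
combines the swaps, which the model cannot show.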

I tried a few other things and couldn't come up with anything that
generated better code. But I'll keep poking at it tomorrow.


