[PATCH 3/3] powerpc/8xx: Provide ptep_get() with 16k pages

Christophe Leroy christophe.leroy at csgroup.eu
Fri Jun 19 00:19:33 AEST 2020



Le 18/06/2020 à 03:00, Michael Ellerman a écrit :
> Christophe Leroy <christophe.leroy at csgroup.eu> writes:
>> Le 17/06/2020 à 16:38, Peter Zijlstra a écrit :
>>> On Thu, Jun 18, 2020 at 12:21:22AM +1000, Michael Ellerman wrote:
>>>> Peter Zijlstra <peterz at infradead.org> writes:
>>>>> On Mon, Jun 15, 2020 at 12:57:59PM +0000, Christophe Leroy wrote:
>>>
>>>>>> +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>>>>>> +#define __HAVE_ARCH_PTEP_GET
>>>>>> +static inline pte_t ptep_get(pte_t *ptep)
>>>>>> +{
>>>>>> +	pte_t pte = {READ_ONCE(ptep->pte), 0, 0, 0};
>>>>>> +
>>>>>> +	return pte;
>>>>>> +}
>>>>>> +#endif
>>>>>
>>>>> Would it make sense to have a comment with this magic? The casual reader
>>>>> might wonder WTH just happened when he stumbles on this :-)
>>>>
>>>> I tried writing a helpful comment but it's too late for my brain to form
>>>> sensible sentences.
>>>>
>>>> Christophe can you send a follow-up with a comment explaining it? In
>>>> particular the zero entries stand out, it's kind of subtle that those
>>>> entries are only populated with the right value when we write to the
>>>> page table.
>>>
>>> static inline pte_t ptep_get(pte_t *ptep)
>>> {
>>> 	unsigned long val = READ_ONCE(ptep->pte);
>>> 	/* 16K pages have 4 identical value 4K entries */
>>> 	pte_t pte = {val, val, val, val);
>>> 	return pte;
>>> }
>>>
>>> Maybe something like that?
>>
>> This should work as well. Indeed nobody cares about what's in the other
>> three. They are only there to ensure that ptep++ increases the ptep
>> pointer by 16 bytes. Only the HW require 4 identical values, that's
>> taken care of in set_pte_at() and pte_update().
> 
> Right, but it seems less error-prone to have the in-memory
> representation match what we have in the page table (well that's
> in-memory too but you know what I mean).
> 
>> So we should use the most efficient. Thinking once more, maybe what you
>> propose is the most efficient as there is no need to load another
>> register with value 0 in order to write it in the stack.
> 
> On 64-bit I'd say it makes zero difference, the only thing that's going
> to matter is the load from ptep->pte. I don't know whether that's true
> on the 8xx cores though.

On 8xx core, loading a register with value 0 will take one cycle unless 
there is some bubble left by another instruction (like a load from 
memory or a taken branch). But that's in the noise.

Christophe


More information about the Linuxppc-dev mailing list