[PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C

Nicholas Piggin npiggin at gmail.com
Thu Jan 14 23:09:18 AEDT 2021


Excerpts from Nicholas Piggin's message of January 14, 2021 1:24 pm:
> Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am:
>> 
>> 
>>> On 13/01/2021 at 08:31, Nicholas Piggin wrote:
>>> The page fault handling still has some complex logic particularly around
>>> hash table handling, in asm. Implement this in C instead.
>>> 
>>> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
>>> ---
>>>   arch/powerpc/include/asm/book3s/64/mmu-hash.h |   1 +
>>>   arch/powerpc/kernel/exceptions-64s.S          | 131 +++---------------
>>>   arch/powerpc/mm/book3s64/hash_utils.c         |  77 ++++++----
>>>   arch/powerpc/mm/fault.c                       |  46 ++++--
>>>   4 files changed, 107 insertions(+), 148 deletions(-)
>>> 
>>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>> index 066b1d34c7bc..60a669379aa0 100644
>>> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>> @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long vpn,
>>>   #define HPTE_NOHPTE_UPDATE	0x2
>>>   #define HPTE_USE_KERNEL_KEY	0x4
>>>   
>>> +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr);
>>>   extern int __hash_page_4K(unsigned long ea, unsigned long access,
>>>   			  unsigned long vsid, pte_t *ptep, unsigned long trap,
>>>   			  unsigned long flags, int ssize, int subpage_prot);
>>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
>>> index 6e53f7638737..bcb5e81d2088 100644
>>> --- a/arch/powerpc/kernel/exceptions-64s.S
>>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>>> @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>>>    *
>>>    * Handling:
>>>    * - Hash MMU
>>> - *   Go to do_hash_page first to see if the HPT can be filled from an entry in
>>> - *   the Linux page table. Hash faults can hit in kernel mode in a fairly
>>> + *   Go to do_hash_fault, which attempts to fill the HPT from an entry in the
>>> + *   Linux page table. Hash faults can hit in kernel mode in a fairly
>>>    *   arbitrary state (e.g., interrupts disabled, locks held) when accessing
>>>    *   "non-bolted" regions, e.g., vmalloc space. However these should always be
>>> - *   backed by Linux page tables.
>>> + *   backed by Linux page table entries.
>>>    *
>>> - *   If none is found, do a Linux page fault. Linux page faults can happen in
>>> - *   kernel mode due to user copy operations of course.
>>> + *   If no entry is found the Linux page fault handler is invoked (by
>>> + *   do_hash_fault). Linux page faults can happen in kernel mode due to user
>>> + *   copy operations of course.
>>>    *
>>>    *   KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest
>>>    *   MMU context, which may cause a DSI in the host, which must go to the
>>> @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common)
>>>   	GEN_COMMON data_access
>>>   	ld	r4,_DAR(r1)
>>>   	ld	r5,_DSISR(r1)
>> 
>> We have DSISR here. I think the dispatch between page fault and do_break() should be done here:
>> - It would be more similar to other arches
> 
> Other sub-archs?
> 
>> - Would avoid doing it also in instruction fault
> 
> True, but it's hidden under an unlikely branch, so it won't really help 
> the instruction fault.
> 
>> - Would avoid that -1 return which looks more like a hack.
> 
> I don't really see it as a hack; we return a code to the asm caller to
> direct whether to restore registers or not, and we already have this
> pattern.
> 
> (I'm hoping all that might go away one day by controlling NV
> regs from C if we can get good code generation, but even if not we
> still have it in the interrupt returns).
> 
> That said I will give it a try here. At very least it might be a
> better intermediate step.
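
To spell out the return-code pattern referred to above (a sketch only,
not the code in this patch; hash_try_fill_hpte() is a made-up stand-in
for the real hashing logic):

int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr)
{
	/* Try to fill the HPT from the Linux page table (stand-in helper) */
	if (hash_try_fill_hpte(regs, ea, dsisr) == 0)
		return 0;	/* handled, asm goes straight to interrupt_return */

	/* No usable Linux PTE, fall back to the generic Linux page fault path */
	return do_page_fault(regs, ea, dsisr);
}

The asm caller compares the return value with zero and only does the
extra work (restoring the non-volatile registers and so on) when it is
nonzero.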

Ah yes, this way doesn't work well for later patches, because you end
up with, e.g., the do_break call having to run the interrupt handler
wrappers again, when they actually expect to be called in the asm entry
state (e.g., irq soft-mask state) and to return via interrupt_return
after the exit wrapper runs (which 64s uses to implement better context
tracking, for example).
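
Concretely, the wrappers end up looking something like this (a sketch
only; the enter/exit hook names and handler names here are
illustrative, not the actual functions in the series):

long do_data_access(struct pt_regs *regs)
{
	long ret;

	/* Expects the state the asm entry code set up (soft-mask etc.) */
	interrupt_enter(regs);

	ret = handle_data_access(regs);		/* the real handler body */

	/* Paired exit; the asm then returns via interrupt_return */
	interrupt_exit(regs);

	return ret;
}

so a separate dispatch to do_break ends up running that enter/exit
sequence again for the same interrupt.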

That could possibly be hacked up to deal with multiple interrupt 
wrappers per interrupt, but I'd rather not go backwards.

That does leave the other sub-archs with this issue, but they don't 
do so much in their handlers. 32-bit doesn't have soft-mask or context 
tracking to deal with, for example. We will need to fix this up though 
and unify things more.

Thanks,
Nick

