[PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C

Fri Jan 15 00:17:59 AEDT 2021

Excerpts from Christophe Leroy's message of January 14, 2021 10:25 pm:
> 
> 
> On 14/01/2021 at 13:09, Nicholas Piggin wrote:
>> Excerpts from Nicholas Piggin's message of January 14, 2021 1:24 pm:
>>> Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am:
>>>>
>>>>
>>>> On 13/01/2021 at 08:31, Nicholas Piggin wrote:
>>>>> The page fault handling still has some complex logic particularly around
>>>>> hash table handling, in asm. Implement this in C instead.
>>>>>
>>>>> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
>>>>> ---
>>>>>    arch/powerpc/include/asm/book3s/64/mmu-hash.h |   1 +
>>>>>    arch/powerpc/kernel/exceptions-64s.S          | 131 +++---------------
>>>>>    arch/powerpc/mm/book3s64/hash_utils.c         |  77 ++++++----
>>>>>    arch/powerpc/mm/fault.c                       |  46 ++++--
>>>>>    4 files changed, 107 insertions(+), 148 deletions(-)
>>>>>
>>>>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>>>> index 066b1d34c7bc..60a669379aa0 100644
>>>>> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>>>> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>>>>> @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long vpn,
>>>>>    #define HPTE_NOHPTE_UPDATE	0x2
>>>>>    #define HPTE_USE_KERNEL_KEY	0x4
>>>>>    
>>>>> +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr);
>>>>>    extern int __hash_page_4K(unsigned long ea, unsigned long access,
>>>>>    			  unsigned long vsid, pte_t *ptep, unsigned long trap,
>>>>>    			  unsigned long flags, int ssize, int subpage_prot);
>>>>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
>>>>> index 6e53f7638737..bcb5e81d2088 100644
>>>>> --- a/arch/powerpc/kernel/exceptions-64s.S
>>>>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>>>>> @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>>>>>     *
>>>>>     * Handling:
>>>>>     * - Hash MMU
>>>>> - *   Go to do_hash_page first to see if the HPT can be filled from an entry in
>>>>> - *   the Linux page table. Hash faults can hit in kernel mode in a fairly
>>>>> + *   Go to do_hash_fault, which attempts to fill the HPT from an entry in the
>>>>> + *   Linux page table. Hash faults can hit in kernel mode in a fairly
>>>>>     *   arbitrary state (e.g., interrupts disabled, locks held) when accessing
>>>>>     *   "non-bolted" regions, e.g., vmalloc space. However these should always be
>>>>> - *   backed by Linux page tables.
>>>>> + *   backed by Linux page table entries.
>>>>>     *
>>>>> - *   If none is found, do a Linux page fault. Linux page faults can happen in
>>>>> - *   kernel mode due to user copy operations of course.
>>>>> + *   If no entry is found the Linux page fault handler is invoked (by
>>>>> + *   do_hash_fault). Linux page faults can happen in kernel mode due to user
>>>>> + *   copy operations of course.
>>>>>     *
>>>>>     *   KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest
>>>>>     *   MMU context, which may cause a DSI in the host, which must go to the
>>>>> @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common)
>>>>>    	GEN_COMMON data_access
>>>>>    	ld	r4,_DAR(r1)
>>>>>    	ld	r5,_DSISR(r1)
>>>>
>>>> We have DSISR here. I think the dispatch between page fault or do_break() should be done here:
>>>> - It would be more similar to other arches
>>>
>>> Other sub-archs?
>>>
>>>> - Would avoid doing it also in instruction fault
>>>
>>> True but it's hidden under an unlikely branch so won't really help
>>> instruction fault.
>>>
>>>> - Would avoid that -1 return which looks more like a hack.
>>>
>>> I don't really see it as a hack; we return a code to the asm caller to
>>> direct whether to restore registers or not, and we already have this
>>> pattern.
>>>
>>> (I'm hoping all that might go away one day by controlling NV
>>> regs from C if we can get good code generation, but even if not we
>>> still have it in the interrupt returns).
>>>
>>> That said I will give it a try here. At very least it might be a
>>> better intermediate step.
>> 
>> Ah yes, this way doesn't work well for later patches because you end
>> up, e.g., with the do_break call having to call the interrupt handler
>> wrappers again when they actually expect to be in the asm entry state
>> (e.g., irq soft-mask state) when called, and to return via
>> interrupt_return after the exit wrapper runs (which 64s uses to
>> implement better context tracking, for example).
>> 
>> That could possibly be hacked up to deal with multiple interrupt
>> wrappers per interrupt, but I'd rather not go backwards.
>> 
>> That does leave the other sub archs as having this issue, but they don't
>> do so much in their handlers. 32 doesn't have soft-mask or context
>> tracking to deal with for example. We will need to fix this up though
>> and unify things more.
>> 
> 
> Not sure I understand what you mean exactly.
> 
> On the 8xx, do_break() is called by totally different exceptions:
> - Exception 0x1c00 Data breakpoint ==> do_break()
> - Exception 0x1300 Instruction TLB error ==> handle_page_fault()
> - Exception 0x1400 Data TLB error ==> handle_page_fault()
> 
> On book3s/32, we now (after my patch, i.e. patch 1 in your series) have either do_break() or
> handle_page_fault() being called from very early in ASM.
> 
> If you do the same in book3s/64, then there is no issue with interrupt wrappers being called twice,
> is there?

bad_page_fault is the problem, it has to go afterwards.

Once we have the changed 64s behaviour of do_page_fault, I don't know if
there is any point leaving do_break in asm, is there? I guess it is neat
to treat it quite separately; I might need to count fast path branches...
I have done the split anyway already, so I will post your way first.
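
For reference, the two dispatch shapes discussed in this thread can be
sketched roughly as below. This is a userspace illustration only, not the
code in the patch: struct regs_stub, enum handler, dispatch_early(),
dispatch_late() and hash_fault_stub() are hypothetical names, the
DSISR_DABRMATCH value is assumed, and the sketch assumes the -1 return is
what signals the breakpoint case to the caller.

```c
#include <assert.h>

/* Assumed value of the "data breakpoint match" DSISR bit; check asm/reg.h. */
#define DSISR_DABRMATCH 0x00400000UL

/* Hypothetical stand-in for struct pt_regs; only what the sketch needs. */
struct regs_stub {
	unsigned long dar;
	unsigned long dsisr;
};

/* Which C handler ends up servicing the interrupt (illustrative names). */
enum handler { HANDLER_BREAK, HANDLER_HASH_FAULT };

/* Shape A: the entry code tests DSISR and picks the handler up front. */
static enum handler dispatch_early(const struct regs_stub *regs)
{
	if (regs->dsisr & DSISR_DABRMATCH)
		return HANDLER_BREAK;
	return HANDLER_HASH_FAULT;
}

/*
 * Shape B: the fault handler is always entered and reports a breakpoint
 * hit to its caller with a negative return code.
 */
static int hash_fault_stub(const struct regs_stub *regs)
{
	if (regs->dsisr & DSISR_DABRMATCH)
		return -1;	/* caller must go on to the breakpoint path */
	return 0;		/* fault handled here */
}

/* The caller in shape B branches on the return code. */
static enum handler dispatch_late(const struct regs_stub *regs)
{
	return hash_fault_stub(regs) < 0 ? HANDLER_BREAK : HANDLER_HASH_FAULT;
}

/* Both shapes should select the same handler for any given DSISR. */
static void check_dispatch_agree(const struct regs_stub *regs)
{
	assert(dispatch_early(regs) == dispatch_late(regs));
}
```

Both shapes pick the same handler; they differ in where the DSISR test
lives and therefore in which code has already run (e.g., interrupt
wrappers) by the time the breakpoint path is taken.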

Thanks,
Nick