instruction storage exception handling
Christophe Leroy
christophe.leroy at csgroup.eu
Wed Oct 27 18:52:42 AEDT 2021
On 27/10/2021 at 09:47, Nicholas Piggin wrote:
> Excerpts from Christophe Leroy's message of October 27, 2021 3:51 pm:
>>
>>
>> On 27/10/2021 at 07:25, Nicholas Piggin wrote:
>>> Excerpts from Christophe Leroy's message of October 27, 2021 3:00 pm:
>>>>
>>>>
>>>> On 27/10/2021 at 06:10, Nicholas Piggin wrote:
>>>>> Excerpts from Jacques de Laval's message of October 26, 2021 6:07 am:
>>>>>> Hi,
>>>>>>
>>>>>> We are trying to upgrade kernel from 5.10 to 5.14.11. We have a Freescale/NXP
>>>>>> T1023 SOC with two e5500 cores, and are running in 32-bit mode:
>>>>>>
>>>>>> CONFIG_PPC32=y
>>>>>> # CONFIG_PPC64 is not set
>>>>>>
>>>>>> #
>>>>>> # Processor support
>>>>>> #
>>>>>> # CONFIG_PPC_BOOK3S_32 is not set
>>>>>> CONFIG_PPC_85xx=y
>>>>>> # CONFIG_PPC_8xx is not set
>>>>>> # CONFIG_40x is not set
>>>>>> # CONFIG_44x is not set
>>>>>> CONFIG_GENERIC_CPU=y
>>>>>> # CONFIG_E5500_CPU is not set
>>>>>> # CONFIG_E6500_CPU is not set
>>>>>> CONFIG_E500=y
>>>>>> CONFIG_PPC_E500MC=y
>>>>>> CONFIG_PPC_FPU_REGS=y
>>>>>> CONFIG_PPC_FPU=y
>>>>>> CONFIG_FSL_EMB_PERFMON=y
>>>>>> CONFIG_FSL_EMB_PERF_EVENT=y
>>>>>> CONFIG_FSL_EMB_PERF_EVENT_E500=y
>>>>>> CONFIG_BOOKE=y
>>>>>> CONFIG_FSL_BOOKE=y
>>>>>> CONFIG_PPC_FSL_BOOK3E=y
>>>>>> CONFIG_PTE_64BIT=y
>>>>>> CONFIG_PHYS_64BIT=y
>>>>>> CONFIG_PPC_MMU_NOHASH=y
>>>>>> CONFIG_PPC_BOOK3E_MMU=y
>>>>>> # CONFIG_PMU_SYSFS is not set
>>>>>> CONFIG_SMP=y
>>>>>> CONFIG_NR_CPUS=2
>>>>>> CONFIG_PPC_DOORBELL=y
>>>>>> # end of Processor support
>>>>>>
>>>>>> We compile using 32-bit Bootlin PPC toolchain:
>>>>>>
>>>>>> powerpc-e500mc glibc bleeding-edge 2020.08-1.
>>>>>>
>>>>>> When booting, and starting PID 1 we sometimes get a hang. Nothing but our init
>>>>>> process is running, and for debugging purposes our init currently looks like
>>>>>> this:
>>>>>>
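>>>>>> #include <stdio.h>
>>>>>> #include <unistd.h>
>>>>>>
>>>>>> int main(int argc, char *argv[])
>>>>>> {
>>>>>>     /* Log an incrementing counter to the kernel log once per second. */
>>>>>>     for (int i = 0; ; i++) {
>>>>>>         FILE *fp = fopen("/dev/kmsg", "w");
>>>>>>         if (fp) {
>>>>>>             fprintf(fp, "%d\n", i);
>>>>>>             fclose(fp);
>>>>>>         }
>>>>>>         sleep(1);
>>>>>>     }
>>>>>>     return 0;
>>>>>> }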
>>>>>>
>>>>>> When the hang occurs we don't get any output at all from our PID 1.
>>>>>> The last output is from the kernel:
>>>>>>
>>>>>> Run /sbin/init as init process
>>>>>> with arguments:
>>>>>> /sbin/init
>>>>>> with environment:
>>>>>> HOME=/
>>>>>> TERM=linux
>>>>>> kgdboc=ttyS0,115200
>>>>>>
>>>>>> When issuing a backtrace on all active cpus we can see that the kernel is
>>>>>> handling an instruction storage exception:
>>>>>>
>>>>>> sysrq: Show backtrace of all active CPUs
>>>>>> sysrq: CPU0:
>>>>>> CPU: 0 PID: 1 Comm: init Not tainted 5.14.11 #1
>>>>>> NIP: c02aac78 LR: c02aac2c CTR: 00000000
>>>>>> REGS: c1907d40 TRAP: 0500 Not tainted (5.14.11)
>>>>>> MSR: 00029002 <CE,EE,ME> CR: 82244284 XER: 20000000
>>>>>> GPR00: 0000000f c1907e20 c1910000 0065a000 000001d0 01100cca c1907e84 0000000c
>>>>>> GPR08: d39a8000 000001d3 0000000c c1907f10 42244284 00000000 00740514 bfb71670
>>>>>> GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
>>>>>> GPR24: 00708558 00701698 d3994040 00029002 c1907f20 0065a238 00000355 d39a2790
>>>>>> NIP [c02aac78] handle_mm_fault+0xf8/0x11f0
>>>>>> LR [c02aac2c] handle_mm_fault+0xac/0x11f0
>>>>>> Call Trace:
>>>>>> [c1907e20] [c02aac10] handle_mm_fault+0x90/0x11f0 (unreliable)
>>>>>> [c1907ec0] [c003078c] ___do_page_fault+0x26c/0x780
>>>>>> [c1907ef0] [c0030cd4] do_page_fault+0x34/0x100
>>>>>> [c1907f10] [c0000988] InstructionStorage+0x108/0x120
>>>>>> --- interrupt: 400 at 0x65a238
>>>>>> NIP: 0065a238 LR: 0052f26c CTR: 0052f260
>>>>>> REGS: c1907f20 TRAP: 0400 Not tainted (5.14.11)
>>>>>> MSR: 0002d002 <CE,EE,PR,ME> CR: 42242284 XER: 00000000
>>>>>> GPR00: b7be9914 bfb71620 b7c203a0 8c008000 0070400d b7c182a0 000b8260 0052f260
>>>>>> GPR08: 0047d448 0052f260 0000000a 00000003 42242284 00000000 00740514 bfb71670
>>>>>> GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
>>>>>> GPR24: 00708558 00701698 00700000 00000015 b7c1c2b0 00707e20 b7c1b8a8 bfb71660
>>>>>> NIP [0065a238] 0x65a238
>>>>>> LR [0052f26c] 0x52f26c
>>>>>> --- interrupt: 400
>>>>>> Instruction dump:
>>>>>> 60a500c0 811f0020 57aa6cfa 813f0000 57a30026 809f004c 81080024 7d29e850
>>>>>> 90a1002c 5529a33e 93c10038 7d244a14 <90610034> 7d485215 91210030 41c203dc
>>>>>>
>>>>>> We have also observed that the CPU is continuously servicing the same interrupt
>>>>>> (north of 140k times per sec), it is not deadlocked.
>>>>>>
>>>>>> We have not yet been able to reproduce this behavior under QEMU system
>>>>>> emulation.
>>>>>>
>>>>>> When bisecting between 5.10 and 5.14.11 we can see that this behavior started
>>>>>> with commit a01a3f2ddbcda83e8572787c0ec1dcbeba86915a:
>>>>>>
>>>>>> powerpc: remove arguments from fault handler functions
>>>>>
>>>>> Thank you for the excellent work to investigate and report this.
>>>>>
>>>>>>
>>>>>> Our best guess is that the instruction storage exception is not properly handled
>>>>>> and the kernel is never able to recover from the page fault, but we don't
>>>>>> really know how to proceed. Does anyone have any suggestions or insights?
>>>>>
>>>>> Before my patch, zero was passed as the page fault error code; after
>>>>> my patch, the contents of the ESR SPR are passed.
>>>>>
>>>>> It looks like the BookE instruction access interrupt does not set ESR
>>>>> (except for BO interrupts maybe?) so you're getting what was in the ESR
>>>>> register from a previous interrupt, and maybe if that was a store then
>>>>> access_error won't cause a segfault because is_exec is true so that
>>>>> test bails out early, then it might just keep retrying the interrupt.
>>>>>
>>>>> That could explain why you don't always see the same thing.
>>>>>
>>>>> Now previous code still saved ESR in regs->esr/dsisr for some reason.
>>>>> I can't quite see why that should have been necessary though. Does
>>
>> According to the e500 Reference Manual, on ISI:
>>
>> BO is set if the instruction fetch caused a byte-ordering exception;
>> otherwise cleared. *All* other defined ESR bits are *cleared*.
>
> You're right. In that case it shouldn't change anything unless there
> was a BO fault. I'm not sure what the problem is then. Guessing based
> on the NIP and instructions, it looks like it's probably got the correct
> user address that it's storing into vmf on the stack, so it has got past
> the access checks so my theory would be wrong anyway.
>
> Must be something simple but I can't see it yet.
>
Anyway, I think it is still worth testing with 0 stored in _ESR(r11);
maybe the Reference Manual is wrong.
So Jacques, please do the test anyway if you can.
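
To make the idea behind that test concrete, here is a rough user-space
model in C. It is not kernel code. Every name in it (model_regs,
model_save_esr, model_fault_flags, the MODEL_* constants) is made up for
illustration, and the bit value used for the "store" flag is an
assumption, not taken from the kernel headers. It only sketches the
theory discussed above: if a stale "store" bit reaches the fault path on
an instruction fault, the fault is treated as a write as well as an
exec, whereas forcing the saved error code to 0 on ISI avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants only; the values are assumptions for this model. */
#define MODEL_ESR_ST     0x00800000u /* assumed "store operation" bit */
#define MODEL_TRAP_DSI   0x300u      /* data storage interrupt */
#define MODEL_TRAP_ISI   0x400u      /* instruction storage interrupt */

#define MODEL_FLAG_WRITE 0x1u
#define MODEL_FLAG_EXEC  0x2u

/* Hypothetical stand-in for the saved register frame. */
struct model_regs {
    uint32_t trap; /* which interrupt was taken */
    uint32_t esr;  /* error code saved at interrupt entry */
};

/*
 * Model of the interrupt entry: save the hardware ESR value, or force 0
 * for instruction faults when zero_on_isi is set (the proposed test).
 */
void model_save_esr(struct model_regs *regs, uint32_t hw_esr, bool zero_on_isi)
{
    assert(regs);
    if (zero_on_isi && regs->trap == MODEL_TRAP_ISI)
        regs->esr = 0;
    else
        regs->esr = hw_esr;
}

/* Model of the fault path: derive fault flags from trap and error code. */
unsigned int model_fault_flags(const struct model_regs *regs)
{
    unsigned int flags = 0;

    assert(regs);
    if (regs->esr & MODEL_ESR_ST)
        flags |= MODEL_FLAG_WRITE; /* error code says "store" */
    if (regs->trap == MODEL_TRAP_ISI)
        flags |= MODEL_FLAG_EXEC;  /* trap says "instruction fetch" */
    return flags;
}
```

In this model, an ISI entered with a leftover store bit yields both the
write and exec flags, while the same ISI with zero_on_isi set yields the
exec flag only. If the Reference Manual is right that the hardware
clears the other ESR bits on ISI, both variants behave the same and the
test should change nothing.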
Thanks
Christophe