instruction storage exception handling

Thu Oct 28 23:42:36 AEDT 2021

Excerpts from Nicholas Piggin's message of October 28, 2021 7:35 pm:
> Excerpts from Jacques de Laval's message of October 28, 2021 7:08 pm:
>> On Thursday, October 28th, 2021 at 5:01 AM, Nicholas Piggin <npiggin at gmail.com> wrote:
>>> Excerpts from Jacques de Laval's message of October 27, 2021 10:03 pm:
>>>
>>> > On Wednesday, October 27th, 2021 at 9:52 AM, Christophe Leroy christophe.leroy at csgroup.eu wrote:
>>> >
>>> > > Le 27/10/2021 à 09:47, Nicholas Piggin a écrit :
>>> > >
>>> > > > You're right. In that case it shouldn't change anything unless there
>>> > > >
>>> > > > was a BO fault. I'm not sure what the problem is then. Guessing based
>>> > > >
>>> > > > on the NIP and instructions, it looks like it's probably got the correct
>>> > > >
>>> > > > user address that it's storing into vmf on the stack, so it has got past
>>> > > >
>>> > > > the access checks so my theory would be wrong anyway.
>>> > > >
>>> > > > Must be something simple but I can't see it yet.
>>> > >
>>> > > Anyway, I think it is still worth doing the check with setting 0 in
>>> > >
>>> > > _ESR(r11), maybe the Reference Manual is wrong.
>>> > >
>>> > > So Jacques, please do the test anyway if you can.
>>> > >
>>> > > Thanks
>>> > >
>>> > > Christophe
>>> >
>>> > I tested with the last patch from Nicholas, and with that I can not
>>> >
>>> > reproduce the issue, so it seems like that solves it for my case and setup
>>> >
>>> > at least.
>>> >
>>> > Big thanks Christophe and Nicholas for looking in to this!
>>>
>>> Thanks for reporting and testing. We can certainly send this patch
>>>
>>> upstream to fix the regression, but I'm still not exactly sure what is
>>>
>>> going on. If it is an errata or part of specification we missed that
>>>
>>> could explain it but it would be good to understand and comment it.
>>>
>>> If you have time to test again with only the following patch applied,
>>>
>>> it might give a better clue. This patch should keep running but it
>>>
>>> would print some dmesg output.
>>>
>>> Thanks,
>>>
>>> Nick
>>>
>>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>>>
>>> index a8d0ce85d39a..cf56f23ff90a 100644
>>>
>>> --- a/arch/powerpc/mm/fault.c
>>>
>>> +++ b/arch/powerpc/mm/fault.c
>>>
>>> @@ -548,6 +548,12 @@ static __always_inline void __do_page_fault(struct pt_regs *regs)
>>>
>>> DEFINE_INTERRUPT_HANDLER(do_page_fault)
>>>
>>> {
>>>
>>> -   if (TRAP(regs) == INTERRUPT_INST_STORAGE) {
>>>
>>> -         if (regs->dsisr != 0) {
>>>
>>>
>>> -         	printk("ISI pc:%lx msr:%lx dsisr:%lx ESR:%lx\\n", regs->nip, regs->msr, regs->dsisr, mfspr(SPRN_ESR));
>>>
>>>
>>> -         	regs->dsisr = 0; // fix?
>>>
>>>
>>> -         }
>>>
>>>
>>> -   }
>>>
>>>     __do_page_fault(regs);
>>>
>>>     }
>>>
>> 
>> As expected it keeps running. The output, and number of prints is naturally
>> a bit different from time to time, but dsisr/ESR is always 0x800000.
>> 
>> Here's a representative output from one run:
>> 
>> 	ISI pc:b789e6c0 msr:2d002 dsisr:800000 ESR:800000
>> 	ISI pc:b7884220 msr:2d002 dsisr:800000 ESR:800000
>> 	ISI pc:b78c18a4 msr:2d002 dsisr:800000 ESR:800000
>> 	ISI pc:55a238 msr:2f902 dsisr:800000 ESR:800000
>> 	ISI pc:412380 msr:2f902 dsisr:800000 ESR:800000
>> 	ISI pc:3aabe0 msr:2f902 dsisr:800000 ESR:800000
>> 	ISI pc:47a0e0 msr:2f902 dsisr:800000 ESR:800000
>> 	ISI pc:443290 msr:2f902 dsisr:800000 ESR:800000
>> 	ISI pc:43b350 msr:2d002 dsisr:800000 ESR:800000
> 
> Great, thanks for testing that is interesting.
> 
> Looking a bit more,
> 
> https://www.nxp.com/files-static/32bit/doc/ref_manual/EREF_RM.pdf
> 
> This is the manual for e500 family including e5500.
> 
> Table 8-4. Interrupt Summary by IVOR shows ISI interrupt as affecting 
> ESR register with [PT], this means the PT bit may be set. There is no
> BO bit specified here.
> 
> The architecture (and this manual) says that if an interrupt type 
> affects one of the ESR bits then all others are cleared. However if
> we look at the 4.8.7 Exception Syndrome Register (ESR) definition,
> the PT bit is specified with <E.PT>. This means it is implemented if
> the processor supports the E.PT extension.
> 
> According to this table, the e5500 does not support E.PT.
> https://www.linux-kvm.org/page/E500_virtual_CPU_specification
> 
> So it seems possible that a processor which does not support E.PT and 
> therefore ISI will never set any bits in ESR, will not zero the ESR
> when it takes an ISI intrrupt, without violating the specification.
> 
> It looks like that is what is happening here, ESR is being left from
> a previous store DSI.

Actually it could be another explanation, it takes a instruction 
TLB error interrupt first which always sets no ESR bits, and then
it jumps to the ISI handler without clearing ESR.

I have not actually worked out why it is causing an infinite loop,
it doesn't seem to be the main fault handler, maybe it is some
low level TLB or PTE code, I can't get a kernel to boot under
quemu due to some bug, but a 64-bit kernel does not cause the problem if 
you manually add ESR_DST into dsisr on instruction faults. Don't really
need to know that exactly though, it's clearly a bug.

Thanks,
Nick