[PATCH] powerpc/64s: Fix irq tracing corruption in interrupt/syscall return caused by perf interrupts

Nicholas Piggin npiggin at gmail.com
Thu Jul 23 20:29:25 AEST 2020


Excerpts from Alexey Kardashevskiy's message of July 22, 2020 8:50 pm:
> 
> 
> On 22/07/2020 17:34, Nicholas Piggin wrote:
>> Alexey reports lockdep_assert_irqs_enabled() warnings when stress testing perf, e.g.,
>> 
>> WARNING: CPU: 0 PID: 1556 at kernel/softirq.c:169 __local_bh_enable_ip+0x258/0x270
>> CPU: 0 PID: 1556 Comm: syz-executor
>> NIP:  c0000000001ec888 LR: c0000000001ec884 CTR: c000000000ef0610
>> REGS: c000000022d4f8a0 TRAP: 0700   Not tainted  (5.8.0-rc3-x)
>> MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28008844  XER: 20040000
>> CFAR: c0000000001dc1d0 IRQMASK: 0
>> 
>> The interesting thing is MSR[EE] and IRQMASK shows interrupts are enabled,
>> suggesting the current->hardirqs_enabled irq tracing state is going out of sync
>> with the actual interrupt enable state.
>> 
>> The cause is a window in interrupt/syscall return where irq tracing state is being
>> adjusted for an irqs-enabled return while MSR[EE] is still enabled. A perf
>> interrupt hits and ends up calling trace_hardirqs_off() when restoring
>> interrupt flags to a disable state.
>> 
>> Fix this by disabling perf interrupts as well while adjusting irq tracing state.
>> 
>> Add a debug check that catches the condition sooner.
>> 
>> Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
>> Reported-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
>> ---
>> 
>> I can reproduce similar symptoms and this patch fixes my test case,
>> still trying to confirm Alexey's test case or whether there's another
>> similar bug causing it.
> 
> 
> This does not fix my testcase. I applied this on top of 4fa640dc5230
> ("Merge tag 'vfio-v5.8-rc7' of git://github.com/awilliam/linux-vfio into
> master")  without any of my testing code, just to be clear. Sorry...

Okay, it seems to be a bigger problem, and not actually caused by that
patch: it was already possible for the lockdep hardirqs_enabled state to
get out of sync with the local_irq_disable() state before that too. The
root cause is similar -- perf interrupts hitting between the updates of
the two different bits of state.
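
To make the window concrete, here is a rough userspace sketch of the
ordering problem. It is only an illustration: every sim_* name is a
hypothetical stand-in I made up for this mail, not a kernel symbol, and the
"perf interrupt" is just a function call injected between the two updates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only, not kernel symbols. */
static bool sim_irqs_masked;    /* models the real interrupt mask state */
static bool sim_traced_enabled; /* models what irq tracing believes */

/*
 * Models a perf interrupt that saves the mask state, masks, and then
 * restores. Restoring to a masked state re-marks tracing as "off".
 */
void sim_perf_interrupt(void)
{
	bool was_masked = sim_irqs_masked;

	sim_irqs_masked = true;
	sim_traced_enabled = false;

	/* ... handler work would go here ... */

	sim_irqs_masked = was_masked;
	sim_traced_enabled = !was_masked;
}

/*
 * Models an irq-enable path that updates the two bits of state in two
 * separate steps. Returns true if they agree afterwards.
 */
bool sim_irq_enable(bool inject_perf)
{
	/* Start from a consistent "interrupts disabled" state. */
	sim_irqs_masked = true;
	sim_traced_enabled = false;

	/* Step 1: tell irq tracing that interrupts are about to be on. */
	sim_traced_enabled = true;

	/* The window: the perf interrupt lands between the two updates. */
	if (inject_perf)
		sim_perf_interrupt();

	/* Step 2: actually unmask. */
	sim_irqs_masked = false;

	return sim_traced_enabled == !sim_irqs_masked;
}

/* Usage: the clean path stays in sync, the interrupted path does not. */
void sim_demo(void)
{
	assert(sim_irq_enable(false));
	assert(!sim_irq_enable(true));
}
```

In the interrupted case the sketch ends with interrupts unmasked while the
tracing flag says they are off, which is the same shape as the warning
above (MSR[EE] and IRQMASK say enabled, lockdep disagrees).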

Not quite sure why Alexey's test wasn't hitting it before the patch,
but it is possibly down to the way masked interrupts get replayed. But I
was able to hit the problem with a different assertion.

I think I have a fix, but it seems to be an issue in the generic irq
tracing code. So this patch can be dropped, and it's not an urgent issue
for the next release (it only triggers warnings on rare occasions, and
only when lockdep is enabled).

Thanks,
Nick


