[RFC PATCH powerpc] Fix a lazy irq related WARNING in arch_local_irq_restore()
Li Zhong
zhong at linux.vnet.ibm.com
Mon Oct 22 18:23:19 EST 2012
On Thu, 2012-10-18 at 10:54 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2012-09-26 at 18:10 +0800, Li Zhong wrote:
>
> ../...
>
> Sorry got distracted, got back on this patch today:
>
> > > We might need to "sanitize" the enable state in the PACA before we
> > > actually enter NAP or in the return from NAP code, like we do for normal
> > > idle code...
> >
> > Hi Ben,
> >
> > After some further reading of the code, I updated the code as following,
> > but I'm not very sure, and guess it most probably has some issues ...
> > Could you please help to review and give your comments?
>
> I think it's still not right, see below
>
> > In extended_cede_processor(), if there is still lazy irq pending, I used
> > local_irq_enable() to make sure the irq replayed and flags cleared, but
> > I'm not sure whether it is a proper way.
>
> Right but that will break things for idle. In idle, if you have
> re-enabled interrupts, you need to return and essentially abort the
> attempt at going to nap mode, because the interrupt might have set need
> resched. That's why we normally just check if something's pending and
> return, letting the upper levels re-enable interrupts and do all the
> dirty work for us.
>
> Now, hotplug might differ here in what it needs, but in any case,
> extended_cede_processor doesn't seem to be the right place to handle it,
> at best that function should return if it thinks there's something that
> needs to be done and let the upper layers deal with it appropriately.
>
> > In pseries_mach_cpu_die(), I added local_irq_disable() after cede, and
> > prepare for the start_secondary_resume(), but I'm not sure whether we
> > also need a hard_irq_disable() here.
>
> You probably do if it's going to go back to the start secondary path. It
> shouldn't hurt in any case as long as start_secondary_resume()
> eventually does a local_irq_enable().
>
> > I'm still a little confused by the meaning of PACA_IRQ_HARD_DIS in
> > irq_happened. From the checking at the warning point, it seems only
> > irq_happened equaling 0x1(PACA_IRQ_HARD_DIS) means hard irqs are
> > disabled.
Hi Ben,
Below is my current understanding, plus a few more questions. Please
correct me if I have misunderstood anything. Thank you.
> No. They are disabled if any bit in there corresponding to a "level"
> interrupt is set as well. Only the "edge" interrupts (and decrementer
> which we treat as edge and reset to a high value when it kicks) are
> ignored for the sake of HW irq state.
From the code, it seems that hardware_interrupt/external_input
(corresponding to PACA_IRQ_EE) are "level" interrupts. For "level"
interrupts, do we need to hard disable because the interrupt stays
asserted and we would see it again, so that we might otherwise enter an
infinite loop?
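To make my reading concrete, here is a minimal user-space sketch of the
rule as I understand it: interrupts count as hard disabled when either
PACA_IRQ_HARD_DIS or a "level" bit is set in irq_happened. It is only a
model, not kernel code. model_hard_disabled() is a hypothetical helper,
treating PACA_IRQ_EE as the only "level" bit is my assumption, and the
bit values other than PACA_IRQ_HARD_DIS (0x1) are assumed from hw_irq.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only PACA_IRQ_HARD_DIS == 0x01 is stated in this thread; the other
 * values are assumed from arch/powerpc/include/asm/hw_irq.h. */
#define PACA_IRQ_HARD_DIS 0x01
#define PACA_IRQ_DBELL    0x02
#define PACA_IRQ_EE       0x04
#define PACA_IRQ_DEC      0x08

/* Hypothetical helper (not a kernel function): true when irq_happened
 * alone says that MSR[EE] is expected to be clear. Edge sources
 * (doorbell, decrementer) are ignored for this purpose. */
static bool model_hard_disabled(uint8_t irq_happened)
{
	return (irq_happened & (PACA_IRQ_HARD_DIS | PACA_IRQ_EE)) != 0;
}

/* Usage example: a few states and what the model says about them. */
static void demo_hard_disabled_cases(void)
{
	assert(!model_hard_disabled(0));                 /* nothing pending */
	assert(model_hard_disabled(PACA_IRQ_HARD_DIS));  /* 'manual' disable */
	assert(model_hard_disabled(PACA_IRQ_EE));        /* masked level irq */
	assert(!model_hard_disabled(PACA_IRQ_DEC));      /* edge-like only */
}
```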
> The reason we have this IRQ_HARD_DIS bit is to indicate that a 'manual'
> hard disabling occurred (by opposition to one happening as a result of
> an external interrupt).
Is the hard disabling caused by an external interrupt the one done in
the code at masked_##_H##interrupt: for book3s, or in
masked_interrupt_book3e PACA_IRQ_EE 1 for book3e?
> We need that so we can avoid doing an mfspr() in local_irq_enable() and
> entirely rely on the content of irq_happened to know whether interrupts
> are hard enabled or hard disabled.
Is this about the code in arch_local_irq_restore()? So if
(!irq_happened), we can return directly, as we know interrupts are hard
enabled.
And here

	if (unlikely(irq_happened != PACA_IRQ_HARD_DIS))
		__hard_irq_disable();

we don't need to hard disable if (irq_happened == PACA_IRQ_HARD_DIS), as
we know interrupts are already hard disabled ('manual').
Then, can we save a few more mtmsrd instructions here by also checking
the PACA_IRQ_EE bit, like the following?
-	if (unlikely(irq_happened != PACA_IRQ_HARD_DIS))
+	if (unlikely(!(irq_happened & (PACA_IRQ_HARD_DIS | PACA_IRQ_EE))))
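To see what the changed test would actually buy, here is a small
user-space sketch that compares the two conditions over the nonzero
irq_happened values 0x01..0x0f (this point in arch_local_irq_restore()
is only reached when irq_happened is nonzero). The function names are
hypothetical, and the bit values other than PACA_IRQ_HARD_DIS are
assumed from hw_irq.h. The sketch only counts states; whether each
state is reachable, and whether PACA_IRQ_EE always implies MSR_EE is
clear, is the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only PACA_IRQ_HARD_DIS == 0x01 is stated in this thread; the other
 * values are assumed from arch/powerpc/include/asm/hw_irq.h. */
#define PACA_IRQ_HARD_DIS 0x01
#define PACA_IRQ_DBELL    0x02
#define PACA_IRQ_EE       0x04
#define PACA_IRQ_DEC      0x08

/* Current test: hard disable unless exactly HARD_DIS is set. */
static bool cur_needs_hard_disable(uint8_t irq_happened)
{
	return irq_happened != PACA_IRQ_HARD_DIS;
}

/* Proposed test: hard disable only if neither HARD_DIS nor EE is set. */
static bool new_needs_hard_disable(uint8_t irq_happened)
{
	return !(irq_happened & (PACA_IRQ_HARD_DIS | PACA_IRQ_EE));
}

/* Count the states where the proposed test skips an mtmsrd that the
 * current test would issue. */
static int count_saved_mtmsrd(void)
{
	int saved = 0;

	for (uint8_t v = 1; v <= 0x0f; v++) {
		if (cur_needs_hard_disable(v) && !new_needs_hard_disable(v))
			saved++;
	}
	return saved;
}

/* Usage example: the proposed test is never stricter than the current
 * one for the single state where the current test skips the mtmsrd. */
static void demo_compare_tests(void)
{
	assert(!cur_needs_hard_disable(PACA_IRQ_HARD_DIS));
	assert(!new_needs_hard_disable(PACA_IRQ_HARD_DIS));
}
```

In this model the two tests differ for states such as PACA_IRQ_EE alone
or PACA_IRQ_HARD_DIS | PACA_IRQ_DEC, and agree for purely edge-like
states such as PACA_IRQ_DEC alone.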
> We do that because mfspr is a fairly expensive instruction. But that
> means that we need to make sure we always have a consistent content in
> irq_happened. That's also why I've added all those sanity checks if you
> enable IRQ tracing.
I see.
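As a check on my understanding of why the warning fires, here is a
user-space sketch of the path in arch_local_irq_restore() as I read it.
struct model_cpu and the model_* functions are hypothetical stand-ins:
msr_ee models MSR[EE], the warnings counter models the WARN_ON in the
CONFIG_TRACE_IRQFLAGS sanity check, and the interrupt replay is left out
completely. It is a simplified model, not the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PACA_IRQ_HARD_DIS 0x01
#define PACA_IRQ_DEC      0x08  /* value assumed from hw_irq.h */

/* Hypothetical stand-in for the per-CPU state involved. */
struct model_cpu {
	bool msr_ee;          /* models MSR[EE]: true means hard enabled */
	bool soft_enabled;    /* models the soft-enable flag in the PACA */
	uint8_t irq_happened; /* models the PACA irq_happened field */
	int warnings;         /* how often the modelled WARN_ON fired */
};

static void model_irq_restore(struct model_cpu *c, bool en)
{
	c->soft_enabled = en;
	if (!en)
		return;
	if (!c->irq_happened)
		return; /* nothing recorded: assumed hard enabled, no MSR read */

	if (c->irq_happened != PACA_IRQ_HARD_DIS) {
		c->msr_ee = false; /* models __hard_irq_disable() */
	} else if (c->msr_ee) {
		/* Sanity check: irq_happened says "hard disabled" but
		 * MSR[EE] is set, so the recorded state is inconsistent. */
		c->warnings++;
		c->msr_ee = false;
	}

	/* Replay is omitted: just clear the state and hard enable. */
	c->irq_happened = 0;
	c->msr_ee = true;
}

/* Usage example: a CPU that arrives with MSR[EE] set while
 * irq_happened is still PACA_IRQ_HARD_DIS trips the check once. */
static void demo_inconsistent_state(void)
{
	struct model_cpu c = { .msr_ee = true,
			       .irq_happened = PACA_IRQ_HARD_DIS };

	model_irq_restore(&c, true);
	assert(c.warnings == 1);
}
```

This matches the register dump in the original report, where MSR has EE
set at the point of the warning. In the model, hard disabling before
reaching this path (msr_ee false with PACA_IRQ_HARD_DIS recorded) keeps
the state consistent and the check stays quiet.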
> > Is it possible to set this bit at anyplace the hard irqs are disabled,
> > so then we could check whether this bit is set to know whether hard irqs
> > are disabled? Then it seems that in MASKED_INTERRUPT, we need set this
> > bit where MSR_EE is cleared for something other than decrementer. Maybe
> > I missed too much things?
>
> Either that bit or PACA_IRQ_EE. Both indicate that interrupts are hard
> disabled.
>
> There are some rare cases where we do change MSR:EE without touching
> those bits, only when we're going to restore it shortly afterward in the
> kernel asm exception entry/exit path for example.
I don't fully understand this part yet. I might need some more time to
read and understand all the related asm code.
Currently, it seems to me that in EXCEPTION_COMMON, SOFT_DISABLE_INTS is
called to set PACA_IRQ_HARD_DIS, and other bits might be set when
__SOFTEN_TEST (or masked_interrupt_book3e) is called. And in the
exception exit path, things like SOFT_DISABLE_INTS, .restore_interrupts,
restore_check_irq_replay, etc. are called to handle the irq_happened
bits.
Thanks, Zhong
> Cheers,
> Ben.
>
> > Thanks, Zhong
> >
> > ===================
> > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > index 64c97d8..b5f7597 100644
> > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > @@ -130,6 +130,8 @@ static void pseries_mach_cpu_die(void)
> > extended_cede_processor(cede_latency_hint);
> > }
> >
> > + local_irq_disable();
> > +
> > if (!get_lppaca()->shared_proc)
> > get_lppaca()->donate_dedicated_cpu = 0;
> > get_lppaca()->idle = 0;
> > diff --git a/arch/powerpc/platforms/pseries/plpar_wrappers.h b/arch/powerpc/platforms/pseries/plpar_wrappers.h
> > index 13e8cc4..07560d8 100644
> > --- a/arch/powerpc/platforms/pseries/plpar_wrappers.h
> > +++ b/arch/powerpc/platforms/pseries/plpar_wrappers.h
> > @@ -2,6 +2,7 @@
> > #define _PSERIES_PLPAR_WRAPPERS_H
> >
> > #include <linux/string.h>
> > +#include <linux/irqflags.h>
> >
> > #include <asm/hvcall.h>
> > #include <asm/paca.h>
> > @@ -41,7 +42,19 @@ static inline long extended_cede_processor(unsigned long latency_hint)
> >  	u8 old_latency_hint = get_cede_latency_hint();
> >
> >  	set_cede_latency_hint(latency_hint);
> > +
> > +	while (!prep_irq_for_idle()) {
> > +		local_irq_enable();
> > +		local_irq_disable();
> > +	}
> > +
> >  	rc = cede_processor();
> > +#ifdef CONFIG_TRACE_IRQFLAGS
> > +	/* Ensure that H_CEDE returns with IRQs on */
> > +	if (WARN_ON(!(mfmsr() & MSR_EE)))
> > +		__hard_irq_enable();
> > +#endif
> > +
> >  	set_cede_latency_hint(old_latency_hint);
> >
> >  	return rc;
> > ===================
> >
> >
> > >
> > > Cheers,
> > > Ben.
> > >
> > > > [ 56.618846] WARNING: at arch/powerpc/kernel/irq.c:240
> > > > [ 56.618851] Modules linked in: rcutorture ipv6 dm_mod ext3 jbd mbcache sg sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt ibmveth
> > > > [ 56.618883] NIP: c00000000000ff94 LR: c00000000067a5e0 CTR: 0000000000000001
> > > > [ 56.618889] REGS: c0000001fef6bbe0 TRAP: 0700 Not tainted (3.6.0-rc1-autokern1)
> > > > [ 56.618894] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 42000082 XER: 20000000
> > > > [ 56.618913] SOFTE: 1
> > > > [ 56.618916] CFAR: c00000000067a5dc
> > > > [ 56.618920] TASK = c0000001feed79a0[0] 'swapper/5' THREAD: c0000001fef68000 CPU: 5
> > > > GPR00: 0000000000000001 c0000001fef6be60 c000000000f9ca08 0000000000000001
> > > > GPR04: 0000000000000001 0000000000000008 0000000000000001 0000000000000000
> > > > GPR08: 0000000000000000 c0000001feed79a0 0008a80000000000 0000000000000000
> > > > GPR12: 0000000022000082 c00000000f330f00 c0000001fef6bf90 000000000f394b4c
> > > > GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > > GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > > GPR24: c000000000fe8f80 0000000000000008 0000000000000028 0000000000000000
> > > > GPR28: 0000000000000000 0000000000000020 c000000000f1ab40 0000000000000001
> > > > [ 56.619014] NIP [c00000000000ff94] .arch_local_irq_restore+0x34/0xa0
> > > > [ 56.619020] LR [c00000000067a5e0] .start_secondary+0x368/0x37c
> > > > [ 56.619025] Call Trace:
> > > > [ 56.619030] [c0000001fef6be60] [c000000001ba0500] 0xc000000001ba0500 (unreliable)
> > > > [ 56.619038] [c0000001fef6bed0] [c00000000067a5e0] .start_secondary+0x368/0x37c
> > > > [ 56.619046] [c0000001fef6bf90] [c000000000009380] .start_secondary_resume+0x10/0x14
> > > > [ 56.619052] Instruction dump:
> > > > [ 56.619056] f8010010 f821ff91 986d022a 2fa30000 419e0054 880d022b 78000621 41820048
> > > > [ 56.619071] 2f800001 40de0064 7c0000a6 78008fe2 <0b000000> 2fa00000 40de0050 38000000
> > > > [ 56.619088] ---[ end trace 0199c0d783d7f9ba ]---
> > > >
> > > > Reported-by: Paul E. McKenney <paulmck at linux.vnet.ibm.com>
> > > > Signed-off-by: Li Zhong <zhong at linux.vnet.ibm.com>
> > > > ---
> > > > arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 ++
> > > > 1 files changed, 2 insertions(+), 0 deletions(-)
> > > >
> > > > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > > > index 64c97d8..8de539a 100644
> > > > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > > > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > > > @@ -137,6 +137,8 @@ static void pseries_mach_cpu_die(void)
> > > > if (get_preferred_offline_state(cpu) == CPU_STATE_ONLINE) {
> > > > unregister_slb_shadow(hwcpu);
> > > >
> > > > + __hard_irq_disable();
> > > > +
> > > > /*
> > > > * Call to start_secondary_resume() will not return.
> > > > * Kernel stack will be reset and start_secondary()
> > >
> > >
> >
>
>