Feedback requested on switching the exception wrapper used for the PMU interrupt on ppc64

Corey J Ashford cjashfor at us.ibm.com
Fri May 16 12:04:36 EST 2008


Paul Mackerras <paulus at samba.org> wrote on 05/15/2008 06:02:03 PM:

> Corey J Ashford writes:
> 
> > Thanks for the feedback.  I don't believe I need a separate flag, because
> > the PMU interrupt (via the PMAO bit) will still be pending when interrupts
> > are hard enabled again, and the handler will be reentered automatically.
> 
> If that were the case then we wouldn't have had the problem with
> losing PMU interrupts that meant we had to change the PMU interrupt
> handler from a MASKABLE_EXCEPTION to a STD_EXCEPTION.  This was in
> commit 449d846dbcbf61bdf7d50a923e4791102168c292.
> 
> My understanding is that the PMU only requests an interrupt when PMAO
> goes from 0 to 1 (i.e. it's edge-triggered).  If the CPU takes the
> interrupt and then sets MSR.EE again (e.g. by returning from the
> interrupt handler), and PMAO has not been reset to 0, then I don't
> think the PMU requests another interrupt at that point.

I think that is the case on early POWER4 and PPC970 chips where there is no
PMAO bit.  However, from personal [bad] experience, I can say that if PMAO
is not cleared in the interrupt handler, the exception will be taken again
when interrupts are reenabled.  I've seen the system lock up in interrupt
handling "loops" because it wasn't being cleared.
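
To make the behaviour I'm describing concrete, here is a toy user-space
model (not kernel code).  It assumes the exception is level-sensitive on
PMAO, which is what I have observed but have not confirmed against the
architecture documents.  All of the names (toy_cpu, toy_run_pmu_exception)
are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model, not kernel code. All names are hypothetical. */
struct toy_cpu {
	bool pmao;	/* models the PMAO (alert occurred) bit */
	bool msr_ee;	/* models MSR.EE (external interrupts enabled) */
};

/*
 * Assumption: the exception is taken whenever EE is set and PMAO is 1
 * (level-sensitive).  Returns how many times the handler was entered
 * before the exception stopped being pending, capped at max_entries.
 */
static int toy_run_pmu_exception(struct toy_cpu *cpu, bool handler_clears_pmao,
				 int max_entries)
{
	int entries = 0;

	assert(cpu && max_entries > 0);

	while (cpu->msr_ee && cpu->pmao && entries < max_entries) {
		cpu->msr_ee = false;		/* exception entry clears EE */
		entries++;
		if (handler_clears_pmao)
			cpu->pmao = false;	/* handler acknowledges the alert */
		cpu->msr_ee = true;		/* return from interrupt restores EE */
	}
	return entries;
}
```

With PMAO set and the handler clearing it, the model enters the handler
once.  If the handler leaves PMAO set, it keeps re-entering until the cap
is reached, which is the "loop" I was seeing.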

> 
> > I discovered through some trial and error that get_paca() doesn't work
> > correctly in the interrupt handler.  It appears that the value of r13 (the
> > PACA pointer) is not initialized.  Perhaps r13 is a scratch register used
> > by the compiler?
> 
> Weird.  It definitely should be correct.  Loading r13 is one of the
> first things the interrupt entry code does, and r13 is not a scratch
> register (it's normally the thread-local storage pointer).
> 
> If r13 is bogus then that's definitely a serious bug.

My evidence for this was that I was seeing get_paca()->soft_enable and 
get_paca()->hard_enable set to zero, but regs->softe set to 1.

Looking at the code again, I see that in STD_EXCEPTION_COMMON in
exception.h, the macro DISABLE_INTS is used, which zeros out the soft and
hard enable flags in the PACA.  So r13 is OK: the zeros I was reading were
written by DISABLE_INTS on entry, and I think using the regs struct is
the right way to go.
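
Here is a small user-space sketch of why the handler sees zeros in the
PACA but 1 in regs->softe.  It assumes the entry path copies the
soft-enable state into the saved frame before DISABLE_INTS clears the PACA
flags.  The struct and function names are invented for the example and do
not match the real kernel definitions.

```c
#include <assert.h>

/* Toy stand-ins for the PACA and pt_regs; names are hypothetical. */
struct toy_paca {
	int soft_enabled;
	int hard_enabled;
};

struct toy_pt_regs {
	unsigned long softe;	/* soft-enable state of the interrupted code */
};

/*
 * Models exception entry under the stated assumption: save the
 * soft-enable flag into the frame, then clear both PACA flags
 * (the DISABLE_INTS step).
 */
static void toy_exception_entry(struct toy_paca *paca, struct toy_pt_regs *regs)
{
	assert(paca && regs);

	regs->softe = (unsigned long)paca->soft_enabled;
	paca->soft_enabled = 0;
	paca->hard_enabled = 0;
}

/* What the handler should consult: the saved frame, not the PACA. */
static int toy_interrupted_code_was_soft_enabled(const struct toy_pt_regs *regs)
{
	return regs->softe != 0;
}
```

After toy_exception_entry() runs with both flags initially 1, the PACA
copy reads 0 for both while regs->softe still reads 1, which matches what
I saw in the handler.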


> 
> > Fortunately, because the soft enable flag is available in the pt_regs
> > structure, and because the hard enable flag will be set to the same as the
> > value of regs->msr's MSR_EE flag in the restore code, I now have this code
> > in the interrupt handler:
> > 
> > void perfmon_pmu_int_handler(struct pt_regs *regs) {
> > 
> >       if (regs->softe == 0) {
> >             /* disable hardware interrupts */
> >             regs->msr &= ~MSR_EE;
> >             return;
> >       }
> >       ...
> > }
> > 
> > This code does seem to be working, but needs more testing.
> 
> I expect that after a little while you'll stop getting PMU
> interrupts...
> 
> Paul.

I haven't seen that yet, but I am seeing another kernel hang with my test
case, which generates and handles about 2000 interrupts per second.  After
running for a random amount of time, it hangs the system.  If I restart the
system into xmon, I see a call stack that doesn't have any perfmon code in
it, but does have some kernel interrupt handling code.  The hang is the
same when using MASKABLE_EXCEPTION_PSERIES and STD_EXCEPTION_PSERIES (+
the above fix).

I'd love to be able to run that call stack by you, if you have time to 
look at it.

Thanks,

- Corey

Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR 
503-578-3507 
cjashfor at us.ibm.com



