[PATCH] Fix performance monitor exception in 2.6.20-series
Livio Soares
livio at eecg.toronto.edu
Mon Jan 15 04:56:34 EST 2007
Hi Ben,
First, I'd like to state that I have, since writing my first e-mail,
experimented with Oprofile on 2.6.20-rc4, and it _is_ affected as I theorized. I
get something around 5 to 7 PMU exceptions, and no more. With my patch,
exceptions keep coming as before the lazy IRQ patch.
Benjamin Herrenschmidt writes:
>
> > IMHO, option #1 is very nice, as long as the PMU interrupt handler behaves
> > itself. One reason option #1 is desirable is, with PC-sampling, we are now able
> > to sample regions _inside_ interrupt-disabled sections (assuming an actual
> > external interrupt hasn't really occurred yet). Before, with hardware disabling
> > of interrupts, the PMU exceptions were necessarily delivered outside of
> > interrupt disabled sections.
> >
> > Anyways, does anyone see a problem with the following patch?
>
> Well, are you absolutely sure that nothing will break as a result of
> having a PMU interrupt happening right when it's not expected to ?
>
> You are basically turning the PMU interrupt into an NMI... I'm not sure
> how safe that is.
Yes, it is turning the PMU exception into an NMI. And, you are correct, it has
potential for problems. However, if you look closely through the current
Oprofile code, it doesn't seem to execute anything dangerous. We have:
a) Looking at local CPU registers
b) Looking at current stack (when logging backtrace is enabled)
c) Writing information to a per-CPU pre-allocated buffer. This is done without
any form of locking.
d) PMU exception nesting cannot occur (at least on the PowerPC machines I've
looked at). Handling must 'rfid' before the PMU can deliver another
exception.
So, unless I missed something, the current code seems to be safe.
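To make points (c) and (d) concrete, here is a minimal user-space sketch of a
lock-free, pre-allocated per-CPU sample buffer. It is not the actual Oprofile
code: every name in it (NR_CPUS, BUF_SLOTS, struct cpu_buf, log_sample,
read_sample) is hypothetical and chosen for illustration only. The sketch
assumes a single writer per CPU (the handler, which cannot nest) and a single
reader.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS   4   /* hypothetical CPU count */
#define BUF_SLOTS 8   /* hypothetical per-CPU capacity */

struct sample {
    uintptr_t pc;     /* sampled program counter */
    int in_kernel;    /* 1 if the sample was taken in kernel mode */
};

struct cpu_buf {
    struct sample slot[BUF_SLOTS];
    unsigned long head;  /* advanced only by the handler on this CPU */
    unsigned long tail;  /* advanced only by the reader */
    unsigned long lost;  /* samples dropped because the buffer was full */
};

/* Pre-allocated, so the handler never allocates memory. */
static struct cpu_buf bufs[NR_CPUS];

/*
 * Simulated body of the PMU exception handler.
 * Takes no locks, so it cannot deadlock against the code it interrupted.
 * Returns 1 if the sample was stored, 0 if it was dropped.
 */
static int log_sample(unsigned cpu, uintptr_t pc, int in_kernel)
{
    assert(cpu < NR_CPUS);
    struct cpu_buf *b = &bufs[cpu];

    if (b->head - b->tail == BUF_SLOTS) {
        b->lost++;       /* full: drop the sample rather than wait */
        return 0;
    }
    b->slot[b->head % BUF_SLOTS] = (struct sample){ pc, in_kernel };
    /* Real kernel code would need a write barrier before publishing. */
    b->head++;
    return 1;
}

/* Reader side: copies out the oldest sample, if there is one. */
static int read_sample(unsigned cpu, struct sample *out)
{
    assert(cpu < NR_CPUS);
    struct cpu_buf *b = &bufs[cpu];

    if (b->tail == b->head)
        return 0;        /* empty */
    *out = b->slot[b->tail % BUF_SLOTS];
    b->tail++;
    return 1;
}
```

Because each index has exactly one writer, the handler can interrupt the reader
(or any interrupt-disabled section) at any point without corrupting the buffer.
In the worst case a sample is dropped and counted in lost.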
Another thing I tried was stress testing 2.6.20-rc4 with my patch and Oprofile
turned on. I used an Apache2 benchmark for about 30 minutes. Everything worked
as usual. I realize this test does not guarantee the safety of the code;
however, it served as a sanity check for obvious, easy-to-trigger bugs.
Thanks,
Livio