[PATCH] powerpc: Emulate most Book I instructions in emulate_step()

Thu Jun 3 10:47:58 EST 2010

On Wed, Jun 02, 2010 at 07:45:27AM -0500, Kumar Gala wrote:

> Why do we need to have emu support for all of these instructions?

Fair question.  This arose in the context of the support for data
breakpoint events in perf_events.  Since the data breakpoint facility
on our processors (DABR on server, DAC/DVC on Book 3E) interrupts
before doing the access, we have to execute the instruction that
caused the breakpoint without the data breakpoint set, then put the
data breakpoint back and carry on.

The interesting case comes when the interrupt occurs on a
lwarx/ldarx.  If we just single-step it, we'll lose the reservation
and most likely get into an infinite loop, making no progress.  So we
have two alternatives: either try to arrange that we can single-step
the lwarx and get to the stwcx without losing the reservation, or
emulate the lwarx and all the instructions up to and including the
stwcx.
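
As a rough illustration of the no-progress problem, here is a toy
model in C.  It is not kernel code: every name in it (toy_cpu,
toy_lwarx, toy_stwcx, toy_debug_interrupt, toy_atomic_inc) is made up
for this sketch, and it assumes the worst case where each debug
interrupt clears the reservation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: one reservation per CPU, NULL when none is held. */
struct toy_cpu {
	uint32_t *resv_addr;
};

/* Model of lwarx: load the word and take a reservation on it. */
static uint32_t toy_lwarx(struct toy_cpu *cpu, uint32_t *addr)
{
	cpu->resv_addr = addr;
	return *addr;
}

/* Model of stwcx.: store only if the reservation is still held. */
static bool toy_stwcx(struct toy_cpu *cpu, uint32_t *addr, uint32_t val)
{
	bool ok = (cpu->resv_addr == addr);

	if (ok)
		*addr = val;
	cpu->resv_addr = NULL;	/* reservation is consumed either way */
	return ok;
}

/*
 * ASSUMPTION: taking a debug interrupt loses the reservation, e.g.
 * because the handler itself uses an atomic op or a spinlock.
 */
static void toy_debug_interrupt(struct toy_cpu *cpu)
{
	cpu->resv_addr = NULL;
}

/*
 * Atomic increment written as a lwarx/stwcx. retry loop.  With
 * single_step set, a debug interrupt is taken after each instruction.
 * Returns true if the store succeeded within max_tries attempts.
 */
static bool toy_atomic_inc(struct toy_cpu *cpu, uint32_t *addr,
			   bool single_step, int max_tries)
{
	assert(max_tries > 0);

	for (int i = 0; i < max_tries; i++) {
		uint32_t v = toy_lwarx(cpu, addr);

		if (single_step)
			toy_debug_interrupt(cpu);
		v += 1;
		if (single_step)
			toy_debug_interrupt(cpu);
		if (toy_stwcx(cpu, addr, v))
			return true;
	}
	return false;	/* never made progress */
}
```

In this model the single-stepped loop fails on every attempt, however
many retries it is given, while the same loop run without intervening
interrupts succeeds on the first try.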

The first alternative seemed pretty fragile to me since it means that
we have to arrange that we can single-step and take data breakpoints
without using any spinlocks, mutexes or atomic ops (including
bitops).  Also, the architecture says that some embedded
implementations might clear the reservation on taking an interrupt
(which presumably could include debug interrupts).

The second alternative -- emulating the lwarx/stwcx and all the
instructions in between -- sounds complicated but is in fact pretty
straightforward, since the code for each instruction is small, easy
to verify, and has little interaction with other code.
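
To give a feel for how small the per-instruction code can be, here is
a minimal sketch of emulating a single instruction, addi.  It is not
taken from the patch: struct toy_regs and toy_emulate_addi are
hypothetical names, and the sketch only assumes the standard D-form
encoding (primary opcode 14, with rA = 0 meaning the literal value 0
rather than the contents of r0).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register state for this sketch only. */
struct toy_regs {
	uint64_t gpr[32];
	uint64_t nip;		/* next instruction pointer */
};

/*
 * Emulate "addi rD,rA,SIMM" on the toy register state.
 * Returns true if instr was an addi and has been emulated,
 * false if it is some other instruction (state is left untouched).
 */
static bool toy_emulate_addi(struct toy_regs *regs, uint32_t instr)
{
	unsigned int opcode = instr >> 26;
	unsigned int rd = (instr >> 21) & 0x1f;
	unsigned int ra = (instr >> 16) & 0x1f;
	int64_t imm = (int16_t)(instr & 0xffff);	/* sign-extend SIMM */

	if (opcode != 14)
		return false;

	/* rA == 0 selects the constant 0, not the contents of r0 */
	regs->gpr[rd] = (ra ? regs->gpr[ra] : 0) + (uint64_t)imm;
	regs->nip += 4;
	return true;
}
```

For example, 0x38640001 encodes addi r3,r4,1, so with r4 holding 41
the sketch leaves 42 in r3 and advances nip by 4.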

Note that we have to do this emulation both for the kernel and for
user code, since a data breakpoint event could occur in the kernel or
in usermode.  While we can constrain what occurs between lwarx/stwcx
in the kernel pretty tightly, userspace is not so well constrained, so
I thought it best to do all the integer ops that can be done reasonably
easily and can occur in C code.

The other thing I want to do is use this to replace the alignment
fixup code, since they're doing very similar things now.  That will
need little-endian support plus implementing the rest of the Altivec
and VSX loads and stores, along with dcbz, l/stswi, l/stswx, etc.

Finally, emulating should be faster than single-stepping, and so
extending the set of emulated instructions should improve the
performance of kprobes and uprobes.

Paul.