[RFC PATCH 0/2] powerpc: CR based local atomic operation implementation

Thu Dec 18 20:52:37 AEDT 2014

From: Rusty Russell
> David Laight <David.Laight at ACULAB.COM> writes:
> > From: Madhavan Srinivasan [mailto:maddy at linux.vnet.ibm.com]
> > ...
> >> >>> I also wonder if it is possible to inspect the interrupted
> >> >>> code to determine the start/end of the RAS block.
> >> >>> (Easiest if you assume that there is a single 'write' instruction
> >> >>> as the last entry in the block.)
> >> >>>
> >> >> So each local_* function also has code in the __ex_table section. IIUC,
> >> >> __ex_table contains two addresses. So if the return address is found in the
> >> >> first column of the __ex_table, use the corresponding address in the
> >> >> second column to continue from.
> >> >
> >> > That really doesn't scale.
> >> > I don't know how many 1000 address pairs your table will have (and the
> >> > ones in each loadable module), but the search isn't going to be cheap.
> >> >
> >> > If these sequences are restartable then they can only have one write
> >> > to memory.
...
> >> 2) the resulting code, with a lot of conditions and branches (for opcode
> >> decode), will be a lot messier and may be an issue for maintenance,
> >
> > You don't need to decode the instructions.
> > Just look for the two specific instructions used as markers.
> > This is only really possible with fixed-size instructions.
> >
> > It might also be that the 'interrupt entry' path is easier to
> > modify than the 'interrupt exit' one (fewer code paths) and
> > you just need to modify the 'pc' in the stack frame.
> > You are only interested in interrupts from kernel space.
> 
> It's an overoptimization for a case that statistically never happens.
> You won't even be able to measure the difference.
> 
> The question of bloat remains, but that's also easily measured.  In
> practice, I'd guess less than 1k.

IIRC they were 'static inline' so the table of addresses is generated
for every use site.
(copyin/out generates a similarly enormous table of addresses on amd64)

If they were real functions (so only appeared once) it wouldn't be as bad.
Indeed, in that case, by putting all such functions into a separate code
section a simple 'window test' can be done on the return address instead
of reserving one of the CR bits.
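
As a rough sketch of that 'window test' (not code from the patch): assume
the restartable functions are linked into one contiguous section, and that
its bounds are available as two addresses. The struct and function names
below are made up for illustration; in a kernel the bounds would more
likely come from linker-script symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds of the section holding all restartable sequences. */
struct text_window {
	uintptr_t start;	/* first byte of the section */
	uintptr_t end;		/* one past the last byte of the section */
};

/*
 * Return true if the interrupted pc lies inside the window.
 * One unsigned compare covers both bounds: when pc < start the
 * subtraction wraps to a very large value and the test fails.
 */
static bool pc_in_window(const struct text_window *w, uintptr_t pc)
{
	assert(w->start <= w->end);
	return pc - w->start < w->end - w->start;
}
```

The interrupt path would only go on to look for a restart address when
this test succeeds, so the common case costs a subtract and a compare.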

You also only need to save the start and end of each block, not the
restart address for every instruction within the block.
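
Something like the following is what I have in mind, as a sketch only.
It assumes one {start, end} pair per block, sorted by address and not
overlapping, with 'end' being the address of the final store in the
block. The names (ras_block, ras_fixup_pc) are invented here and are not
in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one pair per restartable block. */
struct ras_block {
	uintptr_t start;	/* first instruction of the block */
	uintptr_t end;		/* the final store instruction (assumed) */
};

/*
 * If pc (the next instruction to execute) is inside a block, the store
 * has not happened yet, so return the block start to restart it.
 * Otherwise return pc unchanged. The table must be sorted by address.
 */
static uintptr_t ras_fixup_pc(const struct ras_block *tbl, size_t n,
			      uintptr_t pc)
{
	size_t lo = 0, hi = n;

	assert(tbl != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pc < tbl[mid].start)
			hi = mid;		/* look in the lower half */
		else if (pc > tbl[mid].end)
			lo = mid + 1;		/* look in the upper half */
		else
			return tbl[mid].start;	/* inside: restart block */
	}
	return pc;				/* not in any block */
}
```

With real functions there is one entry per function rather than one per
use site, so the binary search is over a handful of entries.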

	David