[RFC PATCH 0/2] powerpc: CR based local atomic operation implementation

Rusty Russell rusty at rustcorp.com.au
Thu Dec 18 21:53:56 AEDT 2014


David Laight <David.Laight at ACULAB.COM> writes:
> From: Rusty Russell
>> David Laight <David.Laight at ACULAB.COM> writes:
>> > From: Madhavan Srinivasan [mailto:maddy at linux.vnet.ibm.com]
>> > ...
>> >> >>> I also wonder if it is possible to inspect the interrupted
>> >> >>> code to determine the start/end of the RAS block.
>> >> >>> (Easiest if you assume that there is a single 'write' instruction
>> >> >>> as the last entry in the block.)
>> >> >>>
>> >> >> So each local_* function also has code in the __ex_table section. IIUC,
>> >> >> each __ex_table entry contains two addresses. So if the return address is
>> >> >> found in the first column of the __ex_table, use the corresponding address
>> >> >> in the second column to continue from.
>> >> >
>> >> > That really doesn't scale.
>> >> > I don't know how many thousand address pairs your table will have (and
>> >> > the ones in each loadable module), but the search isn't going to be cheap.
>> >> >
>> >> > If these sequences are restartable then they can only have one write
>> >> > to memory.
> ...
>> >> 2) the resulting code, with lots of conditions and branches (for opcode
>> >> decode), will be messy and may be an issue in case of maintenance,
>> >
>> > You don't need to decode the instructions.
>> > Just look for the two specific instructions used as markers.
>> > This is only really possible with fixed-size instructions.
>> >
>> > It might also be that the 'interrupt entry' path is easier to
>> > modify than the 'interrupt exit' one (fewer code paths) and
>> > you just need to modify the 'pc' in the stack frame.
>> > You are only interested in interrupts from kernel space.
>> 
>> It's an overoptimization for a case that statistically never happens.
>> You won't even be able to measure the difference.
>> 
>> The question of bloat remains, but that's also easily measured.  In
>> practice, I'd guess less than 1k.
>
> IIRC they were 'static inline' so the table of addresses is generated
> for every use site.
> (copyin/out generates a similarly enormous table of addresses on amd64)

There are about 20 callers in the entire kernel.
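
For illustration only, here is a minimal sketch of the kind of table lookup
being discussed: a table of address pairs sorted by the first column and
searched with a binary search. The names (struct ras_entry, ras_lookup) and
the layout are hypothetical, made up for this example; this is not the
kernel's __ex_table code nor the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout for this sketch, not the kernel's own struct. */
struct ras_entry {
    uintptr_t addr;     /* first column: address the interrupt returned to */
    uintptr_t restart;  /* second column: address to continue from instead */
};

/*
 * Binary search over a table sorted by ascending addr.
 * Returns the restart address for pc, or 0 if pc is not in the table.
 */
static uintptr_t ras_lookup(const struct ras_entry *table, size_t n,
                            uintptr_t pc)
{
    size_t lo = 0, hi = n;

    assert(n == 0 || table != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].addr == pc)
            return table[mid].restart;
        if (table[mid].addr < pc)
            lo = mid + 1;   /* pc lies in the upper half */
        else
            hi = mid;       /* pc lies in the lower half */
    }
    return 0;
}
```

With a sorted table of 20 entries this loop makes at most five probes, and
it only needs to run for interrupts taken from kernel space.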

Cheers,
Rusty.