power and percpu: Could we move the paca into the percpu area?

Benjamin Herrenschmidt benh at kernel.crashing.org
Thu Jun 12 06:22:11 EST 2014


On Wed, 2014-06-11 at 14:37 -0500, Christoph Lameter wrote:
> Looking at arch/powerpc/include/asm/percpu.h I see that the per cpu offset
> comes from a local_paca field and local_paca is in r13. That means that
> for all percpu operations we first have to determine the address through a
> memory access.
> 
> Would it be possible to put the paca at the beginning of the percpu data
> area and then have r31 point to the percpu area?
> 
> power has these nice instructions that fetch from an offset relative to a
> base register which could be used throughout for percpu operations in the
> kernel (similar to x86 segment registers).
> 
> With that we may also be able to use the atomic ops for fast percpu access
> so that we can avoid the irq enable/disable sequence that is now required
> for percpu atomics. Would result in fast and reliable percpu
> counters for powerpc.

So... this is complicated :) It's something I've wanted to tackle
for a while but haven't had a chance to.

The issues off the top of my head are:

 - The PACA must be accessible in real mode, which means that when
running under a hypervisor, it must be allocated in the "RMA" which is
the low part of memory up to a limit that depends on the hypervisor, but
can be as low as 128M on some older machines.

 - However, we use percpu more than the PACA in normal kernel C code;
the PACA is mostly used during exception entry/exit, KVM, and for
interrupt soft-enable/disable. So it might make sense to change things
so that r13 contains the per-cpu offset instead. That said, making that
change and updating the asm to cope isn't a trivial undertaking.

 - A direct offset from r13 in asm works as long as the offset is
within the signed 32k range (a signed 16-bit displacement). Otherwise we
need at least one more addis instruction. Anton mentioned, however, that
the linker may have some smarts for removing that addis if the high part
of the offset happens to be 0.

 - For atomics, the jury is still out as to whether it would be faster
or not. The atomic ops (lwarx/stwcx.) are expensive. Among other things,
they flush the value out of the L1 (to L2). On the other hand, we have
interrupt soft-disable, so masking interrupts isn't very expensive.
Unmasking, while cheap, is currently out of line however. I have been
wondering if we could instead move some of the soft-irq state to a CR
field and mark that -ffixed with gcc so we can make irq
soft-disable/enable even faster and more in-line.
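
To make the 32k point above concrete, here is a rough C model of how an
offset from a base register splits into an addis immediate plus a signed
16-bit displacement. It is only an illustrative sketch: the helper names
(fits_d_form, lo16, ha16, insns_needed) are made up for this example and
are not kernel or toolchain functions, and it assumes the high part itself
fits in a signed 16-bit immediate.

```c
#include <assert.h>
#include <stdint.h>

/* A D-form load/store such as "ld rT, disp(r13)" carries a signed 16-bit
 * displacement, so a single instruction reaches -32768..32767 bytes. */
static int fits_d_form(int64_t off)
{
    return off >= -32768 && off <= 32767;
}

/* Low half of the offset, sign-extended the same way the hardware
 * sign-extends the displacement field. */
static int32_t lo16(int32_t off)
{
    return (int32_t)(((uint32_t)off & 0xffffu) ^ 0x8000u) - 0x8000;
}

/* "High adjusted" half: the immediate for the extra addis, which adds
 * (ha << 16) to the base. It absorbs the borrow from a negative low half.
 * Assumption: the result fits in a signed 16-bit immediate. */
static int32_t ha16(int32_t off)
{
    return (int32_t)(((int64_t)off - lo16(off)) / 65536);
}

/* Instructions needed to access base + off in this model: 1 when the
 * addis immediate is 0 and the addis can be dropped, otherwise 2. */
static int insns_needed(int32_t off)
{
    int32_t ha = ha16(off);
    int32_t lo = lo16(off);

    /* The two halves must recombine to the original offset. */
    assert((int64_t)ha * 65536 + lo == off);
    /* The addis is redundant exactly when the offset fits the D-form. */
    assert((ha == 0) == fits_d_form(off));

    return ha == 0 ? 1 : 2;
}
```

For example, an offset of 0x12345 splits into a high part of 1 and a low
part of 0x2345, so it needs the extra addis, while anything in
-32768..32767 has a high part of 0 and needs only the load or store itself.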

Cheers,
Ben.



