[PATCH] powerpc/64s: Add support for ASB_Notify on POWER9
christophe lombard
clombard at linux.vnet.ibm.com
Fri Aug 11 06:32:12 AEST 2017
Le 07/08/2017 à 05:08, Michael Ellerman a écrit :
> Hi Christophe,
>
> I'm not across any of the details of this so hopefully most of these
> comments aren't too stupid :)
>
>
> Christophe Lombard <clombard at linux.vnet.ibm.com> writes:
>
>> The POWER9 core supports a new feature: ASB_Notify which requires the
>> support of the Special Purpose Register: TIDR.
> TIDR is defined in ISA 3.0B, which would be worth mentioning.
>
> Can you spell out what an ASB_Notifiy is.
In addition to waking a thread up from the wait state via an
interrupt, the thread can also be removed from the wait state
via a ASB_notify command.
We see the acronyme ASB present in several documents, but
I never saw the definition of ASB.
>> The ASB_Notify command, generated by the AFU, will attempt to
> And please tell us what an AFU is, this patch and the code it touches
> are not obviously CAPI related, so the reader may not know what an AFU
> is, or the broader context.
You are right, this patch is not obviously CAPI related.
The Coherent Accelerator Process Interface (CAPI) is a general
term for the infrastructure of attaching a coherent
accelerator (CAPI card) to a POWER system.
The AFU (Accelerator Function Unit), located on the CAPI card,
executes computation-heavy functions helping the
application running on the host processor. There is a direct
communication between the application and the accelerator.
The purpose of an AFU is to provide applications with a higher
computational unit to improve the performance of the
application and off-load the host processor.
>> wake-up the host thread identified by the particular LPID:PID:TID.
> Host implies it doesn't work for threads in guests, but then LPID
> implies it does.
>
> You say "PID" a few times here, but I think you always mean PIDR? It
> would be worth spelling that out, otherwise people will assume you're
> talking about the Linux "pid" (which is actually the thread group id
> (TGID)), which is not the same as PIDR.
You are completely right. The use of the term 'PIDR' is much
more accurate.
>> The special register TIDR has to be updated to with the same value
>> present in the process element.
> What's a process element? :)
An other CAPI term :-)
When an application (running on the host) requests use of an AFU,
a process element is added to the process-element linked list
that describes the application’s process state. The process element
also contains a work element descriptor (WED) provided by the
application. The WED contains the full description of the job to be
performed by the AFU.
>> If the length of the register TIDR is 64bits, the CAPI Translation
> Is it 64-bits? The architecture says it is.
yes. The length is 64-bits.
>> Service Layer core (XSL9) for Power9 systems limits the size (16bits) of
>> the Thread ID when it generates the ASB_Notify message adding
>> PID:LPID:TID information from the context.
> When you say "Thread ID" here I think you're referring to the "TID" in
> the "PID:LPID:TID" tuple, and you're saying it is limited to 16-bits, by
> the P9 CAPI hardware.
correct.
>> The content of the internal kernel Thread ID (32bits) can not therefore
> "as returned by sys_gettid()" would help to disambiguate here.
>
>> be used to fulfill the register TIDR.
> .. if you're intention is to use TIDR with CAPI.
>
>
> I'm assuming here that by "kernel Thread ID" you mean the kernel "pid"
> which is returned to userspace as the "TID" by gettid().
correct.
>
> That value is global in the system, so it's overkill for this usage, as
> all you need is a value unique to all threads that share a PIDR
> (== mm_struct).
>
> So it seems like we could also solve this by simply having a counter in
> the mm_struct (actually mm_context_t), and handing that out when
> userspace tries to read TIDR.
>
> Or we could do the same and have some other API for reading it, ie. not
> using mfspr() but an actual CAPI API. Or does userspace even need the
> raw value?
Your remarks are very interesting. I need more time to look
into that.
> Is there a usecase where multiple threads would assign themselves the
> same TIDR so that all (some? one?) of them is woken up by the
> ASB_notify? Or is that an error?
Normally no. Only one thread can have a non zero TIDR value.
>> This patch allows to avoid this limitation by adding a new interface
> This doesn't avoid the 16-bit CAPI limitation AFAICS. What it does is
> let (force) userspace manage the values of TIDR, and therefore it can
> control them such that they fit in 16-bits.
right. The control will be done by libcxl. Libcxl is the focal point
to manage the values of TIDR.
>> for the user. The instructions mfspr/mtspr SPRN_TIDR are emulated,
>> save/restore SPRs (context switch) are updated and a new feature
>> (CPU_FTR_TIDR) is added to POWER9 system.
> I'm not clear if we need a CPU feature.
>
> TIDR is defined in ISA 3.0B, so anything that implements ISA 3.0B must
> implemented TIDR.
>
> It's not entirely clear if the kernel's CPU_FTR_ARCH_300 means 3.0 or
> 3.0B, but given 3.0B is essentially "what got implemented in P9" it
> should probably be the latter.
>
> In which case CPU_FTR_TIDR == CPU_FTR_ARCH_300.
okay. good point.
> I see you don't define it on P9 DD1, but that could be handled as a DD1
> workaround, rather than a separate feature bit for TIDR on non-DD1.
>
>> diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
>> index d02ad93..706f668 100644
>> --- a/arch/powerpc/include/asm/cputable.h
>> +++ b/arch/powerpc/include/asm/cputable.h
>> @@ -215,6 +215,7 @@ enum {
>> #define CPU_FTR_DABRX LONG_ASM_CONST(0x0800000000000000)
>> #define CPU_FTR_PMAO_BUG LONG_ASM_CONST(0x1000000000000000)
>> #define CPU_FTR_POWER9_DD1 LONG_ASM_CONST(0x4000000000000000)
>> +#define CPU_FTR_TIDR LONG_ASM_CONST(0x8000000000000000)
>>
>> #ifndef __ASSEMBLY__
>>
>> @@ -474,7 +475,8 @@ enum {
>> CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
>> CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
>> CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
>> - CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_300)
>> + CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | \
>> + CPU_FTR_TIDR)
>> #define CPU_FTRS_POWER9_DD1 ((CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD1) & \
>> (~CPU_FTR_SAO))
>> #define CPU_FTRS_CELL (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
>> diff --git a/arch/powerpc/include/asm/emulated_ops.h b/arch/powerpc/include/asm/emulated_ops.h
>> index f00e10e..e83ad42 100644
>> --- a/arch/powerpc/include/asm/emulated_ops.h
>> +++ b/arch/powerpc/include/asm/emulated_ops.h
>> @@ -54,6 +54,8 @@ extern struct ppc_emulated {
>> #ifdef CONFIG_PPC64
>> struct ppc_emulated_entry mfdscr;
>> struct ppc_emulated_entry mtdscr;
>> + struct ppc_emulated_entry mftidr;
>> + struct ppc_emulated_entry mttidr;
>> struct ppc_emulated_entry lq_stq;
>> #endif
>> } ppc_emulated;
>> diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
>> index fa9ebae..3ebc446 100644
>> --- a/arch/powerpc/include/asm/ppc-opcode.h
>> +++ b/arch/powerpc/include/asm/ppc-opcode.h
>> @@ -241,6 +241,10 @@
>> #define PPC_INST_MFSPR_DSCR_USER_MASK 0xfc1ffffe
>> #define PPC_INST_MTSPR_DSCR_USER 0x7c0303a6
>> #define PPC_INST_MTSPR_DSCR_USER_MASK 0xfc1ffffe
>> +#define PPC_INST_MFSPR_TIDR 0x7d2452a6
> Are you sure that's right?
I am pretty sure. But I will check that.
> The arch says TIDR is SPR 144 == 0x90.
>
> Binutils tells me that ^ is:
>
> mfspr r9,324
>
> Which is 0x144 ?
>
>> +#define PPC_INST_MFSPR_TIDR_MASK 0xfd2ffffe
>> +#define PPC_INST_MTSPR_TIDR 0x7d2453a6
>> +#define PPC_INST_MTSPR_TIDR_MASK 0xfd2ffffe
>> #define PPC_INST_MFVSRD 0x7c000066
>> #define PPC_INST_MTVSRD 0x7c000166
>> #define PPC_INST_SLBFEE 0x7c0007a7
>> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
>> index fab7ff8..58cc212 100644
>> --- a/arch/powerpc/include/asm/processor.h
>> +++ b/arch/powerpc/include/asm/processor.h
>> @@ -329,6 +329,7 @@ struct thread_struct {
>> */
>> int dscr_inherit;
>> unsigned long ppr; /* used to save/restore SMT priority */
>> + unsigned long tidr;
>> #endif
>> #ifdef CONFIG_PPC_BOOK3S_64
>> unsigned long tar;
>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> index 9f3e2c9..f06ea10 100644
>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -1084,6 +1084,9 @@ static inline void save_sprs(struct thread_struct *t)
>> if (cpu_has_feature(CPU_FTR_DSCR))
>> t->dscr = mfspr(SPRN_DSCR);
>>
>> + if (cpu_has_feature(CPU_FTR_TIDR))
>> + t->tidr = mfspr(SPRN_TIDR);
> You shouldn't ever need to save it here.
>
> It's only ever written by the emulated mtspr, so the kernel can save the
> new value when that happens, meaning we don't have to save it on every
> context switch of every process.
>
> In fact you already do that, so you can just drop this hunk.
okay.
>> @@ -1120,6 +1123,11 @@ static inline void restore_sprs(struct thread_struct *old_thread,
>> mtspr(SPRN_DSCR, dscr);
>> }
>>
>> + if (cpu_has_feature(CPU_FTR_TIDR)) {
>> + if (old_thread->tidr != new_thread->tidr)
>> + mtspr(SPRN_TIDR, new_thread->tidr);
>> + }
> We could also optimise this for the common case of threads that don't
> use TIDR at all.
>
> Can we declare TIDR = 0 invalid? If so that could become:
>
> if (new_thread->tidr && (old_thread->tidr != new_thread->tidr))
> mtspr(SPRN_TIDR, new_thread->tidr);
This optimization sounds good.
>
> Or if TIDR = 0 needs to be allowed, we could add flag to the thread struct
> saying whether TIDR has been read or written.
>
>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>> index bfcfd9e..b829de7 100644
>> --- a/arch/powerpc/kernel/traps.c
>> +++ b/arch/powerpc/kernel/traps.c
>> @@ -1141,6 +1141,25 @@ static int emulate_instruction(struct pt_regs *regs)
>> mtspr(SPRN_DSCR, current->thread.dscr);
>> return 0;
>> }
>> + /* Emulate the mfspr rD, TIDR. */
>> + if (((instword & PPC_INST_MFSPR_TIDR_MASK) ==
>> + PPC_INST_MFSPR_TIDR) &&
>> + cpu_has_feature(CPU_FTR_TIDR)) {
>> + PPC_WARN_EMULATED(mftidr, regs);
>> + rd = (instword >> 21) & 0x1f;
>> + regs->gpr[rd] = mfspr(SPRN_TIDR);
>> + return 0;
>> + }
>> + /* Emulate the mtspr TIDR, rD. */
>> + if (((instword & PPC_INST_MTSPR_TIDR_MASK) ==
>> + PPC_INST_MTSPR_TIDR) &&
>> + cpu_has_feature(CPU_FTR_TIDR)) {
>> + PPC_WARN_EMULATED(mttidr, regs);
>> + rd = (instword >> 21) & 0x1f;
>> + current->thread.tidr = regs->gpr[rd];
>> + mtspr(SPRN_TIDR, regs->gpr[rd]);
>> + return 0;
>> + }
>> #endif
>
>
> cheers
>
Thanks a lot for this review.
More information about the Linuxppc-dev
mailing list