[RFC] Add IBM Blue Gene/Q Platform
Jimi Xenidis
jimix at pobox.com
Mon Dec 10 16:56:07 EST 2012
On Dec 9, 2012, at 6:47 PM, Michael Neuling <mikey at neuling.org> wrote:
> Jimi Xenidis <jimix at pobox.com> wrote:
>
>>
>> On Dec 7, 2012, at 7:38 AM, Jimi Xenidis <jimix at pobox.com> wrote:
>>
>>>
>>> On Dec 6, 2012, at 11:54 PM, Michael Neuling <mikey at neuling.org> wrote:
>>>
>>>>> commit 279c0615917b959a652e81f4ad0d886e2d426d85
>>>>> Author: Jimi Xenidis <jimix at pobox.com>
>>>>> Date: Wed Dec 5 13:43:22 2012 -0500
>>>>>
>>>>> powerpc/book3e: IBM Blue Gene/Q Quad Processing eXtention (QPX)
>>>>>
>>>>> This enables kernel support for the QPX extention and is intended for
>>>>> processors that support it, usually an IBM Blue Gene processor.
>>>>> Turning it on does not effect other processors but it does add code
>>>>> and will quadruple the per thread save and restore area for the FPU
>>>>> (hense the name). If you have enabled VSX it will only double the
>>>>> space.
>>>>>
>>>>> Signed-off-by: Jimi Xenidis <jimix at pobox.com>
>>
>> <<snip>>
>>>>
>>>>
>>>>
>>>> +BEGIN_FTR_SECTION \
>>>> + SAVE_32VSRS(n,c,base); \
>>>> +END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
>>>> +BEGIN_FTR_SECTION \
>>>> + SAVE_32QRS(n,c,base); \
>>>> +END_FTR_SECTION_IFSET(CPU_FTR_QPX);
>>>>
>>>> I don't think we want to do this. We are going to end up with 64
>>>> NOPS here somewhere.
>>>
>>> Excellent point, NOPs are cheap on most processors but not A2 and a lot of embedded, I can wrap some branches with the FTR instead.
>>> Do you have a concern on the code size?
>>
>> Thought about it a bit and came up with this solution for arch/powerpc/kernel/fpu.S.
>> This should address the following issues
>> - MSR_VSX vs MSR_VEC
>> - Big chunks of NOPs in the code path
>> - Less FTR space fixups at boot time.
>> - IMNHSO easier to read especially when disassembled
>
> Indeed, I think it looks better. I was going to mention that it was
> already pretty complex to read, so a rewrite like this was probably
> needed. So thanks!!
>
> That being said, there is a pretty complex testing matrix of
> CONFIG_VSX/VMX/FPU/QPX/SMP/64/32 CPU_FTR/VSX/FPU/QPX/VMX so I'd need to
> look/test more carefully to make sure all of these are covered.
>
> Also, transactional memory (see
> http://lists.ozlabs.org/pipermail/linuxppc-dev/2012-November/102216.html)
> will change this code. You should rebase on top of that if you really
> want it considered for upstream.
Is this in a git tree anywhere? perhaps BenH's next branch?
-jx
>
> Mikey
>
>>
>> I did consider using the LR and BLR, but the !CONFIG_SMP case only adds one more special block and uses a different register set.
>> Also if this is agreeable I would like us to consider removing the *_32FPVSRS* macros entirely and put the FTR tests in the actual code.
>> This would allow us to use #ifdefs and reduce the amount of code that actually gets compiled.
>>
>> Thoughts?
>>
>> diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
>> index e0ada05..5964067 100644
>> --- a/arch/powerpc/kernel/fpu.S
>> +++ b/arch/powerpc/kernel/fpu.S
>> @@ -25,30 +25,81 @@
>> #include <asm/asm-offsets.h>
>> #include <asm/ptrace.h>
>>
>> -#ifdef CONFIG_VSX
>> -#define __REST_32FPVSRS(n,c,base) \
>> -BEGIN_FTR_SECTION \
>> - b 2f; \
>> -END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
>> - REST_32FPRS(n,base); \
>> - b 3f; \
>> -2: REST_32VSRS(n,c,base); \
>> -3:
>> -
>> -#define __SAVE_32FPVSRS(n,c,base) \
>> -BEGIN_FTR_SECTION \
>> - b 2f; \
>> -END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
>> - SAVE_32FPRS(n,base); \
>> - b 3f; \
>> -2: SAVE_32VSRS(n,c,base); \
>> -3:
>> -#else
>> -#define __REST_32FPVSRS(n,b,base) REST_32FPRS(n, base)
>> -#define __SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n, base)
>> -#endif
>> -#define REST_32FPVSRS(n,c,base) __REST_32FPVSRS(n,__REG_##c,__REG_##base)
>> -#define SAVE_32FPVSRS(n,c,base) __SAVE_32FPVSRS(n,__REG_##c,__REG_##base)
>> +
>> +/*
>> + * Restore subroutines, R4 is scratch and R5 is base
>> + */
>> +vsx_restore:
>> + REST_32VSRS(0, __REG_R4, __REG_R5)
>> + b after_restore
>> +qpx_restore:
>> + REST_32QRS(0, __REG_R4, __REG_R5)
>> + b after_restore
>> +fpu_restore:
>> + REST_32FPRS(0, __REG_R5)
>> + b after_restore
>> +
>> +#define REST_32FPVSRS(n, c, base) \
>> +BEGIN_FTR_SECTION \
>> + b vsx_restore; \
>> +END_FTR_SECTION_IFSET(CPU_FTR_VSX) \
>> +BEGIN_FTR_SECTION \
>> + b qpx_restore; \
>> +END_FTR_SECTION_IFSET(CPU_FTR_QPX) \
>> + b fpu_restore; \
>> +after_restore:
>> +
>> +/*
>> + * Save subroutines, R4 is scratch and R3 is base
>> + */
>> +vsx_save:
>> + SAVE_32VSRS(0, __REG_R4, __REG_R3)
>> + b after_save
>> +qpx_save:
>> + SAVE_32QRS(0, __REG_R4, __REG_R3)
>> + b after_save
>> +fpu_save:
>> + SAVE_32FPRS(0, __REG_R3)
>> + b after_save
>> +
>> +#define SAVE_32FPVSRS(n, c, base) \
>> +BEGIN_FTR_SECTION \
>> + b vsx_save; \
>> +END_FTR_SECTION_IFSET(CPU_FTR_VSX) \
>> +BEGIN_FTR_SECTION \
>> + b qpx_save; \
>> +END_FTR_SECTION_IFSET(CPU_FTR_QPX) \
>> + b fpu_save; \
>> +after_save:
>> +
>> +#ifndef CONFIG_SMP
>> +/*
>> + * we need an extra save set for the !CONFIG_SMP case, see below
>> + * Scratch it R5 and base is R4
>> + */
>> +vsx_save_nosmp:
>> + SAVE_32VSRS(0,R5,R4)
>> + b after_save_nosmp
>> +
>> +qpx_save_nosmp:
>> + SAVE_32QRS(0,R5,R4)
>> + b after_save_nosmp
>> +
>> +fpu_save_nosmp:
>> + SAVE_32FPRS(0,R4)
>> + b after_save_nosmp
>> +
>> +#define SAVE_32FPVSRS_NOSMP(n,c,base) \
>> +BEGIN_FTR_SECTION \
>> + b vsx_save_nosmp; \
>> +END_FTR_SECTION_IFSET(CPU_FTR_VSX) \
>> +BEGIN_FTR_SECTION \
>> + b qpx_save_nosmp; \
>> +END_FTR_SECTION_IFSET(CPU_FTR_QPX) \
>> + b fpu_save_nosmp; \
>> +after_save_nosmp:
>> +
>> +#endif /* !CONFIG_SMP */
>>
>> /*
>> * This task wants to use the FPU now.
>> @@ -65,6 +116,11 @@ BEGIN_FTR_SECTION
>> oris r5,r5,MSR_VSX at h
>> END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>> #endif
>> +#ifdef CONFIG_PPC_QPX
>> +BEGIN_FTR_SECTION
>> + oris r5,r5,MSR_VEC at h
>> +END_FTR_SECTION_IFSET(CPU_FTR_QPX)
>> +#endif
>> SYNC
>> MTMSRD(r5) /* enable use of fpu now */
>> isync
>> @@ -81,7 +137,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>> beq 1f
>> toreal(r4)
>> addi r4,r4,THREAD /* want last_task_used_math->thread */
>> - SAVE_32FPVSRS(0, R5, R4)
>> + SAVE_32FPVSRS_NOSMP(0, R5, R4)
>> mffs fr0
>> stfd fr0,THREAD_FPSCR(r4)
>> PPC_LL r5,PT_REGS(r4)
>> @@ -94,7 +150,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>> #endif /* CONFIG_SMP */
>> /* enable use of FP after return */
>> #ifdef CONFIG_PPC32
>> - mfspr r5,SPRN_SPRG_THREAD /* current task's THREAD (phys) */
>> + mfspr r5,SPRN_SPRG_THREAD /* current task's THREAD (phys) */
>> lwz r4,THREAD_FPEXC_MODE(r5)
>> ori r9,r9,MSR_FP /* enable FP for current */
>> or r9,r9,r4
>> @@ -132,6 +188,11 @@ BEGIN_FTR_SECTION
>> oris r5,r5,MSR_VSX at h
>> END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>> #endif
>> +#ifdef CONFIG_PPC_QPX
>> +BEGIN_FTR_SECTION
>> + oris r5,r5,MSR_VEC at h
>> +END_FTR_SECTION_IFSET(CPU_FTR_QPX)
>> +#endif
>> SYNC_601
>> ISYNC_601
>> MTMSRD(r5) /* enable use of fpu now */
>> @@ -148,10 +209,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>> beq 1f
>> PPC_LL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
>> li r3,MSR_FP|MSR_FE0|MSR_FE1
>> -#ifdef CONFIG_VSX
>> +#if defined(CONFIG_VSX) || defined(CONFIG_PPC_QPX)
>> BEGIN_FTR_SECTION
>> oris r3,r3,MSR_VSX at h
>> -END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>> +END_FTR_SECTION_IFSET(CPU_FTR_VSX | CPU_FTR_QPX)
>> #endif
>> andc r4,r4,r3 /* disable FP for previous task */
>> PPC_STL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
>>
>> -jx
>>
>>
>>>
>>>>
>>>> I'd like to see this patch broken into different parts.
>>>
>>> I'm not sure how _this_ patch:
>>> <https://github.com/jimix/linux-bgq/commit/279c0615917b959a652e81f4ad0d886e2d426d85>
>>> could be broken up, please advise.
>>>
>>>>
>>>> Also, have you boot tested this change on a VSX enabled box?
>>>
>>> I can try, I may bug you for help.
>>> Is there a commonly test (or apps) I should run?
>>>
>>> -jx
>>>
>>>
>>>>
>>>> Mikey
>>
More information about the Linuxppc-dev
mailing list