[RFC] Add IBM Blue Gene/Q Platform

Mon Dec 10 11:47:05 EST 2012

Jimi Xenidis <jimix at pobox.com> wrote:

> 
> On Dec 7, 2012, at 7:38 AM, Jimi Xenidis <jimix at pobox.com> wrote:
> 
> > 
> > On Dec 6, 2012, at 11:54 PM, Michael Neuling <mikey at neuling.org> wrote:
> > 
> >>> commit 279c0615917b959a652e81f4ad0d886e2d426d85
> >>> Author: Jimi Xenidis <jimix at pobox.com>
> >>> Date:   Wed Dec 5 13:43:22 2012 -0500
> >>> 
> >>>   powerpc/book3e: IBM Blue Gene/Q Quad Processing eXtention (QPX)
> >>> 
> >>>   This enables kernel support for the QPX extention and is intended for
> >>>   processors that support it, usually an IBM Blue Gene processor.
> >>>   Turning it on does not effect other processors but it does add code
> >>>   and will quadruple the per thread save and restore area for the FPU
> >>>   (hense the name).  If you have enabled VSX it will only double the
> >>>   space.
> >>> 
> >>>   Signed-off-by: Jimi Xenidis <jimix at pobox.com>
> >> 
> 
> <<snip>>
> >> 
> >> 
> >> 
> >> +BEGIN_FTR_SECTION							\
> >> +	SAVE_32VSRS(n,c,base);						\
> >> +END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
> >> +BEGIN_FTR_SECTION							\
> >> +	SAVE_32QRS(n,c,base);						\
> >> +END_FTR_SECTION_IFSET(CPU_FTR_QPX);	
> >> 
> >> I don't think we want to do this.  We are going to end up with 64
> >> NOPS here somewhere.
> > 
> > Excellent point, NOPs are cheap on most processors but not A2 and a lot of embedded, I can wrap some branches with the FTR instead.
> > Do you have a concern on the code size?
> 
> Thought about it a bit and came up with this solution for arch/powerpc/kernel/fpu.S.
> This should address the following issues
>  - MSR_VSX vs MSR_VEC
>  - Big chunks of NOPs in the code path
>  - Less FTR space fixups at boot time.
>  - IMNHSO easier to read especially when disassembled

Indeed, I think it looks better.  I was going to mention that it was
already pretty complex to read, so a rewrite like this was probably
needed.  So thanks!!

That being said, there is a pretty complex testing matrix of
CONFIG_VSX/VMX/FPU/QPX/SMP/64/32 CPU_FTR/VSX/FPU/QPX/VMX so I'd need to
look/test more carefully to make sure all of these are covered.

Also, transactional memory (see
http://lists.ozlabs.org/pipermail/linuxppc-dev/2012-November/102216.html)
will change this code.  You should rebase on top of that if you really
want it considered for upstream.

Mikey

> 
> I did consider using the LR and BLR, but the !CONFIG_SMP case only adds one more special block and uses a different register set.
> Also if this is agreeable I would like us to consider removing the *_32FPVSRS* macros entirely and put the FTR tests in the actual code.
> This would allow us to use #ifdefs and reduce the amount of code that actually gets compiled.
> 
> Thoughts?
> 
> diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
> index e0ada05..5964067 100644
> --- a/arch/powerpc/kernel/fpu.S
> +++ b/arch/powerpc/kernel/fpu.S
> @@ -25,30 +25,81 @@
>  #include <asm/asm-offsets.h>
>  #include <asm/ptrace.h>
>  
> -#ifdef CONFIG_VSX
> -#define __REST_32FPVSRS(n,c,base)					\
> -BEGIN_FTR_SECTION							\
> -	b	2f;							\
> -END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
> -	REST_32FPRS(n,base);						\
> -	b	3f;							\
> -2:	REST_32VSRS(n,c,base);						\
> -3:
> -
> -#define __SAVE_32FPVSRS(n,c,base)					\
> -BEGIN_FTR_SECTION							\
> -	b	2f;							\
> -END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
> -	SAVE_32FPRS(n,base);						\
> -	b	3f;							\
> -2:	SAVE_32VSRS(n,c,base);						\
> -3:
> -#else
> -#define __REST_32FPVSRS(n,b,base)	REST_32FPRS(n, base)
> -#define __SAVE_32FPVSRS(n,b,base)	SAVE_32FPRS(n, base)
> -#endif
> -#define REST_32FPVSRS(n,c,base) __REST_32FPVSRS(n,__REG_##c,__REG_##base)
> -#define SAVE_32FPVSRS(n,c,base) __SAVE_32FPVSRS(n,__REG_##c,__REG_##base)
> +
> +/*
> + * Restore subroutines, R4 is scratch and R5 is base
> + */
> +vsx_restore:
> +	REST_32VSRS(0, __REG_R4, __REG_R5)
> +	b after_restore
> +qpx_restore:
> +	REST_32QRS(0, __REG_R4, __REG_R5)
> +	b after_restore
> +fpu_restore:
> +	REST_32FPRS(0, __REG_R5)
> +	b after_restore
> +
> +#define REST_32FPVSRS(n, c, base)		\
> +BEGIN_FTR_SECTION				\
> +	b vsx_restore;				\
> +END_FTR_SECTION_IFSET(CPU_FTR_VSX)		\
> +BEGIN_FTR_SECTION				\
> +	b qpx_restore;				\
> +END_FTR_SECTION_IFSET(CPU_FTR_QPX)		\
> +	b fpu_restore;				\
> +after_restore:
> +
> +/*
> + * Save subroutines, R4 is scratch and R3 is base
> + */
> +vsx_save:
> +	SAVE_32VSRS(0, __REG_R4, __REG_R3)
> +	b after_save
> +qpx_save:
> +	SAVE_32QRS(0, __REG_R4, __REG_R3)
> +	b after_save
> +fpu_save:
> +	SAVE_32FPRS(0, __REG_R3)
> +	b after_save
> +
> +#define SAVE_32FPVSRS(n, c, base)		\
> +BEGIN_FTR_SECTION				\
> +	b vsx_save;				\
> +END_FTR_SECTION_IFSET(CPU_FTR_VSX)		\
> +BEGIN_FTR_SECTION				\
> +	b qpx_save;				\
> +END_FTR_SECTION_IFSET(CPU_FTR_QPX)		\
> +	b fpu_save;				\
> +after_save:
> +
> +#ifndef CONFIG_SMP
> +/*
> + * we need an extra save set for the !CONFIG_SMP case, see below
> + * Scratch it R5 and base is R4
> + */
> +vsx_save_nosmp:
> +	SAVE_32VSRS(0,R5,R4)
> +	b after_save_nosmp
> +
> +qpx_save_nosmp:
> +	SAVE_32QRS(0,R5,R4)
> +	b after_save_nosmp
> +
> +fpu_save_nosmp:
> +	SAVE_32FPRS(0,R4)
> +	b after_save_nosmp
> +
> +#define SAVE_32FPVSRS_NOSMP(n,c,base)		\
> +BEGIN_FTR_SECTION				\
> +	b vsx_save_nosmp;			\
> +END_FTR_SECTION_IFSET(CPU_FTR_VSX)		\
> +BEGIN_FTR_SECTION				\
> +	b qpx_save_nosmp;			\
> +END_FTR_SECTION_IFSET(CPU_FTR_QPX)		\
> +	b fpu_save_nosmp;			\
> +after_save_nosmp:
> +
> +#endif /* !CONFIG_SMP */
>  
>  /*
>   * This task wants to use the FPU now.
> @@ -65,6 +116,11 @@ BEGIN_FTR_SECTION
>  	oris	r5,r5,MSR_VSX at h
>  END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>  #endif
> +#ifdef CONFIG_PPC_QPX
> +BEGIN_FTR_SECTION
> +	oris	r5,r5,MSR_VEC at h
> +END_FTR_SECTION_IFSET(CPU_FTR_QPX)
> +#endif
>  	SYNC
>  	MTMSRD(r5)			/* enable use of fpu now */
>  	isync
> @@ -81,7 +137,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>  	beq	1f
>  	toreal(r4)
>  	addi	r4,r4,THREAD		/* want last_task_used_math->thread */
> -	SAVE_32FPVSRS(0, R5, R4)
> +	SAVE_32FPVSRS_NOSMP(0, R5, R4)
>  	mffs	fr0
>  	stfd	fr0,THREAD_FPSCR(r4)
>  	PPC_LL	r5,PT_REGS(r4)
> @@ -94,7 +150,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>  #endif /* CONFIG_SMP */
>  	/* enable use of FP after return */
>  #ifdef CONFIG_PPC32
> -	mfspr	r5,SPRN_SPRG_THREAD		/* current task's THREAD (phys) */
> +	mfspr	r5,SPRN_SPRG_THREAD	/* current task's THREAD (phys) */
>  	lwz	r4,THREAD_FPEXC_MODE(r5)
>  	ori	r9,r9,MSR_FP		/* enable FP for current */
>  	or	r9,r9,r4
> @@ -132,6 +188,11 @@ BEGIN_FTR_SECTION
>  	oris	r5,r5,MSR_VSX at h
>  END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>  #endif
> +#ifdef CONFIG_PPC_QPX
> +BEGIN_FTR_SECTION
> +	oris	r5,r5,MSR_VEC at h
> +END_FTR_SECTION_IFSET(CPU_FTR_QPX)
> +#endif
>  	SYNC_601
>  	ISYNC_601
>  	MTMSRD(r5)			/* enable use of fpu now */
> @@ -148,10 +209,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>  	beq	1f
>  	PPC_LL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
>  	li	r3,MSR_FP|MSR_FE0|MSR_FE1
> -#ifdef CONFIG_VSX
> +#if defined(CONFIG_VSX) || defined(CONFIG_PPC_QPX)
>  BEGIN_FTR_SECTION
>  	oris	r3,r3,MSR_VSX at h
> -END_FTR_SECTION_IFSET(CPU_FTR_VSX)
> +END_FTR_SECTION_IFSET(CPU_FTR_VSX | CPU_FTR_QPX)
>  #endif
>  	andc	r4,r4,r3		/* disable FP for previous task */
>  	PPC_STL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
> 
> -jx
> 
> 
> > 
> >> 
> >> I'd like to see this patch broken into different parts.
> > 
> > I'm not sure how _this_ patch:
> >  <https://github.com/jimix/linux-bgq/commit/279c0615917b959a652e81f4ad0d886e2d426d85>
> > could be broken up, please advise.
> > 
> >> 
> >> Also, have you boot tested this change on a VSX enabled box?
> > 
> > I can try, I may bug you for help.
> > Is there a commonly test (or apps) I should run?
> > 
> > -jx
> > 
> > 
> >> 
> >> Mikey
> > 
>