[RFC PATCH 1/2]powerpc: foundation code to handle CR5 for local_t

Gabriel Paubert paubert at iram.es
Thu Dec 4 04:07:43 AEDT 2014


On Wed, Dec 03, 2014 at 08:29:37PM +0530, Madhavan Srinivasan wrote:
> On Tuesday 02 December 2014 03:05 AM, Gabriel Paubert wrote:
> > On Thu, Nov 27, 2014 at 05:48:40PM +0530, Madhavan Srinivasan wrote:
> >> diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
> >> index 77f52b2..c42919a 100644
> >> --- a/arch/powerpc/include/asm/exception-64s.h
> >> +++ b/arch/powerpc/include/asm/exception-64s.h
> >> @@ -306,7 +306,26 @@ do_kvm_##n:								\
> >>  	std	r10,0(r1);		/* make stack chain pointer	*/ \
> >>  	std	r0,GPR0(r1);		/* save r0 in stackframe	*/ \
> >>  	std	r10,GPR1(r1);		/* save r1 in stackframe	*/ \
> >> -	beq	4f;			/* if from kernel mode		*/ \
> >> +BEGIN_FTR_SECTION;							   \
> >> +	lis	r9,4096;		/* Create a mask with HV and PR */ \
> >> +	rldicr	r9,r9,32,31;		/* bits, AND with the MSR       */ \
> >> +	mr	r10,r9;			/* to check for Hyp state       */ \
> >> +	ori	r9,r9,16384;						   \
> >> +	and	r9,r12,r9;						   \
> >> +	cmpd	cr3,r10,r9;							   \
> >> +	beq	cr3,66f;		/* Jump if we come from Hyp mode*/ \
> >> +	mtcrf	0x04,r10;		/* Clear CR5 if coming from usr */ \
> > 
> > I think you can do better than this, powerpc has a fantastic set
> > of rotate and mask instructions. If I understand correctly your
> > code you can replace it with the following:
> > 
> > 	rldicl	r10,r12,4,63       /* Extract HV bit to LSB of r10*/
> > 	rlwinm  r9,r12,19,0x02     /* Extract PR bit to 2nd to last bit of r9 */
> > 	or	r9,r9,10
> > 	cmplwi  cr3,r9,1           /* Check for HV=1 and PR=0 */		
> > 	beq     cr3,66f
> > 	mtcrf   0x04,r10           /* Bits going to cr5 bits are 0 in r10 */
> > 
> 
> >From previous comment, at this point, CR0 already has MSR[PR], and only
> HV need to be checked. So I guess there still room for optimization.
> But good to learn this seq.

Actually the sequence I suggested can even be shortened again since the or
is not necessary and you can use an rlwimi instead.

	rldicl r9,r12,5,62      /*  r9 = 62 zeroes | HV | ?  */
	rlwimi r9,r12,18,0x01   /*  r9 = 62 zeroes | HV | PR */
	cmplwi cr3,r9,2         /* Check for HV=1 and PR=0 */

For 32 bit operands, rlwimi is a generic bit field to bit field move, but
GCC is (was, maybe GCC5 is better?) quite bad at recognizing opportunities
to use it.

I'm not sure that it is better to have two separate branches, one for
testing PR and the other for testing HV. Through how many branches can
the hardware speculate? 

Unless I'm mistaken, this is code which is likely not to be in the
cache so I would optimize it first towards minimal code size.

	Regards,
	Gabriel


More information about the Linuxppc-dev mailing list