[RFC PATCH 1/2]powerpc: foundation code to handle CR5 for local_t
Gabriel Paubert
paubert at iram.es
Thu Dec 4 04:07:43 AEDT 2014
On Wed, Dec 03, 2014 at 08:29:37PM +0530, Madhavan Srinivasan wrote:
> On Tuesday 02 December 2014 03:05 AM, Gabriel Paubert wrote:
> > On Thu, Nov 27, 2014 at 05:48:40PM +0530, Madhavan Srinivasan wrote:
> >> diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
> >> index 77f52b2..c42919a 100644
> >> --- a/arch/powerpc/include/asm/exception-64s.h
> >> +++ b/arch/powerpc/include/asm/exception-64s.h
> >> @@ -306,7 +306,26 @@ do_kvm_##n: \
> >> std r10,0(r1); /* make stack chain pointer */ \
> >> std r0,GPR0(r1); /* save r0 in stackframe */ \
> >> std r10,GPR1(r1); /* save r1 in stackframe */ \
> >> - beq 4f; /* if from kernel mode */ \
> >> +BEGIN_FTR_SECTION; \
> >> + lis r9,4096; /* Create a mask with HV and PR */ \
> >> + rldicr r9,r9,32,31; /* bits, AND with the MSR */ \
> >> + mr r10,r9; /* to check for Hyp state */ \
> >> + ori r9,r9,16384; \
> >> + and r9,r12,r9; \
> >> + cmpd cr3,r10,r9; \
> >> + beq cr3,66f; /* Jump if we come from Hyp mode*/ \
> >> + mtcrf 0x04,r10; /* Clear CR5 if coming from usr */ \
> >
> > I think you can do better than this, powerpc has a fantastic set
> > of rotate and mask instructions. If I understand correctly your
> > code you can replace it with the following:
> >
> > rldicl r10,r12,4,63 /* Extract HV bit to LSB of r10*/
> > rlwinm r9,r12,19,0x02 /* Extract PR bit to 2nd to last bit of r9 */
> > or r9,r9,10
> > cmplwi cr3,r9,1 /* Check for HV=1 and PR=0 */
> > beq cr3,66f
> > mtcrf 0x04,r10 /* Bits going to cr5 bits are 0 in r10 */
> >
>
> >From previous comment, at this point, CR0 already has MSR[PR], and only
> HV need to be checked. So I guess there still room for optimization.
> But good to learn this seq.
Actually the sequence I suggested can even be shortened again since the or
is not necessary and you can use an rlwimi instead.
rldicl r9,r12,5,62 /* r9 = 62 zeroes | HV | ? */
rlwimi r9,r12,18,0x01 /* r9 = 62 zeroes | HV | PR */
cmplwi cr3,r9,2 /* Check for HV=1 and PR=0 */
For 32 bit operands, rlwimi is a generic bit field to bit field move, but
GCC is (was, maybe GCC5 is better?) quite bad at recognizing opportunities
to use it.
I'm not sure that it is better to have two separate branches, one for
testing PR and the other for testing HV. Through how many branches can
the hardware speculate?
Unless I'm mistaken, this is code which is likely not to be in the
cache so I would optimize it first towards minimal code size.
Regards,
Gabriel
More information about the Linuxppc-dev
mailing list