[PATCH 1/3] KVM: PPC: Book3S: Change interrupt call to reduce scratch space use on HV

Nicholas Piggin npiggin at gmail.com
Tue Dec 6 19:31:33 AEDT 2016


On Tue, 6 Dec 2016 17:09:07 +1100
Paul Mackerras <paulus at ozlabs.org> wrote:

> On Thu, Dec 01, 2016 at 06:18:10PM +1100, Nicholas Piggin wrote:
> > Change the calling convention to put the trap number together with
> > CR in two halves of r12, which frees up HSTATE_SCRATCH2 in the HV
> > handler, and r9 free.  
> 
> Cute idea!  Some comments below...
> 
> > The 64-bit PR handler entry translates the calling convention back
> > to match the previous call convention (i.e., shared with 32-bit), for
> > simplicity.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
> > ---
> >  arch/powerpc/include/asm/exception-64s.h | 28 +++++++++++++++-------------
> >  arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 15 +++++++--------
> >  arch/powerpc/kvm/book3s_segment.S        | 27 ++++++++++++++++++++-------
> >  3 files changed, 42 insertions(+), 28 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
> > index 9a3eee6..bc8fc45 100644
> > --- a/arch/powerpc/include/asm/exception-64s.h
> > +++ b/arch/powerpc/include/asm/exception-64s.h
> > @@ -233,7 +233,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
> >  
> >  #endif
> >  
> > -#define __KVM_HANDLER_PROLOG(area, n)					\
> > +#define __KVM_HANDLER(area, h, n)					\
> >  	BEGIN_FTR_SECTION_NESTED(947)					\
> >  	ld	r10,area+EX_CFAR(r13);					\
> >  	std	r10,HSTATE_CFAR(r13);					\
> > @@ -243,30 +243,32 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
> >  	std	r10,HSTATE_PPR(r13);					\
> >  	END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948);	\
> >  	ld	r10,area+EX_R10(r13);					\
> > -	stw	r9,HSTATE_SCRATCH1(r13);				\
> > -	ld	r9,area+EX_R9(r13);					\
> >  	std	r12,HSTATE_SCRATCH0(r13);				\
> > -
> > -#define __KVM_HANDLER(area, h, n)					\
> > -	__KVM_HANDLER_PROLOG(area, n)					\
> > -	li	r12,n;							\
> > +	li	r12,(n);						\
> > +	sldi	r12,r12,32;						\
> > +	or	r12,r12,r9;						\  
> 
> Did you consider doing it the other way around, i.e. with r12
> containing (cr << 32) | trap?  That would save 1 instruction in each
> handler:

When I tinkered with it I thought it came out slightly nicer this way, but
your suggested versions seem to prove me wrong. I can change it if you'd
like.

> 
> +	sldi	r12,r9,32;						\
> +	ori	r12,r12,(n);						\
> 
> > +	ld	r9,area+EX_R9(r13);					\
> > +	std	r9,HSTATE_SCRATCH1(r13);				\  
> 
> Why not put this std in kvmppc_interrupt[_hv] rather than in each
> handler?

Patch 3/3 uses r9 to load the ctr when CONFIG_RELOCATABLE is turned on, so
this resulted in the smaller difference between the two cases. I agree it's
not ideal when config relocatable is off.

[snip]

Thanks,
Nick


More information about the Linuxppc-dev mailing list