Re: Re: SPE & Interrupt context (was: how to make use of SPE instructions)

Fri Jan 30 16:37:29 AEDT 2015

> From: Scott Wood [scottwood at freescale.com]
> Sent: Friday, 30 January 2015 01:49
> To: Markus Stockhausen
> Cc: Michael Ellerman; linuxppc-dev at lists.ozlabs.org; Herbert Xu
> Subject: Re: Re: SPE & Interrupt context (was how to make use of SPE instructions)
> 
> On Wed, 2015-01-28 at 05:00 +0000, Markus Stockhausen wrote:
> > > > From: Scott Wood [scottwood at freescale.com]
> > > > Sent: Wednesday, 28 January 2015 05:21
> > > > To: Markus Stockhausen
> > > > Cc: Michael Ellerman; linuxppc-dev at lists.ozlabs.org; Herbert Xu
> > > > Subject: Re: SPE & Interrupt context (was how to make use of SPE instructions)
> > > >
> > > > Hi Scott,
> > > >
> > > > thanks for your helpful feedback. As you might have seen, I sent a first
> > > > patch for the sha256 kernel module that takes care of preemption.
> > > >
> > > > Herbert Xu noticed that my module won't work for IPsec, as all
> > > > work will be done from interrupt context. Do you have a tip on how I can
> > > > relax the check I implemented:
> > > >
> > > > static bool spe_usable(void)
> > > > {
> > > >   return !in_interrupt();
> > > > }
> > > >
> > > > Intel guys have something like that
> > > >
> > > > bool irq_fpu_usable(void)
> > > > {
> > > >   return !in_interrupt() ||
> > > >     interrupted_user_mode() ||
> > > >     interrupted_kernel_fpu_idle();
> > > > }
> > > >
> > > > But I have no idea how to transfer it to the PPC/SPE case.
> > >
> > > I'm not sure what sort of tip you're looking for, other than
> > > implementing it myself. :-)
> >
> > Hi Scott,
> >
> > maybe I did not explain it correctly. interrupted_kernel_fpu_idle()
> > is x86 specific. The same applies to interrupted_user_mode().
> > I'm just searching for a similar feature in the PPC/SPE world.
> 
> There isn't one.
> 
> > I can see that enable_kernel_spe() does something with the
> > MSR_SPE flag, but I have no idea how to determine if I'm allowed
> > to enable SPE although I'm inside an interrupt context.
> 
> As with x86, you'd want to check whether the kernel interrupted
> userspace.  I don't know what x86 is doing with TS, but on PPC you might
> check whether the interrupted thread had MSR_FP enabled.
> 
> > I'm asking because from the previous posts I conclude that
> > running SPE instructions inside an interrupt might be critical.
> > Because of registers not being saved?
> 
> Yes.  Currently callers of enable_kernel_spe() only need to disable
> preemption, not interrupts.
> 
> > Or can I just save the register contents myself and interrupt
> > context is no longer a showstopper?
> 
> If you only need a small number of registers that might be reasonable,
> but if you need a bunch then you don't want to save them when you don't
> have to.
> 
> Another option is to change enable_kernel_spe() to require interrupts to
> be disabled.

Phew, that is going deeper than I expected.

I'm a newbie in the topic of interrupts and FPU/SPE registers. Nevertheless,
forcing enable_kernel_spe() to only be available outside of interrupt
context sounds too restrictive to me. Also, checking the thread/CPU flags
of an interrupted process is not something I can or want to implement.
There is a risk that I would be starting something that is too complex
for me.

BUT! Given the fact that SPE registers are only extended GPRs and my
algorithm needs just 10 of them I can live with the following design.

- I must already save several non-volatile registers. Since I will put 64-bit
values into them, I have to save their contents with evstdd instead of
stw. Of course that requires the stack to be aligned to 8 bytes, so only a
few additional alignment instructions are needed during initialization.

- During function cleanup I will restore the registers the same way.

- In case I interrupted myself, I might have saved sensitive data of another
thread on my stack. So I will zero that area after I have restored the
registers. That needs an additional 10 instructions. Compared to the ~2000
instructions for one sha256 round, that should be negligible.
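
The three steps above can be modelled in plain C as a rough sketch. This is
only a user-space illustration of the save / restore / wipe order, not the
real implementation, which would be PowerPC assembly using evstdd and evldd.
The names fake_spe_regs, NUM_SAVED, clobber_regs, wipe and transform_model
are made up for this example and do not exist in the kernel or in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_SAVED 10 /* assumption: the algorithm clobbers 10 registers */

/* Hypothetical stand-in for the 64-bit SPE view of the GPRs. */
static uint64_t fake_spe_regs[NUM_SAVED];

/* Hypothetical placeholder for the hash rounds: it only clobbers registers. */
static void clobber_regs(void)
{
	for (int i = 0; i < NUM_SAVED; i++)
		fake_spe_regs[i] = 0xdeadbeef00000000ULL + (uint64_t)i;
}

/* Zero through a volatile pointer so the stores are not optimised away. */
static void wipe(void *p, size_t n)
{
	volatile unsigned char *v = p;

	while (n--)
		*v++ = 0;
}

/* Returns 1 if the save area was all zero again before returning. */
static int transform_model(void)
{
	/* 8-byte aligned save area, matching what evstdd/evldd need. */
	_Alignas(8) uint64_t save[NUM_SAVED];
	int clean = 1;

	memcpy(save, fake_spe_regs, sizeof(save)); /* prologue: save all 64 bits */
	clobber_regs();                            /* body: registers overwritten */
	memcpy(fake_spe_regs, save, sizeof(save)); /* epilogue: restore */
	wipe(save, sizeof(save));                  /* leave no old contents behind */

	for (int i = 0; i < NUM_SAVED; i++)
		if (save[i] != 0)
			clean = 0;
	return clean;
}
```

A caller would fill fake_spe_regs with known values, call transform_model(),
and then see the same values again, with the temporary stack copy already
zeroed. The point of the model is the ordering: the wipe has to come after
the restore, otherwise the restored values would be lost.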

This little overhead will save me lots of trouble at other locations:

- I can avoid checking for an interrupt context.

- I don't need a fallback to the generic implementation. 

Thinking about it more and more, I think performance will stay the same.
Can you confirm that this will work? If yes, I will send a v2 patch.

Markus