Re: How to make use of SPE instructions?

Fri Jan 16 16:27:44 AEDT 2015

> From: Scott Wood [scottwood at freescale.com]
> Sent: Thursday, 15 January 2015 23:56
> To: Markus Stockhausen
> Cc: linuxppc-dev at lists.ozlabs.org
> Subject: Re: How to make use of SPE instructions?
> 
> On Thu, 2015-01-08 at 09:58 +0000, Markus Stockhausen wrote:
> > Hello,
> >
> > I developed a SHA224/256 kernel crypto module with SPE instructions.
> > The result looks quite promising (~ +50% speedup). Nevertheless the
> > flooding of kernel messages "SPE used in kernel" makes me feel
> > uncomfortable.
> >
> > My findings so far:
> >
> > - I can configure the kernel with "SPE support".
> > - arch/powerpc/kernel/head_fsl_booke.S suggests that the message is
> >   triggered unconditionally whenever we make use of SPE in kernel.
> > - There exists a function enable_kernel_spe() but I don't know how
> >   this could help me in my work.
> >
> > I guess I need some kind of "brackets" around my coding to make sure
> > the upper 32 bit of the registers are stored correctly during task switch.
> > Or is the use of SPE instructions inside the kernel totally forbidden? Any
> > expert with some helpful advice?
> 
> You need to disable preemption, call enable_kernel_spe(), and finish
> using SPE before you enable preemption.  This assumes that SPE is never
> used from interrupt context.  Be careful to not disable preemption for
> too long.

Thanks for your feedback. That did the trick. I'm currently working on
a (low power) 800 MHz single core P1014 CPU. That should be the
cheapest and slowest hardware that is available with SPE. My goal
is to use the module for calculating hash values of IPsec packets, so
we are talking about input data of up to ~1500 bytes.
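
For reference, the bracketing Scott describes can be sketched roughly as
below. This is only an illustration, not the code from my module:
spe_begin(), spe_end(), run_with_spe(), spe_work_fn and record_state() are
hypothetical names, the header for enable_kernel_spe() is an assumption, and
the #else branch holds userspace stand-ins so the ordering can be tried
outside the kernel.

```c
#ifdef __KERNEL__
#include <linux/preempt.h>	/* preempt_disable(), preempt_enable() */
#include <asm/switch_to.h>	/* enable_kernel_spe(); header location is an assumption */
#else
/* Userspace stand-ins (not kernel code) so the ordering can be exercised. */
static int preempt_depth;	/* > 0 models "preemption disabled" */
static int spe_enabled;		/* 1 models "SPE usable by the kernel" */

static void preempt_disable(void) { preempt_depth++; }
static void enable_kernel_spe(void) { spe_enabled = 1; }
/* Once preemption is possible again, the SPE state may no longer be ours. */
static void preempt_enable(void) { preempt_depth--; spe_enabled = 0; }
#endif

typedef void (*spe_work_fn)(void *ctx);	/* hypothetical callback type */

/* Open the bracket: no preemption, SPE enabled for kernel use. */
static void spe_begin(void)
{
	preempt_disable();
	enable_kernel_spe();
}

/* Close the bracket: all SPE work must be finished before this point. */
static void spe_end(void)
{
	preempt_enable();
}

/* Run one short piece of SPE work (e.g. hashing one packet) in the bracket. */
static void run_with_spe(spe_work_fn work, void *ctx)
{
	spe_begin();
	work(ctx);	/* must not sleep and should stay short */
	spe_end();
}

#ifndef __KERNEL__
/* Demo worker: records the modelled state it observes while running. */
static void record_state(void *ctx)
{
	int *seen = ctx;

	seen[0] = preempt_depth;
	seen[1] = spe_enabled;
}
#endif
```

As Scott notes, this assumes that SPE is never used from interrupt context,
and the work done inside the bracket should be kept short.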

I did some tests with the tcrypt module and get a hashing speed of
~46 MByte/s for 2K data chunks. The stock module gives 29 MByte/s. In
other words, ~22,000 hashes per second, including around 10% overhead
from the tcrypt data feeder. That is a worst case of 46us per hash and
therefore 46us with preemption disabled.

In the meantime I spent some time doing the same for SHA-1. There
we get ~46,000 hashes per second, or 21us per 2K of data. That is
+13% compared to the already available PPC assembler module.
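
A rough sanity check of the per-hash times, taking the rounded hash rates
above as input. The helper names us_per_hash() and print_budget() are
hypothetical and only serve this illustration. The result is the approximate
time preemption stays disabled for one 2K chunk.

```c
#include <stdio.h>

/* Microseconds spent per hash at a given hash rate. */
static double us_per_hash(double hashes_per_sec)
{
	return 1e6 / hashes_per_sec;
}

/* Print the approximate non-preemptible time for one 2K chunk. */
static void print_budget(const char *name, double hashes_per_sec)
{
	printf("%-8s %8.0f hashes/s -> %5.1f us per 2K chunk\n",
	       name, hashes_per_sec, us_per_hash(hashes_per_sec));
}
```

Calling print_budget("SHA-256", 22000.0) and print_budget("SHA-1", 46000.0)
gives values close to the rounded figures quoted above.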

Three questions are left:

- Does the setup conflict with the mentioned interrupt context?
- Is that a reasonable time interval for disabling preemption?
- Should I send the patches to this or to the crypto list?

Markus