floating point support in the driver.
Misbah khan
misbah_khan at engineer.com
Tue Aug 5 19:49:17 EST 2008
Hi David,
Thank you for your reply.
I am running the algorithm on an OMAP processor (ARM core), and I also
tried it on an iMX processor, where it takes 1.7 times longer than on the
OMAP.
It is true that the algorithm performs a vector operation that is blowing
out the cache.
But the question is: how do we lock the cache? How should we implement
this in a driver?
Example code or a document would be helpful in this regard.
--- Misbah <><
David Hawkins-3 wrote:
>
>
> Hi Misbah,
>
> I would recommend you look at your floating-point code again
> and benchmark each section. You should be able to estimate
> the number of clock cycles required to complete an operation
> and then check that against your measurements.
>
> Depending on whether your algorithm is processing intensive
> or data movement intensive, you may find that the big time
> waster is moving data on or off chip, or perhaps it's a large
> vector operation that is blowing out the cache. If you
> do find that, then on some processors you can lock the
> cache, so your algorithm would require a custom driver
> that steals part of the cache from the OS, but the floating point
> code would not run in the kernel, it would run on data
> stored in the stolen cache area. You can lock both instructions
> and data in the cache; e.g. an FFT routine can be locked in
> the instruction cache, while FFT data is in the data cache.
> I'm not sure how easy this is to do under Linux though.
>
> Here's an example of the level of detail you can get
> down to when benchmarking code:
>
> http://www.ovro.caltech.edu/~dwh/correlator/pdf/dsp_programming.pdf
>
> The FFT routine used on this processor made use of both
> the instruction and data cache (on-chip SRAM) on the
> DSP.
>
> This code is being re-developed to run on an MPC8349EA PowerPC
> with FPU. I did some initial testing to confirm that the
> FPU operates as per the data sheet, and will eventually get
> around to more complete testing.
>
> Which processor were you running your code on, and what
> frequency were you operating the processor at? How does
> the algorithm timing compare when run on other processors,
> e.g. your desktop or laptop machine?
>
> Cheers,
> Dave
> _______________________________________________
> Linuxppc-embedded mailing list
> Linuxppc-embedded at ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-embedded
>
>
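The benchmarking step suggested in the quoted message can be sketched in
user space before touching any driver code. This is only a minimal sketch
under stated assumptions: saxpy() is a hypothetical stand-in for the real
algorithm, the function names are made up for illustration, and the core
clock passed to ns_to_cycles() is whatever your board actually runs at. It
uses only the POSIX clock_gettime() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Elapsed seconds between two CLOCK_MONOTONIC readings. */
double elapsed_s(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec)
         + (double)(b.tv_nsec - a.tv_nsec) * 1e-9;
}

/* Hypothetical stand-in for the real algorithm: y[i] += a * x[i]. */
void saxpy(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/*
 * Run `reps` passes of saxpy() over n-element vectors and return the
 * average time per element in nanoseconds, or a negative value if the
 * buffers cannot be allocated.
 */
double bench_ns_per_element(size_t n, unsigned reps)
{
    assert(n > 0 && reps > 0);

    float *x = malloc(n * sizeof *x);
    float *y = malloc(n * sizeof *y);
    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++) {
        x[i] = 1.0f;
        y[i] = 0.0f;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned r = 0; r < reps; r++)
        saxpy(0.5f, x, y, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Read one result so the compiler cannot discard the loops. */
    volatile float sink = y[n / 2];
    (void)sink;

    double ns = elapsed_s(t0, t1) * 1e9 / ((double)n * (double)reps);
    free(x);
    free(y);
    return ns;
}

/* Convert a time in nanoseconds to CPU cycles for a core clock in Hz. */
double ns_to_cycles(double ns, double cpu_hz)
{
    return ns * 1e-9 * cpu_hz;
}
```

Calling bench_ns_per_element() for a range of vector lengths (say from a
few KB up to several MB of data) and converting each result with
ns_to_cycles() gives a cycles-per-element figure to compare against the
data-sheet estimate. If the figure jumps once the working set exceeds the
data cache size, that suggests the vector operation is cache-bound rather
than FPU-bound.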