floating point support in the driver.

David Hawkins dwh at ovro.caltech.edu
Wed Aug 6 02:53:45 EST 2008


Hi Misbah,

> I am running the algorithm on OMAP processor (arm-core)
> and i did tried the same on iMX processor which
> takes 1.7 times more than OMAP.

Ok, that's a 10,000 ft benchmark. The observation being
that it fails your requirement.

How does that time compare to the operations
required, and their expected times?

> It is true that the algorithm is performing the vector
> operation which is blowing the cache.

Determined how? Obviously, if your cache is 16K and your
data is 64K, there's no way it will all fit at once, but
the algorithm could be crafted so that 1K was processed
at a time while the next data packet was moved into the
cache ... though this is very processor-specific.
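To illustrate what I mean by "1K at a time while another packet is
moved in", here's a minimal double-buffered sketch in plain C. The
memcpy() simply stands in for whatever transfer engine (DMA or
otherwise) your part provides; the function names and the 1K tile
size are made up for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TILE 256  /* elements per ping-pong buffer: 1K of int32_t */

static size_t tile_len(size_t total, size_t off)
{
    size_t left = total - off;
    return left < TILE ? left : TILE;
}

/* Double-buffered pipeline: while the 'cur' tile is crunched, the next
 * tile is copied in.  memcpy() stands in for a DMA transfer here. */
void process_stream(const int32_t *src, int32_t *dst, size_t n)
{
    int32_t buf[2][TILE];
    int cur = 0;
    size_t off = 0;

    if (n == 0)
        return;
    memcpy(buf[cur], src, tile_len(n, 0) * sizeof(int32_t));
    while (off < n) {
        size_t len = tile_len(n, off);
        size_t next = off + len;

        if (next < n) /* kick off the "DMA" of the next tile */
            memcpy(buf[cur ^ 1], src + next,
                   tile_len(n, next) * sizeof(int32_t));
        for (size_t i = 0; i < len; i++) /* process the current tile */
            dst[off + i] = 2 * buf[cur][i];
        off = next;
        cur ^= 1;
    }
}
```

On real hardware the win comes from the copy and the compute
overlapping; the structure above is the part that's portable.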

> But the question is How to lock the cache ? In driver
> how should we implement the same ?
> 
> An example code or a document could be helpful in this regard.

Indeed :)

I have no idea how the OMAP works, so the following are
just random, and possibly incorrect ramblings ...

The MPC8349EA startup code uses a trick where it zeros
out sections of the cache while providing an address.
Once the addresses and zeros are in the cache, it's locked.
From that point on, memory accesses to those addresses
result in cache 'hits'. This is the startup stack used
by the U-Boot bootloader.
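From memory, the U-Boot sequence looks roughly like the sketch below
(register choices and symbol names are approximate -- check the real
mpc83xx start.S before trusting any of it). The key instruction is
dcbz, which allocates and zeros a cache line for an address without
reading memory:

```
lis    r3, INIT_RAM_ADDR@h      /* address range to back with cache  */
ori    r3, r3, INIT_RAM_ADDR@l
li     r4, INIT_RAM_SIZE / 32   /* number of 32-byte cache lines     */
mtctr  r4
1: dcbz   0, r3                 /* allocate + zero line, no mem read */
   addi   r3, r3, 32
   bdnz   1b
/* then set the data-cache lock bit in HID0 so the lines
 * can't be evicted */
```

Whether the OMAP's ARM core has an equivalent lock-down mechanism is
something you'd have to dig out of its TRM.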

If something similar were done under Linux, then *I guess*
you could implement mmap() and ioremap() the range of
addresses associated with the locked cache lines.
You could then DMA data to and from the cache area
and run your algorithm there. That would give you
'fast SRAM'.
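A sketch of what that driver mmap() handler might look like, assuming
the locked lines back a known physical window (LOCKED_PHYS and
LOCKED_SIZE are names I've made up, and this is untested kernel-side
code, not a working driver):

```c
/* Hypothetical .mmap for the file_operations of a 'fastmem' device. */
static int fastmem_mmap(struct file *filp, struct vm_area_struct *vma)
{
	unsigned long len = vma->vm_end - vma->vm_start;

	if (len > LOCKED_SIZE)
		return -EINVAL;

	/* Keep the default cacheable protection: accesses hit the
	 * locked lines, so the mapping behaves like on-chip SRAM. */
	return remap_pfn_range(vma, vma->vm_start,
			       LOCKED_PHYS >> PAGE_SHIFT,
			       len, vma->vm_page_prot);
}
```

The subtle part would be keeping the DMA engine coherent with the
locked lines, which again is entirely processor-specific.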

However, you might be able to get the same effect by
restructuring your processing algorithm so that it handles
smaller chunks of data.
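i.e., instead of one pass over the whole 64K vector, tile the loop so
the working set stays inside the cache. A minimal sketch (tile size
and function names are illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TILE_ELEMS 256   /* 1K of int32_t: well inside a 16K D-cache */

/* The inner kernel: a simple scale, standing in for the real DSP work. */
static void process_tile(int32_t *tile, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		tile[i] *= 3;
}

/* Walk the whole vector one cache-sized tile at a time, so each pass
 * over a tile runs out of cache instead of thrashing main memory. */
void process_vector(int32_t *buf, size_t nelems)
{
	size_t off;

	for (off = 0; off < nelems; off += TILE_ELEMS) {
		size_t left = nelems - off;

		process_tile(buf + off, left < TILE_ELEMS ? left : TILE_ELEMS);
	}
}
```

If your algorithm makes several passes over the data, do all the
passes on one tile before moving to the next; that's where tiling
really pays off.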

Feel free to explain your data processing :)

Cheers,
Dave




More information about the Linuxppc-embedded mailing list