scatter/gather DMA and cache coherency
Phil Nitschke
phil at avalon.com.au
Fri Feb 17 00:52:11 EST 2006
>>>>> "ES" == Eugene Surovegin <ebs at ebshome.net> writes:
ES> On Thu, Feb 16, 2006 at 05:51:20PM +1030, Phil Nitschke wrote:
>> Hi,
>>
>> I've been using a PCI device driver developed by a third party
>> company. It uses scatter/gather DMA to transfer data from
>> the PCI device into user memory. With a buffer size of
>> about 1 MB, the driver achieves a transfer bandwidth of
>> about 60 MB/s on a 66 MHz, 32-bit PCI bus.
>>
>> The problem is that sometimes the data is corrupt (usually on
>> the first transfer). We've concluded that the problem is related
>> to cache coherency. The Artesyn 2.6.10 reference kernel
>> (branched from the kernel at penguinppc.org) must be built with
>> CONFIG_NOT_COHERENT_CACHE=y, as Artesyn have never successfully
>> verified operation with hardware coherency enabled. My
>> understanding is that their Marvell system controller (MV64460)
>> supports cache snooping, but their Linux kernel support hasn't
>> caught up yet.
>>
>> So if I understand my situation correctly, the device driver must
>> use software-enforced coherency to avoid data corruption. Is
>> this correct?
>>
>> What currently happens is this:
>>
>> The buffers are allocated with get_user_pages(...)
>>
>> After each DMA transfer is complete, the driver invalidates the
>> cache using __dma_sync_page(...)
ES> No, buffers must be invalidated _before_ DMA transfer, not
ES> after. Also, don't use internal PPC functions like
ES> __dma_sync_page. Please read Documentation/DMA-API.txt for
ES> official API.
Thanks for the suggestions. I'd like to note a few points,
however:
1/. I did not write the driver (see my first line above). I'm
reading someone else's source and trying to figure out whether it
is right or wrong, so I can discuss with them authoritatively
what is going on.
2/. I'm not _sure_ I understand terms like software-enforced
coherency, non-consistent platforms, etc. So should I be looking
at the API in section I or II of DMA-API.txt ? (I think section 'Id')
3/. I think I did not explain the DMA process clearly enough. This
is how the third party documentation says the driver should be
used (my annotations in parentheses):
- Allocate and lock buffer into physical memory
(Call driver ioctl function to map user DMA buffer using
get_user_pages())
- Configure DMA chain
- Start DMA transfer
(Set ID of the DMA descriptor that the DMA controller
shall load first. Allow target to perform bus-mastered
DMA into platform memory)
- Wait for DMA transfer to complete
(interrupt signals end of transfer from target)
- Do Cache Invalidate
(Call driver ioctl which calls __dma_sync_page(), to
invalidate the cache prior to reading the buffer from the
host CPU. Then copy data from buffer into other user
memory.)
- Unlock and free buffer from physical memory
(Call device driver ioctl function which calls
free_user_pages())
So is __dma_sync_page being called by their driver routines at
the wrong time?
4/. The DMA-API.txt says:
"Memory coherency operates at a granularity called the cache
line width. In order for memory mapped by this API to operate
correctly, the mapped region must begin exactly on a cache
line boundary and end exactly on one (to prevent two
separately mapped regions from sharing a single cache line)."
Given that we're not relying on cache snooping, and we call
functions to invalidate the cache, does this statement still
apply?
Thanks again,
--
Phil
More information about the Linuxppc-embedded mailing list