scatter/gather DMA and cache coherency

Phil Nitschke phil at avalon.com.au
Fri Feb 17 00:52:11 EST 2006


>>>>> "ES" == Eugene Surovegin <ebs at ebshome.net> writes:

  ES> On Thu, Feb 16, 2006 at 05:51:20PM +1030, Phil Nitschke wrote:
  >> Hi,
  >> 
  >> I've been using a PCI device driver developed by a third-party
  >> company.  It uses scatter/gather DMA to transfer data from
  >> the PCI device into user memory.  When using a buffer size of
  >> about 1 MB, the driver achieves a transfer bandwidth of about 60
  >> MB/s, on a 66 MHz, 32-bit bus.
  >> 
  >> The problem is that the data is sometimes corrupt (usually on
  >> the first transfer).  We've concluded that the problem is related
  >> to cache coherency.  The Artesyn 2.6.10 reference kernel
  >> (branched from the kernel at penguinppc.org) must be built with
  >> CONFIG_NOT_COHERENT_CACHE=y, as Artesyn have never successfully
  >> verified operation with hardware coherency enabled.  My
  >> understanding is that their Marvell system controller (MV64460)
  >> supports cache snooping, but their Linux kernel support hasn't
  >> caught up yet.
  >> 
  >> So if I understand my situation correctly, the device driver must
  >> use software-enforced coherency to avoid data corruption.  Is
  >> this correct?
  >> 
  >> What currently happens is this:
  >> 
  >> The buffers are allocated with get_user_pages(...)
  >> 
  >> After each DMA transfer is complete, the driver invalidates the
  >> cache using __dma_sync_page(...)

  ES> No, buffers must be invalidated _before_ DMA transfer, not
  ES> after.  Also, don't use internal PPC functions like
  ES> __dma_sync_page.  Please read Documentation/DMA-API.txt for the
  ES> official API.

Thanks for the suggestions.  I'd like to make a few points, however:

1/.  I did not write the driver (see my first line above).  I'm
     reading someone else's source and trying to figure out whether it
     is right or wrong, so I can discuss with them authoritatively
     what is going on.

2/.  I'm not _sure_ I understand terms like software-enforced
     coherency, non-consistent platforms, etc.  So should I be looking
     at the API in section I or II of DMA-API.txt?  (I think section 'Id'.)

3/.  I think I did not explain the DMA process clearly enough.  This
     is how the third party documentation says the driver should be
     used (my annotations in parentheses):

	- Allocate and lock buffer into physical memory
            (Call driver ioctl function to map user DMA buffer using
            get_user_pages()) 
	- Configure DMA chain
	- Start DMA transfer
            (Set ID of the DMA descriptor that the DMA controller
            shall load first.  Allow target to perform bus-mastered
            DMA into platform memory)
	- Wait for DMA transfer to complete
            (interrupt signals end of transfer from target)
	- Do Cache Invalidate
            (Call driver ioctl which calls __dma_sync_page(), to
            invalidate the cache prior to reading the buffer from the
            host CPU.  Then copy data from buffer into other user
            memory.)
	- Unlock and free buffer from physical memory
            (Call device driver ioctl function which calls
            free_user_pages()) 

     So is __dma_sync_page being called by their driver routines at
     the wrong time?

4/.  The DMA-API.txt says:
        "Memory coherency operates at a granularity called the cache
        line width.  In order for memory mapped by this API to operate
        correctly, the mapped region must begin exactly on a cache
        line boundary and end exactly on one (to prevent two
        separately mapped regions from sharing a single cache line)."

     Given that we're not relying on cache snooping, and we call
     functions to invalidate the cache, does this statement still
     apply? 

Thanks again,

-- 
Phil



More information about the Linuxppc-embedded mailing list