AW: PowerPC PCI DMA issues (prefetch/coherency?)

Prodyut Hazarika phazarika at amcc.com
Sat Sep 12 02:05:08 EST 2009


> I tried, using our JTAG debugger (BDI3000), to pause operation after
> calling dma_alloc_coherent to examine the TLB entry for the memory
> returned by the call (which was just past
> CONFIG_CONSISTENT_START=0xff100000).  The TLB list loaded at the time
> that I paused operation did not show a mapping for this area.  I guess
> the kernel swaps TLB entries on the fly so it isn't limited to only 64
> entries?  I will try to sleep in the same context as the
> dma_alloc_coherent call to try to catch the TLB entry while loaded to
> see if it has the I bit set.
>
> If that fails, any ideas?

Sleeping won't cause the entry to appear in the TLB.
After the dma_alloc_coherent call, try to dereference the pointer returned.
The dereference will cause a Data TLB Miss, and the miss handler will
put the appropriate entry in the TLB.
Then do a JTAG break right after the dereference, and you should be able
to see the TLB entry.
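
A minimal sketch of that sequence, assuming a driver context that already
has a valid struct device. alloc_and_touch() is a hypothetical helper name,
not a kernel API. The #else branch only provides user-space stand-ins so the
fragment compiles outside a kernel tree; in a driver build only the
__KERNEL__ branch applies.

```c
#ifdef __KERNEL__
#include <linux/types.h>
#include <linux/gfp.h>
#include <linux/dma-mapping.h>
#else
/* User-space stand-ins (assumptions) so the sketch builds without a kernel tree. */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
typedef uint64_t dma_addr_t;
typedef uint8_t u8;
struct device;
#define GFP_KERNEL 0
static void *dma_alloc_coherent(struct device *dev, size_t size,
				dma_addr_t *handle, int gfp)
{
	void *p = calloc(1, size);

	(void)dev;
	(void)gfp;
	*handle = (dma_addr_t)(uintptr_t)p;
	return p;
}
#endif

/*
 * Allocate a coherent buffer and read one byte from it, so the MMU has to
 * load a translation for that address before we stop in the debugger.
 */
static void *alloc_and_touch(struct device *dev, size_t size,
			     dma_addr_t *handle)
{
	void *vaddr = dma_alloc_coherent(dev, size, handle, GFP_KERNEL);

	if (!vaddr)
		return NULL;

	/* volatile keeps the compiler from dropping the otherwise unused load */
	(void)*(volatile u8 *)vaddr;

	/* Set the JTAG breakpoint on the return below. */
	return vaddr;
}
```

With the breakpoint on the return, the TLB dump should then contain an
entry covering the returned address, and its I bit can be checked.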

Thanks
Prodyut



Mikhail Zolotaryov wrote:
> Hi Tom,
>
> possible solution could be to use tasklet to perform DMA-related job 
> (as in most cases DMA transfer is interrupt driven - makes sense).
>
>
> Tom Burns wrote:
>> Hi,
>>
>> With the default config for the Sequoia board on 2.6.24, calling 
>> pci_dma_sync_sg_for_cpu() results in executing
>> invalidate_dcache_range() in arch/ppc/kernel/misc.S from 
>> __dma_sync().  This OOPses on PPC440 since it tries to call directly 
>> the assembly instruction dcbi, which can only be executed in 
>> supervisor mode.  We tried that before resorting to manual cache line
>> management with usermode-safe assembly calls.
>>
>> Regards,
>> Tom Burns
>> International Datacasting Corporation
>>
>> Mikhail Zolotaryov wrote:
>>> Hi,
>>>
>>> Why manage cache lines manually, if appropriate code is a part of
>>> __dma_sync / dma_sync_single_for_device of the DMA API? (implies
>>> CONFIG_NOT_COHERENT_CACHE enabled, as default for Sequoia Board)
>>>
>>> Prodyut Hazarika wrote:
>>>> Hi Adam,
>>>>
>>>>  
>>>>> Yes, I am using the 440EPx (same as the sequoia board). Our
>>>>> ideDriver is DMA'ing blocks of 192-byte data over the PCI bus (using
>>>>> the Sil0680A PCI-IDE bridge). Most of the DMA's (depending on timing)
>>>>> end up being partially corrupted when we try to parse the data in the
>>>>> virtual page. We have confirmed the data is good before the PCI-IDE
>>>>> bridge. We are creating two 8K pages and map them to physical DMA
>>>>> memory using single-entry scatter/gather structs. When a DMA block is
>>>>> corrupted, we see a random portion of it (always a multiple of 16byte
>>>>> cache lines) is overwritten with old data from the last time the
>>>>> buffer was used.
>>>>
>>>> This looks like a cache coherency problem.
>>>> Can you ensure that the TLB entries corresponding to the DMA region
>>>> have the CacheInhibit bit set?
>>>> You will need a BDI connected to your system.
>>>>
>>>> Also, you will need to invalidate and flush the lines appropriately,
>>>> since in 440 cores, L1 cache coherency is managed entirely by software.
>>>> Please look at drivers/net/ibm_newemac/mal.c and core.c for examples of
>>>> how to do it.
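
Since invalidate and flush operations work on whole cache lines, it can help
to see which lines a given DMA buffer actually touches. The following is a
small user-space sketch (not kernel code) of that arithmetic. The names
line_span, lines_covering() and shares_lines() are hypothetical, and the line
size is passed in as a parameter rather than assumed; use the L1 line size of
your core.

```c
#include <stddef.h>
#include <stdint.h>

struct line_span {
	uintptr_t first;	/* address of the first cache line touched */
	size_t count;		/* number of cache lines touched */
};

/* Round the buffer out to whole cache lines; line must be a power of two. */
static struct line_span lines_covering(uintptr_t start, size_t len, size_t line)
{
	struct line_span s = { 0, 0 };
	uintptr_t end;

	if (len == 0 || line == 0 || (line & (line - 1)) != 0)
		return s;	/* empty buffer or unusable line size */

	s.first = start & ~(uintptr_t)(line - 1);
	end = (start + len + line - 1) & ~(uintptr_t)(line - 1);
	s.count = (end - s.first) / line;
	return s;
}

/* Non-zero when the first or last line also holds bytes outside the buffer. */
static int shares_lines(uintptr_t start, size_t len, size_t line)
{
	if (line == 0)
		return 0;
	return (start % line) != 0 || ((start + len) % line) != 0;
}
```

With a 32-byte line, a 192-byte block that starts on a line boundary covers
exactly 6 lines, while the same block offset by 8 bytes covers 7 and shares
its first and last line with neighbouring data. Invalidating a shared line
can also discard the neighbour's unwritten changes, so aligning and padding
DMA buffers to the line size avoids that case.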
>>>>
>>>> Thanks
>>>> Prodyut
>>>>
>>>> On Thu, 2009-09-03 at 13:27 -0700, Prodyut Hazarika wrote:
>>>>  
>>>>> Hi Adam,
>>>>>
>>>>>> Are you sure there is L2 cache on the 440?
>>>>>
>>>>> It depends on the SoC you are using. SoCs like the 460EX (Canyonlands
>>>>> board) have L2Cache.
>>>>> It seems you are using a Sequoia board, which has a 440EPx SoC. The
>>>>> 440EPx has a 440 cpu core, but no L2Cache.
>>>>> Could you please tell me which SoC you are using?
>>>>> You can also refer to the appropriate dts file to see if there is
>>>>> L2C.
>>>>> For example, in canyonlands.dts (460EX based board), we have the L2C
>>>>> entry.
>>>>>         L2C0: l2c {
>>>>>               ...
>>>>>         }
>>>>>
>>>>>  
>>>>>> I am seeing this problem with our custom IDE driver which is
>>>>>> based on pretty old code. Our driver uses pci_alloc_consistent() to
>>>>>> allocate the physical DMA memory and alloc_pages() to allocate a
>>>>>> virtual page. It then uses pci_map_sg() to map to a scatter/gather
>>>>>> buffer. Perhaps I should convert these to the DMA API calls as you
>>>>>> suggest.
>>>>>
>>>>> Could you give more details on the consistency problem? It is a good
>>>>> idea to change to the new DMA APIs, but pci_alloc_consistent() should
>>>>> work too.
>>>>>
>>>>> Thanks
>>>>> Prodyut
>>>>>
>>>>> On Thu, 2009-09-03 at 19:57 +1000, Benjamin Herrenschmidt wrote:
>>>>>
>>>>>> On Thu, 2009-09-03 at 09:05 +0100, Chris Pringle wrote:
>>>>>>    
>>>>>>> Hi Adam,
>>>>>>>
>>>>>>> If you have a look in include/asm-ppc/pgtable.h for the following
>>>>>>> section:
>>>>>>>
>>>>>>> #ifdef CONFIG_44x
>>>>>>> #define _PAGE_BASE    (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_GUARDED)
>>>>>>> #else
>>>>>>> #define _PAGE_BASE    (_PAGE_PRESENT | _PAGE_ACCESSED)
>>>>>>> #endif
>>>>>>>
>>>>>>> Try adding _PAGE_COHERENT to the appropriate line above and see if
>>>>>>> that fixes your issue - this causes the 'M' bit to be set on the
>>>>>>> page, which should enforce cache coherency. If it doesn't, you'll
>>>>>>> need to check the 'M' bit isn't being masked out in head_44x.S (it
>>>>>>> was originally masked out on arch/powerpc, but was fixed in later
>>>>>>> kernels when the cache coherency issues with non-SMP systems were
>>>>>>> resolved).
>>>>>>>
>>>>>> I have some doubts about the usefulness of doing that for 4xx.
>>>>>> AFAIK, the 440 core just ignores M.
>>>>>>
>>>>>> The problem lies probably elsewhere. Maybe the L2 cache coherency
>>>>>> isn't enabled or not working?
>>>>>>
>>>>>> The L1 cache on 440 is simply not coherent, so drivers have to make
>>>>>> sure they use the appropriate DMA APIs which will do cache flushing
>>>>>> when needed.
>>>>>>
>>>>>> Adam, what driver is causing you that sort of problem?
>>>>>>
>>>>>> Cheers,
>>>>>> Ben.
>>>>>>
>>>>>>
>>>>>>       
>>>
>>>
>>
>>
>
>