Freescale P2020 CPU Freeze over PCIe abort signal

Eran Liberty liberty at extricom.com
Wed Oct 20 03:53:58 EST 2010


Eran Liberty wrote:
> Eran Liberty wrote:
>> This should probably go to the Freescale support, as it feels like a 
>> hardware issue yet the end result is a very frozen Linux kernel so I 
>> post here first...
>>
>> I have a programmable FPGA PCIe device connected to a Freescale's 
>> P2020 PCIe port. As part of the bring-up tests, we are testing two 
>> faulty scenarios:
>> 1. The FPGA totally ignores the PCIe transaction.
>> 2. The FPGA return a transaction abort.
>>
>> Both are plausible PCIe behavior and their should be outcome is 
>> documented in the PCIe spec. The first should be terminated by the 
>> transaction requestor timeout mechanism and raise an error, the 
>> second should abort the transaction and raise and error.
>>
>> In P2020 if I do any of those the CPU is left hung over the transaction.
>>
>> something like:
>> in_le32(addr)
>>
>> is turned into:
>> 7c 00 04 ac     sync   7c 00 4c 2c     lwbrx   r0,0,r9
>> 0c 00 00 00     twi     0,r0,0
>> 4c 00 01 2c     isync
>>
>> assembly code, where in r9 (in this example) hold an address which is 
>> physically mapped into the PCIe resource space.
>>
>> The CPU will hang over the load instruction.
>>
>> Just for the fun of it, I have wrote my own assembly function 
>> omitting everything but the load instruction; still freeze.
>> Replace "lwbrx" with a simple "lwz"; still freeze.
>>
>> It looks like the CPU snoozes till the PCIe transaction is done with 
>> no timeouts, ignoring any abort signal.
>>
>> I am going to:
>> A. Try to reach the Freescale support.
>> B. Asked the FPGA designed to give me a new behavior that will stall 
>> the PCIe transaction replay for 10 sec, but after those return ok.
>> C. report back here with either A or B.
>>
>> If you have any ideas I would love to hear them.
>>
>> -- Liberty
>>
> Some more info:
>
> As said the the FPGA designer provided me a PCIe device that will 
> stall its response to a variable amount of time. The CPU became 
> un-frozen after this amount of time. More over, we have found that in 
> that period till it un-froze the PCIe core did a retry to that 
> transaction over and over every 40 ms. This gave me the bright idea to 
> look for the word "retry" in the Freescale documentation which 
> rewarded me with these registers:
>
> ------------------------------------------------------- snip 
> -------------------------------------------------------
> 16.3.2.3        PCI Express Outbound Completion Timeout Register
>                (PEX_OTB_CPL_TOR)
> The PCI Express outbound completion timeout register, shown in Figure 
> 16-4, contains the maximum wait
> time for a response to come back as a result of an outbound non-posted 
> request before a timeout condition
> occurs.
> Offset 
> 0x00C                                                                                                
> Access: Read/Write
>         0   1              5     7   
> 8                                                                                      
> 31
>     R
>        TD            
> —                                                            TC
>     W
> Reset 0     0  0  0   0   0   0  0   0   0   0   1    0  0   0  0    
> 1   1  1    1   1  1   1   1   1  1   1   1  1  1   1  1
>            Figure 16-4. PCI Express Outbound Completion Timeout 
> Register (PEX_OTB_CPL_TOR)
> Table 16-6 describes the PCI Express outbound completion timeout 
> register fields.
>                                 Table 16-6. PEX_OTB_CPL_TOR Field 
> Descriptions
>  Bits     Name                                                     
> Description
>   0        TD     Timeout disable. This bit controls the 
> enabling/disabling of the timeout function.
>                   0 Enable completion timeout
>                   1 Disable completion timeout
>  1–7        —     Reserved
> 8–31       TC     Timeout counter. This is the value that is used to 
> load the response counter of the completion timeout.
>                   One TC unit is 8× the PCI Express controller clock 
> period; that is, one TC unit is 20 ns at 400 MHz, and 30
>                   ns at 266.66 MHz.
>                   The following are examples of timeout periods based 
> on different TC settings:
>                   0x00_0000 Reserved
>                   0x10_FFFF 22.28 ms at 400 MHz controller clock; 
> 33.34 ms at 266.66 MHz controller clock
>                   0xFF_FFFF 335.54 ms at 400 MHz controller clock; 
> 503.31 ms at 266.66 MHz controller clock
>
>
> 16.3.2.4       PCI Express Configuration Retry Timeout Register
>               (PEX_CONF_RTY_TOR)
> The PCI Express configuration retry timeout register, shown in Figure 
> 16-5, contains the maximum time
> period during which retries of configuration transactions which 
> resulted in a CRS response occur.
> Offset 
> 0x010                                                                               
> Access: Read/Write
>         0  1     3   
> 4                                                                                     
> 31
>     R
>        RD     —                                                 TC
>     W
> Reset 0    0  0  0  0   1  0  0  0  0   0  0   0  0  0  0   1  1  1  
> 1  1  1   1   1 1 1   1  1   1 1   1  1
>           Figure 16-5. PCI Express Configuration Retry Timeout 
> Register (PEX_CONF_RTY_TOR)
>                            QorIQ P2020 Integrated Processor Reference 
> Manual, Rev. 0
> 16-12                                                                                   
> Freescale Semiconductor
>                                                                                                 
> PCI Express Interface Controller
> Table 16-7 describes the PCI Express configuration retry timeout 
> register fields.
>                            Table 16-7. PEX_CONF_RTY_TOR Field 
> Descriptions
>  Bits  Name                                                     
> Description
>   0     RD    Retry disable. This bit disables the retry of a 
> configuration transaction that receives a CRS status response
>               packet.
>               0 Enable retry of a configuration transaction in 
> response to receiving a CRS status response until the timeout
>                  counter (defined by the PEX_CONF_RTY_TOR[TC] field) 
> has expired.
>               1 Disable retry of a configuration transaction 
> regardless of receiving a CRS status response.
>  1–3     —    Reserved
> 4–31    TC    Timeout counter. This is the value that is used to load 
> the CRS response counter.
>               One TC unit is 8× the PCI Express controller clock 
> period; that is, one TC unit is 20 ns at 400 MHz and 30 ns
>               at 266.66 MHz.
>               Timeout period based on different TC settings:
>               0x000_0000        Reserved
>               0x400_FFFF        1.34 s at 400 MHz controller clock, 
> 2.02 s at 266.66 MHz controller clock
>               0xFFF_FFFF        5.37 s at 400 MHz controller clock, 
> 8.05 s at 266.66 MHz controller clock
> ------------------------------------------------------- snap 
> -------------------------------------------------------
>
> Now this is all nice on the paper, but what the P2020 seems to be 
> doing in reality is
> 1. never expire
> 2. do re-tries even in the non configuration access
>
> I am going to try to disable completion timeout and see if I get 
> better behavior.
>
> -- Liberty
>
>
Disabling PEX_OTB_CPL_TOR,  PEX_CONF_RTY_TOR, or both yields the same 
behavior. The kernel freezes over the load command while the underlying 
hardware does PCIe transaction retries to infinity and beyond.

-- Liberty


More information about the Linuxppc-dev mailing list