[PATCH 1/2] powerpc/powernv: reduce multi-hit of iommu_add_device()

Wei Yang weiyang at linux.vnet.ibm.com
Tue Apr 29 16:49:55 EST 2014


On Mon, Apr 28, 2014 at 11:35:32PM +1000, Alexey Kardashevskiy wrote:
>On 04/23/2014 12:26 PM, Wei Yang wrote:
>> During the EEH hotplug event, iommu_add_device() will be invoked three times
>> and two of them will trigger warning or error.
>> 
>> The three times to invoke the iommu_add_device() are:
>> 
>>     pci_device_add
>>        ...
>>        set_iommu_table_base_and_group   <- 1st time, fail
>>     device_add
>>        ...
>>        tce_iommu_bus_notifier           <- 2nd time, succeeds
>>     pcibios_add_pci_devices
>>        ...
>>        pcibios_setup_bus_devices        <- 3rd time, re-attach
>> 
>> The first time fails, since the dev->kobj->sd is not initialized. The
>> dev->kobj->sd is initialized in device_add().
>> The third time's warning is triggered by the re-attach of the iommu_group.
>> 
>> After applying this patch, the error
>
>Nack.
>
>The patch still seems incorrect and we actually need to remove
>tce_iommu_bus_notifier completely as pcibios_setup_bus_devices is called
>from another notifier anyway. Could you please test it?
>
>

Hi, Alexey,

Nice to see your comment. Let me show what I got first.

Generally, when the kernel enumerates a PCI device, the following functions
will be invoked.

     pci_device_add
        pcibios_setup_bus_device
        ...
        set_iommu_table_base_and_group   
     device_add
        ...
        tce_iommu_bus_notifier           
     pcibios_fixup_bus
        pcibios_add_pci_devices
        ...
        pcibios_setup_bus_devices        

From the call flow, we see that for a normal PCI device,
pcibios_setup_bus_device() will be invoked twice.

At boot time, neither call succeeds in setting up DMA, since the PE is not
yet assigned or the tbl is not set. The iommu tbl and group are set up in
pnv_pci_ioda_setup_DMA().

The call flow stays the same when an EEH error happens on a Bus PE, but the
behavior is a little different.

     pci_device_add
        pcibios_setup_bus_device
        ...
        set_iommu_table_base_and_group   <- fail, kobj->sd is not initialized
     device_add
        ...
        tce_iommu_bus_notifier           <- succeed
     pcibios_fixup_bus
        pcibios_add_pci_devices
        ...
        pcibios_setup_bus_devices        <- warning, re-attach
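The three outcomes in the flow above can be replayed with a minimal
user-space sketch in C. This is not kernel code: struct toy_dev and all toy_*
names are hypothetical stand-ins, and the order of the two checks is an
assumption made only to reproduce the fail / succeed / re-attach sequence
described in this mail.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct device; not the kernel structure. */
struct toy_dev {
    bool sysfs_ready; /* models dev->kobj->sd, initialized by device_add() */
    bool in_group;    /* models the device already being in an iommu_group */
};

enum toy_result { TOY_OK, TOY_FAIL_NO_SYSFS, TOY_WARN_REATTACH };

/* Toy model of a single iommu_add_device() attempt. */
static enum toy_result toy_iommu_add_device(struct toy_dev *d)
{
    if (d->in_group)
        return TOY_WARN_REATTACH;   /* already attached: re-attach warning */
    if (!d->sysfs_ready)
        return TOY_FAIL_NO_SYSFS;   /* kobj->sd not initialized yet */
    d->in_group = true;
    return TOY_OK;
}

/* Replays the three call sites in the order listed in the flow above. */
static void toy_replay_eeh_flow(enum toy_result out[3])
{
    struct toy_dev d = { false, false };

    out[0] = toy_iommu_add_device(&d); /* pci_device_add path */
    d.sysfs_ready = true;              /* device_add() sets up kobj->sd */
    out[1] = toy_iommu_add_device(&d); /* tce_iommu_bus_notifier */
    out[2] = toy_iommu_add_device(&d); /* pcibios_setup_bus_devices */
}
```

Calling toy_replay_eeh_flow() fills the array with a failure, a success, and a
re-attach warning, in that order.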

The call flow changes a little for a VF, because pcibios_fixup_bus() is not
invoked for a VF. The current behavior is this.

     pci_device_add
        pcibios_setup_bus_device
        ...
        set_iommu_table_base_and_group   <- fail, kobj->sd is not initialized
     device_add
        ...
        tce_iommu_bus_notifier           <- succeed

And if an EEH error happens on just a VF, I believe recovery should follow
the same call flow. (This is not settled yet; it is still under discussion
with Gavin.)

My conclusion is:
1. Removing tce_iommu_bus_notifier completely would make the VF not work.
   So I chose to revert the code and make the iommu group attachment
   happen in just one place.
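
The reasoning behind this conclusion can be checked with a small user-space
model in C. It is only a sketch under the assumptions of this mail: the SITE_*
flags and toy_* names are hypothetical, a VF is assumed never to reach
pcibios_setup_bus_devices, and any other notifier that might call it is not
modeled.

```c
#include <stdbool.h>

/* Hypothetical call-site flags; the names only mirror the flows in this mail. */
enum {
    SITE_PCI_DEVICE_ADD    = 1 << 0,
    SITE_BUS_NOTIFIER      = 1 << 1,
    SITE_SETUP_BUS_DEVICES = 1 << 2,
};

struct toy_outcome {
    bool attached;          /* device ended up in an iommu group */
    int failures;           /* attempts made before kobj->sd was ready */
    int reattach_warnings;  /* attempts made on an already attached device */
};

/* One attachment attempt at a given point in the flow. */
static void toy_try_attach(struct toy_outcome *o, bool sysfs_ready)
{
    if (o->attached)
        o->reattach_warnings++;
    else if (!sysfs_ready)
        o->failures++;
    else
        o->attached = true;
}

/* Walks the PF or VF flow, attaching only at the enabled call sites. */
static struct toy_outcome toy_run_flow(bool is_vf, unsigned int enabled_sites)
{
    struct toy_outcome o = { false, 0, 0 };

    if (enabled_sites & SITE_PCI_DEVICE_ADD)
        toy_try_attach(&o, false);  /* runs before device_add() */
    if (enabled_sites & SITE_BUS_NOTIFIER)
        toy_try_attach(&o, true);   /* runs after device_add() */
    if (!is_vf && (enabled_sites & SITE_SETUP_BUS_DEVICES))
        toy_try_attach(&o, true);   /* pcibios_fixup_bus path, PF only */
    return o;
}
```

In this model, enabling only SITE_BUS_NOTIFIER attaches both a PF and a VF
exactly once with no failure or warning, while disabling it leaves the VF
unattached.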

BTW, I know my patch is not perfect. For a PF, the tbl will still be set
twice. I am not sure why we need to invoke pcibios_setup_device() twice on a
PF; maybe some platforms need to fix something up after the pci_bus is added.
So I did not remove either of them to solve the problem.

If you have a better idea, I am glad to take it.

-- 
Richard Yang
Help you, Help me


