[PATCH 1/2] powerpc/powernv: reduce multi-hit of iommu_add_device()

Wei Yang weiyang at linux.vnet.ibm.com
Wed Apr 30 11:31:04 EST 2014


On Tue, Apr 29, 2014 at 11:11:05PM +1000, Alexey Kardashevskiy wrote:
>On 04/29/2014 07:37 PM, Wei Yang wrote:
>> On Tue, Apr 29, 2014 at 05:55:48PM +1000, Alexey Kardashevskiy wrote:
>>> On 04/29/2014 04:49 PM, Wei Yang wrote:
>>>> On Mon, Apr 28, 2014 at 11:35:32PM +1000, Alexey Kardashevskiy wrote:
>>>>> On 04/23/2014 12:26 PM, Wei Yang wrote:
>>>>>> During the EEH hotplug event, iommu_add_device() will be invoked three times
>>>>>> and two of them will trigger warning or error.
>>>>>>
>>>>>> The three times to invoke the iommu_add_device() are:
>>>>>>
>>>>>>     pci_device_add
>>>>>>        ...
>>>>>>        set_iommu_table_base_and_group   <- 1st time, fail
>>>>>>     device_add
>>>>>>        ...
>>>>>>        tce_iommu_bus_notifier           <- 2nd time, succees
>>>>>>     pcibios_add_pci_devices
>>>>>>        ...
>>>>>>        pcibios_setup_bus_devices        <- 3rd time, re-attach
>>>>>>
>>>>>> The first time fails, since the dev->kobj->sd is not initialized. The
>>>>>> dev->kobj->sd is initialized in device_add().
>>>>>> The third time's warning is triggered by the re-attach of the iommu_group.
>>>>>>
>>>>>> After applying this patch, the error
>>>>>
>>>>> Nack.
>>>>>
>>>>> The patch still seems incorrect and we actually need to remove
>>>>> tce_iommu_bus_notifier completely as pcibios_setup_bus_devices is called
>>>> >from another notifier anyway. Could you please test it?
>>>>>
>>>>>
>>>>
>>>> Hi, Alexey,
>>>>
>>>> Nice to see your comment. Let me show what I got fist.
>>>>
>>>> Generally, when kernel enumerate on the pci device, following functions will
>>>> be invoked.
>>>>
>>>>      pci_device_add
>>>>         pcibios_setup_bus_device
>>>>         ...
>>>>         set_iommu_table_base_and_group   
>>>>      device_add
>>>>         ...
>>>>         tce_iommu_bus_notifier           
>>>>      pcibios_fixp_bus
>>>>         pcibios_add_pci_devices
>>>>         ...
>>>>         pcibios_setup_bus_devices        
>>>>
>>>> >From the call flow, we see for a normall pci device, the
>>>> pcibios_setup_bus_device() will be invoked twice.
>>>
>>>
>>> tce_iommu_bus_notifier should not be called for devices during boot-time
>>> PCI discovery as the discovery is done from subsys_initcall handler and
>>> tce_iommu_bus_notifier is registered as subsys_initcall_sync. Are you sure
>>> this is happening? You should see warnings as for PF's EEH but you do not
>>> say that.
>>>
>> 
>> Let me make it clearer. I list the call flow for general purpose to show the
>> sequence of the call flow.
>> 
>> The tce_iommu_bus_notifier is not invoked at the boot time. And none of them
>> do real thing at boot time.
>> 
>> I don't understand your last sentence. I see warning and error during EEH
>> hotplug. If necessary, I will add this in the commit log.
>> 
>>>
>>>> At the bootup time, none of them succeed to setup the dma, since the PE is not
>>>> assigned or the tbl is not set. The iommu tbl and group is setup in
>>>> pnv_pci_ioda_setup_DMA().
>>>>
>>>> This call flow maintains the same when EEH error happens on Bus PE, while the
>>>> behavior is a little different. 
>>>>
>>>>      pci_device_add
>>>>         pcibios_setup_bus_device
>>>>         ...
>>>>         set_iommu_table_base_and_group   <- fail, kobj->sd is not initialized
>>>
>>>
>>> Sorry, what is uninitialized? The only kobj here is the one in iommu_group
>>> and it must have been taken care of before adding a device.
>> 
>> pci_dev->dev->kobj->sd.
>> 
>> I have explained this in previous discussion. This kobj->sd is initialized in
>> device_add().
>> 
>>>
>>>
>>>>      device_add
>>>>         ...
>>>>         tce_iommu_bus_notifier           <- succeed
>>>>      pcibios_fixp_bus
>>>>         pcibios_add_pci_devices
>>>>         ...
>>>>         pcibios_setup_bus_devices        <- warning, re-attach
>>>
>>>
>>> This is why I am suggesting getting rid of tce_iommu_bus_notifier - we will
>>> avoid the warning.
>> 
>> Then how about the first time's error?
>
>
>What do you call "first time error" here?
>

Would you please take a look at my commit log?

Currently, the iommu_group will be added three times. The error happens at the
first time we try to attatch the iommu_group in pci_device_add(). And yes,
this happens at the EEH recovery on PF for a hotplug case.

The error is: 

        iommu_tce: 0003:05:00.0 has not been added, ret=-14

And the reason is:
  
        pci_dev->dev->kobj->sd is not initialized.

>
>>>> While this call flow will change a little on a VF. For a VF,
>>>> pcibios_fixp_bus() will not be invoked. Current behavior is this.
>>>
>>>
>>> You meant pcibios_fixup_bus? I would expect it to get called (from
>>> pci_rescan_bus() or something like that?) and this seems to be the problem
>>> here.
>> 
>> When EEH happens and hotplug the pci bus, we need to do the pci_scan_bridge()
>> which will do the pcibios_fixup_bus().
>
>Are you talking now about EEH on PF (physical function) or EEH on VF
>(virtual function)?
>
>Are you calling pci_scan_bridge()
>

Yes, it is pcibios_add_pci_devices() do the hotplug and invoke the
pci_scan_bridge().

>
>> So you suggest to remove it? This is the generic code.
>
>
>I suggest removing tce_iommu_bus_notifier only. It must go. It was my
>mistake at the first place.
>
>
>>> How are VFs added in terms of code flow? Is there any git tree to look at?
>>>
>> 
>> VF add code flow is a generic code in drivers/pci/iov.c, virtfn_add().
>
>virtfn_add -> pci_device_add -> pcibios_add_device -> pcibios_setup_device
>-> ppc_md.pci_dma_dev_setup -> pnv_pci_dma_dev_setup ->
>pnv_pci_ioda_dma_dev_setup ->
>set_iommu_table_base.
>
>
>What part of this is missing?
>

Currently, you will do set_iommu_table_base_and_group() in
pnv_pci_ioda_dma_dev_setup(). And this will fail and return an error.

The error reason is the kobj->sd is not initialized yet.
I hope it is clear this time.

>
>> 
>>>
>>>
>>>>      pci_device_add
>>>>         pcibios_setup_bus_device
>>>>         ...
>>>>         set_iommu_table_base_and_group   <- fail, kobj->sd is not initialized
>>>>      device_add
>>>>         ...
>>>>         tce_iommu_bus_notifier           <- succeed
>>>>
>>>> And if an EEH error happens just on a VF, I believe the same call flow should
>>>> go for recovery. (This is not set down, still under discussion with Gavin.)
>>>>
>>>> My conclusion is:
>>>> 1. remove the tce_iommu_bus_notifier completely will make the VF not work.
>>>>    So I choose to revert the code and attach make the iommu group attachment
>>>>    just happens in one place.
>>>>
>>>> BTW, I know my patch is not a perfect one. For a PF, the tbl will still be set
>>>> twice. I am not sure why we need to invoke pcibios_setup_device() twice on a
>>>> PF, maybe some platform need to fixup some thing after the pci_bus is added.
>>>> So I don't remove one of them to solve the problem.
>>>>
>>>> If you have a better idea, I am glad to take it.
>
>
>
>-- 
>Alexey

-- 
Richard Yang
Help you, Help me



More information about the Linuxppc-dev mailing list