[PATCH 1/2] powerpc/powernv: reduce multi-hit of iommu_add_device()
Wei Yang
weiyang at linux.vnet.ibm.com
Tue Apr 29 19:37:20 EST 2014
On Tue, Apr 29, 2014 at 05:55:48PM +1000, Alexey Kardashevskiy wrote:
>On 04/29/2014 04:49 PM, Wei Yang wrote:
>> On Mon, Apr 28, 2014 at 11:35:32PM +1000, Alexey Kardashevskiy wrote:
>>> On 04/23/2014 12:26 PM, Wei Yang wrote:
>>>> During the EEH hotplug event, iommu_add_device() will be invoked three times
>>>> and two of them will trigger warning or error.
>>>>
>>>> The three times to invoke the iommu_add_device() are:
>>>>
>>>> pci_device_add
>>>> ...
>>>> set_iommu_table_base_and_group <- 1st time, fail
>>>> device_add
>>>> ...
>>>> tce_iommu_bus_notifier <- 2nd time, succees
>>>> pcibios_add_pci_devices
>>>> ...
>>>> pcibios_setup_bus_devices <- 3rd time, re-attach
>>>>
>>>> The first time fails, since the dev->kobj->sd is not initialized. The
>>>> dev->kobj->sd is initialized in device_add().
>>>> The third time's warning is triggered by the re-attach of the iommu_group.
>>>>
>>>> After applying this patch, the error
>>>
>>> Nack.
>>>
>>> The patch still seems incorrect and we actually need to remove
>>> tce_iommu_bus_notifier completely as pcibios_setup_bus_devices is called
>>>from another notifier anyway. Could you please test it?
>>>
>>>
>>
>> Hi, Alexey,
>>
>> Nice to see your comment. Let me show what I got fist.
>>
>> Generally, when kernel enumerate on the pci device, following functions will
>> be invoked.
>>
>> pci_device_add
>> pcibios_setup_bus_device
>> ...
>> set_iommu_table_base_and_group
>> device_add
>> ...
>> tce_iommu_bus_notifier
>> pcibios_fixp_bus
>> pcibios_add_pci_devices
>> ...
>> pcibios_setup_bus_devices
>>
>>>From the call flow, we see for a normall pci device, the
>> pcibios_setup_bus_device() will be invoked twice.
>
>
>tce_iommu_bus_notifier should not be called for devices during boot-time
>PCI discovery as the discovery is done from subsys_initcall handler and
>tce_iommu_bus_notifier is registered as subsys_initcall_sync. Are you sure
>this is happening? You should see warnings as for PF's EEH but you do not
>say that.
>
Let me make it clearer. I list the call flow for general purpose to show the
sequence of the call flow.
The tce_iommu_bus_notifier is not invoked at the boot time. And none of them
do real thing at boot time.
I don't understand your last sentence. I see warning and error during EEH
hotplug. If necessary, I will add this in the commit log.
>
>> At the bootup time, none of them succeed to setup the dma, since the PE is not
>> assigned or the tbl is not set. The iommu tbl and group is setup in
>> pnv_pci_ioda_setup_DMA().
>>
>> This call flow maintains the same when EEH error happens on Bus PE, while the
>> behavior is a little different.
>>
>> pci_device_add
>> pcibios_setup_bus_device
>> ...
>> set_iommu_table_base_and_group <- fail, kobj->sd is not initialized
>
>
>Sorry, what is uninitialized? The only kobj here is the one in iommu_group
>and it must have been taken care of before adding a device.
pci_dev->dev->kobj->sd.
I have explained this in previous discussion. This kobj->sd is initialized in
device_add().
>
>
>> device_add
>> ...
>> tce_iommu_bus_notifier <- succeed
>> pcibios_fixp_bus
>> pcibios_add_pci_devices
>> ...
>> pcibios_setup_bus_devices <- warning, re-attach
>
>
>This is why I am suggesting getting rid of tce_iommu_bus_notifier - we will
>avoid the warning.
Then how about the first time's error?
>
>
>> While this call flow will change a little on a VF. For a VF,
>> pcibios_fixp_bus() will not be invoked. Current behavior is this.
>
>
>You meant pcibios_fixup_bus? I would expect it to get called (from
>pci_rescan_bus() or something like that?) and this seems to be the problem
>here.
When EEH happens and hotplug the pci bus, we need to do the pci_scan_bridge()
which will do the pcibios_fixup_bus().
So you suggest to remove it? This is the generic code.
>
>How are VFs added in terms of code flow? Is there any git tree to look at?
>
VF add code flow is a generic code in drivers/pci/iov.c, virtfn_add().
>
>
>> pci_device_add
>> pcibios_setup_bus_device
>> ...
>> set_iommu_table_base_and_group <- fail, kobj->sd is not initialized
>> device_add
>> ...
>> tce_iommu_bus_notifier <- succeed
>>
>> And if an EEH error happens just on a VF, I believe the same call flow should
>> go for recovery. (This is not set down, still under discussion with Gavin.)
>>
>> My conclusion is:
>> 1. remove the tce_iommu_bus_notifier completely will make the VF not work.
>> So I choose to revert the code and attach make the iommu group attachment
>> just happens in one place.
>>
>> BTW, I know my patch is not a perfect one. For a PF, the tbl will still be set
>> twice. I am not sure why we need to invoke pcibios_setup_device() twice on a
>> PF, maybe some platform need to fixup some thing after the pci_bus is added.
>> So I don't remove one of them to solve the problem.
>>
>> If you have a better idea, I am glad to take it.
>>
>
>
>--
>Alexey Kardashevskiy
>IBM OzLabs, LTC Team
>
>e-mail: aik at au1.ibm.com
>notes: Alexey Kardashevskiy/Australia/IBM
--
Richard Yang
Help you, Help me
More information about the Linuxppc-dev
mailing list