[PATCH v3 2/2] pseries/eeh: Add Pseries pcibios_bus_add_device

From: Bryant G. Ly <bryantly at linux.vnet.ibm.com>
Date: Sat Oct 14 06:12:32 AEDT 2017



On 10/13/17 1:05 PM, Alex Williamson wrote:
> On Fri, 13 Oct 2017 07:01:48 -0500
> Steven Royer <seroyer at linux.vnet.ibm.com> wrote:
>
>> On 2017-10-13 06:53, Steven Royer wrote:
>>> On 2017-10-12 22:34, Bjorn Helgaas wrote:
>>>> [+cc Alex, Bodong, Eli, Saeed]
>>>>
>>>> On Thu, Oct 12, 2017 at 02:59:23PM -0500, Bryant G. Ly wrote:
>>>>> On 10/12/17 1:29 PM, Bjorn Helgaas wrote:
>>>>>> On Thu, Oct 12, 2017 at 03:09:53PM +1100, Michael Ellerman wrote:
>>>>>>> Bjorn Helgaas <helgaas at kernel.org> writes:
>>>>>>>   
>>>>>>>> On Fri, Sep 22, 2017 at 09:19:28AM -0500, Bryant G. Ly wrote:
>>>>>>>>> This patch adds the machine dependent call for
>>>>>>>>> pcibios_bus_add_device, since the previous patch
>>>>>>>>> separated the calls out between the PowerNV and PowerVM.
>>>>>>>>>
>>>>>>>>> The difference here is that for the PowerVM environment
>>>>>>>>> we do not want match_driver set because in this environment
>>>>>>>>> we do not want the VF device drivers to load immediately, due to
>>>>>>>>> firmware loading the device node when VF device is assigned to the
>>>>>>>>> logical partition.
>>>>>>>>>
>>>>>>>>> This patch will depend on the patch linked below, which is under
>>>>>>>>> review.
>>>>>>>>>
>>>>>>>>> https://patchwork.kernel.org/patch/9882915/
>>>>>>>>>
>>>>>>>>> Signed-off-by: Bryant G. Ly <bryantly at linux.vnet.ibm.com>
>>>>>>>>> Signed-off-by: Juan J. Alvarez <jjalvare at us.ibm.com>
>>>>>>>>> ---
>>>>>>>>>   arch/powerpc/platforms/pseries/eeh_pseries.c | 24 ++++++++++++++++++++++++
>>>>>>>>>   1 file changed, 24 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>>>>>>> index 6b812ad990e4..45946ee90985 100644
>>>>>>>>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>>>>>>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>>>>>>> @@ -64,6 +64,27 @@ static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX];
>>>>>>>>>   static DEFINE_SPINLOCK(slot_errbuf_lock);
>>>>>>>>>   static int eeh_error_buf_size;
>>>>>>>>> +void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
>>>>>>>>> +{
>>>>>>>>> +	struct pci_dn *pdn = pci_get_pdn(pdev);
>>>>>>>>> +
>>>>>>>>> +	if (!pdev->is_virtfn)
>>>>>>>>> +		return;
>>>>>>>>> +
>>>>>>>>> +	pdn->device_id  =  pdev->device;
>>>>>>>>> +	pdn->vendor_id  =  pdev->vendor;
>>>>>>>>> +	pdn->class_code =  pdev->class;
>>>>>>>>> +
>>>>>>>>> +	/*
>>>>>>>>> +	 * The following operations will fail if VF's sysfs files
>>>>>>>>> +	 * aren't created or its resources aren't finalized.
>>>>>>>>> +	 */
>>>>>>>>> +	eeh_add_device_early(pdn);
>>>>>>>>> +	eeh_add_device_late(pdev);
>>>>>>>>> +	eeh_sysfs_add_device(pdev);
>>>>>>>>> +	pdev->match_driver = -1;
>>>>>>>> match_driver is a bool, which should be assigned "true" or "false".
>>>>>>> Above he mentioned a dependency on:
>>>>>>>
>>>>>>>    [04/10] PCI: extend pci device match_driver state
>>>>>>>    https://patchwork.kernel.org/patch/9882915/
>>>>>>>
>>>>>>>
>>>>>>> Which makes it an int.
>>>>>> Oh, right, I missed that, thanks.
>>>>>>   
>>>>>>> Or has that patch been rejected or something?
>>>>>> I haven't *rejected* it, but it's low on my priority list, so you
>>>>>> shouldn't depend on it unless it adds functionality you really need.
>>>>>> If I did apply that particular patch, I would want some rework because
>>>>>> it currently obfuscates the match_driver logic.  There's no clue when
>>>>>> reading the code what -1/0/1 mean.
>>>>> So do you prefer enums? If so I can make a change for that.
>>>>>> Apparently here you *do* want the "-1 means the PCI core will never
>>>>>> set match_driver to 1" functionality, so maybe you do depend on it.
>>>>> We depend on the patch because we want the ability to never set
>>>>> match_driver, for SR-IOV on PowerVM.
>>>> Is this really new PowerVM-specific functionality?  ISTR recent
>>>> discussions
>>>> about inhibiting driver binding in a generic way, e.g.,
>>>> http://lkml.kernel.org/r/1490022874-54718-1-git-send-email-bodong@mellanox.com
>>>>    
>>>>>> If that's the case, how do you ever bind a driver to these VFs?  The
>>>>>> changelog says you don't want VF drivers to load *immediately*, so I
>>>>>> assume you do want them to load eventually.
>>>>>>   
>>>>> The VFs that get dynamically created within the configure SR-IOV
>>>>> call, on the pseries platform, won't be matched with a driver - we
>>>>> do not want them to match.
>>>>>
>>>>> The Power Hypervisor will load the VFs. The VFs will get
>>>>> assigned (by the user) via the HMC or Novalink in this environment,
>>>>> which will then trigger PHYP to load the VF device node into the
>>>>> device tree.
>>>> I don't know what it means for the Hypervisor to "load the VFs."  Can
>>>> you explain that in PCI-speak?
>>>>
>>>> The things I know about are:
>>>>
>>>>    - we set PCI_SRIOV_CTRL_VFE in the PF, which enables VFs
>>>>    - now the VFs respond to config accesses
>>>>    - the PCI core enumerates the VFs by reading their config space
>>>>    - the PCI core builds pci_dev structs for the VFs
>>>>    - the PCI core adds these pci_devs to the bus
>>>>    - we try to bind drivers to the VFs
>>>>    - the VF driver probe function may read VF config space and VF BARs
>>>>    - the VF may be assigned to a guest VM
>>>>
>>>> Where does "loading the VFs" fit in?  I don't know what HMC, Novalink,
>>>> or PHYP are.  I don't *need* to know what they are, as long as you can
>>>> explain what's happening in terms of the PCI concepts and generic
>>>> Linux VMs
>>>> and device assignment.
>>>>
>>>> Bjorn
>>> The VFs will be hotplugged into the VM separately from the enable
>>> SR-IOV, so the driver will load as part of the hotplug operation.
>>>
>>> Steve
>> One more point of clarification: when the hotplug happens, the VF will
>> show up on a virtual PCI bus that is not directly correlated to the real
>> PCI bus that the PF is on.  On that virtual PCI bus, the driver will
>> match because it won't be set to -1.
So let's refer to Bjorn's list of steps for SR-IOV.

   - we set PCI_SRIOV_CTRL_VFE in the PF, which enables VFs
   - now the VFs respond to config accesses
   - the PCI core enumerates the VFs by reading their config space
   - the PCI core builds pci_dev structs for the VFs
   - the PCI core adds these pci_devs to the bus

So everything is the same up to here.
   - we try to bind drivers to the VFs
   - the VF driver probe function may read VF config space and VF BARs
   - the VF may be assigned to a guest VM

The PowerVM environment is very different from traditional KVM in terms of SR-IOV.
In our environment the VFs are not usable or viewable by the hosting partition,
in this case Linux. This is a very important point: the host CANNOT
do anything to any of the VFs.

So, like the existing way of enabling SR-IOV, we still rely on the PF driver to
enable the VFs (that unchanged piece is sketched below) - but in this case the
attachment phase is driven by a user action on a management console (Novalink or
HMC in our case), which triggers an event that essentially acts like a hotplug.

In the fine details of that user-triggered action, the system firmware
binds the VFs, allowing resources to be allocated to each VF.
This essentially does all of the attaching as we know it today, but it is
managed by PHYP, not by the kernel.
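Since the hosting partition never binds these VFs, pseries_pcibios_bus_add_device()
parks them with match_driver = -1. To Bjorn's point that -1/0/1 is opaque when
reading the code, here is a rough enum-based reading of how pci_bus_match() could
treat the widened match_driver. This is illustrative only - the state names are
not taken from the pending patch, and pci_match_device() is the helper local to
drivers/pci/pci-driver.c:

/* Hypothetical names for the three states of an int match_driver. */
enum pci_match_state {
	PCI_MATCH_NEVER = -1,	/* platform forbids binding (PowerVM VFs)    */
	PCI_MATCH_DEFER =  0,	/* not ready yet; core flips this to 1 later */
	PCI_MATCH_OK    =  1,	/* normal driver matching                    */
};

static int pci_bus_match_sketch(struct device *dev, struct device_driver *drv)
{
	struct pci_dev *pci_dev = to_pci_dev(dev);
	struct pci_driver *pci_drv = to_pci_driver(drv);

	/*
	 * A device left at PCI_MATCH_NEVER is never promoted to
	 * PCI_MATCH_OK by the core, so no driver probe ever runs for it
	 * on the hosting partition.
	 */
	if (pci_dev->match_driver != PCI_MATCH_OK)
		return 0;

	return pci_match_device(pci_drv, pci_dev) != NULL;
}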
> I'm pretty lost too, but I think what's being said is that the
> paravirtualized SR-IOV enable creates VFs according to the SR-IOV
> offset and stride capabilities of the PF, but we're supposed to ignore
> those (why are we even creating pci_devs for them?) and the hypervisor
> will actually hotplug the VFs somewhere else.  How's that still
> SR-IOV?  Why wouldn't the hypervisor just add the real VFs that we're
> supposed to use at the offset and stride indicated by the PF SR-IOV
> capability and mask the VFs that we're not supposed to see?  Thanks,
>
> Alex
>
In our current environment (pseries) we still need to manage the VFs, for
example when doing error recovery. Therefore, we need the pci_devs to be mapped
to system resources, in this case a PE (Partitionable Endpoint).
There are further patches that will map those system resources into the pci_dn
structure. This patch was merely the start of separating the PowerVM
and PowerNV configure SR-IOV calls; a rough sketch of the machine-dependent
wiring it sits behind follows below.
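The generic dispatch hook comes from patch 1/2 of this series, which is not
quoted in this mail, so treat the exact names and the location of the
assignment as assumptions rather than the final code:

/*
 * arch/powerpc/kernel/pci-common.c (from the earlier patch in the series):
 * the generic hook dispatches to whatever the platform installed in ppc_md.
 */
void pcibios_bus_add_device(struct pci_dev *pdev)
{
	if (ppc_md.pcibios_bus_add_device)
		ppc_md.pcibios_bus_add_device(pdev);
}

/*
 * pseries side: install the hook during platform/EEH setup so every new
 * pci_dev (including VFs created by configure SR-IOV) gets the PE/EEH
 * setup above and is parked with match_driver = -1.  The init function
 * name here is illustrative; in the series the assignment lives in the
 * pseries setup path.
 */
static void __init pseries_install_pci_hooks(void)
{
	ppc_md.pcibios_bus_add_device = pseries_pcibios_bus_add_device;
}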

-Bryant
