[PATCH v3 2/2] pseries/eeh: Add Pseries pcibios_bus_add_device

Sat Oct 14 05:05:58 AEDT 2017

On Fri, 13 Oct 2017 07:01:48 -0500
Steven Royer <seroyer at linux.vnet.ibm.com> wrote:

> On 2017-10-13 06:53, Steven Royer wrote:
> > On 2017-10-12 22:34, Bjorn Helgaas wrote:  
> >> [+cc Alex, Bodong, Eli, Saeed]
> >> 
> >> On Thu, Oct 12, 2017 at 02:59:23PM -0500, Bryant G. Ly wrote:  
> >>> On 10/12/17 1:29 PM, Bjorn Helgaas wrote:  
> >>> >On Thu, Oct 12, 2017 at 03:09:53PM +1100, Michael Ellerman wrote:  
> >>> >>Bjorn Helgaas <helgaas at kernel.org> writes:
> >>> >>  
> >>> >>>On Fri, Sep 22, 2017 at 09:19:28AM -0500, Bryant G. Ly wrote:  
> >>> >>>>This patch adds the machine dependent call for
> >>> >>>>pcibios_bus_add_device, since the previous patch
> >>> >>>>separated the calls out between the PowerNV and PowerVM.
> >>> >>>>
> >>> >>>>The difference here is that for the PowerVM environment
> >>> >>>>we do not want match_driver set because in this environment
> >>> >>>>we do not want the VF device drivers to load immediately, due to
> >>> >>>>firmware loading the device node when VF device is assigned to the
> >>> >>>>logical partition.
> >>> >>>>
> >>> >>>>This patch will depend on the patch linked below, which is under
> >>> >>>>review.
> >>> >>>>
> >>> >>>>https://patchwork.kernel.org/patch/9882915/
> >>> >>>>
> >>> >>>>Signed-off-by: Bryant G. Ly <bryantly at linux.vnet.ibm.com>
> >>> >>>>Signed-off-by: Juan J. Alvarez <jjalvare at us.ibm.com>
> >>> >>>>---
> >>> >>>>  arch/powerpc/platforms/pseries/eeh_pseries.c | 24 ++++++++++++++++++++++++
> >>> >>>>  1 file changed, 24 insertions(+)
> >>> >>>>
> >>> >>>>diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
> >>> >>>>index 6b812ad990e4..45946ee90985 100644
> >>> >>>>--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> >>> >>>>+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> >>> >>>>@@ -64,6 +64,27 @@ static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX];
> >>> >>>>  static DEFINE_SPINLOCK(slot_errbuf_lock);
> >>> >>>>  static int eeh_error_buf_size;
> >>> >>>>+void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
> >>> >>>>+{
> >>> >>>>+	struct pci_dn *pdn = pci_get_pdn(pdev);
> >>> >>>>+
> >>> >>>>+	if (!pdev->is_virtfn)
> >>> >>>>+		return;
> >>> >>>>+
> >>> >>>>+	pdn->device_id  =  pdev->device;
> >>> >>>>+	pdn->vendor_id  =  pdev->vendor;
> >>> >>>>+	pdn->class_code =  pdev->class;
> >>> >>>>+
> >>> >>>>+	/*
> >>> >>>>+	 * The following operations will fail if VF's sysfs files
> >>> >>>>+	 * aren't created or its resources aren't finalized.
> >>> >>>>+	 */
> >>> >>>>+	eeh_add_device_early(pdn);
> >>> >>>>+	eeh_add_device_late(pdev);
> >>> >>>>+	eeh_sysfs_add_device(pdev);
> >>> >>>>+	pdev->match_driver = -1;  
> >>> >>>match_driver is a bool, which should be assigned "true" or "false".  
> >>> >>Above he mentioned a dependency on:
> >>> >>
> >>> >>   [04/10] PCI: extend pci device match_driver state
> >>> >>   https://patchwork.kernel.org/patch/9882915/
> >>> >>
> >>> >>
> >>> >>Which makes it an int.  
> >>> >Oh, right, I missed that, thanks.
> >>> >  
> >>> >>Or has that patch been rejected or something?  
> >>> >I haven't *rejected* it, but it's low on my priority list, so you
> >>> >shouldn't depend on it unless it adds functionality you really need.
> >>> >If I did apply that particular patch, I would want some rework because
> >>> >it currently obfuscates the match_driver logic.  There's no clue when
> >>> >reading the code what -1/0/1 mean.  
> >>> So do you prefer enum's? - If so I can make a change for that.  
> >>> >Apparently here you *do* want the "-1 means the PCI core will never
> >>> >set match_driver to 1" functionality, so maybe you do depend on it.  
> >>> We depend on the patch because we want that ability to never set
> >>> match_driver,
> >>> for SRIOV on PowerVM.  
> >> 
> >> Is this really new PowerVM-specific functionality?  ISTR recent 
> >> discussions
> >> about inhibiting driver binding in a generic way, e.g.,
> >> http://lkml.kernel.org/r/1490022874-54718-1-git-send-email-bodong@mellanox.com
> >>   
> >>> >If that's the case, how to you ever bind a driver to these VFs?  The
> >>> >changelog says you don't want VF drivers to load *immediately*, so I
> >>> >assume you do want them to load eventually.
> >>> >  
> >>> The VF's that get dynamically created within the configure SR-IOV
> >>> call, on the Pseries Platform, wont be matched with a driver. - We
> >>> do not want it to match.
> >>> 
> >>> The Power Hypervisor will load the VFs. The VF's will get
> >>> assigned(by the user) via the HMC or Novalink in this environment
> >>> which will then trigger PHYP to load the VF device node to the
> >>> device tree.  
> >> 
> >> I don't know what it means for the Hypervisor to "load the VFs."  Can
> >> you explain that in PCI-speak?
> >> 
> >> The things I know about are:
> >> 
> >>   - we set PCI_SRIOV_CTRL_VFE in the PF, which enables VFs
> >>   - now the VFs respond to config accesses
> >>   - the PCI core enumerates the VFs by reading their config space
> >>   - the PCI core builds pci_dev structs for the VFs
> >>   - the PCI core adds these pci_devs to the bus
> >>   - we try to bind drivers to the VFs
> >>   - the VF driver probe function may read VF config space and VF BARs
> >>   - the VF may be assigned to a guest VM
> >> 
> >> Where does "loading the VFs" fit in?  I don't know what HMC, Novalink,
> >> or PHYP are.  I don't *need* to know what they are, as long as you can
> >> explain what's happening in terms of the PCI concepts and generic 
> >> Linux VMs
> >> and device assignment.
> >> 
> >> Bjorn  
> > 
> > The VFs will be hotplugged into the VM separately from the enable
> > SR-IOV, so the driver will load as part of the hotplug operation.
> > 
> > Steve  
> 
> One more point of clarification: when the hotplug happens, the VF will 
> show up on a virtual PCI bus that is not directly correlated to the real 
> PCI bus that the PF is on.  On that virtual PCI bus, the driver will 
> match because it won't be set to -1.

I'm pretty lost too, but I think what's being said is that the
paravirtualized SR-IOV enable creates VFs according to the SR-IOV
offset and stride capabilities of the PF, but we're supposed to ignore
those (why are we even creating pci_devs for them?) and the hypervisor
will actually hotplug the VFs somewhere else.  How's that still
SR-IOV?  Why wouldn't the hypervisor just add the real VFs that we're
supposed to use at the offset and stride indicated by the PF SR-IOV
capability and mask the VFs that we're not supposed to see?  Thanks,

Alex