[PATCH v3 2/2] pseries/eeh: Add Pseries pcibios_bus_add_device
Alex Williamson
alex.williamson at redhat.com
Sat Oct 14 05:05:58 AEDT 2017
On Fri, 13 Oct 2017 07:01:48 -0500
Steven Royer <seroyer at linux.vnet.ibm.com> wrote:
> On 2017-10-13 06:53, Steven Royer wrote:
> > On 2017-10-12 22:34, Bjorn Helgaas wrote:
> >> [+cc Alex, Bodong, Eli, Saeed]
> >>
> >> On Thu, Oct 12, 2017 at 02:59:23PM -0500, Bryant G. Ly wrote:
> >>> On 10/12/17 1:29 PM, Bjorn Helgaas wrote:
> >>> >On Thu, Oct 12, 2017 at 03:09:53PM +1100, Michael Ellerman wrote:
> >>> >>Bjorn Helgaas <helgaas at kernel.org> writes:
> >>> >>
> >>> >>>On Fri, Sep 22, 2017 at 09:19:28AM -0500, Bryant G. Ly wrote:
> >>> >>>>This patch adds the machine dependent call for
> >>> >>>>pcibios_bus_add_device, since the previous patch
> >>> >>>>separated the calls out between the PowerNV and PowerVM.
> >>> >>>>
> >>> >>>>The difference here is that for the PowerVM environment
> >>> >>>>we do not want match_driver set because in this environment
> >>> >>>>we do not want the VF device drivers to load immediately, due to
> >>> >>>>firmware loading the device node when VF device is assigned to the
> >>> >>>>logical partition.
> >>> >>>>
> >>> >>>>This patch will depend on the patch linked below, which is under
> >>> >>>>review.
> >>> >>>>
> >>> >>>>https://patchwork.kernel.org/patch/9882915/
> >>> >>>>
> >>> >>>>Signed-off-by: Bryant G. Ly <bryantly at linux.vnet.ibm.com>
> >>> >>>>Signed-off-by: Juan J. Alvarez <jjalvare at us.ibm.com>
> >>> >>>>---
> >>> >>>> arch/powerpc/platforms/pseries/eeh_pseries.c | 24 ++++++++++++++++++++++++
> >>> >>>> 1 file changed, 24 insertions(+)
> >>> >>>>
> >>> >>>>diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
> >>> >>>>index 6b812ad990e4..45946ee90985 100644
> >>> >>>>--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> >>> >>>>+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> >>> >>>>@@ -64,6 +64,27 @@ static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX];
> >>> >>>> static DEFINE_SPINLOCK(slot_errbuf_lock);
> >>> >>>> static int eeh_error_buf_size;
> >>> >>>>+void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
> >>> >>>>+{
> >>> >>>>+	struct pci_dn *pdn = pci_get_pdn(pdev);
> >>> >>>>+
> >>> >>>>+	if (!pdev->is_virtfn)
> >>> >>>>+		return;
> >>> >>>>+
> >>> >>>>+	pdn->device_id = pdev->device;
> >>> >>>>+	pdn->vendor_id = pdev->vendor;
> >>> >>>>+	pdn->class_code = pdev->class;
> >>> >>>>+
> >>> >>>>+	/*
> >>> >>>>+	 * The following operations will fail if VF's sysfs files
> >>> >>>>+	 * aren't created or its resources aren't finalized.
> >>> >>>>+	 */
> >>> >>>>+	eeh_add_device_early(pdn);
> >>> >>>>+	eeh_add_device_late(pdev);
> >>> >>>>+	eeh_sysfs_add_device(pdev);
> >>> >>>>+	pdev->match_driver = -1;
> >>> >>>match_driver is a bool, which should be assigned "true" or "false".
> >>> >>Above he mentioned a dependency on:
> >>> >>
> >>> >> [04/10] PCI: extend pci device match_driver state
> >>> >> https://patchwork.kernel.org/patch/9882915/
> >>> >>
> >>> >>
> >>> >>Which makes it an int.
> >>> >Oh, right, I missed that, thanks.
> >>> >
> >>> >>Or has that patch been rejected or something?
> >>> >I haven't *rejected* it, but it's low on my priority list, so you
> >>> >shouldn't depend on it unless it adds functionality you really need.
> >>> >If I did apply that particular patch, I would want some rework because
> >>> >it currently obfuscates the match_driver logic. There's no clue when
> >>> >reading the code what -1/0/1 mean.
> >>> So do you prefer enums? If so, I can make a change for that.
> >>> >Apparently here you *do* want the "-1 means the PCI core will never
> >>> >set match_driver to 1" functionality, so maybe you do depend on it.
> >>> We depend on the patch because we want the ability to never set
> >>> match_driver for SR-IOV on PowerVM.
> >>
> >> Is this really new PowerVM-specific functionality? ISTR recent
> >> discussions about inhibiting driver binding in a generic way, e.g.,
> >> http://lkml.kernel.org/r/1490022874-54718-1-git-send-email-bodong@mellanox.com
> >>
> >>> >If that's the case, how do you ever bind a driver to these VFs? The
> >>> >changelog says you don't want VF drivers to load *immediately*, so I
> >>> >assume you do want them to load eventually.
> >>> >
> >>> The VFs that get dynamically created within the configure SR-IOV
> >>> call on the pSeries platform won't be matched with a driver; we
> >>> do not want them to match.
> >>>
> >>> The Power Hypervisor will load the VFs. The VFs will get
> >>> assigned (by the user) via the HMC or Novalink in this environment,
> >>> which will then trigger PHYP to load the VF device node into the
> >>> device tree.
> >>
> >> I don't know what it means for the Hypervisor to "load the VFs." Can
> >> you explain that in PCI-speak?
> >>
> >> The things I know about are:
> >>
> >> - we set PCI_SRIOV_CTRL_VFE in the PF, which enables VFs
> >> - now the VFs respond to config accesses
> >> - the PCI core enumerates the VFs by reading their config space
> >> - the PCI core builds pci_dev structs for the VFs
> >> - the PCI core adds these pci_devs to the bus
> >> - we try to bind drivers to the VFs
> >> - the VF driver probe function may read VF config space and VF BARs
> >> - the VF may be assigned to a guest VM
> >>
> >> Where does "loading the VFs" fit in? I don't know what HMC, Novalink,
> >> or PHYP are. I don't *need* to know what they are, as long as you can
> >> explain what's happening in terms of the PCI concepts and generic
> >> Linux VMs and device assignment.
> >>
> >> Bjorn
> >
> > The VFs will be hotplugged into the VM separately from the enable
> > SR-IOV, so the driver will load as part of the hotplug operation.
> >
> > Steve
>
> One more point of clarification: when the hotplug happens, the VF will
> show up on a virtual PCI bus that is not directly correlated to the real
> PCI bus that the PF is on. On that virtual PCI bus, the driver will
> match because it won't be set to -1.
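
For readers trying to follow the -1/0/1 discussion quoted above, here is a minimal sketch of how a tri-state match_driver could be spelled out with named values, as the enum suggestion in the thread implies. Every identifier below (the enum, its members, struct fake_pci_dev, and both helpers) is hypothetical and chosen for illustration only; none of it is taken from the referenced patch or from the PCI core.

```c
#include <stdbool.h>

/* Hypothetical names for the three states discussed in the thread. */
enum match_driver_state {
	MATCH_DRIVER_NEVER = -1,	/* core must never enable matching */
	MATCH_DRIVER_OFF   = 0,		/* matching not enabled yet */
	MATCH_DRIVER_ON    = 1,		/* drivers may bind to the device */
};

/* Stand-in for struct pci_dev; only the field under discussion. */
struct fake_pci_dev {
	enum match_driver_state match_driver;
};

/*
 * Models the point where the core would normally turn matching on.
 * A device marked NEVER is left alone, which is the behavior the
 * pseries VF case in this thread asks for.
 */
static inline void core_enable_matching(struct fake_pci_dev *dev)
{
	if (dev->match_driver == MATCH_DRIVER_NEVER)
		return;
	dev->match_driver = MATCH_DRIVER_ON;
}

/* A driver may bind only when matching has been explicitly enabled. */
static inline bool may_bind(const struct fake_pci_dev *dev)
{
	return dev->match_driver == MATCH_DRIVER_ON;
}
```

With names like these, the assignment in the quoted patch would read as setting the device to the "never" state rather than to a bare -1, which addresses the readability concern without changing the logic.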
I'm pretty lost too, but I think what's being said is that the
paravirtualized SR-IOV enable creates VFs according to the SR-IOV
offset and stride capabilities of the PF, but we're supposed to ignore
those (why are we even creating pci_devs for them?) and the hypervisor
will actually hotplug the VFs somewhere else. How's that still
SR-IOV? Why wouldn't the hypervisor just add the real VFs that we're
supposed to use at the offset and stride indicated by the PF SR-IOV
capability and mask the VFs that we're not supposed to see? Thanks,
Alex
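
As background for the offset and stride question above, here is a rough sketch of the routing ID arithmetic that the SR-IOV capability defines: the n-th VF (counting from zero) sits at the PF's routing ID plus First VF Offset plus n times VF Stride. The function names and the sample values in the note below are assumptions made for illustration, not code from the kernel or from this patch series.

```c
#include <stdint.h>

/*
 * A routing ID packs the bus number into the high byte and devfn into
 * the low byte. vf_index counts from zero. All names are hypothetical.
 */
static inline uint16_t vf_routing_id(uint16_t pf_rid,
				     uint16_t first_vf_offset,
				     uint16_t vf_stride,
				     unsigned int vf_index)
{
	/* 16-bit wraparound is intentional: routing IDs are 16 bits wide. */
	return (uint16_t)(pf_rid + first_vf_offset + vf_stride * vf_index);
}

/* Bus number of a routing ID. */
static inline uint8_t rid_bus(uint16_t rid)
{
	return (uint8_t)(rid >> 8);
}

/* Device/function byte of a routing ID. */
static inline uint8_t rid_devfn(uint16_t rid)
{
	return (uint8_t)(rid & 0xff);
}
```

For example, with a PF at 01:00.0 (routing ID 0x0100), an assumed First VF Offset of 0x80 and an assumed VF Stride of 2, VF 0 lands at routing ID 0x0180 and VF 3 at 0x0186, both on bus 1. A large enough offset carries the VFs onto a higher bus number, which is separate from the virtual bus described in the quoted text.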