[PATCH v3 1/2] PCI: One more parameter to pci_set_pcie_reset_state()

Alex Williamson alex.williamson at redhat.com
Tue Mar 24 08:08:01 AEDT 2015


On Mon, 2015-03-23 at 17:07 -0300, cascardo at linux.vnet.ibm.com wrote:
> On Mon, Mar 23, 2015 at 06:40:34AM -0600, Alex Williamson wrote:
> > On Mon, 2015-03-23 at 15:40 +1100, Gavin Shan wrote:
> > > On Sun, Mar 22, 2015 at 10:06:01PM -0600, Alex Williamson wrote:
> > > >On Mon, 2015-03-23 at 14:56 +1100, Gavin Shan wrote:
> > > >> On Sun, Mar 22, 2015 at 09:34:33PM -0600, Alex Williamson wrote:
> > > >> >On Mon, 2015-03-23 at 14:02 +1100, Gavin Shan wrote:
> > > >> >> The patch adds one more parameter ("probe") to pci_set_pcie_reset_state(),
> > > >> >> which allows to check if one particular PCI device can be resetted by the
> > > >> >> function. The function will be reused to support PCI device specific methods
> > > >> >> maintained in pci_dev_reset_methods[] in subsequent patch.
> > > >> >> 
> > > >> >> Cc: Brian King <brking at us.ibm.com>
> > > >> >> Cc: Frank Haverkamp <haver at linux.vnet.ibm.com>
> > > >> >> Cc: Ian Munsie <imunsie at au1.ibm.com>
> > > >> >> Signed-off-by: Gavin Shan <gwshan at linux.vnet.ibm.com>
> > > >> >> ---
> > > >> >> v3: Fix arguments of pci_set_pcie_reset_state() in cxl driver
> > > >> >> v2: Reimplemented based on pci_set_pcie_reset_state()
> > > >> >> ---
> > > >> >>  arch/powerpc/kernel/eeh.c       | 14 ++++++++++----
> > > >> >>  drivers/misc/cxl/pci.c          |  2 +-
> > > >> >>  drivers/misc/genwqe/card_base.c |  9 +++++++--
> > > >> >>  drivers/pci/pci.c               | 15 +++++++++------
> > > >> >>  drivers/scsi/ipr.c              |  5 +++--
> > > >> >>  include/linux/pci.h             |  5 +++--
> > > >> >>  6 files changed, 33 insertions(+), 17 deletions(-)
> > > >> >
> > > >> >
> > > >> >Argh, you're trying to make pci_set_pcie_reset_state() compatible with
> > > >> >pci_dev_specific_reset(), so it can be called via pci_reset_function(),
> > > >> >but the whole point of the pci_reset_function() interface is to reset a
> > > >> >*single* function without disturbing anything else.  These patches make
> > > >> >no effort at all to limit the set of affected devices to a single
> > > >> >function and take great liberties using PCI_ANY_ID for vendors.  My take
> > > >> >on the powerpc version of pcibios_set_pcie_reset_state() is that it's
> > > >> >effectively a slot reset, so why not hook into the
> > > >> >pci_reset_hotplug_slot() via the hotplug slot ops?  Please also use
> > > >> >cover letters.  Thanks,
> > > >> >
> > > >> 
> > > >> Yep, that's the point and intention of the patches. pcibios_set_pcie_reset_state()
> > > >> isn't equal to pci_reset_hotplug_slot(). The later one depends on PCI slot, which
> > > >> wasn't populated on PowerNV platform yet, but pcibios_set_pcie_reseet_state() doesn't.
> > > >> 
> > > >> The patchset depends on improved pcibios_set_pcie_reset_state(), which can be seen
> > > >> from following linked. With it, we don't affect any PCI devices as the config space
> > > >> is saved/restored accordingly before/after reset:
> > > >> 
> > > >> https://patchwork.ozlabs.org/patch/438598/
> > > >
> > > >Sorry, that's wrong.  pci_reset_function() can be called while other
> > > >devices in the same multifunction package are actively in use.  It
> > > >doesn't matter that you're saving and restoring the external state of
> > > >the device, the internal state is lost and operation of the device is
> > > >interrupted.  That is not how pci_reset_function() is supposed to work.
> > > >Thanks,
> > > >
> > > 
> > > pcibios_set_pcie_reset_state() applies hot reset on PCI bus, or PCI
> > > slot fundamental reset essentially. It's potentially affecting multiple
> > > PCI devices (not functions).
> > > 
> > > Yes. It's not safe to call pcibios_set_pcie_reset_state() if some of
> > > the target functions are actively and in use. I also suspect if it
> > > works to reset function#0 via pci_reset_function() while function#1
> > > is actively in use? I guess the caller of pci_reset_function() perhaps
> > > has to ensure there are no active functions/devices.
> > > 
> > > One of the issues the patches try to fix: some of broadcom adapters have
> > > multile functions, which can't support FLR, AF FLR, PM reset. Also, reset
> > > on parent PCI bus and the corresponding reset can't be applied because of
> > > they're multi-function package. In summary, pci_reset_function() doesn't
> > > work on the adapters. That won't give clean state when passing device from
> > > host to guest, or return it back to host. Sometimes, the host memory gets
> > > corrupted when destorying the guest. Occasionally, the patches with improved
> > > pcibios_set_pcie_reset_state() avoided the issue.
> > > 
> > > Some adapters might require fundamental reset on the PCI slot, or hot reset
> > > on parent bus explicitly, in order to successfully reload its firmware after
> > > reset.
> > 
> > This is exactly why we have pci_reset_slot() and pci_reset_bus(), so
> > that the caller can manage the scope of the reset.  You cannot change
> > the semantics of pci_reset_function() simply because it's convenient for
> > your implementation.  This series is wrong and should not be applied.
> > Thanks,
> > 
> > Alex
> > 
> 
> 
> Alex,
> 
> I agree with you here. I think the bigger issue is that we are not
> making sure VFIO is secure, allowing functions to be assigned separately
> to different guests, even when we cannot guarantee we can safely reset a
> single function, and also ignoring the possibility of internal
> communication between functions. This second problem is real, since some
> adapters will allow firmware to be loaded or reset, which will affect
> other functions.
> 
> We need to either have in written that this is acceptable and that
> higher layers should take care of these possibilities, if they care
> about them, or we need to fix this, by making VFIO only accept a whole
> group of functions to be assigned to a guest, which will allow the
> problem described by Gavin to be properly handled, by using slot reset
> functions, or another way of grouping devices that may be specific to
> the platform.
> 
> Does it make sense?

I don't know what you're doing on POWER, I thought groups were
equivalent to PEs, but on x86 we learn about isolation of PCI functions
by standard PCI properties.  Devices need to tell us that they're
isolated via ACS capabilities (or quirks, with vendor approval).
Otherwise we assume there is no isolation and multi-function devices
will be grouped together, preventing the individual functions from being
assigned to separate userspace instances.  VFIO does not ignore the
problem of internal communication between functions, it uses ACS... at
least on most platforms.

Firmware shared between functions is an unsolvable problem outside of
restricting device assignment only to SR-IOV virtual functions.  We
cannot know the programming method to block firmware updates of an
assigned device nor can we control which functions share the firmware.
If you're worried about this level of attack against the hardware, use
VFs exclusively.

If you're unsatisfied with the grouping of devices on your platform,
this is entirely within your control since VFIO relies on IOMMU groups,
which are managed via the platform IOMMU driver.  If your IOMMU driver
is arbitrarily splitting each PCI function into a separate group without
testing the isolation, that's a problem with IOMMU groups on your
platform.  If you would like to add a reset requirement to a group, you
have the ability to do that.  On x86, I expect this would be far too
restrictive.

VFIO also knows how to do bus and slot resets, the pci_reset_bus() and
pci_reset_slot() interfaces were added by me with vfio-pci as the first
user.  Resets are generally handled as best effort though and this is
sometimes a property of a device that might make it unsuitable for
device assignment.

The fact that you're implying above that the devices covered by the
device specific reset proposed in this series can be assigned separately
makes ignoring the scope of pci_reset_function() all the more wrong.  If
you want to change grouping on POWER to require that a group can be
reset, that's an IOMMU driver issue.  If you want to make a change to
VFIO to require an opt-in/out of allowing access to groups without a
reset, then propose something.  Just please don't ignore the semantics
of existing functions because it's a quicker and easier hack than doing
it correctly.  Thanks,

Alex



More information about the Linuxppc-dev mailing list