[PATCH v2 2/2] PCI: Don't disable PF's memory decoding when enabling SRIOV

Bjorn Helgaas helgaas at kernel.org
Tue Oct 25 14:51:13 AEDT 2016


On Tue, Oct 25, 2016 at 12:47:28PM +1100, Gavin Shan wrote:
> On Mon, Oct 24, 2016 at 09:03:16AM -0500, Bjorn Helgaas wrote:
> >On Mon, Oct 24, 2016 at 10:28:02AM +1100, Gavin Shan wrote:
> >> On Fri, Oct 21, 2016 at 11:50:34AM -0500, Bjorn Helgaas wrote:
> >> >Hi Gavin,
> >> >
> >> >On Fri, Sep 30, 2016 at 09:47:50AM +1000, Gavin Shan wrote:
> >> >> pci_update_resource() might be called to update (shift) IOV BARs
> >> >> in PPC PowerNV specific pcibios_sriov_enable() when enabling PF's
> >> >> SRIOV capability. At that point, the PF have been functional if
> >> >> the SRIOV is enabled through sysfs entry "sriov_numvfs". The PF's
> >> >> memory decoding (0x2 in PCI_COMMAND) shouldn't be disabled when
> >> >> updating its IOV BARs with pci_update_resource(). Otherwise, we
> >> >> receives EEH error caused by MMIO access to PF's memory BARs during
> >> >> the window when PF's memory decoding is disabled.
> >> >
> >> >The fact that you get EEH errors is irrelevant.  We can't write code
> >> >simply to avoid errors -- we have to write code to make the system
> >> >work correctly.
> >> >
> >> >I do not want to add a "mmio_force_on" parameter to
> >> >pci_update_resource().  That puts the burden on the caller to
> >> >understand this subtle issue.  If the caller passes mmio_force_on=1
> >> >when it shouldn't, things will almost always work, but once in a blue
> >> >moon a half-updated BAR will conflict with some other device in the
> >> >system, and we'll have an unreproducible, undebuggable crash.
> >> >
> >> 
> >> Bjorn, thanks for your comments. Yes, the EEH error was caused by MMIO
> >> access to PF's normal BARs, not VF BARs. Yeah, I also had the conern
> >> that adding parameter to pci_update_resource() increases the visible
> >> complexity to the caller of the function.
> >> 
> >> >What you do need is an explanation of why it's safe to non-atomically
> >> >update a VF BARx in the SR-IOV capability.  I think this probably
> >> >involves the VF MSE bit, and you probably have to either disable VFs
> >> >completely or clear the MSE bit while you're updating the VF BARx.  We
> >> >should be able to do this inside pci_update_resource() without
> >> >changing the interface.
> >> >
> >> 
> >> Yes, It's what PATCH[1/2] does: (PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE)
> >> are set after VF BARs are updated with pci_update_resource() in this PPC
> >> specific scenario. There are other two situations where the IOV BARs are
> >> updated: PCI resource resizing and allocation during system booting or hot
> >> adding PF. The VF shouldn't be enabled in both cases when updating IOV BARs.
> >> 
> >> I think you suggest to add check in pci_update_resource(): Don't disable
> >> PF's memory decoding when updating VF BARs. I will send updated revision
> >> if it's what you want.
> >> 
> >> 	/*
> >> 	 * The PF might be functional when updating its IOV BARs. So PF's
> >> 	 * memory decoding shouldn't be disabled when updating its IOV BARs.
> >> 	 */
> >> 	disable = (res->flags & IORESOURCE_MEM_64) && !dev->mmio_always_on;
> >> #ifdef CONFIG_PCI_IOV
> >> 	disable &= !(resno >= PCI_IOV_RESOURCES &&
> >> 		     resno <= PCI_IOV_RESOURCE_END);
> >> #endif
> >> 	if (disable) {
> >> 		pci_read_config_word(dev, PCI_COMMAND, &cmd);
> >> 		pci_write_config_word(dev, PCI_COMMAND,
> >> 				      cmd & ~PCI_COMMAND_MEMORY);
> >> 	}
> >
> >Not exactly.  A 64-bit BAR cannot be updated atomically.  The whole
> >point of this exercise is to make sure that when we update such a BAR,
> >whether it is a normal PCI BAR or an SR-IOV BAR, the transient state
> >does not conflict with anything else in the system.
> >
> >For example, consider two devices that do not conflict:
> >
> >  device A BAR 0: 0x00000000_20000000
> >  device B BAR 0: 0x00000000_40000000
> >
> >We want to update A's BAR 0 to 0x00000001_40000000.  We can't do the
> >update atomically, so we have this sequence:
> >
> >  before update:            device A BAR 0: 0x00000000_20000000
> >  after writing lower half: device A BAR 0: 0x00000000_40000000
> >  after writing upper half: device A BAR 0: 0x00000001_40000000
> >
> >If the driver for device B accesses B between the writes, both A and B
> >may respond, which is a fatal error.
> >
> 
> Thanks for the detailed explanation. Apart from pdev->mmio_always_on,
> the normal BARs are updated with PCI_COMMAND_MEMORY cleared. With
> PATCH[1/2], The IOV BARs are updated with PCI_SRIOV_CTRL_MSE and
> PCI_SRIOV_CTRL_VFE cleared in the problematic path the patch tried
> to address initially. However, I prefer your suggestion at end of
> the reply.
> 
> >Probably the *best* thing would be to make pci_update_resource()
> >return an error if it's asked to update a BAR that is enabled, but I
> >haven't looked at all the callers to see whether that's practical.
> >
> 
> It arguably enforces users to tackle the limitation: the memory
> decoding (PCI_COMMAND_MEMORY or PCI_SRIOV_CTRL_VFE) should be
> disabled before updating the BARs with pci_update_resource().
> It means user cannot call APIs in relatively relaxed order
> as before. For example, pci_enable_device() followed by
> pci_update_resource(), which is allowed previously, won't
> work.

That specific case (pci_enable_device() followed by
pci_update_resource()) should *not* work.  pci_enable_device() is
normally called by a driver's .probe() method, and after we call a
.probe() method, the PCI core shouldn't touch the device at all
because there's no means of mutual exclusion between the driver and
the PCI core.

I think pci_update_resource() should only be called in situations
where the caller already knows that nobody is using the device.  For
regular PCI BARs, that doesn't necessarily mean PCI_COMMAND_MEMORY is
turned off, because firmware leaves PCI_COMMAND_MEMORY enabled for
many devices, even though nobody is using them.

Anyway, I think that's a project for another day.  That's too much to
tackle for the limited problem you're trying to solve.

> We can hide the limitation inside pci_update_resource() because
> nobody accesses the device's memory space that is being updated
> by pci_update_resource(). 
> 
> >The current strategy in pci_update_resource() is to clear
> >PCI_COMMAND_MEMORY when updating a 64-bit memory BAR.  This only
> >applies to the regular PCI BARs 0-5.
> >
> >Extending that strategy to SR-IOV would mean clearing
> >PCI_SRIOV_CTRL_MSE when updating a 64-bit VF BAR.  Obviously you
> >wouldn't clear PCI_COMMAND_MEMORY in this case because
> >PCI_COMMAND_MEMORY doesn't affect the VF BARs.
> >
> 
> Yeah, it would be the solution to have. If you agree, I will post
> updatd version according to this: Clearing PCI_SRIOV_CTRL_MSE when
> updating IOV BARs. The bit won't be touched if pdev->mmio_always_on
> is true.

I think you should ignore pdev->mmio_always_on for IOV BARs.
mmio_always_on is basically a workaround for devices that either don't
follow the spec or where we didn't completely understand the problem.
I don't think there's any reason to set mmio_always_on for SR-IOV
devices.

Bjorn


More information about the Linuxppc-dev mailing list