[Skiboot] [PATCH 5/5] core/pci: Support SRIOV VFs
Gavin Shan
gwshan at linux.vnet.ibm.com
Tue Feb 7 10:14:40 AEDT 2017
On Mon, Feb 06, 2017 at 05:29:22PM +1100, Russell Currey wrote:
>On Fri, 2016-11-18 at 16:05 +1100, Gavin Shan wrote:
>> Currently, skiboot can't see SRIOV VFs. It introduces some troubles
>> as I can see: The device initialization logic (phb->ops->device_init())
>> isn't applied to VFs, meaning we have to maintain same and duplicated
>> mechanism in kernel for VFs only. It introduces difficulty to code
>> maintaining and prone to lose sychronization.
>>
>> This was motivated by bug reported by Carol: The VF's Max Payload
>> Size (MPS) isn't matched with PF's on Mellanox's adapter even kernel
>> tried to make them same. It's caused by readonly PCIECAP_EXP_DEVCTL
>> register on VFs. The skiboot would be best place to emulate this bits
>> to eliminate the gap as I can see.
>>
>> This supports SRIOV VFs. When the PF's SRIOV capability is populated,
>> the number of maximal VFs (struct pci_device) are instanciated, but
>> usable yet. In the mean while, PCI config register filter is registered
>
>should this be "but not usable yet"?
>
Yes, will correct in next respin.
>> against PCIECAP_SRIOV_CTRL_VFE to capture the event of enabling or
>> disabling VFs. The VFs are initialized, put into the PF's children
>> list (pd->children), populate its PCI capabilities, and register PCI
>> config register filter against PCICAP_EXP_DEVCTL. The filter's handler
>> caches what is written to MPS field and returns the cached value on
>> read, to eliminate the gap mentioned as above.
>>
>> Signed-off-by: Gavin Shan <gwshan at linux.vnet.ibm.com>
>> ---
>
>Having knowledge of VFs in skiboot seems like a good idea.
>
>Minor comments below.
>
>Reviewed-by: Russell Currey <ruscur at russell.cc>
>
>> core/Makefile.inc | 4 +-
>> core/pci-iov.c | 252
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> core/pci.c | 2 +
>> include/pci-cfg.h | 32 +++++++
>> include/pci-iov.h | 37 ++++++++
>> 5 files changed, 325 insertions(+), 2 deletions(-)
>> create mode 100644 core/pci-iov.c
>> create mode 100644 include/pci-iov.h
>>
>> diff --git a/core/Makefile.inc b/core/Makefile.inc
>> index 9223af1..5ff84d1 100644
>> --- a/core/Makefile.inc
>> +++ b/core/Makefile.inc
>> @@ -2,8 +2,8 @@
>>
>> SUBDIRS += core
>> CORE_OBJS = relocate.o console.o stack.o init.o chip.o mem_region.o
>> -CORE_OBJS += malloc.o lock.o cpu.o utils.o fdt.o opal.o interrupts.o
>> -CORE_OBJS += timebase.o opal-msg.o pci.o pci-virt.o pci-slot.o pcie-slot.o
>> +CORE_OBJS += malloc.o lock.o cpu.o utils.o fdt.o opal.o interrupts.o
>> timebase.o
>> +CORE_OBJS += opal-msg.o pci.o pci-iov.o pci-virt.o pci-slot.o pcie-slot.o
>> CORE_OBJS += pci-opal.o fast-reboot.o device.o exceptions.o trace.o
>> affinity.o
>> CORE_OBJS += vpd.o hostservices.o platform.o nvram.o nvram-format.o hmi.o
>> CORE_OBJS += console-log.o ipmi.o time-utils.o pel.o pool.o errorlog.o
>> diff --git a/core/pci-iov.c b/core/pci-iov.c
>> new file mode 100644
>> index 0000000..2d37b15
>> --- /dev/null
>> +++ b/core/pci-iov.c
>> @@ -0,0 +1,252 @@
>> +/* Copyright 2013-2016 IBM Corp.
>> + *
>> + * Licensed under the Apache License, Version 2.0 (the "License");
>> + * you may not use this file except in compliance with the License.
>> + * You may obtain a copy of the License at
>> + *
>> + * http://www.apache.org/licenses/LICENSE-2.0
>> + *
>> + * Unless required by applicable law or agreed to in writing, software
>> + * distributed under the License is distributed on an "AS IS" BASIS,
>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
>> + * implied.
>> + * See the License for the specific language governing permissions and
>> + * limitations under the License.
>> + */
>> +
>> +#include <skiboot.h>
>> +#include <pci.h>
>> +#include <pci-cfg.h>
>> +#include <pci-slot.h>
>> +#include <pci-iov.h>
>> +
>> +/*
>> + * Tackle the VF's MPS in PCIe capability. The field is read only.
>> + * This function caches what is written and returns the cached
>> + * MPS on read.
>> + */
>> +static int64_t pci_iov_vf_devctl(void *dev, struct pci_cfg_reg_filter *pcrf,
>> + uint32_t offset, uint32_t len,
>> + ov_vf_devctluint32_t *data, bool write)
>> +{
>> + struct pci_device *vf = (struct pci_device *)dev;
>> + uint32_t pos = pci_cap(vf, PCI_CFG_CAP_ID_EXP, false);
>> + uint8_t *pcache;
>> +
>> + if (offset != pcrf->start ||
>> + offset != (pos + PCICAP_EXP_DEVCTL))
>> + return OPAL_SUCCESS;
>
>In what situations would this be called with a wrong offset? Does there need to
>be some kind of message or different return here, given this function always
>returns OPAL_SUCCESS?
>
There is possibility that the handler is associated with wrong offset/length.
I think it's fine and needn't a message because it's very rare error case.
The retrun value is dropped by upper layer. OPAL_SUCCESS was ever used to
indicates the value should be flushed to hardware as well. So OPAL_SUCCESS
is correct.
>> +
>> + pcache = &pcrf->data[0];
>> + if (write) {
>> + *pcache = ((uint8_t)(*data >> (8 * (4 - len)))) &
>> + PCICAP_EXP_DEVCTL_MPS;
>> + } else {
>> + *data &= ~(PCICAP_EXP_DEVCTL_MPS << (8 * (4 - len)));
>> + *data |= (((uint32_t)(*pcache & PCICAP_EXP_DEVCTL_MPS))
>> + << (8 * (4 - len)));
>> + }
>> +
>> + return OPAL_SUCCESS;
>> +}
>> +
>> +static void pci_iov_vf_quirk(struct phb *phb, struct pci_device *vf)
>> +{
>> + struct pci_cfg_reg_filter *pcrf;
>> + uint32_t pos;
>> +
>> + if (!pci_has_cap(vf, PCI_CFG_CAP_ID_EXP, false))
>> + return;
>> +
>> + /* Mellanox MT27500 Family [ConnectX-3] */
>
>It might be nice to extend the comment to explain why this is broken for this
>particular family of devices.
>
I think the code is self-explaining, but yes, more comments aren't harmful. I'll
add something like below:
/*
* The VF's MPS is readonly in hardware. The values for PF/VF are
* usually different. We are introducing a quirk to make them look
* same.
*/
>> + if (vf->vdid != 0x100315b3)
>> + return;
>> +
>> + pos = pci_cap(vf, PCI_CFG_CAP_ID_EXP, false);
>> + pcrf = pci_add_cfg_reg_filter(vf, pos + PCICAP_EXP_DEVCTL, 4,
>> + PCI_REG_FLAG_MASK, pci_iov_vf_devctl);
>> + if (!pcrf)
>> + prlog(PR_WARNING, "%s: Missed DEVCTL filter on
>> %04x:%02x:%02x.%01x\n",
>> + __func__, phb->opal_id, (vf->bdfn >> 8),
>> + ((vf->bdfn >> 3) & 0x1f), (vf->bdfn & 0x7));
>> +}
>> +
>> +/*
>> + * Update the SRIOV parameters that change when the number of
>> + * VFs is configured.
>> + */
>> +static bool pci_iov_update_parameters(struct pci_iov *iov)
>> +{
>> + struct phb *phb = iov->phb;
>> + uint16_t bdfn = iov->pd->bdfn;
>> + uint32_t pos = iov->pos;
>> + uint16_t val;
>> + bool enabled;
>> +
>> + pci_cfg_read16(phb, bdfn, pos + PCIECAP_SRIOV_CTRL, &val);
>> + enabled = !!(val & PCIECAP_SRIOV_CTRL_VFE);
>> + if (iov->enabled == enabled)
>> + return false;
>> +
>> + if (enabled) {
>> + pci_cfg_read16(phb, bdfn, pos + PCIECAP_SRIOV_INITIAL_VF,
>> + &iov->init_VFs);
>> + pci_cfg_read16(phb, bdfn, pos + PCIECAP_SRIOV_NUM_VF,
>> + &iov->num_VFs);
>> + pci_cfg_read16(phb, bdfn, pos + PCIECAP_SRIOV_VF_OFFSET,
>> + &iov->offset);
>> + pci_cfg_read16(phb, bdfn, pos + PCIECAP_SRIOV_VF_STRIDE,
>> + &iov->stride);
>> + } else {
>> + iov->init_VFs = 0;
>> + iov->num_VFs = 0;
>> + iov->offset = 0;
>> + iov->stride = 0;
>> + }
>> +
>> + iov->enabled = enabled;
>> + return true;
>> +}
>> +
>> +static int64_t pci_iov_change(void *dev __unused,
>> + struct pci_cfg_reg_filter *pcrf,
>> + uint32_t offset __unused,
>> + uint32_t len __unused,
>> + uint32_t *data __unused,
>> + bool write __unused)
>> +{
>> + struct pci_iov *iov = (struct pci_iov *)pcrf->data;
>> + struct phb *phb = iov->phb;
>> + struct pci_device *pd = iov->pd;
>> + struct pci_device *vf, *tmp;
>> + uint32_t i;
>> + bool changed;
>> +
>> + /* Update SRIOV variable parameters */
>> + changed = pci_iov_update_parameters(iov);
>> + if (!changed)
>> + return OPAL_SUCCESS;
>> +
>> + /* Remove all VFs that have been attached to the parent */
>> + if (!iov->enabled) {
>> + list_for_each_safe(&pd->children, vf, tmp, link)
>> + list_del(&vf->link);
>> + return OPAL_SUCCESS;
>> + }
>> +
>> + /* Initialize the VFs and attach them to parent */
>> + for (changed = false, i = 0; i < iov->num_VFs; i++) {
>> + vf = &iov->VFs[i];
>> + vf->bdfn = pd->bdfn + iov->offset + iov->stride * i;
>> + list_add_tail(&pd->children, &vf->link);
>> +
>> + /*
>> + * We don't populate the capabilities again if they have
>> + * been existing, to save time. Also, we need delay for
>> + * 100ms before the VF's config space becomes ready.
>> + */
>> + if (!pci_has_cap(vf, PCI_CFG_CAP_ID_EXP, false)) {
>> + if (!changed) {
>> + changed = !changed;
>> + time_wait_ms(100);
>> + }
>> +
>> + pci_init_capabilities(phb, vf);
>> + pci_iov_vf_quirk(phb, vf);
>> + }
>> +
>> + /* Call PHB hook */
>> + if (phb->ops->device_init)
>> + phb->ops->device_init(phb, pd, NULL);
>> + }
>> +
>> + return OPAL_SUCCESS;
>> +}
>> +
>> +/*
>> + * This function is called with disabled SRIOV capability. So the VF's
>> + * config address isn't finalized and its config space isn't accessible.
>> + */
>> +static void pci_iov_init_VF(struct pci_device *pd, struct pci_device *vf)
>> +{
>> + vf->is_bridge = false;
>> + vf->is_multifunction = false;
>> + vf->dev_type = PCIE_TYPE_ENDPOINT;
>> + vf->scan_map = -1;
>> + vf->vdid = pd->vdid;
>> + vf->sub_vdid = pd->sub_vdid;
>> + vf->class = pd->class;
>> + vf->dn = NULL;
>> + vf->slot = NULL;
>> + vf->parent = pd;
>> + list_head_init(&vf->pcrf);
>> + list_head_init(&vf->children);
>> +}
>> +
>> +void pci_init_iov_cap(struct phb *phb, struct pci_device *pd)
>> +{
>> + int64_t pos;
>> + struct pci_iov *iov;
>> + struct pci_cfg_reg_filter *pcrf;
>> + uint32_t i;
>> +
>> + /* Search for SRIOV capability */
>> + if (!pci_has_cap(pd, PCI_CFG_CAP_ID_EXP, false))
>> + return;
>> +
>> + pos = pci_find_ecap(phb, pd->bdfn, PCIECAP_ID_SRIOV, NULL);
>> + if (pos <= 0)
>> + return;
>> +
>> + /* Allocate IOV */
>> + iov = zalloc(sizeof(*iov));
>> + if (!iov) {
>> + prlog(PR_ERR, "%s: Cannot alloc IOV for
>> %04x:%02x:%02x.%01x\n",
>> + __func__, phb->opal_id, (pd->bdfn >> 8),
>> + ((pd->bdfn >> 3) & 0x1f), (pd->bdfn & 0x7));
>> + return;
>> + }
>> +
>> + /* Allocate VFs */
>> + pci_cfg_read16(phb, pd->bdfn, pos + PCIECAP_SRIOV_TOTAL_VF,
>> + &iov->total_VFs);
>> + iov->VFs = zalloc(sizeof(*iov->VFs) * iov->total_VFs);
>> + if (!iov->VFs) {
>> + prlog(PR_ERR, "%s: Cannot alloc %d VFs for
>> %04x:%02x:%02x.%01x\n",
>> + __func__, iov->total_VFs, phb->opal_id,
>> + (pd->bdfn >> 8), ((pd->bdfn >> 3) & 0x1f),
>> + (pd->bdfn & 0x7));
>> + free(iov);
>> + return;
>> + }
>> +
>> + /* Initialize VFs */
>> + for (i = 0; i < iov->total_VFs; i++)
>> + pci_iov_init_VF(pd, &iov->VFs[i]);
>> +
>> + /* Register filter for enabling or disabling SRIOV capability */
>> + pcrf = pci_add_cfg_reg_filter(pd, pos + PCIECAP_SRIOV_CTRL, 2,
>> + PCI_REG_FLAG_WRITE, pci_iov_change);
>> + if (!pcrf) {
>> + prlog(PR_ERR, "%s: Cannot set filter on
>> %04x:%02x:%02x.%01x\n",
>> + __func__, phb->opal_id, (pd->bdfn >> 8),
>> + ((pd->bdfn >> 3) & 0x1f), (pd->bdfn & 0x7));
>> + free(iov->VFs);
>> + free(iov);
>> + return;
>> + }
>> +
>> + /* Associate filter and IOV capability */
>> + pcrf->data = (void *)iov;
>> +
>> + /*
>> + * Retrieve the number of VFs and other information if applicable.
>> + * Register the SRIOV capability in the mean while.
>> + */
>> + iov->phb = phb;
>> + iov->pd = pd;
>> + iov->pos = pos;
>> + iov->enabled = false;
>> + pci_iov_update_parameters(iov);
>> + pci_set_cap(pd, PCIECAP_ID_SRIOV, pos, iov, true);
>> +}
>> diff --git a/core/pci.c b/core/pci.c
>> index b38d5a0..860ff5a 100644
>> --- a/core/pci.c
>> +++ b/core/pci.c
>> @@ -18,6 +18,7 @@
>> #include <cpu.h>
>> #include <pci.h>
>> #include <pci-cfg.h>
>> +#include <pci-iov.h>
>> #include <pci-slot.h>
>> #include <timebase.h>
>> #include <device.h>
>> @@ -204,6 +205,7 @@ void pci_init_capabilities(struct phb *phb, struct
>> pci_device *pd)
>> {
>> pci_init_pcie_cap(phb, pd);
>> pci_init_aer_cap(phb, pd);
>> + pci_init_iov_cap(phb, pd);
>> }
>>
>> static struct pci_device *pci_scan_one(struct phb *phb, struct pci_device
>> *parent,
>> diff --git a/include/pci-cfg.h b/include/pci-cfg.h
>> index 27c0f74..530f0a8 100644
>> --- a/include/pci-cfg.h
>> +++ b/include/pci-cfg.h
>> @@ -486,6 +486,38 @@
>> #define PCIECAP_AER_TLP_PFX_LOG2 0x40
>> #define PCIECAP_AER_TLP_PFX_LOG3 0x44
>>
>> +/* SRIOV capability */
>> +#define PCIECAP_ID_SRIOV 0x10
>> +#define PCIECAP_SRIOV_CAP 0x04
>> +#define PCIECAP_SRIOV_CAP_VFM 0x01
>> +#define PCIECAP_SRIOV_CAP_INTR(x) ((x) >> 21)
>> +#define PCIECAP_SRIOV_CTRL 0x08
>> +#define PCIECAP_SRIOV_CTRL_VFE 0x01
>> +#define PCIECAP_SRIOV_CTRL_VFM 0x02
>> +#define PCIECAP_SRIOV_CTRL_INTR 0x04
>> +#define PCIECAP_SRIOV_CTRL_MSE 0x08
>> +#define PCIECAP_SRIOV_CTRL_ARI 0x10
>> +#define PCIECAP_SRIOV_STATUS 0x0a
>> +#define PCIECAP_SRIOV_STATUS_VFM 0x01
>> +#define PCIECAP_SRIOV_INITIAL_VF 0x0c
>> +#define PCIECAP_SRIOV_TOTAL_VF 0x0e
>> +#define PCIECAP_SRIOV_NUM_VF 0x10
>> +#define PCIECAP_SRIOV_FUNC_LINK 0x12
>> +#define PCIECAP_SRIOV_VF_OFFSET 0x14
>> +#define PCIECAP_SRIOV_VF_STRIDE 0x16
>> +#define PCIECAP_SRIOV_VF_DID 0x1a
>> +#define PCIECAP_SRIOV_SUP_PGSIZE 0x1c
>> +#define PCIECAP_SRIOV_SYS_PGSIZE 0x20
>> +#define PCIECAP_SRIOV_BAR 0x24
>> +#define PCIECAP_SRIOV_NUM_BARS 6
>> +#define PCIECAP_SRIOV_VFM 0x3c
>> +#define PCIECAP_SRIOV_VFM_BIR(x) ((x) & 7)
>> +#define PCIECAP_SRIOV_VFM_OFFSET(x) ((x) & ~7)
>> +#define PCIECAP_SRIOV_VFM_UA 0x0
>> +#define PCIECAP_SRIOV_VFM_MI 0x1
>> +#define PCIECAP_SRIOV_VFM_MO 0x2
>> +#define PCIECAP_SRIOV_VFM_AV 0x3
>> +
>> /* Vendor specific extend capability */
>> #define PCIECAP_ID_VNDR 0x0b
>> #define PCIECAP_VNDR_HDR 0x04
>> diff --git a/include/pci-iov.h b/include/pci-iov.h
>> new file mode 100644
>> index 0000000..787b2cd
>> --- /dev/null
>> +++ b/include/pci-iov.h
>> @@ -0,0 +1,37 @@
>> +/* Copyright 2013-2016 IBM Corp.
>> + *
>> + * Licensed under the Apache License, Version 2.0 (the "License");
>> + * you may not use this file except in compliance with the License.
>> + * You may obtain a copy of the License at
>> + *
>> + * http://www.apache.org/licenses/LICENSE-2.0
>> + *
>> + * Unless required by applicable law or agreed to in writing, software
>> + * distributed under the License is distributed on an "AS IS" BASIS,
>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
>> + * implied.
>> + * See the License for the specific language governing permissions and
>> + * limitations under the License.
>> + */
>> +
>> +#ifndef __PCI_IOV_H
>> +#define __PCI_IOV_H
>> +
>> +struct pci_iov {
>> + struct phb *phb;
>> + struct pci_device *pd;
>> + struct pci_device *VFs;
>> + uint32_t pos;
>> + bool enabled;
>> + struct pci_cfg_reg_filter pcrf;
>> +
>> + uint16_t init_VFs;
>> + uint16_t total_VFs;
>> + uint16_t num_VFs;
>> + uint16_t offset;
>> + uint16_t stride;
>> +};
>> +
>> +extern void pci_init_iov_cap(struct phb *phb, struct pci_device *pd);
>> +
>> +#endif
>
More information about the Skiboot
mailing list