[PATCH 11/15] cxl: Add support for interrupts on the Mellanox CX4

Andrew Donnellan andrew.donnellan at au1.ibm.com
Thu Jul 14 15:34:53 AEST 2016


On 14/07/16 07:17, Ian Munsie wrote:
> From: Ian Munsie <imunsie at au1.ibm.com>
>
> The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where
> interrupts are routed from the networking hardware to the XSL using the
> MSIX table, and from there will be transformed back into an MSIX
> interrupt using the cxl style interrupts (i.e. using IVTE entries and
> ranges to map a PE and AFU interrupt number to an MSIX address).
>
> We want to hide the implementation details of cxl interrupts as much as
> possible. To this end, we use a special version of the MSI setup &
> teardown routines in the PHB while in cxl mode to allocate the cxl
> interrupts and configure the IVTE entries in the process element.
>
> This function does not configure the MSIX table - the CX4 card uses a
> custom format in that table and it would not be appropriate to fill that
> out in generic code. The rest of the functionality is similar to the
> "Full MSI-X mode" described in the CAIA, and this could be easily
> extended to support other adapters that use that mode in the future.
>
> The interrupts will be associated with the default context. If the
> maximum number of interrupts per context has been limited (e.g. by the
> mlx5 driver), it will automatically allocate additional kernel contexts
> to associate extra interrupts as required. These contexts will be
> started using the same WED that was used to start the default context.
>
> Signed-off-by: Ian Munsie <imunsie at au1.ibm.com>

Some minor nitpicks below, which shouldn't block acceptance.

Reviewed-by: Andrew Donnellan <andrew.donnellan at au1.ibm.com>


> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 104c040..530d4af 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -3465,6 +3465,10 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
>  const struct pci_controller_ops pnv_cxl_cx4_ioda_controller_ops = {
>  	.dma_dev_setup		= pnv_pci_dma_dev_setup,
>  	.dma_bus_setup		= pnv_pci_dma_bus_setup,
> +#ifdef CONFIG_PCI_MSI

If you've got CXL_BASE you've already got PCI_MSI, so this #ifdef 
shouldn't be needed.

> diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
> index f02a859..f3d34b9 100644
> --- a/drivers/misc/cxl/api.c
> +++ b/drivers/misc/cxl/api.c
> @@ -14,6 +14,7 @@
>  #include <misc/cxl.h>
>  #include <linux/fs.h>
>  #include <asm/pnv-pci.h>
> +#include <linux/msi.h>
>
>  #include "cxl.h"
>
> @@ -489,3 +490,73 @@ int cxl_get_max_irqs_per_process(struct pci_dev *dev)
>  	return afu->irqs_max;
>  }
>  EXPORT_SYMBOL_GPL(cxl_get_max_irqs_per_process);
> +
> +/*
> + * This is a special interrupt allocation routine called from the PHB's MSI
> + * setup function. When capi interrupts are allocated in this manner they must
> + * still be associated with a running context, but since the MSI APIs have no
> + * way to specify this we use the default context associated with the device.
> + *
> + * The Mellanox CX4 has a hardware limitation that restricts the maximum AFU
> + * interrupt number, so in order to overcome this their driver informs us of
> + * the restriction by setting the maximum interrupts per context, and we
> + * allocate additional contexts as necessary so that we can keep the AFU
> + * interrupt number within the supported range.
> + */
> +int _cxl_cx4_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
> +{
> +	struct cxl_context *ctx, *new_ctx, *default_ctx;
> +	int remaining;
> +	int rc;
> +
> +	ctx = default_ctx = cxl_get_context(pdev);
> +	if (WARN_ON(!default_ctx))
> +		return -ENODEV;

I have a very slight preference for:

if (!default_ctx) {
	dev_WARN(&pdev->dev, "couldn't get default context\n");
	return -ENODEV;
}

(I see this in your arch/powerpc code too, but that's obviously copied 
from the regular powernv irq code. Also, why is there no dev_WARN_ON() 
function?)

> +
> +	remaining = nvec;
> +	while (remaining > 0) {
> +		rc = cxl_allocate_afu_irqs(ctx, min(remaining, ctx->afu->irqs_max));
> +		if (rc) {
> +			pr_warn("%s: Failed to find enough free MSIs\n", pci_name(pdev));

dev_warn(&pdev->dev, "failed to find enough free MSIs\n"); is more 
common in the cxl code.
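
For anyone following the loop quoted above, here is a minimal userspace 
sketch of the chunking arithmetic it appears to perform: the requested 
vector count is split into per-context allocations of at most irqs_max 
each. The helper name split_irqs_across_contexts() and its chunk[] 
output array are hypothetical and exist only for illustration; this 
models the arithmetic alone, not the cxl driver's context allocation or 
error handling.

```c
/*
 * Hypothetical helper (not part of the cxl driver): models only the
 * arithmetic of the allocation loop quoted above.
 *
 * Splits nvec interrupts into chunks of at most irqs_max, one chunk per
 * context. Writes the per-context counts into chunk[] and returns the
 * number of contexts needed, or -1 if the input is invalid or more than
 * max_chunks contexts would be required.
 */
static int split_irqs_across_contexts(int nvec, int irqs_max,
				      int *chunk, int max_chunks)
{
	int remaining = nvec;
	int n = 0;

	if (nvec < 0 || irqs_max <= 0)
		return -1;

	while (remaining > 0) {
		/* Same role as min(remaining, ctx->afu->irqs_max) in the patch. */
		int take = remaining < irqs_max ? remaining : irqs_max;

		if (n >= max_chunks)
			return -1;

		chunk[n++] = take;
		remaining -= take;
	}

	return n;
}
```

With nvec = 10 and irqs_max = 4, this yields three chunks of 4, 4 and 2, 
which corresponds to the default context plus two additional kernel 
contexts in the scheme the commit message describes.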


-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan at au1.ibm.com  IBM Australia Limited


