[Skiboot] [PATCH 1/2] phb4: Reallocate PEC2 DMA-Read engines to improve GPU-Direct bandwidth
Andrew Donnellan
andrew.donnellan at au1.ibm.com
Thu Jul 19 18:04:25 AEST 2018
On 19/07/18 17:54, Vaibhav Jain wrote:
> We reallocate additional 16/8 DMA-Read engines allocated to stack0/1
> on PEC2 respectively. This is needed to improve bandwidth available to
> the Mellanox CX5 adapter when trying to read GPU memory (GPU-Direct).
>
> If kernel cxl driver indicates a request to allocate maximum possible
> DMA read engines when calling enable_capi_mode() and card is attached
> to PEC2/stack0 slot then we assume its a Mellanox CX5 adapter. We then
> allocate additional 16/8 extra DMA read engines to stack0 and stack1
> respectively on PEC2. This is done by populating the
> XPEC_PCI_PRDSTKOVR and XPEC_NEST_READ_STACK_OVERRIDE as suggested by
> the h/w team.
>
> Signed-off-by: Christophe Lombard <clombard at linux.vnet.ibm.com>
> Signed-off-by: Vaibhav Jain <vaibhav at linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
> ---
> hw/phb4.c | 40 +++++++++++++++++++++++++++++++++++++---
> include/phb4-regs.h | 2 ++
> 2 files changed, 39 insertions(+), 3 deletions(-)
>
> diff --git a/hw/phb4.c b/hw/phb4.c
> index 8ba3eb78..f2b92409 100644
> --- a/hw/phb4.c
> +++ b/hw/phb4.c
> @@ -3806,7 +3806,7 @@ static void phb4_init_capp_regs(struct phb4 *p, uint32_t capp_eng)
> /* max PHB read buffers 0-47 */
> reg = 0xFFFFFFFFFFFF0000;
> if (capp_eng & CAPP_MAX_DMA_READ_ENGINES)
> - reg = 0xFF00000000000000;
> + reg = 0xF000000000000000;
> xscom_write(p->chip_id, APC_FSM_READ_MASK + offset, reg);
> xscom_write(p->chip_id, XPT_FSM_RMM + offset, reg);
> }
> @@ -3814,7 +3814,7 @@ static void phb4_init_capp_regs(struct phb4 *p, uint32_t capp_eng)
> /* Set 30 Read machines for CAPP Minus 20-27 for DMA */
> reg = 0xFFFFF00E00000000;
> if (capp_eng & CAPP_MAX_DMA_READ_ENGINES)
> - reg = 0xFF00000000000000;
> + reg = 0xF000000000000000;
> xscom_write(p->chip_id, APC_FSM_READ_MASK + offset, reg);
> xscom_write(p->chip_id, XPT_FSM_RMM + offset, reg);
> }
> @@ -3932,6 +3932,8 @@ static int64_t enable_capi_mode(struct phb4 *p, uint64_t pe_number,
> return OPAL_HARDWARE;
> }
>
> + stq_eng = 0x0000000000000000ULL;
> + dma_eng = 0x0000000000000000ULL;
> if (p->index == CAPP0_PHB_INDEX) {
> /* PBCQ is operating as a x16 stack
> * - The maximum number of engines give to CAPP will be
> @@ -3974,10 +3976,42 @@ static int64_t enable_capi_mode(struct phb4 *p, uint64_t pe_number,
> reg = 0x8000000000000000ULL; /* PEC works in CAPP Mode */
> reg |= stq_eng;
> if (capp_eng & CAPP_MAX_DMA_READ_ENGINES)
> - dma_eng = 0x0000FF0000000000ULL; /* 16 CAPP Read machines */
> + dma_eng = 0x0000F00000000000ULL; /* 4 CAPP Read machines */
For the record Vaibhav explained that the old comment here was
inaccurate, it's being changed from 8->4 rather than 16->4.
> reg |= dma_eng;
> xscom_write(p->chip_id, p->pe_xscom + XPEC_NEST_CAPP_CNTL, reg);
>
> + /* PEC2 has 3 ETU's + 16 pci lanes that can operate as x16,
> + * x8+x8 (bifurcated) or x8+x4+x4 (trifurcated) mode. When
> + * Mellanox CX5 card is attached to stack0 of this PEC, indicated by
> + * request to allocate CAPP_MAX_DMA_READ_ENGINES; we tweak the default
> + * dma-read engines allocations to maximize the DMA read performance
> + */
Thanks for this comment, it's much clearer than v1.
> + if ((p->index == CAPP1_PHB_INDEX) &&
> + (capp_eng & CAPP_MAX_DMA_READ_ENGINES)) {
> +
> + /*
> + * Allocate Additional 16/8 dma read engines to stack0/stack1
> + * respectively. Read engines 0:31 are anyways always assigned
> + * to stack0. Also skip allocating DMA Read Engine-32 by
> + * enabling Bit[0] in XPEC_NEST_READ_STACK_OVERRIDE register.
> + * Enabling this bit seems cause a parity error reported in
> + * NFIR[1]-nonbar_pe.
> + */
> + reg = 0x7fff80007F008000ULL;
> +
> + xscom_write(p->chip_id, p->pci_xscom + XPEC_PCI_PRDSTKOVR, reg);
> + xscom_write(p->chip_id, p->pe_xscom +
> + XPEC_NEST_READ_STACK_OVERRIDE, reg);
> +
> + /* Log this reallocation as it may impact dma performance of
> + * other slots connected to PEC2
> + */
> + PHBINF(p, "CAPP: Set %d dma-read engines for PEC2/stack-0\n",
> + 32 + __builtin_popcountll(reg & PPC_BITMASK(0, 31)));
> + PHBDBG(p, "CAPP: XPEC_NEST_READ_STACK_OVERRIDE: %016llx\n",
> + reg);
> + }
> +
> /* PCI to PB data movement ignores the PB init signal. */
> xscom_write_mask(p->chip_id, p->pe_xscom + XPEC_NEST_PBCQ_HW_CONFIG,
> XPEC_NEST_PBCQ_HW_CONFIG_PBINIT,
> diff --git a/include/phb4-regs.h b/include/phb4-regs.h
> index d7b551f3..ef3cfa93 100644
> --- a/include/phb4-regs.h
> +++ b/include/phb4-regs.h
> @@ -347,6 +347,7 @@
> #define XPEC_NEST_PBCQ_HW_CONFIG_DIS_NODAL PPC_BIT(50)
> #define XPEC_NEST_PBCQ_HW_CONFIG_DIS_RNNN PPC_BIT(52)
> #define XPEC_NEST_CAPP_CNTL 0x7
> +#define XPEC_NEST_READ_STACK_OVERRIDE 0x8
>
> /* Nest base per-stack registers */
> #define XPEC_NEST_STK_PCI_NFIR 0x0
> @@ -381,6 +382,7 @@
> /* PCI base registers */
> #define XPEC_PCI_PBAIB_HW_CONFIG 0x0
> #define XPEC_PCI_CAPP_SEC_BAR 0x1
> +#define XPEC_PCI_PRDSTKOVR 0x2
>
> /* PCI base per-stack registers */
> #define XPEC_PCI_STK_PCI_FIR 0x0
>
--
Andrew Donnellan OzLabs, ADL Canberra
andrew.donnellan at au1.ibm.com IBM Australia Limited
More information about the Skiboot
mailing list