[PATCH v3] powerpc / cxl: Add support for the Mellanox CX4 in cxl mode
Ian Munsie
imunsie at au1.ibm.com
Thu Jul 14 07:16:59 AEST 2016
This series adds support for the Mellanox CX4 network adapter operating in cxl
mode to the cxl driver and the PowerNV PHB code. The Mellanox developers will
submit a separate patch series that makes use of this in the mlx5 driver.
The CX4 card can operate in either pci mode, or cxl mode. In cxl mode, memory
accesses from the card go through the XSL (Translation Service Layer,
essentially a stripped down version of the Power Service Layer), allowing it to
transparently access unpinned memory with the cxl driver handling faulting in
pages as necessary, etc. Most of the support for the XSL is already upstream,
though this series does include a bug fix to enable bus mastering for this
(patch 3).
Patch 2 in this series provides an API which the mlx5 driver can query to check
if it is in a cxl capable slot. The card will come up in pci mode, and the mlx5
driver can choose to switch it to cxl mode, wherein it will reappear with an
additional physical function representing the XSL that the cxl driver will bind
to. Patches 13-15 add support for switching the card's mode, including using
the PCI hotplug support to re-enumerate the device tree and re-probe the
card.
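The bring-up sequence described above can be modelled as a small state
machine. This is only a userspace sketch under stated assumptions: the
struct, the enum and every function name below are hypothetical and are not
the API added by patch 2 or patches 13-15. It also assumes that
re-enumeration exposes exactly one extra function for the XSL and that the
cxl driver binds to it immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model for illustration only; none of these names exist in
 * the cxl or mlx5 drivers. */
enum card_mode { CARD_MODE_PCI, CARD_MODE_CXL };

struct card_model {
    enum card_mode mode;   /* mode the card is currently operating in */
    bool slot_cxl_capable; /* answer the slot capability query would give */
    int net_functions;     /* networking physical functions */
    int xsl_functions;     /* extra physical function representing the XSL */
    bool xsl_bound_to_cxl; /* whether the cxl driver has bound to the XSL */
};

/* The card always comes up in pci mode with only its networking functions. */
static void card_power_on(struct card_model *c, bool slot_cxl_capable,
                          int net_functions)
{
    assert(net_functions > 0);
    c->mode = CARD_MODE_PCI;
    c->slot_cxl_capable = slot_cxl_capable;
    c->net_functions = net_functions;
    c->xsl_functions = 0;
    c->xsl_bound_to_cxl = false;
}

/* Stand-in for hotplug re-enumeration: rediscover the functions the card
 * exposes in its current mode and let the drivers re-probe them. */
static void card_reenumerate(struct card_model *c)
{
    c->xsl_functions = (c->mode == CARD_MODE_CXL) ? 1 : 0;
    c->xsl_bound_to_cxl = (c->xsl_functions == 1);
}

/* Returns 0 on success, -1 if the slot cannot support cxl mode. */
static int card_switch_to_cxl(struct card_model *c)
{
    if (!c->slot_cxl_capable)
        return -1;          /* card stays in pci mode */
    if (c->mode == CARD_MODE_CXL)
        return 0;           /* already switched, nothing to do */
    c->mode = CARD_MODE_CXL;
    card_reenumerate(c);    /* XSL function appears, cxl binds to it */
    return 0;
}

/* Total number of physical functions currently visible. */
static int card_total_functions(const struct card_model *c)
{
    return c->net_functions + c->xsl_functions;
}
```

In this model a card powered on with two networking functions in a capable
slot shows three functions after the switch, while a card in a non-capable
slot is left untouched in pci mode.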
Unlike previous users of the cxl kernel API where we used a virtual PHB and
exposed PCI devices under it, the Mellanox CX4 uses a peer model where cxl
binds to one of the physical functions of the card and the mlx5_core driver
binds to the other networking physical functions. Patch 6 skips creating a vPHB
for AFUs without any AFU configuration records (including devices using the
peer model) and opts out of EEH handling. Patches 7 and 8 add support for using
the cxl kernel API with the real PHB to enable this peer model. Patches 4 and
5 are preparatory patches exposing some APIs that the PHB will need to call.
While in cxl mode, interrupts from the CX4 are a little unusual - they are
neither pci interrupts nor cxl interrupts, but rather a hybrid of the two. The
interrupts are passed from the networking hardware to the XSL using a custom
format in the MSIX table, and from there are treated as cxl interrupts. These
are configured mostly transparently using the standard msix APIs - the PHB
handles allocating and configuring the cxl interrupts, associating them with
the default context, and the mlx5 driver handles filling out the MSIX table
with their custom format (not included in this series). See patch 11.
Additionally, the CX4 has a hard limit on the number of interrupts that
can be associated with a given context. To overcome this, patches 9 and 10
expose an API to allow the mlx5 driver to inform us of the limit, and the
interrupt allocation code in patch 11 will allocate additional contexts to
associate the remaining interrupts with.
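The arithmetic behind spreading interrupts over several contexts can be
sketched as follows. This is an illustration, not the code in patch 11: both
function names are hypothetical, and it assumes contexts are filled greedily
in order, starting with the default context at index 0.

```c
#include <assert.h>

/* Hypothetical helper: how many contexts are needed to hold nr_irqs
 * interrupts when each context can carry at most max_irqs_per_ctx.
 * This is a ceiling division. */
static int contexts_needed(int nr_irqs, int max_irqs_per_ctx)
{
    assert(nr_irqs >= 0 && max_irqs_per_ctx > 0);
    return (nr_irqs + max_irqs_per_ctx - 1) / max_irqs_per_ctx;
}

/* Hypothetical helper: how many interrupts end up in the context at
 * ctx_index, where index 0 stands for the default context and higher
 * indices stand for the additionally allocated contexts. */
static int irqs_in_context(int nr_irqs, int max_irqs_per_ctx, int ctx_index)
{
    int remaining;

    assert(nr_irqs >= 0 && max_irqs_per_ctx > 0 && ctx_index >= 0);
    remaining = nr_irqs - ctx_index * max_irqs_per_ctx;
    if (remaining <= 0)
        return 0;   /* earlier contexts already hold every interrupt */
    return remaining < max_irqs_per_ctx ? remaining : max_irqs_per_ctx;
}
```

For example, with a limit of 4 interrupts per context, a request for 10
interrupts needs 3 contexts under this scheme: the default context and the
first additional one carry 4 each, and the last one carries the remaining 2.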
Patch 1 is a preparatory cleanup patch to reorganise cxl code in arch/powerpc
into a separate file.
Patch 12 is a workaround for a hardware limitation in the CX4 where a context
with PE=0 cannot be used.
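One plausible way to avoid handing out PE=0 is to give the context allocator
a minimum index. The cover letter does not say how patch 12 implements the
workaround, so the sketch below is an assumption: the table, its size and the
function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PE 8 /* illustrative table size, not a hardware value */

/* Hypothetical process-element table with a configurable lower bound. */
struct pe_table {
    bool used[MAX_PE];
    int min_pe; /* lowest PE number the allocator may hand out */
};

static void pe_table_init(struct pe_table *t, int min_pe)
{
    int i;

    assert(min_pe >= 0 && min_pe < MAX_PE);
    for (i = 0; i < MAX_PE; i++)
        t->used[i] = false;
    t->min_pe = min_pe;
}

/* Returns the lowest free PE at or above min_pe, or -1 if none is left. */
static int pe_alloc(struct pe_table *t)
{
    int pe;

    for (pe = t->min_pe; pe < MAX_PE; pe++) {
        if (!t->used[pe]) {
            t->used[pe] = true;
            return pe;
        }
    }
    return -1;
}
```

Initialising the table with a minimum of 1 for a CX4 means the first context
gets PE=1 and PE=0 is never used, while other adapters could keep a minimum
of 0.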
The entire series is bisectable.
Changes since v2:
Addressed feedback from Andrew Donnellan:
- Fixed typos in several comments
- Moved _cxl_pci_associate_default_context and
  _cxl_pci_disable_device from vphb.c to a new file phb.c since
  they are used by both the vPHB and peer models. (Patch 5)
- Changed two exported symbols to EXPORT_SYMBOL_GPL (Patch 7)
- Undid change to remove static from pnv_pci_release_device and
  pci_controller_ops and declare them in the header, both of
  which were left over from an earlier cut. (Patch 7)
Changes since v1:
- New patch 6 to skip creating a vPHB if there are no AFU configuration
  records, and opt out of EEH handling (partially split from patch 8).
- Updated comments in various patches (1, 2, 7, 10, 15) with feedback
  from Andrew Donnellan and Frederic Barrat
- Handle error case if cxl_next_msi_hwirq returns 0 signifying
  that an AFU IRQ is not mapped to a hardware interrupt (Patch 11)
- Dropped extraneous "select HOTPLUG_PCI_POWERNV_BASE" in Kconfig,
  which was accidentally left in from an earlier non-public
  revision. Thanks to Gavin Shan for pointing it out (Patch 13)
- Added new error label for error paths calling pci_dev_put() -
  suggested by Ian Munsie (Patch 15)
- Added newline at end of Kconfig (Patch 15)