[PATCH v4 00/25] Add support for OpenCAPI Persistent Memory devices

Alastair D'Silva alastair at d-silva.org
Thu Apr 2 09:44:13 AEDT 2020


> -----Original Message-----
> From: Dan Williams <dan.j.williams at intel.com>
> Sent: Wednesday, 1 April 2020 7:48 PM
> To: Alastair D'Silva <alastair at d-silva.org>
> Cc: Aneesh Kumar K . V <aneesh.kumar at linux.ibm.com>; Oliver O'Halloran
> <oohall at gmail.com>; Benjamin Herrenschmidt
> <benh at kernel.crashing.org>; Paul Mackerras <paulus at samba.org>; Michael
> Ellerman <mpe at ellerman.id.au>; Frederic Barrat <fbarrat at linux.ibm.com>;
> Andrew Donnellan <ajd at linux.ibm.com>; Arnd Bergmann
> <arnd at arndb.de>; Greg Kroah-Hartman <gregkh at linuxfoundation.org>;
> Vishal Verma <vishal.l.verma at intel.com>; Dave Jiang
> <dave.jiang at intel.com>; Ira Weiny <ira.weiny at intel.com>; Andrew Morton
> <akpm at linux-foundation.org>; Mauro Carvalho Chehab
> <mchehab+samsung at kernel.org>; David S. Miller <davem at davemloft.net>;
> Rob Herring <robh at kernel.org>; Anton Blanchard <anton at ozlabs.org>;
> Krzysztof Kozlowski <krzk at kernel.org>; Mahesh Salgaonkar
> <mahesh at linux.vnet.ibm.com>; Madhavan Srinivasan
> <maddy at linux.vnet.ibm.com>; Cédric Le Goater <clg at kaod.org>; Anju T
> Sudhakar <anju at linux.vnet.ibm.com>; Hari Bathini
> <hbathini at linux.ibm.com>; Thomas Gleixner <tglx at linutronix.de>; Greg
> Kurz <groug at kaod.org>; Nicholas Piggin <npiggin at gmail.com>; Masahiro
> Yamada <yamada.masahiro at socionext.com>; Alexey Kardashevskiy
> <aik at ozlabs.ru>; Linux Kernel Mailing List <linux-kernel at vger.kernel.org>;
> linuxppc-dev <linuxppc-dev at lists.ozlabs.org>; linux-nvdimm <linux-
> nvdimm at lists.01.org>; Linux MM <linux-mm at kvack.org>
> Subject: Re: [PATCH v4 00/25] Add support for OpenCAPI Persistent Memory
> devices
> 
> On Sun, Mar 29, 2020 at 10:23 PM Alastair D'Silva <alastair at d-silva.org>
> wrote:
> >
> > This series adds support for OpenCAPI Persistent Memory devices on
> > bare metal (arch/powernv), exposing them as nvdimms so that we can
> > make use of the existing infrastructure. There already exists a driver
> > for the same devices abstracted through PowerVM (arch/pseries):
> > arch/powerpc/platforms/pseries/papr_scm.c
> >
> > These devices are connected via OpenCAPI, and present as LPC (lowest
> coherence point) memory to the system, practically, that means that
> memory on these cards could be treated as conventional, cache-coherent
> memory.
> >
> > Since the devices are connected via OpenCAPI, they are not enumerated
> via ACPI. Instead, OpenCAPI links present as pseudo-PCI bridges, with
> devices below them.
> >
> > This series introduces a driver that exposes the memory on these cards as
> nvdimms, with each card getting it's own bus. This is somewhat complicated
> by the fact that the cards do not have out of band persistent storage for
> metadata, so 1 SECTION_SIZE's (see SPARSEMEM) worth of storage is carved
> out of the top of the card storage to implement the ndctl_config_* calls.
> 
> Is it really tied to section-size? Can't that change based on the configured
> page-size? It's not clear to me why that would be the choice, but I'll dig into
> the implementation.
> 

I had tried using PAGE_SIZE, but ran into problems carving off just 1 page and handing it to the kernel, while leaving the rest as pmem. That was a while ago though, so maybe I should retry it.

> > The driver is not responsible for configuring the NPU (NVLink Processing
> Unit) BARs to map the LPC memory from the card into the system's physical
> address space, instead, it requests this to be done via OPAL calls (typically
> implemented by Skiboot).
> 
> Are OPAL calls similar to ACPI DSMs? I.e. methods for the OS to invoke
> platform firmware services? What's Skiboot?
> 

Yes, OPAL is the interface to firmware for POWER. Skiboot is the open-source (and only) implementation of OPAL.

> >
> > The series is structured as follows:
> >  - Required infrastructure changes & cleanup
> >  - A minimal driver implementation
> >  - Implementing additional features within the driver
> 
> Thanks for the intro and the changelog!
> 
> >
> > Changelog:
> > V4:
> >   - Rebase on next-20200320
> 
> Do you have dependencies on other material that's in -next? Otherwise -
> next is only a viable development baseline if you are going to merge through
> Andrew's tree.
> 
> >   - Bump copyright to 2020
> >   - Ensure all uapi headers use C89 compatible comments (missed
> ocxlpmem.h)
> >   - Move the driver back to drivers/nvdimm/ocxl, after confirmation
> >     that this location is desirable
> >   - Rename ocxl.c to ocxlpmem.c (+ support files)
> >   - Rename all ocxl_pmem to ocxlpmem
> >   - Address checkpatch --strict issues
> >   - "powerpc/powernv: Add OPAL calls for LPC memory alloc/release"
> >         - Pass base address as __be64
> >   - "ocxl: Tally up the LPC memory on a link & allow it to be mapped"
> >         - Address checkpatch spacing warnings
> >         - Reword blurb
> >         - Reword size description for ocxl_link_add_lpc_mem()
> >         - Add an early exit in ocxl_link_lpc_release() to avoid triggering
> >           bogus warnings if called after ocxl_link_lpc_map() fails
> >   - "powerpc/powernv: Add OPAL calls for LPC memory alloc/release"
> >         - Reword blurb
> >   - "powerpc/powernv: Map & release OpenCAPI LPC memory"
> >         - Reword blurb
> >   - Move minor_idr init from file_init() to ocxlpmem_init() (fixes runtime
> error
> >     in "nvdimm: Add driver for OpenCAPI Persistent Memory")
> >   - Wrap long lines
> >   - "nvdimm: Add driver for OpenCAPI Storage Class Memory"
> >         - Remove '+ 1' workround from serial number->cookie assignment
> >         - Drop out of memory message for ocxlpmem in probe()
> >         - Fix leaks of ocxlpmem & ocxlpmem->ocxl_fn in probe()
> >         - remove struct ocxlpmem_function0, it didn't value add
> >         - factor out err_unregistered label in probe
> >         - Address more checkpatch warnings
> >         - get/put the pci dev on probe/free
> >         - Drop ocxlpmem_ prefix from static functions
> >         - Propogate errors up from called functions in probe()
> >         - Set MODULE_LICENSE to GPLv2
> >         - Add myself as module author
> >         - Call nvdimm_bus_unregister() in remove() to release references
> >         - Don't call devm_memunmap on metadata_address, the release
> handler on
> >          the device already deals with this
> >   - "nvdimm/ocxl: Read the capability registers & wait for device ready"
> >         - Fix mask for read_latency
> >         - Fold in is_usable logic into timeout to remove error message race
> >         - propogate bad rc from read_device_metadata
> >   - "nvdimm/ocxl: Add register addresses & status values to the header"
> >         - Add comments for register abbreviations where names have been
> >           expanded
> >         - Add missing status for blocked on background task
> >         - Alias defines for firmware update status to show that the duplication
> >           of values is intentional
> >   - "nvdimm/ocxl: Register a character device for userspace to interact with"
> >         - Add lock around minors IDR, delete the cdev before
> device_unregister
> >         - Propogate errors up from called functions in probe()
> >   - "nvdimm/ocxl: Add support for Admin commands"
> >         - Fix typo in setup_command_data error message, and drop 'ocxl' from
> it
> >         - Drop vestigial CHI read from admin_command_request
> >         - Change command ID mismatch message to dev_err, and return an
> error
> >         - Use jiffies to implement admin_command_complete_timeout()
> >         - Flesh out blurb
> >         - Create a wrapper to issue the command & wait for timeout
> >   - "nvdimm/ocxl: Add support for near storage commands"
> >         - dropped (will submit with the patches for nvdimm overwrite)
> >   - "nvdimm/ocxl: Implement the Read Error Log command"
> >         - Remove stray blank line
> >         - change misplaced goto to an early exit in read_error_log
> >         - Inline error_log_offset_0x08
> >         - Read WWID data as LE rather than host endian
> >         - Move the include of nvdimm/ocxlpmem.h to ocxl.c
> >         - Add padding after fwrevision in struct ioctl_ocxl_pmem_error_log
> >         - Register IOCTL magic
> >         - Coerce pointers to __u64 in IOCTLs
> >   - "nvdimm/ocxl: Add controller dump IOCTLs"
> >         - Coerce pointers to __u64 in IOCTLs
> >         - Document expected IOCTL usage in blurb
> >         - Add missing rc check
> >         - Only populate up to the number of bytes returned by the card,
> >           and return this length to the caller
> >         - Add missing header check
> >   - "nvdimm/ocxl: Add an IOCTL to report controller statistics"
> >         - Update to match the latest version of the spec
> >         - Verify that parametr block IDs & lengths match what we expect
> >         - Use defines for offsets
> >   - "nvdimm/ocxl: Forward events to userspace"
> >         - Don't enable NSCRA doorbell
> >         - return -EBUSY if the event context is already used
> >         - return -ENODEV if IRQs cannot be mapped
> >         - Tag IRQ pointers with __iomem
> >         - Drop ocxlpmem_ prefix from static functions
> >         - Propogate error from eventfd_ctx_fdget
> >         - Fix error check in copy_to_user
> >         - Drop GLOBAL_MMIO_CHI_NSCRA (this should be in the overwrite
> patch)
> >         - Drop unused irq_pgmap
> >         - Don't redef BIT_ULL
> >   - "nvdimm/ocxl: Add debug IOCTLs"
> >         - Eliminate clearing loop (now done in admin_command_execute()
> >         - Drop dummy IOCTLs if CONFIG_OCXL_PMEM_DEBUG is not set
> >         - Group debug IOCTLs together & comment that they may not be
> available
> >   - "nvdimm/ocxl: Expose SMART data via ndctl"
> >         - Drop 'rc = 0; goto out;'
> >         - Propogate errors from ndctl_smart()
> >   - "nvdimm/ocxl: Expose the serial number in sysfs" & "nvdimm/ocxl:
> Expose the firmware version in sysfs"
> >         - Squash these 2 patches together
> >         - Expose data as a DIMM attribute rather than an ocxlpmem
> >           attribute
> >   - "nvdimm/ocxl: Add an IOCTL to request controller health & perf data"
> >         - Reword blurb
> >   - "nvdimm/ocxl: Implement the heartbeat command"
> >         - Propogate rc in probe()
> >
> > V3:
> >   - Rebase against next/next-20200220
> >   - Move driver to arch/powerpc/platforms/powernv, we now expect this
> >     driver to go upstream via the powerpc tree
> >   - "nvdimm/ocxl: Implement the Read Error Log command"
> >         - Fix bad header path
> >   - "nvdimm/ocxl: Read the capability registers & wait for device ready"
> >         - Fix overlapping masks between readiness_timeout &
> memory_available_timeout
> >   - "nvdimm: Add driver for OpenCAPI Storage Class Memory"
> >         - Address minor review comments from Jonathan Cameron
> >         - Remove attributes
> >         - Default to module if building LIBNVDIMM
> >         - Propogate errors up from called functions in probe()
> >   - "nvdimm/ocxl: Expose SMART data via ndctl"
> >         - Pack attributes in struct
> >         - Support different size SMART buffers for compatibility with newer
> >           ndctls that may want more SMART attribs than we provide
> >         - Rework to to use ND_CMD_CALL instead of ND_CMD_SMART
> >   - drop "ocxl: Free detached contexts in ocxl_context_detach_all()"
> >   - "powerpc: Map & release OpenCAPI LPC memory"
> >         - Remove 'extern'
> >         - Only available with CONFIG_MEMORY_HOTPLUG_SPARSE
> >   - "ocxl: Tally up the LPC memory on a link & allow it to be mapped"
> >         - Address minor review comments from Jonathan Cameron
> >   - "ocxl: Add functions to map/unmap LPC memory"
> >         - Split detected memory message into a separate patch
> >         - Address minor review comments from Jonathan Cameron
> >         - Add a comment explaining why unmap_lpc_mem is in
> deconfigure_afu
> >   - "nvdimm/ocxl: Add support for Admin commands"
> >         - use sizeof(u64) rather than 0x08 when iterating u64s
> >   - "nvdimm/ocxl: Implement the heartbeat command"
> >         - Fix typo in blurb
> >   - Address kernel doc issues
> >   - Ensure all uapi headers use C89 compatible comments
> >   - Drop patches for firmware update & overwrite, these will be
> >     submitted later once patches are available for ndctl
> >   - Rename SCM to OpenCAPI Persistent Memory
> >
> > V2:
> >   - "powerpc: Map & release OpenCAPI LPC memory"
> >       - Fix #if -> #ifdef
> >       - use pci_dev_id to get the bdfn
> >       - use __be64 to hold be data
> >       - indent check_hotplug_memory_addressable correctly
> >       - Remove export of check_hotplug_memory_addressable
> >   - "ocxl: Conditionally bind SCM devices to the generic OCXL driver"
> >       - Improve patch description and remove redundant default
> >   - "nvdimm: Add driver for OpenCAPI Storage Class Memory"
> >       - Mark a few funcs as static as identified by the 0day bot
> >       - Add OCXL dependancies to OCXL_SCM
> >       - Use memcpy_mcsafe in scm_ndctl_config_read
> >       - Rename scm_foo_offset_0x00 to scm_foo_header_parse & add docs
> >       - Name DIMM attribs "ocxl" rather than "scm"
> >       - Split out into base + many feature patches
> >   - "powerpc: Enable OpenCAPI Storage Class Memory driver on bare
> metal"
> >       - Build DEV_DAX & friends as modules
> >   - "ocxl: Conditionally bind SCM devices to the generic OCXL driver"
> >       - Patch dropped (easy enough to maintain this out of tree for
> development)
> >   - "ocxl: Tally up the LPC memory on a link & allow it to be mapped"
> >       - Add a warning if an unmatched lpc_release is called
> >   - "ocxl: Add functions to map/unmap LPC memory"
> >       - Use EXPORT_SYMBOL_GPL
> >
> >
> > Alastair D'Silva (25):
> >   powerpc/powernv: Add OPAL calls for LPC memory alloc/release
> >   mm/memory_hotplug: Allow check_hotplug_memory_addressable to be
> called
> >     from drivers
> >   powerpc/powernv: Map & release OpenCAPI LPC memory
> >   ocxl: Remove unnecessary externs
> >   ocxl: Address kernel doc errors & warnings
> >   ocxl: Tally up the LPC memory on a link & allow it to be mapped
> >   ocxl: Add functions to map/unmap LPC memory
> >   ocxl: Emit a log message showing how much LPC memory was detected
> >   ocxl: Save the device serial number in ocxl_fn
> >   nvdimm: Add driver for OpenCAPI Persistent Memory
> >   powerpc: Enable the OpenCAPI Persistent Memory driver for
> >     powernv_defconfig
> >   nvdimm/ocxl: Add register addresses & status values to the header
> >   nvdimm/ocxl: Read the capability registers & wait for device ready
> >   nvdimm/ocxl: Add support for Admin commands
> >   nvdimm/ocxl: Register a character device for userspace to interact
> >     with
> >   nvdimm/ocxl: Implement the Read Error Log command
> >   nvdimm/ocxl: Add controller dump IOCTLs
> >   nvdimm/ocxl: Add an IOCTL to report controller statistics
> >   nvdimm/ocxl: Forward events to userspace
> >   nvdimm/ocxl: Add an IOCTL to request controller health & perf data
> >   nvdimm/ocxl: Implement the heartbeat command
> >   nvdimm/ocxl: Add debug IOCTLs
> >   nvdimm/ocxl: Expose SMART data via ndctl
> >   nvdimm/ocxl: Expose the serial number & firmware version in sysfs
> >   MAINTAINERS: Add myself & nvdimm/ocxl to ocxl
> >
> >  .../userspace-api/ioctl/ioctl-number.rst      |    1 +
> >  MAINTAINERS                                   |    3 +
> >  arch/powerpc/configs/powernv_defconfig        |    5 +
> >  arch/powerpc/include/asm/opal-api.h           |    2 +
> >  arch/powerpc/include/asm/opal.h               |    2 +
> >  arch/powerpc/include/asm/pnv-ocxl.h           |   42 +-
> >  arch/powerpc/platforms/powernv/ocxl.c         |   43 +
> >  arch/powerpc/platforms/powernv/opal-call.c    |    2 +
> >  drivers/misc/ocxl/config.c                    |   74 +-
> >  drivers/misc/ocxl/core.c                      |   61 +
> >  drivers/misc/ocxl/link.c                      |   60 +
> >  drivers/misc/ocxl/ocxl_internal.h             |   45 +-
> >  drivers/nvdimm/Kconfig                        |    2 +
> >  drivers/nvdimm/Makefile                       |    1 +
> >  drivers/nvdimm/ocxl/Kconfig                   |   21 +
> >  drivers/nvdimm/ocxl/Makefile                  |    7 +
> >  drivers/nvdimm/ocxl/main.c                    | 1975 +++++++++++++++++
> >  drivers/nvdimm/ocxl/ocxlpmem.h                |  197 ++
> >  drivers/nvdimm/ocxl/ocxlpmem_internal.c       |  280 +++
> >  include/linux/memory_hotplug.h                |    5 +
> >  include/misc/ocxl.h                           |  122 +-
> >  include/uapi/linux/ndctl.h                    |    1 +
> >  include/uapi/nvdimm/ocxlpmem.h                |  127 ++
> >  mm/memory_hotplug.c                           |    4 +-
> >  24 files changed, 2983 insertions(+), 99 deletions(-)  create mode
> > 100644 drivers/nvdimm/ocxl/Kconfig  create mode 100644
> > drivers/nvdimm/ocxl/Makefile  create mode 100644
> > drivers/nvdimm/ocxl/main.c  create mode 100644
> > drivers/nvdimm/ocxl/ocxlpmem.h  create mode 100644
> > drivers/nvdimm/ocxl/ocxlpmem_internal.c
> >  create mode 100644 include/uapi/nvdimm/ocxlpmem.h
> >
> > --
> > 2.24.1
> >


-- 
Alastair D'Silva           mob: 0423 762 819
skype: alastair_dsilva     msn: alastair at d-silva.org
blog: http://alastair.d-silva.org    Twitter: @EvilDeece



More information about the Linuxppc-dev mailing list