[Skiboot] [PATCH v14 00/10] skiboot: OPAL support for IMC instrumentation

Stewart Smith stewart at linux.vnet.ibm.com
Tue Jun 27 19:25:10 AEST 2017


Madhavan Srinivasan <maddy at linux.vnet.ibm.com> writes:
> ** No OPAL API changes in this version of the patchset. Added a patch
> (patch 10) to handle imc dt node parsing (no changes to rest of the
> patches) **
>
> Patchset adds support for In Memory Collection instrumentation (IMC)
> services in OPAL for Power9. The entire IMC infrastructure consists of
> two kinds of Performance Monitoring Units (PMUs) : nest imc pmus (chip
> level) and core imc pmus (core level).
>
> Nest IMC PMUs are off core but on chip, and they can be accessed via
> in-band scoms. Programming these counters and accumulating the counter
> data to memory is done via microcode running in one of the OCC Engines.
>
> Core IMC PMUs handle the per-core counters. These are initialized with
> per-core PDBARs, HTM_MODE and EVENT_MASK scoms.
>
> This patchset adds nest and core IMC instrumentation support on the
> OPAL side.
>
> "IMA_CATALOG" partition in PNOR contains multiple device tree binaries
> (DTB) in a compressed form with PVR tag. So, when loading the pnor
> partition, OPAL passes the system PVR as a "subid" to the load_resource
> API. If a catalog dtb is found for a given pvr, it is decompressed and
> linked to the main device tree.
>
> Commit which adds the partition to PNOR is :
> https://github.com/open-power/pnor/commit/c940142c6dc64dd176096dc648f433c889919e84
>
> The root node of an IMC catalog device tree contains nodes for the IMC
> PMUs and the common events across the PMUs. Here is an excerpt from
> the device tree :
> /dts-v1/;
>
> / {
>         name = "";
>         compatible = "ibm,opal-in-memory-counters";
>         #address-cells = <0x1>;
>         #size-cells = <0x1>;
>         version-id = "";
>
>         NEST_MCS: nest-mcs-events {
>                 #address-cells = <0x1>;
>                 #size-cells = <0x1>;
>
>                 event@0 {
>                         event-name = "RRTO_QFULL_NO_DISP" ;
>                         reg = <0x0 0x8>;
>                         desc = "RRTO not dispatched in MCS0 due to capacity - pulses once for each time a valid RRTO op is not dispatched due to a command list full condition" ;
>                 };
>                 event@8 {
>                         event-name = "WRTO_QFULL_NO_DISP" ;
>                         reg = <0x8 0x8>;
>                         desc = "WRTO not dispatched in MCS0 due to capacity - pulses once for each time a valid WRTO op is not dispatched due to a command list full condition" ;
>                 };
> 		[...]
>         mcs01 {
>                 compatible = "ibm,imc-counters";
>                 events-prefix = "PM_MCS01_";
>                 unit = "";
>                 scale = "";
>                 reg = <0x118 0x8>;
>                 events = < &NEST_MCS >;
> 		type = <0x10>;
>         };
>         mcs23 {
>                 compatible = "ibm,imc-counters";
>                 events-prefix = "PM_MCS23_";
>                 unit = "";
>                 scale = "";
>                 reg = <0x198 0x8>;
>                 events = < &NEST_MCS >;
> 		type = <0x10>;
>         };
> 	[...]
>
> 	CORE_EVENTS: core-events {
>                 #address-cells = <0x1>;
>                 #size-cells = <0x1>;
>
>                 event@e0 {
>                         event-name = "0THRD_NON_IDLE_PCYC" ;
>                         reg = <0xe0 0x8>;
>                         desc = "The number of processor cycles when all threads are idle" ;
>                 };
>                 event@120 {
>                         event-name = "1THRD_NON_IDLE_PCYC" ;
>                         reg = <0x120 0x8>;
>                         desc = "The number of processor cycles when exactly one SMT thread is executing non-idle code" ;
>                 };
> 		[...]
>         core {
>                 compatible = "ibm,imc-counters";
>                 events-prefix = "CPM_";
>                 unit = "";
>                 scale = "";
>                 reg = <0x0 0x8>;
>                 events = < &CORE_EVENTS >;
> 		type = <0x4>;
>         };
>
>         thread {
>                 compatible = "ibm,imc-counters";
>                 events-prefix = "CPM_";
>                 unit = "";
>                 scale = "";
>                 reg = <0x0 0x8>;
>                 events = < &CORE_EVENTS >;
> 		type = <0x1>;
>         };
> };
>
> IMC Catalog DTS:
> 	https://github.com/open-power/ima-catalog/blob/master/81E00612.4E0100.dts
> (recent suggested device node changes are not yet updated to the link)
>
> For any IMC PMU node (mcs0, mcs1, mcs2, core, thread etc), its events
> property points to the events node which gives us the event
> information for that PMU.
> For example, take the mcs01 PMU node from the above excerpt: its "events"
> property points to the events list for this PMU, and its "events-prefix"
> property helps us create the correct event name for this PMU. So, the
> "RRTO_QFULL_NO_DISP" event name from "nest-mcs-events" becomes
> "PM_MCS01_RRTO_QFULL_NO_DISP" for PMU mcs01 and
> "PM_MCS23_RRTO_QFULL_NO_DISP" for PMU mcs23.
> This new design of the DTS file saves a lot of space in the device
> tree, since a lot of event names are common across PMUs. For core and
> thread IMC PMUs, all the event names are common.
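The prefix scheme described above can be sketched as plain string
concatenation. This is only an illustration: imc_full_event_name() is a
hypothetical helper and is not taken from the patchset. The prefix and
event strings are the ones shown in the DTS excerpt.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patchset): join a PMU's
 * "events-prefix" with an "event-name" taken from the shared events node.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int imc_full_event_name(char *buf, size_t len,
			       const char *prefix, const char *event)
{
	int n = snprintf(buf, len, "%s%s", prefix, event);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with "PM_MCS01_" and "RRTO_QFULL_NO_DISP", this fills buf with
"PM_MCS01_RRTO_QFULL_NO_DISP". The shared event name is stored once in
the device tree and expanded per PMU only when needed.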
>
> Each event in the device tree contains "event-name" and "offset".
> Some of the PMUs may also contain properties such as "scale" and "unit",
> which mean that all the events inside that PMU share the same "scale"
> and "unit" values.
>
> Why this design for the IMC DTS files?
> The DTS files for Power9 describe the nest, core and thread IMC PMUs.
> One could argue for designing the device tree so that
> of_translate_address() can be used directly on the event nodes to get
> the cpu address for that event. However, there are some issues with
> that.
> For nest imc, we need to attach the device tree to per-chip HOMER region
> node. For multiple chips, this will increase replication.
> For core imc, we allocate the memory in the kernel for each core and the
> base location for core imc is not fixed. Hence, we can't use
> of_translate_address on the core events.
> For thread imc, we allocate memory for each linux process which needs to
> be monitored. This would be particularly difficult to handle in the
> device tree, since the allocation is dynamic.
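A minimal sketch of the alternative implied above: the device tree only
carries a per-event offset, and the consumer adds it to whatever base it
allocated at runtime. It assumes the first cell of an event's "reg" is a
byte offset from that base; struct imc_event_reg and imc_counter_addr()
are hypothetical names, not code from the series.

```c
#include <stdint.h>

/* Hypothetical view of one event's "reg" property: <offset size>. */
struct imc_event_reg {
	uint32_t offset;	/* byte offset into the PMU's counter memory */
	uint32_t size;		/* counter width in bytes */
};

/*
 * Hypothetical helper: the counter location is only known once the base
 * (per chip, per core or per process) has been chosen at runtime.
 */
static uint64_t imc_counter_addr(uint64_t base, const struct imc_event_reg *ev)
{
	return base + ev->offset;
}
```

The same event node (for example reg = <0xe0 0x8>) then resolves to a
different address for each base, which a static translation through the
device tree could not express.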
>
> So, from the OPAL side, we need to :
>  - Find out the current processor's PVR.
>  - Fetch the IMC catalog pnor partition.
>  - Fetch the correct subpartition based on the current processor's PVR.
>  - Decompress the blob taken from the subpartition.
>  - Expand the (now uncompressed) device tree binary, fix up the phandles and
>    attach it to the system's device tree, so that it can be discovered
>    by the kernel.
>  - Look at the IMC availability vector which denotes which of the nest
>    PMUs are available and remove the unavailable PMU nodes from the
>    device tree.
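The last step in the list above can be sketched as a bitmask check. This
is a rough illustration under stated assumptions: the bit assigned to
each unit and the LSB-0 bit numbering are invented for the example, and
struct imc_unit, imc_unit_available() and imc_prune_units() are
hypothetical names rather than the functions in hw/imc.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a nest PMU node in the incoming tree. */
struct imc_unit {
	const char *name;	/* node name, e.g. "mcs01" */
	unsigned int bit;	/* assumed bit position in the vector */
	bool present;		/* false once the node would be removed */
};

/* A unit is available if its bit is set (LSB-0 numbering assumed). */
static bool imc_unit_available(uint64_t avl_vector, unsigned int bit)
{
	return bit < 64 && ((avl_vector >> bit) & 1);
}

/* Mark unavailable units and return how many would be removed. */
static size_t imc_prune_units(struct imc_unit *units, size_t n,
			      uint64_t avl_vector)
{
	size_t removed = 0;

	for (size_t i = 0; i < n; i++) {
		units[i].present = imc_unit_available(avl_vector, units[i].bit);
		if (!units[i].present)
			removed++;
	}
	return removed;
}
```

With units mcs01 on bit 0 and mcs23 on bit 1, a vector of 0x1 keeps
mcs01 and marks mcs23 for removal.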
>
> Note that :
>  - Since OPAL lacks an xz decompression library, one has been added
>    from http://tukaani.org/xz/embedded.html
>    (http://git.tukaani.org/?p=xz-embedded.git;a=blob;f=COPYING;h=fc4fbf798d09c4341926b5f19ae2b996d2d4557b;hb=e75f4eb79165213a02d567940d344f5c2ff1be03).
>
> This Patchset does 2 things :
>
> 1) At the time of boot, it detects the IMA_CATALOG resource. Based on
>    the current processor's PVR value, it fetches the appropriate
>    subpartition. The blob in this subpartition is then uncompressed and the
>    flattened device tree is obtained. This dtb is then expanded and then
>    linked to the system's device tree under
>    "/proc/device-tree/ima-counters". The node "ima-counters" is a new node
>    created in this patchset. The kernel can then discover this node based
>    on its compatibility field.
>
> 2) It implements opal calls to initialize, enable and disable the IMC counters
>    as specified by the host kernel.
>
> This patchset is based on the initial work for Nest Instrumentation done
> by Madhavan Srinivasan, which can be found here :
> (https://lists.ozlabs.org/pipermail/skiboot/2016-March/002999.html).
>
> Changelog :
> v13 -> v14:
>  - Added patch to the end of the series (patch 10)
>    to handle imc dt node parsing for mcs*.
>  - No other changes in the series.
>
> v12 -> v13:
>  - Added more documentation and code comments
>  - Merged patches 8 and 9.
>  - Modified _start and _stop calls to take another parameter.
>  - Modified decompress function to take memcpy-style parameters.
>  - Added new helper functions for node parser
>  - Added a new device tree parser node to detect and remove unknown imc type
>  - Removed Acked-by from patch 1, since a change was made to fix a warning
>    during doc compilation.
>
> v11 -> v12:
>  - Dropped the patch to add chip-id to reserve-memory
>  - Modified the _INIT call to carry additional parameter (cpu_pir)
>  - Modified the _INIT call to update scom based on input cpu_pir
>  - Updated the opal api docs
>  - Added test for dt_fixup function in core/test/run-device.c
>  - Added new function to update base-addr and chip-id array to nest nodes
>  - Updated commit messages and added more code comments
>
> v10 -> v11:
>  - Fix the return value for _INIT call in case of nest type
>
> v9 -> v10:
>  - Implemented the phandle fixup function using a single pass dt loop
>  - Removed the hash functions and added more comments to the code
>  - separated imc catalog preloading from imc_init() as suggested
>  - Made changes to document files
>  - v8 has dropped a patch which was added in this series
>
> v8 -> v9:
>  - Changed the opal call APIs for nest and core counters.
>  - Implemented the fixup phandler using hash primitive, instead
>    of linked list.
>  - Made changes in commit messages.
>  - Changed NEST_IMC_RESUME to NEST_IMC_RUNNING.
>  - Made changes in opal-call documentations.
>
> v7 -> v8:
>  - Rebased to latest upstream
>  - Made changes to commit messages
>  - Removed NEST_IMC_PRODUCTION_MODE and added OPAL_NEST_IMC_PRODUCTION_MODE
>
> v6 -> v7:
>  - libxz -- removing the hostboot header from code
>  - Made changes in commit messages.
>
> v5 -> v6:
>  - Added a set of new dt_fixup_* functions to handle phandle
>    in the incoming tree.
>  - Removed the nest_imc.h and move the nest_pmc[] to imc.c
>  - Updated macro names and values as suggested
>  - Fixed disable_unavailable_units() to work with incoming tree
>    and not system dt.
>  - Rearranged the patches so that the homer region update patch comes first
>  - Made changes to commit messages.
>
>  v4 -> v5:
>  - Changed the cover letter to show the new IMC DTS format (which removes
>    duplication).
>  - No visible changes in the code.
>
>  v3 -> v4:
>  Major Changes include :
>  - Patchset now has support for core level IMC PMUs.
>
>  v2 -> v3 :
>  Major changes include
>  - Addressed review comments from Oliver O'Halloran.
>  - Renamed this infrastructure from IMA (In-Memory Accumulation) to IMC
>    (In-Memory Collection), since, the name IMA conflicts with existing
>    IMA (Integrity Measurement Architecture) in the linux kernel.
>  - Patches 2 and 4 have been merged together (3/6).
>  - Patch 3 (xz library) has been moved to Patch 2/6.
>
> Anju T Sudhakar (2):
>   skiboot: Add opal calls to init/start/stop IMC devices
>   skiboot: Add documentation for IMC opal call
>
> Hemant Kumar (2):
>   skiboot: Nest IMC macro definitions
>   skiboot: Add a library for xz
>
> Madhavan Srinivasan (6):
>   skiboot/doc: Add doc/imc.rst documentation
>   skiboot/doc: Add devicetree binding document for IMC
>   dt: Add helper function for last_phandle updates
>   dt: Add phandle fixup helpers
>   skiboot: Find the IMC DTB
>   skiboot: Handle combined units node in the imc dt
>
>  Makefile.main                      |    5 +-
>  core/device.c                      |   42 +-
>  core/fdt.c                         |    8 +-
>  core/flash.c                       |    1 +
>  core/init.c                        |    7 +
>  core/test/run-device.c             |   35 +-
>  doc/device-tree/imc.rst            |   72 +++
>  doc/imc.rst                        |   54 ++
>  doc/index.rst                      |    1 +
>  doc/opal-api/opal-imc-counters.rst |   87 +++
>  hw/Makefile.inc                    |    2 +-
>  hw/imc.c                           |  681 +++++++++++++++++++++
>  include/chip.h                     |   11 +
>  include/device.h                   |   20 +
>  include/imc.h                      |  142 +++++
>  include/opal-api.h                 |   12 +-
>  include/platform.h                 |    1 +
>  libxz/Makefile.inc                 |    7 +
>  libxz/xz.h                         |  304 ++++++++++
>  libxz/xz_config.h                  |  124 ++++
>  libxz/xz_crc32.c                   |   59 ++
>  libxz/xz_dec_lzma2.c               | 1171 ++++++++++++++++++++++++++++++++++++
>  libxz/xz_dec_stream.c              |  847 ++++++++++++++++++++++++++
>  libxz/xz_lzma2.h                   |  204 +++++++
>  libxz/xz_private.h                 |  156 +++++
>  libxz/xz_stream.h                  |   62 ++
>  26 files changed, 4103 insertions(+), 12 deletions(-)
>  create mode 100644 doc/device-tree/imc.rst
>  create mode 100644 doc/imc.rst
>  create mode 100644 doc/opal-api/opal-imc-counters.rst
>  create mode 100644 hw/imc.c
>  create mode 100644 include/imc.h
>  create mode 100644 libxz/Makefile.inc
>  create mode 100644 libxz/xz.h
>  create mode 100644 libxz/xz_config.h
>  create mode 100644 libxz/xz_crc32.c
>  create mode 100644 libxz/xz_dec_lzma2.c
>  create mode 100644 libxz/xz_dec_stream.c
>  create mode 100644 libxz/xz_lzma2.h
>  create mode 100644 libxz/xz_private.h
>  create mode 100644 libxz/xz_stream.h

Thanks!

I've merged the series to master as of
a1e0a047b2a01c4ad18796151685d41f747af273 with two minor fixes. There was
a bug in the failure path of loading the IMA_CATALOG that I fixed
(arguably a bug in the resource loading code: if you
wait_for_resource on a resource you didn't start preloading, you hang
rather than error out). Additionally, I moved the hw/imc.c code over to
do pr_fmt rather than hardcode IMC in all the prerror() messages.
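For anyone not familiar with the pr_fmt pattern, here is a rough
standalone sketch. log_to_buf() and imc_report_load_failure() are
hypothetical stand-ins so the example compiles outside skiboot, and the
message text is invented. It relies on the GNU ##__VA_ARGS__ extension.

```c
#include <stdio.h>

/* Per-file prefix, defined once before any logging macro is used. */
#define pr_fmt(fmt) "IMC: " fmt

/*
 * Hypothetical stand-in for a prerror()-style macro. It formats into a
 * caller-supplied buffer instead of the console, so it can be checked.
 */
#define log_to_buf(buf, len, fmt, ...) \
	snprintf(buf, len, pr_fmt(fmt), ##__VA_ARGS__)

/* The call site no longer spells out the "IMC: " prefix itself. */
static int imc_report_load_failure(char *buf, size_t len, int rc)
{
	return log_to_buf(buf, len, "catalog load failed (rc=%d)\n", rc);
}
```

Because string literals are concatenated at compile time, every message
in the file picks up the prefix, and changing it means editing one line.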

I know there's a bunch of firmware pieces that need to settle before all
of the bits work okay here, but I think we should be good to go enough
to look at the kernel bits now.


-- 
Stewart Smith
OPAL Architect, IBM.


