[Skiboot] [PATCH v4 0/8] skiboot: OPAL support for IMC instrumentation

Hemant Kumar hemant at linux.vnet.ibm.com
Wed Jan 4 16:50:32 AEDT 2017


This patchset adds support for In-Memory Collection (IMC) instrumentation
services in OPAL for Power9. The IMC infrastructure consists of two kinds
of Performance Monitoring Units (PMUs): nest IMC PMUs (chip level) and
core IMC PMUs (core level).

Nest IMC PMUs are off-core but on-chip, and can be accessed via in-band
SCOMs. Programming these counters and accumulating the counter data to
memory is done by microcode running in one of the OCC engines.

Core IMC PMUs handle the per-core counters. These are initialized with
per-core PDBARs, HTM_MODE and EVENT_MASK scoms.

This patchset adds nest and core IMC instrumentation support on the
OPAL side.

"IMA_CATALOG" partition in PNOR contains multiple device tree binaries
(DTBs) in compressed form, each tagged with a PVR. So, when loading the
IMA_CATALOG partition, OPAL passes the system PVR as a "subid" to the
load_resource API. If a catalog DTB is found for the given PVR, it is
decompressed and linked to the main device tree.
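
As a rough illustration of the PVR-keyed lookup described above, here is a
minimal sketch. It is not the skiboot code: the struct, the function name
and the exact-match rule are all assumptions made for the example, and the
real load_resource path may mask or compare the PVR differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one compressed DTB in the catalog. */
struct catalog_entry {
	uint32_t pvr;       /* PVR tag this DTB was built for */
	const void *blob;   /* compressed DTB contents */
	size_t len;         /* length of the blob in bytes */
};

/*
 * Return the entry whose PVR tag equals 'pvr', or NULL if the catalog
 * has no DTB for this processor. Exact match is an assumption.
 */
static const struct catalog_entry *
catalog_find(const struct catalog_entry *tab, size_t n, uint32_t pvr)
{
	for (size_t i = 0; i < n; i++) {
		if (tab[i].pvr == pvr)
			return &tab[i];
	}
	return NULL;
}
```

A caller would treat a NULL result as "no IMC catalog for this PVR" and
skip linking anything into the device tree.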

The commit which adds the "IMA_CATALOG" partition to PNOR is :
https://github.com/open-power/pnor/commit/c940142c6dc64dd176096dc648f433c889919e84

Each event node in the device tree contains "event-name" and "offset".
Some PMU nodes may also carry "scale" and "unit" properties, which
means that all the events inside that PMU share the same "scale" and
"unit" values.
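
To make the role of "scale" and "unit" concrete, here is a small sketch. It
assumes a consumer applies "scale" as a multiplicative factor on the raw
count to get a value in "unit", the way perf tools treat an event scale.
That interpretation and the function name are assumptions, not something
this patchset defines.

```c
#include <stdlib.h>

/*
 * Convert a raw counter value using the PMU's "scale" string.
 * The result is expressed in the PMU's "unit" (e.g. "MiB").
 * An unparsable scale string leaves the count unscaled.
 */
static double imc_scale_count(unsigned long long raw, const char *scale)
{
	char *end;
	double factor = strtod(scale, &end);

	if (end == scale)
		factor = 1.0;
	return (double)raw * factor;
}
```

With the "mcs" values from the catalog (scale "1.2207e-4", unit "MiB"), a
raw count of 8192 comes out at roughly 1 MiB.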

https://github.com/open-power/ima-catalog/blob/master/81E00612.4E0100.dts
is the DTS file for Power9. It describes the nest, core and thread
level IMC PMUs and their events.

An excerpt from the dtb showing "mcs" pmu node and two of its event nodes:

/dts-v1/;

/ {
        name = "";
        compatible = "ibm,opal-in-memory-counters";
        #address-cells = <0x1>;
        #size-cells = <0x1>;
        ima-nest-offset = <0x320000>;
        ima-nest-size = <0x30000>;
        version-id = "";

        mcs0 {
                compatible = "ibm,ima-counters-nest";
                ranges;
                #address-cells = <0x1>;
                #size-cells = <0x1>;
                unit = "MiB" ;
                scale = "1.2207e-4" ;

                event@118 {
                        event-name = "PM_MCS_UP_128B_DATA_XFER_MC0" ;
                        reg = <0x118 0x8>;
                        desc = "Total Read Bandwidth seen on both MCS of MC0" ;
                };
[SNIP]
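
The "reg" property above gives an offset (0x118) and a size (0x8) for the
event. A minimal sketch of how a consumer might read such a counter out of
a counter region already mapped into memory follows. The function name, the
bounds checking and the big-endian byte order are assumptions made for the
example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Read a counter of 'size' bytes (1..8) at byte offset 'off' inside a
 * mapped counter region of 'region_len' bytes. Bytes are assumed to be
 * stored big-endian. Returns 0 on success, -1 if the access would fall
 * outside the region or the size is unsupported.
 */
static int imc_read_counter(const uint8_t *base, size_t region_len,
			    size_t off, size_t size, uint64_t *out)
{
	uint64_t v = 0;

	if (size == 0 || size > 8 || off > region_len ||
	    size > region_len - off)
		return -1;

	for (size_t i = 0; i < size; i++)
		v = (v << 8) | base[off + i];

	*out = v;
	return 0;
}
```

For the event above, the call would pass off = 0x118 and size = 0x8, taken
straight from the two cells of "reg".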

Why this design for the IMC DTS files?
The DTS files for Power9 describe the nest, core and thread IMC PMUs.
One could argue for designing the device tree so that
of_translate_address() can be used directly on the event nodes to get
the CPU address for an event. However, there are some issues with that :
 - For nest IMC, we would need to attach the device tree to each per-chip
   HOMER region node. With multiple chips, this increases replication.
 - For core IMC, we allocate the memory in the kernel for each core, so
   the base location for core IMC is not fixed. Hence, we can't use
   of_translate_address() on the core events.
 - For thread IMC, we allocate memory for each Linux process which needs
   to be monitored. This is particularly difficult to represent in the
   device tree, since the allocation is dynamic.

So, from the OPAL side, we need to :
 - Find out the current processor's PVR.
 - Fetch the "IMA_CATALOG" partition.
 - Fetch the correct subpartition based on the current processor's PVR.
 - Decompress the blob taken from the subpartition.
 - Expand the (now uncompressed) device tree binary and attach it to the
   system's device tree, so that it can be discovered by the kernel.
 - Look at the IMC availability vector which denotes which of the nest
   PMUs are available and remove the unavailable PMU nodes from the
   device tree.
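
The last step above, pruning PMU nodes using the availability vector, could
look roughly like the sketch below. The struct, the function name and the
bit numbering (bit 0 is the least significant bit) are hypothetical. The
actual vector layout is defined by the hardware and microcode and is not
described in this cover letter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a nest PMU node in the device tree. */
struct nest_pmu {
	const char *name;   /* node name, e.g. "mcs0" */
	unsigned int bit;   /* assumed bit position in the vector */
	bool present;       /* still attached to the device tree? */
};

/*
 * Mark every PMU whose bit is clear in 'avail' as removed.
 * Returns the number of PMUs removed by this call.
 */
static size_t prune_unavailable(struct nest_pmu *pmus, size_t n,
				uint64_t avail)
{
	size_t removed = 0;

	for (size_t i = 0; i < n; i++) {
		bool ok = pmus[i].bit < 64 &&
			  ((avail >> pmus[i].bit) & 1);

		if (!ok && pmus[i].present) {
			pmus[i].present = false;
			removed++;
		}
	}
	return removed;
}
```

In a real implementation the "present = false" step would instead delete
the corresponding node from the device tree before the kernel sees it.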

Note that :
 - The Catalog team is working on upstreaming the DTS files.
 - The commit which adds the IMA_CATALOG partition to PNOR is mentioned
   above.
 - Since OPAL lacks an xz decompression library, one has been reused
   from the hostboot repo (the link is mentioned in the xz library
   patch).
 - This patchset is for base enablement of IMC and hence contains only
   the nest and core IMC support.
 - The last patch in the series is to add "chip-id" to reserved homer
   region node in the device tree. This will give us the homer region's
   associated chip in the kernel (which will be needed to fetch the
   counter values from the required chip).

This Patchset does 3 things :

1) At the time of boot, it detects the IMA_CATALOG resource. Based on
   the current processor's PVR value, it fetches the appropriate
   subpartition. The blob in this subpartition is then uncompressed and the
   flattened device tree is obtained. This dtb is then expanded and then
   linked to the system's device tree under
   "/proc/device-tree/ima-counters". The node "ima-counters" is a new node
   created in this patchset. The kernel can then discover this node based
   on its compatibility field.

2) It implements an OPAL call that lets the kernel control the microcode
   running in one of the OCC engines (responsible for nest IMC data
   collection) to start/stop nest PMU counter data collection.

3) It also implements an OPAL call to control the core IMC engine for each
   core to initialize, enable and disable the counters as specified by the
   host kernel.

This patchset is based on the initial work for Nest Instrumentation done
by Madhavan Srinivasan, which can be found here :
(https://lists.ozlabs.org/pipermail/skiboot/2016-March/002999.html).

Changelog :
 v3 -> v4:
 Major changes include :
 - The patchset now supports core level IMC PMUs.

 v2 -> v3 :
 Major changes include :
 - Addressed review comments from Oliver O'Halloran.
 - Renamed this infrastructure from IMA (In-Memory Accumulation) to IMC
   (In-Memory Collection), since the name IMA conflicts with the existing
   IMA (Integrity Measurement Architecture) in the Linux kernel.
 - Patches 2 and 4 have been merged together (3/6).
 - Patch 3 (xz library) has been moved to Patch 2/6.

Changes since v1 have been mentioned in the individual patches.

Hemant Kumar (7):
  skiboot: Nest IMC macro definitions
  skiboot: Add a library for xz
  skiboot: Find the IMC DTB
  skiboot: Add opal call to enable/disable Nest IMC microcode
  skiboot: Add core IMC related counter configuration OPAL call
  skiboot: Add documentation for nest IMC opal call
  skiboot: Add documentation for the Core IMC opal call

Vasant Hegde (1):
  skiboot: Add chip id to HOMER reserved region

 Makefile.main                           |    5 +-
 core/flash.c                            |    1 +
 core/init.c                             |    7 +
 doc/opal-api/opal-core-imc-counters.rst |   40 ++
 doc/opal-api/opal-nest-imc-counters.rst |   49 ++
 hw/Makefile.inc                         |    2 +-
 hw/homer.c                              |   30 +
 hw/imc.c                                |  392 ++++++++++
 include/imc.h                           |  127 ++++
 include/nest_imc.h                      |   85 +++
 include/opal-api.h                      |   17 +-
 include/platform.h                      |    1 +
 include/skiboot.h                       |    1 +
 libxz/Makefile.inc                      |    7 +
 libxz/xz.h                              |  312 ++++++++
 libxz/xz_config.h                       |  133 ++++
 libxz/xz_crc32.c                        |   67 ++
 libxz/xz_dec_lzma2.c                    | 1183 +++++++++++++++++++++++++++++++
 libxz/xz_dec_stream.c                   |  855 ++++++++++++++++++++++
 libxz/xz_lzma2.h                        |  212 ++++++
 libxz/xz_private.h                      |  164 +++++
 libxz/xz_stream.h                       |   70 ++
 22 files changed, 3756 insertions(+), 4 deletions(-)
 create mode 100644 doc/opal-api/opal-core-imc-counters.rst
 create mode 100644 doc/opal-api/opal-nest-imc-counters.rst
 create mode 100644 hw/imc.c
 create mode 100644 include/imc.h
 create mode 100644 include/nest_imc.h
 create mode 100644 libxz/Makefile.inc
 create mode 100644 libxz/xz.h
 create mode 100644 libxz/xz_config.h
 create mode 100644 libxz/xz_crc32.c
 create mode 100644 libxz/xz_dec_lzma2.c
 create mode 100644 libxz/xz_dec_stream.c
 create mode 100644 libxz/xz_lzma2.h
 create mode 100644 libxz/xz_private.h
 create mode 100644 libxz/xz_stream.h

-- 
2.7.4


