[Skiboot] [PATCH v3 0/6] IMC Instrumentation Support

Madhavan Srinivasan maddy at linux.vnet.ibm.com
Tue Jan 24 12:15:52 AEDT 2017


Hi Ben/Stewart,

Any comments on this patchset?

Maddy

On Monday 05 December 2016 11:40 PM, Hemant Kumar wrote:
> This patchset adds support for In-Memory Collection (IMC) instrumentation
> services in OPAL for Power9. The IMC infrastructure consists of two
> kinds of Performance Monitoring Units (PMUs): nest IMC PMUs (chip
> level) and core IMC PMUs (core level).
>
> Nest IMC PMUs are off-core but on-chip, and can be accessed via
> in-band SCOMs. Programming these counters and accumulating the counter
> data in memory is done by microcode running in one of the OCC engines.
>
> This patchset adds nest IMC instrumentation support on the OPAL
> side.
>
> The "IMA_CATALOG" partition in PNOR contains multiple device tree
> binaries (DTBs) in compressed form, each tagged with a PVR. When loading
> the IMA_CATALOG partition, OPAL passes the system PVR as the "subid" to
> the load_resource API. If a catalog DTB is found for that PVR, it is
> decompressed and linked into the main device tree.
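
The following sketch shows the PVR-based selection. It models the IMA_CATALOG partition as a table of (PVR tag, compressed blob) entries and picks the entry whose tag matches the running processor's PVR. The `catalog_entry` layout and the exact-match rule are assumptions for illustration. In OPAL, the actual lookup happens behind the load_resource API, which is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one IMA_CATALOG subpartition. */
struct catalog_entry {
	uint32_t pvr;          /* PVR tag the subpartition was built for */
	const uint8_t *blob;   /* compressed (xz) DTB */
	size_t blob_len;
};

/*
 * Return the entry whose PVR tag matches the running processor's PVR,
 * or NULL if no catalog exists for it.
 * Assumption: tags are compared exactly, without masking any PVR bits.
 */
const struct catalog_entry *
find_catalog_for_pvr(const struct catalog_entry *entries, size_t n,
		     uint32_t pvr)
{
	for (size_t i = 0; i < n; i++)
		if (entries[i].pvr == pvr)
			return &entries[i];
	return NULL;
}
```

A NULL result means there is no catalog for this processor, so the IMC nodes would not be added to the device tree.
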
>
> The commit which adds the "IMA_CATALOG" partition to PNOR is:
> https://github.com/open-power/pnor/commit/c940142c6dc64dd176096dc648f433c889919e84
>
> Each event node in the device tree contains "event-name" and an offset
> (encoded in "reg" in the excerpt below). Some PMUs also carry "scale"
> and "unit" properties, which indicate that all the events inside that
> PMU share the same "scale" and "unit" values.
>
> https://github.com/open-power/ima-catalog/commit/99b73ee691fb5273f502e879e07603c510db5f7a
> describes the DTS file for Power8. For Power9, the format is similar,
> with additional units for nest, core and thread level PMUs.
>
> An excerpt from the DTB showing the "mcs0" PMU node and its first event node:
>
> /dts-v1/;
>
> / {
>          name = "";
>          compatible = "ibm,opal-in-memory-counters";
>          #address-cells = <0x1>;
>          #size-cells = <0x1>;
>          ima-nest-offset = <0x320000>;
>          ima-nest-size = <0x30000>;
>          version-id = "";
>
>          mcs0 {
>                  compatible = "ibm,ima-counters-nest";
>                  ranges;
>                  #address-cells = <0x1>;
>                  #size-cells = <0x1>;
>                  unit = "MiB" ;
>                  scale = "1.2207e-4" ;
>
>                  event@118 {
>                          event-name = "PM_MCS_UP_128B_DATA_XFER_MC0" ;
>                          reg = <0x118 0x8>;
>                          desc = "Total Read Bandwidth seen on both MCS of MC0" ;
>                  };
> [SNIP]
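
To read such an event, a consumer multiplies the raw count by the PMU's "scale" string and reports the result in the PMU's "unit". For mcs0, 1.2207e-4 equals 128 / 2^20 (1.220703125e-4) rounded. That fits each count being one 128-byte transfer expressed in MiB, but this reading is inferred from the event name, not stated in the catalog. The helper names below are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Convert a raw event count into the PMU's "unit" by multiplying it with
 * the PMU's "scale" string property. A missing scale is treated as 1.
 */
double imc_scaled_value(uint64_t raw_count, const char *scale_prop)
{
	double scale = scale_prop ? strtod(scale_prop, NULL) : 1.0;
	return (double)raw_count * scale;
}

/*
 * Assumption: each count of PM_MCS_UP_128B_DATA_XFER_MC0 is one 128-byte
 * transfer (inferred from the event name), so bytes -> MiB gives the
 * exact factor that the catalog's "1.2207e-4" rounds.
 */
double mcs_exact_scale(void)
{
	return 128.0 / (1024.0 * 1024.0); /* = 1.220703125e-4 */
}
```

For 8192 counted transfers (1 MiB of data), the rounded catalog scale gives about 0.99999744 MiB. The rounding is therefore negligible for reporting.
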
>
> Why this design for the IMC DTS files?
> For now, these DTS files contain PMUs only for nest (i.e., chip) units,
> but going forward the Power9 DTS files will also contain the IMC PMUs
> for core and thread. One could argue for designing the device tree so
> that of_translate_address() can be used directly on the event nodes to
> get the CPU address of each event. However, that approach has some
> issues:
>   - For nest IMC, we would need to attach the device tree to the per-chip
>     HOMER region node, which increases replication on multi-chip systems.
>   - For core IMC, the kernel allocates the memory for each core, so the
>     base location for core IMC is not fixed. Hence, we can't use
>     of_translate_address() on the core events.
>   - For thread IMC, memory is allocated for each Linux process that needs
>     to be monitored. This would be particularly difficult to describe in
>     the device tree, since the allocation is dynamic.
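
The following sketch makes this trade-off concrete. It treats an event's "reg" offset as relative to a base known only at run time: the per-chip HOMER region plus "ima-nest-offset" for nest events, or a kernel-allocated buffer for core events. The struct, the helper name and the assumption that the address is the sum base + block offset + event offset are illustrative and are not taken from the patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of where one counter block lives. */
struct imc_block {
	uint64_t base;   /* runtime base: per-chip HOMER or kernel buffer */
	uint64_t offset; /* e.g. ima-nest-offset; 0 for a dedicated buffer */
	uint64_t size;   /* e.g. ima-nest-size */
};

/*
 * Resolve an event's offset/length (from "reg") against a block's runtime
 * base. Returns false if the event would fall outside the block.
 */
bool imc_event_addr(const struct imc_block *blk, uint64_t ev_off,
		    uint64_t ev_len, uint64_t *addr)
{
	if (ev_len > blk->size || ev_off > blk->size - ev_len)
		return false;
	*addr = blk->base + blk->offset + ev_off;
	return true;
}
```

Because only the base differs between chips, cores or processes, one copy of the event nodes can serve all of them. A per-node translatable address could not.
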
>
> So, on the OPAL side, we need to:
>   - Find out the current processor's PVR.
>   - Fetch the "IMA_CATALOG" partition.
>   - Fetch the correct subpartition based on the current processor's PVR.
>   - Decompress the blob taken from the subpartition.
>   - Expand the (now uncompressed) device tree binary and attach it to the
>     system's device tree, so that it can be discovered by the kernel.
>   - Look at the IMC availability vector, which denotes which of the nest
>     PMUs are available, and remove the unavailable PMU nodes from the
>     device tree.
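
The final step, pruning by the availability vector, amounts to a bit test per PMU. The sketch below compacts a list of PMUs and keeps only those whose bit is set. The `nest_pmu` struct and the LSB-0 bit numbering are assumptions. The real vector's layout is defined by the OCC microcode and may, for example, use IBM MSB-0 numbering.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one entry per nest PMU node found in the catalog DTB. */
struct nest_pmu {
	const char *name;  /* e.g. "mcs0" */
	unsigned int bit;  /* position in the availability vector (LSB-0 assumed) */
};

/*
 * Compact the array in place, keeping only PMUs whose bit is set in the
 * availability vector; returns how many were kept. In OPAL, the dropped
 * entries would correspond to device tree nodes being removed.
 */
size_t filter_available_pmus(struct nest_pmu *pmus, size_t n, uint64_t avail)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (pmus[i].bit < 64 && ((avail >> pmus[i].bit) & 1))
			pmus[kept++] = pmus[i];
	}
	return kept;
}
```
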
>
> Note that:
>   - The Catalog team is working on upstreaming the DTS files.
>   - The commit which adds the IMA_CATALOG partition to PNOR is mentioned
>     above.
>   - Since OPAL lacks an xz decompression library, one has been reused
>     from the hostboot repo (the link is mentioned in patch 2/6).
>   - This patchset is for base enablement of IMC and hence only contains
>     the nest IMC support.
>   - The last patch in the series adds "chip-id" to the reserved HOMER
>     region node in the device tree. This gives the kernel the HOMER
>     region's associated chip, which is needed to fetch the counter values
>     from the required chip.
>
> This patchset does two things:
>
> 1) At boot time, it detects the IMA_CATALOG resource. Based on the
>     current processor's PVR value, it fetches the appropriate
>     subpartition. The blob in this subpartition is then decompressed to
>     obtain the flattened device tree. This DTB is then expanded and
>     linked into the system's device tree under
>     "/proc/device-tree/ima-counters". The node "ima-counters" is a new node
>     created in this patchset. The kernel can then discover this node based
>     on its "compatible" property.
>
> 2) It implements an OPAL call that lets the kernel control the microcode
>     running in one of the OCC engines (responsible for nest IMC data
>     collection), to start/stop nest PMU counter data collection.
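
The kernel-side discovery in item 1 comes down to a search by "compatible". The sketch below uses a hypothetical, minimal node type and a depth-first search. It simplifies "compatible" to a single string, whereas real device trees allow a string list. Inside the Linux kernel, this lookup would go through of_find_compatible_node().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal device tree node: only "compatible" is modelled. */
struct dt_node {
	const char *name;
	const char *compatible;   /* single string here; real DTs allow a list */
	struct dt_node *child;    /* first child */
	struct dt_node *sibling;  /* next sibling */
};

/* Depth-first search for the first node whose "compatible" matches. */
struct dt_node *find_compatible(struct dt_node *np, const char *compat)
{
	for (; np; np = np->sibling) {
		if (np->compatible && strcmp(np->compatible, compat) == 0)
			return np;
		struct dt_node *hit = find_compatible(np->child, compat);
		if (hit)
			return hit;
	}
	return NULL;
}
```

With the excerpt above, searching for "ibm,opal-in-memory-counters" finds the "ima-counters" node. Searching for "ibm,ima-counters-nest" finds nest PMU nodes such as mcs0.
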
>
> This patchset is based on the initial work for Nest Instrumentation done
> by Madhavan Srinivasan, which can be found here :
> (https://lists.ozlabs.org/pipermail/skiboot/2016-March/002999.html).
>
> TODOs:
>   - Add support for Core IMC.
>
> Changelog:
>   v2 -> v3:
>   Major changes include:
>   - Addressed review comments from Oliver O'Halloran.
>   - Renamed this infrastructure from IMA (In-Memory Accumulation) to IMC
>     (In-Memory Collection), since the name IMA conflicts with the existing
>     IMA (Integrity Measurement Architecture) in the Linux kernel.
>   - Patches 2 and 4 have been merged together (3/6).
>   - Patch 3 (xz library) has been moved to Patch 2/6.
>
> Changes since v1 have been mentioned in the individual patches.
>
> Hemant Kumar (5):
>    skiboot: Nest IMC macro definitions
>    skiboot: Add a library for xz
>    skiboot: Find the IMC DTB
>    skiboot: Add opal call to enable/disable Nest IMC microcode
>    skiboot: Add documentation for nest IMC opal call
>
> Vasant Hegde (1):
>    skiboot: Add chip id to HOMER reserved region
>
>   Makefile.main                           |    5 +-
>   core/flash.c                            |    1 +
>   core/init.c                             |    7 +
>   doc/opal-api/opal-nest-ima-counters.rst |   49 ++
>   hw/Makefile.inc                         |    2 +-
>   hw/homer.c                              |   30 +
>   hw/imc.c                                |  243 +++++++
>   include/imc.h                           |  117 +++
>   include/nest_imc.h                      |   85 +++
>   include/opal-api.h                      |    9 +-
>   include/platform.h                      |    1 +
>   include/skiboot.h                       |    1 +
>   libxz/Makefile.inc                      |    7 +
>   libxz/xz.h                              |  312 ++++++++
>   libxz/xz_config.h                       |  133 ++++
>   libxz/xz_crc32.c                        |   67 ++
>   libxz/xz_dec_lzma2.c                    | 1183 +++++++++++++++++++++++++++++++
>   libxz/xz_dec_stream.c                   |  855 ++++++++++++++++++++++
>   libxz/xz_lzma2.h                        |  212 ++++++
>   libxz/xz_private.h                      |  164 +++++
>   libxz/xz_stream.h                       |   70 ++
>   21 files changed, 3549 insertions(+), 4 deletions(-)
>   create mode 100644 doc/opal-api/opal-nest-ima-counters.rst
>   create mode 100644 hw/imc.c
>   create mode 100644 include/imc.h
>   create mode 100644 include/nest_imc.h
>   create mode 100644 libxz/Makefile.inc
>   create mode 100644 libxz/xz.h
>   create mode 100644 libxz/xz_config.h
>   create mode 100644 libxz/xz_crc32.c
>   create mode 100644 libxz/xz_dec_lzma2.c
>   create mode 100644 libxz/xz_dec_stream.c
>   create mode 100644 libxz/xz_lzma2.h
>   create mode 100644 libxz/xz_private.h
>   create mode 100644 libxz/xz_stream.h
>


