[Skiboot] [PATCH v14 00/10] skiboot: OPAL support for IMC instrumentation
Stewart Smith
stewart at linux.vnet.ibm.com
Tue Jun 27 19:25:10 AEST 2017
Madhavan Srinivasan <maddy at linux.vnet.ibm.com> writes:
> ** No OPAL API changes in this version of the patchset. Added a patch
> (patch 10) to handle imc dt node parsing (no changes to the rest of the
> patches) **
>
> Patchset adds support for In Memory Collection instrumentation (IMC)
> services in OPAL for Power9. The entire IMC infrastructure consists of
> two kinds of Performance Monitoring Units (PMUs) : nest imc pmus (chip
> level) and core imc pmus (core level).
>
> Nest IMC PMUs are off core but on chip, and can be accessed via
> in-band scoms. Programming these counters and accumulating the counter
> data to memory is done via microcode running in one of the OCC Engines.
>
> Core IMC PMUs handle the per-core counters. These are initialized with
> per-core PDBARs, HTM_MODE and EVENT_MASK scoms.
>
> This patchset adds nest and core IMC instrumentation support on the
> OPAL side.
>
> The "IMA_CATALOG" partition in PNOR contains multiple compressed device
> tree binaries (DTBs), each tagged with a PVR. So, when loading the PNOR
> partition, OPAL passes the system PVR as a "subid" to the load_resource
> API. If a catalog dtb is found for the given PVR, it is decompressed and
> linked to the main device tree.
>
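The PVR-keyed lookup described above can be sketched roughly as follows. This is only an illustration in Python, not the skiboot code: the dictionary standing in for the partition, the function name load_catalog_dtb, the example PVR key and the payload are all assumptions, and an exact PVR match is assumed.

```python
import lzma

def load_catalog_dtb(catalog, pvr):
    """Return the decompressed DTB tagged with `pvr`, or None if absent.

    `catalog` is a hypothetical stand-in for the IMA_CATALOG partition:
    a dict mapping a PVR value to an xz-compressed blob.
    """
    blob = catalog.get(pvr)
    if blob is None:
        return None  # no catalog entry for this processor
    return lzma.decompress(blob, format=lzma.FORMAT_XZ)

# Illustrative data only: the key and the payload are made up.
EXAMPLE_PVR = 0x004E0100
fake_dtb = b"\xd0\x0d\xfe\xed" + b"illustrative payload"
catalog = {EXAMPLE_PVR: lzma.compress(fake_dtb, format=lzma.FORMAT_XZ)}

dtb = load_catalog_dtb(catalog, EXAMPLE_PVR)
```

A lookup with a PVR that has no entry returns None here, which corresponds to the case where no catalog dtb is linked into the main device tree.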
> The commit which adds the partition to PNOR is :
> https://github.com/open-power/pnor/commit/c940142c6dc64dd176096dc648f433c889919e84
>
> The root node of an IMC catalog device tree contains nodes for the IMC
> PMUs and the common events across the PMUs. Here is an excerpt from
> the device tree :
> /dts-v1/;
>
> / {
> name = "";
> compatible = "ibm,opal-in-memory-counters";
> #address-cells = <0x1>;
> #size-cells = <0x1>;
> version-id = "";
>
> NEST_MCS: nest-mcs-events {
> #address-cells = <0x1>;
> #size-cells = <0x1>;
>
> event@0 {
> event-name = "RRTO_QFULL_NO_DISP" ;
> reg = <0x0 0x8>;
> desc = "RRTO not dispatched in MCS0 due to capacity - pulses once for each time a valid RRTO op is not dispatched due to a command list full condition" ;
> };
> event@8 {
> event-name = "WRTO_QFULL_NO_DISP" ;
> reg = <0x8 0x8>;
> desc = "WRTO not dispatched in MCS0 due to capacity - pulses once for each time a valid WRTO op is not dispatched due to a command list full condition" ;
> };
> [...]
> mcs01 {
> compatible = "ibm,imc-counters";
> events-prefix = "PM_MCS01_";
> unit = "";
> scale = "";
> reg = <0x118 0x8>;
> events = < &NEST_MCS >;
> type = <0x10>;
> };
> mcs23 {
> compatible = "ibm,imc-counters";
> events-prefix = "PM_MCS23_";
> unit = "";
> scale = "";
> reg = <0x198 0x8>;
> events = < &NEST_MCS >;
> type = <0x10>;
> };
> [...]
>
> CORE_EVENTS: core-events {
> #address-cells = <0x1>;
> #size-cells = <0x1>;
>
> event@e0 {
> event-name = "0THRD_NON_IDLE_PCYC" ;
> reg = <0xe0 0x8>;
> desc = "The number of processor cycles when all threads are idle" ;
> };
> event@120 {
> event-name = "1THRD_NON_IDLE_PCYC" ;
> reg = <0x120 0x8>;
> desc = "The number of processor cycles when exactly one SMT thread is executing non-idle code" ;
> };
> [...]
> core {
> compatible = "ibm,imc-counters";
> events-prefix = "CPM_";
> unit = "";
> scale = "";
> reg = <0x0 0x8>;
> events = < &CORE_EVENTS >;
> type = <0x4>;
> };
>
> thread {
> compatible = "ibm,imc-counters";
> events-prefix = "CPM_";
> unit = "";
> scale = "";
> reg = <0x0 0x8>;
> events = < &CORE_EVENTS >;
> type = <0x1>;
> };
> };
>
> IMC Catalog DTS:
> https://github.com/open-power/ima-catalog/blob/master/81E00612.4E0100.dts
> (recent suggested device node changes are not yet updated to the link)
>
> For any IMC PMU node (mcs01, mcs23, core, thread etc), its "events"
> property points to the events node which gives us the event
> information for that PMU.
> For example, take the mcs01 PMU node from the above excerpt: its "events"
> property points to the events list for this PMU, and its "events-prefix"
> property gives the prefix used to build the full event name. So, the
> "RRTO_QFULL_NO_DISP" event name from "nest-mcs-events" becomes
> "PM_MCS01_RRTO_QFULL_NO_DISP" for PMU mcs01 and
> "PM_MCS23_RRTO_QFULL_NO_DISP" for PMU mcs23.
> This new design of the DTS file saves a lot of space in the device
> tree, since a lot of event names are common across PMUs. For core and
> thread IMC PMUs, all the event names are common.
>
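As a small illustration of the naming scheme, the sketch below joins a PMU's "events-prefix" with the names in its shared events list. The prefixes and event names are taken from the excerpt; the function name pmu_event_names is hypothetical, and this is a model of the idea rather than the parser used in OPAL or the kernel.

```python
def pmu_event_names(events_prefix, event_names):
    """Build the full event names for one PMU from its shared event list."""
    return [events_prefix + name for name in event_names]

# Event names shared by every PMU that points at nest-mcs-events.
nest_mcs_events = ["RRTO_QFULL_NO_DISP", "WRTO_QFULL_NO_DISP"]

mcs01 = pmu_event_names("PM_MCS01_", nest_mcs_events)
mcs23 = pmu_event_names("PM_MCS23_", nest_mcs_events)
```

The shared list is stored once, and each PMU node only carries its prefix and a reference to the list.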
> Each event in the device tree contains "event-name" and "offset".
> Some of the PMUs may contain properties such as "scale" and "unit",
> which indicate that all the events inside that PMU share the
> same "scale" and "unit" values.
>
> Why this design for the IMC DTS files?
> The DTS files for Power9 describe the nest, core and thread IMC PMUs.
> One could argue for designing the device tree so that
> of_translate_address() can be used directly on the event nodes to get
> the cpu address for that event. However, there are some issues with that.
> For nest imc, we would need to attach the device tree to each per-chip
> HOMER region node. With multiple chips, this increases replication.
> For core imc, we allocate the memory in the kernel for each core and the
> base location for core imc is not fixed. Hence, we can't use
> of_translate_address() on the core events.
> For thread imc, we allocate memory for each linux process which needs to
> be monitored. This would be particularly difficult to handle in
> the device tree, since the allocation is dynamic.
>
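One way to read the core imc case: since the base is only known at run time, an event's location has to be computed from that base plus the event's offset. The sketch below assumes the first cell of "reg" is the offset and the second the size, as the excerpt suggests; the base value and the function name event_location are made up for illustration.

```python
def event_location(base, reg):
    """Return (address, size) of a counter given a run-time base address.

    `reg` is an (offset, size) pair, read here as the two cells of an
    event node's "reg" property (an assumption about the binding).
    """
    offset, size = reg
    return base + offset, size

# Hypothetical base for one core; a real base is chosen at run time.
core_base = 0x10000000
addr, size = event_location(core_base, (0xE0, 0x8))  # reg of event@e0
```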
> So, from the OPAL side, we need to :
> - Find out the current processor's PVR.
> - Fetch the IMC catalog pnor partition.
> - Fetch the correct subpartition based on the current processor's PVR.
> - Decompress the blob taken from the subpartition.
> - Expand the (now uncompressed) device tree binary, fix up the phandles and
> attach it to the system's device tree, so that it can be discovered
> by the kernel.
> - Look at the IMC availability vector which denotes which of the nest
> PMUs are available and remove the unavailable PMU nodes from the
> device tree.
>
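The last two steps above can be modelled roughly as follows. This is a toy sketch under several assumptions, not the skiboot implementation: the tree is a nested dict, phandles are rebased by adding the system tree's last phandle, "events" is the only property holding a phandle reference, and the mapping from PMU name to availability bit is invented for the example.

```python
def rebase_phandles(node, last_phandle):
    """Shift every phandle, and every reference to one, by `last_phandle`.

    Walks the incoming tree once and returns the highest phandle in use,
    so the caller can update its own last-phandle bookkeeping.
    """
    highest = last_phandle
    for key in ("phandle", "events"):
        if key in node:
            node[key] += last_phandle
            highest = max(highest, node[key])
    for child in node.get("children", {}).values():
        highest = max(highest, rebase_phandles(child, last_phandle))
    return highest


def prune_unavailable(root, availability, bit_for_pmu):
    """Remove PMU nodes whose bit in the availability vector is clear."""
    children = root["children"]
    for name, bit in bit_for_pmu.items():
        if name in children and not (availability >> bit) & 1:
            del children[name]


# Toy incoming tree: one shared events node and two PMUs referencing it.
catalog = {"children": {
    "nest-mcs-events": {"phandle": 1},
    "mcs01": {"events": 1},
    "mcs23": {"events": 1},
}}

highest = rebase_phandles(catalog, last_phandle=0x40)
# Hypothetical vector: bit 0 (mcs01) set, bit 1 (mcs23) clear.
prune_unavailable(catalog, availability=0b01,
                  bit_for_pmu={"mcs01": 0, "mcs23": 1})
```

Rebasing before attaching keeps the incoming phandles from colliding with ones already present in the system tree, while references inside the incoming tree stay consistent.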
> Note that :
> - Since OPAL lacks a xz decompression library, an xz decompression
> library has been added from http://tukaani.org/xz/embedded.html
> (http://git.tukaani.org/?p=xz-embedded.git;a=blob;f=COPYING;h=fc4fbf798d09c4341926b5f19ae2b996d2d4557b;hb=e75f4eb79165213a02d567940d344f5c2ff1be03).
>
> This Patchset does 2 things :
>
> 1) At the time of boot, it detects the IMA_CATALOG resource. Based on
> the current processor's PVR value, it fetches the appropriate
> subpartition. The blob in this subpartition is then uncompressed and the
> flattened device tree is obtained. This dtb is then expanded and then
> linked to the system's device tree under
> "/proc/device-tree/ima-counters". The node "ima-counters" is a new node
> created in this patchset. The kernel can then discover this node based
> on its compatibility field.
>
> 2) It implements opal calls to initialize, enable and disable the IMC counters
> as specified by the host kernel.
>
> This patchset is based on the initial work for Nest Instrumentation done
> by Madhavan Srinivasan, which can be found here :
> (https://lists.ozlabs.org/pipermail/skiboot/2016-March/002999.html).
>
> Changelog :
> v13 -> v14:
> - Added patch to the end of the series (patch 10)
> to handle imc dt node parsing for mcs*.
> - No other changes in the series.
>
> v12 -> v13:
> - Added more documentation and code comments
> - Merged patches 8 and 9.
> - Modified _start and _stop calls to take another parameter.
> - Modified the decompress function to take memcpy-style parameters.
> - Added new helper functions for node parser
> - Added a new device tree parser node to detect and remove unknown imc type
> - Removed Acked-by from patch 1 since a change was made to fix a warning at
> doc compilation.
>
> v11 -> v12:
> - Dropped the patch to add chip-id to reserve-memory
> - Modified the _INIT call to carry additional parameter (cpu_pir)
> - Modified the _INIT call to update scom based on input cpu_pir
> - Updated the opal api docs
> - Added test for dt_fixup function in core/test/run-device.c
> - Added new function to update base-addr and chip-id array to nest nodes
> - Updated commit messages and added more code comments
>
> v10 -> v11:
> - Fix the return value for _INIT call in case of nest type
>
> v9 -> v10:
> - Implemented the phandle fixup function using a single pass dt loop
> - Removed the hash functions and added more comments to the code
> - separated imc catalog preloading from imc_init() as suggested
> - Made changes to document files
> - v8 has dropped a patch which was added in this series
>
> v8 -> v9:
> - Changed the opal call APIs for nest and core counters.
> - Implemented the fixup phandler using hash primitive, instead
> of linked list.
> - Made changes in commit messages.
> - Changed NEST_IMC_RESUME to NEST_IMC_RUNNING.
> - Made changes in opal-call documentations.
>
> v7 -> v8:
> - Rebased to latest upstream
> - Made changes to commit messages
> - Removed NEST_IMC_PRODUCTION_MODE and added OPAL_NEST_IMC_PRODUCTION_MODE
>
> v6 -> v7:
> - libxz -- removing the hostboot header from code
> - Made changes in commit messages.
>
> v5 -> v6:
> - Added a set of new dt_fixup_* functions to handle phandle
> in the incoming tree.
> - Removed the nest_imc.h and move the nest_pmc[] to imc.c
> - Updated macro names and values as suggested
> - Fixed disable_unavailable_units() to work with incoming tree
> and not system dt.
> - rearranged the patches to have the homer region update patch first
> - Made changes to commit messages.
>
> v4 -> v5:
> - Changed the cover letter to show the new IMC DTS format (which removes
> duplication).
> - No visible changes in the code.
>
> v3 -> v4:
> Major Changes include :
> - Patchset now has support for core level IMC PMUs.
>
> v2 -> v3 :
> Major changes include
> - Addressed review comments from Oliver O'Halloran.
> - Renamed this infrastructure from IMA (In-Memory Accumulation) to IMC
> (In-Memory Collection), since, the name IMA conflicts with existing
> IMA (Integrity Measurement Architecture) in the linux kernel.
> - Patches 2 and 4 have been merged together (3/6).
> - Patch 3 (xz library) has been moved to Patch 2/6.
>
> Anju T Sudhakar (2):
> skiboot: Add opal calls to init/start/stop IMC devices
> skiboot: Add documentation for IMC opal call
>
> Hemant Kumar (2):
> skiboot: Nest IMC macro definitions
> skiboot: Add a library for xz
>
> Madhavan Srinivasan (6):
> skiboot/doc: Add doc/imc.rst documentation
> skiboot/doc: Add devicetree binding document for IMC
> dt: Add helper function for last_phandle updates
> dt: Add phandle fixup helpers
> skiboot: Find the IMC DTB
> skiboot: Handle combined units node in the imc dt
>
> Makefile.main | 5 +-
> core/device.c | 42 +-
> core/fdt.c | 8 +-
> core/flash.c | 1 +
> core/init.c | 7 +
> core/test/run-device.c | 35 +-
> doc/device-tree/imc.rst | 72 +++
> doc/imc.rst | 54 ++
> doc/index.rst | 1 +
> doc/opal-api/opal-imc-counters.rst | 87 +++
> hw/Makefile.inc | 2 +-
> hw/imc.c | 681 +++++++++++++++++++++
> include/chip.h | 11 +
> include/device.h | 20 +
> include/imc.h | 142 +++++
> include/opal-api.h | 12 +-
> include/platform.h | 1 +
> libxz/Makefile.inc | 7 +
> libxz/xz.h | 304 ++++++++++
> libxz/xz_config.h | 124 ++++
> libxz/xz_crc32.c | 59 ++
> libxz/xz_dec_lzma2.c | 1171 ++++++++++++++++++++++++++++++++++++
> libxz/xz_dec_stream.c | 847 ++++++++++++++++++++++++++
> libxz/xz_lzma2.h | 204 +++++++
> libxz/xz_private.h | 156 +++++
> libxz/xz_stream.h | 62 ++
> 26 files changed, 4103 insertions(+), 12 deletions(-)
> create mode 100644 doc/device-tree/imc.rst
> create mode 100644 doc/imc.rst
> create mode 100644 doc/opal-api/opal-imc-counters.rst
> create mode 100644 hw/imc.c
> create mode 100644 include/imc.h
> create mode 100644 libxz/Makefile.inc
> create mode 100644 libxz/xz.h
> create mode 100644 libxz/xz_config.h
> create mode 100644 libxz/xz_crc32.c
> create mode 100644 libxz/xz_dec_lzma2.c
> create mode 100644 libxz/xz_dec_stream.c
> create mode 100644 libxz/xz_lzma2.h
> create mode 100644 libxz/xz_private.h
> create mode 100644 libxz/xz_stream.h
Thanks!
I've merged the series to master as of
a1e0a047b2a01c4ad18796151685d41f747af273 with two minor fixes. First, I
fixed a bug in the failure path of loading the IMA_CATALOG (arguably a
bug in the resource loading code: if you wait_for_resource on a resource
you didn't start preloading, you hang rather than error out). Second, I
moved the hw/imc.c code over to pr_fmt rather than hardcoding IMC in all
the prerror() messages.
I know there's a bunch of firmware pieces that need to settle before all
of the bits work okay here, but I think we should be good to go enough
to look at the kernel bits now.
--
Stewart Smith
OPAL Architect, IBM.