[Skiboot] [PATCH v14 00/10] skiboot: OPAL support for IMC instrumentation

Madhavan Srinivasan maddy at linux.vnet.ibm.com
Wed Jun 28 14:44:02 AEST 2017



On Tuesday 27 June 2017 02:55 PM, Stewart Smith wrote:
> Madhavan Srinivasan <maddy at linux.vnet.ibm.com> writes:
>> ** No OPAL API changes in this version of the patchset. Added a patch
>> (patch 10) to handle imc dt node parsing (no changes to the rest of the
>> patches) **
>>
>> This patchset adds support for In Memory Collection (IMC) instrumentation
>> services in OPAL for Power9. The IMC infrastructure consists of two
>> kinds of Performance Monitoring Units (PMUs): nest IMC PMUs (chip
>> level) and core IMC PMUs (core level).
>>
>> Nest IMC PMUs are off-core but on-chip, and can be accessed via in-band
>> SCOMs. Programming these counters and accumulating the counter data
>> into memory is done by microcode running on one of the OCC engines.
>>
>> Core IMC PMUs handle the per-core counters. These are initialized with
>> per-core PDBARs, HTM_MODE and EVENT_MASK scoms.
>>
>> This patchset adds nest and core IMC instrumentation support on the
>> OPAL side.
>>
>> The "IMA_CATALOG" partition in PNOR contains multiple device tree
>> binaries (DTBs) in compressed form, each tagged with a PVR. When loading
>> the PNOR partition, OPAL passes the system PVR as a "subid" to the
>> load_resource API. If a catalog DTB is found for the given PVR, it is
>> decompressed and linked to the main device tree.
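As a rough illustration of the PVR-keyed lookup, the sketch below scans an in-memory table of subpartitions for the one whose PVR tag matches the running processor. The `catalog_subpart` struct and `find_catalog_for_pvr()` are hypothetical names; in the patchset the selection goes through the load_resource API with the PVR passed as the "subid", and exact matching on the full PVR is an assumption here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one IMA_CATALOG subpartition:
 * the PVR it is tagged with and its (still compressed) DTB blob. */
struct catalog_subpart {
	uint32_t pvr;
	const void *blob;
	size_t blob_len;
};

/* Return the subpartition whose PVR tag matches the running processor,
 * or NULL if the catalog carries no DTB for this PVR.
 * Exact matching on the whole PVR is an assumption of this sketch. */
static const struct catalog_subpart *
find_catalog_for_pvr(const struct catalog_subpart *parts, size_t n,
		     uint32_t pvr)
{
	for (size_t i = 0; i < n; i++)
		if (parts[i].pvr == pvr)
			return &parts[i];
	return NULL;
}
```

A NULL return corresponds to the case where no catalog DTB exists for this PVR, so there is nothing to decompress or link.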
>>
>> The commit which adds the partition to PNOR is:
>> https://github.com/open-power/pnor/commit/c940142c6dc64dd176096dc648f433c889919e84
>>
>> The root node of an IMC catalog device tree contains nodes for the IMC
>> PMUs and the common events across the PMUs. Here is an excerpt from
>> the device tree:
>> /dts-v1/;
>>
>> / {
>>          name = "";
>>          compatible = "ibm,opal-in-memory-counters";
>>          #address-cells = <0x1>;
>>          #size-cells = <0x1>;
>>          version-id = "";
>>
>>          NEST_MCS: nest-mcs-events {
>>                  #address-cells = <0x1>;
>>                  #size-cells = <0x1>;
>>
>>                  event@0 {
>>                          event-name = "RRTO_QFULL_NO_DISP" ;
>>                          reg = <0x0 0x8>;
>>                          desc = "RRTO not dispatched in MCS0 due to capacity - pulses once for each time a valid RRTO op is not dispatched due to a command list full condition" ;
>>                  };
>>                  event@8 {
>>                          event-name = "WRTO_QFULL_NO_DISP" ;
>>                          reg = <0x8 0x8>;
>>                          desc = "WRTO not dispatched in MCS0 due to capacity - pulses once for each time a valid WRTO op is not dispatched due to a command list full condition" ;
>>                  };
>>                  [...]
>>          mcs01 {
>>                  compatible = "ibm,imc-counters";
>>                  events-prefix = "PM_MCS01_";
>>                  unit = "";
>>                  scale = "";
>>                  reg = <0x118 0x8>;
>>                  events = < &NEST_MCS >;
>>                  type = <0x10>;
>>          };
>>          mcs23 {
>>                  compatible = "ibm,imc-counters";
>>                  events-prefix = "PM_MCS23_";
>>                  unit = "";
>>                  scale = "";
>>                  reg = <0x198 0x8>;
>>                  events = < &NEST_MCS >;
>>                  type = <0x10>;
>>          };
>>          [...]
>>
>>          CORE_EVENTS: core-events {
>>                  #address-cells = <0x1>;
>>                  #size-cells = <0x1>;
>>
>>                  event@e0 {
>>                          event-name = "0THRD_NON_IDLE_PCYC" ;
>>                          reg = <0xe0 0x8>;
>>                          desc = "The number of processor cycles when all threads are idle" ;
>>                  };
>>                  event@120 {
>>                          event-name = "1THRD_NON_IDLE_PCYC" ;
>>                          reg = <0x120 0x8>;
>>                          desc = "The number of processor cycles when exactly one SMT thread is executing non-idle code" ;
>>                  };
>>                  [...]
>>          core {
>>                  compatible = "ibm,imc-counters";
>>                  events-prefix = "CPM_";
>>                  unit = "";
>>                  scale = "";
>>                  reg = <0x0 0x8>;
>>                  events = < &CORE_EVENTS >;
>>                  type = <0x4>;
>>          };
>>
>>          thread {
>>                  compatible = "ibm,imc-counters";
>>                  events-prefix = "CPM_";
>>                  unit = "";
>>                  scale = "";
>>                  reg = <0x0 0x8>;
>>                  events = < &CORE_EVENTS >;
>>                  type = <0x1>;
>>          };
>> };
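The "type" cells in the excerpt distinguish the PMU kinds: 0x10 on the mcs nodes (nest, chip level), 0x4 on core and 0x1 on thread. A minimal decoder for those three values might look like the following; the enum and function names are illustrative, not taken from the patch.

```c
#include <stdint.h>

/* "type" values as they appear in the catalog excerpt.
 * The names are illustrative only. */
enum imc_unit_type {
	IMC_TYPE_THREAD = 0x1,  /* "thread" node */
	IMC_TYPE_CORE   = 0x4,  /* "core" node */
	IMC_TYPE_NEST   = 0x10, /* "mcs01", "mcs23", ... (chip level) */
};

/* Return a short name for a PMU node's "type" cell. */
static const char *imc_type_name(uint32_t type)
{
	switch (type) {
	case IMC_TYPE_NEST:
		return "nest";
	case IMC_TYPE_CORE:
		return "core";
	case IMC_TYPE_THREAD:
		return "thread";
	default:
		return "unknown";
	}
}

/* Nonzero if the type is one of the three kinds above; a parser could
 * use this to drop PMU nodes of an unknown type. */
static int imc_type_is_known(uint32_t type)
{
	return type == IMC_TYPE_NEST || type == IMC_TYPE_CORE ||
	       type == IMC_TYPE_THREAD;
}
```

Dropping unknown types echoes the v12 -> v13 changelog item about detecting and removing unknown imc types, though the patchset's own check may differ from this sketch.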
>>
>> IMC Catalog DTS:
>> 	https://github.com/open-power/ima-catalog/blob/master/81E00612.4E0100.dts
>> (the recently suggested device node changes are not yet reflected at this link)
>>
>> For any IMC PMU node (mcs0, mcs1, mcs2, core, thread, etc.), its "events"
>> property points to the events node that describes the events for that
>> PMU.
>> For example, take the mcs0 PMU node: its "events" property points to
>> the events list for this PMU, and its "events-prefix" property is used
>> to build the correct event name for this PMU. So the
>> "RRTO_QFULL_NO_DISP" event name from "nest-mcs-events" becomes
>> "PM_MCS0_RRTO_QFULL_NO_DISP" for PMU mcs0 and
>> "PM_MCS1_RRTO_QFULL_NO_DISP" for PMU mcs1.
>> This design of the DTS file saves a lot of space in the device tree,
>> since many event names are common across PMUs. For the core and
>> thread IMC PMUs, all the event names are common.
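The name construction described above amounts to concatenating a PMU's "events-prefix" with each shared "event-name". A small sketch, using a hypothetical helper `imc_full_event_name()` and the "PM_MCS01_" prefix from the mcs01 node in the excerpt:

```c
#include <stddef.h>
#include <stdio.h>

/* Build the full event name for one PMU by prepending its "events-prefix"
 * to a shared "event-name" from the referenced events node.
 * Returns 0 on success, -1 if the result would not fit in buf. */
static int imc_full_event_name(char *buf, size_t len,
			       const char *prefix, const char *event)
{
	int n = snprintf(buf, len, "%s%s", prefix, event);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For "PM_MCS01_" and "RRTO_QFULL_NO_DISP" this yields "PM_MCS01_RRTO_QFULL_NO_DISP". Only the short prefix is stored per PMU, so each event list is stored once no matter how many PMUs reference it.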
>>
>> Each event in the device tree contains an "event-name" and an offset
>> (its "reg" property). Some of the PMUs may contain properties such as
>> "scale" and "unit", which means that all the events inside that PMU
>> share the same "scale" and "unit" values.
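To make the per-PMU inheritance concrete, the sketch below resolves one event against its PMU node. It assumes (this is not stated in the cover letter) that the counter offset is the PMU node's "reg" base plus the event's own "reg" offset, e.g. 0x118 for mcs01 plus 0x8 for WRTO_QFULL_NO_DISP; `struct imc_event` and `imc_resolve_event()` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical record for one event after its PMU node is parsed. */
struct imc_event {
	const char *name;  /* shared "event-name" */
	uint32_t offset;   /* resolved offset (assumed pmu_reg + ev_reg) */
	const char *unit;  /* copied from the PMU node's "unit" */
	const char *scale; /* copied from the PMU node's "scale" */
};

/* Resolve one event: every event in the PMU shares the PMU's unit and
 * scale, and (by assumption) its offset is relative to the PMU's reg. */
static struct imc_event imc_resolve_event(const char *name, uint32_t ev_reg,
					  uint32_t pmu_reg, const char *unit,
					  const char *scale)
{
	struct imc_event ev = {
		.name = name,
		.offset = pmu_reg + ev_reg,
		.unit = unit,
		.scale = scale,
	};

	return ev;
}
```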
>>
>> Why this design for the IMC DTS files?
>> The DTS files for Power9 describe the nest, core and thread IMC PMUs.
>> One could argue for designing the device tree so that
>> of_translate_address() can be used directly on the event nodes to get
>> the CPU address of each event. However, there are some issues with that.
>> For nest IMC, we would need to attach the device tree to each per-chip
>> HOMER region node, which increases replication on multi-chip systems.
>> For core IMC, the kernel allocates memory for each core, so the base
>> location for core IMC is not fixed. Hence, we can't use
>> of_translate_address() on the core events.
>> For thread IMC, memory is allocated for each Linux process that needs to
>> be monitored. This is particularly difficult to describe in the device
>> tree, since the allocation is dynamic.
>>
>> So, on the OPAL side, we need to:
>>   - Find out the current processor's PVR.
>>   - Fetch the IMC catalog PNOR partition.
>>   - Fetch the correct subpartition based on the current processor's PVR.
>>   - Decompress the blob taken from the subpartition.
>>   - Expand the (now uncompressed) device tree binary, fix up the phandles
>>     and attach it to the system's device tree, so that it can be
>>     discovered by the kernel.
>>   - Check the IMC availability vector, which indicates which of the nest
>>     PMUs are available, and remove the unavailable PMU nodes from the
>>     device tree.
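The phandle fixup step matters because the catalog DTB is compiled on its own, so its phandles (for example the ones behind &NEST_MCS and &CORE_EVENTS) can collide with phandles already used by the system device tree. One simple approach, shown only as a sketch with hypothetical types and names, is to shift every phandle in the incoming tree, and every "events" reference to one, past the system tree's last used phandle; the patchset's own dt helpers may work differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of an incoming catalog node: its own
 * phandle (0 if it has none) and the phandle referenced by its
 * "events" property (0 if it has none). */
struct cat_node {
	const char *name;
	uint32_t phandle;
	uint32_t events_ref;
};

/* Shift every phandle in the incoming tree, and every reference to one,
 * by last_phandle (the highest phandle already used in the system tree).
 * Returns the new highest phandle so the caller can record it. */
static uint32_t fixup_phandles(struct cat_node *nodes, size_t n,
			       uint32_t last_phandle)
{
	uint32_t max = last_phandle;

	for (size_t i = 0; i < n; i++) {
		if (nodes[i].phandle) {
			nodes[i].phandle += last_phandle;
			if (nodes[i].phandle > max)
				max = nodes[i].phandle;
		}
		if (nodes[i].events_ref)
			nodes[i].events_ref += last_phandle;
	}
	return max;
}
```

Because definitions and references are shifted by the same amount in a single pass, mcs01 still points at nest-mcs-events after the fixup, and the returned value becomes the system tree's new last phandle.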
>>
>> Note that:
>>   - Since OPAL lacks an xz decompression library, one has been added
>>     from http://tukaani.org/xz/embedded.html
>>     (http://git.tukaani.org/?p=xz-embedded.git;a=blob;f=COPYING;h=fc4fbf798d09c4341926b5f19ae2b996d2d4557b;hb=e75f4eb79165213a02d567940d344f5c2ff1be03).
>>
>> This patchset does two things:
>>
>> 1) At boot time, it detects the IMA_CATALOG resource. Based on the
>>     current processor's PVR value, it fetches the appropriate
>>     subpartition. The blob in this subpartition is then uncompressed to
>>     obtain the flattened device tree. This DTB is then expanded and
>>     linked to the system's device tree under
>>     "/proc/device-tree/ima-counters". The node "ima-counters" is a new node
>>     created in this patchset. The kernel can then discover this node based
>>     on its compatible property.
>>
>> 2) It implements OPAL calls to initialize, enable and disable the IMC
>>     counters, as specified by the host kernel.
>>
>> This patchset is based on the initial work for Nest Instrumentation done
>> by Madhavan Srinivasan, which can be found here:
>> (https://lists.ozlabs.org/pipermail/skiboot/2016-March/002999.html).
>>
>> Changelog:
>> v13 -> v14:
>>   - Added a patch to the end of the series (patch 10)
>>     to handle imc dt node parsing for mcs*.
>>   - No other changes in the series.
>>
>> v12 -> v13:
>>   - Added more documentation and code comments
>>   - Merged patches 8 and 9.
>>   - Modified _start and _stop calls to take another parameter.
>>   - Modified the decompress function to take memcpy-style parameters.
>>   - Added new helper functions for the node parser
>>   - Added a new device tree node parser to detect and remove unknown imc types
>>   - Removed Acked-by from patch 1, since a change was made to fix a warning
>>     during doc compilation.
>>
>> v11 -> v12:
>>   - Dropped the patch to add chip-id to reserve-memory
>>   - Modified the _INIT call to carry additional parameter (cpu_pir)
>>   - Modified the _INIT call to update scom based on input cpu_pir
>>   - Updated the opal api docs
>>   - Added test for dt_fixup function in core/test/run-device.c
>>   - Added new function to update base-addr and chip-id array to nest nodes
>>   - Updated commit messages and added more code comments
>>
>> v10 -> v11:
>>   - Fixed the return value of the _INIT call in case of nest type
>>
>> v9 -> v10:
>>   - Implemented the phandle fixup function using a single pass dt loop
>>   - Removed the hash functions and added more comments to the code
>>   - Separated imc catalog preloading from imc_init() as suggested
>>   - Made changes to document files
>>   - v8 has dropped a patch which was added in this series
>>
>> v8 -> v9:
>>   - Changed the opal call APIs for nest and core counters.
>>   - Implemented the phandle fixup using a hash primitive, instead
>>     of a linked list.
>>   - Made changes in commit messages.
>>   - Changed NEST_IMC_RESUME to NEST_IMC_RUNNING.
>>   - Made changes in opal-call documentations.
>>
>> v7 -> v8:
>>   - Rebased to latest upstream
>>   - Made changes to commit messages
>>   - Removed NEST_IMC_PRODUCTION_MODE and added OPAL_NEST_IMC_PRODUCTION_MODE
>>
>> v6 -> v7:
>>   - libxz -- removing the hostboot header from code
>>   - Made changes in commit messages.
>>
>> v5 -> v6:
>>   - Added a set of new dt_fixup_* functions to handle phandle
>>     in the incoming tree.
>>   - Removed the nest_imc.h and moved the nest_pmc[] to imc.c
>>   - Updated macro names and values as suggested
>>   - Fixed disable_unavailable_units() to work with the incoming tree
>>     and not the system dt.
>>   - Rearranged the patches so that the homer region update patch is first
>>   - Made changes to commit messages.
>>
>> v4 -> v5:
>>   - Changed the cover letter to show the new IMC DTS format (which removes
>>     duplication).
>>   - No visible changes in the code.
>>
>> v3 -> v4:
>>   Major changes include:
>>   - The patchset now supports core-level IMC PMUs.
>>
>> v2 -> v3:
>>   Major changes include:
>>   - Addressed review comments from Oliver O'Halloran.
>>   - Renamed this infrastructure from IMA (In-Memory Accumulation) to IMC
>>     (In-Memory Collection), since the name IMA conflicts with the existing
>>     IMA (Integrity Measurement Architecture) in the linux kernel.
>>   - Patches 2 and 4 have been merged together (3/6).
>>   - Patch 3 (xz library) has been moved to Patch 2/6.
>>
>> Anju T Sudhakar (2):
>>    skiboot: Add opal calls to init/start/stop IMC devices
>>    skiboot: Add documentation for IMC opal call
>>
>> Hemant Kumar (2):
>>    skiboot: Nest IMC macro definitions
>>    skiboot: Add a library for xz
>>
>> Madhavan Srinivasan (6):
>>    skiboot/doc: Add doc/imc.rst documentation
>>    skiboot/doc: Add devicetree binding document for IMC
>>    dt: Add helper function for last_phandle updates
>>    dt: Add phandle fixup helpers
>>    skiboot: Find the IMC DTB
>>    skiboot: Handle combined units node in the imc dt
>>
>>   Makefile.main                      |    5 +-
>>   core/device.c                      |   42 +-
>>   core/fdt.c                         |    8 +-
>>   core/flash.c                       |    1 +
>>   core/init.c                        |    7 +
>>   core/test/run-device.c             |   35 +-
>>   doc/device-tree/imc.rst            |   72 +++
>>   doc/imc.rst                        |   54 ++
>>   doc/index.rst                      |    1 +
>>   doc/opal-api/opal-imc-counters.rst |   87 +++
>>   hw/Makefile.inc                    |    2 +-
>>   hw/imc.c                           |  681 +++++++++++++++++++++
>>   include/chip.h                     |   11 +
>>   include/device.h                   |   20 +
>>   include/imc.h                      |  142 +++++
>>   include/opal-api.h                 |   12 +-
>>   include/platform.h                 |    1 +
>>   libxz/Makefile.inc                 |    7 +
>>   libxz/xz.h                         |  304 ++++++++++
>>   libxz/xz_config.h                  |  124 ++++
>>   libxz/xz_crc32.c                   |   59 ++
>>   libxz/xz_dec_lzma2.c               | 1171 ++++++++++++++++++++++++++++++++++++
>>   libxz/xz_dec_stream.c              |  847 ++++++++++++++++++++++++++
>>   libxz/xz_lzma2.h                   |  204 +++++++
>>   libxz/xz_private.h                 |  156 +++++
>>   libxz/xz_stream.h                  |   62 ++
>>   26 files changed, 4103 insertions(+), 12 deletions(-)
>>   create mode 100644 doc/device-tree/imc.rst
>>   create mode 100644 doc/imc.rst
>>   create mode 100644 doc/opal-api/opal-imc-counters.rst
>>   create mode 100644 hw/imc.c
>>   create mode 100644 include/imc.h
>>   create mode 100644 libxz/Makefile.inc
>>   create mode 100644 libxz/xz.h
>>   create mode 100644 libxz/xz_config.h
>>   create mode 100644 libxz/xz_crc32.c
>>   create mode 100644 libxz/xz_dec_lzma2.c
>>   create mode 100644 libxz/xz_dec_stream.c
>>   create mode 100644 libxz/xz_lzma2.h
>>   create mode 100644 libxz/xz_private.h
>>   create mode 100644 libxz/xz_stream.h
> Thanks!
>
> I've merged the series to master as of
> a1e0a047b2a01c4ad18796151685d41f747af273 with two minor fixes. There was
> a bug in the failure path of loading the IMA_CATALOG that I fixed (what
> would arguably be a bug in the resource loading code: if you
> wait_for_resource on a resource you didn't start preloading, you hang
> rather than error out). Additionally, I moved the hw/imc.c code over to
> use pr_fmt rather than hardcode IMC in all the prerror() messages.

Thanks, Stewart, for the fix.

>
> I know there are a bunch of firmware pieces that need to settle before all
> of the bits work okay here, but I think we're in good enough shape to
> look at the kernel bits now.
>
>


