[Skiboot] [PATCH v14 00/10] skiboot: OPAL support for IMC instrumentation
Madhavan Srinivasan
maddy at linux.vnet.ibm.com
Wed Jun 28 14:44:02 AEST 2017
On Tuesday 27 June 2017 02:55 PM, Stewart Smith wrote:
> Madhavan Srinivasan <maddy at linux.vnet.ibm.com> writes:
>> ** No OPAL API changes in this version of the patchset. Added a patch
>> (patch 10) to handle imc dt node parsing (no changes to the rest of the
>> patches) **
>>
>> Patchset adds support for In Memory Collection instrumentation (IMC)
>> services in OPAL for Power9. The entire IMC infrastructure consists of
>> two kinds of Performance Monitoring Units (PMUs) : nest imc pmus (chip
>> level) and core imc pmus (core level).
>>
>> Nest IMC PMUs are off core but on chip, and they can be accessed via
>> in-band scoms. Programming these counters and accumulating the counter
>> data to memory is done via microcode running in one of the OCC Engines.
>>
>> Core IMC PMUs handle the per-core counters. These are initialized with
>> per-core PDBARs, HTM_MODE and EVENT_MASK scoms.
>>
>> This patchset adds nest and core IMC instrumentation support on the
>> OPAL side.
>>
>> "IMA_CATALOG" partition in PNOR contains multiple device tree binaries
>> (DTB) in a compressed form with PVR tag. So, when loading the pnor
>> partition, OPAL passes the system PVR as a "subid" to the load_resource
>> API. If a catalog dtb is found for a given pvr, it is decompressed and
>> linked to the main device tree.
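
A minimal sketch of the lookup described above, written in Python rather than skiboot's C. The catalog is modeled as a plain dict keyed by PVR. The real PNOR subpartition layout and the load_resource signature are not shown in this mail, so `load_catalog_dtb`, the dict layout, the PVR values and the payload are all assumptions made for illustration.

```python
import lzma


def load_catalog_dtb(catalog, pvr):
    """Return the decompressed DTB for `pvr`, or None if no entry matches.

    `catalog` is a hypothetical stand-in for the IMA_CATALOG partition:
    a dict mapping a PVR value (the "subid") to an xz-compressed blob.
    """
    blob = catalog.get(pvr)
    if blob is None:
        return None  # no catalog entry for this processor
    return lzma.decompress(blob)


# Made-up PVR values and a dummy payload in place of a real DTB.
catalog = {0x1234: lzma.compress(b"dummy-dtb-for-illustration")}
found = load_catalog_dtb(catalog, 0x1234)
missing = load_catalog_dtb(catalog, 0x5678)
```

Returning None for an unknown PVR keeps the "no catalog for this processor" case separate from a decompression failure, which would raise an exception here.
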
>>
>> Commit which adds the partition to PNOR is :
>> https://github.com/open-power/pnor/commit/c940142c6dc64dd176096dc648f433c889919e84
>>
>> The root node of an IMC catalog device tree contains nodes for the IMC
>> PMUs and the common events across the PMUs. Here is an excerpt from
>> the device tree :
>> /dts-v1/;
>>
>> / {
>> name = "";
>> compatible = "ibm,opal-in-memory-counters";
>> #address-cells = <0x1>;
>> #size-cells = <0x1>;
>> version-id = "";
>>
>> NEST_MCS: nest-mcs-events {
>> #address-cells = <0x1>;
>> #size-cells = <0x1>;
>>
>> event@0 {
>> event-name = "RRTO_QFULL_NO_DISP" ;
>> reg = <0x0 0x8>;
>> desc = "RRTO not dispatched in MCS0 due to capacity - pulses once for each time a valid RRTO op is not dispatched due to a command list full condition" ;
>> };
>> event@8 {
>> event-name = "WRTO_QFULL_NO_DISP" ;
>> reg = <0x8 0x8>;
>> desc = "WRTO not dispatched in MCS0 due to capacity - pulses once for each time a valid WRTO op is not dispatched due to a command list full condition" ;
>> };
>> [...]
>> mcs01 {
>> compatible = "ibm,imc-counters";
>> events-prefix = "PM_MCS01_";
>> unit = "";
>> scale = "";
>> reg = <0x118 0x8>;
>> events = < &NEST_MCS >;
>> type = <0x10>;
>> };
>> mcs23 {
>> compatible = "ibm,imc-counters";
>> events-prefix = "PM_MCS23_";
>> unit = "";
>> scale = "";
>> reg = <0x198 0x8>;
>> events = < &NEST_MCS >;
>> type = <0x10>;
>> };
>> [...]
>>
>> CORE_EVENTS: core-events {
>> #address-cells = <0x1>;
>> #size-cells = <0x1>;
>>
>> event@e0 {
>> event-name = "0THRD_NON_IDLE_PCYC" ;
>> reg = <0xe0 0x8>;
>> desc = "The number of processor cycles when all threads are idle" ;
>> };
>> event@120 {
>> event-name = "1THRD_NON_IDLE_PCYC" ;
>> reg = <0x120 0x8>;
>> desc = "The number of processor cycles when exactly one SMT thread is executing non-idle code" ;
>> };
>> [...]
>> core {
>> compatible = "ibm,imc-counters";
>> events-prefix = "CPM_";
>> unit = "";
>> scale = "";
>> reg = <0x0 0x8>;
>> events = < &CORE_EVENTS >;
>> type = <0x4>;
>> };
>>
>> thread {
>> compatible = "ibm,imc-counters";
>> events-prefix = "CPM_";
>> unit = "";
>> scale = "";
>> reg = <0x0 0x8>;
>> events = < &CORE_EVENTS >;
>> type = <0x1>;
>> };
>> };
>>
>> IMC Catalog DTS:
>> https://github.com/open-power/ima-catalog/blob/master/81E00612.4E0100.dts
>> (recent suggested device node changes are not yet updated to the link)
>>
>> For any IMC PMU node (mcs0, mcs1, mcs2, core, thread etc), its events
>> property points to the events node which gives us the event
>> information for that PMU.
>> For example, take the mcs0 PMU node from the above excerpt: its "events"
>> property points us to the events list for this PMU and its "events-prefix"
>> property helps us to create the correct event name for this PMU. So,
>> "RRTO_QFULL_NO_DISP" event name from "nest-mcs-events" becomes
>> "PM_MCS0_RRTO_QFULL_NO_DISP" for PMU mcs0 and
>> "PM_MCS1_RRTO_QFULL_NO_DISP" for PMU mcs1.
>> This new design of the DTS file saves a lot of space in the device
>> tree, since a lot of event names are common across PMUs. For core and
>> thread IMC PMUs, all the event names are common.
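
The naming rule above can be sketched in a few lines of Python. The helper name `expand_event_names` is hypothetical; the prefixes and event names are the ones quoted in the paragraph above.

```python
def expand_event_names(prefix, shared_events):
    """Prepend a PMU's "events-prefix" to each shared event name."""
    return [prefix + name for name in shared_events]


# One shared event list, as in the "nest-mcs-events" node.
nest_mcs_events = ["RRTO_QFULL_NO_DISP", "WRTO_QFULL_NO_DISP"]

# Each PMU contributes only its prefix; the event names are stored once.
mcs0 = expand_event_names("PM_MCS0_", nest_mcs_events)
mcs1 = expand_event_names("PM_MCS1_", nest_mcs_events)
```
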
>>
>> Each event in the device tree contains "event-name" and "offset".
>> Some of the PMUs may contain properties such as "scale" and "unit", which
>> reflect the fact that all the events inside this PMU have the
>> same "scale" and "unit" values.
>>
>> Why this design for the IMC DTS files?
>> The DTS files for Power 9 contain the IMC PMUs for nest, core and thread
>> IMC PMUs. There could be an argument to design the device tree
>> in such a way, so that one can use of_translate_address() directly on
>> the event nodes and can get the cpu address for that event. However,
>> there are some issues with that.
>> For nest imc, we need to attach the device tree to per-chip HOMER region
>> node. For multiple chips, this will increase replication.
>> For core imc, we allocate the memory in the kernel for each core and the
>> base location for core imc is not fixed. Hence, we can't use
>> of_translate_address on the core events.
>> For thread imc, we allocate memory for each linux process which needs to
>> be monitored. This will be particularly difficult to take care of in
>> the device tree, since the allocation will be dynamic.
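
To illustrate the argument above: an event node only carries an offset, while the base it is relative to is chosen at run time. The sketch below assumes a counter's address is simply base plus event offset, which this mail does not spell out; the base addresses are made up and `counter_address` is a hypothetical helper.

```python
def counter_address(base, event_offset):
    """Assumed layout: counter lives at `event_offset` bytes from `base`."""
    return base + event_offset


# Offset of 0THRD_NON_IDLE_PCYC, taken from reg = <0xe0 0x8> in the excerpt.
EVENT_OFFSET = 0xE0

# Made-up per-core buffers, standing in for memory the kernel allocates.
core_bases = {0: 0x2000_0000, 1: 0x2000_2000}

addrs = {core: counter_address(base, EVENT_OFFSET)
         for core, base in core_bases.items()}
```

The same event resolves to a different address for every core, so no single translated address could be baked into the catalog.
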
>>
>> So, from the OPAL side, we need to :
>> - Find out the current processor's PVR.
>> - Fetch the IMC catalog pnor partition.
>> - Fetch the correct subpartition based on the current processor's PVR.
>> - Decompress the blob taken from the subpartition.
>> - Expand the (now uncompressed) device tree binary, fix up the phandles and
>> attach it to the system's device tree so that it can be discovered
>> by the kernel.
>> - Look at the IMC availability vector which denotes which of the nest
>> PMUs are available and remove the unavailable PMU nodes from the
>> device tree.
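
The last two steps above can be sketched on a toy tree of Python dicts. This is not the skiboot implementation: it assumes the fixup shifts every phandle (and every property referencing one) by the system tree's last phandle, and it assumes one availability bit per nest unit. The node layout, the helper names and the bit assignments are all hypothetical.

```python
def make_node(name, children=(), **props):
    """Build a toy device tree node."""
    return {"name": name, "props": dict(props), "children": list(children)}


def fixup_phandles(root, base, ref_props=("events",)):
    """Shift phandles and phandle references by `base` in a single walk.

    Returns the highest phandle in use after the shift.
    """
    highest = base
    stack = [root]
    while stack:
        cur = stack.pop()
        props = cur["props"]
        if "phandle" in props:
            props["phandle"] += base
            highest = max(highest, props["phandle"])
        for prop in ref_props:
            if prop in props:
                props[prop] += base  # keep the reference pointing at its target
        stack.extend(cur["children"])
    return highest


def prune_unavailable(root, availability, unit_bits):
    """Drop children whose bit in `availability` is clear.

    Children not listed in `unit_bits` (e.g. shared event lists) are kept.
    """
    root["children"] = [
        child for child in root["children"]
        if child["name"] not in unit_bits
        or availability & (1 << unit_bits[child["name"]])
    ]


imc = make_node("imc-root", [
    make_node("nest-mcs-events", phandle=1),
    make_node("mcs01", events=1),
    make_node("mcs23", events=1),
])
last_phandle = fixup_phandles(imc, base=0x100)
prune_unavailable(imc, availability=0b01, unit_bits={"mcs01": 0, "mcs23": 1})
remaining = [child["name"] for child in imc["children"]]
```

Shifting by a base keeps the incoming tree's internal references consistent while avoiding collisions with phandles already used in the system tree.
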
>>
>> Note that :
>> - Since OPAL lacks an xz decompression library, an xz decompression
>> library has been added from http://tukaani.org/xz/embedded.html
>> (http://git.tukaani.org/?p=xz-embedded.git;a=blob;f=COPYING;h=fc4fbf798d09c4341926b5f19ae2b996d2d4557b;hb=e75f4eb79165213a02d567940d344f5c2ff1be03).
>>
>> This Patchset does 2 things :
>>
>> 1) At the time of boot, it detects the IMA_CATALOG resource. Based on
>> the current processor's PVR value, it fetches the appropriate
>> subpartition. The blob in this subpartition is then uncompressed and the
>> flattened device tree is obtained. This dtb is then expanded and then
>> linked to the system's device tree under
>> "/proc/device-tree/ima-counters". The node "ima-counters" is a new node
>> created in this patchset. The kernel can then discover this node based
>> on its compatibility field.
>>
>> 2) It implements opal calls to initialize, enable and disable the IMC counters
>> as specified by the host kernel.
>>
>> This patchset is based on the initial work for Nest Instrumentation done
>> by Madhavan Srinivasan, which can be found here :
>> (https://lists.ozlabs.org/pipermail/skiboot/2016-March/002999.html).
>>
>> Changelog :
>> v13 -> v14:
>> - Added patch to the end of the series (patch 10)
>> to handle imc dt node parsing for mcs*.
>> - No other changes in the series.
>>
>> v12 -> v13:
>> - Added more documentation and code comments
>> - Merged patches 8 and 9.
>> - Modified _start and _stop calls to take another parameter.
>> - Modified the decompress function to take memcpy-style parameters.
>> - Added new helper functions for node parser
>> - Added a new device tree parser node to detect and remove unknown imc type
>> - Removed Acked-by from patch 1, since a change was made to fix a warning
>> during doc compilation.
>>
>> v11 -> v12:
>> - Dropped the patch to add chip-id to reserve-memory
>> - Modified the _INIT call to carry additional parameter (cpu_pir)
>> - Modified the _INIT call to update scom based on input cpu_pir
>> - Updated the opal api docs
>> - Added test for dt_fixup function in core/test/run-device.c
>> - Added new function to update base-addr and chip-id array to nest nodes
>> - Updated commit messages and added more code comments
>>
>> v10 -> v11:
>> - Fix the return value for _INIT call in case of nest type
>>
>> v9 -> v10:
>> - Implemented the phandle fixup function using a single pass dt loop
>> - Removed the hash functions and added more comments to the code
>> - separated imc catalog preloading from imc_init() as suggested
>> - Made changes to document files
>> - v8 has dropped a patch which was added in this series
>>
>> v8 -> v9:
>> - Changed the opal call APIs for nest and core counters.
>> - Implemented the fixup phandler using hash primitive, instead
>> of linked list.
>> - Made changes in commit messages.
>> - Changed NEST_IMC_RESUME to NEST_IMC_RUNNING.
>> - Made changes in opal-call documentations.
>>
>> v7 -> v8:
>> - Rebased to latest upstream
>> - Made changes to commit messages
>> - Removed NEST_IMC_PRODUCTION_MODE and added OPAL_NEST_IMC_PRODUCTION_MODE
>>
>> v6 -> v7:
>> - libxz -- removing the hostboot header from code
>> - Made changes in commit messages.
>>
>> v5 -> v6:
>> - Added a set of new dt_fixup_* functions to handle phandle
>> in the incoming tree.
>> - Removed the nest_imc.h and move the nest_pmc[] to imc.c
>> - Updated macro names and values as suggested
>> - Fixed disable_unavailable_units() to work with incoming tree
>> and not system dt.
>> - Rearranged the patches so that the homer region update patch comes first
>> - Made changes to commit messages.
>>
>> v4 -> v5:
>> - Changed the cover letter to show the new IMC DTS format (which removes
>> duplication).
>> - No visible changes in the code.
>>
>> v3 -> v4:
>> Major Changes include :
>> - Patchset now has support for core level IMC PMUs.
>>
>> v2 -> v3 :
>> Major changes include
>> - Addressed review comments from Oliver O'Halloran.
>> - Renamed this infrastructure from IMA (In-Memory Accumulation) to IMC
>> (In-Memory Collection), since the name IMA conflicts with the existing
>> IMA (Integrity Measurement Architecture) in the linux kernel.
>> - Patches 2 and 4 have been merged together (3/6).
>> - Patch 3 (xz library) has been moved to Patch 2/6.
>>
>> Anju T Sudhakar (2):
>> skiboot: Add opal calls to init/start/stop IMC devices
>> skiboot: Add documentation for IMC opal call
>>
>> Hemant Kumar (2):
>> skiboot: Nest IMC macro definitions
>> skiboot: Add a library for xz
>>
>> Madhavan Srinivasan (6):
>> skiboot/doc: Add doc/imc.rst documentation
>> skiboot/doc: Add devicetree binding document for IMC
>> dt: Add helper function for last_phandle updates
>> dt: Add phandle fixup helpers
>> skiboot: Find the IMC DTB
>> skiboot: Handle combined units node in the imc dt
>>
>> Makefile.main | 5 +-
>> core/device.c | 42 +-
>> core/fdt.c | 8 +-
>> core/flash.c | 1 +
>> core/init.c | 7 +
>> core/test/run-device.c | 35 +-
>> doc/device-tree/imc.rst | 72 +++
>> doc/imc.rst | 54 ++
>> doc/index.rst | 1 +
>> doc/opal-api/opal-imc-counters.rst | 87 +++
>> hw/Makefile.inc | 2 +-
>> hw/imc.c | 681 +++++++++++++++++++++
>> include/chip.h | 11 +
>> include/device.h | 20 +
>> include/imc.h | 142 +++++
>> include/opal-api.h | 12 +-
>> include/platform.h | 1 +
>> libxz/Makefile.inc | 7 +
>> libxz/xz.h | 304 ++++++++++
>> libxz/xz_config.h | 124 ++++
>> libxz/xz_crc32.c | 59 ++
>> libxz/xz_dec_lzma2.c | 1171 ++++++++++++++++++++++++++++++++++++
>> libxz/xz_dec_stream.c | 847 ++++++++++++++++++++++++++
>> libxz/xz_lzma2.h | 204 +++++++
>> libxz/xz_private.h | 156 +++++
>> libxz/xz_stream.h | 62 ++
>> 26 files changed, 4103 insertions(+), 12 deletions(-)
>> create mode 100644 doc/device-tree/imc.rst
>> create mode 100644 doc/imc.rst
>> create mode 100644 doc/opal-api/opal-imc-counters.rst
>> create mode 100644 hw/imc.c
>> create mode 100644 include/imc.h
>> create mode 100644 libxz/Makefile.inc
>> create mode 100644 libxz/xz.h
>> create mode 100644 libxz/xz_config.h
>> create mode 100644 libxz/xz_crc32.c
>> create mode 100644 libxz/xz_dec_lzma2.c
>> create mode 100644 libxz/xz_dec_stream.c
>> create mode 100644 libxz/xz_lzma2.h
>> create mode 100644 libxz/xz_private.h
>> create mode 100644 libxz/xz_stream.h
> Thanks!
>
> I've merged the series to master as of
> a1e0a047b2a01c4ad18796151685d41f747af273 with two minor fixes. There was
> a bug in the failure path of loading the IMA_CATALOG that I fixed (in
> what would arguably be a bug in the resource loading code, if you
> wait_for_resource on a resource you didn't start preloading, you hang
> rather than error out). Additionally, I moved the hw/imc.c code over to
> do pr_fmt rather than hardcode IMC in all the prerror() messages.
Thanks, Stewart, for the fix.
>
> I know there's a bunch of firmware pieces that need to settle before all
> of the bits work okay here, but I think we should be good to go enough
> to look at the kernel bits now.
>
>