[Skiboot] [PATCH v9 08/11] skiboot: Find the IMC DTB
Stewart Smith
stewart at linux.vnet.ibm.com
Mon May 1 17:31:31 AEST 2017
Anju T Sudhakar <anju at linux.vnet.ibm.com> writes:
> From: Hemant Kumar <hemant at linux.vnet.ibm.com>
>
> IMC (In Memory Collection) catalog is a repository of information
> about the Performance Monitoring Units (PMUs) and their events under
> the IMC infrastructure. The information includes:
> - The PMU names
> - Event names
> - Event description
> - Event offsets
> - Event scale
> - Event unit
>
> The catalog is provided as a flattened device tree (dtb). Processors
> with different PVR values may have different PMU or event names. Hence,
> for each processor, there can be multiple device tree binaries (dtbs)
> containing the IMC information. Each of the dtb is compressed and forms
> a sub-partition inside the PNOR partition "IMA_CATALOG". Here is a link
> to the commit adding this partition to PNOR :
> https://github.com/open-power/pnor/commit/c940142c6dc64dd176096dc648f433c889919e84
>
> So, each compressed dtb forms a sub-partition inside the IMA_CATALOG
> partition and can be accessed/loaded through a sub-partition id which
> is nothing but the PVR id. Based on the current processor's PVR, the
> appropriate sub-partition will be loaded.
>
> IMA_CATALOG
> partition
> ----------- Sub-id (E.g)
> | |
> |Catalog 1| 0x100
> |---------|
> | |
> |Catalog 2| 0x200 <------ Current processor's PVR (0x200)
> |---------|
> ...
>
> In the above example, if the current processor's PVR is 0x200, catalog 2
> should be loaded.
I don't think you need to explain subpartitions here; that should be
covered in the general partition loading stuff.
> Note however, that the catalog information is in the form of a dtb and
> the dtb is compressed too. So, the sub-partition loaded must be
> decompressed first before we can actually use it (which is done in
> subsequent patches).
Err... the decompression is done in this patch though
(decompress_subpartition() below)?
> It is important to mention here that while a PNOR image built for one
> processor is specific to only that processor and isn't portable, a
> single system generation (Processor version) may have multiple revisions
> and these revisions may have some changes in their IMC PMUs and events,
> and hence, the need for multiple IMC DTBs.
>
> The sub-partition that we obtain from the IMA_CATALOG partition is a
> compressed device tree binary. We uncompress it using the libxz's
> functions. After uncompressing it, we link the device tree binary to the
> system's device tree. The kernel can now access the device tree and get
> the IMC PMUs and their events' information.
>
> Not all the IMC PMUs listed in the device tree may be available. This is
> indicated by imc availability vector (which is a part of the IMC control
> block structure). We need to check this vector and make sure to remove
> the IMC device nodes which are unavailable.
>
> Signed-off-by: Hemant Kumar <hemant at linux.vnet.ibm.com>
> [maddy: updated nest_pmu[] struct, dt_expand_* and fixed disable_unavailable_units()]
>
> Signed-off-by: Madhavan Srinivasan <maddy at linux.vnet.ibm.com>
> Signed-off-by: Anju T Sudhakar <anju at linux.vnet.ibm.com>
> ---
> core/flash.c | 1 +
> core/init.c | 4 +
> hw/Makefile.inc | 2 +-
> hw/imc.c | 295 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> include/imc.h | 1 +
> include/platform.h | 1 +
> 6 files changed, 303 insertions(+), 1 deletion(-)
> create mode 100644 hw/imc.c
>
> diff --git a/core/flash.c b/core/flash.c
> index 793401c..e5f8452 100644
> --- a/core/flash.c
> +++ b/core/flash.c
> @@ -421,6 +421,7 @@ static struct {
> { RESOURCE_ID_KERNEL, RESOURCE_SUBID_NONE, "BOOTKERNEL" },
> { RESOURCE_ID_INITRAMFS,RESOURCE_SUBID_NONE, "ROOTFS" },
> { RESOURCE_ID_CAPP, RESOURCE_SUBID_SUPPORTED, "CAPP" },
> + { RESOURCE_ID_CATALOG, RESOURCE_SUBID_SUPPORTED, "IMA_CATALOG" },
> };
It's a bit annoying that we have IMA versus IMC here. Could we change
PNOR to have it be IMC?
> diff --git a/core/init.c b/core/init.c
> index 6b8137c..0bca761 100644
> --- a/core/init.c
> +++ b/core/init.c
> @@ -47,6 +47,7 @@
> #include <nvram.h>
> #include <libstb/stb.h>
> #include <libstb/container.h>
> +#include <imc.h>
>
> enum proc_gen proc_gen;
> unsigned int pcie_max_link_speed;
> @@ -930,6 +931,9 @@ void __noreturn __nomcount main_cpu_entry(const void *fdt)
> /* Init SLW related stuff, including fastsleep */
> slw_init();
>
> + /* Init In-Memory Collection related stuff (load the IMC dtb into memory) */
> + imc_init();
> +
> op_display(OP_LOG, OP_MOD_INIT, 0x0002);
>
> pci_nvram_init();
> diff --git a/hw/Makefile.inc b/hw/Makefile.inc
> index b0a8b7c..0164261 100644
> --- a/hw/Makefile.inc
> +++ b/hw/Makefile.inc
> @@ -1,7 +1,7 @@
> # -*-Makefile-*-
> SUBDIRS += hw
> HW_OBJS = xscom.o chiptod.o gx.o cec.o lpc.o lpc-uart.o psi.o
> -HW_OBJS += homer.o slw.o occ.o fsi-master.o centaur.o
> +HW_OBJS += homer.o slw.o occ.o fsi-master.o centaur.o imc.o
> HW_OBJS += nx.o nx-rng.o nx-crypto.o nx-842.o
> HW_OBJS += p7ioc.o p7ioc-inits.o p7ioc-phb.o
> HW_OBJS += phb3.o sfc-ctrl.o fake-rtc.o bt.o p8-i2c.o prd.o
> diff --git a/hw/imc.c b/hw/imc.c
> new file mode 100644
> index 0000000..ad54479
> --- /dev/null
> +++ b/hw/imc.c
> @@ -0,0 +1,295 @@
> +/* Copyright 2016 IBM Corp.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> + * implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <skiboot.h>
> +#include <xscom.h>
> +#include <imc.h>
> +#include <chip.h>
> +#include <libxz/xz.h>
> +
> +/*
> + * Nest IMC PMU names along with their bit values as represented in the
> + * imc_chip_avl_vector(in struct imc_chip_cb, look at include/imc.h).
> + * nest_pmus[] is an array containing all the possible nest IMC PMU node names.
> + */
> +char const *nest_pmus[] = {
> + "powerbus0",
> + "mcs0",
> + "mcs1",
> + "mcs2",
> + "mcs3",
> + "mcs4",
> + "mcs5",
> + "mcs6",
> + "mcs7",
> + "mba0",
> + "mba1",
> + "mba2",
> + "mba3",
> + "mba4",
> + "mba5",
> + "mba6",
> + "mba7",
> + "cen0",
> + "cen1",
> + "cen2",
> + "cen3",
> + "cen4",
> + "cen5",
> + "cen6",
> + "cen7",
> + "xlink0",
> + "xlink1",
> + "xlink2",
> + "mcd0",
> + "mcd1",
> + "phb0",
> + "phb1",
> + "phb2",
> + "resvd",
> + "nx",
> + "capp0",
> + "capp1",
> + "vas",
> + "int",
> + "alink0",
> + "alink1",
> + "alink2",
> + "nvlink0",
> + "nvlink1",
> + "nvlink2",
> + "nvlink3",
> + "nvlink4",
> + "nvlink5",
> + /* reserved bits : 48 - 64 */
> +};
> +
> +void fixup_handler(hash_entry_p entry, const struct dt_property *prop);
> +
> +static struct imc_chip_cb *get_imc_cb(void)
> +{
> + uint64_t cb_loc;
> + struct proc_chip *chip = get_chip(this_cpu()->chip_id);
> +
> + cb_loc = chip->homer_base + P9_CB_STRUCT_OFFSET;
> + return (struct imc_chip_cb *)cb_loc;
> +}
> +
> +/*
> + * Decompresses the blob obtained from the IMA_CATALOG sub-partition
> + * in "buf" of size "size", assigns the uncompressed device tree
> + * binary to "fdt" and returns.
> + * Returns 0 on success and -1 on error.
> + */
> +static int decompress_subpartition(char *buf, size_t size, void **fdt)
I think this should be part of the generic subpartition load
infrastructure and not living in hw/imc.c, but rather over in the
core/platform.c code for loading partitions. In an ideal world, the XZ
decompression would be queued up as another CPU job.
> +{
> + struct xz_dec *s;
> + struct xz_buf b;
> + void *data;
> + int ret = 0;
> +
> + /* Initialize the xz library first */
> + xz_crc32_init();
> + s = xz_dec_init(XZ_SINGLE, 0);
> + if (s == NULL) {
> + prerror("IMC: initialization error for xz\n");
> + return -1;
> + }
> +
> + /* Allocate memory for the uncompressed data */
> + data = malloc(IMC_DTB_SIZE);
You're using IMC_DTB_SIZE interchangeably for the compressed and the
uncompressed size.
> + if (!data) {
> + prerror("IMC: memory allocation error\n");
> + ret = -1;
> + goto err;
> + }
> +
> + /*
> + * Source address : buf
> + * Source size : size
> + * Destination address : data
> + * Destination size : IMC_DTB_SIZE
> + */
> + b.in = buf;
> + b.in_pos = 0;
> + b.in_size = size;
> + b.out = data;
> + b.out_pos = 0;
> + b.out_size = IMC_DTB_SIZE;
> +
> + /* Start decompressing */
> + ret = xz_dec_run(s, &b);
> + if (ret != XZ_STREAM_END) {
> + prerror("IMC: failed to decompress subpartition\n");
> + free(data);
> + ret = -1;
> + }
> + *fdt = data;
> +
> +err:
> + /* Clean up memory */
> + xz_dec_end(s);
> + return ret;
> +}
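Two things stand out in the error handling as quoted: when xz_dec_run()
fails, data is freed and then still stored through *fdt, and on success
the function returns whatever xz_dec_run() returned rather than the 0 the
comment promises. A minimal sketch of the shape I'd expect, assuming a
hypothetical fake_decode() in place of the XZ call and a hypothetical
OUT_MAX for the uncompressed size (neither is skiboot or libxz API):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define OUT_MAX 64	/* hypothetical cap on the uncompressed size */

/*
 * Hypothetical stand-in for the XZ call: "decodes" by copying and
 * fails on empty or oversized input.
 */
static int fake_decode(const char *in, size_t in_size,
		       char *out, size_t out_size)
{
	if (in_size == 0 || in_size > out_size)
		return -1;
	memcpy(out, in, in_size);
	return 0;
}

/*
 * Returns 0 and sets *out on success.
 * Returns -1 and leaves *out untouched on error.
 */
static int decompress_sketch(const char *in, size_t in_size, void **out)
{
	char *data;

	assert(out != NULL);

	data = malloc(OUT_MAX);
	if (!data)
		return -1;

	if (fake_decode(in, in_size, data, OUT_MAX) != 0) {
		free(data);	/* the caller never sees the freed pointer */
		return -1;
	}

	*out = data;		/* ownership passes only on success */
	return 0;
}
```

The point is only the ordering: the output pointer is written after the
last thing that can fail, and the return value is normalised to 0 or -1.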
> +
> +/* Fixup function for the phandles of the new subtree */
> +void fixup_handler(hash_entry_p entry, const struct dt_property *prop)
> +{
> + phandle_fixup_n *val;
> + val = (phandle_fixup_n *)entry->value;
> + if (val->fixed == false) {
> + val->node->phandle = increment_return_last_phandle();
> + val->fixed = true;
> + }
> + memcpy((char *)&prop->prop, &val->node->phandle, prop->len);
> +}
> +
> +/*
> + * Remove the PMU device nodes from the incoming new subtree, if they are not
> + * available in the hardware. The availability is described by the
> + * control block's imc_chip_avl_vector.
> + * Each bit represents a device unit. If the device is available, then
> + * the bit is set, else it's unset.
> + */
> +static int disable_unavailable_units(struct dt_node *dev)
> +{
> + uint64_t avl_vec;
> + struct imc_chip_cb *cb;
> + struct dt_node *target;
> + int i;
> +
> + /* Fetch the IMC control block structure */
> + cb = get_imc_cb();
> +
> + avl_vec = be64_to_cpu(cb->imc_chip_avl_vector);
> + for (i = 0; i < MAX_AVL; i++) {
> + if (!(PPC_BITMASK(i, i) & avl_vec)) {
> + /* Check if the device node exists */
> + target = dt_find_by_name(dev, nest_pmus[i]);
> + if (!target)
> + continue;
> + /* Remove the device node */
> + dt_free(target);
> + }
> + }
> +
> + return 0;
> +}
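MAX_AVL comes from include/imc.h, which isn't in this hunk. If it is
larger than the 48 names in nest_pmus[], the loop above would index past
the end of the array for the reserved bits. A minimal sketch of the bit
test with a bounds check, assuming IBM bit numbering (bit 0 is the MSB, as
PPC_BITMASK uses) and a shortened, purely illustrative name table; AVL_BIT,
pmu_names and unavailable_pmu() are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of the word. */
#define AVL_BIT(n) (UINT64_C(0x8000000000000000) >> (n))

/* Shortened, illustrative table; the patch's table is nest_pmus[]. */
static const char *const pmu_names[] = {
	"powerbus0", "mcs0", "mcs1", "mcs2",
};
#define NUM_PMU_NAMES (sizeof(pmu_names) / sizeof(pmu_names[0]))

/*
 * Return the node name for vector bit i if that unit is marked
 * unavailable (bit clear). Return NULL if the unit is available or
 * if bit i has no name (reserved bit).
 */
static const char *unavailable_pmu(uint64_t avl_vec, unsigned int i)
{
	assert(i < 64);

	if (i >= NUM_PMU_NAMES)
		return NULL;	/* reserved bit: nothing to look up */
	if (avl_vec & AVL_BIT(i))
		return NULL;	/* unit present: keep its node */
	return pmu_names[i];
}
```

The caller would then only call dt_find_by_name() and dt_free() when the
helper returns a non-NULL name.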
> +
> +/*
> + * Fetch the IMA_CATALOG partition and find the appropriate sub-partition
> + * based on the platform's PVR.
> + * Decompress the sub-partition and link the imc device tree to the
> + * existing device tree.
> + */
> +void imc_init(void)
> +{
> + char *buf = NULL;
> + void *fdt = NULL;
> + size_t size = IMC_DTB_SIZE;
> + uint32_t pvr = mfspr(SPR_PVR);
> + struct dt_node *dev, *node;
> + struct dt_phandle_fixup fixup;
> + const struct dt_property *prop;
> + int ret;
> + fixup_prop_list *pp_list;
> + hash_table_p hash;
> +
> + /* Enable only for power 9 */
> + if (proc_gen != proc_gen_p9)
> + return;
> +
> + buf = malloc(IMC_DTB_SIZE);
> + if (!buf) {
> + prerror("IMC: Memory allocation Failed\n");
> + return;
> + }
> +
> + ret = start_preload_resource(RESOURCE_ID_CATALOG,
> + pvr, buf, &size);
> + if (ret != OPAL_SUCCESS)
> + goto err;
> +
> + ret = wait_for_resource_loaded(RESOURCE_ID_CATALOG,
> + pvr);
> + if (ret != OPAL_SUCCESS) {
> + prerror("IMC Catalog load failed\n");
> + return;
> + }
Please do not do this synchronously.

Start the preload and then later on, as late as possible, call
wait_for_resource_loaded() and init the rest.

Looking at core/init.c, it probably makes sense to preload it *after*
NVRAM but before start_preload_kernel(), and then run the rest of
imc_init() after the kernel has started preloading (after nx_init() maybe?)
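Roughly the split I have in mind, as a sketch only. The functions below
are hypothetical stand-ins that just record call order; they are not the
skiboot start_preload_resource()/wait_for_resource_loaded() calls, and the
names imc_preload_sketch() and imc_finish_sketch() are made up:

```c
#include <assert.h>

/* Hypothetical stand-ins that only record call order. */
static int step;
static int started_at, overlapped_at, waited_at;

static int start_load(void)
{
	started_at = ++step;	/* queue the flash read and return */
	return 0;
}

static void other_boot_work(void)
{
	overlapped_at = ++step;	/* e.g. kernel preload, other init */
}

static int wait_load(void)
{
	assert(started_at != 0);	/* waiting on an unqueued load is a bug */
	waited_at = ++step;
	return 0;
}

/* Phase 1: called early, only queues the load. */
static int imc_preload_sketch(void)
{
	return start_load();
}

/* Phase 2: called as late as possible, blocks only when data is needed. */
static int imc_finish_sketch(void)
{
	if (wait_load() != 0)
		return -1;
	/* decompression and device tree attach would follow here */
	return 0;
}

static int boot_sequence_sketch(void)
{
	if (imc_preload_sketch() != 0)
		return -1;
	other_boot_work();
	return imc_finish_sketch();
}
```

That way the flash read overlaps with the rest of boot instead of
stalling main_cpu_entry().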
> +
> + /* Decompress the subpartition now */
> + ret = decompress_subpartition(buf, size, &fdt);
> + if (ret < 0)
> + goto err;
> +
> + /* Create a device tree entry for imc counters */
> + dev = dt_new_root("imc-counters");
> + if (!dev)
> + goto err;
> +
> + /* Attach the new fdt to the imc-counters node */
> + ret = dt_expand_node(dev, fdt, 0);
> + if (ret < 0) {
> + dt_free(dev);
> + goto err;
> + }
> +
> + /* fixup the phandler for the device tree */
> + list_head_init(&fixup.list);
> + fixup.fixup = fixup_handler;
> + dt_for_each_node(dev, node) {
> + if (dt_find_property(node, "events")) {
> + pp_list = (fixup_prop_list *)malloc(sizeof(fixup_prop_list));
> + if (!pp_list) {
> + prerror("Failed to allocate memory");
> + goto err;
> + }
> + prop = dt_find_property(node, "events");
> + pp_list->prop = prop;
> + pp_list->node = node;
> + list_add(&fixup.list, &pp_list->list);
> + }
> + }
> + /* Check whether we have pmu units with "events" property for fixup */
> + if (!list_empty(&fixup.list)) {
> + hash = (hash_table_p)malloc(sizeof(hash_table_s));
> + if (!hash) {
> + prerror("Failed to allocate memory for hashtable");
> + goto err;
> + }
> + ret = dt_fixup_populate_hash(dev, hash, &fixup);
> + if (ret < 0) {
> + prerror("IMC fixup phandler failed");
> + hash_cleanup(hash);
> + goto err;
> + }
> + /* Free the hashtable */
> + hash_cleanup(hash);
> + }
> +
> + /* Check availability of the Nest PMU units from the availability vector */
> + disable_unavailable_units(dev);
> +
> + if (!dt_attach_root(dt_root, dev)) {
> + dt_free(dev);
> + goto err;
> + }
> +
> + return;
> +err:
> + prerror("IMC Devices not added\n");
> + free(buf);
> +}
Why can't you free buf in the non error case?
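Note that the early return after wait_for_resource_loaded() fails skips
free(buf) as well. Assuming buf only holds the compressed image and
nothing refers to it once decompression has produced its own copy, a
single release on every path would do. A minimal sketch with hypothetical
helpers (buf_alloc(), buf_free() and process() are stand-ins, and
live_bufs exists only so the sketch can be checked):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_bufs;	/* outstanding staging buffers, sketch only */

static void *buf_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_bufs++;
	return p;
}

static void buf_free(void *p)
{
	if (!p)
		return;
	assert(live_bufs > 0);
	live_bufs--;
	free(p);
}

/* Hypothetical stand-in for load + decompress + attach. */
static int process(void *buf, bool fail)
{
	(void)buf;
	return fail ? -1 : 0;
}

static int init_sketch(bool fail)
{
	int ret;
	void *buf = buf_alloc(128);

	if (!buf)
		return -1;

	ret = process(buf, fail);

	/* The staging buffer is released on success and on failure. */
	buf_free(buf);
	return ret;
}
```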
--
Stewart Smith
OPAL Architect, IBM.