[Skiboot] [PATCH V8 2/8] powercap: occ: Add a generic powercap framework

Stewart Smith stewart at linux.vnet.ibm.com
Thu Jul 27 18:36:19 AEST 2017


Shilpasri G Bhat <shilpa.bhat at linux.vnet.ibm.com> writes:
> This patch adds a generic powercap framework and exports OCC powercap
> sensors using which system powercap can be set inband through OPAL-OCC
> command-response interface.
>
> Signed-off-by: Shilpasri G Bhat <shilpa.bhat at linux.vnet.ibm.com>

Nice!

Yeah, this is what I was hoping for. I think it should serve us
well. It'd be interesting to hook up the same interface for P8 too
(although that shouldn't hold up this patch), if only so we can see how
that'd work and then use it for developing test cases without needing
(more scarce) P9 hardware.

Couple of small comments:

> diff --git a/core/Makefile.inc b/core/Makefile.inc
> index b09c30c..b3c9e91 100644
> --- a/core/Makefile.inc
> +++ b/core/Makefile.inc
> @@ -8,7 +8,7 @@ CORE_OBJS += pci-opal.o fast-reboot.o device.o exceptions.o trace.o affinity.o
>  CORE_OBJS += vpd.o hostservices.o platform.o nvram.o nvram-format.o hmi.o
>  CORE_OBJS += console-log.o ipmi.o time-utils.o pel.o pool.o errorlog.o
>  CORE_OBJS += timer.o i2c.o rtc.o flash.o sensor.o ipmi-opal.o
> -CORE_OBJS += flash-subpartition.o bitmap.o buddy.o pci-quirk.o
> +CORE_OBJS += flash-subpartition.o bitmap.o buddy.o pci-quirk.o powercap.o
>
>  ifeq ($(SKIBOOT_GCOV),1)
>  CORE_OBJS += gcov-profiling.o
> diff --git a/core/powercap.c b/core/powercap.c
> new file mode 100644
> index 0000000..50b5996
> --- /dev/null
> +++ b/core/powercap.c
> @@ -0,0 +1,40 @@
> +/* Copyright 2017 IBM Corp.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at
> + *
> + *	http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> + * implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <powercap.h>
> +
> +static int opal_get_powercap(u32 handle, int token __unused, u32 *pcap)
> +{
> +	if (!pcap || !opal_addr_valid(pcap))
> +		return OPAL_PARAMETER;
> +
> +	if (powercap_get_class(handle) == POWERCAP_CLASS_OCC)
> +		return occ_get_powercap(handle, pcap);
> +
> +	return OPAL_UNSUPPORTED;
> +};
> +
> +opal_call(OPAL_GET_POWERCAP, opal_get_powercap, 3);
> +
> +static int opal_set_powercap(u32 handle, int token, u32 pcap)
> +{
> +	if (powercap_get_class(handle) == POWERCAP_CLASS_OCC)
> +		return occ_set_powercap(handle, token, pcap);
> +
> +	return OPAL_UNSUPPORTED;
> +};
> +
> +opal_call(OPAL_SET_POWERCAP, opal_set_powercap, 3);
> diff --git a/doc/device-tree/ibm,opal/power-mgt/powercap.rst b/doc/device-tree/ibm,opal/power-mgt/powercap.rst
> new file mode 100644
> index 0000000..ae89005
> --- /dev/null
> +++ b/doc/device-tree/ibm,opal/power-mgt/powercap.rst
> @@ -0,0 +1,38 @@
> +power-mgt/powercap
> +------------------
> +
> +The powercap sensors are populated in this node. Each child node in
> +the "powercap" node represents a power-cappable component.
> +
> +For example : ::
> +        system-powercap/

add "The OPAL_GET_POWERCAP and OPAL_SET_POWERCAP calls take a handle for
what powercap property to get/set".

> +
> +Each child node has below properties:
> +
> +`powercap-cur`
> +  Handle to indicate the current powercap
> +
> +`powercap-min`
> +  Handle to indicate minimum powercap

"minimum possible powercap"

> +
> +`powercap-max`
> +  Handle to indicate maximum powercap

"maximum possible powercap"

> +Powercap handle uses the following encoding: ::

Add "The format of the powercap handle is *NOT* ABI and may change in
the future."

> +
> +        | Class |    Reserved   | Attribute |
> +        |-------|---------------|-----------|
> +
> +.. code-block:: dts
> +
> +   power-mgt {
> +     powercap {
> +
> +        system-powercap {
> +                name = "system-powercap"
> +                powercap-cur = <0x00000002>
> +                powercap-min = <0x00000000>
> +                powercap-max = <0x00000001>
> +        };
> +     };
> +    }
> diff --git a/doc/opal-api/opal-powercap.rst b/doc/opal-api/opal-powercap.rst
> new file mode 100644
> index 0000000..d078003
> --- /dev/null
> +++ b/doc/opal-api/opal-powercap.rst
> @@ -0,0 +1,77 @@
> +.. _opal-powercap:
> +
> +OPAL_GET_POWERCAP
> +==============================
> +OPAL call to read the powercap using a handle to identify the
> +targeted powercap sensor (OCC, IPMI). The powercap handle is encoded
> +to include the type of powercap sensor (min,max, current) and can be
> +extended to add the component like system or chip. The powercap
> +handle is exported via device-tree.

What about this instead:

"The OPAL_GET_POWERCAP call retreives current information on the power
cap.

For each entity that can be power capped, the device tree
binding indicates what handle should be passed for each of the power cap
properties (minimum possible, maximum possible, current powercap).

The current power cap must be between the minimium possible and maximum
possible power cap. The minimum and maximum values are dynamic to allow
for them possibly being changed by other factors or entities
(e.g. service processor).
"

> +The call can be asynchronus, where the token parameter is used to wait
> +for the completion.
> +
> +Parameters
> +----------
> +::
> +        u32 handle
> +        int token
> +        u32 *pcap
> +
> +Returns
> +-------
> +OPAL_SUCCESS
> +  Success
> +
> +OPAL_PARAMETER
> +  Invalid pcap pointer
> +
> +OPAL_UNSUPPORTED
> +  No support for reading powercap sensor
> +
> +OPAL_HARDWARE
> +  Unable to procced due to the current hardware state

It would also return OPAL_ASYNC_COMPLETION if it's doing teh operation async.

> +
> +OPAL_SET_POWERCAP
> +============================
> +OPAL call to set powercap. It uses a handle identify the target
> +powercap sensor which is exported in device-tree. This call can be
> +asynchronus where the token parameter is used to wait for the
> +completion.

What about this instead:

"The OPAL_SET_POWERCAP call sets a power cap.

For each entity that can be power capped, the device tree
binding indicates what handle should be passed for each of the power cap
properties (minimum possible, maximum possible, current powercap).

The current power cap must be between the minimium possible and maximum
possible power cap.

You cannot currently set the minimum or maximum power cap, and thus
OPAL_PERMISSION will be returned if it is attempted to set. In the
future, this may change - but for now, the correct behaviour for an
Operating System is to not attempt to set them.
"

(Additionally, I think the set powercap OPAL call should do that check
for a valid handle and return OPAL_PERMISSION if the OS asks to set
min/max value).

The only reason I can think of ever allowing that is a hypothetical
future feature where we could in-band limit the range and then lock it
out (e.g. a petitboot config option for setting limits for a running OS).

> +Parameters
> +----------
> +::
> +        u32 handle
> +        int token
> +        u32 pcap
> +
> +Returns
> +-------
> +OPAL_SUCCESS
> +  Success
> +
> +OPAL_PARAMETER
> +  Invalid powercap requested beyond powercap limits
> +
> +OPAL_UNSUPPORTED
> +  No support for changing the powercap
> +
> +OPAL_PERMISSION
> +  Hardware cannot take the request
> +
> +OPAL_ASYNC_COMPLETION
> +  Request was sent and an async completion message will be sent with
> +  token and status of the request.
> +
> +OPAL_HARDWARE
> +  Unable to procced due to the current hardware state
> +
> +OPAL_BUSY
> +  Previous request in progress
> +
> +OPAL_INTERNAL_ERROR
> +  Error in request response
> +
> +OPAL_TIMEOUT
> +  Timeout in request completion
> diff --git a/hw/occ.c b/hw/occ.c
> index 58d9778..c0a1158 100644
> --- a/hw/occ.c
> +++ b/hw/occ.c
> @@ -28,6 +28,7 @@
>  #include <opal-msg.h>
>  #include <timer.h>
>  #include <i2c.h>
> +#include <powercap.h>
>
>  /* OCC Communication Area for PStates */
>
> @@ -1060,8 +1061,8 @@ static int write_occ_cmd(struct cmd_interface *chip)
>  	return OPAL_ASYNC_COMPLETION;
>  }
>
> -static int64_t __unused opal_occ_command(struct cmd_interface *chip, int token,
> -					 struct opal_occ_cmd_data *cdata)
> +static int64_t opal_occ_command(struct cmd_interface *chip, int token,
> +				struct opal_occ_cmd_data *cdata)
>  {
>  	int rc;
>
> @@ -1213,10 +1214,13 @@ exit:
>  	unlock(&chip->queue_lock);
>  }
>
> +static void occ_add_powercap_sensors(struct dt_node *power_mgt);
> +
>  static void occ_cmd_interface_init(void)
>  {
>  	struct occ_dynamic_data *data;
>  	struct occ_pstate_table *pdata;
> +	struct dt_node *power_mgt;
>  	struct proc_chip *chip;
>  	int i = 0;
>
> @@ -1247,8 +1251,118 @@ static void occ_cmd_interface_init(void)
>  			   &chips[i]);
>  		i++;
>  	}
> +
> +	power_mgt = dt_find_by_path(dt_root, "/ibm,opal/power-mgt");
> +	if (!power_mgt) {
> +		prerror("OCC: dt node /ibm,opal/power-mgt not found\n");
> +		return;
> +	}
> +
> +	/* Add powercap sensors to DT */
> +	occ_add_powercap_sensors(power_mgt);
>  }
>
> +/* Powercap interface */
> +enum sensor_powercap_occ_attr {
> +	POWERCAP_OCC_MIN,
> +	POWERCAP_OCC_MAX,
> +	POWERCAP_OCC_CUR,
> +};
> +
> +static void occ_add_powercap_sensors(struct dt_node *power_mgt)
> +{
> +	struct dt_node *pcap, *node;
> +	u32 handle;
> +
> +	pcap = dt_new(power_mgt, "powercap");
> +	if (!pcap) {
> +		prerror("OCC: Failed to create powercap node\n");
> +		return;
> +	}
> +
> +	node = dt_new(pcap, "system-powercap");
> +	if (!node) {
> +		prerror("OCC: Failed to create system powercap node\n");
> +		return;
> +	}

we need a 'compatible' node too, and this should be what the linux code
binds to rather than the path in the tree.

> +
> +	handle = powercap_make_handle(POWERCAP_CLASS_OCC, POWERCAP_OCC_CUR);
> +	dt_add_property_cells(node, "powercap-cur", handle);

(i realise this is *incredibly* minor) but I'd prefer "powercap-current"
rather than the appreviation.

> +int occ_get_powercap(u32 handle, u32 *pcap)
> +{
> +	struct occ_pstate_table *pdata;
> +	struct occ_dynamic_data *ddata;
> +	struct proc_chip *chip;
> +
> +	chip = next_chip(NULL);
> +	pdata = get_occ_pstate_table(chip);
> +	ddata = get_occ_dynamic_data(chip);
> +
> +	if (!pdata->valid)
> +		return OPAL_HARDWARE;
> +
> +	switch (powercap_get_attr(handle)) {
> +	case POWERCAP_OCC_MIN:
> +		*pcap = ddata->min_pwr_cap;
> +		break;
> +	case POWERCAP_OCC_MAX:
> +		*pcap = ddata->max_pwr_cap;
> +		break;
> +	case POWERCAP_OCC_CUR:
> +		*pcap = ddata->cur_pwr_cap;
> +		break;
> +	default:
> +		*pcap = 0;
> +		return OPAL_UNSUPPORTED;
> +	}
> +
> +	return OPAL_SUCCESS;
> +}
> +
> +static u16 pcap_cdata;
> +static struct opal_occ_cmd_data pcap_data = {
> +	.data		= (u8 *)&pcap_cdata,
> +	.cmd		= OCC_CMD_SET_POWER_CAP,
> +};
> +
> +int occ_set_powercap(u32 handle, int token, u32 pcap)
> +{
> +	struct occ_dynamic_data *ddata;
> +	struct proc_chip *chip;
> +	int i;
> +
> +	if (powercap_get_attr(handle) != POWERCAP_OCC_CUR)
> +		return OPAL_UNSUPPORTED;

I'm thinking this should be OPAL_PERMISSION ?

My reasoning is to future proof us for a hypothetical future feature
where we could allow one time / one way / only at boot changing of these
limits.

-- 
Stewart Smith
OPAL Architect, IBM.



More information about the Skiboot mailing list