[Skiboot] [PATCH V4 2/3] occ: Add support for OPAL-OCC command/response interface

Stewart Smith stewart at linux.vnet.ibm.com
Mon Jul 17 09:26:00 AEST 2017


Shilpasri G Bhat <shilpa.bhat at linux.vnet.ibm.com> writes:
> On 07/07/2017 02:23 PM, Stewart Smith wrote:
>> Shilpasri G Bhat <shilpa.bhat at linux.vnet.ibm.com> writes:
>
>
>>> +static inline bool occ_in_progress(struct cmd_interface *chip)
>>> +{
>>> +	return (chip->rsp->flag == OCC_IN_PROGRESS);
>>> +}
>>> +
>>> +static int write_occ_cmd(struct cmd_interface *chip, bool retry)
>>> +{
>>> +	struct opal_command_buffer *cmd = chip->cmd;
>>> +	struct opal_occ_cmd_rsp_msg *msg = chip->msg;
>>> +
>>> +	if (!retry && occ_in_progress(chip)) {
>>> +		chip->cmd_in_progress = false;
>>> +		return OPAL_OCC_BUSY;
>>> +	}
>>> +
>>> +	cmd->flag = chip->rsp->flag = 0;
>>> +	cmd->cmd = occ_cmds[msg->cmd].value;
>>> +	cmd->request_id = msg->request_id;
>>> +	cmd->data_size = msg->cdata_size;
>>> +	memcpy(&cmd->data, (u8 *)msg->cdata, msg->cdata_size);
>>> +	cmd->flag = OPAL_CMD_READY;
>>> +
>>> +	schedule_timer(&chip->timeout,
>>> +		       msecs_to_tb(occ_cmds[msg->cmd].timeout_ms));
>>> +	chip->prev_cmd = msg->cmd;
>>> +
>>> +	return OPAL_ASYNC_COMPLETION;
>>> +}
>>> +
>>> +static int64_t opal_occ_command(int chip_id, struct opal_occ_cmd_rsp_msg *msg,
>>> +				int token, bool retry)
>> 
>> It looks like retry is not actually retry, but rather 'return if busy,
>> don't queue' as in write_occ_cmd that's the logic.
>
> If kernel is retrying the same command due to a command-timeout in the previous
> try, then don't check the OCC status flag as OCC might not have been able to
> update the status flag. 'retry' flag is used to indicate that kernel is retrying
> the command.
>
> Do you want 'retry' state to be stored in OPAL ?

I think that would be better. If part of the protocol in talking to the
OCC is that if at first you don't succeed, retry without looking at
status, then we should do the retry within OPAL transparently to Linux,
so that the kernel just sees success/fail.

>>> @@ -910,6 +1314,9 @@ void occ_pstates_init(void)
>>>  		chip->throttle = 0;
>>>  	opal_add_poller(occ_throttle_poll, NULL);
>>>  	occ_pstates_initialized = true;
>>> +
>>> +	/* Init OPAL-OCC command-response interface */
>>> +	occ_cmd_interface_init();
>>>  }
>>>  
>>>  struct occ_load_req {
>>> @@ -1409,8 +1816,10 @@ void occ_p9_interrupt(uint32_t chip_id)
>>>  	if (ireg & OCB_OCI_OCIMISC_IRQ_TMGT)
>>>  		prd_tmgt_interrupt(chip_id);
>>>  
>>> -	if (ireg & OCB_OCI_OCIMISC_IRQ_SHMEM)
>>> +	if (ireg & OCB_OCI_OCIMISC_IRQ_SHMEM) {
>>>  		occ_throttle_poll(NULL);
>>> +		handle_occ_rsp(chip_id);
>>> +	}
>>>  
>>>  	if (ireg & OCB_OCI_OCIMISC_IRQ_I2C)
>>>  		p9_i2c_bus_owner_change(chip_id);
>>> @@ -1435,5 +1844,3 @@ void occ_fsp_init(void)
>>>  	if (fsp_present())
>>>  		fsp_register_client(&fsp_occ_client, FSP_MCLASS_OCC);
>>>  }
>>> -
>>> -
>>> diff --git a/include/opal-api.h b/include/opal-api.h
>>> index edae069..dd92cf9 100644
>>> --- a/include/opal-api.h
>>> +++ b/include/opal-api.h
>>> @@ -55,6 +55,10 @@
>>>  #define OPAL_XSCOM_CTR_OFFLINED	-30
>>>  #define OPAL_XIVE_PROVISIONING	-31
>>>  #define OPAL_XIVE_FREE_ACTIVE	-32
>>> +#define OPAL_OCC_INVALID_STATE	-33
>> 
>> Instead of adding new return codes, I think existing ones suffice, or is
>> there a really big need to have a way to distinguish these errors from
>> other ones?  e.g. insetad of OPAL_OCC_INVALID_STATE, it could be
>> OPAL_WRONG_STATE - and this is consistent with other OPAL calls.
>> 
>>> +#define OPAL_OCC_BUSY		-34
>> 
>> OPAL_BUSY
>
> Yeah OPAL_BUSY works.
>
>> 
>>> +#define OPAL_OCC_CMD_TIMEOUT	-35
>> 
>> We probably want an OPAL_TIMEOUT actually... we have OPAL_I2C_TIMEOUT
>> and OPAL_XSCOM_TIMEOUT
>> 
>>> +#define OPAL_OCC_CMD_MISMATCH	-36
>> 
>> OPAL_PARAMETER
>> 
>> Does that make sense? Any reason not to do that?
>> 
>
> OPAL_OCC_CMD_TIMEOUT and OPAL_OCC_CMD_MISMATCH is used by kerenl to retry the
> OCC command. As OPAL_PARAMETER is also used for the sanity checks of the
> parameters passed to opal_call, kernel wont be able to distinguish
> when to retry.

If the retry logic is in OPAL though, that problem should go away.

>>> @@ -1033,6 +1038,81 @@ typedef struct oppanel_line {
>>>  	__be64 line_len;
>>>  } oppanel_line_t;
>>>  
>>> +enum occ_cmd {
>>> +	OCC_CMD_AMESTER_PASS_THRU = 0,
>> 
>> What is amester passthru for? Do we need this as part of our ABI?
>
> This command will be used by Amester for tracing the sensor values at 250us/2ms
> sampling rate. In P8, Amester used IPMI for tracing. In P9, Amester will be able
> to communicate to OCC inband using this cmd/rsp interface.
>
> Yeah we need this command to support inband Amester tracing.

So, the P8 design is pretty awful, random undocumented out of band IPMI
commands aren't great.

So, what exactly is this call accomplishing though? Is it setting a
sampling frequency? I really dislike these kind of passthrough APIs, as
the care to maintain backwards compatibility expands to many places when
we could have just kept it between OPAL and Linux.

>>> +	OCC_CMD_CLEAR_SENSOR_DATA,
>> 
>> Clear what sensors?
>
> Min-max values of sensors stored for different users like CSM, Profiler and
> Job-scheduler.
>
> How about OCC_CMD_CLEAR_SENSOR_MIN_MAX instead?

I don't think it needs to be OCC specific, why not add an API to clear
any OPAL_SENSOR? For any sensor that isn't min/max and clearable (could
be property in device tree, it'd return OPAL_UNSUPPORTED. That way,
these sensors could be read by existing kernels, although a small patch
would be required to clear them.

>>> +	OCC_CMD_SET_POWER_CAP,
>> 
>> Power cap for what? System? Socket?
>
> System.
>
>>> +	OCC_CMD_SET_POWER_SHIFTING_RATIO,
>>> +	OCC_CMD_SELECT_SENSOR_GROUPS,
>> 
>> Basically, this all needs to be documented somewhere, notably in
>> doc/opal-api/
>
> Okay.
>
>> 
>> I also wonder if this approach is the best for the OPAL API. While we do
>> *currently* have a thing called an OCC that does these functions, is
>> that *always* going to be the case? Elsewhere in firmware we've been
>> going down the route of calling things "pm_complex" and sending commands
>> to/from the pm_complex to do tasks for us.
>> 
>> would it be better if our OPAL API instead described the intended effect
>> rather than how to achieve said effect?
>> 
>> I wouldn't be opposed to an OPAL_SET_POWER_CAP call, that's an async
>> call, and accepts parameters in a range that's described in the device
>> tree. That call could be implemented on P8 too (where you currently set
>> power caps by doing crazy IPMI incantations to the BMC), and that would
>> prove that the OPAL API is implementation independent, and we should be
>> future proof with however we do the intent of capping
>> system/socket/other-thing power in the future.
>
> I agree with defining an abstract API for setting powercap as a better
> approach.

Great! I think it'd end up being a much better approach.

>> How do the sensors work? Do they fit into the existing OPAL Sensors
>> infrastructure? are they more in-memory sensors? I ask as perhaps the
>> select/clear calls should be part of either the regular opal sensors API
>> or the IMC api rather than their own API.
>
> Currently, only in-memory environment sensors are exported in the OPAL sensor
> infrastructure. Yes there are more perf/util/freq in-memory sensors that are not
> exported.
>
> The current sensor framework provides API to only read the sensor values. We
> will need new API additions to the sensor framework then to clear min/max of
> in-memory sensors and select the sensor groups to be copied to main memory.
> So should we have new opal-calls defined for these sensor operations instead of
> the OCC cmd/rsp interface?

Okay, so there's groups of sensors that need enabling? What is the
impact of always enabling them? What specific sensors are they?

-- 
Stewart Smith
OPAL Architect, IBM.



More information about the Skiboot mailing list