[Skiboot] [PATCH v3] occ: Poll OCC throttle status and queue OCC events to host

Shilpasri G Bhat shilpa.bhat at linux.vnet.ibm.com
Wed Jun 3 21:09:01 AEST 2015


Hi Neelesh,

On 05/27/2015 01:16 PM, Neelesh Gupta wrote:
> 
> 
> On 05/21/2015 03:50 PM, Shilpasri G Bhat wrote:
>> Add a new class of message definition OPAL_MSG_OCC to
>> opal_message_type to notify the following OCC events to host:
>> 1) OCC Reset
>> 2) OCC Load
>> 3) OCC Throttle Status Change
>>
>> Add an opal poller to periodically read throttle status updated by OCC
>> for each chip and notify any change in throttle status to host. The
>> throttle status indicates the reason why OCC may have limited the max
>> Pstate of the chip.
>>
>> Signed-off-by: Shilpasri G Bhat <shilpa.bhat at linux.vnet.ibm.com>
>> ---
>> Changes from V2:
>> - Added documentation of OPAL_MSG_OCC to doc/opal-api/opal-messages.txt
>> - Moved macros OCC_RESET, OCC_LOAD, OCC_THROTTLE to opal-api.h
>> - Added a structure opal_occ_msg.
>>
>> Changes from V1:
>> - Initialize prev_throttle to 0 instead of 0xF
>> - Add pr_log when opal_queue_msg() fails on OCC reset and load.
>> - Do not update occ_reset and prev_throttle if opal_queue_msg fails in
>>    occ_throttle_poll()
>> - Do not queue if throttle reason is greater than 5.
>>
>>   doc/opal-api/opal-messages.txt | 37 ++++++++++++++++++
>>   hw/occ.c                       | 89 ++++++++++++++++++++++++++++++++++++++++++
>>   include/chip.h                 |  1 +
>>   include/opal-api.h             | 24 ++++++++++++
>>   4 files changed, 151 insertions(+)
>>
>> diff --git a/doc/opal-api/opal-messages.txt b/doc/opal-api/opal-messages.txt
>> index fdde247..451b217 100644
>> --- a/doc/opal-api/opal-messages.txt
>> +++ b/doc/opal-api/opal-messages.txt
>> @@ -158,3 +158,40 @@ struct opal_prd_msg:
>>     Responses from the kernel use the same message format, but are passed
>>   through the opal_prd_msg call.
>> +
>> +OPAL_MSG_OCC
>> +------------
>> +
>> +This is used by OPAL to inform host about OCC events like OCC reset,
>> +OCC load and throttle status change by OCC which can indicate the
>> +host the reason for frequency throttling/unthrottling.
>> +
>> +#define OCC_RESET    0
>> +#define OCC_LOAD     1
>> +#define OCC_THROTTLE     2
>> +
>> +/*
>> + * struct opal_occ_msg:
>> + * type: OCC_RESET, OCC_LOAD, OCC_THROTTLE
>> + * chip: chip id
>> + * throttle status: Indicates the reason why OCC may have limited
>> + * the max Pstate of the chip.
>> + * 0x00 = No throttle
>> + * 0x01 = Power Cap
>> + * 0x02 = Processor Over Temperature
>> + * 0x03 = Power Supply Failure (currently not used)
>> + * 0x04 = Over current (currently not used)
>> + * 0x05 = OCC Reset (not reliable as some failures will not allow for
>> + * OCC to update throttle status)
>> + */
>> +struct opal_occ_msg {
>> +    __be64 type;
>> +    __be64 chip;
>> +    __be64 throttle_status;
>> +};
>> +
>> +Host should read opal_occ_msg.chip and opal_occ_msg.throttle_status
>> +only when opal_occ_msg.type = OCC_THROTTLE.
>> +If opal_occ_msg.type > 2 then host should ignore the message for now,
>> +new events can be defined for opal_occ_msg.type in the future
>> +versions of OPAL.
>> diff --git a/hw/occ.c b/hw/occ.c
>> index fe513cb..7d11c4f 100644
>> --- a/hw/occ.c
>> +++ b/hw/occ.c
>> @@ -24,6 +24,8 @@
>>   #include <timebase.h>
>>   #include <hostservices.h>
>>   #include <errorlog.h>
>> +#include <opal-api.h>
>> +#include <opal-msg.h>
>>     /* OCC Communication Area for PStates */
>>   @@ -31,6 +33,13 @@
>>     #define MAX_PSTATES 256
>>   +struct opal_occ_msg occ_msg;
> 
> Why does it need to be global ?

There is no necessity to use a global variable. I will make it local to avoid
any contention.

> 
>> +#define chip_occ_data(chip) \
>> +        ((struct occ_pstate_table *)(chip->homer_base + \
>> +                P8_HOMER_SAPPHIRE_DATA_OFFSET))
>> +
>> +static bool occ_reset;
>> +
>>   struct occ_pstate_entry {
>>       s8 id;
>>       u8 flags;
>> @@ -302,6 +311,54 @@ static bool cpu_pstates_prepare_core(struct proc_chip
>> *chip, struct cpu_thread *
>>       return true;
>>   }
>>   +static void occ_throttle_poll(void *data __unused)
>> +{
>> +    struct proc_chip *chip;
>> +    struct occ_pstate_table *occ_data;
>> +    int rc;
>> +
>> +    if (occ_reset) {
> 
> Isn't there race using 'occ_reset' without lock ?

'occ_reset' will be referenced in FSP_OCC_RESET and opal_poller path. There is
no effect of this race in opal_poll section as we read throttle_status only when
occ_data->valid is set to 1. The only problem is host will miss out on the
opal-msg(OCC_RESET) if we received multiple OCC resets like below:

FSP->OCC_RESET		Opal_poll()
A)occ_reset=true	

Another reset		OCCs are active after first reset
B)occ_reset=true	C)occ_reset=false

If the host receives the message in the order A-B-C, then host will think we
have recovered from OCC reset while the OCCs are reset again.

So let us pass an OCC_RESET message from throttle_poll when we see that the
occ_data->valid is set to 0 while occ_reset is false.

> 
>> +        int inactive = 0;
>> +
>> +        for_each_chip(chip) {
>> +            occ_data = chip_occ_data(chip);
>> +            if (occ_data->valid != 1) {
>> +                inactive = 1;
>> +                break;
>> +            }
>> +        }
>> +        if (!inactive) {
>> +            /*
>> +             * Queue OCC_THROTTLE with throttle status as 0 to
>> +             * indicate all OCCs are active after a reset.
>> +             */
>> +            occ_msg.type = OCC_THROTTLE;
>> +            occ_msg.chip = 0;
> 
> Do you intend to assign chip id '0' ?

Yes. From the host I will not parse chip_id on receiving a throttle message from
opal after a reset.

> 
>> +            occ_msg.throttle_status = 0;
>> +            rc = _opal_queue_msg(OPAL_MSG_OCC, NULL, NULL, 3,
>> +                         (uint64_t *)&occ_msg);
>> +            if (!rc)
>> +                occ_reset = false;
>> +        }
>> +    } else {
>> +        for_each_chip(chip) {
>> +            occ_data = chip_occ_data(chip);
>> +            if ((occ_data->valid == 1) &&
>> +                (chip->prev_throttle != occ_data->throttle) &&
>> +                (occ_data->throttle <= 5)) {
> 
> Better to avoid magic number and have a macro.. say OCC_MAX_PSTATES

Agree will do.

> 
>> +                occ_msg.type = OCC_THROTTLE;
>> +                occ_msg.chip = chip->id;
>> +                occ_msg.throttle_status = occ_data->throttle;
>> +                rc = _opal_queue_msg(OPAL_MSG_OCC, NULL, NULL,
>> +                             3, (uint64_t *)&occ_msg);
>> +                if (!rc)
>> +                    chip->prev_throttle =
>> +                        occ_data->throttle;
>> +            }
>> +        }
>> +    }
>> +}
>> +
>>   
> 
> snip [...]
> 
>>     struct occ_load_req {
>> @@ -386,6 +448,13 @@ static void __occ_do_load(u8 scope, u32 dbob_id __unused,
>> u32 seq_id)
>>           prlog(PR_INFO, "OCC: Load: Fallback to preloaded image\n");
>>           rc = 0;
>>       } else if (!rc) {
>> +        occ_msg.type = OCC_LOAD;
>> +        rc = _opal_queue_msg(OPAL_MSG_OCC, NULL, NULL, 1,
>> +                     (uint64_t *)&occ_msg);
>> +        if (rc)
>> +            prlog(PR_INFO, "OCC: Failed to queue message %d\n",
>> +                  OCC_LOAD);
> 
> Do you want to start OCC if there was an error from _opal_queue_msg() ?

Here we will be just reporting the OCC_LOAD event in host. It is okay to miss
this reporting.

In any case we should not stop from starting OCC as this will further delay the
performance.

> 
>> +
>>           /* Success, start OCC */
>>           rc = host_services_occ_start();
>>       }
>> @@ -509,6 +578,26 @@ static void occ_do_reset(u8 scope, u32 dbob_id, u32 seq_id)
>>           rc = 0;
>>       }
>>       if (!rc) {
>> +        occ_msg.type = OCC_RESET;
>> +        rc = _opal_queue_msg(OPAL_MSG_OCC, NULL, NULL, 1,
>> +                     (uint64_t *)&occ_msg);
> 
> This is probably not right. How can you assure 'type' is the first element
> of the structure... someone can add one before 'type' and it will fail..
> Moreover, on the host side .. there is not the way to know how many
> 'params' passed via 'opal_msg' structure so should be consistent
> for a given module.
> 
> I think you should always pass the complete structure (that is, 3
> params in this case).. and at the host side, discard those which
> are not required (depending upon 'type') ..
> 

Yes agree.
Will pass the complete structure with default values for chip id and throttle
status (chip=0, throttle_status=0)
>From the host we ensure not to read the chip_id and throttle_status when message
type is OCC_RESET and OCC_LOAD. We also don't parse them when message type is
OCC_THROTTLE and OCC_RESET is seen before, this throttle message will have a
special meaning to host which says that all OCCs have recovered from reset.

Thanks and regards,
Shilpa



More information about the Skiboot mailing list