[Skiboot] [PATCH] occ: Set up OCC messaging even if we fail to setup pstates
Stewart Smith
stewart at linux.vnet.ibm.com
Tue Mar 27 15:54:10 AEDT 2018
Shilpasri G Bhat <shilpa.bhat at linux.vnet.ibm.com> writes:
> On 03/23/2018 06:54 AM, Stewart Smith wrote:
>> This means that we no longer hit this bug if we fail to get valid pstates
>> from the OCC.
>>
>> [console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
>> echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
>> [ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
>> [ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
>> [ 10.318805] Disabling lock debugging due to kernel taint
>> [ 10.318808] Severe Machine check interrupt [Not recovered]
>> [ 10.318812] NIP [000000003003e434]: 0x3003e434
>> [ 10.318813] Initiator: CPU
>> [ 10.318815] Error type: Real address [Load/Store (foreign)]
>> [ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception
>> [ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3
>> [ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240
>> [ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1)
>> [ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000
>> [ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1
>>
>> Signed-off-by: Stewart Smith <stewart at linux.vnet.ibm.com>
>> ---
>
> Thanks for the patch. We will need the below diff along with this patch.
> diff --git a/hw/occ.c b/hw/occ.c
> index d2b4a84..53e10f2 100644
> --- a/hw/occ.c
> +++ b/hw/occ.c
> @@ -1242,6 +1242,13 @@ static void occ_cmd_interface_init(void)
> struct proc_chip *chip;
> int i = 0;
>
> + /* Check if the OCC data is valid */
> + for_each_chip(chip) {
> + pdata = get_occ_pstate_table(chip);
> + if (!pdata->valid)
> + return;
> + }
> +
> chip = next_chip(NULL);
> pdata = get_occ_pstate_table(chip);
> if (pdata->version != 0x90)
> @@ -1610,6 +1614,12 @@ void occ_add_sensor_groups(struct dt_node *sg, u32
> *phandles, u32 *ptype,
> };
> int i, j;
>
> + /*
> + * Dont add sensor groups if cmd-interface is not intialized
> + */
> + if (!chips)
> + return;
> +
> for (i = 0; i < nr_occs; i++)
> if (chips[i].chip_id == chipid)
> break;
>
>> hw/occ.c | 16 ++++++++--------
>> 1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/hw/occ.c b/hw/occ.c
>> index 37cf73c30bbd..90d3105eb1a8 100644
>> --- a/hw/occ.c
>> +++ b/hw/occ.c
>> @@ -1365,6 +1365,9 @@ int occ_set_powercap(u32 handle, int token, u32 pcap)
>> if (powercap_get_attr(handle) != POWERCAP_OCC_CUR)
>> return OPAL_PERMISSION;
>>
>> + if (!chips)
>> + return OPAL_HARDWARE;
>> +
>
> This check is not required as powercap is initialized after cmd-interface
> initialization. So occ_set_powercap() will not be invoked by host if
> cmd-interface is not initialized.
>
>> for (i = 0; i < nr_occs; i++)
>> if (chips[i].occ_role == OCC_ROLE_MASTER)
>> break;
>> @@ -1721,14 +1724,11 @@ void occ_pstates_init(void)
>> if (!add_cpu_pstate_properties(&pstate_nom)) {
>> log_simple_error(&e_info(OPAL_RC_OCC_PSTATE_INIT),
>> "Skiping core cpufreq init due to OCC error\n");
>> - return;
>> - }
>> -
>> - /*
>> - * Setup host based pstates and set nominal frequency only in
>> - * P8.
>> - */
>> - if (proc_gen == proc_gen_p8) {
>> + } else if (proc_gen == proc_gen_p8) {
>> + /*
>> + * Setup host based pstates and set nominal frequency only in
>> + * P8.
>> + */
>> for_each_chip(chip)
>> for_each_available_core_in_chip(c, chip->id)
>> cpu_pstates_prepare_core(chip, c, pstate_nom);
>>
>
> Reviewed-and-tested-by: Shilpasri G Bhat <shilpa.bhat at linux.vnet.ibm.com>
Thanks for taking a look into that, merged with your changes to master
as of 547377ddcd93257bc825cea4809c25c1e90ffcf3
and then to 5.10.x as of e37bb5a8afcdba38c294e096c3fd4b63f38de0da
because of that big customer we know of.
--
Stewart Smith
OPAL Architect, IBM.
More information about the Skiboot
mailing list