[Skiboot] [PATCH v7 17/22] fadump: Add support to trigger memory preserving IPL on BMC system

Tue May 14 20:49:01 AEST 2019

On 05/10/2019 07:20 AM, Oliver wrote:
> On Sat, Apr 13, 2019 at 7:20 PM Vasant Hegde
> <hegdevasant at linux.vnet.ibm.com> wrote:
>>
>> On FSP based system we call 'attn' instruction. FSP detects attention and
>> initiates memory preserving IPL. On BMC system we have to call SBE S0
>> interrupt to initiate memory preserving IPL.
>>
>> This patch adds support to call SBE S0 interrupt in assert path.
>> Sequence :
>>    - S0 interrupt on slave chip SBE
>>    - S0 interrupt on master chip SBE
>>
>> Note that this is hooked to ipmi_terminate path. We have HDAT flag for MPIPL
>> support. If MPIPL is not supported then we don't create 'ibm,opal/dump' node
>> and we will fall back to existing termination flow.
>>
>> Note:
>>    - It uses xscom node to detect master and slave chip. So that we can
>>      trigger early MPIPL (even before initializing SBE driver).
>>    - At present we do not have a proper way to detect SBE is alive or not.
>>      So we wait for predefined time and then call normal reboot.
>>
>> Signed-off-by: Vasant Hegde <hegdevasant at linux.vnet.ibm.com>
>> ---
>>   hw/ipmi/ipmi-attn.c |  9 +++++++-
>>   hw/sbe-p9.c         | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   include/sbe-p9.h    | 10 +++++++++
>>   3 files changed, 81 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ipmi/ipmi-attn.c b/hw/ipmi/ipmi-attn.c
>> index c6c1c59b5..ae97e92ab 100644
>> --- a/hw/ipmi/ipmi-attn.c
>> +++ b/hw/ipmi/ipmi-attn.c
>> @@ -1,4 +1,4 @@
>> -/* Copyright 2015 IBM Corp.
>> +/* Copyright 2015-2019 IBM Corp.
>>    *
>>    * Licensed under the Apache License, Version 2.0 (the "License");
>>    * you may not use this file except in compliance with the License.
>> @@ -19,6 +19,7 @@
>>   #include <pel.h>
>>   #include <platform.h>
>>   #include <processor.h>
>> +#include <sbe-p9.h>
>>   #include <skiboot.h>
>>   #include <stack.h>
>>   #include <timebase.h>
>> @@ -65,6 +66,12 @@ static void ipmi_log_terminate_event(const char *msg)
>>
> 
>>   void __attribute__((noreturn)) ipmi_terminate(const char *msg)
>>   {
>> +       /*
>> +        * If fadump is supported then trigger SBE interrupt
>> +        * to initiate fadump
>> +        */
>> +       p9_sbe_terminate();
> 
> Is this safe to do if the rest of the system doesn't support it?

Yes. I have check inside p9_sbe_terminate().

> 
>>          /* Terminate called before initializing IPMI (early abort) */
>>          if (!ipmi_present()) {
>>                  if (platform.cec_reboot)
> 
> The context is omitted in the patch but the ipmi_present() check here
> is used to determine if we should send an eSEL to report the error
> before rebooting. You probably want to do the same prior to an MPIPL.

In MPIPL path we will have opal/kernel core file. Hence I skipped logging event 
to BMC.
May be we should add log to BMC. Will look into this.

> 
>> diff --git a/hw/sbe-p9.c b/hw/sbe-p9.c
>> index ba7691f50..694375a5a 100644
>> --- a/hw/sbe-p9.c
>> +++ b/hw/sbe-p9.c
>> @@ -948,3 +948,66 @@ void p9_sbe_init(void)
>>          /* Initiate SBE timeout poller */
>>          opal_add_poller(p9_sbe_timeout_poll, NULL);
>>   }
>> +
>> +/* Terminate and initiate MPIPL */
>> +void p9_sbe_terminate(void)
>> +{
>> +       uint32_t chip_id;
>> +       uint32_t primary_chip = -1;
>> +       int rc;
>> +       u64 wait_tb;
>> +       struct dt_node *xn;
>> +
> 
>> +       if (proc_gen < proc_gen_p9)
>> +               return;
> 
> Redundant. Either MPIPL is supported and the dump node exists, or it's
> not supported and the subsequent check is going to bail.

Agree. Will remove this check.

> 
>> +
>> +       /* Return if MPIPL is not supported */
>> +       if (!dt_find_by_path(opal_node, "dump"))
>> +               return;
> 
>> +       dt_for_each_compatible(dt_root, xn, "ibm,xscom") {
>> +               chip_id = dt_get_chip_id(xn);
> Why are we iterating the device-tree directly here instead of the
> proc_chip list?

Correct. iterating chip is better option. Will fix.

> 
>> +
>> +               if (dt_has_node_property(xn, "primary", NULL)) {
>> +                       primary_chip = chip_id;
>> +                       continue;
>> +               }
>> +
>> +               rc = xscom_write(chip_id,
>> +                                SBE_CONTROL_REG_RW, SBE_CONTROL_REG_S0);
>> +               /* Initiate normal reboot */
>> +               if (rc) {
>> +                       prlog(PR_ERR, "Failed to write S0 interrupt [chip id = %x]\n",
>> +                             chip_id);
>> +                       return;
>> +               }
>> +       }
>> +
>> +       if (primary_chip == -1) {
>> +               prlog(PR_ERR, "Master chip ID not found.\n");
>> +               return;
>> +       }
>> +
>> +       /* Write S0 interrupt on master SBE */
>> +       rc = xscom_write(primary_chip,
>> +                        SBE_CONTROL_REG_RW, SBE_CONTROL_REG_S0);
>> +       /* Initiate normal reboot */
>> +       if (rc) {
>> +               prlog(PR_ERR, "Failed to write S0 interrupt [chip id = %x]\n",
>> +                     primary_chip);
>> +               return;
>> +       }
>> +
> 
>> +       /* XXX We do not have a way to detect SBE state. Hence wait for max
>> +        *     time SBE takes to respond and then trigger normal reboot.
>> +        */
>> +       prlog(PR_NOTICE, "Initiated MPIPL, waiting for SBE to respond...\n");
>> +       wait_tb = mftb() + msecs_to_tb(SBE_CMD_TIMEOUT_MAX);
>> +       while (mftb() < wait_tb) {
>> +               cpu_relax();
>> +       }
>> +
>> +       prlog(PR_ERR, "SBE did not respond within timeout period (%d secs).\n",
>> +             SBE_CMD_TIMEOUT_MAX / 1000);
> 
> /* Worst case command timeout value (90 sec) */
> #define SBE_CMD_TIMEOUT_MAX                     (90 * 1000)
> 
> This seems excessive. Also add a comment or something explaining that
> the loop is not supposed to terminate rather than just hinting at it.

Yes.. 90 second is too  much. But thats what spec says.
Sure. I will improve the comment.

Thanks

-Vasant