[Skiboot] [PATCH V2 2/3] doc/errorlogging : Update details about error logging on FSP and BMC

Mukesh Ojha mukesh02 at linux.vnet.ibm.com
Mon Jul 18 23:09:50 AEST 2016


Hi Vasant,

Thanks for the review.
I have addressed your review comments and updated the patch in V3.

Regards,
Mukesh

On Friday 15 July 2016 01:31 PM, Vasant Hegde wrote:
> On 07/14/2016 12:12 PM, Mukesh Ojha wrote:
>> This patch adds more description and an example of how error logging is
>> independent of the platform, and also describes how error logs are
>> committed on BMC systems.
>>
>> Signed-off-by: Mukesh Ojha <mukesh02 at linux.vnet.ibm.com>
>> ---
>> Changes in V2:
>>   - Corrects a typo.
>>   - Adds more detail about the eSEL format.
>>   - Reworded to talk about generic service processors.
>>
>>   doc/error-logging.txt | 68 +++++++++++++++++++++++++++++++++++++++++++++------
>>   1 file changed, 60 insertions(+), 8 deletions(-)
>>
>> diff --git a/doc/error-logging.txt b/doc/error-logging.txt
>> index 7c62520..a9e5993 100644
>> --- a/doc/error-logging.txt
>> +++ b/doc/error-logging.txt
>> @@ -178,16 +178,68 @@ Step 2: Data can be appended to the user data section using the either of
>>       uint32_t tag: Unique value to identify the data.
>>                          Ideal to have ASCII value for 4-byte string.
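A minimal sketch of what "ASCII value for a 4-byte string" can look like in practice, assuming the first character goes in the most significant byte. make_elog_tag() is a hypothetical helper used only for illustration, not a skiboot function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack a 4-character ASCII string into a uint32_t
 * tag, first character in the most significant byte, so the tag reads
 * as the string when printed in hex or dumped in big-endian order.
 */
static uint32_t make_elog_tag(const char s[4])
{
	assert(s != NULL);
	return ((uint32_t)(unsigned char)s[0] << 24) |
	       ((uint32_t)(unsigned char)s[1] << 16) |
	       ((uint32_t)(unsigned char)s[2] << 8)  |
	       (uint32_t)(unsigned char)s[3];
}
```

With this layout, make_elog_tag("SURV") gives 0x53555256, which shows up as "SURV" in a big-endian hex dump of the user data section.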
>>
>> -Step 3: Once all the data for an error is logged in, the error needs to be
>> -    committed in FSP.
>> +Step 3: There is a platform hook for committing the OPAL error log to any
>> +    service processor (currently used on FSP and BMC based machines).
>>
>> -    rc = elog_fsp_commit(buf);
>> +    FSP:
>> +        .elog_commit            = elog_fsp_commit
>> +
>> +    Once all the data for an error has been logged, the error needs to
>> +    be committed to the FSP.
>> +
>> +    rc = platform.elog_commit(elog);
>>       Value of 0 is returned on success.
>>
>> -In the process of committing an error to FSP, log info is first internally
>> -converted to PEL format and then pushed to the FSP. All the errors logged
>> -in Sapphire are again pushed up to POWERNV platform by the FSP and all the errors
>> -reported by Sapphire and POWERNV are logged in FSP.
>> +    In the process of committing an error to FSP, log info is first
>> +    internally converted to PEL format and then pushed to the FSP. All the
>> +    errors logged in Sapphire are again pushed up to POWERNV platform by
>> +    the FSP and all the errors reported by Sapphire and POWERNV are logged
>
> Please use consistent terminology. Here you refer to PowerNV, and below
> to the host kernel.
>
> Also, the host kernel will not log any errors to the service processor.
>
>
> -Vasant
>
>> +    in FSP. Sapphire maintains a timeout field for all error logs it
>> +    sends to the FSP. If a log is not logged by the FSP within the
>> +    allotted time period (e.g. if the FSP is down), OPAL sends it to the
>> +    host kernel.
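The timeout fallback described above can be sketched roughly as follows. The structure, field names, enum and time units are all hypothetical and not taken from skiboot; the sketch only illustrates the decision being described:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one log sent to the service processor. */
struct pending_elog {
	uint64_t sent_at;   /* time the log was sent (arbitrary units) */
	bool acked;         /* set once the FSP confirms it logged the error */
};

enum elog_route {
	ELOG_DONE,      /* FSP logged it; nothing left to do */
	ELOG_WAIT,      /* still inside the allotted time; keep waiting */
	ELOG_TO_HOST,   /* timed out (e.g. FSP down): hand it to the host kernel */
};

/* Decide what to do with a pending log at time 'now'. */
static enum elog_route elog_check_timeout(const struct pending_elog *e,
					  uint64_t now, uint64_t timeout)
{
	assert(now >= e->sent_at);
	if (e->acked)
		return ELOG_DONE;
	if (now - e->sent_at < timeout)
		return ELOG_WAIT;
	return ELOG_TO_HOST;
}
```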
>> +
>> +    BMC:
>> +        .elog_commit            = ipmi_elog_commit
>> +
>> +    rc = platform.elog_commit(elog);
>> +    Value of 0 is returned on success.
>> +
>> +    On BMC machines, error logs are first converted to eSEL format,
>> +    i.e.:
>> +        eSEL = SEL header + PEL data
>> +
>> +    The SEL header contains the following fields:
>> +    struct sel_header {
>> +        uint16_t id;
>> +        uint8_t record_type;
>> +        uint32_t timestamp;
>> +        uint16_t genid;
>> +        uint8_t evmrev;
>> +        uint8_t sensor_type;
>> +        uint8_t sensor_num;
>> +        uint8_t dir_type;
>> +        uint8_t signature;
>> +        uint8_t reserved[2];
>> +    };
>> +
>> +    After filling in the SEL header fields, Sapphire copies the error
>> +    log's PEL data after the header. The resulting eSEL is then logged
>> +    on the BMC via the IPMI interface.
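A minimal sketch of the eSEL layout, assuming the header is packed with the GCC/Clang __attribute__((packed)) extension so that it occupies exactly 16 bytes. build_esel() is a hypothetical helper; it copies field values as-is and leaves byte order to the caller:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SEL header fields as listed above, packed so there is no padding. */
struct sel_header {
	uint16_t id;
	uint8_t record_type;
	uint32_t timestamp;
	uint16_t genid;
	uint8_t evmrev;
	uint8_t sensor_type;
	uint8_t sensor_num;
	uint8_t dir_type;
	uint8_t signature;
	uint8_t reserved[2];
} __attribute__((packed));

static_assert(sizeof(struct sel_header) == 16, "SEL header must be 16 bytes");

/*
 * Hypothetical helper: lay out eSEL = SEL header + PEL data in 'out'.
 * Returns the total eSEL length, or 0 if 'out' is too small.
 */
static size_t build_esel(uint8_t *out, size_t out_len,
			 const struct sel_header *hdr,
			 const uint8_t *pel, size_t pel_len)
{
	size_t total = sizeof(*hdr) + pel_len;

	if (out_len < total)
		return 0;
	memcpy(out, hdr, sizeof(*hdr));            /* SEL header first */
	memcpy(out + sizeof(*hdr), pel, pel_len);  /* then the PEL data */
	return total;
}
```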
>> +
>> +e.g.:
>> +    void log_commit(struct errorlog *elog)
>> +    {
>> +        ....
>> +        ....
>> +        if (platform.elog_commit) {
>> +            rc = platform.elog_commit(elog);
>> +            if (rc)
>> +                prerror("ELOG: Platform commit error %d\n", rc);
>> +            return;
>> +        }
>> +        ....
>> +        ....
>> +    }
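A self-contained sketch of the hook dispatch shown above. It uses hypothetical stand-ins (struct platform_hooks, mock_fsp_commit(), mock_bmc_commit(), commit_via_platform()) in place of the real skiboot platform structure, elog_fsp_commit and ipmi_elog_commit:

```c
#include <assert.h>
#include <stdio.h>

/* Minimal stand-ins; the real skiboot structures carry much more. */
struct errorlog {
	unsigned int reason_code;
};

struct platform_hooks {
	/* NULL when the platform provides no commit hook. */
	int (*elog_commit)(struct errorlog *elog);
};

/* Hypothetical backends: they only record which one was called. */
static int last_backend; /* 0 = none, 1 = FSP, 2 = BMC */

static int mock_fsp_commit(struct errorlog *elog)
{
	assert(elog != NULL);
	last_backend = 1;
	return 0;
}

static int mock_bmc_commit(struct errorlog *elog)
{
	assert(elog != NULL);
	last_backend = 2;
	return 0;
}

/* Generic code only sees the hook, never the backend behind it. */
static int commit_via_platform(const struct platform_hooks *plat,
			       struct errorlog *elog)
{
	int rc;

	if (!plat->elog_commit)
		return -1; /* hypothetical: no hook, log not committed */
	rc = plat->elog_commit(elog);
	if (rc)
		fprintf(stderr, "ELOG: Platform commit error %d\n", rc);
	return rc;
}
```

Because generic code only calls through the function pointer, supporting another kind of service processor means providing one more .elog_commit implementation.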
>>
>>   If the user does not intend to dump various user data sections, but just
>>   log the error with some amount of description around that error, they can do
>> @@ -196,7 +248,7 @@ so using just the simple error logging interface
>>   log_simple_error(uint32_t reason_code, char *fmt, ...);
>>
>>   Eg: log_simple_error(OPAL_RC_SURVE_STATUS,
>> -            "SURV: Error retreiving surveillance status: %d\n",
>> +            "SURV: Error retrieving surveillance status: %d\n",
>>                                                  err_len);
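A rough sketch of how a printf-style interface like log_simple_error() can turn its format string into a description. struct simple_elog and fill_simple_error() are hypothetical, and the sketch only stores the formatted text rather than committing a log:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for what a simple error log would carry. */
struct simple_elog {
	uint32_t reason_code;
	char desc[64];
};

/* Record the reason code and format the variadic description. */
static void fill_simple_error(struct simple_elog *out, uint32_t reason_code,
			      const char *fmt, ...)
{
	va_list args;

	assert(out != NULL && fmt != NULL);
	out->reason_code = reason_code;
	va_start(args, fmt);
	/* vsnprintf always NUL-terminates and truncates long messages. */
	vsnprintf(out->desc, sizeof(out->desc), fmt, args);
	va_end(args);
}
```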
>>
>>   Using the reason code, an error log is generated with the information derived
>>
>