[Skiboot] [PATCH V3 2/3] doc/errorlogging : Update details about error logging on FSP and BMC
Vasant Hegde
hegdevasant at linux.vnet.ibm.com
Thu Aug 4 21:42:22 AEST 2016
On 07/18/2016 06:40 PM, Mukesh Ojha wrote:
> This patch adds more description and an example of how error logging is
> independent of the platform, and also describes error logging on BMC.
As mentioned in the other patch, you have to convert this to .rst format.
>
> Signed-off-by: Mukesh Ojha <mukesh02 at linux.vnet.ibm.com>
> ---
> Changes in V3:
> - Resolves the naming inconsistency pointed out by Vasant.
> - Removes the POWERNV logging part, since in current systems POWERNV does not
> log any errors to OPAL or to FSP/BMC.
>
> Changes in V2:
> - Corrects typos.
> - Adds more detail about eSEL format.
> - Changes the text to talk about generic service processors.
>
> doc/error-logging.txt | 99 ++++++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 74 insertions(+), 25 deletions(-)
>
> diff --git a/doc/error-logging.txt b/doc/error-logging.txt
> index 7c62520..8a3134e 100644
> --- a/doc/error-logging.txt
> +++ b/doc/error-logging.txt
> @@ -1,18 +1,16 @@
> -How to log errors on Sapphire and POWERNV:
> -=========================================
> -
> -Currently the errors reported by POWERNV/Sapphire (OPAL) interfaces
> -are in free form, where as errors reported by FSP is in standard Platform
> -Error Log (PEL) format. For out-of band management via IPMI interfaces,
> -it is necessary to push down the errors to FSP via mailbox
> -(reported by POWERNV/Sapphire) in PEL format.
> -
> -PEL size can vary from 2K-16K bytes, fields of which needs to populated
> -based on the kind of event and error that needs to be reported.
> -All the information needed to be reported as part of the error, is
> -passed by user using the error-logging interfaces outlined below.
> -Following which, PEL structure is generated based on the input and
> -then passed on to FSP.
> +How to log errors on Sapphire:
better s/Sapphire/OPAL/
> +=============================
> +
> +Currently, the errors reported by Sapphire (OPAL) interfaces are in free form,
> +whereas errors reported by the FSP are in the standard Platform Error Log (PEL)
> +format. For out-of-band management via IPMI interfaces, the errors reported by
> +Sapphire must be pushed down to the FSP via mailbox in PEL format.
You are talking about the error logging interface here, so it is better to say
"service processor" instead of FSP.
> +
> +PEL size can vary from 2K-16K bytes, and its fields need to be populated based
> +on the kind of event and error being reported. All the information to be
> +reported as part of the error is passed by the user through the error-logging
> +interfaces outlined below. The PEL structure is then generated from this input
> +and passed on to the FSP.
Ditto. We do create eSEL format for BMC-based systems, but it is just a wrapper
around the PEL format. The actual data is still in PEL format.
>
> Error logging interfaces in Sapphire:
> ====================================
> @@ -131,9 +129,8 @@ Step 1: To report an error, invoke opal_elog_create() with required argument.
> of the system. All the parameters needed to generate a SRC
> should be provided during reporting of an event/error.
>
> -
> uint32_t reason_code: Reason for failure as stated in include/errorlog.h
> - for Sapphire
> + for Sapphire
> Eg: Reason code for code-update failures can be
> OPAL_RC_CU_INIT -> Initialisation failure
> OPAL_RC_CU_FLASH -> Flash failure
> @@ -178,16 +175,68 @@ Step 2: Data can be appended to the user data section using the either of
> uint32_t tag: Unique value to identify the data.
> Ideal to have ASCII value for 4-byte string.
>
> -Step 3: Once all the data for an error is logged in, the error needs to be
> - committed in FSP.
> +Step 3: A platform hook commits the OPAL error log to the service processor
> + (currently used on FSP- and BMC-based machines).
> +
> + FSP:
Make it step 3.1 so that it's easy to read.
> + .elog_commit = elog_fsp_commit
> +
> + Once all the data for an error is logged in, the error needs to
> + be committed in FSP.
> +
> + rc = platform.elog_commit(elog);
> + Value of 0 is returned on success.
> +
> + In the process of committing an error to FSP, Log info is first
s/Log/log/
> + internally converted to PEL format and then pushed to the FSP. All the
> + errors logged in Sapphire are again pushed up to POWERNV platform by
> + the FSP and all the errors reported by Sapphire are logged in FSP.
The above sentence is confusing. Just mention that the FSP takes care of pushing
those errors back to the host.
> + Sapphire maintains a timeout field for every error log it sends to the
> + FSP. If a log is not logged within the allotted time (e.g. if the FSP is
> + down), OPAL sends it to POWERNV instead.
> +
> + BMC:
Make it step 3.2.
> + .elog_commit = ipmi_elog_commit
>
> - rc = elog_fsp_commit(buf);
> + rc = platform.elog_commit(elog);
> Value of 0 is returned on success.
>
> -In the process of committing an error to FSP, log info is first internally
> -converted to PEL format and then pushed to the FSP. All the errors logged
> -in Sapphire are again pushed up to POWERNV platform by the FSP and all the errors
> -reported by Sapphire and POWERNV are logged in FSP.
> + In case of BMC machines, Error logs are first converted to eSEL format.
s/Error/error/
> + i.e:
> + eSEL = SEL header + PEL data
> +
> + The SEL header contains the following fields:
> + struct sel_header {
> + uint16_t id;
> + uint8_t record_type;
> + uint32_t timestamp;
> + uint16_t genid;
> + uint8_t evmrev;
> + uint8_t sensor_type;
> + uint8_t sensor_num;
> + uint8_t dir_type;
> + uint8_t signature;
> + uint8_t reserved[2];
> + };
> +
> + After filling in the SEL header fields, Sapphire copies the PEL data of
> + the error log right after the header in the error log buffer. The eSEL
> + is then logged in the BMC through the IPMI interface.
> +
> +e.g:
Example of what? I think the hunk below is redundant.
-Vasant
> + void log_commit(struct errorlog *elog)
> + {
> + ....
> + ....
> + if (platform.elog_commit) {
> + rc = platform.elog_commit(elog);
> + if (rc)
> + prerror("ELOG: Platform commit error %d\n", rc);
> + return;
> + }
> + ....
> + ....
> + }
>
> If the user does not intend to dump various user data sections, but just
> log the error with some amount of description around that error, they can do
> @@ -196,7 +245,7 @@ so using just the simple error logging interface
> log_simple_error(uint32_t reason_code, char *fmt, ...);
>
> Eg: log_simple_error(OPAL_RC_SURVE_STATUS,
> - "SURV: Error retreiving surveillance status: %d\n",
> + "SURV: Error retrieving surveillance status: %d\n",
> err_len);
>
> Using the reason code, an error log is generated with the information derived
>