[Skiboot] [PATCH V3 2/3] doc/errorlogging : Update details about error logging on FSP and BMC

Vasant Hegde hegdevasant at linux.vnet.ibm.com
Thu Aug 4 21:42:22 AEST 2016


On 07/18/2016 06:40 PM, Mukesh Ojha wrote:
> This patch add more description and example how error logging is independent of
> the platform and also talks about error logging on BMC.

As mentioned in the other patch, you have to convert this to .rst format.


>
> Signed-off-by: Mukesh Ojha <mukesh02 at linux.vnet.ibm.com>
> ---
> Changes in V3:
>   - Resolves naming inconsistency pointed by Vasant.
>   - Removes POWERNV logging part, as of now in current system it does not log
>     any error to OPAL or to FSP/BMC.
>
> Changes in V2:
>   - Corrects typo mistake.
>   - Adds more detail about eSEL format.
>   - Changes talk about generic service processors.
>
>   doc/error-logging.txt | 99 ++++++++++++++++++++++++++++++++++++++-------------
>   1 file changed, 74 insertions(+), 25 deletions(-)
>
> diff --git a/doc/error-logging.txt b/doc/error-logging.txt
> index 7c62520..8a3134e 100644
> --- a/doc/error-logging.txt
> +++ b/doc/error-logging.txt
> @@ -1,18 +1,16 @@
> -How to log errors on Sapphire and POWERNV:
> -=========================================
> -
> -Currently the errors reported by POWERNV/Sapphire (OPAL) interfaces
> -are in free form, where as errors reported by FSP is in standard Platform
> -Error Log (PEL) format. For out-of band management via IPMI interfaces,
> -it is necessary to push down the errors to FSP via mailbox
> -(reported by POWERNV/Sapphire) in PEL format.
> -
> -PEL size can vary from 2K-16K bytes, fields of which needs to populated
> -based on the kind of event and error that needs to be reported.
> -All the information needed to be reported as part of the error, is
> -passed by user using the error-logging interfaces outlined below.
> -Following which, PEL structure is generated based on the input and
> -then passed on to FSP.
> +How to log errors on Sapphire:

Better: s/Sapphire/OPAL/

> +=============================
> +
> +Currently, the errors reported by Sapphire (OPAL) interfaces are in free form,
> +where as errors reported by FSP is in standard Platform Error Log (PEL) format.
> +For out-of band management via IPMI interfaces, it is necessary to push down
> +the errors to FSP via mailbox (reported by Sapphire) in PEL format.

You are talking about the error logging interface here, so it is better to say
"service processor" instead of FSP.

> +
> +PEL size can vary from 2K-16K bytes, fields of which needs to populated based
> +on the kind of event and error that needs to be reported. All the information
> +needed to be reported as part of the error, is passed by user using the
> +error-logging interfaces outlined below. Following which, PEL structure is
> +generated based on the input and then passed on to FSP.

Ditto. We do create the eSEL format for BMC-based systems, but it's just a
wrapper around the PEL format. The actual data is still in PEL format.



>
>   Error logging interfaces in Sapphire:
>   ====================================
> @@ -131,9 +129,8 @@ Step 1: To report an error, invoke opal_elog_create() with required argument.
>   			of the system. All the parameters needed to generate a SRC
>   			should be provided during reporting of an event/error.
>
> -
>   	 uint32_t reason_code: Reason for failure as stated in include/errorlog.h
> -				for Sapphire
> +			for Sapphire
>   			Eg: Reason code for code-update failures can be
>   				OPAL_RC_CU_INIT  -> Initialisation failure
>   				OPAL_RC_CU_FLASH -> Flash failure
> @@ -178,16 +175,68 @@ Step 2: Data can be appended to the user data section using the either of
>   	uint32_t tag: Unique value to identify the data.
>                          Ideal to have ASCII value for 4-byte string.
>
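To illustrate the "ASCII value for 4-byte string" hint for the tag, here is a
minimal sketch. elog_tag() is a hypothetical helper made up for this mail, not
an existing skiboot function, and putting the first character in the most
significant byte is my assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of skiboot): pack four ASCII characters
 * into a 32-bit tag, first character in the most significant byte.
 */
static uint32_t elog_tag(const char *s)
{
	assert(s != NULL);

	return ((uint32_t)(uint8_t)s[0] << 24) |
	       ((uint32_t)(uint8_t)s[1] << 16) |
	       ((uint32_t)(uint8_t)s[2] << 8) |
	       (uint32_t)(uint8_t)s[3];
}
```

With this, elog_tag("DESC") gives 0x44455343, which is easy to spot in a hex
dump of the user data section.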
> -Step 3: Once all the data for an error is logged in, the error needs to be
> -	committed in FSP.
> +Step 3: There is a platform hook for the opal error log to be committed on any
> +	service processor(Currently used for FSP and BMC based machines).


> +
> +	FSP:

Make this step 3.1 so that it's easy to read.


> +		.elog_commit            = elog_fsp_commit
> +
> +	Once all the data for an error is logged in, the error needs to
> +	be committed in FSP.
> +
> +	rc = platform.elog_commit(elog);
> +	Value of 0 is returned on success.
> +
> +	In the process of committing an error to FSP, Log info is first

s/Log/log/

> +	internally converted to PEL format and then pushed to the FSP. All the
> +	errors logged in Sapphire are again pushed up to POWERNV platform by
> +	the FSP and all the errors reported by Sapphire are logged in FSP.

The above sentence is confusing. Just mention that the FSP takes care of
pushing those errors back to the host.

> +	Sapphire maintains timeout field for all error logs it is sending to
> +	FSP. If it is not logged within allotted time period (e.g if FSP is
> +	down), in that case OPAL sends those logs to POWERNV.
> +
> +	BMC:

Make this step 3.2.

> +		.elog_commit            = ipmi_elog_commit
>
> -	rc = elog_fsp_commit(buf);
> +	rc = platform.elog_commit(elog);
>   	Value of 0 is returned on success.
>
> -In the process of committing an error to FSP, log info is first internally
> -converted to PEL format and then pushed to the FSP. All the errors logged
> -in Sapphire are again pushed up to POWERNV platform by the FSP and all the errors
> -reported by Sapphire and POWERNV are logged in FSP.
> +	In case of BMC machines, Error logs are first converted to eSEL format.

s/Error/error/

> +	i.e:
> +		eSEL = SEL header + PEL data
> +
> +	SEL header contains below fields,
> +	struct sel_header {
> +		uint16_t id;
> +		uint8_t record_type;
> +		uint32_t timestamp;
> +		uint16_t genid;
> +		uint8_t evmrev;
> +		uint8_t sensor_type;
> +		uint8_t sensor_num;
> +		uint8_t dir_type;
> +		uint8_t signature;
> +		uint8_t reserved[2];
> +	}
> +
> +	After filling up the SEL header fields, Sapphire copies the error log
> +	PEL data after the header section in the error log buffer. Then using
> +	IPMI interface, eSEL gets logged in BMC.
> +
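If you want an example here, something showing "eSEL = SEL header + PEL data"
would be more useful. A rough sketch only: the struct fields are copied from
your hunk above, while esel_build() is a hypothetical helper and the packed
attribute (GCC/Clang syntax) is my assumption, so that no padding ends up
between the header fields or before the PEL data:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field layout as listed in the patch; packing is an assumption. */
struct sel_header {
	uint16_t id;
	uint8_t record_type;
	uint32_t timestamp;
	uint16_t genid;
	uint8_t evmrev;
	uint8_t sensor_type;
	uint8_t sensor_num;
	uint8_t dir_type;
	uint8_t signature;
	uint8_t reserved[2];
} __attribute__((packed));

/*
 * Hypothetical helper (not a skiboot function): copy the SEL header into
 * buf and place the PEL data right after it. Returns the total eSEL
 * length, or 0 if buf is too small to hold both.
 */
static size_t esel_build(uint8_t *buf, size_t buf_len,
			 const struct sel_header *hdr,
			 const uint8_t *pel, size_t pel_len)
{
	size_t total = sizeof(*hdr) + pel_len;

	assert(buf && hdr && pel);
	if (total > buf_len)
		return 0;

	memcpy(buf, hdr, sizeof(*hdr));
	memcpy(buf + sizeof(*hdr), pel, pel_len);
	return total;
}
```

The caller would fill in the header fields first and then hand the resulting
buffer to the IPMI layer. The values to put in the header are not shown here.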
> +e.g:

Example for what? I think the hunk below is redundant.

-Vasant

> +	void log_commit(struct errorlog *elog)
> +	{
> +		....
> +		....
> +		if (platform.elog_commit) {
> +			rc = platform.elog_commit(elog);
> +			if (rc)
> +				prerror("ELOG: Platform commit error %d\n", rc);
> +			return;
> +		}
> +		....
> +		....
> +	}
>
>   If the user does not intend to dump various user data sections, but just
>   log the error with some amount of description around that error, they can do
> @@ -196,7 +245,7 @@ so using just the simple error logging interface
>   log_simple_error(uint32_t reason_code, char *fmt, ...);
>
>   Eg: log_simple_error(OPAL_RC_SURVE_STATUS,
> -			"SURV: Error retreiving surveillance status: %d\n",
> +			"SURV: Error retrieving surveillance status: %d\n",
>                          						err_len);
>
>   Using the reason code, an error log is generated with the information derived
>


