[Skiboot] [PATCH V3 2/3] doc/errorlogging : Update details about error logging on FSP and BMC

Mukesh Ojha mukesh02 at linux.vnet.ibm.com
Mon Jul 18 23:10:25 AEST 2016


This patch adds more description and an example showing how error logging is
independent of the platform, and also describes error logging on BMC.

Signed-off-by: Mukesh Ojha <mukesh02 at linux.vnet.ibm.com>
---
Changes in V3:
 - Resolves the naming inconsistency pointed out by Vasant.
 - Removes the POWERNV logging part, since in the current system POWERNV does
   not log any errors to OPAL or to FSP/BMC.

Changes in V2:
 - Corrects a typo.
 - Adds more detail about eSEL format.
 - Changes talk about generic service processors.

 doc/error-logging.txt | 99 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 74 insertions(+), 25 deletions(-)

diff --git a/doc/error-logging.txt b/doc/error-logging.txt
index 7c62520..8a3134e 100644
--- a/doc/error-logging.txt
+++ b/doc/error-logging.txt
@@ -1,18 +1,16 @@
-How to log errors on Sapphire and POWERNV:
-=========================================
-
-Currently the errors reported by POWERNV/Sapphire (OPAL) interfaces
-are in free form, where as errors reported by FSP is in standard Platform
-Error Log (PEL) format. For out-of band management via IPMI interfaces,
-it is necessary to push down the errors to FSP via mailbox
-(reported by POWERNV/Sapphire) in PEL format.
-
-PEL size can vary from 2K-16K bytes, fields of which needs to populated
-based on the kind of event and error that needs to be reported.
-All the information needed to be reported as part of the error, is
-passed by user using the error-logging interfaces outlined below.
-Following which, PEL structure is generated based on the input and
-then passed on to FSP.
+How to log errors on Sapphire:
+=============================
+
+Currently, the errors reported by Sapphire (OPAL) interfaces are in free form,
+whereas errors reported by FSP are in the standard Platform Error Log (PEL)
+format. For out-of-band management via IPMI interfaces, it is necessary to push
+the errors (reported by Sapphire) down to FSP via mailbox in PEL format.
+
+PEL size can vary from 2K to 16K bytes; its fields need to be populated based
+on the kind of event and error that needs to be reported. All the information
+to be reported as part of the error is passed by the user through the
+error-logging interfaces outlined below. The PEL structure is then generated
+from that input and passed on to FSP.
 
 Error logging interfaces in Sapphire:
 ====================================
@@ -131,9 +129,8 @@ Step 1: To report an error, invoke opal_elog_create() with required argument.
 			of the system. All the parameters needed to generate a SRC
 			should be provided during reporting of an event/error.
 
-
 	 uint32_t reason_code: Reason for failure as stated in include/errorlog.h
-				for Sapphire
+			for Sapphire
 			Eg: Reason code for code-update failures can be
 				OPAL_RC_CU_INIT  -> Initialisation failure
 				OPAL_RC_CU_FLASH -> Flash failure
@@ -178,16 +175,68 @@ Step 2: Data can be appended to the user data section using the either of
 	uint32_t tag: Unique value to identify the data.
                        Ideal to have ASCII value for 4-byte string.
 
-Step 3: Once all the data for an error is logged in, the error needs to be
-	committed in FSP.
+Step 3: A platform hook commits the OPAL error log to whichever service
+	processor is present (currently used for FSP and BMC based machines).
+
+	FSP:
+		.elog_commit            = elog_fsp_commit
+
+	Once all the data for an error is logged in, the error needs to
+	be committed in FSP.
+
+	rc = platform.elog_commit(elog);
+	Value of 0 is returned on success.
+
+	In the process of committing an error to FSP, log info is first
+	internally converted to PEL format and then pushed to the FSP. All the
+	errors logged in Sapphire are again pushed up to POWERNV platform by
+	the FSP and all the errors reported by Sapphire are logged in FSP.
+	Sapphire maintains a timeout field for every error log it sends to
+	FSP. If a log is not logged within the allotted time period (e.g. if
+	FSP is down), OPAL sends that log to POWERNV.
+
+	BMC:
+		.elog_commit            = ipmi_elog_commit
 
-	rc = elog_fsp_commit(buf);
+	rc = platform.elog_commit(elog);
 	Value of 0 is returned on success.
 
-In the process of committing an error to FSP, log info is first internally
-converted to PEL format and then pushed to the FSP. All the errors logged
-in Sapphire are again pushed up to POWERNV platform by the FSP and all the errors
-reported by Sapphire and POWERNV are logged in FSP.
+	On BMC machines, error logs are first converted to eSEL format,
+	i.e.:
+		eSEL = SEL header + PEL data
+
+	The SEL header contains the following fields:
+	struct sel_header {
+		uint16_t id;
+		uint8_t record_type;
+		uint32_t timestamp;
+		uint16_t genid;
+		uint8_t evmrev;
+		uint8_t sensor_type;
+		uint8_t sensor_num;
+		uint8_t dir_type;
+		uint8_t signature;
+		uint8_t reserved[2];
+	};
+
+	After filling in the SEL header fields, Sapphire copies the error log
+	PEL data after the header section in the error log buffer. The eSEL is
+	then logged in the BMC using the IPMI interface.
+
+e.g:
+	void log_commit(struct errorlog *elog)
+	{
+		....
+		....
+		if (platform.elog_commit) {
+			rc = platform.elog_commit(elog);
+			if (rc)
+				prerror("ELOG: Platform commit error %d\n", rc);
+			return;
+		}
+		....
+		....
+	}
 
 If the user does not intend to dump various user data sections, but just
 log the error with some amount of description around that error, they can do
@@ -196,7 +245,7 @@ so using just the simple error logging interface
 log_simple_error(uint32_t reason_code, char *fmt, ...);
 
 Eg: log_simple_error(OPAL_RC_SURVE_STATUS,
-			"SURV: Error retreiving surveillance status: %d\n",
+			"SURV: Error retrieving surveillance status: %d\n",
                        						err_len);
 
 Using the reason code, an error log is generated with the information derived
-- 
2.7.4
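
For readers of the BMC part of Step 3, the "eSEL = SEL header + PEL data"
layout can be sketched in standalone C as below. This is only an illustration
and not the skiboot implementation: esel_build() is a hypothetical helper, the
packed attribute is an assumption made here so that the header occupies 16
bytes with no compiler padding, and the actual IPMI transfer to the BMC is
left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * SEL header fields as listed in the patch. The packed attribute is an
 * assumption of this sketch: it keeps the layout at 16 bytes without
 * padding between record_type and timestamp.
 */
struct sel_header {
	uint16_t id;
	uint8_t record_type;
	uint32_t timestamp;
	uint16_t genid;
	uint8_t evmrev;
	uint8_t sensor_type;
	uint8_t sensor_num;
	uint8_t dir_type;
	uint8_t signature;
	uint8_t reserved[2];
} __attribute__((packed));

/*
 * Hypothetical helper: lay out an eSEL in buf as the SEL header followed
 * immediately by the PEL data. Returns the total eSEL length in bytes, or
 * 0 if buf is too small to hold both parts.
 */
static size_t esel_build(uint8_t *buf, size_t buf_len,
			 const struct sel_header *hdr,
			 const uint8_t *pel, size_t pel_len)
{
	assert(buf != NULL && hdr != NULL);

	/* Reject buffers that cannot hold header plus PEL data. */
	if (buf_len < sizeof(*hdr) || pel_len > buf_len - sizeof(*hdr))
		return 0;

	/* Header first ... */
	memcpy(buf, hdr, sizeof(*hdr));
	/* ... then the PEL data directly after the header section. */
	if (pel_len)
		memcpy(buf + sizeof(*hdr), pel, pel_len);

	return sizeof(*hdr) + pel_len;
}
```

A caller would fill in the header fields, call esel_build() with the PEL
bytes generated for the error log, and hand the resulting buffer to whatever
routine performs the IPMI transfer.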


