anyone interested in chip register error diagnostics?

Brad Bishop bradleyb at fuzziesquirrel.com
Tue Mar 5 08:01:16 AEDT 2019


oops…I did not mean to take the list off copy.  Adding it back on…

> On Mar 4, 2019, at 3:56 PM, Supreeth Venkatesh <Supreeth.Venkatesh at arm.com> wrote:
> 
> 
> Thanks Brad.
> 
> Hi Zane/Brad,
> 
> On Arm platforms, we use the Common Platform Error Record (CPER) to report these kinds of hardware errors.
> The format of these errors is defined in Appendix N of the UEFI specification:
> http://www.uefi.org/sites/default/files/resources/UEFI%20Spec%202_7_A%20Sept%206.pdf
> 
> I have not read the proposal in its entirety, but it seems similar to a Reliability, Availability, Serviceability (RAS) feature implemented with
> System Management Mode/Management Mode, only on the BMC side.
> 
> I will take a look at the reviews posted and provide more feedback.
> 
> If this is something similar to a RAS feature, I have in fact proposed in the DMTF PMCI WG that CPER formats be added to one of
> the PLDM specifications.
> 
> Arm would be interested in the design of this component, if it can accommodate the above error formats and the component can be designed in
> an architecture-agnostic way.
> 
> Thanks,
> Supreeth
> 
> -----Original Message-----
> From: Brad Bishop <bradleyb at fuzziesquirrel.com>
> Sent: Monday, March 4, 2019 2:38 PM
> To: zshelle <zshelle at linux.vnet.ibm.com>; Supreeth Venkatesh <Supreeth.Venkatesh at arm.com>; ed.tanous at intel.com
> Subject: Re: anyone interested in chip register error diagnostics?
> 
> On Mon, Mar 04, 2019 at 02:22:45PM -0600, zshelle wrote:
>> On POWER, I work on a component that listens for hardware errors
>> reported by registers in the system chips (processors, memory buffers,
>> I/O chips, etc.) and performs service actions based on those errors. I
>> have been working on porting some of this code to the BMC for system
>> fatal error analysis (see my work-in-progress proposals:
>> https://gerrit.openbmc-project.xyz/#/c/openbmc/docs/+/18591/ and
>> https://gerrit.openbmc-project.xyz/#/c/openbmc/docs/+/18831/). As part
>> of the new design, we are building a generic, data-driven register
>> error isolator, which will be used by several applications within
>> POWER. However, it has the potential to be useful on other
>> architectures as well. I am curious if anyone in the community is
>> interested in this.
> 
> Thanks Zane - I'll tag Ed (x86) and Supreeth (Arm) on this one.  Ed, Supreeth - do you understand the function being proposed here?  How does this work on x86 and Arm servers?
> 
> thx - brad

