ECC memory of BMC

Will Liang (梁永鉉) Will.Liang at quantatw.com
Fri Feb 22 17:37:10 AEDT 2019


Hi,

> -----Original Message-----
> From: Andrew Jeffery [mailto:andrew at aj.id.au]
> Sent: Friday, February 22, 2019 1:55 PM
> To: Will Liang (梁永鉉) <Will.Liang at quantatw.com>;
> openbmc at lists.ozlabs.org; Stefan M Schaeckeler <sschaeck at cisco.com>
> Cc: dkodihal at in.ibm.com
> Subject: Re: ECC memory of BMC
> 
> On Fri, 22 Feb 2019, at 16:22, Stefan Schaeckeler (sschaeck) wrote:
> > Hi Will,
> >
> > On 2/21/19, 6:00 PM, "Will Liang (梁永鉉)" <Will.Liang at quantatw.com> wrote:
> >
> > > > > What we want to do is to record the ECC events to SEL.
> > > > >
> > > > > We are considering creating new D-Bus objects and a service.
> > > >
> > > > Right; I think you need to create a new service that polls the
> > > > sysfs interface for the EDAC device, and then use phosphor-logging
> > > > to create error logs.
> > >
> > > We consider creating the following objects for D-Bus:
> > > - bus name : /xyz/openbmc_project/ECC
> > > - object path : /xyz/openbmc_project/ECC/status
> > > - interface : xyz.openbmc_project.Memory.MemoryECC
> > >
> > > and error types for xyz::openbmc_project::Memory::Ecc::Error::ceCount
> > > and "ueCount" and "isLoggingLimitReached" for phosphor-logging error
> > > message.
> >
> >
> > Note, the driver also logs the addresses of the recoverable and
> > un-recoverable errors. Perhaps you want to expose them, too?
> >
> > The edac framework is unfortunately not exposing them through sysfs.
> > They get printed through "edac_mc_handle_error()" as
> > printk(KERN_WARNING, ...) and look like
> >
> > root at aspeed-arm:# dmesg | grep EDAC
> > [ 1718.900000] EDAC MC0: 1 CE address(es) not available on
> > mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:0
> > syndrome:0x0)
> > [ 1718.900000] EDAC MC0: 1 CE on mc#0csrow#0channel#0 (csrow:0
> > channel:0 page:0x80000 offset:0x0 grain:0 syndrome:0x0)
> >
> >
> > I'm not sure if there is an elegant way for userspace to retrieve
> > messages from the kernel ring buffer.
> >
> 
> Let's not start scraping dmesg. It's not considered part of the kernel ABI.

We do not plan to expose the error messages from the EDAC driver.

We only want to fetch the recoverable/un-recoverable error counts and record
them as ECC log entries. Therefore, we need a service that polls the EDAC driver.
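
To illustrate the kind of polling service meant here, a minimal Python sketch
follows. It is not the proposed implementation. It assumes the driver exposes
the standard EDAC counters "ce_count" and "ue_count" under
/sys/devices/system/edac/mc/mc0; the one-second interval and the log_event()
helper are hypothetical placeholders (a real service would create a
phosphor-logging entry there instead of printing).

```python
import os
import time

# Assumed location: standard EDAC sysfs directory for memory controller 0.
EDAC_MC_DIR = "/sys/devices/system/edac/mc/mc0"


def read_count(mc_dir, name):
    """Read one integer counter (e.g. ce_count) from an EDAC sysfs directory."""
    with open(os.path.join(mc_dir, name)) as f:
        return int(f.read().strip())


def read_counts(mc_dir=EDAC_MC_DIR):
    """Return the current (ce_count, ue_count) pair."""
    return read_count(mc_dir, "ce_count"), read_count(mc_dir, "ue_count")


def diff_counts(previous, current):
    """Return the number of new CE and UE events since the previous poll."""
    return current[0] - previous[0], current[1] - previous[1]


def log_event(kind, new_events, total):
    """Hypothetical placeholder for creating a phosphor-logging entry."""
    print(f"ECC {kind}: {new_events} new, {total} total")


def poll(mc_dir=EDAC_MC_DIR, interval=1.0, iterations=None):
    """Poll the counters and report increases; iterations=None loops forever."""
    previous = read_counts(mc_dir)
    done = 0
    while iterations is None or done < iterations:
        time.sleep(interval)
        current = read_counts(mc_dir)
        new_ce, new_ue = diff_counts(previous, current)
        if new_ce > 0:
            log_event("CE", new_ce, current[0])
        if new_ue > 0:
            log_event("UE", new_ue, current[1])
        previous = current
        done += 1


if __name__ == "__main__":
    poll()
```

Only the difference between two polls is logged, so the counts that were
already present when the service starts do not generate entries.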

Will

