RFC for event logging mechanism

Andrew Jeffery andrew at aj.id.au
Wed Sep 6 09:13:35 AEST 2017


On Tue, 2017-09-05 at 15:46 -0700, Rick Altherr wrote:
> On Tue, Sep 5, 2017 at 4:24 AM, Deepak Kodihalli
> <dkodihal at linux.vnet.ibm.com> wrote:
> > Hello,
> > 
> > I'm working this sprint on designing an event logging mechanism
> > (https://github.com/openbmc/openbmc/issues/1856). I have a couple of proposals
> > below along with some questions. Hoping to hear thoughts on which might be a
> > better proposal. Any other feedback is welcome.
> > 
> > Potential requirements
> > 1) Applications should be able to log events of interest. Events could be
> > used for purposes such as telemetry, analytics, debug. Examples of events
> > could be changes in the power/thermal domain, such as operating temps on a
> > server, boot-related events, user account changes, etc.
> 
> My experience at Google is that telemetry and debug, while both
> event-oriented, have vastly different usages which lead to different
> designs.  Telemetry like temperature, voltage, current, boot count,
> f/w version are primarily collected, aggregated, and analyzed by
> automation systems.  To avoid complexity in the consumers of that
> data, it is ideally self-describing, easily parsable, and in the most
> intuitive, primitive form.  Our internal systems use explicit types
> and generous metadata for _each_ piece of telemetry.  For example, a
> voltage sensor is encoded as a float with metadata describing the rail
> being monitored, units (mV or V), time recorded, etc.  I expect these
> to be consumed as they are generated, either by local processes or by
> streaming off-machine.
> 
> Debug is almost exclusively consumed by humans in cases where a
> problem has already been found and the data needed to identify the
> root cause is unknown and not captured in the telemetry.  I think of
> debug as consisting mostly of unstructured logs.  Off-machine
> collection can be done infrequently and in bulk.
> 
> All attempts I've seen to mix the two have led to a messy API and/or
> data model.  Even if the data model and API _can_ be shared, the
> higher-level consumers of the two types of data will be different.
> Instead of requiring the client to know which are telemetry and which
> are debug logs, separating that into two APIs makes the consumer's
> life simpler.
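A minimal sketch of the split Rick describes: typed, self-describing telemetry records pushed to subscribers as they are generated, alongside a separate free-form debug log that is buffered for bulk collection. The class and method names (VoltageReading, TelemetrySink, DebugLog, emit, drain) and the rail name "P12V" are hypothetical illustrations, not OpenBMC APIs, and Python is used only for brevity.

```python
import time
from collections import deque
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List

# Hypothetical sketch only: none of these names exist in OpenBMC.


@dataclass(frozen=True)
class VoltageReading:
    """A self-describing telemetry sample: an explicit type plus metadata."""
    rail: str         # which rail is being monitored (illustrative name)
    value: float      # the reading itself, kept as a primitive float
    units: str        # "mV" or "V"
    timestamp: float  # seconds since the epoch when the sample was recorded


class TelemetrySink:
    """Structured channel: records are delivered to consumers as they are emitted."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Dict[str, object]], None]] = []

    def subscribe(self, callback: Callable[[Dict[str, object]], None]) -> None:
        # A local process or an off-machine streamer registers here.
        self._subscribers.append(callback)

    def emit(self, reading: VoltageReading) -> None:
        # Tag each record with its type so consumers never have to guess.
        record = {"type": type(reading).__name__, **asdict(reading)}
        for callback in self._subscribers:
            callback(record)


class DebugLog:
    """Unstructured channel: free-form text kept in a bounded buffer."""

    def __init__(self, capacity: int = 1000) -> None:
        # Oldest lines are discarded once the buffer is full.
        self._lines: deque = deque(maxlen=capacity)

    def log(self, message: str) -> None:
        self._lines.append(f"{time.time():.3f} {message}")

    def drain(self) -> List[str]:
        # Infrequent bulk collection: hand over everything and start fresh.
        lines = list(self._lines)
        self._lines.clear()
        return lines
```

For example, a sensor daemon would call `TelemetrySink.emit(VoltageReading(...))` for each sample, while the same daemon writes something like `DebugLog.log("retried I2C read after NAK")` when it hits a problem. Keeping the two channels as separate types means a telemetry consumer never parses free-form text, and a human reading debug output is not buried under sensor samples.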

Having only briefly skimmed the emails, I support Rick's position.
Having recently tried to debug complex issues with OpenBMC in its
current state, the structured logging made life pretty miserable. It
led to vague error messages that gave no chance of diagnosing the
actual problem at hand, whilst the volume of the conflated
telemetry/debug logging was large.

Andrew


More information about the openbmc mailing list