Event Log Error Definitions - RFC

Fri Nov 4 08:06:55 AEDT 2016

The OpenBMC event log is in full swing, with Adriana and me working on
it last sprint and this sprint.  I'd like to throw some
questions/design thoughts out there for this sprint's current work.

A little background first, though:
- The initial commit, with the elog interface, can be found here:
https://gerrit.openbmc-project.xyz/#/c/660/
- You can follow the commit history in the upper right to see all of
the code associated with it

The event log is primarily used to log errors that occur within the
system.  Think things like "my power supply isn't working", "a core
won't start", or "a SEEPROM has bad VPD".  Our design is to define
errors within a YAML file (elog.yaml), compile those (elog-gen.py)
into a header file for compile-time validation (elog-gen.hpp), and
then have the code use the defined error codes when it hits errors.
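
To make the "compile into a header" step a bit more concrete, here is
a minimal sketch of what a generator along the lines of elog-gen.py
could do.  It is an assumption-heavy illustration, not the actual
tool: the function name render_header, the struct layout, and the
err_code/err_msg members are all hypothetical, and the input is a
plain Python list standing in for the parsed YAML.  The point is only
that each defined error becomes a C++ type, so code that refers to an
undefined error fails to compile.

```python
# Hypothetical sketch: the real elog-gen.py output format is not
# described in this mail, so the emitted C++ is purely illustrative.

def render_header(errors):
    """Return C++ header text for a list of parsed error definitions."""
    lines = [
        "// Generated file, do not edit (illustrative layout only)",
        "#pragma once",
        "",
    ]
    for err in errors:
        name = err["name"]
        # One struct per error: using an undefined error name in C++
        # code then becomes a compile error.
        lines.append("struct %s" % name)
        lines.append("{")
        lines.append('    static constexpr auto err_code = "%s";' % name)
        lines.append('    static constexpr auto err_msg = "%s";' % err["msg"])
        lines.append("};")
        lines.append("")
    return "\n".join(lines)


# Stand-in for what a YAML parser would return for elog.yaml.
ERRORS = [{"name": "FILE_NOT_FOUND", "msg": "A required file was not found"}]
HEADER = render_header(ERRORS)
```

Writing HEADER out to elog-gen.hpp would be the last step; escaping
of quotes in msg and handling of the meta entries are left out here.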

The event log uses exceptions.  When a user creates an event log, that
call throws an exception.  That exception can be caught and handled by
the calling code, or it will be propagated up into the sdbusplus code,
which will then (hand waving here) send the exception over to the
calling application and turn it back into an exception for that
application to handle.  The idea is that application code looks the
same (try/catch) whether it's calling an internal function or a dbus
interface.

Now for this sprint, I'm working on a story to allow us to define the
errors within the phosphor-dbus-interface repo, but to have the
metadata associated with the error stay in the phosphor-logging repo.
The story is https://github.com/openbmc/openbmc/issues/731

Currently the design is that all errors are defined within the
phosphor-logging/elog.yaml file, but we want to move to a portion of
the error being defined within the phosphor-dbus-interface repo.  The
idea is that you define the errors a dbus application can return when
you're defining its interface, but the metadata (i.e. the data you
collect with the error, like file name info and errno) is defined
within the phosphor-logging repo.  This ends up being a bit of a pain
for a dev because they have to update two repos to add a new error,
but the goal is to have a fairly minimal set of errors defined that
can be shared between applications.  It also feels architecturally
correct: define the errors in the interface, define the data
associated with those errors in the event log.  We foresee the
metadata that can go with errors growing in the future (calling out a
piece of defective hardware is one example).

So here's what I'm thinking:

phosphor-dbus-interface/Common.errors.yaml
- name: FILE_NOT_FOUND

phosphor-logging/elog.yaml
- name: FILE_NOT_FOUND
  msg: "A required file was not found"
  level: INFO
  meta:
      - str: "ERRNUM=0x%.4X"
        type: int
      - str: "FILE_PATH=%s"
        type: const char *
      - str: "FILE_NAME=%s"
        type: const char *

The tool phosphor-logging/elog-gen.py is what will take the .yaml
files, validate that all errors defined in phosphor-dbus-interface
have a corresponding entry in elog.yaml, and then generate the
elog-gen.hpp file.  We want to keep the overlap minimal, so putting
most of the data in the phosphor-logging repo makes the most sense to
me.
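
As a rough illustration of the validation step, the cross-check could
look something like the sketch below.  This is only a sketch under
assumptions: the function name missing_metadata is made up, the inputs
are plain Python lists standing in for the parsed .yaml files, and
FILE_NOT_READABLE is a hypothetical second error added just to show a
failure case.

```python
# Sketch of the cross-check between the interface errors and elog.yaml.
# Names and data shapes here are assumptions, not the real elog-gen.py.

def missing_metadata(interface_errors, elog_entries):
    """Return names of interface errors with no matching elog.yaml entry."""
    defined = {entry["name"] for entry in elog_entries}
    return sorted(
        err["name"] for err in interface_errors if err["name"] not in defined
    )


# Stand-in for parsed phosphor-dbus-interface/Common.errors.yaml.
# FILE_NOT_READABLE is hypothetical and deliberately has no elog entry.
INTERFACE_ERRORS = [{"name": "FILE_NOT_FOUND"}, {"name": "FILE_NOT_READABLE"}]

# Stand-in for parsed phosphor-logging/elog.yaml.
ELOG_ENTRIES = [
    {
        "name": "FILE_NOT_FOUND",
        "msg": "A required file was not found",
        "level": "INFO",
        "meta": [{"str": "ERRNUM=0x%.4X", "type": "int"}],
    }
]

MISSING = missing_metadata(INTERFACE_ERRORS, ELOG_ENTRIES)
```

With this input, MISSING contains only FILE_NOT_READABLE; a real tool
would presumably stop with an error at that point rather than go on to
generate elog-gen.hpp.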

Common.errors.yaml is where we put common errors shared across
multiple dbus applications.  The apps will also have their own
specific yaml files in phosphor-dbus-interface when needed (e.g.
Inventory.errors.yaml).

Thoughts/Comments/Questions?

Andrew