Adding support for custom SEL records

Wed Oct 26 07:37:06 AEDT 2022

On 10/21/2022 2:34 PM, Patrick Williams wrote:
> On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote:
> 
>> I'd also be curious about the reverse question.  Is there any benefit to
>> storing logs on D-Bus that makes it a better solution?
> ...
>> Is there a way we can now get together and define a new logging solution
>> that is fully upstream and avoids the drawbacks of both existing solutions?
> 
> First and foremost I'd like to see consistency come out of this.  If
> there is another proposal for how to do it that we can all consolidate
> on (and people are willing to put in effort to get there) then I'm
> on-board.  It seems to me like the lowest friction way to get there, with
> the best maintainability, is to use the phosphor-logging APIs even if we
> end up not putting them into d-bus entries.
> 
I agree that phosphor-logging seems like the right place, so I think
keeping its API but moving the back-end storage away from D-Bus
objects may be a good direction to consider.

> It happens that phosphor-logging stores the instances on d-bus, but the
> more important aspect to me is that we have a more consistent API for
> defining and creating errors and events.  The "rsyslog-way" is that you
> make very specific journal entries that the rsyslog magic knows about,
> but there are a few issues with it:
> 
>      1. We don't have any consistency in what, when, and how events are
>         logged.  We even have cases within the same repository (looking at
>         dbus-sensors) where some of the implementations make the magic
>         SEL records and others do not.  Additionally, they're not required
>         to be the same format.  Some maintainers have even outright
>         rejected patches with the "magic log statements".
> 
Yes, this consistency would be good.  I tried to add SEL logging into 
phosphor-logging, but the patch didn't make it through review: 
https://gerrit.openbmc.org/c/openbmc/phosphor-logging/+/13956.

>      2. There is no way to generate something like a Redfish message
>         registry for the events, because they're just arbitrary strings
>         that are sprinkled around.  It isn't even easy to programmatically
>         search the code for them because there are 4 different approaches
>         to that: cout/cerr, direct journald, phosphor-logging "v1", and
>         phosphor-logging lg2.
> 
I think Redfish is a more difficult case to handle, but if we can do it 
through the same or similar phosphor-logging API as IPMI, then I'm on board.

As for searching, it's true that different methods are used to get the 
log into the journal, but the Redfish MessageId is consistent in all 
cases and can be programmatically searched.
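
A rough sketch of that kind of search, assuming the MessageId is attached
through a REDFISH_MESSAGE_ID journal field. The sample source lines and the
MessageIds in them are invented for illustration, and the regular expression
only covers the three call shapes shown here.

```python
import re

# Assumption: the MessageId travels with a REDFISH_MESSAGE_ID field, written
# as ("REDFISH_MESSAGE_ID=%s", "<id>"), as a separate key and value
# ("REDFISH_MESSAGE_ID", "<id>"), or inline as "REDFISH_MESSAGE_ID=<id>".
MESSAGE_ID_RE = re.compile(
    r'REDFISH_MESSAGE_ID(?:(?:=%s)?"\s*,\s*"([^"]+)"|=([A-Za-z0-9_.]+))'
)


def find_message_ids(source_text):
    """Return the sorted, de-duplicated MessageIds found in source_text."""
    ids = set()
    for match in MESSAGE_ID_RE.finditer(source_text):
        # Exactly one of the two groups is set, depending on the call shape.
        ids.add(match.group(1) or match.group(2))
    return sorted(ids)


# Hypothetical snippets, one per logging style; the IDs are made up.
SAMPLE = '''
sd_journal_send("MESSAGE=%s", msg, "REDFISH_MESSAGE_ID=%s",
                "OpenBMC.0.1.ExampleA", NULL);
log<level::INFO>("example", entry("REDFISH_MESSAGE_ID=%s", "OpenBMC.0.1.ExampleB"));
lg2::info("example", "REDFISH_MESSAGE_ID", "OpenBMC.0.1.ExampleC");
'''

if __name__ == "__main__":
    for message_id in find_message_ids(SAMPLE):
        print(message_id)
```

The search keys on the field name rather than on the logging call, so it
does not care which of the logging methods wrote the entry.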

>      3. Any kind of automation around these is more at the whim of
>         whatever the developers / maintainers decide to change.  It is,
>         for example, really difficult for me to write data center tooling
>         that reacts to events like "we just lost pgood to the host"
>         because I have to read through the code to find the specific text
>         and hope it never changes.
> 
Doesn't Redfish solve this issue by guaranteeing the same Message and 
MessageId are used for all the same events?

> Conversely, the phosphor-logging APIs leverage YAML-based error specifiers,
> which can be easily transposed into a Redfish message registry, and happen
> to also be the same structure we use for inter-process errors on d-bus calls.
> While I have to review the implementations to make sure they're
> appropriately created, I have far less concern about them disappearing
> or changing once they are in place (and I can review the changes to the YAML
> specifiers to keep tabs on what changes there might be).
>
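
To make the "transposed into a Redfish message registry" idea concrete, here
is a minimal sketch. It assumes the error specifiers have already been parsed
from YAML into a list of name/description pairs. The error names, the
"Example" prefix, the default severity, and the to_registry() helper are all
hypothetical, and the output is only a subset of the fields a full registry
would carry.

```python
import json

# Hypothetical error specifiers, shaped like the parsed contents of a YAML
# errors file (a list of name/description pairs). The names are invented.
ERRORS = [
    {"name": "ExamplePowerGoodLost", "description": "Host power good was lost."},
    {"name": "ExampleTimeout", "description": "The operation timed out."},
]


def to_registry(errors, prefix, version):
    """Build a minimal Redfish-style message registry from the specifiers."""
    messages = {}
    for err in errors:
        messages[err["name"]] = {
            "Description": err["description"],
            "Message": err["description"],
            "NumberOfArgs": 0,
            "Resolution": "None.",
            # Assumption: the specifier carries no severity, so default it.
            "MessageSeverity": "Warning",
        }
    return {
        "RegistryPrefix": prefix,
        "RegistryVersion": version,
        "Messages": messages,
    }


registry = to_registry(ERRORS, "Example", "0.1.0")

if __name__ == "__main__":
    print(json.dumps(registry, indent=2))
```

Because each registry key comes straight from the specifier name, a change to
the YAML shows up as a change to the generated registry, which is what makes
the specifiers reviewable in one place.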