Link phosphor-hostlogger and bmcweb

Nan Zhou nanzhou at google.com
Wed May 26 16:51:44 AEST 2021


> > 1. log_buffer.xpp:
> > will be removed; we propose to persist a chunk into the log file as soon
> as
> > we receive it; newline will be logged as is, so logs are still splitted
> by
> > newlines;
> If you remove the newline handler we will lose the timestamps added for
> each
> line.

Good catch. If timestamps are necessary (it is necessary for the Redfish
LogEntry), I think we can keep the current newline detection logic in
log_buffer.xpp. Once a new line is detected, we append the current
timestamp first, and then append the bytes chunk.

> we can potentially set the maximum length of a log as well (force
> > split a long line into smaller lines)
> I would leave the host output "as is". I don't think we should modify it.

OK. We have concerns about edge cases where a log line is too long.

> 2. host_console.xpp: stay unchanged;
> >
> > 3. zlib_file.xpp, zlib_exception.xpp:
> > will be removed or slightly changed; we can potentially use the linux
> > logrotate which has built-in compression and file rotation (in this case
> > these compression utilities will be removed).
> > The latest log file isn't compressed any more. History log files are
> still
> > compressed.
> Just curious, how are you going to remove the oldest messages from the
> latest
> file in runtime? You are not going to rewrite the entire file on every
> input
> character, are you?

The following is my current idea: we will rename the latest file to
something else and notify the writer (hostlogger) to close its old file
descriptor and open a new one (should be doable via linux logrotate and
inotify or some signal handlers, as logrotate is able to send some signals
to hostlogger if a rotation is performed). The writer keeps appending logs
most of the time using the same fd unless the latest file is rotated. This
should be better than truncating the file where the reader (BMCWeb) won't
have race conditions (it might read old snapshots but it is not a big deal
in our case).

> We need to implement some codes to deal with the race condition in log
> > rotations (we should reopen the writing fd after the latest log file is
> > renamed; inotify might do the trick).
> > We might also need to rotate according to host boot cycles (like what
> > postcodes are doing right now).
> >
> > 3. dbus_loop.xpp: stay unchanged;
> >
> > 4. service.xpp:
> > will be slightly changed; socket IO callback and host state watcher are
> > kept; manual flash or flash upon service restart will be removed
> The manual flush is used in the dreport, please don't forget to remove that
> call there.

Thanks for the information.

 > 5. config.xpp:
> > will be slightly changed; options for flash policy will be removed;
> Not slightly =)
> Most options are related to the buffer and flush policies.
> As I can see, there will stay the socket Id and output directory only.

Yes. I would assume the host state policy will stay as well.

How do you like the proposal now? I hope we make it more clear. Do you
think it is a good direction that we can go with?

On Tue, May 25, 2021 at 11:11 PM Artem Senichev <artemsen at gmail.com> wrote:

> On Tue, May 25, 2021 at 10:29:48PM -0700, Nan Zhou wrote:
> > Hi Artem,
> >
> > I listed the potential changes below (to the best of my understand of
> > phosphor-hostlogger),
> >
> > 1. log_buffer.xpp:
> > will be removed; we propose to persist a chunk into the log file as soon
> as
> > we receive it; newline will be logged as is, so logs are still splitted
> by
> > newlines;
>
> If you remove the newline handler we will lose the timestamps added for
> each
> line.
>
> > we can potentially set the maximum length of a log as well (force
> > split a long line into smaller lines)
>
> I would leave the host output "as is". I don't think we should modify it.
>
> > 2. host_console.xpp: stay unchanged;
> >
> > 3. zlib_file.xpp, zlib_exception.xpp:
> > will be removed or slightly changed; we can potentially use the linux
> > logrotate which has built-in compression and file rotation (in this case
> > these compression utilities will be removed).
> > The latest log file isn't compressed any more. History log files are
> still
> > compressed.
>
> Just curious, how are you going to remove the oldest messages from the
> latest
> file in runtime? You are not going to rewrite the entire file on every
> input
> character, are you?
>
> > We need to implement some codes to deal with the race condition in log
> > rotations (we should reopen the writing fd after the latest log file is
> > renamed; inotify might do the trick).
> > We might also need to rotate according to host boot cycles (like what
> > postcodes are doing right now).
> >
> > 3. dbus_loop.xpp: stay unchanged;
> >
> > 4. service.xpp:
> > will be slightly changed; socket IO callback and host state watcher are
> > kept; manual flash or flash upon service restart will be removed
>
> The manual flush is used in the dreport, please don't forget to remove that
> call there.
>
> > 5. config.xpp:
> > will be slightly changed; options for flash policy will be removed;
>
> Not slightly =)
> Most options are related to the buffer and flush policies.
> As I can see, there will stay the socket Id and output directory only.
>
> > We might split the implementation into several phases. We might not jump
> to
> > the final version at one iteration. But the above are what we eventually
> > want at this moment.
> >
> > On Tue, May 25, 2021 at 8:03 AM Nan Zhou <nanzhou at google.com> wrote:
> >
> > > "we propose to remove the ring buffer and write to the log file as soon
> > >> as some characters are received. This implicitly removes the needs of
> > >> the ring buffer, and the persistence triggering (host reboot, sigterm,
> > >> etc) in hostlogger"
> > >> I would like to get a more detailed description of further changes in
> > >> order
> > >> to see the whole picture of the solution.
> > >
> > >  Originally I didn't consider including changing the log file
> according to
> > > the boot cycle; we will keep part of the dbus/signal watcher to make
> that
> > > different reboot posts to a different log file.
> > >
> > > We will work out some more detailed descriptions for future changes
> soon.
> > >
> > > On Mon, May 24, 2021 at 11:41 PM Artem Senichev <artemsen at gmail.com>
> > > wrote:
> > >
> > >> Sorry guys, maybe this is a misunderstanding on my part.
> > >>
> > >> I was confused with the following line in the proposal:
> > >>
> > >> "we propose to remove the ring buffer and write to the log file as
> soon
> > >> as some characters are received. This implicitly removes the needs of
> > >> the ring buffer, and the persistence triggering (host reboot, sigterm,
> > >> etc) in hostlogger"
> > >>
> > >> I would like to get a more detailed description of further changes in
> > >> order
> > >> to see the whole picture of the solution.
> > >>
> > >> --
> > >> Regards,
> > >> Artem Senichev
> > >> Software Engineer, YADRO.
> > >>
> > >>
> > >> On Mon, May 24, 2021 at 09:27:39AM -0700, Ed Tanous wrote:
> > >> > On Mon, May 24, 2021 at 12:52 AM Artem Senichev <artemsen at gmail.com
> >
> > >> wrote:
> > >> > >
> > >> > > I'll try to convey the main idea that we tried to implement in
> this
> > >> service.
> > >> > >
> > >> > > Hostlogger was originally designed to work with OpenPOWER systems,
> > >> which
> > >> > > generate a very detailed log at boot time.
> > >> >
> > >> > There are definitely other non OpenPOWER systems that have this same
> > >> behavior.
> > >> >
> > >> > > It is important to save these logs and the host console output
> just
> > >> before
> > >> > > rebooting for further investigation of incidents when hardware
> errors
> > >> occur.
> > >> > > So, we have two log files for each server session (boot log +
> last OS
> > >> messages).
> > >> > > That's why we need a D-bus watcher.
> > >> > > BMC flash has around 3MiB of free RW space, this force us to use
> > >> compression
> > >> > > and file rotation.
> > >> > >
> > >> > > All of these features are unnecessary for "streaming" real-time
> log
> > >> recording.
> > >> >
> > >> > I disagree with you there.  Rotation and compression are still
> useful
> > >> > in a "streaming" case.  Because of the ways the APIs are defined,
> > >> > LogService in redfish provides both a "historical" version of past
> > >> > logs.  It's useful to have those logs rotated and compressed.
> > >> >
> > >> > > You don't need DBus watchers, rotation can be done with native
> Linux
> > >> utilities,
> > >> > > you don't even need to split the input stream into lines.
> > >> >
> > >> > I'm not following why those now wouldn't be needed.  Redfish
> LogEntry
> > >> > would separate per line, so we'd have to do the splitting somewhere.
> > >> > There's already code to do that in hostlogger.  Wouldn't you still
> > >> > want to separate log per boot, and have lines split between log
> files?
> > >> >  I'm not following why those would go away just because there's a
> > >> > desire to poll for logs and get up to date information.
> > >> >
> > >> > > Just redirect obmc-console.log: `tail -f
> /var/log/obmc-console.log >
> > >> my.log`.
> > >> >
> > >> > This doesn't solve the problem presented.  First of all, log
> rotation
> > >> > and compression are still needed.  Also, it's desirable to have dbus
> > >> > watchers and separate the logs per boot, such that they can end up
> > >> > separated in the Redfish API.  Also, in what you presented, my.log
> > >> > would quickly and easily overflow the available space, as there's no
> > >> > log rotation.
> > >> >
> > >> > >
> > >> > > I understand your desire to add a new mode to the existing project
> > >> instead of
> > >> > > creating a new one. But there is very little in common between
> these
> > >> modes.
> > >> >
> > >> > I'm not following how they're all that different, see above about
> > >> > needing very similar features.  For the sake of argument, lets say
> we
> > >> > went with a totally different implementation, would it be able to
> live
> > >> > in the hostlogger repo to be able to reuse code where needed?
> There's
> > >> > a lot of code that I suspect will be identical.
> > >> >
> > >> > > Even reading the socket will have to be done separately, since it
> is
> > >> buffered
> > >> > > for line splitting in the current implementation.
> > >> > > In the end, only bb-recipe and the `main` function will remain in
> the
> > >> common.
> > >> > >
> > >> > > --
> > >> > > Regards,
> > >> > > Artem Senichev
> > >> > > Software Engineer, YADRO.
> > >> > >
> > >> > > On Fri, May 21, 2021 at 10:51:45AM -0700, Nan Zhou wrote:
> > >> > >
> > >> > > > >
> > >> > > > > >
> > >> > > > > > > we propose to remove the ring buffer and write to the log
> file
> > >> > > > > > > as soon as some characters are received.
> > >> > > > > >
> > >> > > > > > I am not sure it is a good idea.
> > >> > > > > > The host can generate a lot of logs, but we have very
> limited
> > >> free space.
> > >> > > > > This is a fair concern, but wouldn't the rollover limits make
> > >> this not
> > >> > > > > an issue?  They seem like they could be easily configured.
> > >> > > >
> > >> > > > Right. Logrotate will be able to handle the rotation. Maximum
> size,
> > >> # log
> > >> > > > files, and compression can be easily configured.
> > >> > > >
> > >> > > > > In
> > >> > > > > > addition, such heavy traffic will lead to a quick breakdown
> of
> > >> the flash
> > >> > > > > (most
> > >> > > > > > available products are guaranteed to withstand around
> 100,000
> > >> P/E cycles
> > >> > > > > only).
> > >> > > > > JFFS2 is wear leveled, and there are other BMC stacks that I
> know
> > >> of
> > >> > > > > that implement this without any ill effects to flash
> longevity,
> > >> with
> > >> > > > > that said, if Nan made the "last log on disk" feature
> > >> configurable,
> > >> > > > > would that alleviate your concerns?
> > >> > > >
> > >> > > > We also noticed that the obmc-server itself will buffer the log
> a
> > >> bit. Will
> > >> > > > it still be a problem if we don't write a character at once but
> a
> > >> block of
> > >> > > > them?
> > >> > > > And as Ed said, we can also make this feature configurable. I
> would
> > >> imagine
> > >> > > > the log buffer will remain if the "last log on disk" feature is
> > >> disabled.
> > >> > > >
> > >> > > >
> > >> > > > > >
> > >> > > > > > > This implicitly removes the needs
> > >> > > > > > > of the ring buffer, and the persistence triggering (host
> > >> reboot,
> > >> > > > > sigterm,
> > >> > > > > > > etc) in hostlogger. We believe this keeps the same
> > >> functionality but
> > >> > > > > saves
> > >> > > > > > > hundreds of lines of codes in phosphor-hostlogger.
> > >> > > > > Difference of opinion here, I don't think this removes the
> need
> > >> for
> > >> > > > > the host reboot event;  Having each reboot post to a
> different log
> > >> > > > > needs to be maintained, and I have to imagine that there's
> some
> > >> sort
> > >> > > > > of sigterm handler still, although it becomes a lot smaller.
> > >> > > >
> > >> > > >
> > >> > > > >
> > >> > > > > > You are suggesting to delete the buffer, DBus watcher, log
> > >> rotate. How
> > >> > > > > are you
> > >> > > > > > going to keep the same functionality if you remove
> everything
> > >> related to
> > >> > > > > it?
> > >> > > > > +1.  In the initial thought I didn't think we were removing
> any
> > >> > > > > functionality with this.  I had assumed the dbus watcher would
> > >> remain,
> > >> > > > > and we would still have the log rotation behavior.  In reading
> > >> through
> > >> > > > > Nans proposal I don't think these are getting removed;  Maybe
> I
> > >> > > > > misunderstood?
> > >> > > >
> > >> > > >
> > >> > > > Yes, if we want to keep different reboot posts to a different
> log
> > >> file, we
> > >> > > > can keep part of the dbus/signal watcher.
> > >> > > >
> > >> > > > On Fri, May 21, 2021 at 10:24 AM Ed Tanous <edtanous at google.com
> >
> > >> wrote:
> > >> > > >
> > >> > > > > On Thu, May 20, 2021 at 11:10 PM Artem Senichev <
> > >> artemsen at gmail.com>
> > >> > > > > wrote:
> > >> > > > > >
> > >> > > > > > On Thu, May 20, 2021 at 04:29:09PM -0700, Nan Zhou wrote:
> > >> > > > > > > Hi all,
> > >> > > > > > >
> > >> > > > > > > In the previous thread (
> > >> > > > > > >
> > >> https://lists.ozlabs.org/pipermail/openbmc/2021-March/025234.html),
> we
> > >> > > > > > > (engineers from Google and Quanta) discussed our attempt
> to
> > >> share host
> > >> > > > > > > serial logs via Redfish, which includes polling logs via
> > >> LogService and
> > >> > > > > > > streaming log bytes via EventService (e.g. SSE). We went
> with
> > >> the
> > >> > > > > event log
> > >> > > > > > > architecture
> > >> > > > > > > <
> > >> > > > >
> > >>
> https://github.com/openbmc/docs/blob/master/architecture/redfish-logging-in-bmcweb.md
> > >> > > > > >
> > >> > > > > > > and did the implementation.
> > >> > > > > > >
> > >> > > > > > > We still want to reuse the phosphor-hostlogger and do some
> > >> > > > > modification. We
> > >> > > > > > > believe it is better to try to reuse existing codes if
> > >> possible and
> > >> > > > > improve
> > >> > > > > > > them rather than creating new things that have similar
> > >> functionalities
> > >> > > > > (in
> > >> > > > > > > our case, it is a daemon that could collect logs and
> persist
> > >> them).
> > >> > > > > >
> > >> > > > > > I agree, reusing code is a right choice, but only when it is
> > >> really
> > >> > > > > possible.
> > >> > > > > > For now it looks like you want to remove most of the
> Hostlogger
> > >> features
> > >> > > > > to
> > >> > > > > > transform it from buffer-like to stream-like service.
> > >> > > > >
> > >> > > > > I'm not quite following this statement.  Which features are
> > >> getting
> > >> > > > > removed?  From what I can see, he's suggesting making
> > >> > > > > phosphor-hostlogger look more like a well-behaved linux
> logging
> > >> > > > > daemon, but I don't think any features got omitted (or I'm
> missing
> > >> > > > > something critical).
> > >> > > > >
> > >> > > > > >
> > >> > > > > > > We want to do the following modification in
> > >> phosphor-hostlogger (from
> > >> > > > > the
> > >> > > > > > > input and output point of view, just very few things will
> be
> > >> changed)
> > >> > > > > > >
> > >> > > > > > > 1. One limitation of phosphor-hostlogger is that it will
> lose
> > >> data in
> > >> > > > > the
> > >> > > > > > > memory (the ring buffer maintained by hostlogger) when BMC
> > >> gets force
> > >> > > > > > > restarted;
> > >> > > > > >
> > >> > > > > > When BMC goes to reboot it stops all services, at that
> moment
> > >> hostlogger
> > >> > > > > gets
> > >> > > > > > a signal and flushes all gathered logs to a file.
> > >> > > > >
> > >> > > > > Only if the reboot is planned.  If the BMC loses power (which
> is
> > >> > > > > "normal" for a bmc) there isn't time to persist to flash
> before
> > >> the
> > >> > > > > power goes down and the logs are most likely lost.
> > >> > > > >
> > >> > > > > >
> > >> > > > > > > we propose to remove the ring buffer and write to the log
> file
> > >> > > > > > > as soon as some characters are received.
> > >> > > > > >
> > >> > > > > > I am not sure it is a good idea.
> > >> > > > > > The host can generate a lot of logs, but we have very
> limited
> > >> free space.
> > >> > > > >
> > >> > > > > This is a fair concern, but wouldn't the rollover limits make
> > >> this not
> > >> > > > > an issue?  They seem like they could be easily configured.
> > >> > > > >
> > >> > > > > > In
> > >> > > > > > addition, such heavy traffic will lead to a quick breakdown
> of
> > >> the flash
> > >> > > > > (most
> > >> > > > > > available products are guaranteed to withstand around
> 100,000
> > >> P/E cycles
> > >> > > > > only).
> > >> > > > >
> > >> > > > > JFFS2 is wear leveled, and there are other BMC stacks that I
> know
> > >> of
> > >> > > > > that implement this without any ill effects to flash
> longevity,
> > >> with
> > >> > > > > that said, if Nan made the "last log on disk" feature
> > >> configurable,
> > >> > > > > would that alleviate your concerns?
> > >> > > > >
> > >> > > > > >
> > >> > > > > > > This implicitly removes the needs
> > >> > > > > > > of the ring buffer, and the persistence triggering (host
> > >> reboot,
> > >> > > > > sigterm,
> > >> > > > > > > etc) in hostlogger. We believe this keeps the same
> > >> functionality but
> > >> > > > > saves
> > >> > > > > > > hundreds of lines of codes in phosphor-hostlogger.
> > >> > > > >
> > >> > > > > Difference of opinion here, I don't think this removes the
> need
> > >> for
> > >> > > > > the host reboot event;  Having each reboot post to a
> different log
> > >> > > > > needs to be maintained, and I have to imagine that there's
> some
> > >> sort
> > >> > > > > of sigterm handler still, although it becomes a lot smaller.
> > >> > > > >
> > >> > > > > >
> > >> > > > > > You are suggesting to delete the buffer, DBus watcher, log
> > >> rotate. How
> > >> > > > > are you
> > >> > > > > > going to keep the same functionality if you remove
> everything
> > >> related to
> > >> > > > > it?
> > >> > > > >
> > >> > > > > +1.  In the initial thought I didn't think we were removing
> any
> > >> > > > > functionality with this.  I had assumed the dbus watcher would
> > >> remain,
> > >> > > > > and we would still have the log rotation behavior.  In reading
> > >> through
> > >> > > > > Nans proposal I don't think these are getting removed;  Maybe
> I
> > >> > > > > misunderstood?
> > >> > > > >
> > >> > > > > >
> > >> > > > > > > 2. We propose not to compress the latest log file. This
> saves
> > >> us the
> > >> > > > > > > overhead of doing decompression when BMCWeb just needs to
> > >> retrieve the
> > >> > > > > most
> > >> > > > > > > recent logs. There are still going to be log rotations in
> the
> > >> file
> > >> > > > > level.
> > >> > > > > > > Files other than the latest log file are still going to be
> > >> compressed.
> > >> > > > > We
> > >> > > > > > > can modify existing codes to achieve this or use the linux
> > >> logrotate
> > >> > > > > > > directly.
> > >> > > > > > >
> > >> > > > > > > Furthermore, we will add host serial logs into BMCWeb,
> both
> > >> LogService
> > >> > > > > and
> > >> > > > > > > EventService. In LogService, we will teach BMCWeb how to
> read
> > >> the
> > >> > > > > latest
> > >> > > > > > > log file that is not compressed and the other compressed
> old
> > >> logs, and
> > >> > > > > how
> > >> > > > > > > to assemble LogEntries out of raw serial logs. We will
> discuss
> > >> > > > > EventService
> > >> > > > > > > in future threads but the very initial idea is to setup
> > >> inotify on log
> > >> > > > > > > files and SSE to stream out new bytes to clients (like
> what
> > >> the
> > >> > > > > existing
> > >> > > > > > > event logging is doing).
> > >> > > > > > >
> > >> > > > > > > As we said above, for phosphor-hostlogger, the input is
> still
> > >> the
> > >> > > > > > > obmc-server unix socket, and the output are still
> persisted
> > >> log files.
> > >> > > > > But
> > >> > > > > > > the functionality will get improved (less data loss), code
> > >> gets
> > >> > > > > simplified
> > >> > > > > > > (no ring buffer and persistence triggering), and it will
> > >> become easier
> > >> > > > > and
> > >> > > > > > > more performant to get BMCWeb hooked up for log streaming
> via
> > >> Redfish.
> > >> > > > > > >
> > >> > > > > > > Please let us know what you think. We appreciate any
> > >> feedback. Thank
> > >> > > > > you
> > >> > > > > > > very much!
> > >> > > > > > >
> > >> > > > > > > Sincerely,
> > >> > > > > > > Nan
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > > Regards,
> > >> > > > > > Artem Senichev
> > >> > > > > > Software Engineer, YADRO.
> > >> > > > >
> > >>
> > >
>
> --
> Regards,
> Artem Senichev
> Software Engineer, YADRO.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ozlabs.org/pipermail/openbmc/attachments/20210525/2cd581ab/attachment-0001.htm>


More information about the openbmc mailing list