[PATCH] nvram buffering/error logging port to 2.6

Tue Dec 2 10:19:06 EST 2003

> 1. Machine boots, we execute event-scans but don't request error log
> entries.
> 2. When the rtas proc file is opened we then start requesting error log
> information.
>
> Is this less reliable?

In a little more detail:
1.) On boot, what was in NVRAM is stored into memory.
2.) Event-scans will start pulling error logs, and if there is an
error-log entry from rtas, that data overwrites what was in NVRAM.
3.) rtas_errd will pull from /proc/ppc64/rtas/error_log
4.) When the data is stored on disk, rtas_errd will go and read from
error_log again and this signals that it is safe to clear NVRAM of the
event that was stored.

So it is possible to lose the event stored from the last boot if, on the
current boot, the system goes down between the first event-scan
(assuming it returns a new event-log entry) and the first time
rtas_errd runs.
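
To make the window concrete, here is a toy model of steps 1 to 3 in
plain C. It is only a sketch: the struct, the function names and the
64-byte slot are made up for illustration and are not the code in the
patch, and step 4 (clearing NVRAM after the second read) is left out.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SLOT 64  /* hypothetical slot size for this toy model */

struct machine {
    char nvram[SLOT];  /* single buffered event; survives a crash */
    char ram[SLOT];    /* boot-time copy of NVRAM; lost on a crash */
    char disk[SLOT];   /* last event rtas_errd wrote to disk */
};

/* Step 1: on boot, copy whatever NVRAM holds into memory. */
void boot(struct machine *m)
{
    memcpy(m->ram, m->nvram, SLOT);
}

/* Step 2: an event-scan returned a new log; it overwrites NVRAM. */
void event_scan(struct machine *m, const char *log)
{
    snprintf(m->nvram, SLOT, "%s", log);
}

/* Step 3: rtas_errd reads the previous boot's event and stores it on disk. */
void errd_persist(struct machine *m)
{
    memcpy(m->disk, m->ram, SLOT);
}

/* The system goes down: memory is gone, NVRAM and disk remain. */
void crash(struct machine *m)
{
    memset(m->ram, 0, SLOT);
}

/* True if the event can still be recovered from NVRAM or disk. */
bool survives(const struct machine *m, const char *log)
{
    return strcmp(m->nvram, log) == 0 || strcmp(m->disk, log) == 0;
}
```

In this model, the sequence boot(), event_scan() with a new log, crash()
loses the old event, while boot(), errd_persist(), event_scan(), crash()
keeps it.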

I do not feel that this is a big hole, but it could be closed by not
starting event-scans until rtas_errd has started.  That does not seem
smart, though: if rtas_errd is not installed on the system, we will get
a surveillance timeout.

> Well, we already have a window between where we do
> the event scan and when we write the information to NVRAM. I'm guessing
> writing NVRAM isn't fast, we could easily lose or get corrupted event
> scan data if the machine locked up in this window.

There is nothing that can be done about this.

> NVRAM is a limited resource, how do we avoid overflowing it during boot?

The OS is guaranteed 1K of NVRAM per partition.  If for some reason we
do not have the space we should not do the NVRAM buffering of the events
coming in.
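
Roughly, the check could look like the sketch below. The names and the
struct are hypothetical and this is not the code in the patch; the only
number taken from above is the 1K guarantee.

```c
#include <stdbool.h>
#include <stddef.h>

#define ERRLOG_NVRAM_MIN 1024  /* the 1K guaranteed per partition */

struct errlog_nvram {
    size_t size;     /* bytes actually available to us */
    bool   enabled;  /* false: skip NVRAM buffering entirely */
};

/* Decide once, at init time, whether events get buffered in NVRAM. */
void errlog_nvram_init(struct errlog_nvram *p, size_t avail)
{
    p->size = avail;
    p->enabled = avail >= ERRLOG_NVRAM_MIN;
}

/* Returns the number of bytes that would be buffered, 0 if skipped. */
size_t errlog_nvram_store(const struct errlog_nvram *p, size_t len)
{
    if (!p->enabled || len > p->size)
        return 0;    /* no space: events still flow, just unbuffered */
    return len;
}
```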

> Could we lose error log information if we end up with a bunch of
> event-scan error logs?

Yes, it is possible if we are over 64 error-logs and rtas_errd is not
processing them fast enough.  The most I have ever seen come in at once
is 3.  If 64 come in at one time, then something is severely broken.
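
As an illustration of how the overflow loses data, here is a minimal
64-slot ring buffer. It is a sketch under assumptions: the names are
made up, an int stands in for an error-log record, and I assume the
oldest unread entry is the one overwritten, which the real code may or
may not do.

```c
#include <stddef.h>

#define LOG_SLOTS 64  /* the limit mentioned above */

struct log_ring {
    int    entries[LOG_SLOTS];  /* stand-in for error-log records */
    size_t head;                /* next slot to write */
    size_t count;               /* unread entries */
    size_t dropped;             /* entries overwritten before being read */
};

/* Producer side: an event-scan queues a new error log. */
void ring_push(struct log_ring *r, int id)
{
    r->entries[r->head] = id;
    r->head = (r->head + 1) % LOG_SLOTS;
    if (r->count == LOG_SLOTS)
        r->dropped++;           /* oldest unread entry was overwritten */
    else
        r->count++;
}

/* Consumer side: returns 1 and the oldest unread entry, or 0 if empty. */
int ring_pop(struct log_ring *r, int *id)
{
    if (r->count == 0)
        return 0;
    size_t tail = (r->head + LOG_SLOTS - r->count) % LOG_SLOTS;
    *id = r->entries[tail];
    r->count--;
    return 1;
}
```

With a zeroed ring, pushing 65 entries without any pop in between drops
exactly one, and the first pop then returns the second entry pushed.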

> The real way to fix this window is to have a better interface to the error
> log information (ie a read error log RTAS call and a discard error log
> RTAS call, you call discard error log once you have successfully
> committed the error log to disk).

I'm not clear.  How is this different than what is currently there?  Do
you mean storing every single error-log in NVRAM until it is on disk?
Currently we only store 1 error log because we are only guaranteed that
much space in NVRAM (i.e. we could lose that NVRAM space on the next
boot, which would nullify the buffering of error-logs in NVRAM).  So
what is stored in NVRAM is the last fatal error-log received or, if
there was no fatal one, just the last error-log received.
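
Spelled out as code, the rule for the single slot is roughly the
following. The struct and function name are hypothetical, for
illustration only; logs are assumed to be in arrival order.

```c
#include <stdbool.h>
#include <stddef.h>

struct err_log {
    int  seq;    /* arrival order */
    bool fatal;  /* true for a fatal error-log */
};

/*
 * Returns the entry to keep in the single NVRAM slot: the last fatal
 * log if any was received, otherwise the last log, or NULL if none.
 */
const struct err_log *pick_nvram_entry(const struct err_log *logs, size_t n)
{
    const struct err_log *pick = NULL;

    for (size_t i = 0; i < n; i++) {
        /* A non-fatal log never displaces a fatal one already held. */
        if (logs[i].fatal || pick == NULL || !pick->fatal)
            pick = &logs[i];
    }
    return pick;
}
```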

Thanks,
Jake

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/