[Skiboot] [RFC PATCH 10/10] console: add log scraping OPAL call

Fri Jul 7 15:58:32 AEST 2017

Oliver <oohall at gmail.com> writes:
> On Wed, Jul 5, 2017 at 4:06 PM, Stewart Smith
> <stewart at linux.vnet.ibm.com> wrote:
>> Oliver O'Halloran <oohall at gmail.com> writes:
>>> Adds an OPAL call that parses the OPAL log, finds important messages
>>> (log_level below PR_WARN) and copies them up to the OS. This is intended
>>> to make firmware error messages more visible to system administrators
>>> since it allows the OS to report them in the usual places rather than
>>> an OPAL-specific log file (e.g. /sys/firmware/opal/msglog under Linux).
>>>
>>> Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
>>
>>
>> So, I've had a renewed bit of thinking on what we should do here, and
>> something we could do without too much effort to make decent
>> (and progressive) improvements to getting messages out to users.
>>
>> I'm thinking that to start with, we may want 2-3 additional bits of info
>> being passed to Linux:
>> 1) is this a message from current boot or previous
>>    The logic being that we may be able to go "and here's why you died"
>>    on subsequent boot?
>>    (I'm not sure on this...)
>
> Maybe, systemd already handles keeping track of error messages across
> boots (and kexecs) so I'm not sure this is the best place in the stack
> to handle that.

I was thinking more of recovery from checkstops and similar, where
Linux never saw what killed it, but we can maybe help work it out in
firmware.

I'm not quite convinced it's the best way to do things. Thoughts?

>
>> 2) arbitrary other bit of info (extendable). i.e. we could point to the
>>    phandle of the thing that died, the error log id (or some
>>    replacement), but the idea being we've set up the call from day 1 to
>>    support that.
>
> Getting an error log ID if the error generated one might be useful,
> but I don't see what we can do with something like a phandle. Even if
> we could localise an error to a phandle we have no idea how that error
> should be handled and realistically the driver that did the OPAL call
> that caused the error should have handled it.

Re: phandle, my only thought was that Linux could use it to map to
something that would make sense from a Linux PoV rather than what makes
sense from an OPAL PoV.

But it was all kind of a handwave to go "something useful that we
haven't really decided on yet".
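
To make the two ideas above a bit more concrete, here is a rough sketch in C
of what a per-message record could carry: a flag for "previous boot" and a
small list of tagged extras (error log ID, phandle) that a consumer can skip
if it doesn't recognise the tag. Every name, value and size below is
hypothetical; none of it is taken from skiboot or from the patch, and the
level numbering simply assumes that lower means more severe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical severity levels; lower value = more severe. */
enum { LVL_EMERG, LVL_ALERT, LVL_CRIT, LVL_ERR,
       LVL_WARNING, LVL_NOTICE, LVL_INFO, LVL_DEBUG };

/* Hypothetical flag: message was logged during the previous boot. */
#define SCRAPE_F_PREV_BOOT 0x1

/* Hypothetical tags for the extensible part of the record. */
enum { TAG_ERRLOG_ID = 1, TAG_PHANDLE = 2 };

#define SCRAPE_MAX_EXT 4

struct scrape_ext {
	uint32_t tag;
	uint32_t value;
};

struct scrape_record {
	uint8_t level;
	uint8_t flags;
	uint16_t n_ext;
	struct scrape_ext ext[SCRAPE_MAX_EXT];
	char msg[80];
};

/* Only messages more severe than a warning are passed up to the OS. */
bool scrape_wanted(uint8_t level)
{
	return level < LVL_WARNING;
}

/* Attach one tagged value; fails once the fixed-size table is full. */
bool scrape_add_ext(struct scrape_record *r, uint32_t tag, uint32_t value)
{
	assert(r != NULL);
	if (r->n_ext >= SCRAPE_MAX_EXT)
		return false;
	r->ext[r->n_ext].tag = tag;
	r->ext[r->n_ext].value = value;
	r->n_ext++;
	return true;
}

/* Look up a tag; entries with unknown tags are simply never matched. */
bool scrape_find_ext(const struct scrape_record *r, uint32_t tag,
		     uint32_t *value)
{
	assert(r != NULL && value != NULL);
	for (size_t i = 0; i < r->n_ext; i++) {
		if (r->ext[i].tag == tag) {
			*value = r->ext[i].value;
			return true;
		}
	}
	return false;
}
```

The point of the tag/value list is only that new kinds of info could be
added later without changing the layout an older kernel already understands.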

-- 
Stewart Smith
OPAL Architect, IBM.