Exposing POST codes
Rob Lippert <rlippert at google.com>
Wed Mar  7 12:05:11 AEDT 2018
  
On Mon, Mar 5, 2018 at 5:05 PM, Stewart Smith <stewart at linux.vnet.ibm.com>
wrote:
> "Tanous, Ed" <ed.tanous at intel.com> writes:
> > Does POWER have any way of reporting detailed boot progress?  If, for
> > example, the USB link training starts processor init flows, is that
> > logged in a POWER system?  On x86, it would be logged as a POST code.
>
> On POWER (currently at least) there's a few things in play.
>
> On OpenPOWER systems the only thing we currently actively communicate to
> the BMC is the IPMI FW progress sensor, which isn't especially fine
> grained, but it's what we have hooked up.
>
> We do print out more detailed progress information to the console
> though. What we print out to the console is roughly in two categories:
> a) ISTEPs (probably the closest thing we have to POST codes, in that
>    they're numbers), but these also have names because text is more
>    descriptive than numbers.
> b) log messages from OPAL (words, mostly around what we've probed/are
>    initing)
>
> One thing to note about the istep numbers is that they can go
> *backwards* if our firmware needs to do a reconfigure loop (e.g. we're
> after a firmware update and needing to flash a seeprom inside the chip,
> or we've discovered a problem with one of the cores and we're going to
> disable it).
>
> On the more enterprise-y POWER systems, there's SRC codes, which
> are a set of incomprehensible hexadecimal numbers in a seemingly random
> order designed to a) fit on a tiny LCD screen on the front of the
> machine and b) not be strings that would have to be translated.
> (I *always* have to google them, and even then, I don't think it helps)
>
> If there's a problem during boot, we'd generally look at the console
> output.... unless boot failure is *really* *REALLY* early, in which case
> it's before we have any communications channel to the BMC open (and you
> have to go and poke at the chip through one of the debug
> interfaces... although we would like to improve this situation)
>
> >> It just seems like the first thing anyone is going to do with these
> >> numbers is look them up and map them to something.  Wouldn’t it make
> >> sense to have done that mapping already at the API level so that every
> >> user and piece of code using this API doesn’t have to do it themselves?
> > That seems like a reasonable assumption, but in practice it isn't
> > always an option.  In general the POST code mappings are difficult to
> > come by, especially in initial system bringup, and that is when they
> > are most valuable.  If attempted, the ability to provide a mapping
> > should be made optional, which means the proposed interface still needs
> > to exist.
>
> What if it was a "number and/or string" kind of interface? Would that
> work? On x86 if you only have the method of getting a number out, you
> could just have the numbers (unless you have a mapping somewhere), but on
> POWER we could hook this up to get a number and/or string from firmware.
>
> >> I think what I’m hinting at here is that you could add a per-platform
> >> config file to your app that maps the codes to some enumerations in the
> >> DBus interface, and apply that mapping before you emit the signal. If
> >> you wanted to go back to numbers later you could just reverse the
> >> mapping using the same config file.  Please poke holes.
> >
> > I would argue that this functionality is outside the scope of Patrick's
> > patch.  We could very clearly do as you're suggesting, but it would be
> > error prone, and make per-platform configuration more difficult to
> > port, and would likely take a number of months to get correct for all
> > platforms.  As is, Patrick's patch adds value outside of his direct
> > platform, as other teams would have an immediate use of it, and is
> > very clear and clean to implement.  Building the platform configurable
> > API you suggest would take a lot more time and effort, for only a
> > little incremental value.  This seems like a case of "Perfect is the
> > enemy of good".  As is, both the API and the daemon are things that I
> > would use today on my platforms.
>
> Would a universal interface look something like this:
>
> - enum ProgressStages
>   (to support things like IPMI fw progress, i.e. generic and well
>   accepted what these mean)
> - int (descriptive integer, platform specific, 0=unknown)
> - string (descriptive, platform specific, can be null)
>
> with each platform implementing whatever parts of that they can.
>
> Looks like x86 post codes would go in the int, maybe a lookup table for
> the string (if available).
>
> For POWER, we'd poke the istep number into the int, and a description
> into the string (from the host, some unknown mechanism to do that).
>
> thoughts?
>
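For concreteness, the three-part "stage / int / string" record proposed
above could be sketched roughly as below. This is only an illustration, not
an existing OpenBMC interface: the names ProgressStage, BootProgress and
describe are made up, the enum members are placeholders rather than the real
IPMI FW progress values, and the POWER istep name is invented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ProgressStage(Enum):
    # Placeholder stages only; a real list would follow the generic,
    # well-accepted IPMI FW progress meanings.
    UNKNOWN = 0
    MEMORY_INIT = 1
    PCI_INIT = 2
    OS_START = 3


@dataclass(frozen=True)
class BootProgress:
    stage: ProgressStage = ProgressStage.UNKNOWN
    code: int = 0               # platform specific, 0 = unknown
    text: Optional[str] = None  # platform specific, may be absent

    def describe(self) -> str:
        # Emit only the parts this platform was able to fill in.
        parts = [self.stage.name]
        if self.code:
            parts.append(f"{self.code:02X}h")
        if self.text:
            parts.append(self.text)
        return " ".join(parts)


# x86: only the port 80h byte is available.
x86 = BootProgress(code=0x35)

# POWER: istep number plus a description from the host (name is invented).
power = BootProgress(stage=ProgressStage.MEMORY_INIT, code=14,
                     text="example istep name")

print(x86.describe())
print(power.describe())
```

Each platform fills in whatever fields it can, and a consumer that only
understands numbers can ignore the rest.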
I implemented port 80h POST codes for POWER9 hostboot a while back:
https://github.com/open-power/hostboot/blob/c93bef31ae6ce781f9e0a11bb9224b6728ff120f/src/usr/initservice/istepdispatcher/istepdispatcher.C#L2312
On Zaius machines we are using that support with Patrick's snoop daemon and
a separate daemon that receives the code via dbus and outputs it on the
front 7-segment debug display.
It has proven useful for getting early error/debug reports from technicians
at scale, e.g. "5 machines stopped at code 35h, 2 at 72h" provides a quick
overview of what the problems are for me to debug further (since I have the
decoder ring, and the istep names would be useless to them anyway).
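
As a rough illustration of that kind of triage, the codes technicians read
off the display can be tallied and run through a decoder table. The sketch
below is hypothetical: DECODER and summarize are made-up names, and the
entries are placeholders, not the real hostboot code-to-istep mapping.

```python
from collections import Counter

# Hypothetical decoder ring: placeholder names, not real hostboot mappings.
DECODER = {0x35: "placeholder step A", 0x72: "placeholder step B"}


def summarize(stuck_codes):
    """Count how many machines stopped at each code, most common first."""
    counts = Counter(stuck_codes)
    lines = []
    for code, n in counts.most_common():
        name = DECODER.get(code, "unmapped")
        lines.append(f"{n} at {code:02X}h ({name})")
    return lines


# Codes as reported by technicians, one entry per stuck machine.
reports = [0x35] * 5 + [0x72] * 2
summary = summarize(reports)
print("\n".join(summary))
```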
-Rob