[Skiboot] [PATCH] fast-boot: occ: Re-parse the pstate table during fast-boot

Vaidyanathan Srinivasan svaidy at linux.vnet.ibm.com
Sat Feb 3 17:57:02 AEDT 2018


* Nicholas Piggin <npiggin at gmail.com> [2018-02-03 16:24:25]:

> 
> On Fri,  2 Feb 2018 12:32:32 +0530
> Shilpasri G Bhat <shilpa.bhat at linux.vnet.ibm.com> wrote:
> 
> > OCC shares the frequency list with the host by copying the pstate table
> > to main memory in HOMER. This table is parsed during boot to create
> > device-tree properties for frequency and pstate IDs. OCC can update
> > the pstate table to present a new set of frequencies to the host. But
> > the host will remain oblivious to these changes unless it is re-inited
> > with the updated device-tree CPU frequency properties. So this patch
> > re-parses the pstate table and updates the device-tree properties
> > during fast-reboot.
> > 
> > OCC updates the pstate table when asked to do so using the pstate-table
> > bias command. This is mainly used by the WOF team for
> > characterization purposes.
> 
> Would this ever be used in production, I'm guessing not? I don't
> think that's a bad thing as such -- designing for test is always
> good. Perhaps a comment though to explain why you're re-parsing
> it.

Never say never :)

At this time this facility is meant to enable tooling to set OCC
parameters at runtime and test the system without encoding all
parameters and building a PNOR and dependent components.

This enables a very efficient workflow with just an OCC reboot and
fast-reboot on OPAL+Linux, and we are back in about 2 minutes.  This
can also be leveraged in automation/CI to test various parameters.

> Without knowing much about OCC, I'll guess you're doing this so
> you can update the OCC at runtime without having to update firmware
> before each IPL?

Yep, you got the use case right.  These OCC and PState parameters and
tunings can be tested and later rolled up into the firmware.
 
> I guess we should always keep in mind fast reboot should match IPL
> as closely as possible and any undetected deviations are a pretty
> serious flaw. (e.g., you mess up your OCC state and want to return
> to normal, you would reboot).

We expect the PState table to change for these tests.  If something
goes wrong, OCC will crash or pull the system to safe mode.  There is no
major change in system configuration like cpu, memory, IO.  If something
really bad happens, we will hang/checkstop and have to re-IPL to recover.

If there were no OCC PState changes, then parsing the table again should
give us exactly the same state as at Power-ON, and hence there is no risk.
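
To illustrate that point, here is a minimal C sketch.  It is not the
actual skiboot code or the real HOMER layout: the struct names, field
names, MAX_PSTATES and parse_pstate_table() are all hypothetical
stand-ins.  It only shows the property we rely on: the parse is a pure
function of the table, so re-parsing an unchanged table rebuilds
identical properties, and an invalid table leaves the old ones alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PSTATES 8	/* hypothetical limit, for illustration only */

/* Hypothetical, simplified table entry; NOT the real HOMER layout. */
struct pstate_entry {
	int8_t id;
	uint32_t freq_khz;
};

struct pstate_table {
	uint8_t valid;
	uint8_t nr_pstates;
	struct pstate_entry entries[MAX_PSTATES];
};

/* Stand-in for the device-tree frequency/pstate-id properties. */
struct cpufreq_props {
	size_t count;
	int32_t ids[MAX_PSTATES];
	uint32_t freqs_mhz[MAX_PSTATES];
};

/*
 * Rebuild the properties from the table.  Returns 0 on success and -1
 * if the table looks invalid, in which case *out is left untouched.
 */
int parse_pstate_table(const struct pstate_table *t,
		       struct cpufreq_props *out)
{
	struct cpufreq_props tmp = { 0 };
	size_t i;

	assert(t != NULL && out != NULL);

	if (!t->valid || t->nr_pstates == 0 || t->nr_pstates > MAX_PSTATES)
		return -1;

	tmp.count = t->nr_pstates;
	for (i = 0; i < tmp.count; i++) {
		tmp.ids[i] = t->entries[i].id;
		tmp.freqs_mhz[i] = t->entries[i].freq_khz / 1000;
	}

	*out = tmp;	/* commit only after the whole table parsed */
	return 0;
}

/* Field-by-field comparison of two property sets. */
int props_equal(const struct cpufreq_props *a,
		const struct cpufreq_props *b)
{
	size_t i;

	if (a->count != b->count)
		return 0;
	for (i = 0; i < a->count; i++) {
		if (a->ids[i] != b->ids[i] ||
		    a->freqs_mhz[i] != b->freqs_mhz[i])
			return 0;
	}
	return 1;
}
```

Calling parse_pstate_table() twice on the same table and comparing the
results with props_equal() is the "no change, no risk" case; only a
table that OCC actually rewrote produces different properties.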

> I'm just wondering, should this be under an nvram option?

The risk actually depends on what we ask OCC to do, and hence this is
not a major config change/risk for OPAL.  I would like to leave it as
the default for fast-reboot.  We add a slight time overhead to rebuild
the relevant device tree.  Given that fast-reboot itself is experimental,
this is an acceptable risk and overhead.

If we hit new error/fail scenarios in the future, we can add settings.
I would like to roll this into a single knob for fast-reboot like
"safe", "risky" or something that can help us choose what we want to
do in the reinit path.  We need not call out OCC reinit as an explicit
nvram option at this time.

--Vaidy


