[Skiboot] [PATCH v12 08/10] skiboot: Add opal call to enable/disable Nest IMC

Thu Jun 15 13:47:57 AEST 2017

On Wed, 2017-06-14 at 20:53 +0530, Madhavan Srinivasan wrote:
> 
> On Wednesday 14 June 2017 10:06 AM, Michael Neuling wrote:
> > On Sun, 2017-05-21 at 20:40 +0530, Madhavan Srinivasan wrote:
> > > From: Anju T Sudhakar <anju at linux.vnet.ibm.com>
> > > 
> > > Add new opal calls to start, stop the Nest IMC microcode running in the
> > > OCC complex. Also, check the status from the control block structure
> > > before
> > > starting/stopping the IMC microcode.
> > > 
> > > This patch also add an opal call to init the counters. But for Nest IMC
> > > this
> > > function is a no-op currently.
> > > 
> > > Signed-off-by: Hemant Kumar <hemant at linux.vnet.ibm.com>
> > > Signed-off-by: Anju T Sudhakar <anju at linux.vnet.ibm.com>
> > > Signed-off-by: Madhavan Srinivasan <maddy at linux.vnet.ibm.com>
> > > ---
> > >   hw/imc.c           | 77
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >   include/opal-api.h | 11 +++++++-
> > >   2 files changed, 87 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/imc.c b/hw/imc.c
> > > index 00e19154d641..d56b3eb251b2 100644
> > > --- a/hw/imc.c
> > > +++ b/hw/imc.c
> > > @@ -336,3 +336,80 @@ err:
> > >   	prerror("IMC Devices not added\n");
> > >   	free(imc_catalog_buf);
> > >   }
> > > +
> > > +/*
> > > + * opal_imc_counters_init : This call initialize the IMC engine.
> > > + *
> > > + * For Nest IMC, this is no-op and returns OPAL_SUCCESS at this point.
> > > + */
> > > +static int64_t opal_imc_counters_init(uint32_t type)
> > > +{
> > > +	return OPAL_SUCCESS;
> > > +}
> > 
> > Why don't we do this on in imc_init()?  Why do we rely on the OS doing this?
> 
> Yes. From the nest imc point of view, we dont have any initialization
> today. But incase of core imc, we do have initialization procedure.
> That is, we need to update couple of scoms with a static value and
> one scom called "PDBAR" (scom spec did not expand what is PDBAR
> but will dig more and update it here) needs to be initialized with
> a memory location (where the core imc counter data will be posted
> per-the core logic). And this is a per-core resource.
> 
> Now, not doing this memory location by OPAL will avoid a
> complicated device tree node update from OPAL to pass this
> information to kernel. Secondly size request for the core imc
> is 8K (per core) and in a 2 socket or more we will increase
> the opal reserve memory area alot.
> 
> Now incase of doing it in kernel, at the kexec scenario, we
> first stop the core imc engine via OPAL_IMC_COUNTER_STOP
> and then we free the memory. And this is primarily done
> in the powernv device shutdown path.
> 
> And in the powernv device probe path we have a check
> for the kdump kernel, so we dont load or create IMC devices
> and hence will not do any OPAL_IMC* calls.

Ok, but the microcode will still be writing to memory in the kdump case.  It
would be good to be able to stop that with an opal call.

> > 
> > Anyway... as mentioned before, this alone doesn't compile.
> 
> Yes i will merge the patch 8 and 9 as mentioned before.
> 
> > 
> > > +opal_call(OPAL_IMC_COUNTERS_INIT, opal_imc_counters_init, 1);
> > > +
> > > +/* opal_imc_counters_control_start: This call starts the nest imc engine.
> > > */
> > > +static int64_t opal_imc_counters_start(uint32_t type)
> > > +{
> > > +	u64 op, status;
> > > +	struct imc_chip_cb *cb;
> > > +	int ret = OPAL_SUCCESS;
> > 
> > I'm not sure there is any benefit in having ret.  Makes the code less
> > readable.
> > Just do return OPAL_SUCCESS below.
> 
> Ok sure. Will make the changes.
> 
> > > +
> > > +	switch (type) {
> > > +	case OPAL_IMC_COUNTERS_NEST:
> > > +		/* Fetch the IMC control block structure */
> > > +		cb = get_imc_cb();
> > > +		status = be64_to_cpu(cb->imc_chip_run_status);
> > > +
> > > +		/* Check whether the engine is already running */
> > > +		if (status == NEST_IMC_RUNNING)
> > > +			return ret;
> > 
> > Do we need a memory barrier here?  What ensures ordering between the status
> > read
> > above and the command below?
> 
> Currently nest counters are per-chip and only one thread from a chip
> will update the command at a give time. Secondly, "command" is more
> an input to the microcode. OPAL will not read it back.

I still think there could be a race here.  Sequence seems to be:

1) Microcode sets status to running
2) Microcode start running and hence writing to buffer
3) CPU (in OPAL) reads status
4) CPU sends code

What ensures that at 3 the CPU sees the status correctly as running set in 1?

What are the consequences of sending a command when the microcode thinks it's
running in this scenario? Is it just that it keeps running?  If so, why do we
bother checking the status at all?

> > > +
> > > +		/* Set the run command */
> > > +		op = NEST_IMC_ENABLE;
> > > +
> > > +		/* Write the command to the control block now */
> > > +		cb->imc_chip_command = op;
> > 
> > Do we need a memory barrier before we can start reading these after this op?
> > and if so, who's responsibility is it do do that?  OPAL, or Linux?
> 
> Linux will pass on the state the engine should be and OPAL will update
> the command request from the linux. And microcode is the consumer
> when comes to "command" field. OPAL does not read it back. Now thinking
> more on it having a memory barrier will help. Will add it here.

I'm not sure we need anything... micro code just starts writing and linux will
just start seeing updates at some point.

Mikey