[PATCH] cpuidle/pseries: Fixup CEDE0 latency only for POWER10 onwards

Michal Suchánek msuchanek at suse.de
Sat Apr 24 04:42:16 AEST 2021


On Fri, Apr 23, 2021 at 11:59:30PM +0530, Vaidyanathan Srinivasan wrote:
> * Michal Suchánek <msuchanek at suse.de> [2021-04-23 19:45:05]:
> 
> > On Fri, Apr 23, 2021 at 09:29:39PM +0530, Vaidyanathan Srinivasan wrote:
> > > * Michal Suchánek <msuchanek at suse.de> [2021-04-23 09:35:51]:
> > > 
> > > > On Thu, Apr 22, 2021 at 08:37:29PM +0530, Gautham R. Shenoy wrote:
> > > > > From: "Gautham R. Shenoy" <ego at linux.vnet.ibm.com>
> > > > > 
> > > > > Commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for
> > > > > CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
> > > > > of the Extended CEDE states advertised by the platform.
> > > > > 
> > > > > On some of the POWER9 LPARs, the older firmwares advertise a very low
> > > > > value of 2us for CEDE1 exit latency on a Dedicated LPAR. However the
> > > > Can you be more specific about 'older firmwares'?
> > > 
> > > Hi Michal,
> > > 
> > > This is POWER9 vs POWER10 difference, not really an obsolete FW.  The
> > > key idea behind the original patch was to make the H_CEDE latency and
> > > hence target residency come from firmware instead of being decided by
> > > the kernel.  The advantage is that different types of systems in the
> > > POWER10 generation can adjust this value and get optimal H_CEDE
> > > entry criteria that balance good single-thread performance against
> > > wakeup latency.  Further, we can have additional H_CEDE states to feed
> > > into cpuidle.
> > 
> > So all POWER9 machines are affected by the firmware bug where firmware
> > reports CEDE1 exit latency of 2us and the real latency is 5us which
> > causes the kernel to prefer CEDE1 too much when relying on the values
> > supplied by the firmware. It is not about 'older firmware'.
> 
> Correct.  All POWER9 systems running Linux as guest LPARs will see
> extra usage of CEDE idle state, but not baremetal (PowerNV).
> 
> The correct definition of the bug, or mismatch in expectation, is that
> firmware reports the wakeup latency as a core/thread wakeup timing, not
> the end-to-end time from sending a wakeup event, like an IPI using
> H_calls, to receiving the event on the target.  Practically, a few
> extra microseconds are needed between deciding to wake up a target
> core/thread and the target starting to execute instructions
> within the LPAR instance.

Thanks for the detailed explanation.

Maybe just adding a few microseconds to the reported time would be a
more reasonable workaround than using a blanket fixed value then.
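
Something along these lines, as an untested sketch rather than a patch.
The helper name and the 3us margin are made up for illustration (the
margin is just the difference between the 5us and 2us figures mentioned
above); the real overhead would have to be measured, and this is not
code from the pseries cpuidle driver.

```c
#include <stdint.h>

/* Hypothetical margin: the thread mentions 2us reported vs. about 5us real. */
#define CEDE_WAKEUP_MARGIN_US 3u

/*
 * Hypothetical helper: pad the firmware-reported exit latency with a
 * fixed margin instead of replacing it with a blanket value.
 */
static uint32_t fixup_cede_latency_us(uint32_t fw_latency_us)
{
	/* Saturate rather than wrap if firmware reports a huge value. */
	if (fw_latency_us > UINT32_MAX - CEDE_WAKEUP_MARGIN_US)
		return UINT32_MAX;

	return fw_latency_us + CEDE_WAKEUP_MARGIN_US;
}
```

That way a platform that reports a different CEDE latency still gets its
own value honoured, just shifted by the delivery overhead.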

> 
> > I still think it would be preferable to adjust the latency value
> > reported by the firmware to match reality over a kernel workaround.
> 
> Right, practically we can fix this for future releases, and as such we
> targeted this scheme from POWER10 but expected no harm on POWER9, which
> proved to be wrong.
> 
> We can possibly change this FW value for POWER9, but it is too
> expensive and not practical because many release streams exist for
> different platforms, and customers are on different streams as
> well.  We cannot force all of them to update because that blows up
> the co-dependency matrix.

From the user's point of view only a few firmware release streams exist, but
what is packaged in such binaries might be another story.

Thanks

Michal

