[PATCH v2] cpuidle/pseries: Fixup CEDE0 latency only for POWER10 onwards

Michael Ellerman mpe at ellerman.id.au
Thu Apr 29 19:56:25 AEST 2021


"Gautham R. Shenoy" <ego at linux.vnet.ibm.com> writes:
> From: "Gautham R. Shenoy" <ego at linux.vnet.ibm.com>
>
> Commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for
> CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
> of the Extended CEDE states advertised by the platform
>
> On POWER9 LPARs, the firmwares advertise a very low value of 2us for
> CEDE1 exit latency on a Dedicated LPAR. The latency advertized by the
> PHYP hypervisor corresponds to the latency required to wakeup from the
> underlying hardware idle state. However the wakeup latency from the
> LPAR perspective should include
>
> 1. The time taken to transition the CPU from the Hypervisor into the
>    LPAR post wakeup from platform idle state
>
> 2. Time taken to send the IPI from the source CPU (waker) to the idle
>    target CPU (wakee).
>
> 1. can be measured via timer idle test, where we queue a timer, say
> for 1ms, and enter the CEDE state. When the timer fires, in the timer
> handler we compute how much extra timer over the expected 1ms have we
> consumed. On a a POWER9 LPAR the numbers are
>
> CEDE latency measured using a timer (numbers in ns)
> N       Min      Median   Avg       90%ile  99%ile    Max    Stddev
> 400     2601     5677     5668.74    5917    6413     9299   455.01
>
> 1. and 2. combined can be determined by an IPI latency test where we
> send an IPI to an idle CPU and in the handler compute the time
> difference between when the IPI was sent and when the handler ran. We
> see the following numbers on POWER9 LPAR.
>
> CEDE latency measured using an IPI (numbers in ns)
> N       Min      Median   Avg       90%ile  99%ile    Max    Stddev
> 400     711      7564     7369.43   8559    9514      9698   1200.01
>
> Suppose, we consider the 99th percentile latency value measured using
> the IPI to be the wakeup latency, the value would be 9.5us This is in
> the ballpark of the default value of 10us.
>
> Hence, use the exit latency of CEDE(0) based on the latency values
> advertized by platform only from POWER10 onwards. The values
                                           ^^^^^^^
> advertized on POWER10 platforms is more realistic and informed by the
> latency measurements. For earlier platforms stick to the default value
> of 10us.

...

> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> index a2b5c6f..7207467 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -419,7 +419,8 @@ static int pseries_idle_probe(void)
>  			cpuidle_state_table = shared_states;
>  			max_idle_state = ARRAY_SIZE(shared_states);
>  		} else {
> -			fixup_cede0_latency();
> +			if (pvr_version_is(PVR_POWER10))
> +				fixup_cede0_latency();

A PVR check like that tests for *only* Power10, not Power10 and onwards
as you say in the change log.

The other question is what should happen on a Power10 LPAR that's
running in Power9 compat mode. I assume in that case we *do* want to use
the firmware provided values, because they're tied to the underlying
CPU, not the compat mode?

In which case a check for !PVR_POWER9 would seem to achieve what we
want?

cheers


More information about the Linuxppc-dev mailing list