[PATCH] powerpc/pseries: Add pool idle time at LPAR boot
Nathan Lynch
nathanl at linux.ibm.com
Sat Apr 6 07:03:18 AEDT 2024
Shrikanth Hegde <sshegde at linux.ibm.com> writes:
> On 4/5/24 6:19 PM, Nathan Lynch wrote:
>> Shrikanth Hegde <sshegde at linux.ibm.com> writes:
>
> Hi Nathan, Thanks for reviewing this.
>
>>> When there are no options specified for lparstat, it is expected to
>>> give reports since LPAR(Logical Partition) boot. App is an indicator
>>> for available processor pool in an Shared Processor LPAR(SPLPAR). App is
>>> derived using pool_idle_time which is obtained using H_PIC call.
>>
>> If "App" is short for "Available Procesoor Pool" then it should be
>> written "APP" and the it should be introduced and defined more clearly
>> than this.
>>
>
> Ok.I reworded it for v2.
>
> yes APP is Available Processor Pool.
>
>>
>>> The interval based reports show correct App value while since boot
>>> report shows very high App values. This happens because in that case app
>>> is obtained by dividing pool idle time by LPAR uptime. Since pool idle
>>> time is reported by the PowerVM hypervisor since its boot, it need not
>>> align with LPAR boot. This leads to large App values.
>>>
>>> To fix that export boot pool idle time in lparcfg and powerpc-utils will
>>> use this info to derive App as below for since boot reports.
>>>
>>> App = (pool idle time - boot pool idle time) / (uptime * timebase)
>>>
>>> Results:: Observe app values.
>>> ====================== Shared LPAR ================================
>>> lparstat
>>> System Configuration
>>> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
>>>
>>> reboot
>>> stress-ng --cpu=$(nproc) -t 600
>>> sleep 600
>>> So in this case app is expected to close to 37-6=31.
>>>
>>> ====== 6.9-rc1 and lparstat 1.3.10 =============
>>> %user %sys %wait %idle physc %entc lbusy app vcsw phint
>>> ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
>>> 47.48 0.01 0.00 52.51 0.00 0.00 47.49 69099.72 541547 21
>>>
>>> === With this patch and powerpc-utils patch to do the above equation ===
>>> %user %sys %wait %idle physc %entc lbusy app vcsw phint
>>> ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
>>> 47.48 0.01 0.00 52.51 5.73 47.75 47.49 31.21 541753 21
>>> =====================================================================
>>>
>>> Note: physc, purr/idle purr being inaccurate is being handled in a
>>> separate patch in powerpc-utils tree.
>>>
>>> Signed-off-by: Shrikanth Hegde <sshegde at linux.ibm.com>
>>> ---
>>> Note:
>>>
>>> This patch needs to merged first in the kernel for the powerpc-utils
>>> patches to work. powerpc-utils patches will be posted to its mailing
>>> list and link would be found in the reply to this patch if available.
>>>
>>> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++
>>> 1 file changed, 7 insertions(+)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
>>> index f73c4d1c26af..8df4e7c529d7 100644
>>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>>> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
>>> return rc;
>>> }
>>>
>>> +unsigned long boot_pool_idle_time;
>>
>> Should be static, and u64. Better to use explicitly sized types for data
>> at the kernel-hypervisor boundary.
>
> Current usage of h_pic doesn't follow this either. Are you suggesting we change that
> as well?
Yes pretty much. h_pic() as currently written and used has some
problems:
static unsigned h_pic(unsigned long *pool_idle_time,
unsigned long *num_procs)
{
unsigned long rc;
unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
rc = plpar_hcall(H_PIC, retbuf);
*pool_idle_time = retbuf[0];
*num_procs = retbuf[1];
return rc;
}
* Coerces the return value of plpar_hcall() to unsigned -- hcall
errors are negative.
* Assigns *pool_idle_time and *num_procs using uninitialized
data when H_PIC is unsuccessful.
* Assigns the outparams unconditionally; would be nicer if it allowed
callers to pass NULL so they don't have to provide dummy inputs that
aren't even used, as in your change.
* Should follow Linux -errno return value convention in the absence
of a need for the specific hcall status in its callers.
>>> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
>>> printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
>>> return -EIO;
>>> }
>>> +
>>> + h_pic(&boot_pool_idle_time, &num_procs);
>>
>> h_pic() can fail, leaving the out parameters uninitialized.
>
> Naveen pointed to me this a while ago, but I forgot.
>
> Currently h_pic return value is not checked at all, either at boor or at runtime.
> When it fails, should we re-try or just print a kernel debug? What would be expected
> behavior? because if it fails, it would anyway result in wrong values of app even
> if the variables are initialized to 0.
There's nothing in the spec for H_PIC that suggests retrying on failure.
I'm not really in favor of printing a message about it either; that
practice tends to result in log noise in non-PowerVM guests.
When H_PIC doesn't work the corresponding lines should not be emitted in
/proc/powerpc/lparcfg IMO.
More information about the Linuxppc-dev
mailing list