[PATCH] powerpc/pseries: Add pool idle time at LPAR boot
Shrikanth Hegde
sshegde at linux.ibm.com
Sat Apr 6 04:02:24 AEDT 2024
On 4/5/24 6:19 PM, Nathan Lynch wrote:
> Shrikanth Hegde <sshegde at linux.ibm.com> writes:
Hi Nathan, Thanks for reviewing this.
>> When there are no options specified for lparstat, it is expected to
>> give reports since LPAR(Logical Partition) boot. App is an indicator
>> for available processor pool in an Shared Processor LPAR(SPLPAR). App is
>> derived using pool_idle_time which is obtained using H_PIC call.
>
> If "App" is short for "Available Procesoor Pool" then it should be
> written "APP" and the it should be introduced and defined more clearly
> than this.
>
Ok.I reworded it for v2.
yes APP is Available Processor Pool.
>
>> The interval based reports show correct App value while since boot
>> report shows very high App values. This happens because in that case app
>> is obtained by dividing pool idle time by LPAR uptime. Since pool idle
>> time is reported by the PowerVM hypervisor since its boot, it need not
>> align with LPAR boot. This leads to large App values.
>>
>> To fix that export boot pool idle time in lparcfg and powerpc-utils will
>> use this info to derive App as below for since boot reports.
>>
>> App = (pool idle time - boot pool idle time) / (uptime * timebase)
>>
>> Results:: Observe app values.
>> ====================== Shared LPAR ================================
>> lparstat
>> System Configuration
>> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
>>
>> reboot
>> stress-ng --cpu=$(nproc) -t 600
>> sleep 600
>> So in this case app is expected to close to 37-6=31.
>>
>> ====== 6.9-rc1 and lparstat 1.3.10 =============
>> %user %sys %wait %idle physc %entc lbusy app vcsw phint
>> ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
>> 47.48 0.01 0.00 52.51 0.00 0.00 47.49 69099.72 541547 21
>>
>> === With this patch and powerpc-utils patch to do the above equation ===
>> %user %sys %wait %idle physc %entc lbusy app vcsw phint
>> ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
>> 47.48 0.01 0.00 52.51 5.73 47.75 47.49 31.21 541753 21
>> =====================================================================
>>
>> Note: physc, purr/idle purr being inaccurate is being handled in a
>> separate patch in powerpc-utils tree.
>>
>> Signed-off-by: Shrikanth Hegde <sshegde at linux.ibm.com>
>> ---
>> Note:
>>
>> This patch needs to merged first in the kernel for the powerpc-utils
>> patches to work. powerpc-utils patches will be posted to its mailing
>> list and link would be found in the reply to this patch if available.
>>
>> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
>> index f73c4d1c26af..8df4e7c529d7 100644
>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
>> return rc;
>> }
>>
>> +unsigned long boot_pool_idle_time;
>
> Should be static, and u64. Better to use explicitly sized types for data
> at the kernel-hypervisor boundary.
Current usage of h_pic doesn't follow this either. Are you suggesting we change that
as well? Or is this applicable to only boot_pool_idle_time?
For example in parse_ppp_data:
if (lppaca_shared_proc()) {
unsigned long pool_idle_time, pool_procs;
h_pic(&pool_idle_time, &pool_procs);
seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
>
>> +
>> /*
>> * parse_ppp_data
>> * Parse out the data returned from h_get_ppp and h_pic
>> @@ -218,6 +220,7 @@ static void parse_ppp_data(struct seq_file *m)
>> h_pic(&pool_idle_time, &pool_procs);
>> seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
>> seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
>> + seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time);
>
> If boot_pool_idle_time is unsigned then the format string should be %ul
> or similar, not %ld.
>
>> }
>>
>> seq_printf(m, "unallocated_capacity_weight=%d\n",
>> @@ -792,6 +795,7 @@ static const struct proc_ops lparcfg_proc_ops = {
>> static int __init lparcfg_init(void)
>> {
>> umode_t mode = 0444;
>> + unsigned long num_procs;
>>
>> /* Allow writing if we have FW_FEATURE_SPLPAR */
>> if (firmware_has_feature(FW_FEATURE_SPLPAR))
>> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
>> printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
>> return -EIO;
>> }
>> +
>> + h_pic(&boot_pool_idle_time, &num_procs);
>
> h_pic() can fail, leaving the out parameters uninitialized.
Naveen pointed to me this a while ago, but I forgot.
Currently h_pic return value is not checked at all, either at boor or at runtime.
When it fails, should we re-try or just print a kernel debug? What would be expected
behavior? because if it fails, it would anyway result in wrong values of app even
if the variables are initialized to 0.
>
>> +
>> return 0;
>> }
>> machine_device_initcall(pseries, lparcfg_init);
>> --
>> 2.39.3
More information about the Linuxppc-dev
mailing list