[PATCH] powerpc/pseries: Add pool idle time at LPAR boot

Shrikanth Hegde sshegde at linux.ibm.com
Sat Apr 6 04:02:24 AEDT 2024



On 4/5/24 6:19 PM, Nathan Lynch wrote:
> Shrikanth Hegde <sshegde at linux.ibm.com> writes:

Hi Nathan, Thanks for reviewing this.

>> When there are no options specified for lparstat, it is expected to
>> give reports since LPAR(Logical Partition) boot. App is an indicator
>> for available processor pool in an Shared Processor LPAR(SPLPAR). App is
>> derived using pool_idle_time which is obtained using H_PIC call.
> 
> If "App" is short for "Available Procesoor Pool" then it should be
> written "APP" and the it should be introduced and defined more clearly
> than this.
> 

Ok.I reworded it for v2. 

yes APP is Available Processor Pool. 

> 
>> The interval based reports show correct App value while since boot
>> report shows very high App values. This happens because in that case app
>> is obtained by dividing pool idle time by LPAR uptime. Since pool idle
>> time is reported by the PowerVM hypervisor since its boot, it need not
>> align with LPAR boot. This leads to large App values.
>>
>> To fix that export boot pool idle time in lparcfg and powerpc-utils will
>> use this info to derive App as below for since boot reports.
>>
>> App = (pool idle time - boot pool idle time) / (uptime * timebase)
>>
>> Results:: Observe app values.
>> ====================== Shared LPAR ================================
>> lparstat
>> System Configuration
>> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
>>
>> reboot
>> stress-ng --cpu=$(nproc) -t 600
>> sleep 600
>> So in this case app is expected to close to 37-6=31.
>>
>> ====== 6.9-rc1 and lparstat 1.3.10  =============
>> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
>> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
>> 47.48  0.01  0.00    52.51     0.00  0.00 47.49 69099.72 541547    21
>>
>> === With this patch and powerpc-utils patch to do the above equation ===
>> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
>> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
>> 47.48  0.01  0.00    52.51     5.73 47.75 47.49 31.21 541753    21
>> =====================================================================
>>
>> Note: physc, purr/idle purr being inaccurate is being handled in a
>> separate patch in powerpc-utils tree.
>>
>> Signed-off-by: Shrikanth Hegde <sshegde at linux.ibm.com>
>> ---
>> Note:
>>
>> This patch needs to merged first in the kernel for the powerpc-utils
>> patches to work. powerpc-utils patches will be posted to its mailing
>> list and link would be found in the reply to this patch if available.
>>
>> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
>> index f73c4d1c26af..8df4e7c529d7 100644
>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
>>  	return rc;
>>  }
>>
>> +unsigned long boot_pool_idle_time;
> 
> Should be static, and u64. Better to use explicitly sized types for data
> at the kernel-hypervisor boundary.

Current usage of h_pic doesn't follow this either. Are you suggesting we change that 
as well? Or is this applicable to only boot_pool_idle_time?

For example in parse_ppp_data: 

        if (lppaca_shared_proc()) {
                unsigned long pool_idle_time, pool_procs;

		h_pic(&pool_idle_time, &pool_procs);
                seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
                seq_printf(m, "pool_num_procs=%ld\n", pool_procs);

> 
>> +
>>  /*
>>   * parse_ppp_data
>>   * Parse out the data returned from h_get_ppp and h_pic
>> @@ -218,6 +220,7 @@ static void parse_ppp_data(struct seq_file *m)
>>  		h_pic(&pool_idle_time, &pool_procs);
>>  		seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
>>  		seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
>> +		seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time);
> 
> If boot_pool_idle_time is unsigned then the format string should be %ul
> or similar, not %ld.
> 
>>  	}
>>
>>  	seq_printf(m, "unallocated_capacity_weight=%d\n",
>> @@ -792,6 +795,7 @@ static const struct proc_ops lparcfg_proc_ops = {
>>  static int __init lparcfg_init(void)
>>  {
>>  	umode_t mode = 0444;
>> +	unsigned long num_procs;
>>
>>  	/* Allow writing if we have FW_FEATURE_SPLPAR */
>>  	if (firmware_has_feature(FW_FEATURE_SPLPAR))
>> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
>>  		printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
>>  		return -EIO;
>>  	}
>> +
>> +	h_pic(&boot_pool_idle_time, &num_procs);
> 
> h_pic() can fail, leaving the out parameters uninitialized.

Naveen pointed to me this a while ago, but I forgot. 

Currently h_pic return value is not checked at all, either at boor or at runtime. 
When it fails, should we re-try or just print a kernel debug? What would be expected 
behavior? because if it fails, it would anyway result in wrong values of app even 
if the variables are initialized to 0.

> 
>> +
>>  	return 0;
>>  }
>>  machine_device_initcall(pseries, lparcfg_init);
>> --
>> 2.39.3


More information about the Linuxppc-dev mailing list