[PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

Madhavan Srinivasan maddy at linux.vnet.ibm.com
Tue May 9 16:10:10 AEST 2017



On Monday 08 May 2017 07:42 PM, Daniel Axtens wrote:
> Hi all,
>
> I've had a look at the API as it was a big thing I didn't like in the
> earlier version.
>
> I am much happier with this one.

Thanks to mpe for suggesting this. :)

>
> Some comments:
>
>   - I'm no longer subscribed to skiboot but I've had a look at the
>     patches on that side:

Thanks alot for the review comments.

>
>      * in patch 9 should opal_imc_counters_init return something other
>        than OPAL_SUCCESS in the case on invalid arguments? Maybe
>        OPAL_PARAMETER? (I think you fix this in a later patch anyway?)

So, init call will return OPAL_PARAMETER for the unsupported
domains (core and nest are supported). And if the init operation
fails for any reason, it would return OPAL_HARDWARE. And this is
documented.

>
>      * in start/stop, should there be some sort of write barrier to make
>        sure the cb->imc_chip_command actually gets written out to memory
>        at the time we expect?

In the current implementation we make the opal call in the
*_event_stop and *_event_start function. But we wanted to
move opal call to the corresponding *_event_init(), so this
avoid a opal call on each _event_start and _event_stop to
this pmu. With this change, we may not need the barrier.

Maddy

>
> The rest of my comments are in line.
>
>> Adds cpumask attribute to be used by each IMC pmu. Only one cpu (any
>> online CPU) from each chip for nest PMUs is designated to read counters.
>>
>> On CPU hotplug, dying CPU is checked to see whether it is one of the
>> designated cpus, if yes, next online cpu from the same chip (for nest
>> units) is designated as new cpu to read counters. For this purpose, we
>> introduce a new state : CPUHP_AP_PERF_POWERPC_NEST_ONLINE.
>>
>> Signed-off-by: Anju T Sudhakar <anju at linux.vnet.ibm.com>
>> Signed-off-by: Hemant Kumar <hemant at linux.vnet.ibm.com>
>> Signed-off-by: Madhavan Srinivasan <maddy at linux.vnet.ibm.com>
>> ---
>>   arch/powerpc/include/asm/imc-pmu.h             |   4 +
>>   arch/powerpc/include/asm/opal-api.h            |  12 +-
>>   arch/powerpc/include/asm/opal.h                |   4 +
>>   arch/powerpc/perf/imc-pmu.c                    | 248 ++++++++++++++++++++++++-
>>   arch/powerpc/platforms/powernv/opal-wrappers.S |   3 +
>>   include/linux/cpuhotplug.h                     |   1 +
> Who owns this? get_maintainer.pl doesn't give me anything helpful
> here... Do we need an Ack from anyone?
>
>>   6 files changed, 266 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
>> index 6bbe184..1478d0f 100644
>> --- a/arch/powerpc/include/asm/imc-pmu.h
>> +++ b/arch/powerpc/include/asm/imc-pmu.h
>> @@ -92,6 +92,10 @@ struct imc_pmu {
>>   #define IMC_DOMAIN_NEST		1
>>   #define IMC_DOMAIN_UNKNOWN	-1
>>   
>> +#define IMC_COUNTER_ENABLE	1
>> +#define IMC_COUNTER_DISABLE	0
> I'm not sure these constants are particularly useful any more, but I'll
> have more to say on that later.
>
>> +
>> +
>>   extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
>>   extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
>>   extern int __init init_imc_pmu(struct imc_events *events,int idx, struct imc_pmu *pmu_ptr);
>> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
>> index a0aa285..ce863d9 100644
>> --- a/arch/powerpc/include/asm/opal-api.h
>> +++ b/arch/powerpc/include/asm/opal-api.h
>> @@ -168,7 +168,10 @@
>>   #define OPAL_INT_SET_MFRR			125
>>   #define OPAL_PCI_TCE_KILL			126
>>   #define OPAL_NMMU_SET_PTCR			127
>> -#define OPAL_LAST				127
>> +#define OPAL_IMC_COUNTERS_INIT			149
>> +#define OPAL_IMC_COUNTERS_START			150
>> +#define OPAL_IMC_COUNTERS_STOP			151
> Yay, this is heaps better!
>
>> +#define OPAL_LAST				151
>>   
>>   /* Device tree flags */
>>   
>> @@ -928,6 +931,13 @@ enum {
>>   	OPAL_PCI_TCE_KILL_ALL,
>>   };
>>   
>> +/* Argument to OPAL_IMC_COUNTERS_*  */
>> +enum {
>> +	OPAL_IMC_COUNTERS_NEST = 1,
>> +	OPAL_IMC_COUNTERS_CORE = 2,
>> +	OPAL_IMC_COUNTERS_THREAD = 3,
>> +};
>> +
>>   #endif /* __ASSEMBLY__ */
>>   
>>   #endif /* __OPAL_API_H */
>> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
>> index 1ff03a6..9c16ec6 100644
>> --- a/arch/powerpc/include/asm/opal.h
>> +++ b/arch/powerpc/include/asm/opal.h
>> @@ -227,6 +227,10 @@ int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t kill_type,
>>   			  uint64_t dma_addr, uint32_t npages);
>>   int64_t opal_nmmu_set_ptcr(uint64_t chip_id, uint64_t ptcr);
>>   
>> +int64_t opal_imc_counters_init(uint32_t type, uint64_t address);
> This isn't called anywhere in this patch... including (worryingly) in
> the init function...
>
>> +int64_t opal_imc_counters_start(uint32_t type);
>> +int64_t opal_imc_counters_stop(uint32_t type);
>> +
>>   /* Internal functions */
>>   extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
>>   				   int depth, void *data);
>> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
>> index f09a37a..40792424 100644
>> --- a/arch/powerpc/perf/imc-pmu.c
>> +++ b/arch/powerpc/perf/imc-pmu.c
>> @@ -18,6 +18,11 @@
>>   
>>   struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
>>   struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
>> +static cpumask_t nest_imc_cpumask;
>> +
>> +static atomic_t nest_events;
>> +/* Used to avoid races in calling enable/disable nest-pmu units*/
> You need a space here between s and * ----------------------------^
>
>> +static DEFINE_MUTEX(imc_nest_reserve);
>>   
>>   /* Needed for sanity check */
>>   extern u64 nest_max_offset;
>> @@ -33,6 +38,160 @@ static struct attribute_group imc_format_group = {
>>   	.attrs = imc_format_attrs,
>>   };
>>   
>> +/* Get the cpumask printed to a buffer "buf" */
>> +static ssize_t imc_pmu_cpumask_get_attr(struct device *dev,
>> +					struct device_attribute *attr,
>> +					char *buf)
>> +{
>> +	cpumask_t *active_mask;
>> +
>> +	active_mask = &nest_imc_cpumask;
>> +	return cpumap_print_to_pagebuf(true, buf, active_mask);
>> +}
>> +
>> +static DEVICE_ATTR(cpumask, S_IRUGO, imc_pmu_cpumask_get_attr, NULL);
>> +
>> +static struct attribute *imc_pmu_cpumask_attrs[] = {
>> +	&dev_attr_cpumask.attr,
>> +	NULL,
>> +};
>> +
>> +static struct attribute_group imc_pmu_cpumask_attr_group = {
>> +	.attrs = imc_pmu_cpumask_attrs,
>> +};
>> +
>> +/*
>> + * nest_init : Initializes the nest imc engine for the current chip.
>> + * by default the nest engine is disabled.
>> + */
>> +static void nest_init(int *cpu_opal_rc)
>> +{
>> +	int rc;
>> +
>> +	/*
>> +	 * OPAL figures out which CPU to start based on the CPU that is
>> +	 * currently running when we call into OPAL
>> +	 */
>> +	rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
> Why isn't this the init call? If this is correct, a comment explaning it
> would be helpful.
>
>> +	if (rc)
>> +		cpu_opal_rc[smp_processor_id()] = 1;
>> +}
>> +
>
>> +static int nest_imc_control(int operation)
>> +{
>> +	int *cpus_opal_rc, cpu;
>> +
>> +	/*
>> +	 * Memory for OPAL call return value.
>> +	 */
>> +	cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
>> +	if (!cpus_opal_rc)
>> +		return -ENOMEM;
>> +	switch (operation) {
>> +
>> +	case	IMC_COUNTER_ENABLE:
>> +			/* Initialize Nest PMUs in each node using designated cpus */
>> +			on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_imc_start,
>> +						(void *)cpus_opal_rc, 1);
>> +			break;
>> +	case	IMC_COUNTER_DISABLE:
>> +			/* Disable the counters */
>> +			on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_init,
>> +						(void *)cpus_opal_rc, 1);
>> +			break;
>> +	default: return -EINVAL;
>> +
>> +	}
>> +
>> +	/* Check return value array for any OPAL call failure */
>> +	for_each_cpu(cpu, &nest_imc_cpumask) {
>> +		if (cpus_opal_rc[cpu])
>> +			return -ENODEV;
>> +	}
>> +	return 0;
>> +}
> Two things:
>
>   - It doesn't look like you're freeing cpus_opal_rc anywhere - have I
>     missed it?
>
>   - Would it be better to split this function into two: so instead of
>     passing in `operation`, you just have a nest_imc_enable and
>     nest_imc_disable? All the call sites I can see call this with a
>     constant parameter anyway. Perhaps it could even be refactored into
>     nest_imc_event_start/stop and this method could be removed
>     entirely...
>
>     (I haven't checked if you use this in future patches or if it gets
>     expanded and makes sense to keep the function this way.)
>
>> +
>>   static void imc_event_start(struct perf_event *event, int flags)
>>   {
>>   	/*
>> @@ -129,19 +333,44 @@ static void imc_event_stop(struct perf_event *event, int flags)
>>   	imc_perf_event_update(event);
>>   }
>>   
>> -/*
>> - * The wrapper function is provided here, since we will have reserve
>> - * and release lock for imc_event_start() in the following patch.
>> - * Same in case of imc_event_stop().
>> - */
>>   static void nest_imc_event_start(struct perf_event *event, int flags)
>>   {
>> +	int rc;
>> +
>> +	/*
>> +	 * Nest pmu units are enabled only when it is used.
>> +	 * See if this is triggered for the first time.
>> +	 * If yes, take the mutex lock and enable the nest counters.
>> +	 * If not, just increment the count in nest_events.
>> +	 */
>> +	if (atomic_inc_return(&nest_events) == 1) {
>> +		mutex_lock(&imc_nest_reserve);
>> +		rc = nest_imc_control(IMC_COUNTER_ENABLE);
>> +		mutex_unlock(&imc_nest_reserve);
>> +		if (rc)
>> +			pr_err("IMC: Unbale to start the counters\n");
> Spelling: s/Unbale/Unable/ ----------^
>
>> +	}
>>   	imc_event_start(event, flags);
>>   }
>>
> Overall I'm much happer with this now, good work :)
>
> Regards,
> Daniel
>



More information about the Linuxppc-dev mailing list