[V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active

Michael Ellerman mpe at ellerman.id.au
Sat Oct 30 00:15:37 AEDT 2021


Nicholas Piggin <npiggin at gmail.com> writes:
> Excerpts from Athira Rajeev's message of October 29, 2021 1:05 pm:
>> During Live Partition Migration (LPM), it is observed that perf
>> counter values reports zero post migration completion. However
>> 'perf stat' with workload continues to show counts post migration
>> since PMU gets disabled/enabled during sched switches. But incase
>> of system/cpu wide monitoring, zero counts were reported with 'perf
>> stat' after migration completion.
>> 
>> Example:
>>  ./perf stat -e r1001e -I 1000
>>            time             counts unit events
>>      1.001010437         22,137,414      r1001e
>>      2.002495447         15,455,821      r1001e
>> <<>> As seen in next below logs, the counter values shows zero
>>         after migration is completed.
>> <<>>
>>     86.142535370    129,392,333,440      r1001e
>>     87.144714617                  0      r1001e
>>     88.146526636                  0      r1001e
>>     89.148085029                  0      r1001e
>
> This is the output without the patch? After the patch it keeps counting 
> I suppose? And does the very large count go away too?
>
>> 
>> Here PMU is enabled during start of perf session and counter
>> values are read at intervals. Counters are only disabled at the
>> end of session. The powerpc mobility code presently does not handle
>> disabling and enabling back of PMU counters during partition
>> migration. Also since the PMU register values are not saved/restored
>> during migration, PMU registers like Monitor Mode Control Register 0
>> (MMCR0), Monitor Mode Control Register 1 (MMCR1) will not contain
>> the value it was programmed with. Hence PMU counters will not be
>> enabled correctly post migration.
>> 
>> Fix this in mobility code by handling disabling and enabling of
>> PMU in all cpu's before and after migration. Patch introduces two
>> functions 'mobility_pmu_disable' and 'mobility_pmu_enable'.
>> mobility_pmu_disable() is called before the processor threads goes
>> to suspend state so as to disable the PMU counters. And disable is
>> done only if there are any active events running on that cpu.
>> mobility_pmu_enable() is called after the migrate is done to enable
>> back the PMU counters.
>> 
>> Since the performance Monitor counters ( PMCs) are not
>> saved/restored during LPM, results in PMC value being zero and the
>> 'event->hw.prev_count' being non-zero value. This causes problem
>> during updation of event->count since we always accumulate
>> (event->hw.prev_count - PMC value) in event->count.  If
>> event->hw.prev_count is greater PMC value, event->count becomes
>> negative. To fix this, 'prev_count' also needs to be re-initialised
>> for all events while enabling back the events. Hence read the
>> existing events and clear the PMC index (stored in event->hw.idx)
>> for all events im mobility_pmu_disable. By this way, event count
>> settings will get re-initialised correctly in power_pmu_enable.
>> 
>> Signed-off-by: Athira Rajeev <atrajeev at linux.vnet.ibm.com>
>> [ Fixed compilation error reported by kernel test robot ]
>> Reported-by: kernel test robot <lkp at intel.com>
>> ---
>> Changelog:
>> Change from v2 -> v3:
>> Addressed review comments from Nicholas Piggin.
>>  - Removed the "migrate" field which was added in initial
>>    patch to address updation of event count settings correctly
>>    in power_pmu_enable. Instead read off existing events in
>>    mobility_pmu_disable before power_pmu_enable.
>>  - Moved the mobility_pmu_disable/enable declaration from
>>    rtas.h to perf event header file.
>> 
>> Addressed review comments from Nathan.
>>  - Moved the mobility function calls from stop_machine
>>    context out to pseries_migrate_partition. Also now this
>>    is a per cpu invocation.
>> 
>> Change from v1 -> v2:
>>  - Moved the mobility_pmu_enable and mobility_pmu_disable
>>    declarations under CONFIG_PPC_PERF_CTRS in rtas.h.
>>    Also included 'asm/rtas.h' in core-book3s to fix the
>>    compilation warning reported by kernel test robot.
>> 
>>  arch/powerpc/include/asm/perf_event.h     |  8 +++++
>>  arch/powerpc/perf/core-book3s.c           | 39 +++++++++++++++++++++++
>>  arch/powerpc/platforms/pseries/mobility.c |  7 ++++
>>  3 files changed, 54 insertions(+)
>> 
>> diff --git a/arch/powerpc/include/asm/perf_event.h b/arch/powerpc/include/asm/perf_event.h
>> index 164e910bf654..88aab6cf840c 100644
>> --- a/arch/powerpc/include/asm/perf_event.h
>> +++ b/arch/powerpc/include/asm/perf_event.h
>> @@ -17,6 +17,14 @@ static inline bool is_sier_available(void) { return false; }
>>  static inline unsigned long get_pmcs_ext_regs(int idx) { return 0; }
>>  #endif
>>  
>> +#ifdef CONFIG_PPC_PERF_CTRS
>> +void mobility_pmu_disable(void *unused);
>> +void mobility_pmu_enable(void *unused);
>> +#else
>> +static inline void mobility_pmu_disable(void *unused) { }
>> +static inline void mobility_pmu_enable(void *unused) { }
>> +#endif
>> +
>>  #ifdef CONFIG_FSL_EMB_PERF_EVENT
>>  #include <asm/perf_event_fsl_emb.h>
>>  #endif
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index 73e62e9b179b..2e8c4c668fa3 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -1343,6 +1343,33 @@ static void power_pmu_disable(struct pmu *pmu)
>>  	local_irq_restore(flags);
>>  }
>>  
>> +/*
>> + * Called from pseries_migrate_partition() function
>> + * before migration, from powerpc/mobility code.
>> + */

These are only needed if pseries is built, so should be inside a PSERIES
ifdef.

This function should handle iterating over CPUs, that shouldn't be left
up to the mobility.c code.

And the names should be something like pmu_start_migration(),
pmu_finish_migration().

>> +void mobility_pmu_disable(void *unused)
>> +{
>> +	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
>> +	struct perf_event *event;
>> +
>> +	if (cpuhw->n_events != 0) {
>> +		int i;
>> +
>> +		power_pmu_disable(NULL);
>> +		/*
>> +		 * Read off any pre-existing events because the register
>> +		 * values may not be migrated.
>> +		 */
>> +		for (i = 0; i < cpuhw->n_events; ++i) {
>> +			event = cpuhw->event[i];
>> +			if (event->hw.idx) {
>> +				power_pmu_read(event);
>> +				event->hw.idx = 0;
>> +			}
>> +		}
>> +	}
>> +}
>> +
>>  /*
>>   * Re-enable all events if disable == 0.
>>   * If we were previously disabled and events were added, then
>> @@ -1515,6 +1542,18 @@ static void power_pmu_enable(struct pmu *pmu)
>>  	local_irq_restore(flags);
>>  }
>>  
>> +/*
>> + * Called from pseries_migrate_partition() function
>> + * after migration, from powerpc/mobility code.
>> + */
>> +void mobility_pmu_enable(void *unused)
>> +{
>> +	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
>> +
>> +	cpuhw->n_added = cpuhw->n_events;
>> +	power_pmu_enable(NULL);
>> +}
>> +
>>  static int collect_events(struct perf_event *group, int max_count,
>>  			  struct perf_event *ctrs[], u64 *events,
>>  			  unsigned int *flags)
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index e83e0891272d..3e96485ccba4 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -22,6 +22,7 @@
>>  #include <linux/delay.h>
>>  #include <linux/slab.h>
>>  #include <linux/stringify.h>
>> +#include <linux/perf_event.h>
>>  
>>  #include <asm/machdep.h>
>>  #include <asm/rtas.h>
>> @@ -631,12 +632,18 @@ static int pseries_migrate_partition(u64 handle)
>>  	if (ret)
>>  		return ret;
>>  
>> +	/* Disable PMU before suspend */
>> +	on_each_cpu(&mobility_pmu_disable, NULL, 0);
>
> Why was this moved out of stop machine and to an IPI?
>
> My concern would be, what are the other CPUs doing at this time? Is it 
> possible they could take interrupts and schedule? Could that mess up the
> perf state here?

pseries_migrate_partition() is called directly from migration_store(),
which is the sysfs store function, which can be called concurrently by
different CPUs.

It's also potentially called from rtas_syscall_dispatch_ibm_suspend_me(),
from sys_rtas(), again with no locking.

So we could have two CPUs calling into here at the same time, which
might not crash, but is unlikely to work well.

I think the lack of locking might have been OK in the past because only
one CPU will successfully get the other CPUs to call do_join() in
pseries_suspend(). But I could be wrong.

Anyway, now that we're mutating the PMU state before suspending we need
to be more careful. So I think we need a lock around the whole sequence.

cheers


More information about the Linuxppc-dev mailing list