[V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active
Michael Ellerman
mpe at ellerman.id.au
Sat Oct 30 00:15:37 AEDT 2021
Nicholas Piggin <npiggin at gmail.com> writes:
> Excerpts from Athira Rajeev's message of October 29, 2021 1:05 pm:
>> During Live Partition Migration (LPM), it is observed that perf
>> counter values reports zero post migration completion. However
>> 'perf stat' with workload continues to show counts post migration
>> since PMU gets disabled/enabled during sched switches. But incase
>> of system/cpu wide monitoring, zero counts were reported with 'perf
>> stat' after migration completion.
>>
>> Example:
>> ./perf stat -e r1001e -I 1000
>> time counts unit events
>> 1.001010437 22,137,414 r1001e
>> 2.002495447 15,455,821 r1001e
>> <<>> As seen in next below logs, the counter values shows zero
>> after migration is completed.
>> <<>>
>> 86.142535370 129,392,333,440 r1001e
>> 87.144714617 0 r1001e
>> 88.146526636 0 r1001e
>> 89.148085029 0 r1001e
>
> This is the output without the patch? After the patch it keeps counting
> I suppose? And does the very large count go away too?
>
>>
>> Here PMU is enabled during start of perf session and counter
>> values are read at intervals. Counters are only disabled at the
>> end of session. The powerpc mobility code presently does not handle
>> disabling and enabling back of PMU counters during partition
>> migration. Also since the PMU register values are not saved/restored
>> during migration, PMU registers like Monitor Mode Control Register 0
>> (MMCR0), Monitor Mode Control Register 1 (MMCR1) will not contain
>> the value it was programmed with. Hence PMU counters will not be
>> enabled correctly post migration.
>>
>> Fix this in mobility code by handling disabling and enabling of
>> PMU in all cpu's before and after migration. Patch introduces two
>> functions 'mobility_pmu_disable' and 'mobility_pmu_enable'.
>> mobility_pmu_disable() is called before the processor threads goes
>> to suspend state so as to disable the PMU counters. And disable is
>> done only if there are any active events running on that cpu.
>> mobility_pmu_enable() is called after the migrate is done to enable
>> back the PMU counters.
>>
>> Since the performance Monitor counters ( PMCs) are not
>> saved/restored during LPM, results in PMC value being zero and the
>> 'event->hw.prev_count' being non-zero value. This causes problem
>> during updation of event->count since we always accumulate
>> (event->hw.prev_count - PMC value) in event->count. If
>> event->hw.prev_count is greater PMC value, event->count becomes
>> negative. To fix this, 'prev_count' also needs to be re-initialised
>> for all events while enabling back the events. Hence read the
>> existing events and clear the PMC index (stored in event->hw.idx)
>> for all events im mobility_pmu_disable. By this way, event count
>> settings will get re-initialised correctly in power_pmu_enable.
>>
>> Signed-off-by: Athira Rajeev <atrajeev at linux.vnet.ibm.com>
>> [ Fixed compilation error reported by kernel test robot ]
>> Reported-by: kernel test robot <lkp at intel.com>
>> ---
>> Changelog:
>> Change from v2 -> v3:
>> Addressed review comments from Nicholas Piggin.
>> - Removed the "migrate" field which was added in initial
>> patch to address updation of event count settings correctly
>> in power_pmu_enable. Instead read off existing events in
>> mobility_pmu_disable before power_pmu_enable.
>> - Moved the mobility_pmu_disable/enable declaration from
>> rtas.h to perf event header file.
>>
>> Addressed review comments from Nathan.
>> - Moved the mobility function calls from stop_machine
>> context out to pseries_migrate_partition. Also now this
>> is a per cpu invocation.
>>
>> Change from v1 -> v2:
>> - Moved the mobility_pmu_enable and mobility_pmu_disable
>> declarations under CONFIG_PPC_PERF_CTRS in rtas.h.
>> Also included 'asm/rtas.h' in core-book3s to fix the
>> compilation warning reported by kernel test robot.
>>
>> arch/powerpc/include/asm/perf_event.h | 8 +++++
>> arch/powerpc/perf/core-book3s.c | 39 +++++++++++++++++++++++
>> arch/powerpc/platforms/pseries/mobility.c | 7 ++++
>> 3 files changed, 54 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/perf_event.h b/arch/powerpc/include/asm/perf_event.h
>> index 164e910bf654..88aab6cf840c 100644
>> --- a/arch/powerpc/include/asm/perf_event.h
>> +++ b/arch/powerpc/include/asm/perf_event.h
>> @@ -17,6 +17,14 @@ static inline bool is_sier_available(void) { return false; }
>> static inline unsigned long get_pmcs_ext_regs(int idx) { return 0; }
>> #endif
>>
>> +#ifdef CONFIG_PPC_PERF_CTRS
>> +void mobility_pmu_disable(void *unused);
>> +void mobility_pmu_enable(void *unused);
>> +#else
>> +static inline void mobility_pmu_disable(void *unused) { }
>> +static inline void mobility_pmu_enable(void *unused) { }
>> +#endif
>> +
>> #ifdef CONFIG_FSL_EMB_PERF_EVENT
>> #include <asm/perf_event_fsl_emb.h>
>> #endif
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index 73e62e9b179b..2e8c4c668fa3 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -1343,6 +1343,33 @@ static void power_pmu_disable(struct pmu *pmu)
>> local_irq_restore(flags);
>> }
>>
>> +/*
>> + * Called from pseries_migrate_partition() function
>> + * before migration, from powerpc/mobility code.
>> + */
These are only needed if pseries is built, so should be inside a PSERIES
ifdef.
This function should handle iterating over CPUs, that shouldn't be left
up to the mobility.c code.
And the names should be something like pmu_start_migration(),
pmu_finish_migration().
>> +void mobility_pmu_disable(void *unused)
>> +{
>> + struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
>> + struct perf_event *event;
>> +
>> + if (cpuhw->n_events != 0) {
>> + int i;
>> +
>> + power_pmu_disable(NULL);
>> + /*
>> + * Read off any pre-existing events because the register
>> + * values may not be migrated.
>> + */
>> + for (i = 0; i < cpuhw->n_events; ++i) {
>> + event = cpuhw->event[i];
>> + if (event->hw.idx) {
>> + power_pmu_read(event);
>> + event->hw.idx = 0;
>> + }
>> + }
>> + }
>> +}
>> +
>> /*
>> * Re-enable all events if disable == 0.
>> * If we were previously disabled and events were added, then
>> @@ -1515,6 +1542,18 @@ static void power_pmu_enable(struct pmu *pmu)
>> local_irq_restore(flags);
>> }
>>
>> +/*
>> + * Called from pseries_migrate_partition() function
>> + * after migration, from powerpc/mobility code.
>> + */
>> +void mobility_pmu_enable(void *unused)
>> +{
>> + struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
>> +
>> + cpuhw->n_added = cpuhw->n_events;
>> + power_pmu_enable(NULL);
>> +}
>> +
>> static int collect_events(struct perf_event *group, int max_count,
>> struct perf_event *ctrs[], u64 *events,
>> unsigned int *flags)
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index e83e0891272d..3e96485ccba4 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -22,6 +22,7 @@
>> #include <linux/delay.h>
>> #include <linux/slab.h>
>> #include <linux/stringify.h>
>> +#include <linux/perf_event.h>
>>
>> #include <asm/machdep.h>
>> #include <asm/rtas.h>
>> @@ -631,12 +632,18 @@ static int pseries_migrate_partition(u64 handle)
>> if (ret)
>> return ret;
>>
>> + /* Disable PMU before suspend */
>> + on_each_cpu(&mobility_pmu_disable, NULL, 0);
>
> Why was this moved out of stop machine and to an IPI?
>
> My concern would be, what are the other CPUs doing at this time? Is it
> possible they could take interrupts and schedule? Could that mess up the
> perf state here?
pseries_migrate_partition() is called directly from migration_store(),
which is the sysfs store function, which can be called concurrently by
different CPUs.
It's also potentially called from rtas_syscall_dispatch_ibm_suspend_me(),
from sys_rtas(), again with no locking.
So we could have two CPUs calling into here at the same time, which
might not crash, but is unlikely to work well.
I think the lack of locking might have been OK in the past because only
one CPU will successfully get the other CPUs to call do_join() in
pseries_suspend(). But I could be wrong.
Anyway, now that we're mutating the PMU state before suspending we need
to be more careful. So I think we need a lock around the whole sequence.
cheers
More information about the Linuxppc-dev
mailing list