[PATCH V2] powerpc/perf: Enable PMU counters post partition migration if PMU is active

Nicholas Piggin npiggin at gmail.com
Thu Oct 21 21:54:59 AEDT 2021


Excerpts from Athira Rajeev's message of July 11, 2021 10:25 pm:
> During Live Partition Migration (LPM), it is observed that perf
> counter values report zero after migration completes. However,
> 'perf stat' with a workload continues to show counts post migration
> since the PMU gets disabled/enabled during sched switches. But in
> case of system/CPU-wide monitoring, zero counts were reported with
> 'perf stat' after migration completion.
> 
> Example:
>  ./perf stat -e r1001e -I 1000
>            time             counts unit events
>      1.001010437         22,137,414      r1001e
>      2.002495447         15,455,821      r1001e
> <<>> As seen in the logs below, the counter values show zero
>         after migration has completed.
> <<>>
>     86.142535370    129,392,333,440      r1001e
>     87.144714617                  0      r1001e
>     88.146526636                  0      r1001e
>     89.148085029                  0      r1001e
> 
> Here the PMU is enabled at the start of the perf session and counter
> values are read at intervals. Counters are only disabled at the
> end of the session. The powerpc mobility code presently does not
> handle disabling and re-enabling the PMU counters during partition
> migration. Also, since the PMU register values are not saved/restored
> during migration, PMU registers like Monitor Mode Control Register 0
> (MMCR0) and Monitor Mode Control Register 1 (MMCR1) will not contain
> the values they were programmed with. Hence the PMU counters will not
> be enabled correctly post migration.
> 
> Fix this in the mobility code by disabling and re-enabling the PMU
> on all CPUs before and after migration. The patch introduces two
> functions, 'mobility_pmu_disable' and 'mobility_pmu_enable'.
> mobility_pmu_disable() is called before the processor threads go
> to suspend state, to disable the PMU counters; the disable is
> done only if there are any active events running on that CPU.
> mobility_pmu_enable() is called after the processor threads are
> back online to re-enable the PMU counters.
> 
> Since the Performance Monitor Counters (PMCs) are not
> saved/restored during LPM, the PMC value reads back as zero while
> 'event->hw.prev_count' holds a non-zero value. This causes a problem

Interesting. Are they defined not to be migrated, or is it just that
they may not be migrated?

I wonder what QEMU migration does with PMU registers.

> during the update of event->count, since we always accumulate
> (PMC value - event->hw.prev_count) into event->count. If
> event->hw.prev_count is greater than the PMC value, event->count
> becomes negative. Fix this by re-initialising 'prev_count' for all
> events while re-enabling them. A new variable 'migrate' is
> introduced in 'struct cpu_hw_events' to achieve this for LPM cases
> in power_pmu_enable. Use the 'migrate' value to clear the PMC
> index (stored in event->hw.idx) for all events so that event
> count settings get re-initialised correctly.
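
For reference, the accumulation being described looks roughly like the
sketch below. This is a simplified, assumed shape of the prev_count/delta
logic, not the exact core-book3s.c code; read_pmc() here just stands for
the driver's raw PMC read helper. If the PMC reads back as zero after
migration while prev_count still holds the pre-migration value, the delta
is bogus:

static void pmu_read_sketch(struct perf_event *event)
{
        s64 prev, val, delta;

        do {
                prev = local64_read(&event->hw.prev_count);
                val = read_pmc(event->hw.idx);  /* raw PMC value */
                /* bogus (effectively negative) when prev > val */
                delta = val - prev;
        } while (local64_cmpxchg(&event->hw.prev_count, prev, val) != prev);

        local64_add(delta, &event->count);
}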
> 
> Signed-off-by: Athira Rajeev <atrajeev at linux.vnet.ibm.com>
> [ Fixed compilation error reported by kernel test robot ]
> Reported-by: kernel test robot <lkp at intel.com>
> ---
> Change from v1 -> v2:
>  - Moved the mobility_pmu_enable and mobility_pmu_disable
>    declarations under CONFIG_PPC_PERF_CTRS in rtas.h.
>    Also included 'asm/rtas.h' in core-book3s to fix the
>    compilation warning reported by kernel test robot.
> 
>  arch/powerpc/include/asm/rtas.h           |  8 ++++++
>  arch/powerpc/perf/core-book3s.c           | 44 ++++++++++++++++++++++++++++---
>  arch/powerpc/platforms/pseries/mobility.c |  4 +++
>  3 files changed, 53 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 9dc97d2..cea72d7 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -380,5 +380,13 @@ static inline void rtas_initialize(void) { }
>  static inline void read_24x7_sys_info(void) { }
>  #endif
>  
> +#ifdef CONFIG_PPC_PERF_CTRS
> +void mobility_pmu_disable(void);
> +void mobility_pmu_enable(void);
> +#else
> +static inline void mobility_pmu_disable(void) { }
> +static inline void mobility_pmu_enable(void) { }
> +#endif
> +
>  #endif /* __KERNEL__ */
>  #endif /* _POWERPC_RTAS_H */

It's not implemented in RTAS; maybe consider putting this into a perf
header instead?
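
Something like the below, say in arch/powerpc/include/asm/perf_event.h
(placement is only a suggestion, same declarations as in the patch):

#ifdef CONFIG_PPC_PERF_CTRS
void mobility_pmu_disable(void);
void mobility_pmu_enable(void);
#else
static inline void mobility_pmu_disable(void) { }
static inline void mobility_pmu_enable(void) { }
#endif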

> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index bb0ee71..90da7fa 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -18,6 +18,7 @@
>  #include <asm/ptrace.h>
>  #include <asm/code-patching.h>
>  #include <asm/interrupt.h>
> +#include <asm/rtas.h>
>  
>  #ifdef CONFIG_PPC64
>  #include "internal.h"
> @@ -58,6 +59,7 @@ struct cpu_hw_events {
>  
>  	/* Store the PMC values */
>  	unsigned long pmcs[MAX_HWEVENTS];
> +	int migrate;
>  };
>  
>  static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
> @@ -1335,6 +1337,22 @@ static void power_pmu_disable(struct pmu *pmu)
>  }
>  
>  /*
> + * Called from powerpc mobility code
> + * before migration to disable counters
> + * if the PMU is active.
> + */
> +void mobility_pmu_disable(void)
> +{
> +	struct cpu_hw_events *cpuhw;
> +
> +	cpuhw = this_cpu_ptr(&cpu_hw_events);
> +	if (cpuhw->n_events != 0) {
> +		power_pmu_disable(NULL);
> +		cpuhw->migrate = 1;
> +	}

Rather than add this migrate logic into power_pmu_enable(), would you be 
able to do the work in the mobility callbacks instead? Something like
this:

void mobility_pmu_disable(void)
{
        struct cpu_hw_events *cpuhw;
        struct perf_event *event;

        cpuhw = this_cpu_ptr(&cpu_hw_events);
        if (cpuhw->n_events != 0) {
                int i;

                power_pmu_disable(NULL);
                /*
                 * Read off any pre-existing events because the register
                 * values may not be migrated.
                 */
                for (i = 0; i < cpuhw->n_events; ++i) {
                        event = cpuhw->event[i];
                        if (event->hw.idx) {
                                power_pmu_read(event);
                                event->hw.idx = 0;
                        }
                }
        }
}

void mobility_pmu_enable(void)
{
        struct cpu_hw_events *cpuhw;
 
        cpuhw = this_cpu_ptr(&cpu_hw_events);
        cpuhw->n_added = cpuhw->n_events;
        power_pmu_enable(NULL);
}

> +}
> +
> +/*
>   * Re-enable all events if disable == 0.
>   * If we were previously disabled and events were added, then
>   * put the new config on the PMU.
> @@ -1379,8 +1397,10 @@ static void power_pmu_enable(struct pmu *pmu)
>  	 * no need to recalculate MMCR* settings and reset the PMCs.
>  	 * Just reenable the PMU with the current MMCR* settings
>  	 * (possibly updated for removal of events).
> +	 * While reenabling PMU during partition migration, continue
> +	 * with normal flow.
>  	 */
> -	if (!cpuhw->n_added) {
> +	if (!cpuhw->n_added && !cpuhw->migrate) {
>  		mtspr(SPRN_MMCRA, cpuhw->mmcr.mmcra & ~MMCRA_SAMPLE_ENABLE);
>  		mtspr(SPRN_MMCR1, cpuhw->mmcr.mmcr1);
>  		if (ppmu->flags & PPMU_ARCH_31)
> @@ -1434,11 +1454,15 @@ static void power_pmu_enable(struct pmu *pmu)
>  	/*
>  	 * Read off any pre-existing events that need to move
>  	 * to another PMC.
> +	 * While enabling PMU during partition migration,
> +	 * skip power_pmu_read since all event count settings
> +	 * needs to be re-initialised after migration.
>  	 */
>  	for (i = 0; i < cpuhw->n_events; ++i) {
>  		event = cpuhw->event[i];
> -		if (event->hw.idx && event->hw.idx != hwc_index[i] + 1) {
> -			power_pmu_read(event);
> +		if ((event->hw.idx && event->hw.idx != hwc_index[i] + 1) || (cpuhw->migrate)) {
> +			if (!cpuhw->migrate)
> +				power_pmu_read(event);
>  			write_pmc(event->hw.idx, 0);
>  			event->hw.idx = 0;
>  		}
> @@ -1506,6 +1530,20 @@ static void power_pmu_enable(struct pmu *pmu)
>  	local_irq_restore(flags);
>  }
>  
> +/*
> + * Called from powerpc mobility code
> + * during migration completion to
> + * enable back PMU counters.
> + */
> +void mobility_pmu_enable(void)
> +{
> +	struct cpu_hw_events *cpuhw;
> +
> +	cpuhw = this_cpu_ptr(&cpu_hw_events);
> +	power_pmu_enable(NULL);
> +	cpuhw->migrate = 0;
> +}
> +
>  static int collect_events(struct perf_event *group, int max_count,
>  			  struct perf_event *ctrs[], u64 *events,
>  			  unsigned int *flags)
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index e83e089..ff7a77c 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -476,6 +476,8 @@ static int do_join(void *arg)
>  retry:
>  	/* Must ensure MSR.EE off for H_JOIN. */
>  	hard_irq_disable();
> +	/* Disable PMU before suspend */
> +	mobility_pmu_disable();
>  	hvrc = plpar_hcall_norets(H_JOIN);
>  
>  	switch (hvrc) {

Does the retry case cause mobility_pmu_disable() to be called twice
without mobility_pmu_enable() being called? It would be neater if they
were balanced.
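
Untested, but hoisting the disable above the retry label would keep it
to one disable per enable, e.g.:

        /* Disable PMU before suspend (once, not on every retry) */
        mobility_pmu_disable();
retry:
        /* Must ensure MSR.EE off for H_JOIN. */
        hard_irq_disable();
        hvrc = plpar_hcall_norets(H_JOIN);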


> @@ -530,6 +532,8 @@ static int do_join(void *arg)
>  	 * reset the watchdog.
>  	 */
>  	touch_nmi_watchdog();
> +	/* Enable PMU after resuming */
> +	mobility_pmu_enable();
>  	return ret;
>  }

Not really a big deal, but while you're updating it, you could keep
touch_nmi_watchdog() at the very end.
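
i.e. something like this (just illustrating the ordering):

        /* Enable PMU after resuming */
        mobility_pmu_enable();
        touch_nmi_watchdog();
        return ret;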

Thanks,
Nick

