[PATCH V2] powerpc/perf: Enable PMU counters post partition migration if PMU is active
Nathan Lynch
nathanl at linux.ibm.com
Fri Oct 22 04:17:36 AEDT 2021
Athira Rajeev <atrajeev at linux.vnet.ibm.com> writes:
> During Live Partition Migration (LPM), it is observed that perf
> counter values report zero after migration completes. However,
> 'perf stat' with a workload continues to show counts post migration
> since the PMU gets disabled/enabled during sched switches. But in case
> of system/cpu wide monitoring, zero counts were reported with 'perf
> stat' after migration completion.
>
> Example:
> ./perf stat -e r1001e -I 1000
>            time             counts unit events
>     1.001010437         22,137,414      r1001e
>     2.002495447         15,455,821      r1001e
> <<>> As seen in the logs below, the counter values show zero
> after migration is completed.
> <<>>
>    86.142535370    129,392,333,440      r1001e
>    87.144714617                  0      r1001e
>    88.146526636                  0      r1001e
>    89.148085029                  0      r1001e
Confirmed in my environment:

    51.099987985            300,338      cache-misses
    52.101839374            296,586      cache-misses
    53.116089796            263,150      cache-misses
    54.117949249            232,290      cache-misses
    55.602029375     68,700,421,711      cache-misses
    56.610073969                  0      cache-misses
    57.614732000                  0      cache-misses
I wonder what it means that there is an implausibly large value just
before the counter stops working -- I believe your example shows this
phenomenon too.
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index e83e089..ff7a77c 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -476,6 +476,8 @@ static int do_join(void *arg)
>  retry:
>  	/* Must ensure MSR.EE off for H_JOIN. */
>  	hard_irq_disable();
> +	/* Disable PMU before suspend */
> +	mobility_pmu_disable();
>  	hvrc = plpar_hcall_norets(H_JOIN);
>
>  	switch (hvrc) {
> @@ -530,6 +532,8 @@ static int do_join(void *arg)
>  	 * reset the watchdog.
>  	 */
>  	touch_nmi_watchdog();
> +	/* Enable PMU after resuming */
> +	mobility_pmu_enable();
>  	return ret;
>  }
We should minimize calls into other subsystems from this context (the
callback function we've passed to stop_machine); it's fairly sensitive.
Can this be moved out to pseries_migrate_partition() or similar?
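
A rough userspace sketch of the ordering suggested above, not kernel code.
The mobility_pmu_disable()/mobility_pmu_enable() names come from the patch,
but their bodies here are stubs. mock_do_join() and mock_migrate_partition()
are hypothetical stand-ins for the stop_machine() callback and for
pseries_migrate_partition(). The sketch only records call order and assumes
the PMU hooks can safely run outside the stop_machine() context; in the
kernel, per-CPU PMU state would likely need separate handling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Userspace mock: records the order of calls for inspection. */
char call_log[128];

static void log_call(const char *name)
{
	/* Guard against overflowing the fixed-size log buffer. */
	assert(strlen(call_log) + strlen(name) + 2 <= sizeof(call_log));
	strcat(call_log, name);
	strcat(call_log, ";");
}

/* Names taken from the patch; bodies are stubs for illustration only. */
static void mobility_pmu_disable(void) { log_call("pmu_disable"); }
static void mobility_pmu_enable(void)  { log_call("pmu_enable"); }

/*
 * Hypothetical stand-in for the stop_machine() callback (do_join).
 * Note that it no longer calls into the PMU code at all.
 */
static int mock_do_join(void)
{
	log_call("join");
	return 0;
}

/*
 * Hypothetical stand-in for pseries_migrate_partition(): the PMU is
 * quiesced before entering the sensitive section and re-enabled
 * afterwards, whether or not the join succeeded.
 */
int mock_migrate_partition(void)
{
	int ret;

	mobility_pmu_disable();
	ret = mock_do_join();
	mobility_pmu_enable();

	return ret;
}
```

With this shape the callback passed to stop_machine() stays free of calls
into the perf subsystem, and the disable/enable pair brackets the whole
suspend attempt.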