[PATCH V2] powerpc/perf: Enable PMU counters post partition migration if PMU is active

Nathan Lynch nathanl at linux.ibm.com
Fri Oct 22 04:17:36 AEDT 2021


Athira Rajeev <atrajeev at linux.vnet.ibm.com> writes:
> During Live Partition Migration (LPM), perf counter values are
> observed to read zero once migration completes. 'perf stat' with a
> workload continues to show counts after migration, since the PMU
> gets disabled/enabled during sched switches. But in the case of
> system/cpu-wide monitoring, 'perf stat' reports zero counts after
> migration completes.
>
> Example:
>  ./perf stat -e r1001e -I 1000
>            time             counts unit events
>      1.001010437         22,137,414      r1001e
>      2.002495447         15,455,821      r1001e
> <<>> As seen in the logs below, the counter values show zero
>         after migration is completed.
> <<>>
>     86.142535370    129,392,333,440      r1001e
>     87.144714617                  0      r1001e
>     88.146526636                  0      r1001e
>     89.148085029                  0      r1001e

Confirmed in my environment:

    51.099987985            300,338      cache-misses
    52.101839374            296,586      cache-misses
    53.116089796            263,150      cache-misses
    54.117949249            232,290      cache-misses
    55.602029375     68,700,421,711      cache-misses
    56.610073969                  0      cache-misses
    57.614732000                  0      cache-misses

I wonder what it means that there is an implausibly large value in the
last interval before the counter stops working -- I believe your example
shows the same thing.


> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index e83e089..ff7a77c 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -476,6 +476,8 @@ static int do_join(void *arg)
>  retry:
>  	/* Must ensure MSR.EE off for H_JOIN. */
>  	hard_irq_disable();
> +	/* Disable PMU before suspend */
> +	mobility_pmu_disable();
>  	hvrc = plpar_hcall_norets(H_JOIN);
>  
>  	switch (hvrc) {
> @@ -530,6 +532,8 @@ static int do_join(void *arg)
>  	 * reset the watchdog.
>  	 */
>  	touch_nmi_watchdog();
> +	/* Enable PMU after resuming */
> +	mobility_pmu_enable();
>  	return ret;
>  }

We should minimize calls into other subsystems from this context (the
callback function we've passed to stop_machine); it's fairly sensitive.
Can this be moved out to pseries_migrate_partition() or similar?
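
A rough user-space sketch of the ordering suggested above, not kernel
code. Everything here is a hypothetical stand-in: fake_stop_machine(),
pseries_migrate_partition_sketch() and the logging helper are invented for
illustration, and the bodies of mobility_pmu_disable()/mobility_pmu_enable()
only record that they were called. Whether the PMU calls would need to run
on every CPU once moved out of the stop_machine callback is an open
question that this sketch does not model.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical call log, used only to make the ordering visible. */
static char call_log[128];

static void record(const char *step)
{
	/* Guard against overflowing the fixed-size log. */
	assert(strlen(call_log) + strlen(step) < sizeof(call_log));
	strncat(call_log, step, sizeof(call_log) - strlen(call_log) - 1);
}

/* Stand-ins for the helpers named in the patch; the bodies are assumptions. */
static void mobility_pmu_disable(void) { record("pmu_disable;"); }
static void mobility_pmu_enable(void)  { record("pmu_enable;"); }

/* Stand-in for the stop_machine() callback: it no longer touches the PMU. */
static int do_join(void *arg)
{
	(void)arg;
	record("do_join;");
	return 0;
}

/* Stand-in for stop_machine(): simply runs the callback once. */
static int fake_stop_machine(int (*fn)(void *), void *arg)
{
	return fn(arg);
}

/* Hypothetical caller: PMU handling brackets the sensitive context. */
static int pseries_migrate_partition_sketch(void)
{
	int ret;

	mobility_pmu_disable();		/* before entering stop_machine */
	ret = fake_stop_machine(do_join, NULL);
	mobility_pmu_enable();		/* after all CPUs have resumed */
	return ret;
}
```

Calling pseries_migrate_partition_sketch() from a small main() and printing
call_log shows the PMU calls bracketing do_join rather than running inside
it, which is the only point being made here.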

