[V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active
Michael Ellerman
mpe at ellerman.id.au
Thu Nov 4 16:55:07 AEDT 2021
Nathan Lynch <nathanl at linux.ibm.com> writes:
> Nicholas Piggin <npiggin at gmail.com> writes:
>> Excerpts from Michael Ellerman's message of October 29, 2021 11:15 pm:
>>> Nicholas Piggin <npiggin at gmail.com> writes:
>>>> Excerpts from Athira Rajeev's message of October 29, 2021 1:05 pm:
>>>>> @@ -631,12 +632,18 @@ static int pseries_migrate_partition(u64 handle)
>>>>> if (ret)
>>>>> return ret;
>>>>>
>>>>> + /* Disable PMU before suspend */
>>>>> + on_each_cpu(&mobility_pmu_disable, NULL, 0);
>>>>
>>>> Why was this moved out of stop machine and to an IPI?
>>>>
>>>> My concern would be, what are the other CPUs doing at this time? Is it
>>>> possible they could take interrupts and schedule? Could that mess up the
>>>> perf state here?
>>>
>>> pseries_migrate_partition() is called directly from migration_store(),
>>> which is the sysfs store function, which can be called concurrently by
>>> different CPUs.
>>>
>>> It's also potentially called from rtas_syscall_dispatch_ibm_suspend_me(),
>>> from sys_rtas(), again with no locking.
>>>
>>> So we could have two CPUs calling into here at the same time, which
>>> might not crash, but is unlikely to work well.
>>>
>>> I think the lack of locking might have been OK in the past because only
>>> one CPU will successfully get the other CPUs to call do_join() in
>>> pseries_suspend(). But I could be wrong.
>>>
>>> Anyway, now that we're mutating the PMU state before suspending we need
>>> to be more careful. So I think we need a lock around the whole
>>> sequence.
>
> Regardless of the outcome here, generally agreed that some serialization
> should be imposed in this path. The way the platform works (and some
> extra measures by the drmgr utility) make it so that this code isn't
> entered concurrently in usual operation, but it's possible to make it
> happen if you are root.
Yeah I agree it's unlikely to be a problem in practice.
> A file-static mutex should be OK.
Ack.
>> My concern is still that we wouldn't necessarily have the other CPUs
>> under control at that point even if we serialize the migrate path.
>> They could take interrupts, possibly call into perf subsystem after
>> the mobility_pmu_disable (e.g., via syscall or context switch) which
>> might mess things up.
>>
>> I think the stop machine is a reasonable place for the code in this
>> case. It's a low level disabling of hardware facility and saving off
>> registers.
>
> That makes sense, but I can't help feeling concerned still. For this to
> be safe, power_pmu_enable() and power_pmu_disable() must never sleep or
> re-enable interrupts or send IPIs. I don't see anything obviously unsafe
> right now, but is that already part of their contract? Is there much
> risk they could change in the future to violate those constraints?
>
> That aside, the proposed change seems like we would be hacking around a
> more generic perf/pmu limitation in a powerpc-specific way. I see the
> same behavior on x86 across suspend/resume.
>
> # perf stat -a -e cache-misses -I 1000 & sleep 2 ; systemctl suspend ; sleep 20 ; kill $(jobs -p)
> [1] 189806
> # time counts unit events
> 1.000501710 9,983,649 cache-misses
> 2.002620321 14,131,072 cache-misses
> 3.004579071 23,010,971 cache-misses
> 9.971854783 140,737,491,680,853 cache-misses
> 10.982669250 0 cache-misses
> 11.984660498 0 cache-misses
> 12.986648392 0 cache-misses
> 13.988561766 0 cache-misses
> 14.992670615 0 cache-misses
> 15.994938111 0 cache-misses
> 16.996703952 0 cache-misses
> 17.999092812 0 cache-misses
> 19.000602677 0 cache-misses
> 20.003272216 0 cache-misses
> 21.004770295 0 cache-misses
> # uname -r
> 5.13.19-100.fc33.x86_64
That is interesting.
Athira, I guess we should bring that to the perf maintainers and see if
there's any interest in solving the issue in a generic fashion.
cheers
More information about the Linuxppc-dev
mailing list