[PATCH v2] pseries: prevent free CPU ids to be reused on another node

Nathan Lynch nathanl at linux.ibm.com
Sat Apr 3 03:44:32 AEDT 2021


Laurent Dufour <ldufour at linux.ibm.com> writes:
> Le 02/04/2021 à 15:34, Nathan Lynch a écrit :
>> Laurent Dufour <ldufour at linux.ibm.com> writes:
>>> When a CPU is hot added, its CPU id is taken from the lowest available
>>> values in the available mask. If those id values were previously used
>>> for CPUs attached to a different node, applications see the CPUs as
>>> having migrated from one node to another, which is not expected in
>>> real life.
>> 
>> This seems like a problem that could affect other architectures or
>> platforms? I guess as long as arch code is responsible for placing new
>> CPUs in cpu_present_mask, that code will have the responsibility of
>> ensuring CPU IDs' NUMA assignments remain stable.
>
> Actually, x86 already handles this issue in arch-specific code, see
> 8f54969dc8d6 ("x86/acpi: Introduce persistent storage for cpuid <->
> apicid mapping"). I didn't check other architectures, but since CPU id
> allocation lives in the arch part, I believe it is up to each arch to
> deal with this issue.
>
> Making the CPU id allocation common to all arch is outside the scope
> of this patch.

Well... we'd better avoid a situation where architectures impose
different policies in this area. (I guess we're already in this
situation, which is the problem.) A more maintainable way to achieve
that would be to put the higher-level policy in arch-independent code,
as much as possible.

I don't insist, though.
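For illustration, here is a rough userspace model of the id-allocation policy under discussion (this is not the pseries code; the arrays, node numbering, and function names are invented for the sketch). Each CPU id remembers the node it was first assigned to, a hot-add prefers a free id recorded for the same node or a never-used id, and only falls back to an id from another node with a warning, which is the case the patch flags because user space then sees the CPU "move" between nodes:

```c
#include <assert.h>
#include <stdio.h>

#define NR_CPUS 2	/* tiny on purpose, just for the example */

/*
 * Hypothetical model, not kernel code.  cpu_node[] records the node
 * each id was first assigned to (0 = never used; nodes are numbered
 * from 1 here), and cpu_present[] tracks which ids are in use.
 */
static int cpu_node[NR_CPUS];
static int cpu_present[NR_CPUS];

/*
 * Pick an id for a CPU hot-added on @nid: prefer a free id previously
 * bound to @nid or a never-used id; only as a last resort reuse a free
 * id recorded against another node, with a warning.
 */
static int alloc_cpu_id(int nid)
{
	int i, fallback = -1;

	for (i = 0; i < NR_CPUS; i++) {
		if (cpu_present[i])
			continue;
		if (cpu_node[i] == nid || cpu_node[i] == 0)
			goto found;
		fallback = i;	/* free, but bound to a different node */
	}
	if (fallback < 0)
		return -1;	/* no free id at all */
	i = fallback;
	fprintf(stderr, "cpu %d moves from node %d to node %d\n",
		i, cpu_node[i], nid);
found:
	cpu_present[i] = 1;
	cpu_node[i] = nid;
	return i;
}
```

Hot-removing a CPU would clear its cpu_present[] bit while leaving cpu_node[] untouched, so the id stays bound to its original node for future hot-adds.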


>> I don't know, should we not fail the request instead of doing the
>> ABI-breaking thing the code in this change is trying to prevent? I
>> don't think a warning in the kernel log is going to help any
>> application that would be affected by this.
>
> That's a really good question. One could argue that the most important
> thing is to satisfy the CPU add operation, assuming that few consumers
> care about the CPU numbering, while others would prefer the add to fail
> (which may prevent adding CPUs on other nodes if the whole operation is
> aborted as soon as one CPU add fails).
>
> I was conservative here, but if failing the operation is the best
> option, then this will make that code simpler, removing the backward
> jump.
>
> Who is deciding?

I favor failing the request. Apart from the implications for user space,
it's not clear to me that allowing the cpu-node relationship to change
once initialized is benign in terms of internal kernel assumptions
(e.g. sched domains, workqueues?). And as you say, it would make for
more straightforward code.
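The stricter policy favored here can be sketched the same way (again a made-up userspace model, not the actual patch): drop the cross-node fallback entirely and fail the hot-add when the only free ids belong to other nodes, so the cpu-node relationship never changes once established:

```c
#include <assert.h>

#define NR_CPUS 2	/* tiny on purpose, just for the example */

/*
 * Hypothetical model, not kernel code: cpu_node[] records the node
 * each id was first assigned to (0 = never used; nodes are numbered
 * from 1 here), cpu_present[] tracks which ids are in use.
 */
static int cpu_node[NR_CPUS];
static int cpu_present[NR_CPUS];

/*
 * Return an id for a hot-add on @nid, or -1 to fail the request when
 * only ids bound to other nodes remain.  No id ever changes node.
 */
static int alloc_cpu_id_strict(int nid)
{
	for (int i = 0; i < NR_CPUS; i++) {
		if (cpu_present[i])
			continue;
		if (cpu_node[i] == nid || cpu_node[i] == 0) {
			cpu_present[i] = 1;
			cpu_node[i] = nid;
			return i;
		}
	}
	return -1;	/* abort the hot-add rather than "move" a CPU */
}
```

Since the function simply returns an error instead of jumping back to retry with a relaxed constraint, the backward jump mentioned above disappears.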

