[PATCH v2 2/2] powerpc/smp: Disable steal from updating CPU capacity

Wed Oct 29 19:31:41 AEDT 2025

* Vincent Guittot <vincent.guittot at linaro.org> [2025-10-29 08:43:34]:

> Hi Srikar,
> 
> On Wed, 29 Oct 2025 at 07:09, Srikar Dronamraju <srikar at linux.ibm.com> wrote:
> >
> > In a shared LPAR with SMT enabled, it has been observed that when a CPU
> > experiences steal time, it can trigger task migrations between sibling
> > CPUs. The idle CPU pulls a runnable task from its sibling that is
> > impacted by steal, making the previously busy CPU go idle. This reversal
> 
> IIUC, the migration is triggered by the reduced capacity case when
> there is 1 task on the CPU.

Thanks, Vincent, for taking a look at the change.

Yes. Let's assume we have 3 threads running on 6 vCPUs backed by 2 physical
cores. Only 3 vCPUs (0,1,2) would be busy and the other 3 (3,4,5) would be
idle. The busy vCPUs will start seeing steal time of around 33% because they
can't get the physical cores all the time. Without this change, their
capacity decreases, while the idle vCPUs (3,4,5) keep their capacity intact.
So the scheduler moves the 3 tasks to the idle vCPUs. The newly busy vCPUs
(3,4,5) then start seeing steal, and hence their CPU capacity drops, while
the newly idle vCPUs (0,1,2) see their capacity increase since their steal
time reduces. Hence the tasks will be migrated again.
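
A toy userspace model of this ping-pong, purely for illustration. It
assumes capacity drops linearly with the steal percentage, that only busy
vCPUs see steal, and a naive balancer that hands a task to any idle vCPU
advertising more capacity. `model_capacity()` and `balance_round()` are
hypothetical names, not kernel functions, and this is not how the CFS load
balancer actually makes its decisions.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_VCPUS 6
#define MODEL_CAPACITY_SCALE 1024UL /* same value as SCHED_CAPACITY_SCALE */

/* Toy capacity: full scale minus the stolen share (steal_pct in 0..100),
 * or full scale when steal is ignored for capacity purposes. */
static unsigned long model_capacity(unsigned int steal_pct, bool use_steal)
{
	assert(steal_pct <= 100);
	if (!use_steal)
		return MODEL_CAPACITY_SCALE;
	return MODEL_CAPACITY_SCALE * (100 - steal_pct) / 100;
}

/* One balancing round. busy[] marks vCPUs running a task; by assumption
 * busy vCPUs see steal_pct steal and idle ones see none. Each busy vCPU
 * hands its task to a free, previously idle vCPU with higher capacity.
 * Returns the number of migrations performed. */
static int balance_round(bool busy[NR_VCPUS], unsigned int steal_pct,
			 bool use_steal)
{
	bool was_busy[NR_VCPUS];
	unsigned long cap[NR_VCPUS];
	int moved = 0;

	/* Snapshot state so capacities reflect the start of the round. */
	for (int i = 0; i < NR_VCPUS; i++) {
		was_busy[i] = busy[i];
		cap[i] = model_capacity(busy[i] ? steal_pct : 0, use_steal);
	}

	for (int src = 0; src < NR_VCPUS; src++) {
		if (!was_busy[src])
			continue;
		for (int dst = 0; dst < NR_VCPUS; dst++) {
			/* Skip vCPUs that were busy, already got a task this
			 * round, or do not advertise more capacity. */
			if (was_busy[dst] || busy[dst] || cap[dst] <= cap[src])
				continue;
			busy[src] = false;
			busy[dst] = true;
			moved++;
			break;
		}
	}
	return moved;
}
```

With 33% steal, every round moves all 3 tasks to the other half of the
vCPUs; with `use_steal` false every vCPU reports 1024 and nothing moves,
which is the behaviour this patch aims for.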

> 
> > can repeat continuously, resulting in ping-pong behavior between SMT
> > siblings.
> 
> Does it mean that the vCPU generates its own steal time, or is it
> because other vCPUs are already running on the other CPU and they
> start to steal time on the sibling vCPU?

Other vCPUs are running on and sharing the same physical CPU, and hence
these vCPUs see steal time.

> 
> >
> > To avoid migrations solely triggered by steal time, disable steal from
> > updating CPU capacity when running in shared processor mode.
> 
> You are disabling the steal time accounting only for your arch. Does
> it mean that only powerpc is impacted by this effect?

On PowerVM, the hypervisor schedules at core granularity. So in the above
scenario, if we assume SMT2, then we have 3 vCores and 1 physical core. Even
if only 2 threads are running, they would be scheduled on 2 vCores, and
hence we would start seeing 50% steal. So steal time is much more pronounced
on shared LPARs running on PowerVM.
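
A back-of-the-envelope sketch of why core-granularity dispatch produces
steal even when few threads are busy. It assumes busy vCores share the
physical cores equally (ignoring entitlement, capping and shared pool
effects) and that SMT thread t belongs to vCore t / smt; `busy_vcores()`
and `expected_steal_pct()` are hypothetical helpers for this model only.

```c
#include <assert.h>
#include <stdbool.h>

/* Count vCores with at least one busy SMT thread. Assumed mapping:
 * thread t belongs to vCore t / smt. */
static unsigned int busy_vcores(const bool *busy, unsigned int nr_threads,
				unsigned int smt)
{
	unsigned int count = 0;

	assert(smt > 0);
	for (unsigned int first = 0; first < nr_threads; first += smt) {
		for (unsigned int t = first; t < first + smt && t < nr_threads; t++) {
			if (busy[t]) {
				count++; /* the whole vCore must be dispatched */
				break;
			}
		}
	}
	return count;
}

/* Expected steal (integer percent) per busy vCore when nr_busy_vcores
 * share phys_cores physical cores equally. */
static unsigned int expected_steal_pct(unsigned int nr_busy_vcores,
				       unsigned int phys_cores)
{
	assert(phys_cores > 0);
	if (nr_busy_vcores <= phys_cores)
		return 0; /* every busy vCore can run all the time */
	/* each busy vCore gets phys_cores / nr_busy_vcores of a core */
	return 100 - (100 * phys_cores) / nr_busy_vcores;
}
```

With SMT2 and one physical core, two busy threads on vCores 0 and 1 give
50% steal, while in this model the same two threads packed onto a single
vCore would see none.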

However, the same mechanism can be used on other architectures too, since
the framework is arch independent.
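
One way such an arch-independent switch could look, as a hypothetical
sketch only: `arch_steal_reduces_capacity()`, `model_shared_processor` and
`nontask_delta()` are made-up names, not the interface added in this
series. Generic code asks a per-arch policy whether steal should count
against capacity, loosely mirroring how the scheduler feeds the irq and
steal deltas into its non-task capacity signal.

```c
#include <stdbool.h>

/* Stand-in for a platform query such as "running in shared processor
 * mode" (hypothetical). */
static bool model_shared_processor;

/* Per-arch policy: should steal time reduce CPU capacity? */
static bool arch_steal_reduces_capacity(void)
{
	return !model_shared_processor;
}

/* Non-task time (ns) charged against capacity in this sketch: irq time
 * always, steal time only when the arch policy allows it. */
static unsigned long long nontask_delta(unsigned long long irq_ns,
					unsigned long long steal_ns)
{
	if (arch_steal_reduces_capacity())
		return irq_ns + steal_ns;
	return irq_ns;
}
```

Another architecture would only need to supply its own policy; the
generic path stays unchanged.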

Does this clarify?

-- 
Thanks and Regards
Srikar Dronamraju