[PATCH v2 2/2] powerpc/smp: Disable steal from updating CPU capacity
Srikar Dronamraju
srikar at linux.ibm.com
Thu Nov 6 16:22:39 AEDT 2025
* Vincent Guittot <vincent.guittot at linaro.org> [2025-11-03 09:46:26]:
> On Wed, 29 Oct 2025 at 09:32, Srikar Dronamraju <srikar at linux.ibm.com> wrote:
> > * Vincent Guittot <vincent.guittot at linaro.org> [2025-10-29 08:43:34]:
> > > On Wed, 29 Oct 2025 at 07:09, Srikar Dronamraju <srikar at linux.ibm.com> wrote:
> > > >
> > > IIUC, the migration is triggered by the reduced capacity case when
> > > there is 1 task on the CPU
> >
> > Thanks Vincent for taking a look at the change.
> >
> > Yes. Let's assume we have 3 threads running on 6 vCPUs backed by 2 Physical
> > cores. So only 3 vCPUs (0,1,2) would be busy and the other 3 (3,4,5) would be
> > idle. The busy vCPUs will start seeing steal time of around 33%
> > because they can't run completely on the Physical CPU. Without the change,
> > they will start seeing their capacity decrease, while the idle vCPUs (3,4,5)
> > will have their capacity intact. So when the scheduler switches the 3
> > tasks to the idle vCPUs, the newly busy vCPUs (3,4,5) will start seeing steal
> > and hence see their CPU capacity drop, while the newly idle vCPUs (0,1,2)
> > will see their capacity increase since their steal time reduces. Hence the
> > tasks will be migrated again.
>
> Thanks for the details
> This is probably even more visible when vCPUs are not pinned to separate CPUs
If the workload runs on vCPUs pinned to CPUs belonging to the same core, then
yes, steal may be less visible. However, if the workload runs unpinned, or
runs on vCPUs pinned to CPUs belonging to different cores, then it is
more visible.
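
For reference, the steal percentages mentioned in this thread follow from
simple sharing arithmetic. The sketch below is only a back-of-the-envelope
model: it assumes the hypervisor shares the physical resources evenly among
the busy virtual ones and ignores other LPARs. The function name
steal_fraction is made up for this illustration and is not kernel code.

```python
def steal_fraction(busy_virtual: int, physical: int) -> float:
    """Fraction of time a busy vCPU (or vCore) waits for a physical one.

    Assumption: the hypervisor time-slices evenly, so each busy virtual
    unit gets physical / busy_virtual of the time when oversubscribed.
    """
    if busy_virtual <= 0 or physical <= 0:
        raise ValueError("counts must be positive")
    if busy_virtual <= physical:
        return 0.0  # no contention, hence no steal
    return 1.0 - physical / busy_virtual


# 3 busy vCPUs sharing 2 physical CPUs: roughly a third of the time is stolen.
three_on_two = steal_fraction(3, 2)

# Core-granular scheduling: 2 busy vCores sharing 1 physical core.
two_vcores_on_one = steal_fraction(2, 1)
```

Under these assumptions the first case gives about 33% steal and the second
gives 50%, matching the figures quoted from the earlier mails.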
> > >
> > > > can repeat continuously, resulting in ping-pong behavior between SMT
> > > > siblings.
> > >
> > > Does it mean that the vCPU generates its own steal time, or is it
> > > because other vCPUs are already running on the other CPU and they
> > > start to steal time on the sibling vCPU?
> >
> > There are other vCPUs running and sharing the same Physical CPU, and hence
> > these vCPUs are seeing steal time.
> >
> > >
> > > >
> > > > To avoid migrations solely triggered by steal time, disable steal from
> > > > updating CPU capacity when running in shared processor mode.
> > >
> > > You are disabling the steal time accounting only for your arch. Does
> > > it mean that only powerpc is impacted by this effect?
> >
> > On PowerVM, the hypervisor schedules at a core granularity. So in the above
> > scenario, if we assume SMT to be 2, then we have 3 vCores and 1 Physical
> > core. So even if 2 threads are running, they would be scheduled on 2 vCores
> > and hence we would start seeing 50% steal. So this steal accounting is more
> > predominant on Shared LPARs running on PowerVM.
> >
> > However we can use this same mechanism on other architectures too since the
> > framework is arch independent.
> >
> > Does this clarify?
>
> yes, thanks
> I see 2 problems in your use case: the idle CPU doesn't have steal
> time even if the host CPU on which it will run is already busy with
> other things,
> and with unpinned vCPUs, we can't estimate what the steal
> time will be on the target host.
> And I don't see a simple way other than disabling steal time
>
Yes, we can neither have steal time for an idle sibling nor estimate
the steal time for the target CPU. Thanks for acknowledging the problem.
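
To make the ping-pong concrete, here is a toy simulation of the 6 vCPU / 3
task scenario discussed in this thread. It is a sketch under loud
assumptions, not the scheduler's actual logic: the smoothing factor DECAY,
the migration threshold MARGIN, the 1/3 steal sample and the function
count_migrations are all hypothetical values and names picked for the
illustration. The only properties taken from the discussion are that busy
vCPUs observe steal, idle vCPUs report none, and capacity shrinks with steal.

```python
FULL_CAPACITY = 1024     # nominal capacity of an unloaded vCPU
BUSY_STEAL = 1.0 / 3.0   # steal seen by a busy vCPU (assumed, per the scenario)
DECAY = 0.25             # hypothetical smoothing factor for the steal average
MARGIN = 1.17            # hypothetical threshold before a task is moved
NR_VCPUS = 6


def count_migrations(account_steal: bool, ticks: int = 200) -> int:
    """Count task moves in the toy model, with or without steal in capacity."""
    steal_avg = [0.0] * NR_VCPUS
    busy = {0, 1, 2}  # one task on each of these vCPUs
    migrations = 0
    for _ in range(ticks):
        # Busy vCPUs accumulate steal; idle vCPUs report none, so it decays.
        for cpu in range(NR_VCPUS):
            sample = BUSY_STEAL if cpu in busy else 0.0
            steal_avg[cpu] += DECAY * (sample - steal_avg[cpu])
        if account_steal:
            capacity = [FULL_CAPACITY * (1.0 - s) for s in steal_avg]
        else:
            capacity = [float(FULL_CAPACITY)] * NR_VCPUS
        # Move a task when an idle vCPU looks sufficiently stronger.
        for src in sorted(busy):
            idle = [c for c in range(NR_VCPUS) if c not in busy]
            dst = max(idle, key=lambda c: capacity[c])
            if capacity[dst] > capacity[src] * MARGIN:
                busy.remove(src)
                busy.add(dst)
                migrations += 1
    return migrations


with_steal = count_migrations(account_steal=True)
without_steal = count_migrations(account_steal=False)
```

In this model, with_steal keeps growing as the tasks bounce between the two
sets of vCPUs, while without_steal stays at zero because all vCPUs keep the
same capacity. The real scheduler has many more inputs, so this only
illustrates the feedback loop and not its actual magnitude.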
--
Thanks and Regards
Srikar Dronamraju