[PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death

Thu Aug 6 08:31:32 AEST 2020

Quoting Michael Roth (2020-08-05 17:29:28)
> Quoting Michael Roth (2020-08-04 23:37:32)
> > Quoting Michael Ellerman (2020-08-04 22:07:08)
> > > Greg Kurz <groug at kaod.org> writes:
> > > > On Tue, 04 Aug 2020 23:35:10 +1000
> > > > Michael Ellerman <mpe at ellerman.id.au> wrote:
> > > >> Spinning forever seems like a bad idea, but as has been demonstrated at
> > > >> least twice now, continuing when we don't know the state of the other
> > > >> CPU can lead to straight up crashes.
> > > >> 
> > > >> So I think I'm persuaded that it's preferable to have the kernel stuck
> > > >> spinning rather than oopsing.
> > > >> 
> > > >
> > > > +1
> > > >
> > > >> I'm 50/50 on whether we should have a cond_resched() in the loop. My
> > > >> first instinct is no, if we're stuck here for 20s a stack trace would be
> > > >> good. But then we will probably hit that on some big and/or heavily
> > > >> loaded machine.
> > > >> 
> > > >> So possibly we should call cond_resched() but have some custom logic in
> > > >> the loop to print a warning if we are stuck for more than some
> > > >> sufficiently long amount of time.
> > > >
> > > > How long should that be?
> > > 
> > > Yeah good question.
> > > 
> > > I guess step one would be seeing how long it can take on the 384 vcpu
> > > machine. And we can probably test on some other big machines.
> > > 
> > > Hopefully Nathan can give us some idea of how long he's seen it take on
> > > large systems? I know he was concerned about the 20s timeout of the
> > > softlockup detector.
> > > 
> > > Maybe a minute or two?
> > 
> > Hmm, so I took a stab at this where I called cond_resched() after
> > every 5 seconds of polling and printed a warning at the same time (FWIW
> > that doesn't seem to trigger any warnings on a loaded 96-core mihawk
> > system using KVM running the 384vcpu unplug loop)
> > 
> > But it sounds like that's not quite what you had in mind. How frequently
> > do you think we should call cond_resched()? Maybe after 25 iterations
> > of polling smp_query_cpu_stopped() to keep original behavior somewhat
> > similar?
> > 
> > I'll let the current patch run on the mihawk system overnight in the
> > meantime so we at least have that data point, but would be good to
> > know what things look like on a large pHyp machine.
> 
> At one point I did manage to get the system in a state where unplug
> operations were taking 1-2s, but still not enough to trigger any
> 5s warning, and I wasn't able to reproduce that in subsequent runs.
> 
> I also tried reworking the patch so that we print a warning and
> cond_resched() after 200 ms to make sure that path gets executed, but
> only managed to trigger the warning twice after a few hours.
> 
> So, if we print a warning after a couple minutes, that seems pretty
> conservative as far as avoiding spurious warnings. And if we
> cond_resched() after 25 loops of polling (~0.1 ms in the cases

~0.1 seconds I mean

> that caused the original crash), that would avoid most of the
> default RCU/lockup warnings.
> 
> But having a second timeout to trigger the cond_resched() after some
> set interval like 2s seems more deterministic since we're less
> susceptible to longer delays due to things like the RTAS calls
> contending for QEMU's global mutex in the KVM case.
> 
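To make the two-timeout idea concrete, here is a rough user-space sketch, not the actual patch. It runs on a simulated clock so the behaviour is deterministic. Everything in it is a hypothetical stand-in: wait_for_cpu_stopped(), struct poll_stats and the parameter names are invented for illustration, the "yield" counter stands in for cond_resched(), and each simulated poll stands in for one smp_query_cpu_stopped() call. The 4 ms poll cost is only the ~0.1 seconds per 25 polls figure above divided out, and 120 s is one reading of "a couple minutes".

```c
#include <stdio.h>

/* Hypothetical bookkeeping for the sketch; not a kernel structure. */
struct poll_stats {
    unsigned long polls;    /* simulated status queries issued */
    unsigned long yields;   /* times we would have called cond_resched() */
    unsigned long warnings; /* times we would have printed a warning */
};

/*
 * Simulate polling a dying vCPU that reports "stopped" once
 * stop_after_ms of simulated time has elapsed.
 *
 * poll_cost_ms: assumed cost of one status query
 * resched_ms:   interval between yields (stand-in for cond_resched())
 * warn_ms:      interval between "still waiting" warnings
 *
 * There is no overall timeout: the loop only ends when the CPU
 * reports stopped, matching the "spin rather than oops" position.
 */
static struct poll_stats wait_for_cpu_stopped(unsigned long stop_after_ms,
                                              unsigned long poll_cost_ms,
                                              unsigned long resched_ms,
                                              unsigned long warn_ms)
{
    struct poll_stats stats = { 0, 0, 0 };
    unsigned long now_ms = 0;
    unsigned long last_resched_ms = 0;
    unsigned long last_warn_ms = 0;

    if (poll_cost_ms == 0)
        poll_cost_ms = 1; /* keep the simulated clock moving */

    for (;;) {
        stats.polls++;
        now_ms += poll_cost_ms; /* each query consumes simulated time */

        if (now_ms >= stop_after_ms)
            break; /* simulated CPU now reports stopped */

        /* Time-based yield, independent of how slow each query is. */
        if (now_ms - last_resched_ms >= resched_ms) {
            stats.yields++;
            last_resched_ms = now_ms;
        }

        /* Separate, much longer interval for the warning. */
        if (now_ms - last_warn_ms >= warn_ms) {
            stats.warnings++;
            fprintf(stderr, "still waiting for CPU to stop after %lu ms\n",
                    now_ms);
            last_warn_ms = now_ms;
        }
    }

    return stats;
}
```

With these assumed numbers, wait_for_cpu_stopped(100, 4, 2000, 120000) finishes after 25 polls with no yield and no warning, while a CPU that takes 250 s to stop yields every 2 s and warns twice. Keying the yield to elapsed time rather than to a poll count is what keeps the interval stable when individual queries get slower.
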
> 
> > 
> > Thanks!
> > 
> > > 
> > > >> > Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
> > > >> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588
> > > >> 
> > > >> This is not public.
> > > >
> > > > I'll have a look at changing that.
> > > 
> > > Thanks.
> > > 
> > > cheers