[PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.

Dongsheng.Wang at freescale.com Dongsheng.Wang at freescale.com
Tue Dec 23 17:53:27 AEDT 2014



> -----Original Message-----
> From: Michael Ellerman [mailto:mpe at ellerman.id.au]
> Sent: Tuesday, December 23, 2014 1:46 PM
> To: Wang Dongsheng-B40534
> Cc: benh at kernel.crashing.org; Wood Scott-B07421; anton at samba.org; linuxppc-
> dev at lists.ozlabs.org
> Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
> 
> On Tue, 2014-12-23 at 02:41 +0000, Dongsheng.Wang at freescale.com wrote:
> > > -----Original Message-----
> > > From: Michael Ellerman [mailto:mpe at ellerman.id.au]
> > > Sent: Tuesday, December 23, 2014 9:01 AM
> > > To: Wang Dongsheng-B40534
> > > Cc: benh at kernel.crashing.org; Wood Scott-B07421; anton at samba.org;
> > > linuxppc- dev at lists.ozlabs.org
> > > Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
> > >
> > > On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote:
> > > > From: Wang Dongsheng <dongsheng.wang at freescale.com>
> > > >
> > > > Kernel cannot bring up Non-boot cpus always get "Processor xx is stuck".
> > > > this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc:
> > > > Secondary CPUs must set cpu_callin_map after setting active and
> > > > online) We need to take timebase after bootup cpu give the timebase
> firstly.
> > > >
> > > > When start_secondary, non-boot cpus set cpu_callin_map for boot
> > > > cpu after that boot cpu will give the timebase for non-boot cpu.
> > > > Otherwise non-boot cpus will fall in dead loop to waiting bootup
> > > > cpu to give imebase.
> > >
> > > Right.
> > >
> > > However, doesn't this introduce the possibility that the secondary
> > > cpu is up and marked online but has an unsynchronised clock?
> 
> > Yes, right. But Freescale platform boot-cpu will freeze the TB until
> > secondary cpu take the time base, so the clock is synchronized.
> 
> It does the freeze in give_timebase() doesn't it?
> 
> So there's still a window there where the secondary is up & online but hasn't
> had it's timebase synchronised, and the primary hasn't frozen the timebase yet.
> So that makes me nervous.
> 
> > For generic PowerPC maybe has this issue. So for safe I think we need
> > to set cpu online after synchronized clock.
> >
> > I will update my patch if you agree this way.
> > +       if (smp_ops->take_timebase)
> > +               smp_ops->take_timebase();
> > +       secondary_cpu_time_init();
> > +
> > Move set_cpu_online to here.
> > +       set_cpu_online(cpu, true);
> 
> But that reverses the effect of the original patch, which was that we have to
> set online *before* we set the callin map.
> 
> 
> Looking harder at Anton's patch I'm not sure it's right anyway.
> 
> The issue he was trying to fix was that the cpu was online but not active, which
> confused the scheduler.
> 
> I think Anton missed that we have a loop that waits for online at the bottom of
> __cpu_up():
> 
> 	/* Wait until cpu puts itself in the online map */
> 	while (!cpu_online(cpu))
> 		cpu_relax();
> 
> 
> He must have seen a case where that popped due to the cpu being online, but the
> cpu wasn't yet active.
> 
> His patch fixed the problem by ensuring the previous loop that waits for
> cpu_callin_map doesn't finish until active & online are set, making the while
> loop above a nop.
> 
> So I think we should probably revert Anton's patch and instead change that while
> loop to:
> 
> 	/* Wait until cpu is online AND active */
> 	while (!cpu_online(cpu) || !cpu_active(cpu))
> 		cpu_relax();
> 
Umm.. Sorry about that...I forgot Anton's patch.

But set_cpu_online also set cpu_active_bits, I think this judgment cannot fix Anton's issue.
It is actually the effects of the original is the same.

Regrads,
-Dongsheng



More information about the Linuxppc-dev mailing list