[PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.

Dongsheng.Wang at freescale.com Dongsheng.Wang at freescale.com
Tue Dec 23 18:55:32 AEDT 2014



> -----Original Message-----
> From: Wang Dongsheng-B40534
> Sent: Tuesday, December 23, 2014 2:53 PM
> To: 'Michael Ellerman'
> Cc: benh at kernel.crashing.org; Wood Scott-B07421; anton at samba.org; linuxppc-
> dev at lists.ozlabs.org
> Subject: RE: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
> 
> 
> 
> > -----Original Message-----
> > From: Michael Ellerman [mailto:mpe at ellerman.id.au]
> > Sent: Tuesday, December 23, 2014 1:46 PM
> > To: Wang Dongsheng-B40534
> > Cc: benh at kernel.crashing.org; Wood Scott-B07421; anton at samba.org;
> > linuxppc- dev at lists.ozlabs.org
> > Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
> >
> > On Tue, 2014-12-23 at 02:41 +0000, Dongsheng.Wang at freescale.com wrote:
> > > > -----Original Message-----
> > > > From: Michael Ellerman [mailto:mpe at ellerman.id.au]
> > > > Sent: Tuesday, December 23, 2014 9:01 AM
> > > > To: Wang Dongsheng-B40534
> > > > Cc: benh at kernel.crashing.org; Wood Scott-B07421; anton at samba.org;
> > > > linuxppc- dev at lists.ozlabs.org
> > > > Subject: Re: [PATCH] powerpc/smp: Fix Non-boot cpus cannot be bring up.
> > > >
> > > > On Mon, 2014-12-22 at 14:38 +0800, Dongsheng Wang wrote:
> > > > > From: Wang Dongsheng <dongsheng.wang at freescale.com>
> > > > >
> > > > > Kernel cannot bring up Non-boot cpus always get "Processor xx is stuck".
> > > > > this issue bring by http://patchwork.ozlabs.org/patch/418912/ (powerpc:
> > > > > Secondary CPUs must set cpu_callin_map after setting active and
> > > > > online) We need to take timebase after bootup cpu give the
> > > > > timebase
> > firstly.
> > > > >
> > > > > When start_secondary, non-boot cpus set cpu_callin_map for boot
> > > > > cpu after that boot cpu will give the timebase for non-boot cpu.
> > > > > Otherwise non-boot cpus will fall in dead loop to waiting bootup
> > > > > cpu to give imebase.
> > > >
> > > > Right.
> > > >
> > > > However, doesn't this introduce the possibility that the secondary
> > > > cpu is up and marked online but has an unsynchronised clock?
> >
> > > Yes, right. But Freescale platform boot-cpu will freeze the TB until
> > > secondary cpu take the time base, so the clock is synchronized.
> >
> > It does the freeze in give_timebase() doesn't it?
> >
> > So there's still a window there where the secondary is up & online but
> > hasn't had it's timebase synchronised, and the primary hasn't frozen the
> timebase yet.
> > So that makes me nervous.
> >
> > > For generic PowerPC maybe has this issue. So for safe I think we
> > > need to set cpu online after synchronized clock.
> > >
> > > I will update my patch if you agree this way.
> > > +       if (smp_ops->take_timebase)
> > > +               smp_ops->take_timebase();
> > > +       secondary_cpu_time_init();
> > > +
> > > Move set_cpu_online to here.
> > > +       set_cpu_online(cpu, true);
> >
> > But that reverses the effect of the original patch, which was that we
> > have to set online *before* we set the callin map.
> >
> >
> > Looking harder at Anton's patch I'm not sure it's right anyway.
> >
> > The issue he was trying to fix was that the cpu was online but not
> > active, which confused the scheduler.
> >
> > I think Anton missed that we have a loop that waits for online at the
> > bottom of
> > __cpu_up():
> >
> > 	/* Wait until cpu puts itself in the online map */
> > 	while (!cpu_online(cpu))
> > 		cpu_relax();
> >
> >
> > He must have seen a case where that popped due to the cpu being
> > online, but the cpu wasn't yet active.
> >
> > His patch fixed the problem by ensuring the previous loop that waits
> > for cpu_callin_map doesn't finish until active & online are set,
> > making the while loop above a nop.
> >
> > So I think we should probably revert Anton's patch and instead change
> > that while loop to:
> >
> > 	/* Wait until cpu is online AND active */
> > 	while (!cpu_online(cpu) || !cpu_active(cpu))
> > 		cpu_relax();
> >

Base on Anton's patch, we should probably change __cup_up.
Please comment the changes.
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -528,12 +528,10 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
        }

        /*
-        * wait to see if the cpu made a callin (is actually up).
-        * use this value that I found through experimentation.
-        * -- Cort
+        * Wait until cpu puts itself in the online map
         */
        if (system_state < SYSTEM_RUNNING)
-               for (c = 50000; c && !cpu_callin_map[cpu]; c--)
+               for (c = 50000; c && !cpu_online(cpu); c--)
                        udelay(100);
 #ifdef CONFIG_HOTPLUG_CPU
        else
@@ -541,11 +539,10 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
                 * CPUs can take much longer to come up in the
                 * hotplug case.  Wait five seconds.
                 */
-               for (c = 5000; c && !cpu_callin_map[cpu]; c--)
+               for (c = 5000; c && !cpu_online(cpu); c--)
                        msleep(1);
 #endif
-
-       if (!cpu_callin_map[cpu]) {
+       if (!cpu_online(cpu)) {
                printk(KERN_ERR "Processor %u is stuck.\n", cpu);
                return -ENOENT;
        }
@@ -555,8 +552,8 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
        if (smp_ops->give_timebase)
                smp_ops->give_timebase();

-       /* Wait until cpu puts itself in the online map */
-       while (!cpu_online(cpu))
+       /* Wait until cpu synchronized clock */
+       while (!cpu_callin_map[cpu])
                cpu_relax();

        return 0;
@@ -703,10 +700,6 @@ void start_secondary(void *unused)

        if (smp_ops->setup_cpu)
                smp_ops->setup_cpu(cpu);
-       if (smp_ops->take_timebase)
-               smp_ops->take_timebase();
-
-       secondary_cpu_time_init();

 #ifdef CONFIG_PPC64
        if (system_state == SYSTEM_RUNNING)
@@ -738,6 +731,10 @@ void start_secondary(void *unused)
        notify_cpu_starting(cpu);
        set_cpu_online(cpu, true);

+       if (smp_ops->take_timebase)
+               smp_ops->take_timebase();
+
+       secondary_cpu_time_init();
        /*
         * CPU must be marked active and online before we signal back to the
         * master, because the scheduler needs to see the cpu_online and

Regards,
-Dongsheng

> Umm.. Sorry about that...I forgot Anton's patch.
> 
> But set_cpu_online also set cpu_active_bits, I think this judgment cannot fix
> Anton's issue.
> It is actually the effects of the original is the same.
> 
> Regrads,
> -Dongsheng



More information about the Linuxppc-dev mailing list