[PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries

Will Schmidt willschm at us.ibm.com
Fri Aug 20 04:58:13 EST 2010


Ankita Garg <ankita at in.ibm.com> wrote on 08/19/2010 10:58:24 AM:
> Subject
> Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with
> CONFIG_PREEMPT on pseries
>
> Hi Darren,
>
> On Thu, Jul 22, 2010 at 11:24:13AM -0700, Darren Hart wrote:
> >
> > With some instrumentation we were able to determine that the
> > preempt_count() appears to change across the extended_cede_processor()
> > call.  Specifically across the plpar_hcall_norets(H_CEDE) call. On
> > PREEMPT_RT we call this with preempt_count=1 and return with
> > preempt_count=0xffffffff. On mainline with CONFIG_PREEMPT=y, the value
> > is different (0x65) but is still incorrect.
>
> I was trying to reproduce this issue on a 2.6.33.7-rt29 kernel. I could
> easily reproduce this on the RT kernel and not the non-RT kernel.
> However, I hit it every single time I did a cpu online operation. I also
> noticed that the issue persists even when I disable H_CEDE by passing
> the "cede_offline=0" kernel commandline parameter. Could you pl confirm
> if you observe the same in your setup ?

If you see it every time, double-check that you have
http://patchwork.ozlabs.org/patch/60922/  or an equivalent in your tree.
(The
patch moves the preempt_enable_no_resched() call up above a call to cpu_die
().  An
earlier variation called a preempt_enable before calling
start_secondary_resume()).

Darren and I have been seeing different problems (different
dedicated-processor LPARS
on the same physical system).  He's seeing scrambled preempt_count values,
I'm tending
to see a system hang/death [with processor backtraces
showing .pseries_mach_cpu_die or
.pseries_dedicated_idle_sleep as expected, but no processes running] .

I'll be curious which you end up seeing.   With 2.6.33.7-rt27 as of a few
minutes ago,
I'm still seeing a system hang/death.



>
> However, the issue still remains. Will spend few cycles looking into
> this issue.
>
> >
> > Also of interest is that this path
> > cpu_idle()->cpu_die()->pseries_mach_cpu_die() to start_secondary()
> > enters with a preempt_count=1 if it wasn't corrupted across the hcall.
> > The early boot path from _start however appears to call
> > start_secondary() with a preempt_count of 0.
> >
> > The following patch is most certainly not correct, but it does
eliminate
> > the situation on mainline 100% of the time (there is still a 25%
> > reproduction rate on PREEMPT_RT). Can someone comment on:
> >
> > 1) How can the preempt_count() get mangled across the H_CEDE hcall?
> > 2) Should we call preempt_enable() in cpu_idle() prior to cpu_die() ?
> >
> > Hacked-up-by: Darren Hart <dvhltc at us.ibm.com>
> >
> > Index: linux-2.6.33.6/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > ===================================================================
> > --- linux-2.6.33.6.orig/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > +++ linux-2.6.33.6/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > @@ -138,6 +138,7 @@ static void pseries_mach_cpu_die(void)
> >            * Kernel stack will be reset and start_secondary()
> >            * will be called to continue the online operation.
> >            */
> > +         preempt_count() = 0;
> >           start_secondary_resume();
> >        }
> >     }
> >
> >
>
> --
> Regards,
> Ankita Garg (ankita at in.ibm.com)
> Linux Technology Center
> IBM India Systems & Technology Labs,
> Bangalore, India
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20100819/441294a8/attachment-0001.html>


More information about the Linuxppc-dev mailing list