offlining cpus breakage

Michael Ellerman mpe at ellerman.id.au
Thu Jan 22 16:29:28 AEDT 2015


On Fri, 2015-01-16 at 14:40 +0530, Preeti U Murthy wrote:
> On 01/16/2015 02:26 PM, Preeti U Murthy wrote:
> > On 01/16/2015 08:34 AM, Michael Ellerman wrote:
> >> On Fri, 2015-01-16 at 13:28 +1300, Alexey Kardashevskiy wrote:
> >>> On 01/16/2015 02:22 AM, Preeti U Murthy wrote:
> >>>> Hi Alexey,
> >>>>
> >>>> Can you let me know if the following patch fixes the issue for you?
> >>>> It did for us on one of the machines we were investigating.
> >>>
> >>> This fixes the issue for me as well, thanks!
> >>>
> >>> Tested-by: Alexey Kardashevskiy <aik at ozlabs.ru>
> >>
> >> OK, that's great.
> >>
> >> But, I really don't think we can ask upstream to merge this patch to generic
> >> code when we don't have a good explanation for why it's necessary. At least I'm
> >> not going to ask anyone to do that :)
> >>
> >> So Preeti can you either write a 100% convincing explanation of why this patch
> >> is correct in the general case, or (preferably) do some more investigating to
> >> work out what Alexey's bug actually is.
> > 
> > Yes, will do so. It's better to investigate where precisely the bug is.
> > This patch helped me narrow down on the buggy scenario.
> 
> On a side note, while I was tracking the race condition, I noticed that
> in the final stage of the cpu offline path, after the state of the
> hotplugged cpu is set to CPU_DEAD, we check if there were interrupts
> delivered during the soft disabled state and service them if there were.
> It makes sense to check for pending interrupts in the idle path. In the
> offline path, however, this did not look right to me at first glance. Am
> I missing something?

That does sound a bit fishy.

I guess we're just assuming that all interrupts have been migrated away prior
to the offline?

cheers



