[Skiboot] [PATCH] core/fast-reboot.c: Add sreset opal call

Tue Nov 22 12:29:27 AEDT 2016

On Tue, 22 Nov 2016 10:59:58 +1100
Alistair Popple <alistair at popple.id.au> wrote:

> On Mon, 21 Nov 2016 07:01:42 PM Nicholas Piggin wrote:
> 
> <snip>
> 
> > > 
> > > Currently active threads on the currently executing core cannot be
> > > sreset as a thread cannot ram other threads on the same core. This
> > > means the caller will need to reset these threads to make the call
> > > from a different core.  
> > 
> > Hi Alistair,
> > 
> > Great stuff. Do you see any way to lift this restriction in future,
> > or do we need to make this part of the API? We should be able to make
> > the Linux powernv platform code have NMI'ed cores bounce the NMI
> > back to our sibling threads without too much work or changes to the
> > platform independent NMI code.  
> 
> The restriction comes from the hardware, which means one way or another we 
> need to bounce around to different cores to do a complete reset, either in 
> skiboot or Linux.
> 
> It would be possible to do it in skiboot by ramming one of the other cores to 
> a trampoline in skiboot which would then go and reset the remaining cores 
> before returning them to Linux, however this would create a couple of 
> complications (eg. when all of the cores are sleeping) so I think it would be 
> simpler to deal with this restriction in Linux.

That sounds reasonable.

> > So this should be fine, but we should ensure the API has a way to
> > communicate this type of failure (that requires an NMI bounce from
> > another core, as opposed to some other failure). Did you have any
> > thoughts there?  
> 
> Right. Currently the call will return OPAL_PARTIAL when it couldn't reset all 
> of the requested cores which I admit may be somewhat vague. We could I suppose 
> have an API where we ask OPAL to reset all the cores and it returns a list of 
> the ones which were successfully reset. This would be slightly more efficient 
> than resetting one core at a time as we wouldn't need to continually quiesce 
> other threads but I'd be interested in your thoughts here as well.

I'm not sure what's going to work best here. Unicast would make it easy for
Linux to build the list.

So perhaps if the platform does not support true broadcast, then a broadcast
request can just fail without resetting any CPUs, and the caller can fall
back to resetting one CPU at a time.

Then if we always proceed by signaling sibling threads first, we can detect
hardware/firmware that requires an NMI bounce before any NMIs actually get
sent, which makes it a bit easier for the Linux side to set up bouncing.
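
A rough sketch of that caller-side flow, to make the ordering concrete. This
is not the skiboot or Linux API: mock_sreset(), the reset_rc values, the
BROADCAST target and the 2-core by 4-thread topology are all made up for
illustration, and the mock simply models "a thread cannot reset threads on
its own core".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical topology: 2 cores x 4 threads, core = cpu / 4. */
#define NR_CPUS          8
#define THREADS_PER_CORE 4
#define BROADCAST        (-1)   /* hypothetical "all CPUs" target */

/* Hypothetical return codes, stand-ins for whatever OPAL would return. */
enum reset_rc { RESET_OK, RESET_UNSUPPORTED, RESET_NEEDS_BOUNCE };

static bool was_reset[NR_CPUS];

static int core_of(int cpu)
{
	return cpu / THREADS_PER_CORE;
}

/* Mock firmware call: broadcast fails up front, same-core targets refuse. */
static enum reset_rc mock_sreset(int caller, int target)
{
	if (target == BROADCAST)
		return RESET_UNSUPPORTED;   /* no CPU has been touched */
	assert(target >= 0 && target < NR_CPUS && target != caller);
	if (core_of(target) == core_of(caller))
		return RESET_NEEDS_BOUNCE;  /* must come from another core */
	was_reset[target] = true;
	return RESET_OK;
}

struct reset_plan {
	bool bounce_needed;  /* siblings must be reset from another core */
	int nr_reset;        /* CPUs reset directly by the caller */
	int nr_deferred;     /* siblings left for the bounce */
};

static struct reset_plan reset_all_others(int caller)
{
	struct reset_plan plan = { false, 0, 0 };
	int cpu;

	/* Try broadcast first; on failure nothing has been reset yet. */
	if (mock_sreset(caller, BROADCAST) == RESET_OK) {
		plan.nr_reset = NR_CPUS - 1;
		return plan;
	}

	/* Pass 1: siblings first, so a bounce is detected before any NMI. */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu == caller || core_of(cpu) != core_of(caller))
			continue;
		if (mock_sreset(caller, cpu) == RESET_NEEDS_BOUNCE) {
			plan.bounce_needed = true;
			plan.nr_deferred++;
		} else {
			plan.nr_reset++;
		}
	}

	/* Pass 2: unicast to every thread on the other cores. */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (core_of(cpu) == core_of(caller))
			continue;
		if (mock_sreset(caller, cpu) == RESET_OK)
			plan.nr_reset++;
	}
	return plan;
}
```

With CPU 0 as the caller, the sibling pass reports that a bounce is needed
while no CPU has been reset yet, and only then are the threads on the other
core reset. One of those would then be asked to bounce the NMI back to the
caller's siblings.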

Thanks,
Nick