[PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

Nicholas Piggin npiggin at gmail.com
Tue Sep 20 13:54:23 AEST 2022


On Mon Sep 19, 2022 at 11:51 PM AEST, Nathan Lynch wrote:
> Nathan Lynch <nathanl at linux.ibm.com> writes:
> > "Nicholas Piggin" <npiggin at gmail.com> writes:
> >> On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
> >>> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
> >>> > Leonardo Brás <leobras.c at gmail.com> writes:
> >>> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
> >>> > > > Leonardo Brás <leobras.c at gmail.com> writes:
> >>> > > > > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
> >>> > > > > > At the time this was submitted by Leonardo, I confirmed -- or thought
> >>> > > > > > I had confirmed -- with PowerVM partition firmware development that
> >>> > > > > > the following RTAS functions:
> >>> > > > > > 
> >>> > > > > > - ibm,get-xive
> >>> > > > > > - ibm,int-off
> >>> > > > > > - ibm,int-on
> >>> > > > > > - ibm,set-xive
> >>> > > > > > 
> >>> > > > > > were safe to call on multiple CPUs simultaneously, not only with
> >>> > > > > > respect to themselves as indicated by PAPR, but with arbitrary other
> >>> > > > > > RTAS calls:
> >>> > > > > > 
> >>> > > > > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/
> >>> > > > > > 
> >>> > > > > > Recent discussion with firmware development makes it clear that this
> >>> > > > > > is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
> >>> > > > > > Implement reentrant rtas call") is unsafe, likely explaining several
> >>> > > > > > strange bugs we've seen in internal testing involving DLPAR and
> >>> > > > > > LPM. These scenarios use ibm,configure-connector, whose internal state
> >>> > > > > > can be corrupted by the concurrent use of the "reentrant" functions,
> >>> > > > > > leading to symptoms like endless busy statuses from RTAS.
> >>> > > > > 
> >>> > > > > Oh, does not it means PowerVM is not compliant to the PAPR specs?
> >>> > > > 
> >>> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
> >>> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
> >>> > > > property described in the spec applies only to the individual RTAS
> >>> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
> >>> > > > simultaneously, but it must adhere to the more general requirement to
> >>> > > > serialize with other RTAS functions.
> >>> > > > 
> >>> > > 
> >>> > > I see. Thanks for explaining that part!
> >>> > > I agree: reentrant calls that way don't look as useful on Linux than I
> >>> > > previously thought.
> >>> > > 
> >>> > > OTOH, I think that instead of reverting the change, we could make use of the
> >>> > > correct information and fix the current implementation. (This could help when we
> >>> > > do the same rtas call in multiple cpus)
> >>> > 
> >>> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
> >>> > that. I'm not seeing the need.
> >>> > 
> >>> > > I have an idea of a patch to fix this. 
> >>> > > Do you think it would be ok if I sent that, to prospect being an alternative to
> >>> > > this reversion?
> >>> > 
> >>> > It is my preference, and I believe it is more common, to revert to the
> >>> > well-understood prior state, imperfect as it may be. The revert can be
> >>> > backported to -stable and distros while development and review of
> >>> > another approach proceeds.
> >>>
> >>> Ok then, as long as you are aware of the kdump bug, I'm good.
> >>>
> >>> FWIW:
> >>> Reviewed-by: Leonardo Bras <leobras.c at gmail.com>
> >>
> >> A shame. I guess a reader/writer lock would not be much help because
> >> the crash is probably more likely to hit longer running rtas calls?
> >>
> >> Alternative is just cheat and do this...?
>
> [...]
>
> >
> > I wonder - would it be worth making the panic path use a separate
> > "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
> > RTAS at panic time, then leaving rtas.args untouched might make the
> > ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
> > incur on the panic path more likely to succeed.

Yeah I think that's probably not a bad idea. Not sure if you've got the
bandwidth to take on doing the patch but be my guest if you do :)
Otherwise we can file it in github issues.

> Regardless, I request that we proceed with the revert while the crash
> path hardening gets sorted out. If I understand the motivation behind
> commit b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call"),
> then it looks like it was incomplete anyway? rtas_os_term() still takes
> the lock when calling ibm,os-term.

Yeah agree a simple revert should be done first.

Thanks,
Nick


More information about the Linuxppc-dev mailing list