[PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

Leonardo Brás leobras.c at gmail.com
Tue Sep 13 01:22:03 AEST 2022


On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:
> Hi Leonardo,
> 
> (restoring the list to the cc, hope that's ok)
> 

Sure, thanks for doing that. 
I probably had some mail composer issue here.

> Leonardo Brás <leobras.c at gmail.com> writes:
> > On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
> > > At the time this was submitted by Leonardo, I confirmed -- or thought
> > > I had confirmed -- with PowerVM partition firmware development that
> > > the following RTAS functions:
> > > 
> > > - ibm,get-xive
> > > - ibm,int-off
> > > - ibm,int-on
> > > - ibm,set-xive
> > > 
> > > were safe to call on multiple CPUs simultaneously, not only with
> > > respect to themselves as indicated by PAPR, but with arbitrary other
> > > RTAS calls:
> > > 
> > > https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/
> > > 
> > > Recent discussion with firmware development makes it clear that this
> > > is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
> > > Implement reentrant rtas call") is unsafe, likely explaining several
> > > strange bugs we've seen in internal testing involving DLPAR and
> > > LPM. These scenarios use ibm,configure-connector, whose internal state
> > > can be corrupted by the concurrent use of the "reentrant" functions,
> > > leading to symptoms like endless busy statuses from RTAS.
> > 
> > Oh, does not it means PowerVM is not compliant to the PAPR specs?
> 
> No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
> Implement reentrant rtas call") change is incorrect. The "reentrant"
> property described in the spec applies only to the individual RTAS
> functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
> simultaneously, but it must adhere to the more general requirement to
> serialize with other RTAS functions.
> 

I see. Thanks for explaining that part!
I agree: reentrant calls that way don't look as useful on Linux than I
previously thought.

OTOH, I think that instead of reverting the change, we could make use of the
correct information and fix the current implementation. (This could help when we
do the same rtas call in multiple cpus)

I have an idea of a patch to fix this. 
Do you think it would be ok if I sent that, to prospect being an alternative to
this reversion?


> I don't claim that this is a useful property, at least not for
> Linux. Maybe other OSes are better able to exploit it.
> 
> I apologize for my role in the confusion here. When reviewing the
> original change, I talked to firmware development about the reentrant
> property and how we wanted to use it. I've since reviewed that
> conversation and concluded that I didn't communicate clearly, and that I
> interpreted their responses too eagerly.

No problem. Thanks for even talking to firmware people during original change!


> 
> In the future, when we (pseries Linux developers at IBM) want to go
> beyond the language of the spec, we probably should initiate an
> architecture change to make the PAPR eventually align with our
> implementation choices.

Yeah, it makes sense.

> 
> > I mentioned this in the original Commit, and it's still true to the last LoPAR:
> > 
> > ###
> > For "ibm,int-on", "ibm,int-off",ibm,get-xive" and  "ibm,set-xive".
> > 
> > On LoPAPR Version 1.1 (March 24, 2016), from 7.3.10.1 to 7.3.10.4,
> > items 2 and 3 say:
> > 
> > 2 - For the PowerPC External Interrupt option: The * call must be
> > reentrant to the number of processors on the platform.
> > 3 - For the PowerPC External Interrupt option: The * argument call
> > buffer for each simultaneous call must be physically unique.
> > 
> > ### 
> > 
> > Other than that, IIRC, this change was used to avoid issues that were happening
> > on kdump/kexec: 
> > If another thread was holding the rtas lock, kdump/kexec would get stuck waiting
> > for the lock forever and never finish the process, often causing a bug
> > reproduction's kdump to not get collected. 
> > 
> > Is there any other fix for the above bug? 
> 
> Not that I'm aware of, but if this is a common failure mode for kdump,
> then perhaps rtas_call() or related code should be made to ignore the
> lock in the crash path, as a more general mitigation.

It was the first idea at the time (bust the rtas lock), but I thought it could
raise some issues. Reentrant rtas calls seemed like a better solution at the
time.

> 
> Then again, maybe it's not that urgent? Only XICS mode guests are
> potentially affected. Do we even have this problem with
> firmware-assisted dumps on PowerVM?

It's been a lot of time, I don't recall all the details on that.

> 
> > Or is that a bug which is acceptable to have back, compared to the new
> > one?
> 
> It was not acceptable to regress existing functions (DLPAR, LPM) in
> exchange for making the crash path more robust. Reverting the change is
> the correct tradeoff, unfortunately.

I was talking about the kdump bug being acceptable, which by the previous
comment it is, or can be solved in other ways (busting rtas lock, as previously
mentioned).

Thanks Nathan!

Best regars, 
Leo


More information about the Linuxppc-dev mailing list