[PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

Nathan Lynch nathanl at linux.ibm.com
Sat Sep 10 00:04:56 AEST 2022


Hi Leonardo,

(restoring the list to the cc, hope that's ok)

Leonardo Brás <leobras.c at gmail.com> writes:
> On Wed, 2022-09-07 at 17:01 -0500, Nathan Lynch wrote:
>> At the time this was submitted by Leonardo, I confirmed -- or thought
>> I had confirmed -- with PowerVM partition firmware development that
>> the following RTAS functions:
>> 
>> - ibm,get-xive
>> - ibm,int-off
>> - ibm,int-on
>> - ibm,set-xive
>> 
>> were safe to call on multiple CPUs simultaneously, not only with
>> respect to themselves as indicated by PAPR, but with arbitrary other
>> RTAS calls:
>> 
>> https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/
>> 
>> Recent discussion with firmware development makes it clear that this
>> is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
>> Implement reentrant rtas call") is unsafe, likely explaining several
>> strange bugs we've seen in internal testing involving DLPAR and
>> LPM. These scenarios use ibm,configure-connector, whose internal state
>> can be corrupted by the concurrent use of the "reentrant" functions,
>> leading to symptoms like endless busy statuses from RTAS.
>
> Oh, does not it means PowerVM is not compliant to the PAPR specs?

No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
Implement reentrant rtas call") change is incorrect. The "reentrant"
property described in the spec applies only to the individual RTAS
functions. The OS can invoke (for example) ibm,set-xive on multiple CPUs
simultaneously, but it must adhere to the more general requirement to
serialize with other RTAS functions.

I don't claim that this is a useful property, at least not for
Linux. Maybe other OSes are better able to exploit it.

I apologize for my role in the confusion here. When reviewing the
original change, I talked to firmware development about the reentrant
property and how we wanted to use it. I've since reviewed that
conversation and concluded that I didn't communicate clearly, and that I
interpreted their responses too eagerly.

In the future, when we (pseries Linux developers at IBM) want to go
beyond the language of the spec, we probably should initiate an
architecture change to make the PAPR eventually align with our
implementation choices.

> I mentioned this in the original Commit, and it's still true to the last LoPAR:
>
> ###
> For "ibm,int-on", "ibm,int-off",ibm,get-xive" and  "ibm,set-xive".
>
> On LoPAPR Version 1.1 (March 24, 2016), from 7.3.10.1 to 7.3.10.4,
> items 2 and 3 say:
>
> 2 - For the PowerPC External Interrupt option: The * call must be
> reentrant to the number of processors on the platform.
> 3 - For the PowerPC External Interrupt option: The * argument call
> buffer for each simultaneous call must be physically unique.
>
> ### 
>
> Other than that, IIRC, this change was used to avoid issues that were happening
> on kdump/kexec: 
> If another thread was holding the rtas lock, kdump/kexec would get stuck waiting
> for the lock forever and never finish the process, often causing a bug
> reproduction's kdump to not get collected. 
>
> Is there any other fix for the above bug? 

Not that I'm aware of, but if this is a common failure mode for kdump,
then perhaps rtas_call() or related code should be made to ignore the
lock in the crash path, as a more general mitigation.

Then again, maybe it's not that urgent? Only XICS mode guests are
potentially affected. Do we even have this problem with
firmware-assisted dumps on PowerVM?

> Or is that a bug which is acceptable to have back, compared to the new
> one?

It was not acceptable to regress existing functions (DLPAR, LPM) in
exchange for making the crash path more robust. Reverting the change is
the correct tradeoff, unfortunately.


More information about the Linuxppc-dev mailing list