[PATCH] powerpc/pseries/mobility: ignore ibm, platform-facilities updates

Laurent Dufour ldufour at linux.ibm.com
Tue Oct 19 20:05:01 AEDT 2021


Le 19/10/2021 à 00:37, Tyrel Datwyler a écrit :
> On 10/18/21 9:34 AM, Nathan Lynch wrote:
>> On VMs with NX encryption, compression, and/or RNG offload, these
>> capabilities are described by nodes in the ibm,platform-facilities device
>> tree hierarchy:
>>
>>    $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
>>    /sys/firmware/devicetree/base/ibm,platform-facilities/
>>    ├── ibm,compression-v1
>>    ├── ibm,random-v1
>>    └── ibm,sym-encryption-v1
>>
>>    3 directories
>>
>> The acceleration functions that these nodes describe are not disrupted by
>> live migration, not even temporarily.
>>
>> But the post-migration ibm,update-nodes sequence firmware always sends
>> "delete" messages for this hierarchy, followed by an "add" directive to
>> reconstruct it via ibm,configure-connector (log with debugging statements
>> enabled in mobility.c):
>>
>>    mobility: removing node /ibm,platform-facilities/ibm,random-v1:4294967285
>>    mobility: removing node /ibm,platform-facilities/ibm,compression-v1:4294967284
>>    mobility: removing node /ibm,platform-facilities/ibm,sym-encryption-v1:4294967283
>>    mobility: removing node /ibm,platform-facilities:4294967286
>>    ...
>>    mobility: added node /ibm,platform-facilities:4294967286
> 
> It always bothered me that the update from firmware here was an delete/add at
> the entire '/ibm,platform-facilities' node level instead of update properties
> calls. When I asked about it years ago the claim was it was just easier to pass
> an entire sub-tree as a single node add.
> 
>>
>> Note we receive a single "add" message for the entire hierarchy, and what
>> we receive from the ibm,configure-connector sequence is the top-level
>> platform-facilities node along with its three children. The debug message
>> simply reports the parent node and not the whole subtree.
>>
>> Also, significantly, the nodes added are almost completely equivalent to
>> the ones removed; even phandles are unchanged. ibm,shared-interrupt-pool in
>> the leaf nodes is the only property I've observed to differ, and Linux does
>> not use that. So in practice, the sum of update messages Linux receives for
>> this hierarchy is equivalent to minor property updates.
> 
> The two props I would think maybe we would need to be most be concerned about
> ensuring don't change are "ibm,resource-id" which gets used in the vio bus code
> when configuring platform facilities nodes, and 'ibm,max-sync-cop' used by the
> pseries-nx driver.
> 
>>
>> We succeed in removing the original hierarchy from the device tree. But the
>> drivers bound to the leaf nodes are ignorant of this, and do not relinquish
>> their references. The leaf nodes, still reachable through sysfs, of course
>> still refer to the now-freed ibm,platform-facilities parent node, which
>> makes use-after-free possible:
> 
> It is actually more subtle then the drivers themselves being ignorant. They
> could register node update notifiers, but the real problem here is that the vio
> bus device itself takes a reference to each child node registered to the bus.
> I'm not sure we really want to unbind/rebind drivers as a result of LPM, but it
> would be generic to the vio bus instead of updating each driver to ensure its
> handling it device node references properly.
> 
>>
>>    refcount_t: addition on 0; use-after-free.
>>    WARNING: CPU: 3 PID: 1706 at lib/refcount.c:25 refcount_warn_saturate+0x164/0x1f0
>>    refcount_warn_saturate+0x160/0x1f0 (unreliable)
>>    kobject_get+0xf0/0x100
>>    of_node_get+0x30/0x50
>>    of_get_parent+0x50/0xb0
>>    of_fwnode_get_parent+0x54/0x90
>>    fwnode_count_parents+0x50/0x150
>>    fwnode_full_name_string+0x30/0x110
>>    device_node_string+0x49c/0x790
>>    vsnprintf+0x1c0/0x4c0
>>    sprintf+0x44/0x60
>>    devspec_show+0x34/0x50
>>    dev_attr_show+0x40/0xa0
>>    sysfs_kf_seq_show+0xbc/0x200
>>    kernfs_seq_show+0x44/0x60
>>    seq_read_iter+0x2a4/0x740
>>    kernfs_fop_read_iter+0x254/0x2e0
>>    new_sync_read+0x120/0x190
>>    vfs_read+0x1d0/0x240
>>
>> Moreover, the "new" replacement subtree is not correctly added to the
>> device tree, resulting in ibm,platform-facilities parent node without the
>> appropriate leaf nodes, and broken symlinks in the sysfs device hierarchy:
>>
>>    $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
>>    /sys/firmware/devicetree/base/ibm,platform-facilities/
>>
>>    0 directories
>>
>>    $ cd /sys/devices/vio ; find . -xtype l -exec file {} +
>>    ./ibm,sym-encryption-v1/of_node: broken symbolic link to
>>      ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,sym-encryption-v1
>>    ./ibm,random-v1/of_node:         broken symbolic link to
>>      ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,random-v1
>>    ./ibm,compression-v1/of_node:    broken symbolic link to
>>      ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,compression-v1
>>
>> This is because add_dt_node() -> dlpar_attach_node() attaches only the
>> parent node returned from configure-connector, ignoring any children. This
>> should be corrected for the general case, but fixing that won't help with
>> the stale OF node references, which is the more urgent problem.
> 
> I don't follow. If the code path you mention is truly broken in the way you say
> then DLPAR operations involving nodes with child nodes should also be broken.
> Last I had seen was that sysfs adds of the new nodes got renamed because the old
> nodes still existed as a result of there reference count not going to zero. Has
> this behavior changed, or am I misremembering the observed behavior?
> 
>>
>> One way to address that would be to make the drivers respond to node
>> removal notifications, so that node references can be dropped
>> appropriately. But this would likely force the drivers to disrupt active
>> clients for no useful purpose: equivalent nodes are immediately re-added.
>> And recall that the acceleration capabilities described by the nodes remain
>> available throughout the whole process.
> 
> See my comments above about its the vio bus more at fault here then the drivers
> themselves. I'm inclined to agree though that disrupting active operations with
> a driver unbind/rebind is a little extreme.
> 
> This also brings me back to firmware removing and re-adding the whole
> '/ibm,platform-facilities' node instead of simply updating changed properties
> could avoid this whole fiasco.
> 
>>
>> The solution I believe to be robust for this situation is to convert
>> remove+add of a node with an unchanged phandle to an update of the node's
>> properties in the Linux device tree structure. That would involve changing
>> and adding a fair amount of code, and may take several iterations to land.
>>
>> Until that can be realized we have a confirmed use-after-free and the
>> possibility of memory corruption. So add a limited workaround that
>> discriminates on the node type, ignoring adds and removes. This should be
>> amenable to backporting in the meantime.
> 
> The reality is that '/ibm,platform-facilities' and 'cache' nodes are the only
> LPM scoped device tree nodes that allow node delete/add. So, as a one-off
> workaround to deal with what I consider a bad firmware approach I think this is
> probably the best approach barring getting firmware to move to an update
> properties approach.

I do agree, this is probably the best option until the firmware is moving to an 
update notification.

> 
> An audit of the drivers is probably still a valid exercise to ensure any device
> tree props they care about they pick up a new value should it change.
> 
> -Tyrel
> 
>>
>> Signed-off-by: Nathan Lynch <nathanl at linux.ibm.com>
>> Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
>> Cc: stable at vger.kernel.org
>> ---
>>   arch/powerpc/platforms/pseries/mobility.c | 34 +++++++++++++++++++++++
>>   1 file changed, 34 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index e83e0891272d..210a37a065fb 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -63,6 +63,27 @@ static int mobility_rtas_call(int token, char *buf, s32 scope)
>>   
>>   static int delete_dt_node(struct device_node *dn)
>>   {
>> +	struct device_node *pdn;
>> +	bool is_platfac;
>> +
>> +	pdn = of_get_parent(dn);
>> +	is_platfac = of_node_is_type(dn, "ibm,platform-facilities") ||
>> +		     of_node_is_type(pdn, "ibm,platform-facilities");
>> +	of_node_put(pdn);
>> +
>> +	/*
>> +	 * The drivers that bind to nodes in the platform-facilities
>> +	 * hierarchy don't support node removal, and the removal directive
>> +	 * from firmware is always followed by an add of an equivalent
>> +	 * node. The capability (e.g. RNG, encryption, compression)
>> +	 * represented by the node is never interrupted by the migration.
>> +	 * So ignore changes to this part of the tree.
>> +	 */
>> +	if (is_platfac) {
>> +		pr_notice("ignoring remove operation for %pOFfp\n", dn);
>> +		return 0;
>> +	}
>> +
>>   	pr_debug("removing node %pOFfp\n", dn);
>>   	dlpar_detach_node(dn);
>>   	return 0;
>> @@ -222,6 +243,19 @@ static int add_dt_node(struct device_node *parent_dn, __be32 drc_index)
>>   	if (!dn)
>>   		return -ENOENT;
>>   
>> +	/*
>> +	 * Since delete_dt_node() ignores this node type, this is the
>> +	 * necessary counterpart. We also know that a platform-facilities
>> +	 * node returned from dlpar_configure_connector() has children
>> +	 * attached, and dlpar_attach_node() only adds the parent, leaking
>> +	 * the children. So ignore these on the add side for now.
>> +	 */
>> +	if (of_node_is_type(dn, "ibm,platform-facilities")) {
>> +		pr_notice("ignoring add operation for %pOF\n", dn);
>> +		dlpar_free_cc_nodes(dn);
>> +		return 0;
>> +	}
>> +
>>   	rc = dlpar_attach_node(dn, parent_dn);
>>   	if (rc)
>>   		dlpar_free_cc_nodes(dn);
>>
> 



More information about the Linuxppc-dev mailing list