[PATCH] powerpc/mobility: Fix node detach/rename problem
Michael Ellerman
mpe at ellerman.id.au
Tue Jul 31 16:42:28 AEST 2018
Tyrel Datwyler <tyreld at linux.vnet.ibm.com> writes:
> On 07/29/2018 06:11 AM, Michael Bringmann wrote:
>> During LPAR migration, the content of the device tree/sysfs may
>> be updated including deletion and replacement of nodes in the
>> tree. When nodes are added to the internal node structures, they
>> are appended in FIFO order to a list of nodes maintained by the
>> OF code APIs. When nodes are removed from the device tree, they
>> are marked OF_DETACHED, but not actually deleted from the system
>> to allow for pointers cached elsewhere in the kernel. The order
>> and content of the entries in the list of nodes is not altered,
>> though.
>>
>> During LPAR migration some common nodes are deleted and re-added
>> e.g. "ibm,platform-facilities". If a node is re-added to the OF
>> node lists, the of_attach_node function checks to make sure that
>> the name + ibm,phandle of the to-be-added data is unique. As the
>> previous copy of a re-added node is not modified beyond the addition
>> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING
>> notice to the console, (3) renames the to-be-added node to avoid
>> filename collisions within a directory, and (3) adds entries to
>> the sysfs/kernfs.
>
> So, this patch actually just band aids over the real problem. This is
> a long standing problem with several PFO drivers leaking references.
> The issue here is that, during the device tree update that follows a
> migration. the update of the ibm,platform-facilities node and friends
> below are always deleted and re-added on the destination lpar and
> subsequently the leaked references prevent the devices nodes from
> every actually being properly cleaned up after detach. Thus, leading
> to the issue you are observing.
Leaking references shouldn't affect the node being detached from the
tree though.
See of_detach_node() calling __of_detach_node(), none of that depends on
the refcount.
It's only the actual freeing of the node, in of_node_release() that is
prevented by leaked reference counts.
So I agree we need to do a better job with the reference counting, but I
don't see how it is causing the problem here.
cheers
More information about the Linuxppc-dev
mailing list