[RFC PATCH v1] pseries/drmem: don't cache node id in drmem_lmb struct
Nathan Lynch
nathanl at linux.ibm.com
Tue Mar 31 04:07:29 AEDT 2020
Scott Cheloha <cheloha at linux.ibm.com> writes:
> Hi Michal,
>
> On Thu, Mar 12, 2020 at 06:02:37AM +0100, Michal Suchánek wrote:
>>
>> You basically revert the below which will likely cause the very error
>> that was fixed there:
>>
>> commit b2d3b5ee66f2a04a918cc043cec0c9ed3de58f40
>> Author: Nathan Fontenot <nfont at linux.vnet.ibm.com>
>> Date: Tue Oct 2 10:35:59 2018 -0500
>>
>> powerpc/pseries: Track LMB nid instead of using device tree
>>
>> When removing memory we need to remove the memory from the node
>> it was added to instead of looking up the node it should be in
>> in the device tree.
>>
>> During testing we have seen scenarios where the affinity for a
>> LMB changes due to a partition migration or PRRN event. In these
>> cases the node the LMB exists in may not match the node the device
>> tree indicates it belongs in. This can lead to a system crash
>> when trying to DLPAR remove the LMB after a migration or PRRN
>> event. The current code looks up the node in the device tree to
>> remove the LMB from, the crash occurs when we try to offline this
>> node and it does not have any data, i.e. node_data[nid] == NULL.
>
> I'm aware of this patch and that this is a near-total revert.
>
> I'm not reintroducing the original behavior, though. Instead of going
> to the device tree to recompute the expected node id I'm retrieving it
> from the LMB's corresponding memory_block.
>
> That crucial difference is this chunk:
>
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -376,25 +376,29 @@ static int dlpar_add_lmb(struct drmem_lmb *);
>
> static int dlpar_remove_lmb(struct drmem_lmb *lmb)
> {
> + struct memory_block *mem_block;
> unsigned long block_sz;
> int rc;
>
> if (!lmb_is_removable(lmb))
> return -EINVAL;
>
> + mem_block = lmb_to_memblock(lmb);
> + if (mem_block == NULL)
> + return -EINVAL;
> +
Assuming lmb_to_memblock() -> find_memory_block() isn't engaging in O(n)
behavior or worse (which is the case in linux-next), then I think
Scott's change makes sense and is a net win.
More information about the Linuxppc-dev
mailing list