[PATCH v3] pseries/drmem: update LMBs after LPM
Laurent Dufour
ldufour at linux.ibm.com
Tue May 4 03:28:05 AEST 2021
Le 01/05/2021 à 01:58, Tyrel Datwyler a écrit :
> On 4/30/21 9:13 AM, Laurent Dufour wrote:
>> Le 29/04/2021 à 21:12, Tyrel Datwyler a écrit :
>>> On 4/29/21 3:27 AM, Aneesh Kumar K.V wrote:
>>>> Laurent Dufour <ldufour at linux.ibm.com> writes:
>>>>
>>>>> After a LPM, the device tree node ibm,dynamic-reconfiguration-memory may be
>>>>> updated by the hypervisor in the case the NUMA topology of the LPAR's
>>>>> memory is updated.
>>>>>
>>>>> This is caught by the kernel, but the memory's node is updated because
>>>>> there is no way to move a memory block between nodes.
>>>>>
>>>>> If later a memory block is added or removed, drmem_update_dt() is called
>>>>> and it is overwriting the DT node to match the added or removed LMB. But
>>>>> the LMB's associativity node has not been updated after the DT node update
>>>>> and thus the node is overwritten by the Linux's topology instead of the
>>>>> hypervisor one.
>>>>>
>>>>> Introduce a hook called when the ibm,dynamic-reconfiguration-memory node is
>>>>> updated to force an update of the LMB's associativity.
>>>>>
>>>>> Cc: Tyrel Datwyler <tyreld at linux.ibm.com>
>>>>> Signed-off-by: Laurent Dufour <ldufour at linux.ibm.com>
>>>>> ---
>>>>>
>>>>> V3:
>>>>> - Check rd->dn->name instead of rd->dn->full_name
>>>>> V2:
>>>>> - Take Tyrel's idea to rely on OF_RECONFIG_UPDATE_PROPERTY instead of
>>>>> introducing a new hook mechanism.
>>>>> ---
>>>>> arch/powerpc/include/asm/drmem.h | 1 +
>>>>> arch/powerpc/mm/drmem.c | 35 +++++++++++++++++++
>>>>> .../platforms/pseries/hotplug-memory.c | 4 +++
>>>>> 3 files changed, 40 insertions(+)
>>>>>
>>>>> diff --git a/arch/powerpc/include/asm/drmem.h
>>>>> b/arch/powerpc/include/asm/drmem.h
>>>>> index bf2402fed3e0..4265d5e95c2c 100644
>>>>> --- a/arch/powerpc/include/asm/drmem.h
>>>>> +++ b/arch/powerpc/include/asm/drmem.h
>>>>> @@ -111,6 +111,7 @@ int drmem_update_dt(void);
>>>>> int __init
>>>>> walk_drmem_lmbs_early(unsigned long node, void *data,
>>>>> int (*func)(struct drmem_lmb *, const __be32 **, void *));
>>>>> +void drmem_update_lmbs(struct property *prop);
>>>>> #endif
>>>>> static inline void invalidate_lmb_associativity_index(struct drmem_lmb
>>>>> *lmb)
>>>>> diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
>>>>> index 9af3832c9d8d..f0a6633132af 100644
>>>>> --- a/arch/powerpc/mm/drmem.c
>>>>> +++ b/arch/powerpc/mm/drmem.c
>>>>> @@ -307,6 +307,41 @@ int __init walk_drmem_lmbs_early(unsigned long node,
>>>>> void *data,
>>>>> return ret;
>>>>> }
>>>>> +/*
>>>>> + * Update the LMB associativity index.
>>>>> + */
>>>>> +static int update_lmb(struct drmem_lmb *updated_lmb,
>>>>> + __maybe_unused const __be32 **usm,
>>>>> + __maybe_unused void *data)
>>>>> +{
>>>>> + struct drmem_lmb *lmb;
>>>>> +
>>>>> + /*
>>>>> + * Brut force there may be better way to fetch the LMB
>>>>> + */
>>>>> + for_each_drmem_lmb(lmb) {
>>>>> + if (lmb->drc_index != updated_lmb->drc_index)
>>>>> + continue;
>>>>> +
>>>>> + lmb->aa_index = updated_lmb->aa_index;
>>>>> + break;
>>>>> + }
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * Update the LMB associativity index.
>>>>> + *
>>>>> + * This needs to be called when the hypervisor is updating the
>>>>> + * dynamic-reconfiguration-memory node property.
>>>>> + */
>>>>> +void drmem_update_lmbs(struct property *prop)
>>>>> +{
>>>>> + if (!strcmp(prop->name, "ibm,dynamic-memory"))
>>>>> + __walk_drmem_v1_lmbs(prop->value, NULL, NULL, update_lmb);
>>>>> + else if (!strcmp(prop->name, "ibm,dynamic-memory-v2"))
>>>>> + __walk_drmem_v2_lmbs(prop->value, NULL, NULL, update_lmb);
>>>>> +}
>>>>> #endif
>>>>> static int init_drmem_lmb_size(struct device_node *dn)
>>>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>>> b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>>> index 8377f1f7c78e..672ffbee2e78 100644
>>>>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>>> @@ -949,6 +949,10 @@ static int pseries_memory_notifier(struct
>>>>> notifier_block *nb,
>>>>> case OF_RECONFIG_DETACH_NODE:
>>>>> err = pseries_remove_mem_node(rd->dn);
>>>>> break;
>>>>> + case OF_RECONFIG_UPDATE_PROPERTY:
>>>>> + if (!strcmp(rd->dn->name,
>>>>> + "ibm,dynamic-reconfiguration-memory"))
>>>>> + drmem_update_lmbs(rd->prop);
>>>>> }
>>>>> return notifier_from_errno(err);
>>>>
>>>> How will this interact with DLPAR memory? When we dlpar memory,
>>>> ibm,configure-connector is used to fetch the new associativity details
>>>> and set drmem_lmb->aa_index correctly there. Once that is done kernel
>>>> then call drmem_update_dt() which will result in the above notifier
>>>> callback?
>>>>
>>>> IIUC, the call back then will update drmem_lmb->aa_index again?
>>>
>>> After digging through some of this code I'm a bit concerned about all the kernel
>>> device tree manipulation around memory DLPAR both with the assoc-lookup-array
>>> prop update and post dynamic-memory prop updating. We build a drmem_info array
>>> of the LMBs from the device-tree at boot. I don't really understand why we are
>>> manipulating the device tree property every time we add/remove an LMB. Not sure
>>> the reasoning was to write back in particular the aa_index and flags for each
>>> LMB into the device tree when we already have them in the drmem_info array. On
>>> the other hand the assoc-lookup-array I suppose would need to have an in kernel
>>> representation to avoid updating the device tree property every time.
>>
>> I think the reason is to keep the device tree in sync with the current set of LMBs.
>
> I don't really think that is how the device tree is meant to be used. We have an
> in memory representation of the LMBs separate from the device tree, and that is
> were we should track OS specific state. The values in the device-tree property
> can be updated via device node remove/add or update-properties RTAS call. These
> are the means that the platform reports (OS discovers) underlying changes. The
> new property is going to blow away any previous state that the OS wrote there.
> This is likely, one of the culprits of memory DLPAR problems that have been
> observed after LPM.
One of the issue is that the kernel is overwritting the drmem property set by
the hypervisor once a LMB is added or removed after a LPM (PRRN). My patch
prevents this to happen after a LPM, I plan to check and fix PRRN later.
>>
>> My understanding is that the kernel is not really using the
>> 'ibm,dynamic-memory*' DT property once the boot is done. But user space tools
>> (like lsslot and drmgr) read it to built the LMB tree and get the DRC index for
>> each LMBs as it is not available in SYSFS.
>
> Yeah, but as I mentioned above the property can change as a result of an update
> we process from the hypervisor in response to something like LPM (or PRRN if we
> ever figure out how to make that work correctly). So, if there is some sort of
> state drmgr needs to know we have to figure out a different way to expose that
> information.
Triggering drmgr at the end of the LPM (and after a PRRN) will be the next thing
to do, so userspace tools can be run to handle these changes.
>>
>>> Changes to the device tree should be things reported to the system from the
>>> hypervisor through the proper interfaces, and as a result any code that cares
>>> can register an of_reconfig_notifier to resepond to device tree updates. The
>>> memory dlpar code seems to be needlessly manipulating the device-tree which
>>> leads to the problem here where a notifier callback is now duplicating work.
>>
>> I don't think the hypervisor is expected to update the 'ibm,dynamic-memory' each
>> time a LMB is added, this is not design this way AFAIK.
>
> It shouldn't need to for the most part. The only information that should change
> in ibm,dynamic-memory in the first place is the aa_index as a result of an
> underlying platform reassignment. The flags in ibm,dynamic-memory aren't
> changing as a result of DLPAR add/remove. The aa_index could be out of date as I
> mentioned above. The use of DRCONF_MEM_ASSIGNED in my opinion is actually a hack
> to mark LMBs as present and owned by the partition. Its actual PAPR definition
> is soley to identify LMBs that are present at boot.
>
> As of today I don't have a problem with your patch. This was more of me pointing
> out things that I think are currently wrong with our memory hotplug
> implementation, and that we need to take a long hard look at it down the road.
I do agree, there is a lot of odd things there to address in this area.
If you're ok with that patch, do you mind to add a reviewed-by?
> -Tyrel
>
>>
>> Laurent.
>>
>>> Just my two cents FWIW.
>>>
>>> -Tyrel
>>>
>>>>
>>>> -aneesh
>>>>
>>>
>>
>
More information about the Linuxppc-dev
mailing list