[PATCH v4] pseries/drmem: update LMBs after LPM
Laurent Dufour
ldufour at linux.ibm.com
Thu May 6 00:39:23 AEST 2021
On 05/05/2021 at 00:30, Nathan Lynch wrote:
> Hi Laurent,
Hi Nathan,
Thanks for your review.
> Bear with me while I work through the commit message:
>
> Laurent Dufour <ldufour at linux.ibm.com> writes:
>> After a LPM, the device tree node ibm,dynamic-reconfiguration-memory may be
>> updated by the hypervisor in the case the NUMA topology of the LPAR's
>> memory is updated.
>
> Yes, the RTAS functions ibm,update-nodes and ibm,update-properties,
> which the OS invokes after resuming, may bring in updated properties
> under the ibm,dynamic-reconfiguration-memory node, including the
> ibm,associativity-lookup-arrays property.
>
>> This is caught by the kernel,
>
> "Caught" makes me think this is an error condition, as in catching an
> exception. I guess "handled" better conveys your meaning?
ok
>
>> but the memory's node is updated because
>> there is no way to move a memory block between nodes.
>
> "The memory's node" refers to the ibm,dynamic-reconfiguration-memory DT
> node, yes? Or is it referring to Linux's NUMA nodes? ("move a memory
> block between nodes" in your statement here refers to Linux's NUMA
> nodes, that much is clear to me.)
>
> I am failing to follow the cause->effect relationship stated. True,
> changing a block's node assignment while it's in use isn't safe. I don't
> see why that implies that "the memory's node is updated"? In fact this
> seems contradictory.
>
> This statement makes more sense to me if I change it to "the memory's
> node is _not_ updated" -- is this what you intended?
Correct, I dropped the 'not' word here ;)
>
>> If later a memory block is added or removed, drmem_update_dt() is called
>> and it is overwriting the DT node to match the added or removed LMB.
>
> I understand this, but I will expand on it.
>
> dlpar_memory()
>   -> dlpar_memory_add_by_count()
>     -> dlpar_add_lmb()
>       -> update_lmb_associativity_index()
>         ... lmb->aa_index = <value>
>   -> drmem_update_dt()
>
> update_lmb_associativity_index() retrieves the firmware description of
> the new block, and sets the aa_index of the matching entry in the
> drmem_info array to the value matching the firmware description.
>
> Then, drmem_update_dt() walks the drmem_info array and synthesizes a new
> /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory-v2 property based
> on the recently updated information in that array.
Yes
>
>> But the LMB's associativity node has not been updated after the DT
>> node update and thus the node is overwritten by the Linux's topology
>> instead of the hypervisor one.
>
> So, an example of the problem is:
>
> 1. VM migrates. On resume, ibm,associativity-lookup-arrays is changed
>    via ibm,update-properties. Entries in the drmem_info array remain
>    unchanged, with aa_index values that correspond to the source
>    system's ibm,associativity-lookup-arrays property, now inaccessible.
>
> 2. A memory block is added. We look up the new block's entry in the
>    drmem_info array, and set the aa_index to the value matching the
>    current ibm,associativity-lookup-arrays.
>
> 3. Then, the ibm,associativity-lookup-arrays property is completely
>    regenerated from the drmem_info array, which reflects a mixture of
>    information from the source and destination systems.
>
> Do I understand correctly?
Yes
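
To make the three steps above concrete, here is a minimal user-space sketch of the sequence. It is not kernel code: struct lmb, struct dt_prop and every function name below are hypothetical stand-ins, and the DT property is reduced to a plain array of aa_index values. It only illustrates how rebuilding the property from a stale cache mixes source and destination information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_LMBS 3

/* Simplified stand-in for one drmem_info entry (illustrative only). */
struct lmb {
	uint32_t aa_index;
};

/* Hypothetical model of the per-LMB aa_index values held in the DT property. */
struct dt_prop {
	uint32_t aa_index[NR_LMBS];
};

/* Step 1, resume: firmware replaces the DT property; the cache is untouched. */
void fw_update_dt(struct dt_prop *dt, const uint32_t *fw, size_t n)
{
	assert(n <= NR_LMBS);
	for (size_t i = 0; i < n; i++)
		dt->aa_index[i] = fw[i];
}

/* Step 2, hot-add: only the new block's cache entry is set from firmware. */
void add_lmb(struct lmb *lmbs, size_t idx, uint32_t fw_aa_index)
{
	assert(idx < NR_LMBS);
	lmbs[idx].aa_index = fw_aa_index;
}

/* Step 3, analogue of drmem_update_dt(): rebuild the property from the cache. */
void regen_dt_from_cache(struct dt_prop *dt, const struct lmb *lmbs, size_t n)
{
	assert(n <= NR_LMBS);
	for (size_t i = 0; i < n; i++)
		dt->aa_index[i] = lmbs[i].aa_index;
}

/* Count property entries that disagree with what firmware reported. */
size_t count_mismatches(const struct dt_prop *dt, const uint32_t *fw, size_t n)
{
	size_t bad = 0;

	for (size_t i = 0; i < n; i++)
		if (dt->aa_index[i] != fw[i])
			bad++;
	return bad;
}
```

In this model, if the cache was filled on the source system and the firmware values differ on the destination, the property is consistent right after fw_update_dt(), but after add_lmb() followed by regen_dt_from_cache() only the newly added block matches the firmware view.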
>
>
>> Introduce a hook called when the ibm,dynamic-reconfiguration-memory node is
>> updated to force an update of the LMB's associativity. However, ignore the
>> call to that hook when the update has been triggered by drmem_update_dt().
>> Because, in that case, the LMB tree has been used to set the DT property
>> and thus it doesn't need to be updated back. Since drmem_update_dt() is
>> called under the protection of the device_hotplug_lock and the hook is
>> called in the same context, use a simple boolean variable to detect that
>> call.
>
> This strikes me as almost a revert of e978a3ccaa71 ("powerpc/pseries:
> remove obsolete memory hotplug DT notifier code").
It is not really the same as reverting e978a3ccaa71: here only the aa_index of
the LMB is updated, and everything else is kept in place. I don't try to apply
the memory layout changes; I only update the aa_index field of the in-use LMBs.
The only thing in common with the code removed by the commit you mentioned is
the use of a global variable, in_drmem_update, in place of the previous
rtas_hp_event, to prevent the LMB tree from being updated again during a memory
hotplug event.
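
As a rough illustration of that guard, here is a minimal user-space sketch. It is not the patch itself: only the in_drmem_update name comes from the discussion, while the arrays, the hook name drmem_prop_updated() and the two update functions are hypothetical. It also assumes, as the commit message states, that all callers are serialized (by device_hotplug_lock in the kernel), so a plain boolean is enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_LMBS 2

/* Simplified stand-in for one drmem_info entry (illustrative only). */
struct lmb {
	uint32_t aa_index;
};

static struct lmb lmbs[NR_LMBS];     /* stand-in for the drmem_info array */
static uint32_t dt_aa[NR_LMBS];      /* stand-in for the DT property */
static bool in_drmem_update;         /* true only while the cache rewrites the DT */
static unsigned int cache_refreshes; /* how often the hook refreshed the cache */

/* Hook run on every update of the DT property (hypothetical name). */
void drmem_prop_updated(void)
{
	if (in_drmem_update)
		return; /* the cache was the source of this update, nothing to do */

	for (size_t i = 0; i < NR_LMBS; i++)
		lmbs[i].aa_index = dt_aa[i];
	cache_refreshes++;
}

/* Analogue of drmem_update_dt(): the cache overwrites the property. */
void update_dt_from_cache(void)
{
	in_drmem_update = true;
	for (size_t i = 0; i < NR_LMBS; i++)
		dt_aa[i] = lmbs[i].aa_index;
	drmem_prop_updated(); /* fires as for any update, but is ignored */
	in_drmem_update = false;
}

/* Firmware-driven update, as after LPM: the property changes first. */
void update_dt_from_firmware(const uint32_t *fw)
{
	for (size_t i = 0; i < NR_LMBS; i++)
		dt_aa[i] = fw[i];
	drmem_prop_updated(); /* refreshes each cached aa_index */
}
```

The flag is the kind of global state Nathan objects to below; passing the origin of the update as a parameter to the hook would be the alternative, at the cost of changing the notifier interface.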
> I'd rather avoid smuggling through global state information that ought
> to be passed in function parameters, if it should be passed around at
> all. Despite having (IMO) relatively simple responsibilities, this code
> is difficult to change and review; adding this property makes it
> worse. If the structure of the code is pushing us toward this kind of
> compromise, then the code probably needs more fundamental changes.
>
> I'm probably forgetting something -- can anyone remind me why we need an
> array of these:
>
> struct drmem_lmb {
> 	u64     base_addr;
> 	u32     drc_index;
> 	u32     aa_index;
> 	u32     flags;
> };
>
> which is just a less efficient representation of what's already in the
> device tree? If we got rid of it, would this problem disappear?
I don't think this is the right time for that; first, we should make the
DLPAR and LPM operations more robust.