[RFC PATCH v1] pseries/drmem: don't cache node id in drmem_lmb struct
Scott Cheloha
cheloha at linux.ibm.com
Fri Mar 13 03:07:04 AEDT 2020
Hi Michal,
On Thu, Mar 12, 2020 at 06:02:37AM +0100, Michal Suchánek wrote:
> On Wed, Mar 11, 2020 at 06:08:15PM -0500, Scott Cheloha wrote:
> > At memory hot-remove time we can retrieve an LMB's nid from its
> > corresponding memory_block. There is no need to store the nid
> > in multiple locations.
> >
> > Signed-off-by: Scott Cheloha <cheloha at linux.ibm.com>
> > ---
> > The linear search in powerpc's memory_add_physaddr_to_nid() has become a
> > bottleneck at boot on systems with many LMBs.
> >
> > As described in this patch here:
> >
> > https://lore.kernel.org/linuxppc-dev/20200221172901.1596249-2-cheloha@linux.ibm.com/
> >
> > the linear search seriously cripples drmem_init().
> >
> > The obvious solution (shown in that patch) is to just make the search
> > in memory_add_physaddr_to_nid() faster. An XArray seems well-suited
> > to the task of mapping an address range to an LMB object.
> >
> > The less obvious approach is to just call memory_add_physaddr_to_nid()
> > in fewer places.
> >
> > I'm not sure which approach is correct, hence the RFC.
>
> You basically revert the below which will likely cause the very error
> that was fixed there:
>
> commit b2d3b5ee66f2a04a918cc043cec0c9ed3de58f40
> Author: Nathan Fontenot <nfont at linux.vnet.ibm.com>
> Date: Tue Oct 2 10:35:59 2018 -0500
>
> powerpc/pseries: Track LMB nid instead of using device tree
>
> When removing memory we need to remove the memory from the node
> it was added to instead of looking up the node it should be in
> in the device tree.
>
> During testing we have seen scenarios where the affinity for a
> LMB changes due to a partition migration or PRRN event. In these
> cases the node the LMB exists in may not match the node the device
> tree indicates it belongs in. This can lead to a system crash
> when trying to DLPAR remove the LMB after a migration or PRRN
> event. The current code looks up the node in the device tree to
> remove the LMB from, the crash occurs when we try to offline this
> node and it does not have any data, i.e. node_data[nid] == NULL.
I'm aware of this patch and that this is a near-total revert.
I'm not reintroducing the original behavior, though. Instead of going
to the device tree to recompute the expected node id I'm retrieving it
from the LMB's corresponding memory_block.
That crucial difference is this chunk:
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -376,25 +376,29 @@ static int dlpar_add_lmb(struct drmem_lmb *);
static int dlpar_remove_lmb(struct drmem_lmb *lmb)
{
+ struct memory_block *mem_block;
unsigned long block_sz;
int rc;
if (!lmb_is_removable(lmb))
return -EINVAL;
+ mem_block = lmb_to_memblock(lmb);
+ if (mem_block == NULL)
+ return -EINVAL;
+
rc = dlpar_offline_lmb(lmb);
if (rc)
return rc;
block_sz = pseries_memory_block_size();
- __remove_memory(lmb->nid, lmb->base_addr, block_sz);
+ __remove_memory(mem_block->nid, lmb->base_addr, block_sz);
/* Update memory regions for memory remove */
memblock_remove(lmb->base_addr, block_sz);
invalidate_lmb_associativity_index(lmb);
- lmb_clear_nid(lmb);
lmb->flags &= ~DRCONF_MEM_ASSIGNED;
return 0;
---
Unless it's possible that the call:
__add_memory(my_nid, my_addr, my_size);
does not yield the following:
memory_block.nid == my_nid
on success, then I think the solution in my patch is equivalent to and
simpler than the one introduced in the patch you quote.
Can you see a way the above would not hold?
Then again, maybe there's a good reason not to retrieve the node id in
this way. I'm thinking David Hildenbrand and/or Nathan Fontenont may
have some insight on that.
More information about the Linuxppc-dev
mailing list