[RFC PATCH v1] pseries/drmem: don't cache node id in drmem_lmb struct

Scott Cheloha cheloha at linux.ibm.com
Fri Mar 13 03:07:04 AEDT 2020


Hi Michal,

On Thu, Mar 12, 2020 at 06:02:37AM +0100, Michal Suchánek wrote:
> On Wed, Mar 11, 2020 at 06:08:15PM -0500, Scott Cheloha wrote:
> > At memory hot-remove time we can retrieve an LMB's nid from its
> > corresponding memory_block.  There is no need to store the nid
> > in multiple locations.
> > 
> > Signed-off-by: Scott Cheloha <cheloha at linux.ibm.com>
> > ---
> > The linear search in powerpc's memory_add_physaddr_to_nid() has become a
> > bottleneck at boot on systems with many LMBs.
> > 
> > As described in this patch here:
> > 
> > https://lore.kernel.org/linuxppc-dev/20200221172901.1596249-2-cheloha@linux.ibm.com/
> > 
> > the linear search seriously cripples drmem_init().
> > 
> > The obvious solution (shown in that patch) is to just make the search
> > in memory_add_physaddr_to_nid() faster.  An XArray seems well-suited
> > to the task of mapping an address range to an LMB object.
> > 
> > The less obvious approach is to just call memory_add_physaddr_to_nid()
> > in fewer places.
> > 
> > I'm not sure which approach is correct, hence the RFC.
> 
> You basically revert the below which will likely cause the very error
> that was fixed there:
> 
> commit b2d3b5ee66f2a04a918cc043cec0c9ed3de58f40
> Author: Nathan Fontenot <nfont at linux.vnet.ibm.com>
> Date:   Tue Oct 2 10:35:59 2018 -0500
> 
>     powerpc/pseries: Track LMB nid instead of using device tree
>     
>     When removing memory we need to remove the memory from the node
>     it was added to instead of looking up the node it should be in
>     in the device tree.
>     
>     During testing we have seen scenarios where the affinity for a
>     LMB changes due to a partition migration or PRRN event. In these
>     cases the node the LMB exists in may not match the node the device
>     tree indicates it belongs in. This can lead to a system crash
>     when trying to DLPAR remove the LMB after a migration or PRRN
>     event. The current code looks up the node in the device tree to
>     remove the LMB from, the crash occurs when we try to offline this
>     node and it does not have any data, i.e. node_data[nid] == NULL.

I'm aware of this patch and that this is a near-total revert.

I'm not reintroducing the original behavior, though.  Instead of going
to the device tree to recompute the expected node id I'm retrieving it
from the LMB's corresponding memory_block.

That crucial difference is this chunk:

--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -376,25 +376,29 @@ static int dlpar_add_lmb(struct drmem_lmb *);
 
 static int dlpar_remove_lmb(struct drmem_lmb *lmb)
 {
+	struct memory_block *mem_block;
 	unsigned long block_sz;
 	int rc;
 
 	if (!lmb_is_removable(lmb))
 		return -EINVAL;
 
+	mem_block = lmb_to_memblock(lmb);
+	if (mem_block == NULL)
+		return -EINVAL;
+
 	rc = dlpar_offline_lmb(lmb);
 	if (rc)
 		return rc;
 
 	block_sz = pseries_memory_block_size();
 
-	__remove_memory(lmb->nid, lmb->base_addr, block_sz);
+	__remove_memory(mem_block->nid, lmb->base_addr, block_sz);
 
 	/* Update memory regions for memory remove */
 	memblock_remove(lmb->base_addr, block_sz);
 
 	invalidate_lmb_associativity_index(lmb);
-	lmb_clear_nid(lmb);
 	lmb->flags &= ~DRCONF_MEM_ASSIGNED;
 
 	return 0;
---

Unless it's possible that the call:

	__add_memory(my_nid, my_addr, my_size);

does not yield the following:

	memory_block.nid == my_nid

on success, then I think the solution in my patch is equivalent to and
simpler than the one introduced in the patch you quote.

Can you see a way the above would not hold?

Then again, maybe there's a good reason not to retrieve the node id in
this way.  I'm thinking David Hildenbrand and/or Nathan Fontenont may
have some insight on that.


More information about the Linuxppc-dev mailing list