[PATCH 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on memory hotplug
David Gibson
david at gibson.dropbear.id.au
Mon Apr 19 15:34:25 AEST 2021
On Thu, Apr 08, 2021 at 11:51:36PM -0300, Leonardo Bras wrote:
> Hello David, thanks for the feedback!
>
> On Mon, 2021-03-22 at 18:55 +1100, David Gibson wrote:
> > > +void hash_memory_batch_expand_prepare(unsigned long newsize)
> > > +{
> > > + /*
> > > +	 * Resizing the HPT up should never fail, but some systems start with a
> > > +	 * higher SHIFT than required, so we can hit the odd case of resizing
> > > +	 * the HPT down while adding memory.
> > > +	 */
> > > +
> > > + while (resize_hpt_for_hotplug(newsize, false) == -ENOSPC) {
> > > + newsize *= 2;
> > > + pr_warn("Hash collision while resizing HPT\n");
> >
> > This unbounded increase in newsize makes me nervous - we should be
> > bounded by the current size of the HPT at least. In practice we
> > should be fine, since the resize should always succeed by the time we
> > reach our current HPT size, but that's far from obvious from this
> > point in the code.
>
> Sure, I will add bounds in v2.
>
> >
> > And... you're doubling newsize which is a value which might not be a
> > power of 2. I'm wondering if there's an edge case where this could
> > actually cause us to skip the current size and erroneously resize to
> > one bigger than we have currently.
>
> I also thought that at first, but it seems quite reliable.
> Before using this value, htab_shift_for_mem_size() will always round it
> up to the next power of 2.
> Ex.
> Any value between 0b0101 and 0b1000 will be rounded to 0b1000 for shift
> calculation. If we multiply it by 2 (same as << 1), we have that
> anything between 0b01010 and 0b10000 will be rounded to 0b10000.
Ah, good point.
> This works just fine as long as we are multiplying.
> Division could show the behavior you describe, as 0b0101 >> 1 would
> become 0b010 and skip a shift.
>
> > > +void memory_batch_expand_prepare(unsigned long newsize)
> >
> > This wrapper doesn't seem useful.
>
> Yeah, it does little, but I can't just jump into the hash_* functions
> directly from hotplug-memory.c without even knowing whether hash
> pagetables are in use. (In case the suggestion was to test for
> disable_radix inside hash_memory_batch*.)
>
> >
> > > +{
> > > + if (!radix_enabled())
> > > + hash_memory_batch_expand_prepare(newsize);
> > > +}
> > > #endif /* CONFIG_MEMORY_HOTPLUG */
> > >
> > >
> > > + memory_batch_expand_prepare(memblock_phys_mem_size() +
> > > + drmem_info->n_lmbs * drmem_lmb_size());
> >
> > This doesn't look right. memory_add_by_index() is adding a *single*
> > LMB, I think using drmem_info->n_lmbs here means you're counting this
> > as adding again as much memory as you already have hotplugged.
>
> Yeah, my mistake. This makes sense.
> I will change it to something like
> memblock_phys_mem_size() + drmem_lmb_size()
>
> > >
> > > + memory_batch_expand_prepare(memblock_phys_mem_size() + lmbs_to_add * drmem_lmb_size());
> > > +
> > > for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
> > > if (lmb->flags & DRCONF_MEM_ASSIGNED)
> > > continue;
> >
> > I don't see memory_batch_expand_prepare() suppressing any existing HPT
> > resizes. Won't this just resize to the right size for the full add,
> > then resize several times again as we perform the add? Or.. I guess
> > that will be suppressed by patch 1/3.
>
> Correct.
>
> > That's seems kinda fragile, though.
>
> What do you mean by fragile here?
> What would you suggest doing different?
>
> Best regards,
> Leonardo Bras
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson