[PATCH v7 2/4] powerpc/pseries: Revert 'Auto-online hotplugged memory'

Michael Ellerman mpe at ellerman.id.au
Tue Feb 21 12:02:00 AEDT 2017

Nathan Fontenot <nfont at linux.vnet.ibm.com> writes:

> On 02/15/2017 10:34 PM, Michael Ellerman wrote:
>> Nathan Fontenot <nfont at linux.vnet.ibm.com> writes:
>>> Revert the patch to auto-online hotplugged memory, commit
>>> id ec999072442a. Using the auto-online capability does online added
>>> memory but does not update the associated device struct to
>>> indicate that the memory is online. The result of this is that the
>>> memoryXX/online file in sysfs still reports the memory as being offline.
>> Isn't that just a bug in the auto-online code?
> After digging through the code some more and reading some of the email
> chain when the auto-online feature was submitted I can't decide if this
> is a bug or if this is by design. The fact that the only other users
> of this appear to be balloon drivers (hv and xen) makes me think this
> may be by design.

Have we asked the original authors? I don't see them on Cc?

> Changing the auto-online capability to call device_offline() instead
> would appear to also require changes to the hv and xen balloon
> drivers for the new behavior.

OK, if that's the case then that's going to make life tricky.

>> If I'm reading it right it's calling online_memory_block(). If that
>> doesn't cause the memory_block to be online that would puzzle me.
> The memory is online and usable when the dlpar operation completes. I
> was mistaken in my original note though, the state file in sysfs does report
> the memory as being online. The underlying issue is that the device struct
> does not get updated (dev->offline) when using the auto-online capability.
> The result is that trying to remove a LMB a second time fails when we call
> device_offline() which checks the dev->offline flag and returns failure.
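
For illustration, the remove/add/remove sequence described above can be
modelled in a few lines of userspace C. This is only a toy sketch, not
kernel code: the struct and every toy_* name are hypothetical, and the
only behaviour assumed is what the thread states, namely that the
auto-online path onlines the memory without touching dev->offline, and
that device_offline() refuses a device whose dev->offline flag is
already set.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a memory block and its struct device. */
struct toy_memory_block {
	bool dev_offline;   /* models dev->offline */
	bool pages_online;  /* models whether the memory is really usable */
};

/* Models device_online(): onlines the memory and clears the flag. */
int toy_device_online(struct toy_memory_block *mem)
{
	if (!mem->dev_offline)
		return 1;               /* already online, nothing done */
	mem->pages_online = true;
	mem->dev_offline = false;
	return 0;
}

/* Models device_offline(): bails out if the flag says "already offline". */
int toy_device_offline(struct toy_memory_block *mem)
{
	if (mem->dev_offline)
		return 1;               /* nonzero, treated as failure by the caller */
	mem->pages_online = false;
	mem->dev_offline = true;
	return 0;
}

/* Models the auto-online path: memory goes online, flag is left stale. */
void toy_auto_online(struct toy_memory_block *mem)
{
	mem->pages_online = true;
}

/* Add an LMB, either via auto-online or via an explicit device_online(). */
void toy_lmb_add(struct toy_memory_block *mem, bool use_auto_online)
{
	if (use_auto_online)
		toy_auto_online(mem);
	else
		toy_device_online(mem);
}

/* Remove an LMB; any nonzero result from the offline step is a failure. */
int toy_lmb_remove(struct toy_memory_block *mem)
{
	return toy_device_offline(mem);
}
```

Starting from an online block, calling toy_lmb_remove(), then
toy_lmb_add() with use_auto_online set, then toy_lmb_remove() again makes
the second remove return nonzero even though pages_online is true. With
use_auto_online cleared (the explicit device_online() path that a revert
would restore), the second remove succeeds.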

That still sounds like a bug to me. We asked the core to "auto online"
the added memory, but the dev is still offline? But maybe there's some

> I think reverting the patch to use the auto-online capability may be the
> way to go. This would restore the code so that we call device_online and
> device_offline for add and remove respectively, and not rely on what the 
> auto-online code is doing.
> Thoughts?

It's not great, but given we need to backport it to v4.8, yeah I think
we'll have to go with a revert.

But we should also pursue fixing the auto online logic.

>> Also commit ec999072442a went into v4.8, so is memory hotplug broken
>> since then? If so we need to backport this or whatever fix we come up with.
> Yes, we need to backport whatever fix we do.

Right. I'll queue it up.


