[PATCH v7 2/4] powerpc/pseries: Revert 'Auto-online hotplugged memory'

Nathan Fontenot nfont at linux.vnet.ibm.com
Wed Feb 22 03:41:51 AEDT 2017


On 02/20/2017 07:02 PM, Michael Ellerman wrote:
> Nathan Fontenot <nfont at linux.vnet.ibm.com> writes:
> 
>> On 02/15/2017 10:34 PM, Michael Ellerman wrote:
>>> Nathan Fontenot <nfont at linux.vnet.ibm.com> writes:
>>>
>>>> Revert the patch patch to auto-online hotplugged memory, commit
>>>> id ec999072442a. Using the auto-online acpability does online added
>>>> memory but does not update the associated device struct to
>>>> indicate that the memory is online. The result of this is that
>>>> memoryXX/online file in sysfs still reports the memory as being offline.
>>>
>>> Isn't that just a bug in the auto-online code?
>>
>> After digging through the code some more and reading some of the email
>> chain when the auto-online feature was submitted I can't decide if this
>> is a bug or if this is by design. The fact that they only other users
>> of this appear to be balloon drivers (hv and xen) makes me think this
>> may be by design.
> 
> Have we asked the original authors? I don't see them on Cc?

Not yet. I'm working on a patch to use device_online() for doing
auto online of memory. I will send this out as an RFC with an explanation
of what I'm seeing and ask why it was done the way it was.

> 
>> Changing the auto-online capability to call device_offline() instead
>> would appear to also require changes to the hv and xen balloon
>> drivers for the new behavior.
> 
> OK, if that's the case then that's going to make life tricky.

Yep. I'll ask about this with the RFC I send out.

-Nathan

> 
>>> If I'm reading it right it's calling online_memory_block(). If that
>>> doesn't cause the memory_block to be online that would puzzle me.
>>
>> The memory is online and usuable when the dlpar operation completes. I
>> was mistaken in my original note though, the state file in sysfs does report
>> the memory as being online. The underlying issue is that the device struct
>> does not get updated (dev->offline) when using the auto-online capability.
>> The result is that trying to remove a LMB a second time fails when we call
>> device_offline() which checks the dev->offline flag and returns failure.
> 
> That still sounds like a bug to me. We asked the core to "auto online"
> the added memory, but the dev is still offline? But maybe there's some
> subtlety.
> 
>> I think reverting the patch to use the auto-online capability may be the
>> way to go. This would restore the code so that we call device_online and
>> device_offline for add and remove respectively, and not rely on what the 
>> auto-online code is doing.
>>
>> Thoughts?
> 
> It's not great, but given we need to backport it to v4.8, yeah I think
> we'll have to go with a revert.
> 
> But we should also pursue fixing the auto online logic.
> 
>>> Also commit ec999072442a went into v4.8, so is memory hotplug broken
>>> since then? If so we need to backport this or whatever fix we come up.
>>
>> Yes, we need to backport whatever fix we do.
> 
> Right. I'll queue it up.
> 
> cheers
> 



More information about the Linuxppc-dev mailing list