[PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

David Hildenbrand david at redhat.com
Fri Aug 17 18:27:50 AEST 2018


On 17.08.2018 10:20, Rafael J. Wysocki wrote:
> On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand <david at redhat.com> wrote:
>>
>> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
>> fix concurrent memory hot-add deadlock"), which tried to fix a possible
>> lock inversion reported and discussed in [1] due to the two locks
>>         a) device_lock()
>>         b) mem_hotplug_lock
>>
>> While add_memory() first takes b), followed by a) during
>> bus_probe_device(), onlining of memory from user space first took b),
>> followed by a), exposing a possible deadlock.
>>
>> In [1], and it was decided to not make use of device_hotplug_lock, but
>> rather to enforce a locking order. Looking at 1., this order is not always
>> satisfied when calling device_online() - essentially we simply don't take
>> one of both locks anymore - and fixing this would require us to
>> take the mem_hotplug_lock in core driver code (online_store()), which
>> sounds wrong.
>>
>> The problems I spotted related to this:
>>
>> 1. Memory block device attributes: While .state first calls
>>    mem_hotplug_begin() and the calls device_online() - which takes
>>    device_lock() - .online does no longer call mem_hotplug_begin(), so
>>    effectively calls online_pages() without mem_hotplug_lock. onlining/
>>    offlining of pages is no longer serialised across different devices.
>>
>> 2. device_online() should be called under device_hotplug_lock, however
>>    onlining memory during add_memory() does not take care of that. (I
>>    didn't follow how strictly this is needed, but there seems to be a
>>    reason because it is documented at device_online() and
>>    device_offline()).
>>
>> In addition, I think there is also something wrong about the locking in
>>
>> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
>>    (and device_online()) without locks. This was introduced after
>>    30467e0b3be. And skimming over the code, I assume it could need some
>>    more care in regards to locking.
>>
>> ACPI code already holds the device_hotplug_lock, and as we are
>> effectively hotplugging memory block devices, requiring to hold that
>> lock does not sound too wrong, although not chosen in [1], as
>>         "I don't think resolving a locking dependency is appropriate by
>>          just serializing them with another lock."
>> I think this is the cleanest solution.
>>
>> Requiring add_memory()/add_memory_resource() to be called under
>> device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
>> online_pages/offline_pages() fixes 1. and 3.
>>
>> Fixup all callers of add_memory/add_memory_resource to hold the lock if
>> not already done.
>>
>> So this is essentially a revert of 30467e0b3be, implementation of what
>> was suggested in [1] by Vitaly, applied to the current tree.
>>
>> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
>>     2015-February/065324.html
>>
>> This patch is partly based on a patch by Vitaly Kuznetsov.
>>
>> Signed-off-by: David Hildenbrand <david at redhat.com>
>> ---
>>  arch/powerpc/platforms/powernv/memtrace.c |  3 ++
>>  drivers/acpi/acpi_memhotplug.c            |  1 +
>>  drivers/base/memory.c                     | 18 +++++-----
>>  drivers/hv/hv_balloon.c                   |  4 +++
>>  drivers/s390/char/sclp_cmd.c              |  3 ++
>>  drivers/xen/balloon.c                     |  3 ++
>>  mm/memory_hotplug.c                       | 42 ++++++++++++++++++-----
>>  7 files changed, 55 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
>> index 51dc398ae3f7..4c2737a33020 100644
>> --- a/arch/powerpc/platforms/powernv/memtrace.c
>> +++ b/arch/powerpc/platforms/powernv/memtrace.c
>> @@ -206,6 +206,8 @@ static int memtrace_online(void)
>>         int i, ret = 0;
>>         struct memtrace_entry *ent;
>>
>> +       /* add_memory() requires device_hotplug_lock */
>> +       lock_device_hotplug();
>>         for (i = memtrace_array_nr - 1; i >= 0; i--) {
>>                 ent = &memtrace_array[i];
>>
>> @@ -244,6 +246,7 @@ static int memtrace_online(void)
>>                 pr_info("Added trace memory back to node %d\n", ent->nid);
>>                 ent->size = ent->start = ent->nid = -1;
>>         }
>> +       unlock_device_hotplug();
>>         if (ret)
>>                 return ret;
>>
>> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
>> index 6b0d3ef7309c..e7a4c7900967 100644
>> --- a/drivers/acpi/acpi_memhotplug.c
>> +++ b/drivers/acpi/acpi_memhotplug.c
>> @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>>                 if (node < 0)
>>                         node = memory_add_physaddr_to_nid(info->start_addr);
>>
>> +               /* we already hold the device_hotplug lock at this point */
>>                 result = add_memory(node, info->start_addr, info->length);
>>
>>                 /*
> 
> A very minor nit here: I would say "device_hotplug_lock is already
> held at this point" in the comment (I sort of don't like to say "we"
> in code comments as it is not particularly clear what group of people
> is represented by that and the lock is actually called
> device_hotplug_lock).

Easy to fix, thanks!

> 
> Otherwise the approach is fine by me.
> 
> BTW, the reason why device_hotplug_lock is acquired by the ACPI memory
> hotplug is because it generally needs to be synchronized with respect
> CPU hot-remove and similar.  I believe that this may be the case in
> non-ACPI setups as well.

Yes, and that lock is the reason why we didn't have real problems with
ACPI memory hotplug in this respect so far. (as user triggered
online/offline also takes that lock already)

> 
> Thanks,
> Rafael
> 


-- 

Thanks,

David / dhildenb


More information about the Linuxppc-dev mailing list