[RFC] Remove memory from nodes for memtrace.
RashmicaGupta
rashmicy at gmail.com
Thu Feb 23 15:29:17 AEDT 2017
On 23/02/17 14:56, Nicholas Piggin wrote:
> On Thu, 23 Feb 2017 08:39:10 +1100
> Rashmica Gupta <rashmica.g at gmail.com> wrote:
>
>> Some powerpc hardware features may want to gain access to a
>> chunk of undisturbed real memory. This update provides a means to unplug
>> said memory from the kernel with a set of sysfs calls. By writing an integer
>> containing the size of memory to be unplugged into
>> /sys/kernel/debug/powerpc/memtrace/enable, the code will remove that much
>> memory from the end of each available chip's memory space. In addition, the
>> means to read out the contents of the unplugged memory is also provided by
>> reading out the /sys/kernel/debug/powerpc/memtrace/<chip-id>/dump file.
>>
>> Signed-off-by: Rashmica Gupta <rashmica.g at gmail.com>
> Thanks for picking this up. A few suggestions.
>
>> ---
>> Written by Douglas Lehr <dllehr at us.ibm.com>.
> You could mention original author in the changelog. I thought Anton
> wrote some of it too?
Yup he did. Will do that.
>
>> arch/powerpc/mm/hash_native_64.c | 39 +++-
>> arch/powerpc/platforms/powernv/Makefile | 1 +
>> arch/powerpc/platforms/powernv/memtrace.c | 285 ++++++++++++++++++++++++++++++
>> 3 files changed, 321 insertions(+), 4 deletions(-)
>> create mode 100644 arch/powerpc/platforms/powernv/memtrace.c
>>
>> diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
>> index cc33260..44cc6ce 100644
>> --- a/arch/powerpc/mm/hash_native_64.c
>> +++ b/arch/powerpc/mm/hash_native_64.c
>> @@ -3,7 +3,7 @@
>> *
>> * SMP scalability work:
>> * Copyright (C) 2001 Anton Blanchard <anton at au.ibm.com>, IBM
>> - *
>> + *
>> * This program is free software; you can redistribute it and/or
>> * modify it under the terms of the GNU General Public License
>> * as published by the Free Software Foundation; either version
>> @@ -181,7 +181,7 @@ static inline void native_lock_hpte(struct hash_pte *hptep)
>> while (1) {
>> if (!test_and_set_bit_lock(HPTE_LOCK_BIT, word))
>> break;
>> - while(test_bit(HPTE_LOCK_BIT, word))
>> + while (test_bit(HPTE_LOCK_BIT, word))
>> cpu_relax();
>> }
>> }
>> @@ -208,10 +208,10 @@ static long native_hpte_insert(unsigned long hpte_group, unsigned long vpn,
>> }
>>
>> for (i = 0; i < HPTES_PER_GROUP; i++) {
>> - if (! (be64_to_cpu(hptep->v) & HPTE_V_VALID)) {
>> + if (!(be64_to_cpu(hptep->v) & HPTE_V_VALID)) {
>> /* retry with lock held */
>> native_lock_hpte(hptep);
>> - if (! (be64_to_cpu(hptep->v) & HPTE_V_VALID))
>> + if (!(be64_to_cpu(hptep->v) & HPTE_V_VALID))
>> break;
>> native_unlock_hpte(hptep);
> Suggest dropping the whitespace cleanups.
>
>
>> }
>> @@ -407,6 +407,36 @@ static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
>> tlbie(vpn, psize, psize, ssize, 0);
>> }
>>
>> +/*
>> + * Remove a bolted kernel entry. Memory hotplug uses this.
>> + *
>> + * No need to lock here because we should be the only user.
>> + */
>> +static int native_hpte_removebolted(unsigned long ea, int psize, int ssize)
>> +{
>> + unsigned long vpn;
>> + unsigned long vsid;
>> + long slot;
>> + struct hash_pte *hptep;
>> +
>> + vsid = get_kernel_vsid(ea, ssize);
>> + vpn = hpt_vpn(ea, vsid, ssize);
>> +
>> + slot = native_hpte_find(vpn, psize, ssize);
>> + if (slot == -1)
>> + return -ENOENT;
>> +
>> + hptep = htab_address + slot;
>> +
>> + /* Invalidate the hpte */
>> + hptep->v = 0;
>> +
>> + /* Invalidate the TLB */
>> + tlbie(vpn, psize, psize, ssize, 0);
>> + return 0;
>> +}
>> +
>> +
>> static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
>> int bpsize, int apsize, int ssize, int local)
>> {
>> @@ -722,6 +752,7 @@ void __init hpte_init_native(void)
>> mmu_hash_ops.hpte_invalidate = native_hpte_invalidate;
>> mmu_hash_ops.hpte_updatepp = native_hpte_updatepp;
>> mmu_hash_ops.hpte_updateboltedpp = native_hpte_updateboltedpp;
>> + mmu_hash_ops.hpte_removebolted = native_hpte_removebolted;
> I would submit this as a separate patch. Consider if it should be made
> dependent on any CONFIG options (e.g., MEMORY_HOTREMOVE), and Aneesh had
> been looking at this too so he could help review it.
>
>> mmu_hash_ops.hpte_insert = native_hpte_insert;
>> mmu_hash_ops.hpte_remove = native_hpte_remove;
>> mmu_hash_ops.hpte_clear_all = native_hpte_clear;
>> diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
>> index b5d98cb..2026661 100644
>> --- a/arch/powerpc/platforms/powernv/Makefile
>> +++ b/arch/powerpc/platforms/powernv/Makefile
>> @@ -11,4 +11,5 @@ obj-$(CONFIG_EEH) += eeh-powernv.o
>> obj-$(CONFIG_PPC_SCOM) += opal-xscom.o
>> obj-$(CONFIG_MEMORY_FAILURE) += opal-memory-errors.o
>> obj-$(CONFIG_TRACEPOINTS) += opal-tracepoints.o
>> +obj-$(CONFIG_MEMORY_HOTREMOVE) += memtrace.o
>> obj-$(CONFIG_OPAL_PRD) += opal-prd.o
> I would make a new Kconfig option for it, for the time being.
>
>
>> +static int check_memblock_online(struct memory_block *mem, void *arg)
>> +{
>> + if (mem->state != MEM_ONLINE)
>> + return -1;
>> +
>> + return 0;
>> +}
>> +
>> +static int change_memblock_state(struct memory_block *mem, void *arg)
>> +{
>> + unsigned long state = (unsigned long)arg;
>> +
>> + mem->state = state;
>> + return 0;
>> +}
>> +
>> +static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
>> +{
>> + u64 end_pfn = start_pfn + nr_pages - 1;
>> +
>> + if (walk_memory_range(start_pfn, end_pfn, NULL,
>> + check_memblock_online))
>> + return false;
>> +
>> + walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
>> + change_memblock_state);
>> +
>> + if (offline_pages(start_pfn, nr_pages)) {
>> + walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
>> + change_memblock_state);
>> + return false;
>> + }
>> +
>> + walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
>> + change_memblock_state);
>> +
>> + /* RCU grace period? */
>> + flush_memory_region((u64)__va(start_pfn << PAGE_SHIFT), nr_pages << PAGE_SHIFT);
>> +
>> + remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
>> +
>> + return true;
>> +}
> This is the tricky part. Memory hotplug APIs don't seem well suited for
> what we're trying to do... Anyway, do a bit of grepping around for
> definitions of some of these calls, and how other code uses them. For
> example, remove_memory comment says caller must hold
> lock_device_hotplug() first, so we're missing that at least. I think
> that's also needed over the memblock state changes.
Yup, realised I missed the lock_device_hotplug()
that right after I sent the email...
>
> We don't need an RCU grace period there AFAICT, because offline_pages
> should have us covered.
>
>
> I haven't looked at memory hotplug enough to know why we're open-coding
> the memblock stuff there. It would be nice to just be able to call
> memblock_remove() like the pseries hotplug code does.
>
> I *think* it is because hot remove mostly comes from when we know about
> an online region of memory and we want to take it down. In this case we
> also are trying to discover if those addresses are covered by online
> memory. Still, I wonder if there are better memblock APIs to do this
> with? Balbir may have a better idea of that?
>
Gotcha. Will look into that and other ways to do this.
>> +/* XXX FIXME DEBUG CRAP */
>> +machine_device_initcall(pseries, memtrace_init);
> Can remove that, I think.
>
> Thanks,
> Nick
More information about the Linuxppc-dev
mailing list