[FIX PATCH v0] powerpc: Fix memory unplug failure on radix guest
Bharata B Rao
bharata at linux.vnet.ibm.com
Tue Sep 5 14:20:32 AEST 2017
On Fri, Sep 01, 2017 at 09:11:18AM -0500, Nathan Fontenot wrote:
> On 09/01/2017 01:53 AM, Bharata B Rao wrote:
> > On Thu, Aug 10, 2017 at 02:53:48PM +0530, Bharata B Rao wrote:
> >> For a PowerKVM guest, it is possible to specify a DIMM device in
> >> addition to the system RAM at boot time. When such a cold plugged DIMM
> >> device is removed from a radix guest, we hit the following warning in the
> >> guest kernel resulting in the eventual failure of memory unplug:
> >>
> >> remove_pud_table: unaligned range
> >> WARNING: CPU: 3 PID: 164 at arch/powerpc/mm/pgtable-radix.c:597 remove_pagetable+0x468/0xca0
> >> Call Trace:
> >> remove_pagetable+0x464/0xca0 (unreliable)
> >> radix__remove_section_mapping+0x24/0x40
> >> remove_section_mapping+0x28/0x60
> >> arch_remove_memory+0xcc/0x120
> >> remove_memory+0x1ac/0x270
> >> dlpar_remove_lmb+0x1ac/0x210
> >> dlpar_memory+0xbc4/0xeb0
> >> pseries_hp_work_fn+0x1a4/0x230
> >> process_one_work+0x1cc/0x660
> >> worker_thread+0xac/0x6d0
> >> kthread+0x16c/0x1b0
> >> ret_from_kernel_thread+0x5c/0x74
> >>
> >> The DIMM memory that is cold plugged gets merged to the same memblock
> >> region as RAM and hence gets mapped at 1G alignment. However since the
> >> removal is done for one LMB (lmb size 256MB) at a time, the address
> >> of the LMB (which is 256MB aligned) would get flagged as unaligned
> >> in remove_pud_table() resulting in the above failure.
> >>
> >> This problem is not seen for hot plugged memory because for the
> >> hot plugged memory, the mappings are created separately for each
> >> LMB and hence they all get aligned at 256MB.
> >>
> >> To fix this problem for the cold plugged memory, let us mark the
> >> cold plugged memblock region explicitly as HOTPLUGGED so that the
> >> region doesn't get merged with RAM. All the memory that is discovered
> > >> via ibm,dynamic-reconfiguration-memory is marked so (1). Next identify
> >> such regions in radix_init_pgtable() and create separate mappings
> > >> within that region for each LMB so that they don't get aligned
> >> like RAM region at 1G (2).
> >>
> >> (1) For PowerKVM guests, all boot time memory is represented via
> > >> memory@XXXX nodes and hot plugged/pluggable memory is represented via
> > >> the ibm,dynamic-reconfiguration-memory node. We are marking all
> >> hotplugged memory that is in ASSIGNED state during boot as HOTPLUGGED.
> >> With this only cold plugged memory gets marked for PowerKVM but
> >> need to check how this will affect PowerVM guests.
> >>
> >> (2) To create separate mappings for every LMB in the hot plugged
> >> region, we need lmb-size. I am currently using memory_block_size_bytes()
> >> API to get the lmb-size. Since this is early init time code, the
> >> machine type isn't probed yet and hence memory_block_size_bytes()
> >> would return the default LMB size as 16MB. Hence we end up creating
> >> separate mappings at much lower granularity than what we can ideally
> >> do for pseries machine.
> >>
> >> Signed-off-by: Bharata B Rao <bharata at linux.vnet.ibm.com>
> >> ---
> >> arch/powerpc/kernel/prom.c | 1 +
> >> arch/powerpc/mm/pgtable-radix.c | 17 ++++++++++++++---
> >> 2 files changed, 15 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> >> index f830562..24ecf53 100644
> >> --- a/arch/powerpc/kernel/prom.c
> >> +++ b/arch/powerpc/kernel/prom.c
> >> @@ -524,6 +524,7 @@ static int __init early_init_dt_scan_drconf_memory(unsigned long node)
> >> size = 0x80000000ul - base;
> >> }
> >> memblock_add(base, size);
> >> + memblock_mark_hotplug(base, size);
> >
> > One of the suggestions was to make the above conditional to radix so
> > that PowerVM doesn't get affected by this. However early_radix_enabled()
> > check isn't usable yet at this point and MMU_FTR_TYPE_RADIX will get set
> > only a bit later in early_init_devtree().
>
> We do walk the dynamic reconfiguration memory again in the numa code, see
> parse_drconf_memory() in numa.c, would it be far enough along in boot to use
> early_radix_enabled() and mark the memory hotplug at this point?
parse_drconf_memory() in numa.c runs after the radix page tables are set up.
Hence setting the hotplugged state from there will not help.
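
For reference, the alignment problem from the patch description quoted
above can be sketched in plain user-space C. This is only an illustration
and not the kernel code: range_is_aligned() and nr_chunks() are
hypothetical helpers, and the example address (4G + 256M) is an assumption
chosen to show a 256MB-aligned LMB that sits inside a 1G mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_16M  (UINT64_C(16) << 20)   /* default LMB size seen at early init */
#define SZ_256M (UINT64_C(256) << 20)  /* pseries LMB size */
#define SZ_1G   (UINT64_C(1) << 30)    /* mapping size used for merged RAM */

/*
 * Hypothetical helper: true if both ends of [addr, addr + size) fall on a
 * map_size boundary. map_size must be a power of two.
 */
static bool range_is_aligned(uint64_t addr, uint64_t size, uint64_t map_size)
{
	return ((addr | (addr + size)) & (map_size - 1)) == 0;
}

/*
 * Hypothetical helper: number of separate mappings needed when a region is
 * split into chunk_size pieces (rounded up).
 */
static uint64_t nr_chunks(uint64_t region_size, uint64_t chunk_size)
{
	assert(chunk_size != 0);
	return (region_size + chunk_size - 1) / chunk_size;
}
```

With these helpers, range_is_aligned(4 * SZ_1G + SZ_256M, SZ_256M, SZ_1G)
is false, which corresponds to the "unaligned range" case for an LMB that
was mapped along with RAM at 1G, while the same range checked against
SZ_256M is aligned. nr_chunks(SZ_1G, SZ_16M) versus nr_chunks(SZ_1G, SZ_256M)
shows the granularity cost of using the 16MB default: 64 mappings per 1G
instead of 4.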
Regards,
Bharata.