[RFC PATCH v0] powerpc: Fix BUG_ON during memory unplug on radix
Nicholas Piggin
npiggin at gmail.com
Wed Jun 19 20:17:01 AEST 2019
Bharata B Rao's on June 19, 2019 5:45 pm:
> We hit the following BUG_ON when memory hotplugged before reboot
> is unplugged after reboot:
>
> kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
>
> remove_pagetable+0x594/0x6a0
> (unreliable)
> remove_pagetable+0x94/0x6a0
> vmemmap_free+0x394/0x410
> sparse_remove_one_section+0x26c/0x2e8
> __remove_pages+0x428/0x540
> arch_remove_memory+0xd0/0x170
> __remove_memory+0xd4/0x1a0
> dlpar_remove_lmb+0xbc/0x110
> dlpar_memory+0xa80/0xd20
> handle_dlpar_errorlog+0xa8/0x160
> pseries_hp_work_fn+0x2c/0x60
> process_one_work+0x46c/0x860
> worker_thread+0x364/0x5e0
> kthread+0x1b0/0x1c0
> ret_from_kernel_thread+0x5c/0x68
>
> This occurs because, during reboot-after-hotplug, the hotplugged
> memory range gets initialized as regular memory and its page
> tables are set up via the memblock allocator. As a result, the
> PMD and PTE fragment counts for those PMD and PTE pages are
> never initialized.
>
> The fix has three aspects:
>
> - Walk the init_mm page tables from mem_init() and initialize
> the PMD and PTE fragment counts appropriately.
> - When doing early allocation of PMD (and PGD) pages, allocate
> at full PAGE_SIZE granularity so that the complete page is
> available for setting the fragment count, which lives in
> struct page.
> - When a PMD or PTE page is freed, check whether it came from
> the memblock allocator and, if so, free it appropriately.
>
> Reported-by: Srikanth Aithal <sraithal at linux.vnet.ibm.com>
> Signed-off-by: Bharata B Rao <bharata at linux.ibm.com>
> ---
> arch/powerpc/include/asm/book3s/64/radix.h | 1 +
> arch/powerpc/include/asm/sparsemem.h | 1 +
> arch/powerpc/mm/book3s64/pgtable.c | 12 +++-
> arch/powerpc/mm/book3s64/radix_pgtable.c | 67 +++++++++++++++++++++-
> arch/powerpc/mm/mem.c | 5 ++
> arch/powerpc/mm/pgtable-frag.c | 5 +-
> 6 files changed, 87 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index 574eca33f893..4320f2790e8d 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -285,6 +285,7 @@ static inline unsigned long radix__get_tree_size(void)
> #ifdef CONFIG_MEMORY_HOTPLUG
> int radix__create_section_mapping(unsigned long start, unsigned long end, int nid);
> int radix__remove_section_mapping(unsigned long start, unsigned long end);
> +void radix__fixup_pgtable_fragments(void);
> #endif /* CONFIG_MEMORY_HOTPLUG */
> #endif /* __ASSEMBLY__ */
> #endif
> diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
> index 3192d454a733..e662f9232d35 100644
> --- a/arch/powerpc/include/asm/sparsemem.h
> +++ b/arch/powerpc/include/asm/sparsemem.h
> @@ -15,6 +15,7 @@
> #ifdef CONFIG_MEMORY_HOTPLUG
> extern int create_section_mapping(unsigned long start, unsigned long end, int nid);
> extern int remove_section_mapping(unsigned long start, unsigned long end);
> +void fixup_pgtable_fragments(void);
>
> #ifdef CONFIG_PPC_BOOK3S_64
> extern int resize_hpt_for_hotplug(unsigned long new_mem_size);
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index 01bc9663360d..7efe9cc16b39 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -186,6 +186,13 @@ int __meminit remove_section_mapping(unsigned long start, unsigned long end)
>
> return hash__remove_section_mapping(start, end);
> }
> +
> +void fixup_pgtable_fragments(void)
> +{
> + if (radix_enabled())
> + radix__fixup_pgtable_fragments();
> +}
> +
> #endif /* CONFIG_MEMORY_HOTPLUG */
>
> void __init mmu_partition_table_init(void)
> @@ -320,7 +327,10 @@ void pmd_fragment_free(unsigned long *pmd)
> BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
> if (atomic_dec_and_test(&page->pt_frag_refcount)) {
> pgtable_pmd_page_dtor(page);
> - __free_page(page);
> + if (PageReserved(page))
> + free_reserved_page(page);
Hmm. Rather than adding this special case here, I wonder if you can
just go along in your fixup walk and convert all these pages to
non-reserved pages?
ClearPageReserved ; init_page_count ; adjust_managed_page_count ;
should do the trick, right?
> + else
> + __free_page(page);
Thanks,
Nick