Bug in reclaim logic with exhausted nodes?

Christoph Lameter cl at linux.com
Wed Mar 26 05:25:30 EST 2014


On Tue, 25 Mar 2014, Nishanth Aravamudan wrote:

> On power, very early, we find the 16G pages (gpages in the powerpc arch
> code) in the device-tree:
>
> early_setup ->
> 	early_init_mmu ->
> 		htab_initialize ->
> 			htab_init_page_sizes ->
> 				htab_dt_scan_hugepage_blocks ->
> 					memblock_reserve
> 						which marks the memory
> 						as reserved
> 					add_gpage
> 						which saves the address
> 						off for future calls to
> 						alloc_bootmem_huge_page()
>
> hugetlb_init ->
> 		hugetlb_init_hstates ->
> 			hugetlb_hstate_alloc_pages ->
> 				alloc_bootmem_huge_page
>
> > Not sure if I understand that correctly.
>
> Basically this is present memory that is "reserved" for the 16GB usage
> per the LPAR configuration. We honor that configuration in Linux based
> upon the contents of the device-tree. It just so happens in the
> configuration from my original e-mail that a consequence of this is that
> a NUMA node has memory (topologically), but none of that memory is free,
> nor will it ever be free.

Well, don't do that.

> Perhaps, in this case, we could just remove that node from the N_MEMORY
> mask? Memory allocations will never succeed from the node, and we can
> never free these 16GB pages. It is really not any different than a
> memoryless node *except* when you are using the 16GB pages.

That looks to be the correct way to handle things. Maybe mark the node as
offline or somehow not present so that the kernel ignores it.
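
To make the idea concrete, here is a minimal userspace sketch, not kernel
code. It assumes a plain bitmask standing in for the N_MEMORY node state;
struct node_info, build_memory_mask() and pick_node() are hypothetical
names invented for illustration. A node whose pages are all reserved at
boot never enters the mask, so node selection skips it just as it would a
memoryless node.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8

/* Hypothetical per-node accounting; not the kernel's pglist_data. */
struct node_info {
	unsigned long present_pages;	/* pages topologically on the node */
	unsigned long reserved_pages;	/* pages reserved at boot (e.g. 16G pages) */
};

/* A node is usable only if at least one present page is not reserved. */
static bool node_has_usable_memory(const struct node_info *n)
{
	return n->present_pages > n->reserved_pages;
}

/* Build a bitmask (stand-in for N_MEMORY) of nodes with usable memory. */
static unsigned int build_memory_mask(const struct node_info *nodes, int count)
{
	unsigned int mask = 0;

	assert(count >= 0 && count <= MAX_NODES);
	for (int nid = 0; nid < count; nid++)
		if (node_has_usable_memory(&nodes[nid]))
			mask |= 1u << nid;
	return mask;
}

/*
 * Return the first node in the mask, starting the search at the
 * preferred node and wrapping around; -1 if the mask is empty.
 */
static int pick_node(unsigned int mask, int preferred, int count)
{
	assert(count > 0 && count <= MAX_NODES);
	assert(preferred >= 0 && preferred < count);
	for (int i = 0; i < count; i++) {
		int nid = (preferred + i) % count;

		if (mask & (1u << nid))
			return nid;
	}
	return -1;
}
```

With two nodes where node 1 is fully reserved, the mask holds only node 0,
and a request that prefers node 1 falls through to node 0 instead of
retrying a node that can never satisfy it.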


