[RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks

Bharata B Rao <bharata@linux.ibm.com>
Thu Mar 19 03:06:10 AEDT 2020


On Wed, Mar 18, 2020 at 03:42:19PM +0100, Vlastimil Babka wrote:
> This is a PowerPC platform with following NUMA topology:
> 
> available: 2 nodes (0-1)
> node 0 cpus:
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
> node 1 size: 35247 MB
> node 1 free: 30907 MB
> node distances:
> node   0   1
>   0:  10  40
>   1:  40  10
> 
> possible numa nodes: 0-31
> 
> A related issue was reported by Bharata [3] where a similar PowerPC
> configuration, but without patch [2], ends up allocating large numbers of pages
> for kmalloc-1k and kmalloc-512. This seems to have the same underlying issue with
> node_to_mem_node() not behaving as expected, and might also lead
> to an infinite loop with CONFIG_SLUB_CPU_PARTIAL.

This patch doesn't fix the issue of kmalloc caches consuming more
memory for the above-mentioned topology. Also, CONFIG_SLUB_CPU_PARTIAL is set
here and I have not observed an infinite loop so far.

Or, are you expecting your fix to work on top of Srikar's other patchset
https://lore.kernel.org/linuxppc-dev/20200311110237.5731-1-srikar@linux.vnet.ibm.com/t/#u ?

With the above patchset, no fix is required to address the increased memory
consumption of the kmalloc caches, because that patchset prevents such a
topology from occurring, thereby making it impossible for the problem
to surface (or at least impossible for the specific topology that I
mentioned).

> diff --git a/mm/slub.c b/mm/slub.c
> index 17dc00e33115..4d798cacdae1 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1511,7 +1511,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
>  	struct page *page;
>  	unsigned int order = oo_order(oo);
>  
> -	if (node == NUMA_NO_NODE)
> +	if (node == NUMA_NO_NODE || !node_online(node))
>  		page = alloc_pages(flags, order);
>  	else
>  		page = __alloc_pages_node(node, flags, order);
> @@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
>  
>  	if (node == NUMA_NO_NODE)
>  		searchnode = numa_mem_id();
> -	else if (!node_present_pages(node))
> -		searchnode = node_to_mem_node(node);

We still come here with the memory-less node = 0 (and not NUMA_NO_NODE), fail to
find a partial slab, go back and allocate a new one, thereby continuously
increasing the number of newly allocated slabs (roughly the flow sketched below).
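
For reference, this is roughly how get_partial() reads with the hunk above
applied (reconstructed from the diff context, so take it as a sketch rather
than the exact code):

static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
			 struct kmem_cache_cpu *c)
{
	void *object;
	int searchnode = node;

	if (node == NUMA_NO_NODE)
		searchnode = numa_mem_id();

	/*
	 * With node = 0 (memory-less but online) searchnode stays 0, so we
	 * only look at node 0's partial list, which is empty ...
	 */
	object = get_partial_node(s, get_node(s, searchnode), c, flags);

	/*
	 * ... and because node != NUMA_NO_NODE we return NULL here instead
	 * of falling back to get_any_partial(), so the caller goes on to
	 * allocate a brand new slab.
	 */
	if (object || node != NUMA_NO_NODE)
		return object;

	return get_any_partial(s, flags, c);
}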

>  
>  	object = get_partial_node(s, get_node(s, searchnode), c, flags);
>  	if (object || node != NUMA_NO_NODE)
> @@ -2568,12 +2566,15 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  redo:
>  
>  	if (unlikely(!node_match(page, node))) {
> -		int searchnode = node;
> -
> -		if (node != NUMA_NO_NODE && !node_present_pages(node))
> -			searchnode = node_to_mem_node(node);
> -
> -		if (unlikely(!node_match(page, searchnode))) {
> +		/*
> +		 * node_match() false implies node != NUMA_NO_NODE
> +		 * but if the node is not online or has no pages, just
> +		 * ignore the constraint
> +		 */
> +		if ((!node_online(node) || !node_present_pages(node))) {
> +			node = NUMA_NO_NODE;
> +			goto redo;

Many calls that allocate a slab object from the memory-less node 0 in my case
don't even hit the above check, because they get short-circuited by the
goto new_slab label which is present a few lines above (see the sketch below).
Hence I don't see any reduction in the amount of slab memory with this fix.
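
The short-circuit I am referring to looks roughly like this (heavily
simplified from ___slab_alloc(); the elided parts are marked with ...):

static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
			   unsigned long addr, struct kmem_cache_cpu *c)
{
	struct page *page;

	page = c->page;
	if (!page)
		/* no per-cpu slab yet: jumps straight past the check below */
		goto new_slab;
redo:
	if (unlikely(!node_match(page, node))) {
		/* the proposed "ignore offline/memory-less node" fallback sits here */
		...
	}
	...
new_slab:
	/*
	 * Eventually calls get_partial()/new_slab() with node still set to
	 * the memory-less node 0, so a fresh slab is allocated each time.
	 */
	...
}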

Regards,
Bharata.


