[PATCH/RFC] mm: add and use batched version of __tlb_remove_table()

Dave Hansen dave.hansen at intel.com
Sun Dec 19 12:34:13 AEDT 2021


On 12/18/21 6:31 AM, Nikita Yushchenko wrote:
>>> This allows archs to optimize it, by
>>> freeing multiple tables in a single release_pages() call. This is
>>> faster than individual put_page() calls, especially with memcg
>>> accounting enabled.
>>
>> Could we quantify "faster"?  There's a non-trivial amount of code being
>> added here and it would be nice to back it up with some cold-hard
>> numbers.
> 
> I currently don't have numbers for this patch taken alone. This patch
> originates from work done some years ago to reduce cost of memory
> accounting, and x86-only version of this patch was in virtuozzo/openvz
> kernel since then. Other patches from that work have been upstreamed,
> but this one was missed.
> 
> Still it's obvious that release_pages() shall be faster that a loop
> calling put_page() - isn't that exactly the reason why release_pages()
> exists and is different from a loop calling put_page()?

Yep, but this patch does a bunch of stuff to some really hot paths.  It
would be greatly appreciated if you could put in the effort to actually
put some numbers behind this.  Plenty of weird stuff happens on
computers that we suck at predicting.

I'd be happy with even a quick little micro.  My favorite is:

	https://github.com/antonblanchard/will-it-scale

Although, I do wonder if anything will even be measurable.  Please at
least try.

...
>> But, even more than that, do all the architectures even need the
>> free_swap_cache()?
> 
> I was under impression that process page tables are a valid target for
> swapping out. Although I can be wrong here.

It's not out of the realm of possibilities.  But, last I checked, the
only path we free page tables in was when VMAs are being torn down.  I
have a longstanding TODO item to reclaim them if they're empty (all
zeros) or to zero them out if they're mapping page cache.


More information about the Linuxppc-dev mailing list