[PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

Fri Jun 14 01:12:05 AEST 2024

On Thu, Jun 13, 2024 at 04:11:52PM +0200, Jason A. Donenfeld wrote:
> On Thu, Jun 13, 2024 at 05:46:11AM -0700, Paul E. McKenney wrote:
> > How about a kmem_cache_destroy_rcu() that marks that specified cache
> > for destruction, and then a kmem_cache_destroy_barrier() that waits?
> > 
> > I took the liberty of adding your name to the Google document [1] and
> > adding this section:
> 
> Cool, though no need to make me yellow!

No worries, Jakub is also colored yellow.  People added tomorrow
will be cyan if I follow my usual change-color ordering.  ;-)

> > > But then, if that mechanism generally works, we don't really need a new
> > > function and we can just go with the first option of making
> > > kmem_cache_destroy() asynchronously wait. It'll wait, as you described,
> > > but then we adjust the tail of every kfree_rcu batch freeing cycle to
> > > check if there are _still_ any old outstanding kmem_cache_destroy()
> > > requests. If so, then we can splat and keep the old debugging info we
> > > currently have for finding memleaks.
> > 
> > The mechanism can always be sabotaged by memory-leak bugs on the part
> > of the user of the kmem_cache structure in play, right?
> > 
> > OK, but I see your point.  I added this to the existing
> > "kmem_cache_destroy() Lingers for kfree_rcu()" section:
> > 
> > 	One way of preserving this debugging information is to splat if
> > 	all of the slab’s memory has not been freed within a reasonable
> > 	timeframe, perhaps the same 21 seconds that causes an RCU CPU
> > 	stall warning.
> > 
> > Does that capture it?
> 
> Not quite what I was thinking. Your 21 seconds as a time-based thing I
> guess could be fine. But I was mostly thinking:
> 
> 1) kmem_cache_destroy() is called, but there are outstanding objects, so
>    it defers.
> 
> 2) Sometime later, a kfree_rcu_work batch freeing operation runs.

Or not, if there has been a leak and there happens to be no outstanding
kfree_rcu() memory.

> 3) At the end of this batch freeing, the kernel notices that the
>    kmem_cache whose destruction was previously deferred still has
>    outstanding objects and has not been destroyed. It can conclude that
>    there's thus been a memory leak.

And the batch freeing can be replicated across CPUs, so it would be
necessary to determine which was last to do this effective.  Don't get
me wrong, this can be done, but the performance/latency tradeoffs can
be interesting.

> In other words, instead of having to do this based on timers, you can
> just have the batch freeing code ask, "did those pending kmem_cache
> destructions get completed as a result of this last operation?"

I agree that kfree_rcu_work-batch time is a good time to evaluate slab
(and I have added this to the document), but I do not believe that it
can completely replace timeouts.

							Thanx, Paul