[PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

Wed Jun 19 21:22:12 AEST 2024

On Wed, Jun 19, 2024 at 11:56:44AM +0200, Vlastimil Babka wrote:
> On 6/19/24 11:51 AM, Uladzislau Rezki wrote:
> > On Tue, Jun 18, 2024 at 09:48:49AM -0700, Paul E. McKenney wrote:
> >> On Tue, Jun 18, 2024 at 11:31:00AM +0200, Uladzislau Rezki wrote:
> >> > > On 6/17/24 8:42 PM, Uladzislau Rezki wrote:
> >> > > >> +
> >> > > >> +	s = container_of(work, struct kmem_cache, async_destroy_work);
> >> > > >> +
> >> > > >> +	// XXX use the real kmem_cache_free_barrier() or similar thing here
> >> > > > It implies that we need to introduce kfree_rcu_barrier(), a new API, which i
> >> > > > wanted to avoid initially.
> >> > > 
> >> > > I wanted to avoid new API or flags for kfree_rcu() users and this would
> >> > > be achieved. The barrier is used internally so I don't consider that an
> >> > > API to avoid. How difficult is the implementation is another question,
> >> > > depending on how the current batching works. Once (if) we have sheaves
> >> > > proven to work and move kfree_rcu() fully into SLUB, the barrier might
> >> > > also look different and hopefully easier. So maybe it's not worth to
> >> > > invest too much into that barrier and just go for the potentially
> >> > > longer, but easier to implement?
> >> > > 
> >> > Right. I agree here. If the cache is not empty, OK, we just defer the
> >> > work, even we can use a big 21 seconds delay, after that we just "warn"
> >> > if it is still not empty and leave it as it is, i.e. emit a warning and
> >> > we are done.
> >> > 
> >> > Destroying the cache is not something that must happen right away. 
> >> 
> >> OK, I have to ask...
> >> 
> >> Suppose that the cache is created and destroyed by a module and
> >> init/cleanup time, respectively.  Suppose that this module is rmmod'ed
> >> then very quickly insmod'ed.
> >> 
> >> Do we need to fail the insmod if the kmem_cache has not yet been fully
> >> cleaned up?  If not, do we have two versions of the same kmem_cache in
> >> /proc during the overlap time?
> >> 
> > No fail :) If same cache is created several times, its s->refcount gets
> > increased, so, it does not create two entries in the "slabinfo". But i
> > agree that your point is good! We need to be carefully with removing and
> > simultaneous creating.
> 
> Note that this merging may be disabled or not happen due to various flags on
> the cache being incompatible with it. And I want to actually make sure it
> never happens for caches being already destroyed as that would lead to
> use-after-free (the workfn doesn't recheck the refcount in case a merge
> would happen during the grace period)
> 
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -150,9 +150,10 @@ int slab_unmergeable(struct kmem_cache *s)
>  #endif
> 
>         /*
> -        * We may have set a slab to be unmergeable during bootstrap.
> +        * We may have set a cache to be unmergeable during bootstrap.
> +        * 0 is for cache being destroyed asynchronously
>          */
> -       if (s->refcount < 0)
> +       if (s->refcount <= 0)
>                 return 1;
> 
>         return 0;
> 
OK, i see such flags, SLAB_NO_MERGE. Then i was wrong, it can create two
different slabs.

Thanks!

--
Uladzislau Rezki