[PATCH 2/3] powerpc/smp: Add support detecting thread-groups sharing L2 cache

Wed Dec 9 20:14:33 AEDT 2020

* Gautham R Shenoy <ego at linux.vnet.ibm.com> [2020-12-08 23:12:37]:

> 
> > For L2 we have thread_group_l2_cache_map to store the tasks from the thread
> > group.  but cpu_l2_cache_map for keeping track of tasks.
> 
> > 
> > I think we should do some renaming to keep the names consistent.
> > I would say probably say move the current cpu_l2_cache_map to
> > cpu_llc_cache_map and move the new aka  thread_group_l2_cache_map as
> > cpu_l2_cache_map to be somewhat consistent.
> 
> Hmm.. cpu_llc_cache_map is still very generic. We want to have
> something that defines l2 map.
> 
> I agree that we need to keep it consistent. How about renaming
> cpu_l1_cache_map to thread_groups_l1_cache_map ?
> 
> That way thread_groups_l1_cache_map and thread_groups_l2_cache_map
> refer to the corresponding L1 and L2 siblings as discovered from
> ibm,thread-groups property.

I am fine with this.

> > > +
> > > +	for_each_possible_cpu(cpu) {
> > > +		int err = init_cpu_cache_map(cpu, THREAD_GROUP_SHARE_L2);
> > > +
> > > +		if (err)
> > > +			return err;
> > > +	}
> > > +
> > > +	thread_group_shares_l2 = true;
> > 
> > Why do we need a separate loop. Why cant we merge this in the above loop
> > itself?
> 
> No, there are platforms where one THREAD_GROUP_SHARE_L1 exists while
> THREAD_GROUP_SHARE_L2 doesn't exist. It becomes easier if these are
> separately tracked. Also, what do we gain if we put this in the same
> loop? It will be (nr_possible_cpus * 2 * invocations of
> init_cpu_cache_map()) as opposed to 2 * (nr_possible_cpus *
> invocations of init_cpu_cache_map()). Isn't it ?
> 
Its not about the number of invocations but per-cpu thread group list
that would need not be loaded again. Currently they would probably be in the
cache-line, but get dropped to be loaded again in the next loop.
And we still can support platforms with only THREAD_GROUP_SHARE_L1 since
parse_thread_groups would have given us how many levels of thread groups are
supported on a platform.

> > 
> > > +	pr_info("Thread-groups in a core share L2-cache\n");
> > 
> > Can this be moved to a pr_debug? Does it help any regular user/admins to
> > know if thread-groups shared l2 cache. Infact it may confuse users on what
> > thread groups are and which thread groups dont share cache.
> > I would prefer some other name than thread_group_shares_l2 but dont know any
> > better alternatives and may be my choices are even worse.
> 
> Would you be ok with "L2 cache shared by threads of the small core" ?

Sounds better to me. I would still think pr_debug is better since regular
Admins/users may not make too much information from this.
> 
> > 
> > Ah this can be simplified to:
> > if (thread_group_shares_l2) {
> > 	cpumask_set_cpu(cpu, cpu_l2_cache_mask(cpu));
> > 
> > 	for_each_cpu(i, per_cpu(thread_group_l2_cache_map, cpu)) {
> > 		if (cpu_online(i))
> > 			set_cpus_related(i, cpu, cpu_l2_cache_mask);
> > 	}
> 
> Don't we want to enforce that the siblings sharing L1 be a subset of
> the siblings sharing L2 ? Or do you recommend putting in a check for
> that somewhere ?
> 
I didnt think about the case where the device-tree could show L2 to be a
subset of L1.

How about initializing thread_group_l2_cache_map itself with
cpu_l1_cache_map. It would be a simple one time operation and reduce the
overhead here every CPU online.
And it would help in your subsequent patch too. We dont want the cacheinfo
for L1 showing CPUs not present in L2.

-- 
Thanks and Regards
Srikar Dronamraju