[RESEND] [PATCH v1 1/3] Add basic infrastructure for memcg hotplug support
Balbir Singh
bsingharora at gmail.com
Thu Nov 17 11:28:12 AEDT 2016
On 16/11/16 20:01, Vladimir Davydov wrote:
> Hello,
>
> On Wed, Nov 16, 2016 at 10:44:59AM +1100, Balbir Singh wrote:
>> The lack of hotplug support makes us allocate all memory
>> upfront for per-node data structures. With a large number
>> of cgroups this can be a significant overhead. PPC64 actually
>> limits n_possible nodes to n_online to avoid some of this
>> overhead.
>>
>> This patch adds the basic notifiers to listen for hotplug
>> events and does the allocation and freeing of those structures
>> per cgroup. We walk every cgroup per event; it's a trade-off
>> between allocating everything upfront and allocating on demand
>> plus freeing on offline.
>>
>> Cc: Tejun Heo <tj at kernel.org>
>> Cc: Andrew Morton <akpm at linux-foundation.org>
>> Cc: Johannes Weiner <hannes at cmpxchg.org>
>> Cc: Michal Hocko <mhocko at kernel.org>
>> Cc: Vladimir Davydov <vdavydov.dev at gmail.com>
>>
>> Signed-off-by: Balbir Singh <bsingharora at gmail.com>
>> ---
>> mm/memcontrol.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
>> 1 file changed, 60 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 91dfc7c..5585fce 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -63,6 +63,7 @@
>> #include <linux/lockdep.h>
>> #include <linux/file.h>
>> #include <linux/tracehook.h>
>> +#include <linux/memory.h>
>> #include "internal.h"
>> #include <net/sock.h>
>> #include <net/ip.h>
>> @@ -1342,6 +1343,10 @@ int mem_cgroup_select_victim_node(struct mem_cgroup *memcg)
>> {
>> return 0;
>> }
>> +
>> +static void mem_cgroup_may_update_nodemask(struct mem_cgroup *memcg)
>> +{
>> +}
>> #endif
>>
>> static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
>> @@ -4115,14 +4120,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
>> {
>> struct mem_cgroup_per_node *pn;
>> int tmp = node;
>> - /*
>> - * This routine is called against possible nodes.
>> - * But it's BUG to call kmalloc() against offline node.
>> - *
>> - * TODO: this routine can waste much memory for nodes which will
>> - * never be onlined. It's better to use memory hotplug callback
>> - * function.
>> - */
>> +
>> if (!node_state(node, N_NORMAL_MEMORY))
>> tmp = -1;
>> pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
>> @@ -5773,6 +5771,59 @@ static int __init cgroup_memory(char *s)
>> }
>> __setup("cgroup.memory=", cgroup_memory);
>>
>> +static void memcg_node_offline(int node)
>> +{
>> + struct mem_cgroup *memcg;
>> +
>> + if (node < 0)
>> + return;
>
> Is this possible?
Yes, please see node_states_check_changes_online/offline():
status_change_nid is left at -1 when the node's memory state does
not actually change.
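A trimmed sketch of the online-side check (not the exact upstream
code, which also tracks status_change_nid_normal):

	/* the notifier arg only carries a real node id when the
	 * node is gaining its first memory */
	if (!node_state(nid, N_MEMORY))
		arg->status_change_nid = nid;
	else
		arg->status_change_nid = -1;	/* no node-level change */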
>
>> +
>> + for_each_mem_cgroup(memcg) {
>> + free_mem_cgroup_per_node_info(memcg, node);
>> + mem_cgroup_may_update_nodemask(memcg);
>
> If memcg->numainfo_events is 0, mem_cgroup_may_update_nodemask() won't
> update memcg->scan_nodes. Is it OK?
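For context, the early return being referred to, trimmed from the
current mem_cgroup_may_update_nodemask():

	/* no recorded numainfo events: the scan_nodes rebuild is
	 * skipped entirely */
	if (!atomic_read(&memcg->numainfo_events))
		return;
	if (atomic_inc_return(&memcg->numainfo_updating) > 1)
		return;
	/* ... otherwise rebuild memcg->scan_nodes from N_MEMORY ... */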
>
>> + }
>
> What if a memory cgroup is created or destroyed while you're walking the
> tree? Should we probably use get_online_mems() in mem_cgroup_alloc() to
> avoid that?
>
The iterator internally takes rcu_read_lock(), so the walk is safe
against cgroups being added or removed in the meantime. I suspect you
are also suggesting get_online_mems() around each call site that walks
for_each_online_node.
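For reference, the iterator is a thin wrapper around mem_cgroup_iter(),
which handles the RCU locking internally:

/* from mm/memcontrol.c */
#define for_each_mem_cgroup(iter)			\
	for (iter = mem_cgroup_iter(NULL, NULL, NULL);	\
	     iter != NULL;				\
	     iter = mem_cgroup_iter(iter, NULL, NULL))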
My understanding so far is:
1. invalidate_reclaim_iterators - should be safe (no bad side-effects)
2. mem_cgroup_free - should be safe as well
3. mem_cgroup_alloc - needs protection (see the sketch below)
4. mem_cgroup_init - needs protection
5. mem_cgroup_remove_from_trees - should be safe
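For (3), something along these lines; an untested sketch against the
current mem_cgroup_alloc() rather than the final patch:

	static struct mem_cgroup *mem_cgroup_alloc(void)
	{
		...
		/* pin hotplug state so a node cannot come or go between
		 * this walk and the notifier walking every memcg */
		get_online_mems();
		for_each_online_node(node)
			if (alloc_mem_cgroup_per_node_info(memcg, node)) {
				put_online_mems();
				goto fail;
			}
		put_online_mems();
		...
	}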
>> +}
>> +
>> +static void memcg_node_online(int node)
>> +{
>> + struct mem_cgroup *memcg;
>> +
>> + if (node < 0)
>> + return;
>> +
>> + for_each_mem_cgroup(memcg) {
>> + alloc_mem_cgroup_per_node_info(memcg, node);
>> + mem_cgroup_may_update_nodemask(memcg);
>> + }
>> +}
>> +
>> +static int memcg_memory_hotplug_callback(struct notifier_block *self,
>> + unsigned long action, void *arg)
>> +{
>> + struct memory_notify *marg = arg;
>> + int node = marg->status_change_nid;
>> +
>> + switch (action) {
>> + case MEM_GOING_OFFLINE:
>> + case MEM_CANCEL_ONLINE:
>> + memcg_node_offline(node);
>
> Judging by __offline_pages(), the MEM_GOING_OFFLINE event is emitted
> before migrating pages off the node. So, I guess freeing per-node info
> here isn't quite correct, as pages still need it to be moved from the
> node's LRU lists. Better move it to MEM_OFFLINE?
>
Good point, will redo and free the per-node info at MEM_OFFLINE instead.
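Roughly, an untested sketch of the redone switch:

	switch (action) {
	case MEM_GOING_ONLINE:
		/* allocate before any page can land on this node's LRUs */
		memcg_node_online(node);
		break;
	case MEM_CANCEL_ONLINE:		/* undo the MEM_GOING_ONLINE alloc */
	case MEM_OFFLINE:		/* pages migrated off; safe to free */
		memcg_node_offline(node);
		break;
	default:
		break;
	}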
>> + break;
>> + case MEM_GOING_ONLINE:
>> + case MEM_CANCEL_OFFLINE:
>> + memcg_node_online(node);
>> + break;
>> + case MEM_ONLINE:
>> + case MEM_OFFLINE:
>> + break;
>> + }
>> + return NOTIFY_OK;
>> +}
>> +
>> +static struct notifier_block memcg_memory_hotplug_nb __meminitdata = {
>> + .notifier_call = memcg_memory_hotplug_callback,
>> + .priority = IPC_CALLBACK_PRI,
>
> I wonder why you chose this priority?
>
I just chose the lowest priority.
>> +};
>> +
>> /*
>> * subsys_initcall() for memory controller.
>> *
>> @@ -5797,6 +5848,7 @@ static int __init mem_cgroup_init(void)
>> #endif
>>
>> hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
>> + register_hotmemory_notifier(&memcg_memory_hotplug_nb);
>>
>> for_each_possible_cpu(cpu)
>> INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
>
> I guess, we should modify mem_cgroup_alloc/free() in the scope of this
> patch, otherwise it doesn't make much sense IMHO. May be, it's even
> worth merging patches 1 and 2 altogether.
>
Thanks for the review; I'll revisit the organization of the patches.
Balbir Singh