[Skiboot] [PATCH] fast-reboot: parallel memory clearing

Nicholas Piggin nicholas.piggin at gmail.com
Tue Apr 17 19:30:00 AEST 2018


On Tue, 17 Apr 2018 18:46:07 +1000
Stewart Smith <stewart at linux.ibm.com> wrote:

> A real simple hack to split things up into 16GB jobs,
> and name them as how far we are into zeroing, so we can
> simply print progress out as XGB cleared as we wait for
> the jobs to finish.
> 
> This seems to cut at least ~40% time from memory zeroing on
> fast-reboot on a 256GB Boston system.

Cool! I'm assuming this is just an initial hack, but I'll give some
comments anyway.

> 
> Signed-off-by: Stewart Smith <stewart at linux.ibm.com>
> ---
>  core/mem_region.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 61 insertions(+), 2 deletions(-)
> 
> diff --git a/core/mem_region.c b/core/mem_region.c
> index 8ae49bb790fd..a06e725db3ec 100644
> --- a/core/mem_region.c
> +++ b/core/mem_region.c
> @@ -1206,19 +1206,47 @@ static void mem_clear_range(uint64_t s, uint64_t e)
>  		return;
>  	}
>  
> -	prlog(PR_NOTICE, "Clearing region %llx-%llx\n",
> +	prlog(PR_DEBUG, "Clearing region %llx-%llx\n",
>  	      (long long)s, (long long)e);
>  	memset((void *)s, 0, e - s);
>  }
>  
> +static void mem_region_clear_job(void *data)
> +{
> +	uint64_t *arg = (uint64_t*)data;
> +	mem_clear_range(arg[0], arg[1]);
> +}
> +
> +#define MEM_REGION_CLEAR_JOB_SIZE (16ULL*(1<<30))
> +
>  void mem_region_clear_unused(void)
>  {
> +	int njobs = 0;
> +	struct cpu_job **jobs;
>  	struct mem_region *r;
> +	uint64_t *job_args;
> +	uint64_t s,l;
> +	uint64_t total = 0;
> +	char **job_names;
> +	int i;

Can you do a struct for all of these dynamic allocations?

>  
>  	lock(&mem_region_lock);
>  	assert(mem_regions_finalised);
>  
> +	list_for_each(&regions, r, list) {
> +		if (!(r->type == REGION_OS))
> +			continue;
> +		njobs++;
> +		/* One job per 16GB */
> +		njobs += r->len / MEM_REGION_CLEAR_JOB_SIZE;

At a quick glance I thought you had an off-by-one there when the length
isn't a 16GB multiple, but no, the initial increment covers that. Which
means you allocate one job too many when the length *is* an exact 16GB
multiple?

How about

  njobs += (r->len + MEM_REGION_CLEAR_JOB_SIZE - 1) /
                   MEM_REGION_CLEAR_JOB_SIZE;


> @@ -1226,9 +1254,40 @@ void mem_region_clear_unused(void)
>  
>  		assert(r != &skiboot_heap);
>  
> -		mem_clear_range(r->start, r->start + r->len);
> +		s = r->start;
> +		l = r->len;
> +		while(l > MEM_REGION_CLEAR_JOB_SIZE) {
> +			job_args[i*2] = s+l - MEM_REGION_CLEAR_JOB_SIZE;
> +			job_args[i*2+1] = s+l;
> +			l-=MEM_REGION_CLEAR_JOB_SIZE;
> +			job_names[i] = malloc(sizeof(char)*40);
> +			total+=MEM_REGION_CLEAR_JOB_SIZE;
> +			snprintf(job_names[i], 40, "%lldGB cleared", total/(1<<30));
> +			jobs[i] = cpu_queue_job(NULL, job_names[i],
> +						mem_region_clear_job,
> +						&job_args[i*2]);
> +			i++;

I'd be going the other way: increment s by the job size on each round,
and assign a length of min(job size, remaining) each iteration, so you
get them all, including the short tail, into the one loop. Just my
preference.

Bonus points for scheduling memory clearing on to node local cores.
That looks like it will take a bit of work in the job scheduler code,
so probably best left to patch 2.

Thanks,
Nick
