[Skiboot] [RFC v2 PATCH 2/6] fast-reboot: parallel memory clearing

Stewart Smith stewart at linux.ibm.com
Tue Jul 17 14:08:24 AEST 2018


Nicholas Piggin <npiggin at gmail.com> writes:
> On Thu, 28 Jun 2018 12:54:57 +1000
> Stewart Smith <stewart at linux.ibm.com> wrote:
>
>> Arbitrarily pick 16GB as the unit of parallelism, and
>> split up clearing memory into jobs and schedule them
>> node-local to the memory (or on node 0 if we can't
>> work that out because it's the memory up to SKIBOOT_BASE)
>> 
>> This seems to cut at least ~40% off the time spent zeroing memory
>> during fast-reboot on a 256GB Boston system.
>> 
>> Signed-off-by: Stewart Smith <stewart at linux.ibm.com>
>
> [...]
>
>> +		while(l > MEM_REGION_CLEAR_JOB_SIZE) {
>> +			job_args[i].s = s+l - MEM_REGION_CLEAR_JOB_SIZE;
>> +			job_args[i].e = s+l;
>> +			l-=MEM_REGION_CLEAR_JOB_SIZE;
>> +			job_args[i].job_name = malloc(sizeof(char)*100);
>> +			total+=MEM_REGION_CLEAR_JOB_SIZE;
>> +			chip_id = __dt_get_chip_id(r->node);
>> +			if (chip_id == -1)
>> +				chip_id = 0;
>> +			path = dt_get_path(r->node);
>> +			snprintf(job_args[i].job_name, 100,
>> +				 "clear %s, %s 0x%"PRIx64" len: %"PRIx64" on %d",
>> +				 r->name, path,
>> +				 job_args[i].s,
>> +				 (job_args[i].e - job_args[i].s),
>> +				 chip_id);
>> +			free(path);
>> +			printf("job: %s\n", job_args[i].job_name);
>> +			jobs[i] = cpu_queue_job_on_node(chip_id,
>> +							job_args[i].job_name,
>> +							mem_region_clear_job,
>> +							&job_args[i]);
>
> Ahh, the API will return NULL if the job can't be scheduled on the
> requested node. So you'll want to re-run cpu_queue_job() in that
> case to ensure it runs.

Ahh yep, good point. Fixed in next revision.
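
Something along these lines, i.e. falling back to cpu_queue_job() when
the node-local queue attempt fails (untested sketch; this assumes
passing a NULL cpu to cpu_queue_job() lets it pick any available CPU,
and what ends up in the next revision may look slightly different):

		jobs[i] = cpu_queue_job_on_node(chip_id,
						job_args[i].job_name,
						mem_region_clear_job,
						&job_args[i]);
		/* No free CPU on that chip (or no such node), so don't
		 * drop the job: queue it on any available CPU instead. */
		if (!jobs[i])
			jobs[i] = cpu_queue_job(NULL,
						job_args[i].job_name,
						mem_region_clear_job,
						&job_args[i]);
		assert(jobs[i]);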

-- 
Stewart Smith
OPAL Architect, IBM.
