[Skiboot] [PATCH] fast-reboot: parallel memory clearing
Stewart Smith
stewart at linux.ibm.com
Tue Apr 17 20:11:41 AEST 2018
Nicholas Piggin <nicholas.piggin at gmail.com> writes:
> On Tue, 17 Apr 2018 18:46:07 +1000
> Stewart Smith <stewart at linux.ibm.com> wrote:
>
>> A real simple hack to split things up into 16GB jobs,
>> and name them as how far we are into zeroing, so we can
>> simply print progress out as XGB cleared as we wait for
>> the jobs to finish.
>>
>> This seems to cut at least ~40% time from memory zeroing on
>> fast-reboot on a 256GB Boston system.
>
> Cool! I'm assuming this is just an initial hack, but I'll give some
> comments anyway.
Initial hack *while* distracted and trying not to accidentally 'git
reset --hard' the wrong tree. Miracle it boots at all. git-send-email is
my backup system.
>> Signed-off-by: Stewart Smith <stewart at linux.ibm.com>
>> ---
>> core/mem_region.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 61 insertions(+), 2 deletions(-)
>>
>> diff --git a/core/mem_region.c b/core/mem_region.c
>> index 8ae49bb790fd..a06e725db3ec 100644
>> --- a/core/mem_region.c
>> +++ b/core/mem_region.c
>> @@ -1206,19 +1206,47 @@ static void mem_clear_range(uint64_t s, uint64_t e)
>> return;
>> }
>>
>> - prlog(PR_NOTICE, "Clearing region %llx-%llx\n",
>> + prlog(PR_DEBUG, "Clearing region %llx-%llx\n",
>> (long long)s, (long long)e);
>> memset((void *)s, 0, e - s);
>> }
>>
>> +static void mem_region_clear_job(void *data)
>> +{
>> + uint64_t *arg = (uint64_t*)data;
>> + mem_clear_range(arg[0], arg[1]);
>> +}
>> +
>> +#define MEM_REGION_CLEAR_JOB_SIZE (16ULL*(1<<30))
>> +
>> void mem_region_clear_unused(void)
>> {
>> + int njobs = 0;
>> + struct cpu_job **jobs;
>> struct mem_region *r;
>> + uint64_t *job_args;
>> + uint64_t s,l;
>> + uint64_t total = 0;
>> + char **job_names;
>> + int i;
>
> Can you do a struct for all of these dynamic allocations?
Yeah, that makes a lot more sense. I should probably name the jobs
separately from what I log too.
>> lock(&mem_region_lock);
>> assert(mem_regions_finalised);
>>
>> +	list_for_each(&regions, r, list) {
>> + if (!(r->type == REGION_OS))
>> + continue;
>> + njobs++;
>> + /* One job per 16GB */
>> + njobs += r->len / MEM_REGION_CLEAR_JOB_SIZE;
>
> A quick glance and I thought you had an off by one there in case of not
> 16GB length, but no, because that initial increment. So maybe you have
> one too many if it is 16GB multiple?
>
> How about
>
> njobs += (r->len + MEM_REGION_CLEAR_JOB_SIZE - 1) /
> MEM_REGION_CLEAR_JOB_SIZE;
Yeah, that looks better.
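As a standalone sketch, the ceiling-division form Nick suggests rounds any partial tail chunk up to a full job, so a region that is an exact 16GB multiple gets no spurious extra job. (clear_job_count is a hypothetical helper name for illustration, not skiboot code.)

```c
#include <stdint.h>

#define MEM_REGION_CLEAR_JOB_SIZE (16ULL * (1ULL << 30))

/* One job per 16GB chunk; a non-multiple length gets one extra
 * (shorter) job for the remainder. Exact multiples get no extra. */
static uint64_t clear_job_count(uint64_t len)
{
	return (len + MEM_REGION_CLEAR_JOB_SIZE - 1) /
		MEM_REGION_CLEAR_JOB_SIZE;
}
```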
>> @@ -1226,9 +1254,40 @@ void mem_region_clear_unused(void)
>>
>> assert(r != &skiboot_heap);
>>
>> - mem_clear_range(r->start, r->start + r->len);
>> + s = r->start;
>> + l = r->len;
>> + while(l > MEM_REGION_CLEAR_JOB_SIZE) {
>> + job_args[i*2] = s+l - MEM_REGION_CLEAR_JOB_SIZE;
>> + job_args[i*2+1] = s+l;
>> + l-=MEM_REGION_CLEAR_JOB_SIZE;
>> + job_names[i] = malloc(sizeof(char)*40);
>> + total+=MEM_REGION_CLEAR_JOB_SIZE;
>> + snprintf(job_names[i], 40, "%lldGB cleared", total/(1<<30));
>> + jobs[i] = cpu_queue_job(NULL, job_names[i],
>> + mem_region_clear_job,
>> + &job_args[i*2]);
>> + i++;
>
> I'd be going the other way and incrementing s by the job size on each
> round, and assigning a length of min(job size, remaining) each iteration,
> so you should get them all into the one loop. Just my preference.
Yeah, that's nicer.
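For reference, that forward-walking loop might look like the sketch below, clamping each chunk to the smaller of the job size and the remaining length so the tail case folds into the one loop. (queue_clear_jobs is a hypothetical illustration; the real code would call cpu_queue_job() where the comment sits, and it returns the job count here purely to make the chunking visible.)

```c
#include <stdint.h>

#define MEM_REGION_CLEAR_JOB_SIZE (16ULL * (1ULL << 30))

/* Walk the region front to back, one job per chunk. The last chunk
 * is simply whatever is left, so no separate remainder handling is
 * needed. Returns the number of jobs that would be queued. */
static int queue_clear_jobs(uint64_t start, uint64_t len)
{
	uint64_t s = start;
	uint64_t end = start + len;
	int njobs = 0;

	while (s < end) {
		uint64_t chunk = end - s;

		if (chunk > MEM_REGION_CLEAR_JOB_SIZE)
			chunk = MEM_REGION_CLEAR_JOB_SIZE;
		/* here the real code would cpu_queue_job() a
		 * mem_clear_range(s, s + chunk) worker */
		njobs++;
		s += chunk;
	}
	return njobs;
}
```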
> Bonus points for scheduling memory clearing on to node local cores.
> That looks like it will take a bit of work in the job scheduler code,
> so probably best left to patch 2.
Yeah, I initially started down that route and quickly realised that it
needed multiple patches :)
--
Stewart Smith
OPAL Architect, IBM.