[PATCH] x86: OLPC: speed up device tree creation during boot

Andres Salomon dilinger at queued.net
Thu Oct 28 04:50:52 EST 2010


On Wed, 27 Oct 2010 11:39:24 +0100
Grant Likely <grant.likely at secretlab.ca> wrote:

> On Fri, Oct 22, 2010 at 05:22:47PM -0700, Andres Salomon wrote:
> > 
> > Calling alloc_bootmem() for tiny chunks of memory over and over is
> > really slow; on an XO-1, it caused the time between when the kernel
> > started booting and when the display came alive (post-lxfb probe)
> > to increase to 44s.  This patch optimizes the prom_early_alloc
> > function by calling alloc_bootmem for 4k-sized blocks of memory,
> > and handing out chunks of that to callers.  With this hack, the
> > time between kernel load and display initialization decreased to
> > 23s.  If there's a better way to do this early in the boot process,
> > please let me know.
> > 
> > (Note: increasing the chunk size to 16k didn't noticeably affect
> > boot time, and wasted 9k.)
> > 
> > Signed-off-by: Andres Salomon <dilinger at queued.net>
> > ---
> >  arch/x86/kernel/olpc_dt.c |   27 +++++++++++++++++++++++----
> >  1 files changed, 23 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/olpc_dt.c b/arch/x86/kernel/olpc_dt.c
> > index f660a11..44dd2ae 100644
> > --- a/arch/x86/kernel/olpc_dt.c
> > +++ b/arch/x86/kernel/olpc_dt.c
> > @@ -123,16 +123,35 @@ static int __init olpc_dt_pkg2path(phandle node, char *buf,
> >  }
> >  
> >  static unsigned int prom_early_allocated __initdata;
> > +#define DT_CHUNK_SIZE (1<<12)
> 
> PAGE_SIZE perhaps?
> 

I'd rather not imply that it's anything but completely arbitrary.



> >  
> >  void * __init prom_early_alloc(unsigned long size)
> >  {
> > +	static u8 *mem = NULL;
> > +	static size_t free_mem = 0;
> >  	void *res;
> >  
> > -	res = alloc_bootmem(size);
> > -	if (res)
> > -		memset(res, 0, size);
> > +	if (free_mem >= size) {
> > +		/* allocate from the local cache */
> > +		free_mem -= size;
> > +		res = mem;
> > +		mem += size;
> > +		return res;
> > +	}
> >  
> > -	prom_early_allocated += size;
> > +	/*
> > +	 * To minimize the number of allocations, grab 4k of memory (that's
> > +	 * an arbitrary choice that matches PAGE_SIZE on the platforms we care
> > +	 * about, and minimizes wasted bootmem) and hand off chunks of it to
> > +	 * callers.
> > +	 */
> > +	res = alloc_bootmem(DT_CHUNK_SIZE);
> > +	if (res) {
> > +		prom_early_allocated += DT_CHUNK_SIZE;
> > +		memset(res, 0, DT_CHUNK_SIZE);
> > +		free_mem = DT_CHUNK_SIZE - size;
> > +		mem = res + size;
> > +	}
> 
> These two hunks should be flipped around so that only one hunk does
> the allocation from the pool.  Like so:
> 
> 	/*
> 	 * To minimize the number of allocations, grab 4k of memory (that's
> 	 * an arbitrary choice that matches PAGE_SIZE on the platforms we care
> 	 * about, and minimizes wasted bootmem) and hand off chunks of it to
> 	 * callers.
> 	 */
> 	if (free_mem < size) {
> 		free_mem = max(DT_CHUNK_SIZE, size);
> 		mem = alloc_bootmem(free_mem);
> 		if (!mem) {
> 			free_mem = 0;
> 			return NULL;
> 		}
> 		memset(mem, 0, free_mem);
> 		prom_early_allocated += free_mem;
> 	}
> 
> 	res = mem;
> 	free_mem -= size;
> 	mem += size;
> 	return res;
> 
> g.

Makes sense, thanks.
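
For reference, the restructured allocator suggested above can be tried
outside the kernel.  What follows is only a minimal userspace sketch, not
the code that was merged: early_alloc(), backing_alloc() and
early_allocated are hypothetical names, calloc() stands in for
alloc_bootmem() plus the memset(), and a plain conditional replaces the
kernel's max() macro.  Like the snippet in the review, it does no
alignment and abandons the unused tail of a chunk when a request does
not fit.

```c
#include <stddef.h>
#include <stdlib.h>

#define DT_CHUNK_SIZE (1 << 12)	/* arbitrary, as in the patch */

/* Total bytes requested from the backing allocator so far. */
static size_t early_allocated;

/* Hypothetical stand-in for alloc_bootmem(): zeroed memory, never freed. */
static void *backing_alloc(size_t size)
{
	return calloc(1, size);
}

/* Hand out zeroed memory carved from DT_CHUNK_SIZE-sized chunks. */
void *early_alloc(size_t size)
{
	static unsigned char *mem;
	static size_t free_mem;
	void *res;

	if (free_mem < size) {
		/*
		 * The pool cannot satisfy this request: grab a fresh chunk,
		 * or a larger block if the request exceeds the chunk size.
		 */
		free_mem = size > DT_CHUNK_SIZE ? size : DT_CHUNK_SIZE;
		mem = backing_alloc(free_mem);
		if (!mem) {
			free_mem = 0;
			return NULL;
		}
		early_allocated += free_mem;
	}

	/* Single hand-out path: carve the request off the front of the pool. */
	res = mem;
	free_mem -= size;
	mem += size;
	return res;
}
```

Two small requests in a row come out of the same chunk, so the backing
allocator is called once rather than twice.  A request larger than
DT_CHUNK_SIZE gets a block of exactly its own size instead of making
the free_mem bookkeeping wrap around.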

