[PATCH] x86: OLPC: speed up device tree creation during boot

Grant Likely grant.likely at secretlab.ca
Wed Oct 27 21:39:24 EST 2010


On Fri, Oct 22, 2010 at 05:22:47PM -0700, Andres Salomon wrote:
> 
> Calling alloc_bootmem() for tiny chunks of memory over and over is really
> slow; on an XO-1, it caused the time between when the kernel started
> booting and when the display came alive (post-lxfb probe) to increase
> to 44s.  This patch optimizes the prom_early_alloc function by
> calling alloc_bootmem for 4k-sized blocks of memory, and handing out
> chunks of that to callers.  With this hack, the time between kernel load
> and display initialization decreased to 23s.  If there's a better way to
> do this early in the boot process, please let me know.
> 
> (Note: increasing the chunk size to 16k didn't noticeably affect boot time,
> and wasted 9k.)
> 
> Signed-off-by: Andres Salomon <dilinger at queued.net>
> ---
>  arch/x86/kernel/olpc_dt.c |   27 +++++++++++++++++++++++----
>  1 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/olpc_dt.c b/arch/x86/kernel/olpc_dt.c
> index f660a11..44dd2ae 100644
> --- a/arch/x86/kernel/olpc_dt.c
> +++ b/arch/x86/kernel/olpc_dt.c
> @@ -123,16 +123,35 @@ static int __init olpc_dt_pkg2path(phandle node, char *buf,
>  }
>  
>  static unsigned int prom_early_allocated __initdata;
> +#define DT_CHUNK_SIZE (1<<12)

PAGE_SIZE perhaps?

>  
>  void * __init prom_early_alloc(unsigned long size)
>  {
> +	static u8 *mem = NULL;
> +	static size_t free_mem = 0;
>  	void *res;
>  
> -	res = alloc_bootmem(size);
> -	if (res)
> -		memset(res, 0, size);
> +	if (free_mem >= size) {
> +		/* allocate from the local cache */
> +		free_mem -= size;
> +		res = mem;
> +		mem += size;
> +		return res;
> +	}
>  
> -	prom_early_allocated += size;
> +	/*
> +	 * To mimimize the number of allocations, grab 4k of memory (that's
> +	 * an arbitrary choice that matches PAGE_SIZE on the platforms we care
> +	 * about, and minimizes wasted bootmem) and hand off chunks of it to
> +	 * callers.
> +	 */
> +	res = alloc_bootmem(DT_CHUNK_SIZE);
> +	if (res) {
> +		prom_early_allocated += DT_CHUNK_SIZE;
> +		memset(res, 0, DT_CHUNK_SIZE);
> +		free_mem = DT_CHUNK_SIZE - size;
> +		mem = res + size;
> +	}

These two hunks should be flipped around so that only one code path
hands out memory from the pool.  Like so:

	/*
	 * To minimize the number of allocations, grab 4k of memory (that's
	 * an arbitrary choice that matches PAGE_SIZE on the platforms we care
	 * about, and minimizes wasted bootmem) and hand off chunks of it to
	 * callers.
	 */
	if (free_mem < size) {
		free_mem = max_t(size_t, DT_CHUNK_SIZE, size);
		mem = alloc_bootmem(free_mem);
		if (!mem) {
			free_mem = 0;
			return NULL;
		}
		memset(mem, 0, free_mem);
		prom_early_allocated += free_mem;
	}

	res = mem;
	free_mem -= size;
	mem += size;
	return res;
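
For anyone who wants to poke at the flipped-around logic outside the kernel,
here is a minimal userspace sketch of the same refill-then-carve pattern.  It
is an illustration under assumptions, not the patch itself: malloc() stands in
for alloc_bootmem(), and backing_alloc(), backing_calls, early_allocated and
early_alloc() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DT_CHUNK_SIZE ((size_t)1 << 12)

/* Hypothetical bookkeeping, only here so the effect can be observed. */
static unsigned int backing_calls;	/* trips to the slow allocator */
static size_t early_allocated;		/* total bytes reserved so far */

/* Assumption: malloc() plays the role of alloc_bootmem() in this sketch. */
static void *backing_alloc(size_t size)
{
	backing_calls++;
	return malloc(size);
}

/* Hand out zeroed memory carved from a cached chunk; refill when it runs dry. */
static void *early_alloc(size_t size)
{
	static unsigned char *mem;
	static size_t free_mem;
	void *res;

	if (free_mem < size) {
		/* Refill: at least one chunk, or more for an oversized request. */
		size_t chunk = size > DT_CHUNK_SIZE ? size : DT_CHUNK_SIZE;

		mem = backing_alloc(chunk);
		if (!mem) {
			free_mem = 0;
			return NULL;
		}
		memset(mem, 0, chunk);
		free_mem = chunk;
		early_allocated += chunk;
	}

	/* Single place where memory leaves the pool. */
	res = mem;
	free_mem -= size;
	mem += size;
	return res;
}
```

With a 4096-byte chunk, 1000 requests of 16 bytes each reach the backing
allocator 4 times instead of 1000.  Whatever is left of the old chunk is
simply dropped on a refill, which is the same trade-off the pseudo-patch
above makes.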

g.

