[RFC/PATCH] ppc64: Add mem=X option

Michael Ellerman michael at ellerman.id.au
Thu Feb 24 15:17:58 EST 2005


Hi Olof,

Thanks for the comments, I'll send an updated patch later today or tomorrow.
I'll fix 

On Wed, 23 Feb 2005 14:40, Olof Johansson wrote:
> > + for (i = 0; i < mem->cnt; i++) {
> > +  total += mem->region[i].size;
> > +
> > +  if (total <= memory_limit)
> > +   continue;
> > +
> > +  crop = (memory_limit - (total - mem->region[i].size));
...
> > +  mem->region[i].size = crop;
> > +  mem->cnt = i + 1;
> > +  break;
>
> Also, if you just reduce the size of memory_limit instead of keep the
> rolling total, the crop calculation will be simpler. It took a bit of
> thinking to make sure it's right the way it's written now.

I've rewritten this twice already, and I agree it's a bit ugly. I hadn't 
thought of decrementing the limit. I'll fix it to avoid a 0 sized region too.

> > + /* Bolt kernel mappings for all of memory */
> > + iSeries_bolt_kernel(0, systemcfg->physicalMemorySize);
>
> Do you want to bolt it all, even if there's a mem= limitation?

I'm not sure, it seems to boot happily either way. But I think it's better not 
to bolt it all - that more closely resembles the situation we're trying to 
simulate, ie that there is no more RAM than the limit.

> > + if (unlikely(memory_limit && tce_alloc_start && tce_alloc_end)) {
>
> Do you need to check both tce_alloc_start and tce_alloc_end? Won't both
> be set if one is?

Yeah either both tce variables should be set or none.

> > +  if (base + size >= tce_alloc_start)
> > +   tce_alloc_start = base + size + 1;
> > +
> > +  create_pte_mapping(tce_alloc_start, tce_alloc_end,
> > +   mode_rw, use_largepages);
>
> Could tce_alloc_end ever be below memory_limit too?

No. tce_alloc_end is just ram_top in prom_init.c (before memory_limit) 
But I might change the prom_init.c code to make that more explicit, rather 
than the current code where we copy ram_top -> alloc_top_high -> 
tce_alloc_end.

> You might need to check tce_alloc_start and end to make sure you can use
> 16MB pages, or if you need 4K because of alignment/size constraints.
>
> Even if you don't have to, a comment related to it could be warranted.

Not sure what you mean here. If start/end aren't 16MB aligned then I should 
use 4K pages for the mapping?

> > +  for (i++; i <= max_domain; i++) {
>
> I think I would change this to a regular while(++i < max_domain) loop
> instead.

Agreed. That's ugly, I just copied the for loop that was there.

> > +   dbg("NUMA: offlining node %ld for memory_limit\n", i);
> > +   node_set_offline(i);
>
> Just because the node doesn't get memory allocated we can't set it
> offline. I.e. the cpus will still be online.

Ah crud, I thought I was just offlining the memory.

> > +  for (i = 0; i <= max_domain; i++) 
> > +   node_set_online(i);
>
> Good question, I think can go. That code was added by Matt Dobson earlier
> this year.

Well, if I'm not calling node_set_offline() above then it doesn't matter.
He added that when he added the for_each_online_node() stuff, and if there are 
gaps in the node numbering then the extra for loop does have an effect, I'm 
not game to touch it in case it's doing something subtle that I'm missing.

> > +static unsigned long __initdata memory_limit;
> > +static unsigned long __initdata tce_alloc_start;
> > +static unsigned long __initdata tce_alloc_end;
>
> A little confusing to have the same variable names here as local
> statics. Maybe rename them? Or use the global.

Benh wants to keep the prom_init <=> kernel interface clean so I'll keep them 
but rename them.

> > + while (isxdigit(*cp) &&
> > +        (value = isdigit(*cp) ? *cp - '0' : toupper(*cp) - 'A' + 10) <
> > base) { +  result = result * base + value;
> > +  cp++;
> > + }
>
> Hmm. Would you mind breaking that up? It'll be a few more lines but much
> easier to read.

Yeah it's not pretty, I just copied it from the regular strtoul but I'll clean 
it up.

> > +unsigned long prom_memparse(const char *ptr, const char **retptr)
> > +{
> > + unsigned long ret = prom_strtoul(ptr, retptr);
> > +
> > + switch (**retptr) {
> > + case 'G':
> > + case 'g':
> > +  ret <<= 10;
> > + case 'M':
> > + case 'm':
> > +  ret <<= 10;
> > + case 'K':
> > + case 'k':
> > +  ret <<= 10;
> > +  (*retptr)++;
>
> Do other architectures swallow/tolerate a b/B after the unit? Could be
> nice.

Yes and no. That's copied from the generic memparse() which everyone else 
uses. And although it doesn't have a b/B case it shouldn't choke if we do 
have one, it'll just be ignored and the next strstr() will skip it.

> > +  /* Align to 16 MB == size of large page */
> > +  RELOC(memory_limit) = ALIGN(RELOC(memory_limit), 0x1000000);
>
> Maybe a printk to say that it's been rounded up, so we don't surprise
> the user?

Good plan.

> > +  if (RELOC(memory_limit) <= RELOC(alloc_bottom)) {
> > +   prom_printf("Ignoring mem=%x <= alloc_bottom.\n",
> > +    RELOC(memory_limit));
> > +   RELOC(memory_limit) = 0;
>
> ...or should it just be bumped up to include alloc_bottom instead?

That would just mean the first allocation will fail and we'll panic, because 
alloc_bottom == memory_limit == alloc_top (see alloc_up() in prom_init.c). We 
could just get rid of the checking but I figure it's nicer to complain and 
still boot than panic.

cheers!

-- 
Michael Ellerman
OzLabs Canberra
IBM Linux Technology Centre
====================================
Phone: +61 2 6212 1183
Email: michael at ellerman.id.au
WWWeb: http://michael.ellerman.id.au



-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc64-dev/attachments/20050224/bb50daae/attachment.pgp 


More information about the Linuxppc64-dev mailing list