[PATCH] powerpc: check crash_base for relocatable kernel

Thu Jan 8 14:35:42 EST 2009

On Wed, 2009-01-07 at 08:57 -0600, Milton Miller wrote:
> [removed Paul from cc and fixed Mohan's email]
> 
> On Jan 6, 2009, at 5:44 PM, Michael Ellerman wrote:
> 
> > On Fri, 2009-01-02 at 14:46 -0600, Milton Miller wrote:
> >> @@ -94,10 +95,35 @@ void __init reserve_crashkernel(void)
> >>  				KDUMP_KERNELBASE);
> >>
> >>  	crashk_res.start = KDUMP_KERNELBASE;
> >> +#else
> >> +	if (!crashk_res.start) {
> >> +		/*
> >> +		 * unspecified address, choose a region of specified size
> >> +		 * can overlap with initrd (ignoring corruption when retained)
> >> +		 * ppc64 requires kernel and some stacks to be in first segment
> >> +		 */
> >> +		crashk_res.start = KDUMP_KERNELBASE;
> >> +	}
> >> +
> >> +	crash_base = PAGE_ALIGN(crashk_res.start);
> >> +	if (crash_base != crashk_res.start) {
> >> +		printk("Crash kernel base must be aligned to 0x%lx\n",
> >> +				PAGE_SIZE);
> >> +		crashk_res.start = crash_base;
> >> +	}
> >> +
> >>  #endif
> >>  	crash_size = PAGE_ALIGN(crash_size);
> >>  	crashk_res.end = crashk_res.start + crash_size - 1;
> >>
> >> +	/* The crash region must not overlap the current kernel */
> >> +	if (overlaps_crashkernel(__pa(_stext), _end - _stext)) {
> >> +		printk(KERN_WARNING
> >> +			"Crash kernel can not overlap current kernel\n");
> >> +		crashk_res.start = crashk_res.end = 0;
> >> +		return;
> >> +	}
> >
> > I think we can be smarter here. Why don't we adjust the crash kernel
> > region so that it doesn't overlap the first kernel? ie. move it up a
> > bit.
> 
> How much?   In addition to the size of the kernel, we have to allocate
> (1) the emergency stacks, as we use them to bring up secondary cpus, and
> (2) the irq stacks in the first segment.   While the second could be met
> more easily on systems with 1TB slbs, we don't take advantage of that yet.

Hmm, we could try and work it out though. I guess we don't know how many
CPUs we have at that point, which makes it a little trickier.

So we have the emergency stack and the hard & soft irq stacks per cpu,
which is 48KB AFAICT. So for a 256-way system that would be 12MB.
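
To spell out that arithmetic, here is a minimal sketch. It assumes the 48KB is three 16KB stacks per cpu (emergency, hard irq, soft irq); that breakdown, and the names STACK_SIZE_KB, STACKS_PER_CPU and stack_space_kb(), are illustrative and not taken from the kernel source.

```c
#include <assert.h>

/* Assumption: each of the three per-cpu stacks is 16KB. */
#define STACK_SIZE_KB	16UL
/* Emergency stack + hard irq stack + soft irq stack. */
#define STACKS_PER_CPU	3UL

/* Stack space, in KB, needed in the first segment for ncpus cpus. */
static unsigned long stack_space_kb(unsigned long ncpus)
{
	assert(ncpus > 0);
	return ncpus * STACKS_PER_CPU * STACK_SIZE_KB;
}
```

With these assumptions, stack_space_kb(256) is 12288KB, i.e. the 12MB above.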

I don't think I've seen an RMO smaller than 128MB, though I notice our
RPA note specifies 64M as the minimum we'll accept. That would probably
be a bit tight.

How about something like:

min_space = _end + 16MB		(16 to be safe?)

if min_space < rmo_size / 2:
	min_space = rmo_size / 2

if crash_base < min_space:
	crash_base = min_space
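
In C that might look roughly like the sketch below. It is untested against a real kernel; pick_crash_base(), STACK_SLACK and the parameter names are made up for illustration, and kernel_end stands in for the physical address of _end.

```c
#include <assert.h>

#define MB		(1024UL * 1024UL)
/* Assumption: 16MB of headroom above the kernel for the per-cpu stacks. */
#define STACK_SLACK	(16 * MB)

/*
 * Return a crash_base that leaves the first kernel at least
 * max(kernel_end + STACK_SLACK, rmo_size / 2) bytes of low memory.
 */
static unsigned long pick_crash_base(unsigned long kernel_end,
				     unsigned long rmo_size,
				     unsigned long crash_base)
{
	unsigned long min_space = kernel_end + STACK_SLACK;

	assert(rmo_size > 0);

	/* Never give the first kernel less than half of the RMO. */
	if (min_space < rmo_size / 2)
		min_space = rmo_size / 2;

	/* Move the crash region up if it starts too low. */
	if (crash_base < min_space)
		crash_base = min_space;

	return crash_base;
}
```

For example, with a 12MB kernel, a 128MB RMO and a requested base of 32MB this returns 64MB, while a requested base that is already above min_space is left alone.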

> > There's also the issue of the RMO, I'm not sure what we should do there,
> > but I think the kernel needs some smarts otherwise users are going to
> > shoot themselves in the foot.
> 
> I was looking at the code in kexec-tools for the rmo, and it seems 
> extremely broken (ie it sets rmo_top on every memory block instead of 
> the lowest; the clamp to 768M is the savior for systems with multiple 
> blocks).

Oh surprise.
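
For reference, taking rmo_top from the lowest block only, as Milton describes, could be sketched as below. This is not the kexec-tools code; struct mem_block, rmo_top() and RMO_CLAMP are hypothetical names, and the only values taken from this thread are "lowest block" and the 768M clamp.

```c
#include <assert.h>
#include <stddef.h>

#define MB		(1024ULL * 1024ULL)
/* The 768M upper limit mentioned above. */
#define RMO_CLAMP	(768 * MB)

/* Hypothetical description of one memory block. */
struct mem_block {
	unsigned long long start;
	unsigned long long size;
};

/* Top of the RMO: end of the lowest memory block, clamped to 768M. */
static unsigned long long rmo_top(const struct mem_block *blocks, size_t n)
{
	const struct mem_block *lowest;
	unsigned long long top;
	size_t i;

	assert(blocks != NULL && n > 0);

	/* Find the block with the lowest start address. */
	lowest = &blocks[0];
	for (i = 1; i < n; i++)
		if (blocks[i].start < lowest->start)
			lowest = &blocks[i];

	top = lowest->start + lowest->size;
	return top < RMO_CLAMP ? top : RMO_CLAMP;
}
```

The point is that later blocks never change the result, so the clamp is only a limit and not what rescues multi-block systems.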

> Do we care about loading a kernel below a relocated kernel (between the 
> interrupt vectors and the new kernel)?   I ignored that for now, 
> arguing that we always run the first kernel at 0.

No, I don't think so.

> > We could ignore the @x setting and split the RMO between both kernels
> > somewhat intelligently.
> >
> > What might work is multiple crash regions, that way we could have some
> > space in the RMO for the second kernel (say 32MB?), but the rest outside
> > - leaving some RMO for the first kernel. But I think that would require
> > some serious surgery.
> >
> 
> Other archs have this, I guess because they read the memory out of
> /proc/iomem.   The trick is knowing what has to be put in real space
> and what can go above the rmo.   Also, we have that horrible hard-coded
> rmo limit of 768M max because some platform (one of the cell ones?)
> didn't make the device tree show it.  Maybe we can track it down and add
> linux,usable-mem-ranges to fix it up?

Dunno about the cell, but some of the early blades did have crufty
firmware.

> Does the generic code support loading into the split regions, or is it 
> just for giving the kernel room to run?

I don't think so. I don't see any logic that deals with gaps in the
crashk region.

> So while all of these are nice, what do you think about merging this as
> an interim measure, especially for backporting to 2.6.28 stable (and any
> distro that wants to pick up relocatable kdump)?

I guess. I'd rather do something smarter, like I suggested above.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person