[PATCH] powerpc: check crash_base for relocatable kernel
Michael Ellerman
michael at ellerman.id.au
Thu Jan 8 14:35:42 EST 2009
On Wed, 2009-01-07 at 08:57 -0600, Milton Miller wrote:
> [removed Paul from cc and fixed Mohan's email]
>
> On Jan 6, 2009, at 5:44 PM, Michael Ellerman wrote:
>
> > On Fri, 2009-01-02 at 14:46 -0600, Milton Miller wrote:
> >> @@ -94,10 +95,35 @@ void __init reserve_crashkernel(void)
> >> KDUMP_KERNELBASE);
> >>
> >> crashk_res.start = KDUMP_KERNELBASE;
> >> +#else
> >> + if (!crashk_res.start) {
> >> + /*
> >> + * unspecified address, choose a region of specified size
> >> + * can overlap with initrd (ignoring corruption when retained)
> >> + * ppc64 requires kernel and some stacks to be in first segment
> >> + */
> >> + crashk_res.start = KDUMP_KERNELBASE;
> >> + }
> >> +
> >> + crash_base = PAGE_ALIGN(crashk_res.start);
> >> + if (crash_base != crashk_res.start) {
> >> + printk("Crash kernel base must be aligned to 0x%lx\n",
> >> + PAGE_SIZE);
> >> + crashk_res.start = crash_base;
> >> + }
> >> +
> >> #endif
> >> crash_size = PAGE_ALIGN(crash_size);
> >> crashk_res.end = crashk_res.start + crash_size - 1;
> >>
> >> + /* The crash region must not overlap the current kernel */
> >> + if (overlaps_crashkernel(__pa(_stext), _end - _stext)) {
> >> + printk(KERN_WARNING
> >> + "Crash kernel can not overlap current kernel\n");
> >> + crashk_res.start = crashk_res.end = 0;
> >> + return;
> >> + }
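A minimal user-space sketch of what the hunk above does: fall back to a fixed base when none was given, round the base up to a page boundary, and give up if the result overlaps the running kernel. The constants (64KB pages, a 32MB KDUMP_KERNELBASE) and all helper names are assumptions for illustration; overlaps() only mimics the range test overlaps_crashkernel() is expected to perform and is not the kernel function.

```c
#include <stdbool.h>

/* Assumed values for illustration only; the real ones come from the kernel. */
#define SKETCH_PAGE_SIZE        0x10000UL   /* 64KB pages (assumption) */
#define SKETCH_KDUMP_KERNELBASE 0x2000000UL /* 32MB (assumption) */

struct region {
	unsigned long start;
	unsigned long end; /* inclusive, like struct resource */
};

/* Round up to the next page boundary (no-op if already aligned), like PAGE_ALIGN(). */
static unsigned long page_align(unsigned long addr)
{
	return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Hypothetical stand-in for overlaps_crashkernel(): does [start, start + size) touch the region? */
static bool overlaps(const struct region *crash, unsigned long start, unsigned long size)
{
	return start + size > crash->start && start <= crash->end;
}

/* Returns false and zeroes the region if it would overlap the running kernel. */
static bool setup_crash_region(struct region *crash, unsigned long requested_base,
			       unsigned long crash_size,
			       unsigned long kernel_start, unsigned long kernel_size)
{
	/* An unspecified base (0) falls back to the fixed kdump base. */
	crash->start = requested_base ? requested_base : SKETCH_KDUMP_KERNELBASE;
	crash->start = page_align(crash->start);

	crash_size = page_align(crash_size);
	crash->end = crash->start + crash_size - 1;

	/* The crash region must not overlap the current kernel. */
	if (overlaps(crash, kernel_start, kernel_size)) {
		crash->start = crash->end = 0;
		return false;
	}
	return true;
}
```

For example, asking for 256MB at 0x4000001 with the first kernel in 0..16MB rounds the base up to 0x4010000, while asking for a base of 8MB is rejected because it lands inside the kernel image.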
> >
> > I think we can be smarter here. Why don't we adjust the crash kernel
> > region so that it doesn't overlap the first kernel? ie. move it up a
> > bit.
>
> How much? In addition to the size of the kernel, we have to allocate
> (1) the emergency stacks, as we use them to bring up secondary CPUs, and
> (2) the irq stacks in the first segment. While the second could be met
> more easily on systems with 1TB segments, we don't take advantage of that yet.
Hmm, we could try and work it out though. I guess we don't know how many
CPUs we have at that point, which makes it a little trickier.
So we have the emergency stack and the hard & soft irq stacks per cpu,
which is 48KB AFAICT. So for a 256-way system that would be 12MB.
I don't think I've seen an RMO smaller than 128MB, though I notice our
RPA note specifies 64M as the minimum we'll accept. That would probably
be a bit tight.
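The per-CPU stack estimate can be spelled out as a small helper, assuming each of the three stacks (emergency, hard irq, soft irq) is one 16KB THREAD_SIZE; the function name is made up for illustration.

```c
/* Assumption: each per-CPU stack is one 16KB THREAD_SIZE. */
#define SKETCH_STACK_SIZE     (16UL * 1024)
#define SKETCH_STACKS_PER_CPU 3UL /* emergency, hard irq, soft irq */

/* Hypothetical helper: bytes of first-segment stack space needed for nr_cpus CPUs. */
static unsigned long stack_bytes_for_cpus(unsigned long nr_cpus)
{
	return nr_cpus * SKETCH_STACKS_PER_CPU * SKETCH_STACK_SIZE;
}
```

One CPU needs 48KB, and stack_bytes_for_cpus(256) gives 12582912 bytes, i.e. 12MB.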
How about something like:
    min_space = _end + 16MB    (16 to be safe?)
    if min_space < rmo_size / 2:
        min_space = rmo_size / 2
    if crash_base < min_space:
        crash_base = min_space
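Rendered as C, that heuristic might look like the sketch below. pick_crash_base() is a hypothetical helper, kernel_end stands in for the physical address of _end, and rmo_size is assumed to be known already. Like the pseudocode, it does not check that the moved region still fits below the RMO.

```c
#define SKETCH_MB (1024UL * 1024)

/*
 * Hypothetical: move the crash base above the first kernel plus 16MB of
 * headroom, and never below the middle of the RMO, so the first kernel
 * keeps at least the lower half of the RMO.
 */
static unsigned long pick_crash_base(unsigned long crash_base,
				     unsigned long kernel_end,
				     unsigned long rmo_size)
{
	unsigned long min_space = kernel_end + 16 * SKETCH_MB;

	if (min_space < rmo_size / 2)
		min_space = rmo_size / 2;

	if (crash_base < min_space)
		crash_base = min_space;

	return crash_base;
}
```

With a 128MB RMO and a kernel ending at 10MB, a request at 32MB is pushed up to 64MB, while a request at 96MB is left alone.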
> > There's also the issue of the RMO. I'm not sure what we should do there,
> > but I think the kernel needs some smarts, otherwise users are going to
> > shoot themselves in the foot.
>
> I was looking at the code in kexec-tools for the RMO, and it seems
> extremely broken (i.e. it sets rmo_top on every memory block instead of
> only the lowest; the clamp to 768M is the savior for systems with multiple
> blocks).
Oh surprise.
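The behaviour Milton describes as correct amounts to taking rmo_top from the lowest memory block only. A hypothetical sketch of that selection, keeping the 768M clamp; this is not the actual kexec-tools code, and struct mem_block and rmo_top_from_blocks() are invented names.

```c
#include <stddef.h>

#define SKETCH_RMO_CLAMP (768UL * 1024 * 1024)

struct mem_block {
	unsigned long base;
	unsigned long size;
};

/*
 * Hypothetical: derive the RMO top from the lowest memory block only,
 * rather than letting every block overwrite it, then apply the 768M clamp.
 */
static unsigned long rmo_top_from_blocks(const struct mem_block *blocks, size_t n)
{
	const struct mem_block *lowest = NULL;
	unsigned long top;
	size_t i;

	for (i = 0; i < n; i++)
		if (!lowest || blocks[i].base < lowest->base)
			lowest = &blocks[i];

	if (!lowest)
		return 0; /* no memory blocks at all */

	top = lowest->base + lowest->size;
	return top < SKETCH_RMO_CLAMP ? top : SKETCH_RMO_CLAMP;
}
```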
> Do we care about loading a kernel below a relocated kernel (between the
> interrupt vectors and the new kernel)? I ignored that for now,
> arguing that we always run the first kernel at 0.
No I don't think so.
> > We could ignore the @x setting and split the RMO between both kernels
> > somewhat intelligently.
> >
> > What might work is multiple crash regions; that way we could have some
> > space in the RMO for the second kernel (say 32MB?) but the rest
> > outside, leaving some RMO for the first kernel. But I think that would
> > require some serious surgery.
> >
>
> Other archs have this, I guess because they read the memory out of
> /proc/iomem. The trick is knowing what has to be put in real space
> and what can go above the RMO. Also, we have that horrible hard-coded
> 768M RMO maximum because some platform (one of the Cell ones?) didn't
> make the device tree show it. Maybe we can track it down and add
> linux,usable-mem-ranges to fix it up?
Dunno about the Cell, but some of the early blades did have crufty
firmware.
> Does the generic code support loading into the split regions, or is it
> just for giving the kernel room to run?
I don't think so. I don't see any logic that deals with gaps in the
crashk region.
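Since nothing in the generic code handles gaps in the crash region, the split-region idea would need new bookkeeping. A hypothetical sketch of just the arithmetic: reserve a slice (for example the 32MB suggested above) at the top of the RMO and place the remainder at some base above the RMO. struct crash_split and split_crash_region() are invented names, not existing kernel interfaces.

```c
#include <stdbool.h>

struct crash_split {
	unsigned long low_start, low_size;   /* inside the RMO, for the second kernel to boot */
	unsigned long high_start, high_size; /* above the RMO, for everything else */
};

/*
 * Hypothetical: put low_want bytes at the top of the RMO and the rest of
 * crash_size at high_base, which must lie at or above the RMO.
 */
static bool split_crash_region(unsigned long crash_size, unsigned long low_want,
			       unsigned long rmo_size, unsigned long high_base,
			       struct crash_split *out)
{
	if (low_want > crash_size || low_want > rmo_size || high_base < rmo_size)
		return false;

	out->low_size = low_want;
	out->low_start = rmo_size - low_want;
	out->high_size = crash_size - low_want;
	out->high_start = out->high_size ? high_base : 0; /* no high part needed */
	return true;
}
```

With a 128MB RMO, a 256MB crash reservation with 32MB in the RMO gives a low part at 96MB and a 224MB high part at whatever base is chosen above the RMO.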
> So while all of these are nice, what do you think about merging this as
> an interim measure, especially for backporting to 2.6.28 stable (and any
> distro that wants to pick up relocatable kdump)?
I guess. I'd rather do something smarter, like I suggested above.
cheers
--
Michael Ellerman
OzLabs, IBM Australia Development Lab
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
More information about the Linuxppc-dev mailing list