devicetree: Musings on reserved regions

Tue Feb 8 23:24:52 EST 2011

On Mon, Feb 07, 2011 at 01:59:53PM -0700, Grant Likely wrote:
> As part of the process of bringing dt support up on non-powerpc
> platforms, I've been thinking about the usage mode of the reserved
> regions section in the flattened device tree structure.  First, for
> reference, here is the description of reserved regions from ePAPR
> 1.0:
> 
> > 7.3 Memory Reservation Block
> > 7.3.1 Purpose
> > The memory reservation block provides the client program with
> > a list of areas in physical memory which are reserved; that
> > is, which shall not be used for general memory allocations. It
> > is used to protect vital data structures from being
> > overwritten by the client program. For example, on some
> > systems with an IOMMU, the TCE (translation control entry)
> > tables initialized by an ePAPR boot program would need to be
> > protected in this manner. Likewise, any boot program code or
> > data used during the client program’s runtime would need to be
> > reserved (e.g., RTAS on Open Firmware platforms). The ePAPR
> > does not require the boot program to provide any such runtime
> > components, but it does not prohibit implementations from
> > doing so as an extension.
> >
> > More specifically, a client program shall not access memory in
> > a reserved region unless other information provided by the
> > boot program explicitly indicates that it shall do so. The
> > client program may then access the indicated section of the
> > reserved memory in the indicated manner. Methods by which the
> > boot program can indicate to the client program specific uses
> > for reserved memory may appear in this document, in optional
> > extensions to it, or in platform-specific documentation.
> >
> > The reserved regions supplied by a boot program may, but are
> > not required to, encompass the device tree blob itself. The
> > client program shall ensure that it does not overwrite this
> > data structure before it is used, whether or not it is in the
> > reserved areas.
> 
> The problem that I'm having, is that the lifecycle for reserved
> regions is not defined and the client program has no mechanism to
> determine what each reserved region is intended for.  This issue has
> been raised before in the context of kexec from one Linux instance to
> another[1].  When going from one kernel to the next, kexec needs to
> populate a valid set of reserved regions, but it isn't easy to figure
> out what the new set should be because there isn't a reliable way to
> figure out which regions from the previous boot need to be preserved
> (ie. runtime firmware or framebuffers) vs regions that are obsolete
> (original initrd and dtb regions).

Hrm, I agree it's awkward that the meaning and lifetime of the
reserved sections may not be known, but I'm not yet convinced that
we need something like this.

Here's how I envisaged this working under the current spec, which the
text is meant to convey (but maybe needs work):

So, you start with the reserved regions from the special block.  The
list wants to be there, rather than built into the tree, because you
need to parse it before doing *any* memory allocation, and doing tree
parsing and tracking the regions that early, without memory
allocation, is painful.
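
To illustrate why the separate block is cheap to parse that early,
here is a minimal sketch of walking it with no allocation at all.  It
assumes the usual flattened tree layout: the header's off_mem_rsvmap
field at byte offset 16, and entries that are pairs of big-endian
64-bit (address, size) values ending with an all-zero pair.  The
helper names rsv_first() and rsv_next() are made up for this example,
and there is no bounds checking against the blob's total size.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the off_mem_rsvmap field within the blob header. */
#define FDT_OFF_MEM_RSVMAP 16

/* Decode big-endian values byte by byte, so neither alignment nor
 * host endianness matters. */
static uint32_t be32_at(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64_at(const uint8_t *p)
{
	return ((uint64_t)be32_at(p) << 32) | be32_at(p + 4);
}

/* Hypothetical helper: pointer to the first reserve map entry. */
static const uint8_t *rsv_first(const void *blob)
{
	const uint8_t *base = blob;

	return base + be32_at(base + FDT_OFF_MEM_RSVMAP);
}

/* Hypothetical helper: decode the entry at e.  Returns a pointer to
 * the following entry, or NULL if e was the all-zero terminator. */
static const uint8_t *rsv_next(const uint8_t *e, uint64_t *addr,
			       uint64_t *size)
{
	*addr = be64_at(e);
	*size = be64_at(e + 8);
	if (*addr == 0 && *size == 0)
		return NULL;
	return e + 16;
}
```

An early boot loop would then look something like
"for (e = rsv_first(blob); (e = rsv_next(e, &addr, &size)); )" with the
body marking [addr, addr + size) as off limits.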

Absent other information, those regions must remain untouched
*forever*.  The client program must not read or write them, and it
must pass those regions on to whatever runs next - that could be
kexec(), or the real OS if this client is a second stage loader.

However, other parts of the spec may describe things that give a
specific use to part of the reserved area, for example, the
initrd-start/end properties and the spin-table entries.

The binding that describes some use of reserved memory may also bound
that reservation's lifetime.  So, once you've processed the initrd
properties, you may excise the initrd area from the reserved sections.
There is no guarantee the initrd reservation maps to exactly one entry
in the reserve map, so yes, this may mean writing some fairly involved
extent-intersection type code.
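
For what it's worth, the extent code need not be enormous.  Below is
a rough sketch of excising one range (say the initrd) from a list of
reserved extents, where the range may straddle several entries or
split one in two.  The struct and function names are invented for
this example, extents are half-open [start, end), and the result goes
to a caller-supplied array so nothing is allocated.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open range [start, end); hypothetical type for this sketch. */
struct extent {
	uint64_t start;
	uint64_t end;
};

/*
 * Copy the n extents of in[] to out[], minus whatever overlaps cut.
 * Returns the number of extents written, or -1 if out[] (capacity
 * cap) is too small.  One input extent can yield up to two outputs.
 */
static int extents_excise(const struct extent *in, size_t n,
			  struct extent cut,
			  struct extent *out, size_t cap)
{
	size_t m = 0;

	for (size_t i = 0; i < n; i++) {
		struct extent e = in[i];

		/* Empty cut or no overlap: keep the extent whole. */
		if (cut.start >= cut.end ||
		    cut.end <= e.start || cut.start >= e.end) {
			if (m == cap)
				return -1;
			out[m++] = e;
			continue;
		}
		/* Part below the cut survives. */
		if (e.start < cut.start) {
			if (m == cap)
				return -1;
			out[m++] = (struct extent){ e.start, cut.start };
		}
		/* Part above the cut survives. */
		if (cut.end < e.end) {
			if (m == cap)
				return -1;
			out[m++] = (struct extent){ cut.end, e.end };
		}
	}
	return (int)m;
}
```

Sizing out[] at n + 1 is always enough for a single cut on
non-overlapping input, since only an entry that contains the whole cut
can split, and every other touched entry either shrinks or vanishes.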

Likewise the spin-table means that once you have brought a secondary
cpu out of spin with that method, you may reclaim the 16 (?) bytes of
its spin table entry (and no more).

Now, that's pretty ad-hoc, but the point is that you *must* understand
the use of a section of reserved memory before de-reserving it anyway,
and there is no guarantee that a reserved area can *ever* be freed.

Furthermore, I don't think we should require the boot program to bind
each reserve entry to exactly one purpose.  It seems to me that an
obvious scheme for a boot program would be to lay out all the things
it needs sequentially in RAM, and put in just one reserve entry for
the whole block.  For one thing, that means it will never need to
resize / memmove() the dt blob, even if the set of things it needs to
reserve varies.
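
As a sketch of that scheme, under my own assumptions about what a boot
program's bookkeeping might look like: place each item at the next
suitably aligned address and emit a single (address, size) pair that
covers the lot.  All the names here are hypothetical, and alignments
are assumed to be powers of two.

```c
#include <stddef.h>
#include <stdint.h>

/* One thing the boot program needs to keep in RAM (hypothetical). */
struct blob_item {
	uint64_t size;	/* bytes required */
	uint64_t align;	/* required alignment, a power of two */
	uint64_t addr;	/* filled in by layout_sequential() */
};

/* A single reserve map entry. */
struct reserve_entry {
	uint64_t address;
	uint64_t size;
};

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Place the n items back to back starting at base, honouring each
 * item's alignment, and return the one entry spanning all of them,
 * alignment padding included.
 */
static struct reserve_entry layout_sequential(uint64_t base,
					      struct blob_item *items,
					      size_t n)
{
	uint64_t cur = base;

	for (size_t i = 0; i < n; i++) {
		cur = align_up(cur, items[i].align);
		items[i].addr = cur;
		cur += items[i].size;
	}
	return (struct reserve_entry){ base, cur - base };
}
```

However many items there are, the reserve map holds exactly one entry
plus the terminator, so the blob's layout stays fixed and only the
size field of that entry changes.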

Remember that, all else being equal, client program complexity is
preferable to boot program complexity.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson