[PATCH V4] powerpc/prom: Export device tree physical address via proc

Grant Likely grant.likely at secretlab.ca
Fri Jul 16 04:37:33 EST 2010

On Thu, Jul 15, 2010 at 12:03 PM, Matthew McClintock <msm at freescale.com> wrote:
> On Thu, 2010-07-15 at 10:57 -0600, Grant Likely wrote:
>> On Thu, Jul 15, 2010 at 10:39 AM, Matthew McClintock <msm at freescale.com> wrote:
>> > On Thu, 2010-07-15 at 10:22 -0600, Grant Likely wrote:
>> >> > Thanks for taking a look. My first thought was to just blow away all the
>> >> > memreserve regions and start over. But, there are reserve regions for
>> >> > other things that I might not want to blow away. For example, on mpc85xx
>> >> > SMP systems we have an additional reserve region for our boot page.
>> >>
>> >> What is your starting point?  Where does the device tree (and
>> >> memreserve list) that you're passing to kexec come from?  My first
>> >> impression is that if you have to scrub the memreserve list, then the
>> >> source being used to obtain the memreserves is either faulty or
>> >> unsuitable to the task.
>> >
>> > I'm pulling the device tree passed in via u-boot and passing it to
>> > kexec.
>> How?  (what mechanism?)  I hope you're not using the debugfs
>> flat-device-tree file.
> That is one way to get a good working copy. What is wrong with this
> mechanism?

It's unstable.  It is in debugfs, so there are no guarantees that
the ABI will remain the same.  Plus it doesn't reflect any changes
that the kernel may make to the device tree.  That interface is *debug
only*.  Do not use it.

> Should we duplicate everything u-boot does in kexec to build up a flat
> device tree? Or is there another way to get a good tree?

That is one option.  U-Boot really shouldn't be modifying the tree
very much anyway (I know on some platforms U-Boot is almost creating a
tree from scratch, but that is insane and an entirely different
discussion).  /proc/device-tree always gives the kernel's current view
of the tree.  You can use dtc to extract it and write it into a dtb.
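A minimal sketch of pulling a single value out of /proc/device-tree from C, for tools that only need one property rather than a full dtb: each property appears as a file holding the raw, big-endian property bytes, so a one- or two-cell value can be read and decoded directly. The helper names `dt_decode_be()` and `dt_read_u64()` are hypothetical, and the sketch assumes the property is exactly one or two 32-bit cells long.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decode a big-endian device-tree property value of 4 or 8 bytes
 * (one or two 32-bit cells) into a host integer.
 * Returns 0 on success, -1 if the length is not 4 or 8. */
int dt_decode_be(const unsigned char *buf, size_t len, uint64_t *out)
{
    uint64_t v = 0;

    if (len != 4 && len != 8)
        return -1;
    for (size_t i = 0; i < len; i++)
        v = (v << 8) | buf[i];      /* device-tree data is big-endian */
    *out = v;
    return 0;
}

/* Read a property file under /proc/device-tree and decode it.
 * Returns 0 on success, -1 if the file cannot be opened or the
 * property is not one or two cells long. */
int dt_read_u64(const char *path, uint64_t *out)
{
    unsigned char buf[8];
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    int extra = fgetc(f) != EOF;    /* longer than two cells? */
    fclose(f);
    if (extra)
        return -1;
    return dt_decode_be(buf, n, out);
}
```

For instance, `dt_read_u64("/proc/device-tree/chosen/linux,initrd-start", &start)` would yield the initrd start address on a system where the kernel exposes that property under /chosen; whether it is present depends on the platform and on how the kernel was booted.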

> Ideally, we don't make the end user manually edit a device tree.

Of course not, any device tree manipulation is the job of the kexec
tools.  None of this should be manual.  However, the data source is a
significant and important question.

>> > It is the most complete device tree and requires the least amount
>> > of fixup.
>> >
>> > I have to scrub two items: the ramdisk/initrd and the device tree,
>> > because upon kexec'ing the kernel we have the ability to pass in a new
>> > ramdisk/initrd and device tree. They can also live at different physical
>> > addresses for the second reboot.
>> This sounds like the model is backwards.  Rather than scrubbing items,
>> the memreserve list should be built up from a known good source.
> You can build one up yourself and it will still work out fine. Or you
> can pull one from debugfs to get yourself started. Or you can pull it
> every time.

What do you mean by "pull it every time"?

Out of curiosity, what is responsible for building up the memreserve
list?  The userspace portion, or the kernel portion of kexec?  Or is
it done by a totally separate program?

>> > The initrd addresses are already exposed, so we can update/remove/reuse
>> > that entry; we just need a way for kexec to determine the current device
>> > tree address so it can replace the correct memreserve region for the
>> > kexec'd kernel's device tree.
>> >
>> > The whole problem comes from repeatedly kexec'ing: we need to make sure
>> > we don't keep losing blobs of memory to reserve regions (so we can't
>> > just blindly add). We also need to make sure we don't lose other
>> > memreserve regions that might be important for other things (so we can't
>> > just blow them all away).
>> Right, so you need to have a known-good list of reserve sections.
>> Trying to go the other way sounds very fragile.
> Yes. Where would we get a list of memreserve sections?

I would say the list of reserves that are not under the control of
Linux should be explicitly described in the device tree proper.  For
instance, if you have a region that firmware depends on, then have a
node for describing the firmware and a property stating the memory
regions that it depends on.  The memreserve regions can be generated
from that.
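A minimal sketch of "the memreserve regions can be generated from that": given the raw bytes of a property on a firmware node that lists the regions firmware depends on, produce one reserve entry per region. The property itself, its layout (assumed here to be `<address size>` pairs with `#address-cells = <2>` and `#size-cells = <2>`), and the names `struct mem_region` and `regions_from_prop()` are all hypothetical illustrations, not an existing binding.

```c
#include <stddef.h>
#include <stdint.h>

/* One /memreserve/ entry: physical start address and length. */
struct mem_region {
    uint64_t addr;
    uint64_t size;
};

/* Read one 64-bit big-endian value (two 32-bit cells). */
static uint64_t be64_at(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Convert a hypothetical firmware-node property laid out as
 * <addr-hi addr-lo size-hi size-lo> tuples into reserve entries.
 * Returns the number of entries written, or -1 if the length is not a
 * whole number of tuples or there are more tuples than max. */
int regions_from_prop(const unsigned char *prop, size_t len,
                      struct mem_region *out, size_t max)
{
    const size_t tuple = 16;    /* 2 address cells + 2 size cells, 4 bytes each */

    if (len % tuple != 0 || len / tuple > max)
        return -1;

    size_t n = len / tuple;
    for (size_t i = 0; i < n; i++) {
        out[i].addr = be64_at(prop + i * tuple);
        out[i].size = be64_at(prop + i * tuple + 8);
    }
    return (int)n;
}
```

Because the regions come from a node that says what they are for, a tool regenerating the reserve map never has to guess which existing entries matter. With libfdt, each resulting entry could then be written into the new blob with `fdt_add_mem_rsv(fdt, r.addr, r.size)`.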

> Should we export the reserve sections instead of the device tree location?

It shouldn't really be something that the kernel is explicitly
exporting because it is a characteristic of the board design.  It is
something that belongs in the tree proper, i.e., when you extract the
tree, you have data describing what the region is and why it is reserved.

> We just need a way to preserve what was there at boot to pass to the new kernel.

Yet there is no differentiation between the board-dictated memory
reserves and the things that U-Boot/Linux made an arbitrary decision
on.  The solution should focus not on "can I throw this one away?" but
rather "Is this one I should keep?"  :-)  A subtle difference, I know,
but it changes the way you approach the solution.
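A minimal sketch of the "Is this one I should keep?" approach: rather than deleting entries believed to be stale, build the next kernel's reserve list by keeping only existing entries that appear on a known-good (board-dictated) list, then appending the new kernel's own initrd and device-tree regions. Run repeatedly, this cannot accumulate leftover reserves from earlier kexecs. The function names, the three input lists, and matching by exact address and size are simplifying assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mem_region {
    uint64_t addr;
    uint64_t size;
};

/* True if r exactly matches one of the known-good regions. */
static bool is_known_good(struct mem_region r,
                          const struct mem_region *good, size_t ngood)
{
    for (size_t i = 0; i < ngood; i++)
        if (good[i].addr == r.addr && good[i].size == r.size)
            return true;
    return false;
}

/* Build the reserve list for the next kernel.
 *   cur   - reserves currently in effect
 *   good  - board-dictated regions that must survive
 *   fresh - blobs (initrd, dtb) placed for the next kernel
 * Existing entries not on the keep list are dropped; fresh entries are
 * appended. Returns the number of entries written, or -1 if out is too
 * small. */
int build_reserves(const struct mem_region *cur, size_t ncur,
                   const struct mem_region *good, size_t ngood,
                   const struct mem_region *fresh, size_t nfresh,
                   struct mem_region *out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < ncur; i++) {
        if (!is_known_good(cur[i], good, ngood))
            continue;               /* not on the keep list: drop it */
        if (n == max)
            return -1;
        out[n++] = cur[i];
    }
    for (size_t i = 0; i < nfresh; i++) {
        if (n == max)
            return -1;
        out[n++] = fresh[i];
    }
    return (int)n;
}
```

Feeding the output back in as `cur` for a second kexec with the same `good` and `fresh` lists yields the same number of entries, which is the property the thread is after: reserves neither leak nor disappear.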


Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.

More information about the Linuxppc-dev mailing list