OF_DYNAMIC node lifecycle

Grant Likely grant.likely at linaro.org
Thu Jun 26 06:24:46 EST 2014


On Tue, 24 Jun 2014 15:10:55 -0500, Nathan Fontenot <nfont at austin.ibm.com> wrote:
> On 06/23/2014 09:48 AM, Grant Likely wrote:
> > On Thu, 19 Jun 2014 10:26:15 -0500, Nathan Fontenot <nfont at austin.ibm.com> wrote:
> >> On 06/18/2014 03:07 PM, Grant Likely wrote:
> >>> Hi Nathan and Tyrel,
> >>>
> >>> I'm looking into lifecycle issues on nodes modified by OF_DYNAMIC, and
> >>> I'm hoping you can help me. Right now, pseries seems to be the only
> >>> user of OF_DYNAMIC, but making OF_DYNAMIC work has a huge impact on
> >>> the entire kernel because it requires all DT code to manage reference
> >>> counting with iterating over nodes. Most users simply get it wrong.
> >>> Pantelis did some investigation and found that the reference counts on
> >>> a running kernel are all over the place. I have my doubts that any
> >>> code really gets it right.
> >>>
> >>> The problem is that users need to know when it is appropriate to call
> >>> of_node_get()/of_node_put(). All list traversals that exit early need
> >>> an extra call to of_node_put(), and code that is searching for a node
> >>> in the tree and holding a reference to it needs to call of_node_get().
> >>>
> >>> I've got a few pseries questions:
> >>> - What are the changes being requested by pseries firmware? Is it only
> >>> CPUs and memory nodes, or does it manipulate things all over the tree?
> >>
> >> The short answer, everything.
> > 
> > :-)
> > 
> >> For pseries the two big actions that can change the device tree are
> >> adding/removing resources and partition migration.
> >>
> >> The most frequent updates to the device tree happen during resource
> >> (cpu, memory, and pci/phb) add and remove. During this process we add
> >> and remove the node and its properties from the device tree.
> >> - For memory on newer systems this just involves updating the
> >>   ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory property. Older
> >>   firmware levels add and remove the memroy at XXX nodes and their properties.
> >> - For cpus the cpus/PowerPC,POWERXXXX nodes and its properties are added
> >>   or removed
> >> - For pci/phb the pci at XXXXX nodes and properties are added/removed.
> >>
> >> The less frequent operation of live partition migration (and suspend/resume)
> >> can update just about anything in the device tree. When this occurs and the
> >> systems starts after being migrated (or waking up after a suspend) we make
> >> a call to firmware to get updates to the device tree for the new hardware
> >> we are running on.
> >>  
> >>> - How frequent are the changes? How many changes would be likely over
> >>> the runtime of the system?
> >>
> >> This can happen frequently.
> > 
> > Thanks, that is exactly the information that I want. I'm not so much
> > concerned with the addition or removal of nodes/properties, which is
> > actually pretty easy to handle. It is the lifecycle of allocations on
> > dynamic nodes that causes heartburn.
> > 
> >>> - Are you able to verify that removed nodes are actually able to be
> >>> freed correctly? Do you have any testcases for node removal?
> >>
> >> I have always tested this by doing resource add/remove, usually cpu and memory
> >> since it is the easiest.
> > 
> > Is that just testing the functionality, or do you have tests that check
> > if the memory gets freed?
> 
> In general it's just functionality testing.
> 
> > 
> >>> I'm thinking very seriously about changing the locking semantics of DT
> >>> code entirely so that most users never have to worry about
> >>> of_node_get/put at all. If the DT code is switched to use rcu
> >>> primitives for tree iteration (which also means making DT code use
> >>> list_head, something I'm already investigating), then instead of
> >>> trying to figure out of_node_get/put rules, callers could use
> >>> rcu_read_lock()/rcu_read_unlock() to protect the region that is
> >>> searching over nodes, and only call of_node_get() if the node pointer
> >>> is needed outside the rcu read-side lock.
> >>>
> >>
> >> This sounds good. I like just taking the rcu lock around accessing the DT.
> >> Do we have many places where DT node pointers are held that require
> >> keeping the of_node_get/put calls? If this did exist perhaps we could
> >> update those places to look up the DT node every time instead of
> >> holding on to the pointer. We could just get rid of the reference counting
> >> altogether then.
> > 
> > There are a few, but I would be happy to restrict reference counting to
> > only those locations. Most places will decode the DT data, and then
> > throw away the reference. We /might/ even be able to do rcu_lock/unlock
> > around the entire probe path which would make it transparent to all
> > device drivers.
> > 
> >>> I'd really like to be rid of the node reference counting entirely, but
> >>> I can't figure out a way of doing that safely, so I'd settle for
> >>> making it a lot easier to get correct.
> >>>
> >>
> >> heh! I have often thought about adding reference counting to device tree
> >> properties.
> > 
> > You horrible, horrible man.
> 
> Yes. I are evil :)
> 
> After looking again the work needed to add reference counts to properties
> would be huge. The few properties I am concerned with are specific to powerpc
> so perhaps just adding an arch specific lock around updating those
> properties would work.

Which code/properties? I'd like to have a look myself.

g.



More information about the Linuxppc-dev mailing list