[PATCH] of: introduce event tracepoints for dynamic device_node lifecyle

Frank Rowand frowand.list at gmail.com
Wed Apr 19 12:31:21 AEST 2017


On 04/18/17 18:31, Michael Ellerman wrote:
> Frank Rowand <frowand.list at gmail.com> writes:
> 
>> On 04/17/17 17:32, Tyrel Datwyler wrote:
>>> This patch introduces event tracepoints for tracking a device_nodes
>>> reference cycle as well as reconfig notifications generated in response
>>> to node/property manipulations.
>>>
>>> With the recent upstreaming of the refcount API several device_node
>>> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
>>> logical partitioning code (ie. POWER speak for hotplugging virtual and physcial
>>> resources at runtime such as cpus or IOAs). These tracepoints provide a
>>> easy and quick mechanism for validating the reference counting of
>>> device_nodes during their lifetime.
>>>
>>> Further, when pseries lpars are migrated to a different machine we
>>> perform a live update of our device tree to bring it into alignment with the
>>> configuration of the new machine. The of_reconfig_notify trace point
>>> provides a mechanism that can be turned for debuging the device tree
>>> modifications with out having to build a custom kernel to get at the
>>> DEBUG code introduced by commit 00aa3720.
>>
>> I do not like changing individual (or small groups of) printk() style
>> debugging information to tracepoint style.
> 
> I'm not quite sure which printks() you're referring to.
> 
> The only printks that are removed in this series are under #ifdef DEBUG,
> and so are essentially not there unless you build a custom kernel.

Yes, I am talking about pr_debug(), pr_info(), pr_err(), etc.


> 
> They also only cover the reconfig case, which is actually less
> interesting than the much more common and bug-prone get/put logic.

When I was looking at the get/put issue I used pr_debug().


>> As far as I know, there is no easy way to combine trace data and printk()
>> style data to create a single chronology of events.  If some of the
>> information needed to debug an issue is trace data and some is printk()
>> style data then it becomes more difficult to understand the overall
>> situation.
> 
> If you enable CONFIG_PRINTK_TIME then you should be able to just sort
> the trace and the printk output by the timestamp. If you're really
> trying to correlate the two then you should probably just be using
> trace_printk().

Except the existing debug code that uses pr_debug() does not use
trace_printk().

And "just sort" does not apply to multi-line output like:

cpuhp/23-147   [023] ....   128.324827:
	of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8 at 10
cpuhp/23-147   [023] ....   128.324829:
	of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8 at 10
cpuhp/23-147   [023] ....   128.324829:
	of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8 at 10
cpuhp/23-147   [023] ....   128.324831:
	of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8 at 10
   drmgr-7284  [009] ....   128.439000:
	of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8 at 10
   drmgr-7284  [009] ....   128.439002:
	of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8 at 10,
			    prop->name=null, old_prop->name=null
   drmgr-7284  [009] ....   128.439015:
	of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8 at 10
   drmgr-7284  [009] ....   128.439016:
	of_node_release: dn->full_name=/cpus/PowerPC,POWER8 at 10, dn->_flags=4

I was kinda hoping that maybe someone had already created a tool to deal
with this issue.  But not too optimistic.


> But IMO this level of detail, tracing every get/put, does not belong in
> printk. Trace points are absolutely the right solution for this type of
> debugging.
> 
> cheers
> .
> 



More information about the Linuxppc-dev mailing list