phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)

Michael Bringmann mwb at linux.vnet.ibm.com
Thu Aug 2 03:26:08 AEST 2018


On 08/01/2018 09:26 AM, Michael Ellerman wrote:
> Frank Rowand <frowand.list at gmail.com> writes:
>> On 07/31/18 07:17, Rob Herring wrote:
>>> On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman <mpe at ellerman.id.au> wrote:
>>>>
>>>> Hi Rob/Frank,
>>>>
>>>> I think we might have a problem with the phandle_cache not interacting
>>>> well with of_detach_node():
>>>
>>> Probably needs a similar fix as this commit did for overlays:
>>>
>>> commit b9952b5218added5577e4a3443969bc20884cea9
>>> Author: Frank Rowand <frank.rowand at sony.com>
>>> Date:   Thu Jul 12 14:00:07 2018 -0700
>>>
>>>     of: overlay: update phandle cache on overlay apply and remove
>>>
>>>     A comment in the review of the patch adding the phandle cache said that
>>>     the cache would have to be updated when modules are applied and removed.
>>>     This patch implements the cache updates.
>>>
>>>     Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of
>>> of_find_node_by_phandle()")
>>>     Reported-by: Alan Tull <atull at kernel.org>
>>>     Suggested-by: Alan Tull <atull at kernel.org>
>>>     Signed-off-by: Frank Rowand <frank.rowand at sony.com>
>>>     Signed-off-by: Rob Herring <robh at kernel.org>
>>
>> Agreed.  Sorry about missing the of_detach_node() case.
>>
>>
>>> Really what we need here is an "invalidate phandle" function rather
>>> than free and re-allocate the whole damn cache.
>>
>> The big hammer approach was chosen to avoid the race conditions that
>> would otherwise occur.  OF does not have a locking strategy that
>> would be able to protect against the races.
>>
>> We could maybe implement a slightly smaller hammer by (1) disabling
>> the cache, (2) invalidate a phandle entry in the cache, (3) re-enable
>> the cache.  That is an off the cuff thought - I would have to look
>> a little bit more carefully to make sure it would work.
>>
>> But I don't see a need to add the complexity of the smaller hammer
>> or the bigger hammer of proper locking _unless_ we start seeing that
>> the cache is being freed and re-allocated frequently.  For overlays
>> I don't expect the high frequency because it happens on a per overlay
>> removal basis (not per node removal basis).  For of_detach_node() the
>> event _is_ on a per node removal basis.  Michael, do you expect node
>> removals to be a frequent event with low latency being important?  If
>> so, a rough guess on what the frequency would be?
> 
> No in practice we expect them to be fairly rare and in the thousands at
> most I think.
> 
> TBH you could just disable the phandle_cache on the first detach and
> that would be fine by me. I don't think we've ever noticed phandle
> lookups being much of a problem for us on our hardware.

I ran a couple of migrations with the calls to

of_free_phandle_cache() ... of_populate_phandle_cache()

bracketing the body of

arch/powerpc/platforms/pseries/mobility.c:post_mobility_fixup()

with a patch like,

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index e245a88..efc9442 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -22,6 +22,9 @@
 #include <asm/rtas.h>
 #include "pseries.h"

+extern int of_free_phandle_cache(void);
+extern void of_populate_phandle_cache(void);
+
 static struct kobject *mobility_kobj;

 struct update_props_workarea {
@@ -343,6 +346,8 @@ void post_mobility_fixup(void)
                rc = rtas_call(activate_fw_token, 0, 1, NULL);
        } while (rtas_busy_delay(rc));

+       of_free_phandle_cache();
+
        if (rc)
                printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc);

@@ -354,6 +359,8 @@ void post_mobility_fixup(void)
        /* Possibly switch to a new RFI flush type */
        pseries_setup_rfi_flush();

+       of_populate_phandle_cache();
+
        return;
 }

and did not observe the warnings/crashes that have been
plaguing me.  We will need to move their prototypes out
of drivers/of/of_private.h to include/linux/of.h, though.

> 
> cheers
> 

Michael
 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb at linux.vnet.ibm.com



More information about the Linuxppc-dev mailing list