[PATCH 3/3] powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration

Gautham R Shenoy ego.lkml at gmail.com
Fri Jun 14 15:56:59 AEST 2019


On Wed, Jun 12, 2019 at 10:17 AM Nathan Lynch <nathanl at linux.ibm.com> wrote:
>
> It's common for the platform to replace the cache device nodes after a
> migration. Since the cacheinfo code is never informed about this, it
> never drops its references to the source system's cache nodes, causing
> it to wind up in an inconsistent state resulting in warnings and oopses
> as soon as CPU online/offline occurs after the migration, e.g.
>
> cache for /cpus/l3-cache@3113(Unified) refers to cache for /cpus/l2-cache@200d(Unified)
> WARNING: CPU: 15 PID: 86 at arch/powerpc/kernel/cacheinfo.c:176 release_cache+0x1bc/0x1d0
> [...]
> NIP [c00000000002d9bc] release_cache+0x1bc/0x1d0
> LR [c00000000002d9b8] release_cache+0x1b8/0x1d0
> Call Trace:
> [c0000001fc99fa70] [c00000000002d9b8] release_cache+0x1b8/0x1d0 (unreliable)
> [c0000001fc99fb10] [c00000000002ebf4] cacheinfo_cpu_offline+0x1c4/0x2c0
> [c0000001fc99fbe0] [c00000000002ae58] unregister_cpu_online+0x1b8/0x260
> [c0000001fc99fc40] [c000000000165a64] cpuhp_invoke_callback+0x114/0xf40
> [c0000001fc99fcd0] [c000000000167450] cpuhp_thread_fun+0x270/0x310
> [c0000001fc99fd40] [c0000000001a8bb8] smpboot_thread_fn+0x2c8/0x390
> [c0000001fc99fdb0] [c0000000001a1cd8] kthread+0x1b8/0x1c0
> [c0000001fc99fe20] [c00000000000c2d4] ret_from_kernel_thread+0x5c/0x68
>
> Using device tree notifiers won't work since we want to rebuild the
> hierarchy only after all the removals and additions have occurred and
> the device tree is in a consistent state. Call cacheinfo_teardown()
> before processing device tree updates, and rebuild the hierarchy
> afterward.
>
> Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
> Signed-off-by: Nathan Lynch <nathanl at linux.ibm.com>

Reviewed-by: Gautham R. Shenoy <ego at linux.vnet.ibm.com>

> ---
>  arch/powerpc/platforms/pseries/mobility.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index edc1ec408589..b8c8096907d4 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -23,6 +23,7 @@
>  #include <asm/machdep.h>
>  #include <asm/rtas.h>
>  #include "pseries.h"
> +#include "../../kernel/cacheinfo.h"
>
>  static struct kobject *mobility_kobj;
>
> @@ -345,11 +346,20 @@ void post_mobility_fixup(void)
>          */
>         cpus_read_lock();
>
> +       /*
> +        * It's common for the destination firmware to replace cache
> +        * nodes.  Release all of the cacheinfo hierarchy's references
> +        * before updating the device tree.
> +        */
> +       cacheinfo_teardown();
> +
>         rc = pseries_devicetree_update(MIGRATION_SCOPE);
>         if (rc)
>                 printk(KERN_ERR "Post-mobility device tree update "
>                         "failed: %d\n", rc);
>
> +       cacheinfo_rebuild();
> +
>         cpus_read_unlock();
>
>         /* Possibly switch to a new RFI flush type */
> --
> 2.20.1
>
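
For anyone following the ordering argument in the changelog, here is a
minimal user-space sketch of why the references have to be dropped before
the device tree update rather than from per-node notifiers. It is only a toy
model under stated assumptions: every toy_* name is hypothetical, the
"device tree" is a two-entry array, and none of this is the kernel's
cacheinfo implementation. It counts how many nodes are replaced while the
toy hierarchy still holds a reference to them.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_NODES 2

/* Hypothetical stand-in for a cache device node; not struct device_node. */
struct toy_node {
	int generation;	/* 1 = source system, 2 = destination system */
	int refcount;	/* references held by the toy cacheinfo hierarchy */
};

static struct toy_node toy_tree[TOY_NR_NODES];
static struct toy_node *toy_hierarchy[TOY_NR_NODES];

/* Take one reference on every node currently in the toy tree. */
static void toy_cacheinfo_rebuild(void)
{
	for (int i = 0; i < TOY_NR_NODES; i++) {
		toy_hierarchy[i] = &toy_tree[i];
		toy_tree[i].refcount++;
	}
}

/* Drop every reference the toy hierarchy holds. */
static void toy_cacheinfo_teardown(void)
{
	for (int i = 0; i < TOY_NR_NODES; i++) {
		if (!toy_hierarchy[i])
			continue;
		assert(toy_hierarchy[i]->refcount > 0);
		toy_hierarchy[i]->refcount--;
		toy_hierarchy[i] = NULL;
	}
}

/*
 * Replace every node, as the destination firmware might. Returns the
 * number of nodes that were still referenced when they were replaced.
 */
static int toy_devicetree_update(int new_generation)
{
	int stale = 0;

	for (int i = 0; i < TOY_NR_NODES; i++) {
		if (toy_tree[i].refcount > 0)
			stale++;
		toy_tree[i].generation = new_generation;
		toy_tree[i].refcount = 0;
	}
	return stale;
}

/* Simulate boot on the source system followed by one migration. */
int toy_post_mobility_fixup(int teardown_first)
{
	int stale;

	for (int i = 0; i < TOY_NR_NODES; i++) {
		toy_tree[i] = (struct toy_node){ .generation = 1, .refcount = 0 };
		toy_hierarchy[i] = NULL;
	}
	toy_cacheinfo_rebuild();

	if (teardown_first)
		toy_cacheinfo_teardown();
	stale = toy_devicetree_update(2);
	toy_cacheinfo_rebuild();

	return stale;
}
```

In this model, toy_post_mobility_fixup(1) replaces no referenced nodes,
while toy_post_mobility_fixup(0) replaces both nodes with references still
outstanding, which is the toy analogue of the stale state behind the
release_cache() warning quoted above.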


-- 
Thanks and Regards
gautham.

