[PATCH AUTOSEL 4.19 416/671] powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration
Sasha Levin
sashal at kernel.org
Fri Jan 17 04:00:54 AEDT 2020
From: Nathan Lynch <nathanl at linux.ibm.com>
[ Upstream commit e610a466d16a086e321f0bd421e2fc75cff28605 ]
It's common for the platform to replace the cache device nodes after a
migration. Since the cacheinfo code is never informed about this, it
never drops its references to the source system's cache nodes, causing
it to wind up in an inconsistent state resulting in warnings and oopses
as soon as CPU online/offline occurs after the migration, e.g.
cache for /cpus/l3-cache at 3113(Unified) refers to cache for /cpus/l2-cache at 200d(Unified)
WARNING: CPU: 15 PID: 86 at arch/powerpc/kernel/cacheinfo.c:176 release_cache+0x1bc/0x1d0
[...]
NIP release_cache+0x1bc/0x1d0
LR release_cache+0x1b8/0x1d0
Call Trace:
release_cache+0x1b8/0x1d0 (unreliable)
cacheinfo_cpu_offline+0x1c4/0x2c0
unregister_cpu_online+0x1b8/0x260
cpuhp_invoke_callback+0x114/0xf40
cpuhp_thread_fun+0x270/0x310
smpboot_thread_fn+0x2c8/0x390
kthread+0x1b8/0x1c0
ret_from_kernel_thread+0x5c/0x68
Using device tree notifiers won't work since we want to rebuild the
hierarchy only after all the removals and additions have occurred and
the device tree is in a consistent state. Call cacheinfo_teardown()
before processing device tree updates, and rebuild the hierarchy
afterward.
Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch <nathanl at linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego at linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe at ellerman.id.au>
Signed-off-by: Sasha Levin <sashal at kernel.org>
---
arch/powerpc/platforms/pseries/mobility.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index e4ea71383383..70744b4fbd9e 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -24,6 +24,7 @@
#include <asm/machdep.h>
#include <asm/rtas.h>
#include "pseries.h"
+#include "../../kernel/cacheinfo.h"
static struct kobject *mobility_kobj;
@@ -360,11 +361,20 @@ void post_mobility_fixup(void)
*/
cpus_read_lock();
+ /*
+ * It's common for the destination firmware to replace cache
+ * nodes. Release all of the cacheinfo hierarchy's references
+ * before updating the device tree.
+ */
+ cacheinfo_teardown();
+
rc = pseries_devicetree_update(MIGRATION_SCOPE);
if (rc)
printk(KERN_ERR "Post-mobility device tree update "
"failed: %d\n", rc);
+ cacheinfo_rebuild();
+
cpus_read_unlock();
/* Possibly switch to a new RFI flush type */
--
2.20.1
More information about the Linuxppc-dev
mailing list