[PATCH v2 1/3] powerpc/rtas: use device model APIs and serialization during LPM

Tyrel Datwyler tyreld at linux.ibm.com
Tue Aug 6 09:05:53 AEST 2019


On 8/2/19 12:29 PM, Nathan Lynch wrote:
> The LPAR migration implementation and userspace-initiated cpu hotplug
> can interleave their executions like so:
> 
> 1. Set cpu 7 offline via sysfs.
> 
> 2. Begin a partition migration, whose implementation requires the OS
>    to ensure all present cpus are online; cpu 7 is onlined:
> 
>      rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up
> 
>    This sets cpu 7 online in all respects except for the cpu's
>    corresponding struct device; dev->offline remains true.
> 
> 3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
>    already online and returns success. The driver core (device_online)
>    sets dev->offline = false.
> 
> 4. The migration completes and restores cpu 7 to offline state:
> 
>      rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down
> 
> This leaves cpu7 in a state where the driver core considers the cpu
> device online, but in all other respects it is offline and
> unused. Attempts to online the cpu via sysfs appear to succeed but the
> driver core actually does not pass the request to the lower-level
> cpuhp support code. This makes the cpu unusable until the cpu device
> is manually set offline and then online again via sysfs.
> 
> Instead of directly calling cpu_up/cpu_down, the migration code should
> use the higher-level device core APIs to maintain consistent state and
> serialize operations.
> 
> Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
> Signed-off-by: Nathan Lynch <nathanl at linux.ibm.com>
> ---
>  arch/powerpc/kernel/rtas.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 5faf0a64c92b..05824eb4323b 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -871,15 +871,17 @@ static int rtas_cpu_state_change_mask(enum rtas_cpu_state state,
>  		return 0;
> 
>  	for_each_cpu(cpu, cpus) {
> +		struct device *dev = get_cpu_device(cpu);
> +
>  		switch (state) {
>  		case DOWN:
> -			cpuret = cpu_down(cpu);
> +			cpuret = device_offline(dev);
>  			break;
>  		case UP:
> -			cpuret = cpu_up(cpu);
> +			cpuret = device_online(dev);

Not that I have a problem with using the core device api, but as an FYI we had
discussed in the past introducing one shot functions in kernel/cpu.c for doing
our take down, bring up where cpu_update_maps() can be held for the whole
process. The thought was maybe it would be useful generically being able to
online/offline a bulk subset.

>  			break;
>  		}
> -		if (cpuret) {
> +		if (cpuret < 0) {
>  			pr_debug("%s: cpu_%s for cpu#%d returned %d.\n",
>  					__func__,
>  					((state == UP) ? "up" : "down"),
> @@ -968,6 +970,8 @@ int rtas_ibm_suspend_me(u64 handle)
>  	data.token = rtas_token("ibm,suspend-me");
>  	data.complete = &done;
> 
> +	lock_device_hotplug();
> +

Does taking the device hotplug lock suffice to prevent races with sysfs attempts
to hotplug (on/off) cpus? And if so we can strip out the code that checks the
mask to see if we raced, correct?

-Tyrel

>  	/* All present CPUs must be online */
>  	cpumask_andnot(offline_mask, cpu_present_mask, cpu_online_mask);
>  	cpuret = rtas_online_cpus_mask(offline_mask);
> @@ -1006,6 +1010,7 @@ int rtas_ibm_suspend_me(u64 handle)
>  				__func__);
> 
>  out:
> +	unlock_device_hotplug();
>  	free_cpumask_var(offline_mask);
>  	return atomic_read(&data.error);
>  }
> 



More information about the Linuxppc-dev mailing list