[PATCH] powerpc/pseries: Move vas_migration_handler early during migration
Nathan Lynch
nathanl at linux.ibm.com
Sat Sep 24 10:11:06 AEST 2022
Haren Myneni <haren at linux.ibm.com> writes:
> On Thu, 2022-09-22 at 07:14 -0500, Nathan Lynch wrote:
>> Haren Myneni <haren at linux.ibm.com> writes:
>> > When the migration is initiated, the hypervisor changes VAS
>> > mappings as part of the pre-migration event. Then the OS gets the
>> > migration event which closes all VAS windows before the migration
>> > starts. NX generates continuous faults until windows are closed
>> > and the user space cannot differentiate these NX faults coming
>> > from the actual migration. So to reduce this time window, close
>> > VAS windows first in pseries_migrate_partition().
>>
>> I'm concerned that this is only narrowing a window of time where
>> undesirable faults occur, and that it may not be sufficient for all
>> configurations. Migrations can be in progress for minutes or hours,
>> while the time that we wait for the VASI state transition is usually
>> seconds or minutes. So I worry that this works around a problem in
>> limited cases but doesn't cover them all.
>>
>> Maybe I don't understand the problem well enough. How does user space
>> respond to the NX faults?
>
> User space resends the request to NX whenever a request is
> returned with an NX fault. So the process should be the same even for
> faults caused by the pre-migration event.
>
> In contrast, the paste is returned with failure once the window is
> closed (the paste address is unmapped), and this can be treated as NX
> busy. It is up to user space whether to send the request again after
> some delay or to fall back to SW compression and send the request
> again later.
>
> For the migration, the pre-migration event is signaled to the
> hypervisor and then the OS receives the migration event (SUSPEND). So
> this patch closes the windows early, before waiting on VASI, which
> removes the NX fault handling during the time taken for the VASI state
> transition.

OK, so we can consider this a quality of implementation improvement that
allows better behavior and fewer wasted retries for NX clients in a
migration scenario, but there's not a correctness issue, really. With
that clarified, I've confirmed that the slightly altered control flow
and error handling in pseries_migrate_partition() look correct after
your change.
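
For reference, the user-space policy described above (resend on NX
fault, treat a failed paste as NX busy) could be sketched roughly as
below. This is only an illustration under stated assumptions: the
status codes, the submit hook, the scripted stub and the resend limit
are all hypothetical names invented for the sketch, not an existing
library's API, and the "retry after some delay" option is left out in
favour of an immediate software fallback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcomes of one submission attempt. */
enum nx_status {
	NX_OK,           /* request completed by NX */
	NX_FAULT,        /* request returned with an NX fault */
	NX_PASTE_FAILED, /* paste failed, e.g. window closed: NX busy */
};

/* What the caller should do once the policy has run. */
enum nx_result {
	NX_DONE_HW,      /* request was completed by NX */
	NX_FALLBACK_SW,  /* caller should use software compression */
};

/* Hypothetical hook: submit one request and report the outcome. */
typedef enum nx_status (*nx_submit_fn)(void *ctx);

/*
 * Resend on NX fault, up to max_resends times (the bound is an
 * assumption of this sketch). A failed paste is treated as NX busy
 * and falls back to software immediately.
 */
static enum nx_result nx_submit_with_retry(nx_submit_fn submit, void *ctx,
					    int max_resends)
{
	assert(submit != NULL && max_resends >= 0);

	for (int resends = 0; ; resends++) {
		switch (submit(ctx)) {
		case NX_OK:
			return NX_DONE_HW;
		case NX_FAULT:
			if (resends >= max_resends)
				return NX_FALLBACK_SW;
			break; /* leave the switch and resend */
		case NX_PASTE_FAILED:
		default:
			return NX_FALLBACK_SW;
		}
	}
}

/* Scripted stand-in for real hardware, used only to exercise the policy. */
struct script {
	const enum nx_status *steps;
	size_t len;
	size_t pos;
};

static enum nx_status scripted_submit(void *ctx)
{
	struct script *s = ctx;

	/* Past the end of the script, behave as if the window is closed. */
	if (s->pos >= s->len)
		return NX_PASTE_FAILED;
	return s->steps[s->pos++];
}
```

With the windows closed early, a client following a policy like this
would see the paste failure right away and stop resending, rather than
spinning on faults for the duration of the VASI state transition.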

Reviewed-by: Nathan Lynch <nathanl at linux.ibm.com>