[PATCH] powerpc/pseries/vas: Migration suspend waits for no in-progress open windows
Nathan Lynch
nathanl at linux.ibm.com
Tue Oct 17 07:30:15 AEDT 2023
Nathan Lynch <nathanl at linux.ibm.com> writes:
> Haren Myneni <haren at linux.ibm.com> writes:
>>> Haren Myneni <haren at linux.ibm.com> writes:
>>>> The hypervisor returns a migration failure if all VAS windows are
>>>> not closed. During the pre-migration stage, vas_migration_handler()
>>>> sets the migration_in_progress flag and closes all windows on the
>>>> list. The VAS window allocation routine checks the migration flag,
>>>> sets up the window and then adds it to the list. So there is a
>>>> possibility of the migration handler missing a window that is still
>>>> being set up.
>>>>
>>>> t1: Allocate and open VAS       t2: Migration event
>>>>     window
>>>>
>>>> lock vas_pseries_mutex
>>>> If migration_in_progress set
>>>>   unlock vas_pseries_mutex
>>>>   return
>>>> open window HCALL
>>>> unlock vas_pseries_mutex
>>>> Modify window HCALL             lock vas_pseries_mutex
>>>> setup window                    migration_in_progress=true
>>>>                                 Closes all windows from
>>>>                                 the list
>>>>                                 unlock vas_pseries_mutex
>>>> lock vas_pseries_mutex          return
>>>> if nr_closed_windows == 0
>>>>   // No DLPAR CPU or migration
>>>>   add to the list
>>>>   unlock vas_pseries_mutex
>>>>   return
>>>> unlock vas_pseries_mutex
>>>> Close VAS window
>>>> // due to DLPAR CPU or migration
>>>> return -EBUSY
>>>
>>> Could the path t1 takes simply hold the mutex for the duration of
>>> its execution instead of dropping and reacquiring it in the middle?
>>>
>>> Here's the relevant code from vas_allocate_window():
>>>
>>>         mutex_lock(&vas_pseries_mutex);
>>>         if (migration_in_progress)
>>>                 rc = -EBUSY;
>>>         else
>>>                 rc = allocate_setup_window(txwin, (u64 *)&domain[0],
>>>                                 cop_feat_caps->win_type);
>>>         mutex_unlock(&vas_pseries_mutex);
>>>         if (rc)
>>>                 goto out;
>>>
>>>         rc = h_modify_vas_window(txwin);
>>>         if (!rc)
>>>                 rc = get_vas_user_win_ref(&txwin->vas_win.task_ref);
>>>         if (rc)
>>>                 goto out_free;
>>>
>>>         txwin->win_type = cop_feat_caps->win_type;
>>>         mutex_lock(&vas_pseries_mutex);
>>>         if (!caps->nr_close_wins) {
>>>                 list_add(&txwin->win_list, &caps->list);
>>>                 caps->nr_open_windows++;
>>>                 mutex_unlock(&vas_pseries_mutex);
>>>                 vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
>>>                 return &txwin->vas_win;
>>>         }
>>>         mutex_unlock(&vas_pseries_mutex);
>>>
>>> Is there something about h_modify_vas_window() or get_vas_user_win_ref()
>>> that requires temporarily dropping the lock?
>>>
>>
>> Thanks Nathan for your comments.
>>
>> vas_pseries_mutex protects the window ID and IRQ allocations between
>> the alloc and free window HCALLs, and it also protects the window
>> list. We generally try not to hold a mutex across HCALLs, but we need
>> this mutex with these HCALLs.
>>
>> We could call h_modify_vas_window() or get_vas_user_win_ref() within
>> the mutex context, but it is not needed.
>
> Hmm. I contend that holding the mutex for the duration would fix your
> bug in a simpler way: it eliminates the race instead of coping with it
> by adding more state and complicating the locking model. With your
> change, readers of the migration_in_progress flag check it under the
> mutex, but the writer updates it outside of the mutex, which seems
> strange and unlikely to be correct.
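Concretely, I have in mind something like the below (untested sketch
only -- the labels and error handling are placeholders, and it assumes
nothing in h_modify_vas_window() or get_vas_user_win_ref() actually
requires dropping the lock):

        mutex_lock(&vas_pseries_mutex);
        if (migration_in_progress) {
                rc = -EBUSY;
                goto out_unlock;
        }

        rc = allocate_setup_window(txwin, (u64 *)&domain[0],
                        cop_feat_caps->win_type);
        if (rc)
                goto out_unlock;

        rc = h_modify_vas_window(txwin);
        if (!rc)
                rc = get_vas_user_win_ref(&txwin->vas_win.task_ref);
        if (rc)
                goto out_free_window; /* close the window, still under the mutex */

        txwin->win_type = cop_feat_caps->win_type;
        if (!caps->nr_close_wins) {
                list_add(&txwin->win_list, &caps->list);
                caps->nr_open_windows++;
                mutex_unlock(&vas_pseries_mutex);
                vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
                return &txwin->vas_win;
        }
        mutex_unlock(&vas_pseries_mutex);

It's the same code with the drop-and-reacquire in the middle removed;
the existing error paths would just need to release the mutex on their
way out.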
Expanding on the concern about the flag itself: with your change, migration_in_progress becomes a
boolean atomic_t flag accessed only with atomic_set() and
atomic_read(). These are non-RMW operations. Documentation/atomic_t.txt
says:
Non-RMW ops:
The non-RMW ops are (typically) regular LOADs and STOREs and are
canonically implemented using READ_ONCE(), WRITE_ONCE(),
smp_load_acquire() and smp_store_release() respectively. Therefore, if
you find yourself only using the Non-RMW operations of atomic_t, you
do not in fact need atomic_t at all and are doing it wrong.
So making migration_in_progress an atomic_t does not confer any
advantageous properties to it that it lacks as a plain boolean.
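Put differently, these calls boil down to plain loads and stores of the
counter (that's what the excerpt above is getting at), so as far as I
can tell the flag could just as well stay a plain bool manipulated
under the lock its readers already hold. Rough illustration only, not
a patch:

        /*
         * Roughly equivalent accesses, none of which imply any ordering:
         *
         *   atomic_set(&migration_in_progress, 1);  ~  WRITE_ONCE(flag, 1);
         *   atomic_read(&migration_in_progress);    ~  READ_ONCE(flag);
         *
         * versus a plain flag serialized by the existing mutex:
         */
        mutex_lock(&vas_pseries_mutex);
        migration_in_progress = true;   /* ordered w.r.t. readers by the mutex */
        mutex_unlock(&vas_pseries_mutex);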
Considering also (from the same document):
- non-RMW operations are unordered;
- RMW operations that have no return value are unordered;
I am concerned that threads executing these segments of code will not
always observe each other's effects in the intended order:
// vas_allocate_window()

        mutex_lock(&vas_pseries_mutex);
        if (!caps->nr_close_wins && !atomic_read(&migration_in_progress)) {
                list_add(&txwin->win_list, &caps->list);
                caps->nr_open_windows++;
                atomic_dec(&caps->nr_open_wins_progress);
                mutex_unlock(&vas_pseries_mutex);
                vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
                return &txwin->vas_win;
        }
        mutex_unlock(&vas_pseries_mutex);
        ...
        atomic_dec(&caps->nr_open_wins_progress);
        wake_up(&open_win_progress_wq);
// vas_migration_handler()

        atomic_set(&migration_in_progress, 1);
        ...
        mutex_lock(&vas_pseries_mutex);
        rc = reconfig_close_windows(vcaps, vcaps->nr_open_windows,
                                    true);
        mutex_unlock(&vas_pseries_mutex);
        /*
         * Windows are included in the list after successful
         * open. So wait for closing these in-progress open
         * windows in vas_allocate_window() which will be
         * done if the migration_in_progress is set.
         */
        rc = wait_event_interruptible(open_win_progress_wq,
                        !atomic_read(&vcaps->nr_open_wins_progress));
Maybe it's OK in practice for some reason? I'm finding it difficult to
reason about :-)
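If some of these accesses really do have to stay outside the mutex, I
would expect the ordering to be made explicit with the primitives the
same documentation points at -- e.g. something along these lines, which
is only a sketch of the idioms and not a claim that these are the right
(or sufficient) barriers for this code:

        /* release store: accesses before it are ordered before the store */
        atomic_set_release(&migration_in_progress, 1);

        /* acquire load: accesses after it are ordered after the load */
        if (atomic_read_acquire(&migration_in_progress))
                rc = -EBUSY;

        /* full barrier after the otherwise-unordered void RMW, before the wakeup */
        atomic_dec(&caps->nr_open_wins_progress);
        smp_mb__after_atomic();
        wake_up(&open_win_progress_wq);

Either way it seems considerably more subtle than a plain flag and list
manipulated only under vas_pseries_mutex.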