[PATCH] powerpc/pseries/vas: Migration suspend waits for no in-progress open windows
Haren Myneni
haren at linux.ibm.com
Tue Oct 17 08:15:20 AEDT 2023
On 10/16/23 1:30 PM, Nathan Lynch wrote:
> Nathan Lynch <nathanl at linux.ibm.com> writes:
>> Haren Myneni <haren at linux.ibm.com> writes:
>>>> Haren Myneni <haren at linux.ibm.com> writes:
>>>>> The hypervisor returns migration failure if all VAS windows are not
>>>>> closed. During the pre-migration stage, vas_migration_handler() sets
>>>>> the migration_in_progress flag and closes all windows on the list.
>>>>> The allocate VAS window routine checks the migration flag, sets up
>>>>> the window, and then adds it to the list. So it is possible for the
>>>>> migration handler to miss a window that is still being set up.
>>>>>
>>>>> t1: Allocate and open VAS      t2: Migration event
>>>>>     window
>>>>>
>>>>> lock vas_pseries_mutex
>>>>> if migration_in_progress set
>>>>>   unlock vas_pseries_mutex
>>>>>   return
>>>>> open window HCALL
>>>>> unlock vas_pseries_mutex
>>>>> Modify window HCALL            lock vas_pseries_mutex
>>>>> setup window                   migration_in_progress=true
>>>>>                                Closes all windows from
>>>>>                                the list
>>>>>                                unlock vas_pseries_mutex
>>>>> lock vas_pseries_mutex         return
>>>>> if nr_closed_windows == 0
>>>>>   // No DLPAR CPU or migration
>>>>>   add to the list
>>>>>   unlock vas_pseries_mutex
>>>>>   return
>>>>> unlock vas_pseries_mutex
>>>>> Close VAS window
>>>>> // due to DLPAR CPU or migration
>>>>> return -EBUSY
>>>>
>>>> Could the path t1 takes simply hold the mutex for the duration of
>>>> its execution instead of dropping and reacquiring it in the middle?
>>>>
>>>> Here's the relevant code from vas_allocate_window():
>>>>
>>>> mutex_lock(&vas_pseries_mutex);
>>>> if (migration_in_progress)
>>>> rc = -EBUSY;
>>>> else
>>>> rc = allocate_setup_window(txwin, (u64 *)&domain[0],
>>>> cop_feat_caps->win_type);
>>>> mutex_unlock(&vas_pseries_mutex);
>>>> if (rc)
>>>> goto out;
>>>>
>>>> rc = h_modify_vas_window(txwin);
>>>> if (!rc)
>>>> rc = get_vas_user_win_ref(&txwin->vas_win.task_ref);
>>>> if (rc)
>>>> goto out_free;
>>>>
>>>> txwin->win_type = cop_feat_caps->win_type;
>>>> mutex_lock(&vas_pseries_mutex);
>>>> if (!caps->nr_close_wins) {
>>>> list_add(&txwin->win_list, &caps->list);
>>>> caps->nr_open_windows++;
>>>> mutex_unlock(&vas_pseries_mutex);
>>>> vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
>>>> return &txwin->vas_win;
>>>> }
>>>> mutex_unlock(&vas_pseries_mutex);
>>>>
>>>> Is there something about h_modify_vas_window() or get_vas_user_win_ref()
>>>> that requires temporarily dropping the lock?
>>>>
>>>
>>> Thanks Nathan for your comments.
>>>
>>> vas_pseries_mutex protects window ID and IRQ allocation between the
>>> alloc and free window HCALLs, as well as the window list. We generally
>>> try to avoid holding a mutex across HCALLs, but with these HCALLs the
>>> mutex is needed.
>>>
>>> We could place h_modify_vas_window() or get_vas_user_win_ref() within
>>> the mutex-held section, but it is not needed.
>>
>> Hmm. I contend that it would fix your bug in a simpler way that
>> eliminates the race instead of coping with it by adding more state and
>> complicating the locking model. With your change, readers of the
>> migration_in_progress flag check it under the mutex, but the writer
>> updates it outside of the mutex, which seems strange and unlikely to be
>> correct.
>
> Expanding on this, with your change, migration_in_progress becomes a
> boolean atomic_t flag accessed only with atomic_set() and
> atomic_read(). These are non-RMW operations. Documentation/atomic_t.txt
> says:
>
> Non-RMW ops:
>
> The non-RMW ops are (typically) regular LOADs and STOREs and are
> canonically implemented using READ_ONCE(), WRITE_ONCE(),
> smp_load_acquire() and smp_store_release() respectively. Therefore, if
> you find yourself only using the Non-RMW operations of atomic_t, you
> do not in fact need atomic_t at all and are doing it wrong.
>
> So making migration_in_progress an atomic_t does not confer any
> advantageous properties to it that it lacks as a plain boolean.
>
> Considering also (from the same document):
>
> - non-RMW operations are unordered;
>
> - RMW operations that have no return value are unordered;
>
> I am concerned that threads executing these segments of code will not
> always observe each others' effects in the intended order:
>
> // vas_allocate_window()
>
> mutex_lock(&vas_pseries_mutex);
> if (!caps->nr_close_wins && !atomic_read(&migration_in_progress)) {
> list_add(&txwin->win_list, &caps->list);
> caps->nr_open_windows++;
> atomic_dec(&caps->nr_open_wins_progress);
> mutex_unlock(&vas_pseries_mutex);
> vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
> return &txwin->vas_win;
> }
> mutex_unlock(&vas_pseries_mutex);
> ...
> atomic_dec(&caps->nr_open_wins_progress);
> wake_up(&open_win_progress_wq);
>
> // vas_migration_handler()
>
> atomic_set(&migration_in_progress, 1);
> ...
> mutex_lock(&vas_pseries_mutex);
> rc = reconfig_close_windows(vcaps, vcaps->nr_open_windows,
> true);
> mutex_unlock(&vas_pseries_mutex);
> /*
> * Windows are included in the list after successful
> * open. So wait for closing these in-progress open
> * windows in vas_allocate_window() which will be
> * done if the migration_in_progress is set.
> */
> rc = wait_event_interruptible(open_win_progress_wq,
> !atomic_read(&vcaps->nr_open_wins_progress));
>
> Maybe it's OK in practice for some reason? I'm finding it difficult to
> reason about :-)
>
Thanks for the review.
It should be OK in this case since the mutex is held before reading. But
as you pointed out, the migration_in_progress flag should be a plain
boolean. I will repost the patch with this change.
Thanks
Haren