[PATCH] powerpc/pseries/vas: Migration suspend waits for no in-progress open windows

Haren Myneni haren at linux.ibm.com
Tue Oct 17 08:15:20 AEDT 2023



On 10/16/23 1:30 PM, Nathan Lynch wrote:
> Nathan Lynch <nathanl at linux.ibm.com> writes:
>> Haren Myneni <haren at linux.ibm.com> writes:
>>>> Haren Myneni <haren at linux.ibm.com> writes:
>>>>> The hypervisor returns a migration failure if any VAS windows are
>>>>> still open. During the pre-migration stage, vas_migration_handler()
>>>>> sets the migration_in_progress flag and closes all windows from the
>>>>> list. The VAS window allocation routine checks the migration flag,
>>>>> sets up the window, and then adds it to the list. So it is possible
>>>>> for the migration handler to miss a window that is still in the
>>>>> process of being set up.
>>>>>
>>>>> t1: Allocate and open VAS	t2: Migration event
>>>>>       window
>>>>>
>>>>> lock vas_pseries_mutex
>>>>> If migration_in_progress set
>>>>>     unlock vas_pseries_mutex
>>>>>     return
>>>>> open window HCALL
>>>>> unlock vas_pseries_mutex
>>>>> Modify window HCALL		lock vas_pseries_mutex
>>>>> setup window			migration_in_progress=true
>>>>> 				Closes all windows from
>>>>> 				the list
>>>>> 				unlock vas_pseries_mutex
>>>>> lock vas_pseries_mutex		return
>>>>> if nr_closed_windows == 0
>>>>>     // No DLPAR CPU or migration
>>>>>     add to the list
>>>>>     unlock vas_pseries_mutex
>>>>>     return
>>>>> unlock vas_pseries_mutex
>>>>> Close VAS window
>>>>> // due to DLPAR CPU or migration
>>>>> return -EBUSY
>>>>
>>>> Could the path t1 takes simply hold the mutex for the duration of
>>>> its execution instead of dropping and reacquiring it in the middle?
>>>>
>>>> Here's the relevant code from vas_allocate_window():
>>>>
>>>> 	mutex_lock(&vas_pseries_mutex);
>>>> 	if (migration_in_progress)
>>>> 		rc = -EBUSY;
>>>> 	else
>>>> 		rc = allocate_setup_window(txwin, (u64 *)&domain[0],
>>>> 				   cop_feat_caps->win_type);
>>>> 	mutex_unlock(&vas_pseries_mutex);
>>>> 	if (rc)
>>>> 		goto out;
>>>>
>>>> 	rc = h_modify_vas_window(txwin);
>>>> 	if (!rc)
>>>> 		rc = get_vas_user_win_ref(&txwin->vas_win.task_ref);
>>>> 	if (rc)
>>>> 		goto out_free;
>>>>
>>>> 	txwin->win_type = cop_feat_caps->win_type;
>>>> 	mutex_lock(&vas_pseries_mutex);
>>>> 	if (!caps->nr_close_wins) {
>>>> 		list_add(&txwin->win_list, &caps->list);
>>>> 		caps->nr_open_windows++;
>>>> 		mutex_unlock(&vas_pseries_mutex);
>>>> 		vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
>>>> 		return &txwin->vas_win;
>>>> 	}
>>>> 	mutex_unlock(&vas_pseries_mutex);
>>>>
>>>> Is there something about h_modify_vas_window() or get_vas_user_win_ref()
>>>> that requires temporarily dropping the lock?
>>>>
>>>
>>> Thanks Nathan for your comments.
>>>
>>> vas_pseries_mutex protects window ID and IRQ allocation between the
>>> alloc and free window HCALLs, and it also protects the window list.
>>> Generally we try not to hold a mutex across HCALLs, but we need this
>>> mutex with these HCALLs.
>>>
>>> We could run h_modify_vas_window() or get_vas_user_win_ref() within
>>> the mutex context, but it is not needed.
>>
>> Hmm. I contend that it would fix your bug in a simpler way that
>> eliminates the race instead of coping with it by adding more state and
>> complicating the locking model. With your change, readers of the
>> migration_in_progress flag check it under the mutex, but the writer
>> updates it outside of the mutex, which seems strange and unlikely to be
>> correct.
> 
> Expanding on this, with your change, migration_in_progress becomes a
> boolean atomic_t flag accessed only with atomic_set() and
> atomic_read(). These are non-RMW operations. Documentation/atomic_t.txt
> says:
> 
>    Non-RMW ops:
> 
>    The non-RMW ops are (typically) regular LOADs and STOREs and are
>    canonically implemented using READ_ONCE(), WRITE_ONCE(),
>    smp_load_acquire() and smp_store_release() respectively. Therefore, if
>    you find yourself only using the Non-RMW operations of atomic_t, you
>    do not in fact need atomic_t at all and are doing it wrong.
> 
> So making migration_in_progress an atomic_t does not confer any
> advantageous properties to it that it lacks as a plain boolean.
> 
> Considering also (from the same document):
> 
>   - non-RMW operations are unordered;
> 
>   - RMW operations that have no return value are unordered;
> 
> I am concerned that threads executing these segments of code will not
> always observe each others' effects in the intended order:
> 
> // vas_allocate_window()
> 
>          mutex_lock(&vas_pseries_mutex);
>          if (!caps->nr_close_wins && !atomic_read(&migration_in_progress)) {
>                  list_add(&txwin->win_list, &caps->list);
>                  caps->nr_open_windows++;
>                  atomic_dec(&caps->nr_open_wins_progress);
>                  mutex_unlock(&vas_pseries_mutex);
>                  vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
>                  return &txwin->vas_win;
>          }
>          mutex_unlock(&vas_pseries_mutex);
>          ...
>          atomic_dec(&caps->nr_open_wins_progress);
>          wake_up(&open_win_progress_wq);
> 
> // vas_migration_handler()
> 
>                  atomic_set(&migration_in_progress, 1);
>                  ...
>                          mutex_lock(&vas_pseries_mutex);
>                          rc = reconfig_close_windows(vcaps, vcaps->nr_open_windows,
>                                                          true);
>                          mutex_unlock(&vas_pseries_mutex);
>                          /*
>                           * Windows are included in the list after successful
>                           * open. So wait for closing these in-progress open
>                           * windows in vas_allocate_window() which will be
>                           * done if the migration_in_progress is set.
>                           */
>                          rc = wait_event_interruptible(open_win_progress_wq,
>                                  !atomic_read(&vcaps->nr_open_wins_progress));
> 
> Maybe it's OK in practice for some reason? I'm finding it difficult to
> reason about :-)
> 

Thanks for the review.

It should be OK in this case since the mutex is held before reading the
flag. But as you pointed out, the migration_in_progress flag should be
just a plain boolean. I will repost the patch with this change.

Thanks
Haren

