[PATCH] powerpc/pseries/vas: Migration suspend waits for no in-progress open windows

Nathan Lynch nathanl at linux.ibm.com
Tue Oct 17 07:30:15 AEDT 2023

Nathan Lynch <nathanl at linux.ibm.com> writes:
> Haren Myneni <haren at linux.ibm.com> writes:
>>> Haren Myneni <haren at linux.ibm.com> writes:
>>>> The hypervisor returns migration failure if all VAS windows are not
>>>> closed. During the pre-migration stage, vas_migration_handler() sets
>>>> the migration_in_progress flag and closes all windows from the list.
>>>> The VAS window allocation routine checks the migration flag, sets up
>>>> the window, and then adds it to the list. So there is a possibility
>>>> of the migration handler missing a window that is still in the
>>>> process of being set up.
>>>> t1: Allocate and open VAS	t2: Migration event
>>>>      window
>>>> lock vas_pseries_mutex
>>>> If migration_in_progress set
>>>>    unlock vas_pseries_mutex
>>>>    return
>>>> open window HCALL
>>>> unlock vas_pseries_mutex
>>>> Modify window HCALL		lock vas_pseries_mutex
>>>> setup window			migration_in_progress=true
>>>> 				Closes all windows from
>>>> 				the list
>>>> 				unlock vas_pseries_mutex
>>>> lock vas_pseries_mutex		return
>>>> if nr_closed_windows == 0
>>>>    // No DLPAR CPU or migration
>>>>    add to the list
>>>>    unlock vas_pseries_mutex
>>>>    return
>>>> unlock vas_pseries_mutex
>>>> Close VAS window
>>>> // due to DLPAR CPU or migration
>>>> return -EBUSY
>>> Could the path t1 takes simply hold the mutex for the duration of
>>> its execution instead of dropping and reacquiring it in the middle?
>>> Here's the relevant code from vas_allocate_window():
>>> 	mutex_lock(&vas_pseries_mutex);
>>> 	if (migration_in_progress)
>>> 		rc = -EBUSY;
>>> 	else
>>> 		rc = allocate_setup_window(txwin, (u64 *)&domain[0],
>>> 				   cop_feat_caps->win_type);
>>> 	mutex_unlock(&vas_pseries_mutex);
>>> 	if (rc)
>>> 		goto out;
>>> 	rc = h_modify_vas_window(txwin);
>>> 	if (!rc)
>>> 		rc = get_vas_user_win_ref(&txwin->vas_win.task_ref);
>>> 	if (rc)
>>> 		goto out_free;
>>> 	txwin->win_type = cop_feat_caps->win_type;
>>> 	mutex_lock(&vas_pseries_mutex);
>>> 	if (!caps->nr_close_wins) {
>>> 		list_add(&txwin->win_list, &caps->list);
>>> 		caps->nr_open_windows++;
>>> 		mutex_unlock(&vas_pseries_mutex);
>>> 		vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
>>> 		return &txwin->vas_win;
>>> 	}
>>> 	mutex_unlock(&vas_pseries_mutex);
>>> Is there something about h_modify_vas_window() or get_vas_user_win_ref()
>>> that requires temporarily dropping the lock?
>> Thanks Nathan for your comments.
>> vas_pseries_mutex protects window ID and IRQ allocation between the
>> alloc and free window HCALLs, and it protects the window list. We
>> generally try not to hold a mutex across HCALLs, but we need this
>> mutex with these HCALLs.
>> We could call h_modify_vas_window() or get_vas_user_win_ref() within
>> the mutex context, but it is not needed.
> Hmm. I contend that it would fix your bug in a simpler way that
> eliminates the race instead of coping with it by adding more state and
> complicating the locking model. With your change, readers of the
> migration_in_progress flag check it under the mutex, but the writer
> updates it outside of the mutex, which seems strange and unlikely to be
> correct.

Expanding on this, with your change, migration_in_progress becomes a
boolean atomic_t flag accessed only with atomic_set() and
atomic_read(). These are non-RMW operations. From
Documentation/atomic_t.txt:

  Non-RMW ops:

  The non-RMW ops are (typically) regular LOADs and STOREs and are
  canonically implemented using READ_ONCE(), WRITE_ONCE(),
  smp_load_acquire() and smp_store_release() respectively. Therefore, if
  you find yourself only using the Non-RMW operations of atomic_t, you
  do not in fact need atomic_t at all and are doing it wrong.

So making migration_in_progress an atomic_t does not confer any
advantageous properties to it that it lacks as a plain boolean.
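
To make that concrete: the structure I'm suggesting is just the excerpt
quoted above with the drop/reacquire removed, so that a plain bool
checked under the mutex is enough. A sketch only (untested; the error
labels would need to unlock, and I've elided their bodies):

	mutex_lock(&vas_pseries_mutex);
	if (migration_in_progress) {
		/* The flag is now both written and read under the mutex. */
		rc = -EBUSY;
		goto out_unlock;
	}
	rc = allocate_setup_window(txwin, (u64 *)&domain[0],
			   cop_feat_caps->win_type);
	if (rc)
		goto out_unlock;
	rc = h_modify_vas_window(txwin);
	if (!rc)
		rc = get_vas_user_win_ref(&txwin->vas_win.task_ref);
	if (rc)
		goto out_free;		/* must also unlock now */
	txwin->win_type = cop_feat_caps->win_type;
	/*
	 * No drop/reacquire: the migration handler cannot run between
	 * window setup and list insertion, so it can never miss an
	 * in-progress open, and the nr_close_wins re-check goes away.
	 */
	list_add(&txwin->win_list, &caps->list);
	caps->nr_open_windows++;
	mutex_unlock(&vas_pseries_mutex);
	vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
	return &txwin->vas_win;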

Considering also (from the same document):

 - non-RMW operations are unordered;

 - RMW operations that have no return value are unordered;

I am concerned that threads executing these segments of code will not
always observe each others' effects in the intended order:

// vas_allocate_window()

        if (!caps->nr_close_wins && !atomic_read(&migration_in_progress)) {
                list_add(&txwin->win_list, &caps->list);
                ...
                return &txwin->vas_win;

// vas_migration_handler()

                atomic_set(&migration_in_progress, 1);
                ...
                        rc = reconfig_close_windows(vcaps, vcaps->nr_open_windows,
                        ...
                        /*
                         * Windows are included in the list after successful
                         * open. So wait for closing these in-progress open
                         * windows in vas_allocate_window() which will be
                         * done if the migration_in_progress is set.
                         */
                        rc = wait_event_interruptible(open_win_progress_wq,

Maybe it's OK in practice for some reason? I'm finding it difficult to
reason about :-)
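
For what it's worth, even if the flag had to be read outside the mutex,
the store and load would at minimum need explicit ordering per
atomic_t.txt. Something like the following (again, only a sketch, and
note that it still would not close the race between the flag check and
the list insertion by itself):

	/* writer, vas_migration_handler() */
	smp_store_release(&migration_in_progress, true);

	/* reader, vas_allocate_window() */
	if (smp_load_acquire(&migration_in_progress))
		return ERR_PTR(-EBUSY);

That pairing orders the flag update against the surrounding accesses,
which plain atomic_set()/atomic_read() do not, and it is one more
reason to prefer simply holding vas_pseries_mutex for the duration.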
