[PATCH v4] powerpc/pseries/vas: Use usleep_range() to support HCALL delay
Aneesh Kumar K.V
aneesh.kumar at kernel.org
Mon Jan 1 17:16:17 AEDT 2024
Haren Myneni <haren at linux.ibm.com> writes:
> VAS allocate, modify and deallocate HCALLs returns
> H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
> delay and expects OS to reissue HCALL after that delay. But using
> msleep() will often sleep at least 20 msecs even though the
> hypervisor suggests OS reissue these HCALLs after 1 or 10msecs.
>
> The open and close VAS window functions hold mutex and then issue
> these HCALLs. So these operations can take longer than the
> necessary when multiple threads issue open or close window APIs
> simultaneously, especially might affect the performance in the
> case of repeat open/close APIs for each compression request.
> On the large machine configuration which allows more simultaneous
> open/close windows (Ex: 240 cores provides 4800 VAS credits), the
> user can observe hung task traces in dmesg due to mutex contention
> around open/close HCAlls.
>
> So instead of msleep(), use usleep_range() to ensure sleep with
> the expected value before issuing HCALL again.
>
> Signed-off-by: Haren Myneni <haren at linux.ibm.com>
> Suggested-by: Nathan Lynch <nathanl at linux.ibm.com>
>
> ---
> v1 -> v2:
> - Use usleep_range instead of using RTAS sleep routine as
> suggested by Nathan
> v2 -> v3:
> - Sleep 10MSecs even for HCALL delay > 10MSecs and the other
> commit / comemnt changes as suggested by Nathan and Ellerman.
> v4 -> v3:
> - More description in the commit log with the visible impact for
> the current code as suggested by Aneesh
> ---
> arch/powerpc/platforms/pseries/vas.c | 25 ++++++++++++++++++++++++-
> 1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
> index 71d52a670d95..5cf81c564d4b 100644
> --- a/arch/powerpc/platforms/pseries/vas.c
> +++ b/arch/powerpc/platforms/pseries/vas.c
> @@ -38,7 +38,30 @@ static long hcall_return_busy_check(long rc)
> {
> /* Check if we are stalled for some time */
> if (H_IS_LONG_BUSY(rc)) {
> - msleep(get_longbusy_msecs(rc));
> + unsigned int ms;
> + /*
> + * Allocate, Modify and Deallocate HCALLs returns
> + * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
> + * for the long delay. So the sleep time should always
> + * be either 1 or 10msecs, but in case if the HCALL
> + * returns the long delay > 10 msecs, clamp the sleep
> + * time to 10msecs.
> + */
> + ms = clamp(get_longbusy_msecs(rc), 1, 10);
> +
> + /*
> + * msleep() will often sleep at least 20 msecs even
> + * though the hypervisor suggests that the OS reissue
> + * HCALLs after 1 or 10msecs. Also the delay hint from
> + * the HCALL is just a suggestion. So OK to pause for
> + * less time than the hinted delay. Use usleep_range()
> + * to ensure we don't sleep much longer than actually
> + * needed.
> + *
> + * See Documentation/timers/timers-howto.rst for
> + * explanation of the range used here.
> + */
> + usleep_range(ms * 100, ms * 1000);
>
Is there more details on this range? (ms *100, ms * 1000)
can we use USEC_PER_MSEC instead of 1000.
> rc = H_BUSY;
> } else if (rc == H_BUSY) {
> cond_resched();
It would be good to convert this to a helper and switch rtas_busy_delay
to use this new helper. One question though is w.r.t the clamp values.
Does that need to be specific to each hcall? Can we make it generic?
rtas_busy_delay() expliclity check for 20msec. Any reason to do that?
timers-howto.rst suggest > 10msec to use msleep.
if (ms <= 20)
usleep_range(ms * 100, ms * 1000);
else
msleep(ms);
More information about the Linuxppc-dev
mailing list