[PATCH] powerpc/opal: Fix EBUSY bug in acquiring tokens

Michael Ellerman mpe at ellerman.id.au
Mon Nov 6 21:33:13 AEDT 2017


William Kennington <wak at google.com> writes:

>> On Nov 4, 2017, at 2:14 AM, Michael Ellerman <mpe at ellerman.id.au> wrote:
>> 
>> "William A. Kennington III" <wak at google.com> writes:
>> 
>>> The current code checks the completion map to look for the first token
>>> that is complete. In some cases, a completion can come in but the token
>>> can still be on lease to the caller processing the completion. If this
>>> completed but unreleased token is the first token found in the bitmap by
>>> another task trying to acquire a token, then the __test_and_set_bit
>>> call will fail since the token will still be on lease. The acquisition
>>> will then fail with an EBUSY.
>>> 
>>> This patch reorganizes the acquisition code to look at the
>>> opal_async_token_map for an unleased token. If the token has no lease it
>>> must have no outstanding completions so we should never see an EBUSY,
>>> unless we have leased out too many tokens. Since
>>> opal_async_get_token_interruptible is protected by a semaphore, we will
>>> practically never see EBUSY anymore.
>>> 
>>> Signed-off-by: William A. Kennington III <wak at google.com>
>>> ---
>>> arch/powerpc/platforms/powernv/opal-async.c | 6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>> 
>> I think this is superseded by Cyril's rework (which he's finally
>> posted):
>> 
>>  http://patchwork.ozlabs.org/patch/833630/
>> 
>> If not please let us know.
>
> Yeah, I think Cyril’s rework fixes this. I wasn’t sure how long it
> would take for master to receive his changes so I figured we could use
> something in the interim to fix the locking failures. If his changes
> will be mailed into the next merge window then we should have the
> issue fixed in master. I understand that rework probably won’t make it
> into stable kernels? If not then we should probably send this along to
> stable kernel maintainers.

OK. I didn't realise the bug was sufficiently bad to need a backport
to stable.

To make a backport easier I've merged this patch first, and then Cyril's
on top of it (which essentially deletes this patch).

I assume you've tested this patch at least somewhat? :)

cheers
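
For readers following the thread, the race described in the quoted commit message can be modelled in user space. The sketch below is not the code from opal-async.c: the struct, the helper names (get_token_old, get_token_new, complete_token, demo_race) and the 8-token limit are all hypothetical, and the kernel's spinlock and semaphore are left out. It only illustrates why searching the completion map can hit a token that is still on lease, while searching the lease map for a free bit cannot.

```c
#include <assert.h>
#include <errno.h>

#define MAX_TOKENS 8	/* hypothetical limit, chosen for illustration */

/* Simplified stand-in for the two bitmaps named in the commit message. */
struct token_state {
	unsigned int complete_map;	/* bit set: no completion outstanding */
	unsigned int token_map;		/* bit set: token is on lease */
};

/* Index of the lowest set bit, or MAX_TOKENS if none is set. */
static int first_set_bit(unsigned int map)
{
	for (int i = 0; i < MAX_TOKENS; i++)
		if (map & (1u << i))
			return i;
	return MAX_TOKENS;
}

/* Old behaviour as described: pick the first completed token, then
 * fail if that token turns out to be leased already. */
static int get_token_old(struct token_state *s)
{
	int token = first_set_bit(s->complete_map);

	if (token >= MAX_TOKENS)
		return -EBUSY;
	if (s->token_map & (1u << token))
		return -EBUSY;	/* completed but not yet released */
	s->token_map |= 1u << token;
	s->complete_map &= ~(1u << token);
	return token;
}

/* Patched behaviour as described: pick the first unleased token. */
static int get_token_new(struct token_state *s)
{
	int token = first_set_bit(~s->token_map);

	if (token >= MAX_TOKENS)
		return -EBUSY;	/* every token really is leased */
	s->token_map |= 1u << token;
	s->complete_map &= ~(1u << token);
	return token;
}

/* A completion arrives for a token that is still leased. */
static void complete_token(struct token_state *s, int token)
{
	assert(token >= 0 && token < MAX_TOKENS);
	s->complete_map |= 1u << token;
}

/* Task A takes a token, its completion arrives, and task B then
 * tries to acquire before A has released. Returns B's result. */
static int demo_race(int use_new)
{
	struct token_state s = { (1u << MAX_TOKENS) - 1, 0 };
	int a = use_new ? get_token_new(&s) : get_token_old(&s);

	complete_token(&s, a);
	return use_new ? get_token_new(&s) : get_token_old(&s);
}
```

In this model demo_race(0) returns -EBUSY, because task B lands on token 0 while task A still holds it, and demo_race(1) returns token 1.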

