[PATCH v2 4/5] cachefiles: cyclic allocation of msg_id to avoid reuse

Mon May 20 22:54:31 AEST 2024

On 2024/5/20 20:42, Baokun Li wrote:
> On 2024/5/20 18:04, Jeff Layton wrote:
>> On Mon, 2024-05-20 at 12:06 +0800, Baokun Li wrote:
>>> Hi Jeff,
>>>
>>> Thank you very much for your review!
>>>
>>> On 2024/5/19 19:11, Jeff Layton wrote:
>>>> On Wed, 2024-05-15 at 20:51 +0800, libaokun at huaweicloud.com wrote:
>>>>> From: Baokun Li <libaokun1 at huawei.com>
>>>>>
>>>>> Reusing the msg_id after a maliciously completed reopen request may cause
>>>>> a read request to remain unprocessed and result in a hung, as shown below:
>>>>>
>>>>>          t1       |      t2       |      t3
>>>>> -------------------------------------------------
>>>>> cachefiles_ondemand_select_req
>>>>>    cachefiles_ondemand_object_is_close(A)
>>>>>    cachefiles_ondemand_set_object_reopening(A)
>>>>>    queue_work(fscache_object_wq, &info->work)
>>>>>                   ondemand_object_worker
>>>>>                    cachefiles_ondemand_init_object(A)
>>>>>                     cachefiles_ondemand_send_req(OPEN)
>>>>>                       // get msg_id 6
>>>>>                       wait_for_completion(&req_A->done)
>>>>> cachefiles_ondemand_daemon_read
>>>>>    // read msg_id 6 req_A
>>>>>    cachefiles_ondemand_get_fd
>>>>>    copy_to_user
>>>>>                                   // Malicious completion msg_id 6
>>>>>                                   copen 6,-1
>>>>>                                   cachefiles_ondemand_copen
>>>>>                                    complete(&req_A->done)
>>>>>                                    // will not set the object to close
>>>>>                                    // because ondemand_id && fd is valid.
>>>>>
>>>>>                   // ondemand_object_worker() is done
>>>>>                   // but the object is still reopening.
>>>>>
>>>>>                                   // new open req_B
>>>>>                                   cachefiles_ondemand_init_object(B)
>>>>>                                    cachefiles_ondemand_send_req(OPEN)
>>>>>                                    // reuse msg_id 6
>>>>> process_open_req
>>>>>    copen 6,A.size
>>>>>    // The expected failed copen was executed successfully
>>>>>
>>>>> Expect copen to fail, and when it does, it closes fd, which sets the
>>>>> object to close, and then close triggers reopen again. However, due to
>>>>> msg_id reuse resulting in a successful copen, the anonymous fd is not
>>>>> closed until the daemon exits. Therefore read requests waiting for reopen
>>>>> to complete may trigger hung task.
>>>>>
>>>>> To avoid this issue, allocate the msg_id cyclically to avoid reusing the
>>>>> msg_id for a very short duration of time.
>>>>>
>>>>> Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
>>>>> Signed-off-by: Baokun Li <libaokun1 at huawei.com>
>>>>> ---
>>>>>    fs/cachefiles/internal.h |  1 +
>>>>>    fs/cachefiles/ondemand.c | 20 ++++++++++++++++----
>>>>>    2 files changed, 17 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
>>>>> index 8ecd296cc1c4..9200c00f3e98 100644
>>>>> --- a/fs/cachefiles/internal.h
>>>>> +++ b/fs/cachefiles/internal.h
>>>>> @@ -128,6 +128,7 @@ struct cachefiles_cache {
>>>>>        unsigned long            req_id_next;
>>>>>        struct xarray            ondemand_ids;    /* xarray for ondemand_id allocation */
>>>>>        u32                ondemand_id_next;
>>>>> +    u32                msg_id_next;
>>>>>    };
>>>>>    static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache)
>>>>> diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
>>>>> index f6440b3e7368..b10952f77472 100644
>>>>> --- a/fs/cachefiles/ondemand.c
>>>>> +++ b/fs/cachefiles/ondemand.c
>>>>> @@ -433,20 +433,32 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
>>>>>            smp_mb();
>>>>>            if (opcode == CACHEFILES_OP_CLOSE &&
>>>>> -            !cachefiles_ondemand_object_is_open(object)) {
>>>>> +            !cachefiles_ondemand_object_is_open(object)) {
>>>>>                WARN_ON_ONCE(object->ondemand->ondemand_id == 0);
>>>>>                xas_unlock(&xas);
>>>>>                ret = -EIO;
>>>>>                goto out;
>>>>>            }
>>>>> -        xas.xa_index = 0;
>>>>> +        /*
>>>>> +         * Cyclically find a free xas to avoid msg_id reuse that would
>>>>> +         * cause the daemon to successfully copen a stale msg_id.
>>>>> +         */
>>>>> +        xas.xa_index = cache->msg_id_next;
>>>>>            xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
>>>>> +        if (xas.xa_node == XAS_RESTART) {
>>>>> +            xas.xa_index = 0;
>>>>> +            xas_find_marked(&xas, cache->msg_id_next - 1, XA_FREE_MARK);
>>>>> +        }
>>>>>            if (xas.xa_node == XAS_RESTART)
>>>>>                xas_set_err(&xas, -EBUSY);
>>>>> +
>>>>>            xas_store(&xas, req);
>>>>> -        xas_clear_mark(&xas, XA_FREE_MARK);
>>>>> -        xas_set_mark(&xas, CACHEFILES_REQ_NEW);
>>>>> +        if (xas_valid(&xas)) {
>>>>> +            cache->msg_id_next = xas.xa_index + 1;
>>>> If you have a long-standing stuck request, could this counter wrap
>>>> around and you still end up with reuse?
>>> Yes, msg_id_next is declared to be of type u32 in the hope that when
>>> xa_index == UINT_MAX, a wrap around occurs so that msg_id_next
>>> goes to zero. Limiting xa_index to no more than UINT_MAX is to avoid
>>> the xarry being too deep.
>>>
>>> If msg_id_next is equal to the id of a long-standing stuck request
>>> after the wrap-around, it is true that the reuse in the above problem
>>> may also occur.
>>>
>>> But I feel that a long stuck request is problematic in itself, it means
>>> that after we have sent 4294967295 requests, the first one has not
>>> been processed yet, and even if we send a million requests per
>>> second, this one hasn't been completed for more than an hour.
>>>
>>> We have a keep-alive process that pulls the daemon back up as
>>> soon as it exits, and there is a timeout mechanism for requests in
>>> the daemon to prevent the kernel from waiting for long periods
>>> of time. In other words, we should avoid the situation where
>>> a request is stuck for a long period of time.
>>>
>>> If you think UINT_MAX is not enough, perhaps we could raise
>>> the maximum value of msg_id_next to ULONG_MAX?
>>>> Maybe this should be using
>>>> ida_alloc/free instead, which would prevent that too?
>>>>
>>> The id reuse here is that the kernel has finished the open request
>>> req_A and freed its id_A and used it again when sending the open
>>> request req_B, but the daemon is still working on req_A, so the
>>> copen id_A succeeds but operates on req_B.
>>>
>>> The id that is being used by the kernel will not be allocated here
>>> so it seems that ida _alloc/free does not prevent reuse either,
>>> could you elaborate a bit more how this works?
>>>
>> ida_alloc and free absolutely prevent reuse while the id is in use.
>> That's sort of the point of those functions. Basically it uses a set of
>> bitmaps in an xarray to track which IDs are in use, so ida_alloc only
>> hands out values which are not in use. See the comments over
>> ida_alloc_range() in lib/idr.c.
>>
> Thank you for the explanation!
> 
> The logic now provides the same guarantees as ida_alloc/free.
> The "reused" id, indeed, is no longer in use in the kernel, but it is still
> in use in the userland, so a multi-threaded daemon could be handling
> two different requests for the same msg_id at the same time.
> 
> Previously, the logic for allocating msg_ids was to start at 0 and look
> for a free xas.index, so it was possible for an id to be allocated to a
> new request just as the id was being freed.
> 
> With the change to cyclic allocation, the kernel will not use the same
> id again until INT_MAX requests have been sent, and during the time
> it takes to send requests, the daemon has enough time to process
> requests whose ids are still in use by the daemon, but have already
> been freed in the kernel.

Again, If I understand correctly, I think the main point
here is

wait_for_completion(&req_A->done)

which could hang due to some malicious deamon.  But I think it
should be switched to wait_for_completion_killable() instead.
It's up to users to kill the mount instance if there is a
malicious user daemon.

So in that case, hung task will not be triggered anymore, and
you don't need to care about cyclic allocation too.

Thanks,
Gao Xiang

> 
> Regards,
> Baokun
>>>>> +            xas_clear_mark(&xas, XA_FREE_MARK);
>>>>> +            xas_set_mark(&xas, CACHEFILES_REQ_NEW);
>>>>> +        }
>>>>>            xas_unlock(&xas);
>>>>>        } while (xas_nomem(&xas, GFP_KERNEL));
>>>>>