[PATCH] erofs: support direct IO for ondemand mode

Hongbo Li lihongbo22 at huawei.com
Sat Jul 20 13:39:08 AEST 2024



On 2024/7/18 16:14, Gao Xiang wrote:
> 
> 
> On 2024/7/18 15:11, Hongbo Li wrote:
>>
>>
>> On 2024/7/18 10:40, Gao Xiang wrote:
>>> Hi Hongbo,
>>>
>>> I'd like to request Jingbo's review too.
>>>
>>> On 2024/7/18 09:05, Hongbo Li wrote:
>>>> erofs over fscache cannot handle direct read I/O. When a file is
>>>> opened with the O_DIRECT flag, -EINVAL is returned. We support DIO
>>>> in erofs over fscache by bypassing the erofs page cache and reading
>>>> the target data into the user buffer directly from fscache's file.
>>>
>>> Could you give more hints in the commit message on the target user
>>> of fscache DIO?
>>>
>> To be honest, I haven't come across such containers using direct I/O 
>> yet. I've just run fio and some other tests in direct mode in 
>> containers, and they failed at open time. If a traditional container 
>> that uses direct I/O is migrated to the erofs over fscache solution 
>> (i.e. Nydus), it won't run properly, because the current on-demand 
>> mode of erofs does not support direct I/O. Currently, I see two 
>> approaches to solve this: 1. fall back from direct I/O to buffered 
>> I/O (simpler, but seems unreasonable); 2. implement the direct I/O 
>> path as done in this patch.
> 
> I think benchmarking might be a clear use case; personally,
> I'm fine with adding direct I/O for unencoded I/Os like this.
> 
>>
>> Thanks,
>> Hongbo
>>
>>> For Android use cases, direct I/O support is mainly used for loop
>>> device direct mode.
>>>
>>>>
>>>> The alignment of the buffer memory, offset and size is now enforced
>>>> by erofs, since `i_blocksize` is sufficient for the underlying filesystems.
>>>>
>>>> Signed-off-by: Hongbo Li <lihongbo22 at huawei.com>
>>>> ---
>>>>   fs/erofs/data.c    |  3 ++
>>>>   fs/erofs/fscache.c | 95 +++++++++++++++++++++++++++++++++++++++++++---
>>>>   2 files changed, 93 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
>>>> index 8be60797ea2f..dbfafe358de4 100644
>>>> --- a/fs/erofs/data.c
>>>> +++ b/fs/erofs/data.c
>>>> @@ -391,6 +391,9 @@ static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>>>>                iov_iter_alignment(to)) & blksize_mask)
>>>>               return -EINVAL;
>>>> +        if (erofs_is_fscache_mode(inode->i_sb))
>>>> +            return generic_file_read_iter(iocb, to);
>>>> +
>>>>           return iomap_dio_rw(iocb, to, &erofs_iomap_ops,
>>>>                       NULL, 0, NULL, 0);
>>>>       }
>>>> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
>>>> index fda16eedafb5..f5a09b168539 100644
>>>> --- a/fs/erofs/fscache.c
>>>> +++ b/fs/erofs/fscache.c
>>>> @@ -35,6 +35,8 @@ struct erofs_fscache_io {
>>>>   struct erofs_fscache_rq {
>>>>       struct address_space    *mapping;    /* The mapping being accessed */
>>>> +    struct iov_iter        *iter;        /* dst buf for direct io */
>>>> +    struct completion    done;        /* for synced direct io */
>>>>       loff_t            start;        /* Start position */
>>>>       size_t            len;        /* Length of the request */
>>>>       size_t            submitted;    /* Length of submitted */
>>>> @@ -76,7 +78,11 @@ static void erofs_fscache_req_put(struct erofs_fscache_rq *req)
>>>>   {
>>>>       if (!refcount_dec_and_test(&req->ref))
>>>>           return;
>>>> -    erofs_fscache_req_complete(req);
>>>> +
>>>> +    if (req->iter)
>>>> +        complete(&req->done);
>>>> +    else
>>>> +        erofs_fscache_req_complete(req);
>>>>       kfree(req);
>>>>   }
>>>> @@ -88,6 +94,7 @@ static struct erofs_fscache_rq *erofs_fscache_req_alloc(struct address_space *ma
>>>>       if (!req)
>>>>           return NULL;
>>>>       req->mapping = mapping;
>>>> +    req->iter = NULL;
>>>>       req->start = start;
>>>>       req->len = len;
>>>>       refcount_set(&req->ref, 1);
>>>> @@ -253,6 +260,55 @@ static int erofs_fscache_meta_read_folio(struct file *data, struct folio *folio)
>>>>       return ret;
>>>>   }
>>>> +static int erofs_fscache_data_dio_read(struct erofs_fscache_rq *req)
> 
> Is it possible to merge this helper into erofs_fscache_data_read_slice?
Yeah, intuitively it just needs a few branches in 
erofs_fscache_data_read_slice to handle this case. We only need to 
distinguish whether the destination buffer comes from the page cache or 
from the iov_iter structure.

Thanks,
Hongbo

> Also, it seems it doesn't handle tail-packing inline files
> (with EROFS_MAP_META set); Nydus itself doesn't generate such
> files, but later I will add a new fscache backend to erofs-utils too.
> 
> Thanks,
> Gao Xiang

