[PATCH] erofs: support direct IO for ondemand mode

Gao Xiang hsiangkao at linux.alibaba.com
Thu Jul 18 18:14:13 AEST 2024



On 2024/7/18 15:11, Hongbo Li wrote:
> 
> 
> On 2024/7/18 10:40, Gao Xiang wrote:
>> Hi Hongbo,
>>
>> I'd like to request Jingbo's review too.
>>
>> On 2024/7/18 09:05, Hongbo Li wrote:
>>> erofs over fscache cannot handle direct read I/O. When a file is
>>> opened with the O_DIRECT flag, -EINVAL is returned. Support DIO in
>>> erofs over fscache by bypassing the erofs page cache and reading
>>> the target data into the user buffer directly from fscache's file.
>>
>> Could you give more hints in the commit message on the target user
>> of fscache DIO?
>>
> To be honest, I haven't come across containers using direct I/O yet.
> I've just run fio and some other tests in direct mode in containers,
> and they failed during open. If a traditional container that uses
> direct I/O is migrated to the erofs over fscache solution (i.e. Nydus),
> it won't run properly, because the current on-demand mode of erofs does
> not support direct I/O. I see two approaches to solve this:
> 1. let direct I/O fall back to buffered I/O (simpler, but it seems
>    unreasonable);
> 2. implement the direct I/O path as in this patch.

I think benchmarking might be a clear use case. Personally,
I'm fine with adding direct I/O for unencoded I/Os like this.
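
For illustration only (this is not part of the patch), the open-time
failure reported above can be checked from user space with a small
probe. `try_open_direct` is a hypothetical helper name; the sketch
assumes Linux with glibc, where O_DIRECT requires _GNU_SOURCE.

```c
#define _GNU_SOURCE     /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): try to open `path`
 * read-only with O_DIRECT.
 * Returns 0 on success, or the errno value reported by open(2).
 */
static int try_open_direct(const char *path)
{
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY | O_DIRECT);
	if (fd < 0)
		return errno;

	close(fd);
	return 0;
}
```

On a filesystem without O_DIRECT support, open(2) is documented to
fail with EINVAL, which matches the -EINVAL mentioned in the commit
message.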

> 
> Thanks,
> Hongbo
> 
>> For Android use cases, direct I/O support is mainly used for loop
>> device direct mode.
>>
>>>
>>> The alignment of the buffer memory, offset and size is now restricted
>>> by erofs, since `i_blocksize` is enough for the underlying filesystems.
>>>
>>> Signed-off-by: Hongbo Li <lihongbo22 at huawei.com>
>>> ---
>>>   fs/erofs/data.c    |  3 ++
>>>   fs/erofs/fscache.c | 95 +++++++++++++++++++++++++++++++++++++++++++---
>>>   2 files changed, 93 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
>>> index 8be60797ea2f..dbfafe358de4 100644
>>> --- a/fs/erofs/data.c
>>> +++ b/fs/erofs/data.c
>>> @@ -391,6 +391,9 @@ static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>>>                iov_iter_alignment(to)) & blksize_mask)
>>>               return -EINVAL;
>>> +        if (erofs_is_fscache_mode(inode->i_sb))
>>> +            return generic_file_read_iter(iocb, to);
>>> +
>>>           return iomap_dio_rw(iocb, to, &erofs_iomap_ops,
>>>                       NULL, 0, NULL, 0);
>>>       }
>>> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
>>> index fda16eedafb5..f5a09b168539 100644
>>> --- a/fs/erofs/fscache.c
>>> +++ b/fs/erofs/fscache.c
>>> @@ -35,6 +35,8 @@ struct erofs_fscache_io {
>>>   struct erofs_fscache_rq {
>>>       struct address_space    *mapping;    /* The mapping being accessed */
>>> +    struct iov_iter        *iter;        /* dst buf for direct io */
>>> +    struct completion    done;        /* for synced direct io */
>>>       loff_t            start;        /* Start position */
>>>       size_t            len;        /* Length of the request */
>>>       size_t            submitted;    /* Length of submitted */
>>> @@ -76,7 +78,11 @@ static void erofs_fscache_req_put(struct erofs_fscache_rq *req)
>>>   {
>>>       if (!refcount_dec_and_test(&req->ref))
>>>           return;
>>> -    erofs_fscache_req_complete(req);
>>> +
>>> +    if (req->iter)
>>> +        complete(&req->done);
>>> +    else
>>> +        erofs_fscache_req_complete(req);
>>>       kfree(req);
>>>   }
>>> @@ -88,6 +94,7 @@ static struct erofs_fscache_rq *erofs_fscache_req_alloc(struct address_space *ma
>>>       if (!req)
>>>           return NULL;
>>>       req->mapping = mapping;
>>> +    req->iter = NULL;
>>>       req->start = start;
>>>       req->len = len;
>>>       refcount_set(&req->ref, 1);
>>> @@ -253,6 +260,55 @@ static int erofs_fscache_meta_read_folio(struct file *data, struct folio *folio)
>>>       return ret;
>>>   }
>>> +static int erofs_fscache_data_dio_read(struct erofs_fscache_rq *req)

Is it possible to merge this helper into erofs_fscache_data_read_slice?
Also, it doesn't seem to handle tailpacking inline files (with
EROFS_MAP_META set). Nydus itself doesn't generate such files, but
I will add a new fscache backend in erofs-utils later too.

Thanks,
Gao Xiang
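
The alignment restriction described in the quoted commit message
(buffer memory, offset and size must all be block-aligned) can be
sketched in user space as below. This is only an illustration:
`dio_is_aligned` and its parameters are hypothetical names, and the
code mirrors the `& blksize_mask` test visible in the quoted
erofs_file_read_iter hunk rather than reproducing kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when the file offset, the request
 * length and the buffer address are all multiples of the block size
 * (1 << blkbits).
 */
static bool dio_is_aligned(uint64_t pos, size_t len, const void *buf,
			   unsigned int blkbits)
{
	uint64_t blksize_mask;

	assert(blkbits < 32);	/* keep the shift well defined */
	blksize_mask = (1ULL << blkbits) - 1;

	/* Any low bit set in any of the three values means misalignment. */
	return ((pos | (uint64_t)len | (uint64_t)(uintptr_t)buf) &
		blksize_mask) == 0;
}
```

A caller modelling the kernel check would reject the request with
-EINVAL whenever this returns false.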

