[PATCH] erofs: support direct IO for ondemand mode
Hongbo Li
lihongbo22 at huawei.com
Sat Jul 20 13:39:08 AEST 2024
On 2024/7/18 16:14, Gao Xiang wrote:
>
>
> On 2024/7/18 15:11, Hongbo Li wrote:
>>
>>
>> On 2024/7/18 10:40, Gao Xiang wrote:
>>> Hi Hongbo,
>>>
>>> I'd like to request Jingbo's review too.
>>>
>>> On 2024/7/18 09:05, Hongbo Li wrote:
>>>> erofs over fscache cannot handle direct read I/O: when a file is
>>>> opened with the O_DIRECT flag, -EINVAL is returned. Support direct
>>>> I/O in erofs over fscache by bypassing the erofs page cache and
>>>> reading the target data into the user buffer (ubuf) directly from
>>>> fscache's backing file.
>>>
>>> Could you give more hints in the commit message on the target user
>>> of fscache DIO?
>>>
>> To be honest, I haven't come across containers using direct I/O
>> yet. I've just run fio and some other tests in direct mode inside
>> containers, and they failed at open(). If a traditional container
>> that uses direct I/O is migrated to the erofs-over-fscache solution
>> (i.e. Nydus), it won't run properly, because the current on-demand
>> mode of erofs does not support direct I/O. I see two approaches to
>> solve this: 1. let direct I/O fall back to buffered I/O (simpler,
>> but it seems unreasonable); 2. implement the direct I/O path as in
>> this patch.
>
> I think benchmarking might be a clear use case; personally
> I'm fine with adding direct I/O for unencoded I/Os like this.
>
>>
>> Thanks,
>> Hongbo
>>
>>> For Android use cases, direct I/O support is mainly used for loop
>>> device direct mode.
>>>
>>>>
>>>> The alignment of the buffer memory, offset and size is now
>>>> enforced by erofs, since `i_blocksize` is sufficient for the
>>>> underlying filesystems.
>>>>
>>>> Signed-off-by: Hongbo Li <lihongbo22 at huawei.com>
>>>> ---
>>>>  fs/erofs/data.c    |  3 ++
>>>>  fs/erofs/fscache.c | 95 +++++++++++++++++++++++++++++++++++++++++++---
>>>>  2 files changed, 93 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
>>>> index 8be60797ea2f..dbfafe358de4 100644
>>>> --- a/fs/erofs/data.c
>>>> +++ b/fs/erofs/data.c
>>>> @@ -391,6 +391,9 @@ static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>>>>  			iov_iter_alignment(to)) & blksize_mask)
>>>>  		return -EINVAL;
>>>> +	if (erofs_is_fscache_mode(inode->i_sb))
>>>> +		return generic_file_read_iter(iocb, to);
>>>> +
>>>>  	return iomap_dio_rw(iocb, to, &erofs_iomap_ops,
>>>>  			    NULL, 0, NULL, 0);
>>>>  }
>>>> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
>>>> index fda16eedafb5..f5a09b168539 100644
>>>> --- a/fs/erofs/fscache.c
>>>> +++ b/fs/erofs/fscache.c
>>>> @@ -35,6 +35,8 @@ struct erofs_fscache_io {
>>>>  struct erofs_fscache_rq {
>>>>  	struct address_space	*mapping;	/* The mapping being accessed */
>>>> +	struct iov_iter		*iter;		/* dst buf for direct io */
>>>> +	struct completion	done;		/* for synced direct io */
>>>>  	loff_t			start;		/* Start position */
>>>>  	size_t			len;		/* Length of the request */
>>>>  	size_t			submitted;	/* Length of submitted */
>>>> @@ -76,7 +78,11 @@ static void erofs_fscache_req_put(struct erofs_fscache_rq *req)
>>>>  {
>>>>  	if (!refcount_dec_and_test(&req->ref))
>>>>  		return;
>>>> -	erofs_fscache_req_complete(req);
>>>> +
>>>> +	if (req->iter)
>>>> +		complete(&req->done);
>>>> +	else
>>>> +		erofs_fscache_req_complete(req);
>>>>  	kfree(req);
>>>>  }
>>>> @@ -88,6 +94,7 @@ static struct erofs_fscache_rq *erofs_fscache_req_alloc(struct address_space *ma
>>>>  	if (!req)
>>>>  		return NULL;
>>>>  	req->mapping = mapping;
>>>> +	req->iter = NULL;
>>>>  	req->start = start;
>>>>  	req->len = len;
>>>>  	refcount_set(&req->ref, 1);
>>>> @@ -253,6 +260,55 @@ static int erofs_fscache_meta_read_folio(struct file *data, struct folio *folio)
>>>>  	return ret;
>>>>  }
>>>> +static int erofs_fscache_data_dio_read(struct erofs_fscache_rq *req)
>
> Is it possible to merge this helper into erofs_fscache_data_read_slice?
Yeah, intuitively it should just be a matter of adding some branches
in erofs_fscache_data_read_slice to handle this case. We just need to
distinguish whether the destination buffer comes from a page or from
the iov_iter structure.
Thanks,
Hongbo
> Also, it seems this doesn't handle tail-packing inline files
> (with EROFS_MAP_META set). Nydus itself doesn't generate such
> files, but later I will add a new fscache backend to erofs-utils too.
>
> Thanks,
> Gao Xiang
More information about the Linux-erofs mailing list