[PATCH] erofs: support direct IO for ondemand mode
Gao Xiang
hsiangkao at linux.alibaba.com
Thu Jul 18 18:14:13 AEST 2024
On 2024/7/18 15:11, Hongbo Li wrote:
>
>
> On 2024/7/18 10:40, Gao Xiang wrote:
>> Hi Hongbo,
>>
>> I'd like to request Jingbo's review too.
>>
>> On 2024/7/18 09:05, Hongbo Li wrote:
>>> erofs over fscache cannot handle direct read I/O: when a file is
>>> opened with the O_DIRECT flag, -EINVAL is returned. Support direct
>>> I/O in erofs over fscache by bypassing the erofs page cache and
>>> reading the target data from fscache's file directly into the user
>>> buffer.
>>
>> Could you give more hints in the commit message on the target user
>> of fscache DIO?
>>
> To be honest, I haven't come across containers using direct I/O yet.
> I've just run fio and some other tests in direct mode inside
> containers, and they failed at open time. If a traditional container
> that uses direct I/O is migrated to the erofs over fscache solution
> (i.e. Nydus), it won't run properly, because the current on-demand
> mode of erofs does not support direct I/O. I see two approaches to
> solve this:
> 1. fall back from direct I/O to buffered I/O (simpler, but it
>    silently ignores the caller's request);
> 2. implement the direct I/O path, as this patch does.
I think benchmarking might be a clear use case; personally
I'm fine with adding direct I/O for unencoded I/Os like this.
>
> Thanks,
> Hongbo
>
>> For Android use cases, direct I/O support is mainly used for loop
>> device direct mode.
>>
>>>
>>> The alignment of the buffer memory, offset and size is now
>>> restricted by erofs itself, since `i_blocksize` alignment is
>>> sufficient for the underlying filesystems.
>>>
>>> Signed-off-by: Hongbo Li <lihongbo22 at huawei.com>
>>> ---
>>> fs/erofs/data.c | 3 ++
>>> fs/erofs/fscache.c | 95 +++++++++++++++++++++++++++++++++++++++++++---
>>> 2 files changed, 93 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
>>> index 8be60797ea2f..dbfafe358de4 100644
>>> --- a/fs/erofs/data.c
>>> +++ b/fs/erofs/data.c
>>> @@ -391,6 +391,9 @@ static ssize_t erofs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>>> iov_iter_alignment(to)) & blksize_mask)
>>> return -EINVAL;
>>> + if (erofs_is_fscache_mode(inode->i_sb))
>>> + return generic_file_read_iter(iocb, to);
>>> +
>>> return iomap_dio_rw(iocb, to, &erofs_iomap_ops,
>>> NULL, 0, NULL, 0);
>>> }
>>> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
>>> index fda16eedafb5..f5a09b168539 100644
>>> --- a/fs/erofs/fscache.c
>>> +++ b/fs/erofs/fscache.c
>>> @@ -35,6 +35,8 @@ struct erofs_fscache_io {
>>> struct erofs_fscache_rq {
>>> struct address_space *mapping; /* The mapping being accessed */
>>> + struct iov_iter *iter; /* dst buf for direct io */
>>> + struct completion done; /* for synced direct io */
>>> loff_t start; /* Start position */
>>> size_t len; /* Length of the request */
>>> size_t submitted; /* Length of submitted */
>>> @@ -76,7 +78,11 @@ static void erofs_fscache_req_put(struct erofs_fscache_rq *req)
>>> {
>>> if (!refcount_dec_and_test(&req->ref))
>>> return;
>>> - erofs_fscache_req_complete(req);
>>> +
>>> + if (req->iter)
>>> + complete(&req->done);
>>> + else
>>> + erofs_fscache_req_complete(req);
>>> kfree(req);
>>> }
>>> @@ -88,6 +94,7 @@ static struct erofs_fscache_rq *erofs_fscache_req_alloc(struct address_space *ma
>>> if (!req)
>>> return NULL;
>>> req->mapping = mapping;
>>> + req->iter = NULL;
>>> req->start = start;
>>> req->len = len;
>>> refcount_set(&req->ref, 1);
>>> @@ -253,6 +260,55 @@ static int erofs_fscache_meta_read_folio(struct file *data, struct folio *folio)
>>> return ret;
>>> }
>>> +static int erofs_fscache_data_dio_read(struct erofs_fscache_rq *req)
Is it possible to merge this helper into erofs_fscache_data_read_slice()?
Also, it seems that it doesn't handle tail-packing inline files
(with EROFS_MAP_META set); Nydus itself doesn't generate such
files, but I will add a new fscache backend to erofs-utils later too.
Thanks,
Gao Xiang