[PATCH 04/16] erofs: introduce bufvec to store decompressed buffers

Gao Xiang hsiangkao at linux.alibaba.com
Fri Jul 15 16:36:05 AEST 2022


Hi Yue,

On Fri, Jul 15, 2022 at 02:29:30PM +0800, Yue Hu wrote:
> On Thu, 14 Jul 2022 21:20:39 +0800
> Gao Xiang <hsiangkao at linux.alibaba.com> wrote:
> 
> > For each pcluster, the total compressed buffers are determined in
> > advance, yet the number of decompressed buffers actually vary.  Too
> > many decompressed pages can be recorded if one pcluster is highly
> > compressed or its pcluster size is large.  That takes extra memory
> > footprints compared to uncompressed filesystems, especially a lot of
> > I/O in flight on low-ended devices.
> > 
> > Therefore, similar to inplace I/O, pagevec was introduced to reuse
> > page cache to store these pointers in the time-sharing way since
> > these pages are actually unused before decompressing.
> > 
> > In order to make it more flexable, a cleaner bufvec is used to
> > replace the old pagevec stuffs so that
> > 
> >  - Decompressed offsets can be stored inline, thus it can be used
> >    for the upcoming feature like compressed data deduplication;
> > 
> >  - Towards supporting large folios for compressed inodes since
> >    our final goal is to completely avoid page->private but use
> >    folio->private only for all page cache pages.
> > 
> > Signed-off-by: Gao Xiang <hsiangkao at linux.alibaba.com>
> > ---
> >  fs/erofs/zdata.c | 177 +++++++++++++++++++++++++++++++++++------------
> >  fs/erofs/zdata.h |  26 +++++--
> >  2 files changed, 153 insertions(+), 50 deletions(-)
> > 
> > diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> > index c183cd0bc42b..f52c54058f31 100644
> > --- a/fs/erofs/zdata.c
> > +++ b/fs/erofs/zdata.c
> > @@ -2,6 +2,7 @@
> >  /*
> >   * Copyright (C) 2018 HUAWEI, Inc.
> >   *             https://www.huawei.com/
> > + * Copyright (C) 2022 Alibaba Cloud
> >   */
> >  #include "zdata.h"
> >  #include "compress.h"
> > @@ -26,6 +27,82 @@ static struct z_erofs_pcluster_slab pcluster_pool[] __read_mostly = {
> >  	_PCLP(Z_EROFS_PCLUSTER_MAX_PAGES)
> >  };
> >  
> > +struct z_erofs_bvec_iter {
> > +	struct page *bvpage;
> > +	struct z_erofs_bvset *bvset;
> > +	unsigned int nr, cur;
> > +};
> > +
> > +static struct page *z_erofs_bvec_iter_end(struct z_erofs_bvec_iter *iter)
> > +{
> > +	if (iter->bvpage)
> > +		kunmap_local(iter->bvset);
> > +	return iter->bvpage;
> > +}
> > +
> > +static struct page *z_erofs_bvset_flip(struct z_erofs_bvec_iter *iter)
> > +{
> > +	unsigned long base = (unsigned long)((struct z_erofs_bvset *)0)->bvec;
> > +	/* have to access nextpage in advance, otherwise it will be unmapped */
> > +	struct page *nextpage = iter->bvset->nextpage;
> > +	struct page *oldpage;
> > +
> > +	DBG_BUGON(!nextpage);
> > +	oldpage = z_erofs_bvec_iter_end(iter);
> > +	iter->bvpage = nextpage;
> > +	iter->bvset = kmap_local_page(nextpage);
> > +	iter->nr = (PAGE_SIZE - base) / sizeof(struct z_erofs_bvec);
> > +	iter->cur = 0;
> > +	return oldpage;
> > +}
> > +
> > +static void z_erofs_bvec_iter_begin(struct z_erofs_bvec_iter *iter,
> > +				    struct z_erofs_bvset_inline *bvset,
> > +				    unsigned int bootstrap_nr,
> > +				    unsigned int cur)
> > +{
> > +	*iter = (struct z_erofs_bvec_iter) {
> > +		.nr = bootstrap_nr,
> > +		.bvset = (struct z_erofs_bvset *)bvset,
> > +	};
> > +
> > +	while (cur > iter->nr) {
> > +		cur -= iter->nr;
> > +		z_erofs_bvset_flip(iter);
> > +	}
> > +	iter->cur = cur;
> > +}
> > +
> > +static int z_erofs_bvec_enqueue(struct z_erofs_bvec_iter *iter,
> > +				struct z_erofs_bvec *bvec,
> > +				struct page **candidate_bvpage)
> > +{
> > +	if (iter->cur == iter->nr) {
> > +		if (!*candidate_bvpage)
> > +			return -EAGAIN;
> > +
> > +		DBG_BUGON(iter->bvset->nextpage);
> > +		iter->bvset->nextpage = *candidate_bvpage;
> > +		z_erofs_bvset_flip(iter);
> > +
> > +		iter->bvset->nextpage = NULL;
> > +		*candidate_bvpage = NULL;
> > +	}
> > +	iter->bvset->bvec[iter->cur++] = *bvec;
> > +	return 0;
> > +}
> > +
> > +static void z_erofs_bvec_dequeue(struct z_erofs_bvec_iter *iter,
> > +				 struct z_erofs_bvec *bvec,
> > +				 struct page **old_bvpage)
> > +{
> > +	if (iter->cur == iter->nr)
> > +		*old_bvpage = z_erofs_bvset_flip(iter);
> > +	else
> > +		*old_bvpage = NULL;
> > +	*bvec = iter->bvset->bvec[iter->cur++];
> > +}
> > +
> 
> Touch a new file to include bufvec related code? call it zbvec.c/h?

Thanks for the suggestion. The new implementation is simple enough,
so I tend to directly leave it in zdata.c instead of making more
churn to add more new files and zpvec.h will be completely removed
in the following patch.

Thanks,
Gao Xiang



More information about the Linux-erofs mailing list