Feature request: erofs-utils mkfs: Efficient way to pipe only file metadata
Gao Xiang
hsiangkao at linux.alibaba.com
Mon Feb 19 15:44:12 AEDT 2024
Hi Mike,
On 2024/2/19 11:37, Mike Baynton wrote:
> Hello erofs developers,
> I am integrating erofs with overlayfs in a manner similar to what
> composefs is doing. So, I am interested in making erofs images
> containing only file metadata and extended attributes, but no file
> data, as in $ mkfs.erofs --tar=i (thanks for that!)
Thanks for your interest in EROFS too.
>
> However, I would like to construct the erofs image from a set of files
> selected dynamically by another program. This leads me to prefer
> sending an unseekable stream to mkfs.erofs so that file selection and
> image generation can run concurrently, instead of first making a
> complete tarball and then making the erofs image. In this case, it
> becomes necessary to transfer each file's worth of data through the
> stream after each header only so that the tarball reader in tar.c does
> not become desynchronized with the expected offset of the next tar
> header.
I wonder if it's possible to use a modified version of the
prototype-like [1] format that mkfs.xfs [2] currently supports with "-p".
Such a prototype could be passed through a pipe instead.
[1] http://uw714doc.sco.com/en/man/html.4/prototype.4.html
[2] https://man7.org/linux/man-pages/man8/mkfs.xfs.8.html
>
> A very straightforward solution that seems to be working just fine for
> me is to simply introduce a new optarg for --tar that indicates the
> input data will be simply a series of tar headers / metadata without
> actual file data. This implies index mode and additionally prevents
> the skipping of inode.size worth of bytes after each header:
>
> diff --git a/include/erofs/tar.h b/include/erofs/tar.h
> index a76f740..3d40a0f 100644
> --- a/include/erofs/tar.h
> +++ b/include/erofs/tar.h
> @@ -46,7 +46,7 @@ struct erofs_tarfile {
>
> int fd;
> u64 offset;
> - bool index_mode, aufs;
> + bool index_mode, headeronly_mode, aufs;
> };
>
> void erofs_iostream_close(struct erofs_iostream *ios);
> diff --git a/lib/tar.c b/lib/tar.c
> index 8204939..e916395 100644
> --- a/lib/tar.c
> +++ b/lib/tar.c
> @@ -584,7 +584,7 @@ static int tarerofs_write_file_index(struct erofs_inode *inode,
> ret = tarerofs_write_chunkes(inode, data_offset);
> if (ret)
> return ret;
> - if (erofs_iostream_lskip(&tar->ios, inode->i_size))
> + if (!tar->headeronly_mode && erofs_iostream_lskip(&tar->ios, inode->i_size))
> return -EIO;
> return 0;
> }
> diff --git a/mkfs/main.c b/mkfs/main.c
> index 6d2b700..a72d30e 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -122,7 +122,7 @@ static void usage(void)
> " --max-extent-bytes=# set maximum decompressed extent size # in bytes\n"
> " --preserve-mtime keep per-file modification time strictly\n"
> " --aufs replace aufs special files with overlayfs metadata\n"
> - " --tar=[fi] generate an image from tarball(s)\n"
> + " --tar=[fih] generate an image from tarball(s) or tarball header data\n"
> " --ovlfs-strip=[01] strip overlayfs metadata in the target image (e.g. whiteouts)\n"
> " --quiet quiet execution (do not write anything to standard output.)\n"
> #ifndef NDEBUG
> @@ -514,11 +514,13 @@ static int mkfs_parse_options_cfg(int argc, char *argv[])
> cfg.c_extra_ea_name_prefixes = true;
> break;
> case 20:
> - if (optarg && (!strcmp(optarg, "i") ||
> - !strcmp(optarg, "0") || !memcmp(optarg, "0,", 2))) {
> + if (optarg && (!strcmp(optarg, "i") || (!strcmp(optarg, "h") ||
> + !strcmp(optarg, "0") || !memcmp(optarg, "0,", 2)))) {
> erofstar.index_mode = true;
> if (!memcmp(optarg, "0,", 2))
> erofstar.mapfile = strdup(optarg + 2);
> + if (!strcmp(optarg, "h"))
> + erofstar.headeronly_mode = true;
> }
> tar_mode = true;
> break;
>
> Using this requires generation of tarball-ish streams that can be
> slightly difficult to cajole tar libraries into creating, but it does
> work if you do it. I can imagine much more complex alternative ways to
> do this too, such as supporting sparse tar files or supporting some
> whole new input format.
I think you could just fill the file data with zeros and use the
current index mode for now. But yes, that could be inefficient if some
files are huge.
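The zero-fill workaround can be sketched roughly as follows. This is
only an illustration and not part of erofs-utils: the names
ustar_header, tar_padded_size, ustar_fill_regular and
tar_write_zero_filled are hypothetical, and the header layout assumed is
plain POSIX ustar with uid, gid and mtime left at zero. Each entry is a
512-byte header followed by the payload rounded up to a multiple of 512
bytes, which is why a huge file costs a huge run of zeros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK 512

/* POSIX ustar header; all fields are ASCII, numbers are octal strings. */
struct ustar_header {
	char name[100], mode[8], uid[8], gid[8], size[12], mtime[12];
	char chksum[8], typeflag, linkname[100], magic[6], version[2];
	char uname[32], gname[32], devmajor[8], devminor[8];
	char prefix[155], pad[12];
};

/* Payload bytes that follow a header for a file of `size` bytes. */
static uint64_t tar_padded_size(uint64_t size)
{
	return (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
}

/* Fill a regular-file header; returns 0, or -1 if a field would not fit. */
static int ustar_fill_regular(struct ustar_header *h, const char *name,
			      unsigned int mode, uint64_t size)
{
	unsigned int sum = 0;
	size_t i;

	assert(sizeof(*h) == TAR_BLOCK);
	if (strlen(name) >= sizeof(h->name) || size > 077777777777ULL)
		return -1;
	memset(h, 0, sizeof(*h));
	strcpy(h->name, name);
	snprintf(h->mode, sizeof(h->mode), "%07o", mode & 07777);
	snprintf(h->uid, sizeof(h->uid), "%07o", 0u);
	snprintf(h->gid, sizeof(h->gid), "%07o", 0u);
	snprintf(h->size, sizeof(h->size), "%011llo", (unsigned long long)size);
	snprintf(h->mtime, sizeof(h->mtime), "%011o", 0u);
	h->typeflag = '0';
	memcpy(h->magic, "ustar", 6);	/* five letters plus the NUL */
	memcpy(h->version, "00", 2);
	/* The checksum is taken with its own field set to spaces. */
	memset(h->chksum, ' ', sizeof(h->chksum));
	for (i = 0; i < sizeof(*h); i++)
		sum += ((unsigned char *)h)[i];
	snprintf(h->chksum, sizeof(h->chksum), "%06o", sum);
	return 0;
}

/* Write one header followed by a zero-filled, block-padded payload. */
static int tar_write_zero_filled(FILE *out, const char *name,
				 unsigned int mode, uint64_t size)
{
	static const char zeros[TAR_BLOCK];
	struct ustar_header h;
	uint64_t left;

	if (ustar_fill_regular(&h, name, mode, size))
		return -1;
	if (fwrite(&h, sizeof(h), 1, out) != 1)
		return -1;
	for (left = tar_padded_size(size); left; left -= TAR_BLOCK)
		if (fwrite(zeros, TAR_BLOCK, 1, out) != 1)
			return -1;
	return 0;
}
```

A producer would call tar_write_zero_filled() once per selected file and
finish the stream with two all-zero 512-byte blocks, the usual tar
end-of-archive marker. For the header-only mode proposed in the patch
above, the payload loop would simply be left out.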
>
> Would some version of this feature be interesting and useful? If so,
> is the simple way good enough? It wouldn't preclude future addition of
> things like a sparse tar reader.
Yes, I think it's useful to support a simple prototype-like format, but
it might take a while if I do it on my own, since other ongoing work
still has to land first (such as multi-threaded mkfs support).
Thanks,
Gao Xiang
>
> Regards,
> Mike