Feature request: erofs-utils mkfs: Efficient way to pipe only file metadata

Gao Xiang hsiangkao at linux.alibaba.com
Mon Feb 19 15:44:12 AEDT 2024


Hi Mike,

On 2024/2/19 11:37, Mike Baynton wrote:
> Hello erofs developers,
> I am integrating erofs with overlayfs in a manner similar to what
> composefs is doing. So, I am interested in making erofs images
> containing only file metadata and extended attributes, but no file
> data, as in $ mkfs.erofs --tar=i (thanks for that!)

Thanks for your interest in EROFS too.

> 
> However, I would like to construct the erofs image from a set of files
> selected dynamically by another program. This leads me to prefer
> sending an unseekable stream to mkfs.erofs so that file selection and
> image generation can run concurrently, instead of first making a
> complete tarball and then making the erofs image. In this case, it
> becomes necessary to transfer each file's worth of data through the
> stream after each header only so that the tarball reader in tar.c does
> not become desynchronized with the expected offset of the next tar
> header.
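
To make the desynchronization concrete: in a tar stream each header occupies one 512-byte block, and the member's data follows, padded up to the next 512-byte boundary. The sketch below is only an illustration, not code from erofs-utils. It computes where a reader expects the next header to start. The names `tar_padded_size`, `tar_next_header` and the `header_only` parameter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TAR_BLOCK 512u

/* Round a payload size up to a whole number of 512-byte tar blocks. */
static uint64_t tar_padded_size(uint64_t size)
{
	return (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
}

/*
 * Offset at which the next header is expected, given the offset of the
 * current header and the file size recorded in it.
 * header_only (hypothetical): the stream carries headers with no payload.
 */
static uint64_t tar_next_header(uint64_t hdr_off, uint64_t size,
				bool header_only)
{
	uint64_t next = hdr_off + TAR_BLOCK;

	if (!header_only)
		next += tar_padded_size(size);
	return next;
}
```

For a 1000-byte file whose header sits at offset 0, a regular tar reader looks for the next header at offset 1536, while a header-only stream would place it at offset 512. If the producer omits the data but the reader still skips it, every later header is misread.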

I wonder if it would be possible to use a modified prototype-like
format [1], similar to what mkfs.xfs [2] currently supports with "-p".
Such a prototype could be passed through a pipe instead.

[1] http://uw714doc.sco.com/en/man/html.4/prototype.4.html
[2] https://man7.org/linux/man-pages/man8/mkfs.xfs.8.html
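
As a rough illustration of what a prototype-like entry carries: the mkfs.xfs protofile describes each file's mode as a 6-character string made of a type character (one of `-bcdpl`), `u` or `-` for setuid, `g` or `-` for setgid, and three octal permission digits. The sketch below parses such a string into a `mode_t`. It is not an EROFS format and not erofs-utils code, and `proto_parse_mode` is a hypothetical helper name.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Parse a mode string such as "d--755". Returns 0 on success, -1 if malformed. */
static int proto_parse_mode(const char *s, mode_t *out)
{
	mode_t mode;
	int i;

	if (!s || strlen(s) != 6)
		return -1;

	/* First character: file type. */
	switch (s[0]) {
	case '-': mode = S_IFREG; break;
	case 'b': mode = S_IFBLK; break;
	case 'c': mode = S_IFCHR; break;
	case 'd': mode = S_IFDIR; break;
	case 'p': mode = S_IFIFO; break;
	case 'l': mode = S_IFLNK; break;
	default:
		return -1;
	}

	/* Second and third characters: setuid and setgid flags. */
	if (s[1] == 'u')
		mode |= S_ISUID;
	else if (s[1] != '-')
		return -1;
	if (s[2] == 'g')
		mode |= S_ISGID;
	else if (s[2] != '-')
		return -1;

	/* Last three characters: octal permission digits. */
	for (i = 3; i < 6; i++)
		if (s[i] < '0' || s[i] > '7')
			return -1;

	*out = mode | (mode_t)((s[3] - '0') << 6 | (s[4] - '0') << 3 |
			       (s[5] - '0'));
	return 0;
}
```

A text format along these lines carries only metadata, so nothing has to be skipped between entries when it is read from a pipe.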

> 
> A very straightforward solution that seems to be working just fine for
> me is to simply introduce a new optarg for --tar that indicates the
> input data will be simply a series of tar headers / metadata without
> actual file data. This implies index mode and additionally prevents
> the skipping of inode.size worth of bytes after each header:
> 
> diff --git a/include/erofs/tar.h b/include/erofs/tar.h
> index a76f740..3d40a0f 100644
> --- a/include/erofs/tar.h
> +++ b/include/erofs/tar.h
> @@ -46,7 +46,7 @@ struct erofs_tarfile {
> 
>    int fd;
>    u64 offset;
> - bool index_mode, aufs;
> + bool index_mode, headeronly_mode, aufs;
>   };
> 
>   void erofs_iostream_close(struct erofs_iostream *ios);
> diff --git a/lib/tar.c b/lib/tar.c
> index 8204939..e916395 100644
> --- a/lib/tar.c
> +++ b/lib/tar.c
> @@ -584,7 +584,7 @@ static int tarerofs_write_file_index(struct erofs_inode *inode,
>    ret = tarerofs_write_chunkes(inode, data_offset);
>    if (ret)
>    return ret;
> - if (erofs_iostream_lskip(&tar->ios, inode->i_size))
> + if (!tar->headeronly_mode && erofs_iostream_lskip(&tar->ios, inode->i_size))
>    return -EIO;
>    return 0;
>   }
> diff --git a/mkfs/main.c b/mkfs/main.c
> index 6d2b700..a72d30e 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -122,7 +122,7 @@ static void usage(void)
>          " --max-extent-bytes=#  set maximum decompressed extent size # in bytes\n"
>          " --preserve-mtime      keep per-file modification time strictly\n"
>          " --aufs                replace aufs special files with overlayfs metadata\n"
> -       " --tar=[fi]            generate an image from tarball(s)\n"
> +       " --tar=[fih]           generate an image from tarball(s) or tarball header data\n"
>          " --ovlfs-strip=[01]    strip overlayfs metadata in the target image (e.g. whiteouts)\n"
>          " --quiet               quiet execution (do not write anything to standard output.)\n"
>   #ifndef NDEBUG
> @@ -514,11 +514,13 @@ static int mkfs_parse_options_cfg(int argc, char *argv[])
>    cfg.c_extra_ea_name_prefixes = true;
>    break;
>    case 20:
> - if (optarg && (!strcmp(optarg, "i") ||
> - !strcmp(optarg, "0") || !memcmp(optarg, "0,", 2))) {
> + if (optarg && (!strcmp(optarg, "i") || !strcmp(optarg, "h") ||
> + !strcmp(optarg, "0") || !memcmp(optarg, "0,", 2))) {
>    erofstar.index_mode = true;
>    if (!memcmp(optarg, "0,", 2))
>    erofstar.mapfile = strdup(optarg + 2);
> + if (!strcmp(optarg, "h"))
> + erofstar.headeronly_mode = true;
>    }
>    tar_mode = true;
>    break;
> 
> Using this requires generation of tarball-ish streams that can be
> slightly difficult to cajole tar libraries into creating, but it does
> work if you do it. I can imagine much more complex alternative ways to
> do this too, such as supporting sparse tar files or supporting some
> whole new input format.
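
The option handling in the patch above can be read as a small mapping from the `--tar` optarg to three flags. The sketch below restates that mapping as a standalone function for clarity. It is not the erofs-utils code: `struct tar_opts` and `parse_tar_optarg` are hypothetical names, and only the flag semantics are taken from the quoted diff.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the flags the patch sets. */
struct tar_opts {
	bool tar_mode;
	bool index_mode;
	bool headeronly_mode;
	char *mapfile;		/* owned by the caller, may be NULL */
};

/* Returns 0 on success, -1 if the mapfile name cannot be duplicated. */
static int parse_tar_optarg(const char *optarg, struct tar_opts *o)
{
	o->tar_mode = true;	/* any --tar use enables tar mode */
	if (!optarg)
		return 0;

	if (!strcmp(optarg, "h")) {
		/* header-only input implies index mode */
		o->index_mode = true;
		o->headeronly_mode = true;
	} else if (!strcmp(optarg, "i") || !strcmp(optarg, "0")) {
		o->index_mode = true;
	} else if (!strncmp(optarg, "0,", 2)) {
		o->index_mode = true;
		o->mapfile = strdup(optarg + 2);
		if (!o->mapfile)
			return -1;
	}
	return 0;
}
```

Any other optarg, such as `f`, only enables tar mode, which matches the behavior of the patched `case 20:` branch.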

I think you could just fill the file data with zeros to use the
current index mode for now. But yes, that could be inefficient if some
files are huge.
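
A minimal sketch of that zero-fill workaround, assuming the usual tar layout in which file data is padded to a multiple of 512 bytes: after writing each header, the producer emits as many zero blocks as the recorded size requires. `tar_write_zero_payload` is a hypothetical helper, not an erofs-utils function.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Write `size` bytes of zeros, rounded up to a 512-byte boundary.
 * Returns the number of bytes written, or -1 on a write error.
 */
static int64_t tar_write_zero_payload(FILE *out, uint64_t size)
{
	static const char zeros[512];	/* static storage is zero-initialized */
	uint64_t blocks = (size + 511) / 512;
	uint64_t i;

	for (i = 0; i < blocks; i++)
		if (fwrite(zeros, 1, sizeof(zeros), out) != sizeof(zeros))
			return -1;
	return (int64_t)(blocks * 512);
}
```

The amount written grows with the file size, so a stream describing a few multi-gigabyte files would push gigabytes of zeros through the pipe, which is the inefficiency mentioned above.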

> 
> Would some version of this feature be interesting and useful? If so,
> is the simple way good enough? It wouldn't preclude future addition of
> things like a sparse tar reader.

Yes, I think it would be useful to support a simple prototype-like
format, but it might take a while if I work on it alone, since there is
other ongoing work to land first (such as multi-threaded mkfs support).

Thanks,
Gao Xiang

> 
> Regards,
> Mike

