Feature request: erofs-utils mkfs: Efficient way to pipe only file metadata
Gao Xiang
hsiangkao at linux.alibaba.com
Mon Feb 19 18:46:38 AEDT 2024
On 2024/2/19 12:44, Gao Xiang wrote:
> Hi Mike,
>
> On 2024/2/19 11:37, Mike Baynton wrote:
>> Hello erofs developers,
>> I am integrating erofs with overlayfs in a manner similar to what
>> composefs is doing. So, I am interested in making erofs images
>> containing only file metadata and extended attributes, but no file
>> data, as in $ mkfs.erofs --tar=i (thanks for that!)
>
> Thanks for your interest in EROFS too.
>
>>
>> However, I would like to construct the erofs image from a set of files
>> selected dynamically by another program. This leads me to prefer
>> sending an unseekable stream to mkfs.erofs so that file selection and
>> image generation can run concurrently, instead of first making a
>> complete tarball and then making the erofs image. In this case, it
>> becomes necessary to transfer each file's worth of data through the
>> stream after each header only so that the tarball reader in tar.c does
>> not become desynchronized with the expected offset of the next tar
>> header.
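For reference, the offset arithmetic behind that desynchronization can be sketched as below. This is a minimal illustration of the standard tar layout (512-byte header followed by data padded to a 512-byte boundary), not code from erofs-utils; the names tar_padded_size and tar_next_header_offset are made up for the example.

```c
#include <stdint.h>

#define TAR_BLOCK_SIZE 512u

/* Round a file size up to the next multiple of the 512-byte tar block. */
static uint64_t tar_padded_size(uint64_t size)
{
	return (size + TAR_BLOCK_SIZE - 1) & ~(uint64_t)(TAR_BLOCK_SIZE - 1);
}

/*
 * Offset at which a regular tar reader expects the next header, given the
 * offset of the current header and the size recorded in it. On a pipe the
 * reader cannot seek, so a producer has to really send
 * tar_padded_size(size) bytes after each header to keep the reader in sync.
 */
static uint64_t tar_next_header_offset(uint64_t header_off, uint64_t size)
{
	return header_off + TAR_BLOCK_SIZE + tar_padded_size(size);
}
```

With a 1 GiB file, for example, the reader expects the next header a full 1 GiB plus 512 bytes further down the stream, even though index mode never uses those bytes.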
>
> I wonder if it's possible to use a modified prototype-like [1] format
> which mkfs.xfs [2] currently supports with "-p". This prototype can
> be passed with a pipe instead.
>
> [1] http://uw714doc.sco.com/en/man/html.4/prototype.4.html
> [2] https://man7.org/linux/man-pages/man8/mkfs.xfs.8.html
Correction: the mkfs.xfs protofile format originally follows this syntax instead:
https://man.cat-v.org/unix-6th/8/mkfs
>
>>
>> A very straightforward solution that seems to be working just fine for
>> me is to simply introduce a new optarg for --tar that indicates the
>> input data will be simply a series of tar headers / metadata without
>> actual file data. This implies index mode and additionally prevents
>> the skipping of inode.size worth of bytes after each header:
>>
>> diff --git a/include/erofs/tar.h b/include/erofs/tar.h
>> index a76f740..3d40a0f 100644
>> --- a/include/erofs/tar.h
>> +++ b/include/erofs/tar.h
>> @@ -46,7 +46,7 @@ struct erofs_tarfile {
>>
>> int fd;
>> u64 offset;
>> - bool index_mode, aufs;
>> + bool index_mode, headeronly_mode, aufs;
>> };
>>
>> void erofs_iostream_close(struct erofs_iostream *ios);
>> diff --git a/lib/tar.c b/lib/tar.c
>> index 8204939..e916395 100644
>> --- a/lib/tar.c
>> +++ b/lib/tar.c
>> @@ -584,7 +584,7 @@ static int tarerofs_write_file_index(struct erofs_inode *inode,
>> ret = tarerofs_write_chunkes(inode, data_offset);
>> if (ret)
>> return ret;
>> - if (erofs_iostream_lskip(&tar->ios, inode->i_size))
>> + if (!tar->headeronly_mode && erofs_iostream_lskip(&tar->ios, inode->i_size))
>> return -EIO;
>> return 0;
>> }
>> diff --git a/mkfs/main.c b/mkfs/main.c
>> index 6d2b700..a72d30e 100644
>> --- a/mkfs/main.c
>> +++ b/mkfs/main.c
>> @@ -122,7 +122,7 @@ static void usage(void)
>> " --max-extent-bytes=# set maximum decompressed extent size # in bytes\n"
>> " --preserve-mtime keep per-file modification time strictly\n"
>> " --aufs replace aufs special files with overlayfs metadata\n"
>> - " --tar=[fi] generate an image from tarball(s)\n"
>> + " --tar=[fih] generate an image from tarball(s) or tarball header data\n"
>> " --ovlfs-strip=[01] strip overlayfs metadata in the target image (e.g. whiteouts)\n"
>> " --quiet quiet execution (do not write anything to standard output.)\n"
>> #ifndef NDEBUG
>> @@ -514,11 +514,13 @@ static int mkfs_parse_options_cfg(int argc, char *argv[])
>> cfg.c_extra_ea_name_prefixes = true;
>> break;
>> case 20:
>> - if (optarg && (!strcmp(optarg, "i") ||
>> - !strcmp(optarg, "0") || !memcmp(optarg, "0,", 2))) {
>> + if (optarg && (!strcmp(optarg, "i") || (!strcmp(optarg, "h") ||
>> + !strcmp(optarg, "0") || !memcmp(optarg, "0,", 2)))) {
>> erofstar.index_mode = true;
>> if (!memcmp(optarg, "0,", 2))
>> erofstar.mapfile = strdup(optarg + 2);
>> + if (!strcmp(optarg, "h"))
>> + erofstar.headeronly_mode = true;
>> }
>> tar_mode = true;
>> break;
>>
>> Using this requires generation of tarball-ish streams that can be
>> slightly difficult to cajole tar libraries into creating, but it does
>> work if you do it. I can imagine much more complex alternative ways to
>> do this too, such as supporting sparse tar files or supporting some
>> whole new input format.
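To illustrate what such a "tarball-ish" stream might contain, here is a rough sketch that builds a single ustar header by hand, so that a producer can emit headers back to back with no file data in between. It follows the POSIX ustar field layout; the function name ustar_fill_header is hypothetical, uid/gid are simply set to 0, and whether the tar reader in mkfs.erofs accepts exactly this output with the proposed "h" mode has not been verified here.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK_SIZE 512

/*
 * Fill one 512-byte ustar header for a regular file. The size field keeps
 * the real file size, but the caller writes no data blocks after it.
 * Returns 0 on success, -1 if a value does not fit the classic fields.
 */
static int ustar_fill_header(unsigned char hdr[TAR_BLOCK_SIZE],
			     const char *name, unsigned int mode,
			     uint64_t size, uint64_t mtime)
{
	size_t namelen = strlen(name);
	unsigned int sum = 0;
	int i;

	/* Longer names would need the prefix field or pax extensions. */
	if (namelen == 0 || namelen > 100)
		return -1;
	/* size and mtime are stored as 11 octal digits. */
	if (size > 077777777777ULL || mtime > 077777777777ULL)
		return -1;

	memset(hdr, 0, TAR_BLOCK_SIZE);
	memcpy(hdr, name, namelen);                             /* name  */
	snprintf((char *)hdr + 100, 8, "%07o", mode & 07777);   /* mode  */
	snprintf((char *)hdr + 108, 8, "%07o", 0u);             /* uid   */
	snprintf((char *)hdr + 116, 8, "%07o", 0u);             /* gid   */
	snprintf((char *)hdr + 124, 12, "%011llo",
		 (unsigned long long)size);                     /* size  */
	snprintf((char *)hdr + 136, 12, "%011llo",
		 (unsigned long long)mtime);                    /* mtime */
	hdr[156] = '0';                                         /* regular file */
	memcpy(hdr + 257, "ustar", 6);                          /* magic, with NUL */
	memcpy(hdr + 263, "00", 2);                             /* version */

	/* The checksum is computed with its own field set to spaces. */
	memset(hdr + 148, ' ', 8);
	for (i = 0; i < TAR_BLOCK_SIZE; i++)
		sum += hdr[i];
	/* Six octal digits and a NUL; the trailing space at 155 is kept. */
	snprintf((char *)hdr + 148, 7, "%06o", sum);
	return 0;
}
```

A producer would call this once per selected file and fwrite() the 512 bytes straight to the pipe, never sending the file contents themselves. Extended attributes would still need pax extended headers, which this sketch leaves out.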
>
> I think you could just fill the file data with zeros to use the current
> index mode for now. But yes, that could be inefficient if some files are huge.
>
>>
>> Would some version of this feature be interesting and useful? If so,
>> is the simple way good enough? It wouldn't preclude future addition of
>> things like a sparse tar reader.
>
> Yes, I think it's useful to support a simple prototype-like format, but
> it might take me some time to get to it, since there is other ongoing
> work still to land (such as multi-threaded mkfs support).
>
> Thanks,
> Gao Xiang
>
>>
>> Regards,
>> Mike