[PATCH] mkfs: Fix input offset counting in headerball mode

Mike Baynton mike at mbaynton.com
Fri Oct 25 06:58:02 AEDT 2024


When using --tar=headerball, most files included in the headerball are
not included in the EROFS image. mkfs.erofs typically exits prematurely,
having parsed non-USTAR blocks as USTAR headers and taken them for
end-of-archive markers. (Other failure modes are probably also possible
if the input stream doesn't happen to look like end-of-archive markers
at the locations being read.)
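Some context on why a misplaced read ends the run early: a tar reader can only judge a 512-byte block by its contents. The C sketch below is not the lib/tar.c code. It applies the standard POSIX ustar checks: an all-zero block is an end-of-archive block (two in a row end the archive), and a header is recognized by its checksum, which is the sum of all 512 bytes with the chksum field counted as spaces. The names (tar_classify_block, tar_set_chksum and the rest) are hypothetical. A reader that has lost its place and lands on a window of zero padding gets exactly the answer it would get for a genuine end-of-archive block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK 512

/* Hypothetical helpers illustrating the standard ustar block checks. */
enum tar_blk_kind { TAR_BLK_END, TAR_BLK_HEADER, TAR_BLK_NOT_HEADER };

/* True if every byte of the block is zero (an end-of-archive block). */
static bool tar_is_zero_block(const unsigned char *blk)
{
	for (size_t i = 0; i < TAR_BLOCK; i++)
		if (blk[i])
			return false;
	return true;
}

/* Parse an octal field, skipping leading spaces and stopping at the
 * first non-octal byte (the NUL or space terminator). */
static unsigned long tar_parse_octal(const unsigned char *p, size_t len)
{
	unsigned long v = 0;
	size_t i = 0;

	while (i < len && p[i] == ' ')
		i++;
	for (; i < len && p[i] >= '0' && p[i] <= '7'; i++)
		v = v * 8 + (unsigned long)(p[i] - '0');
	return v;
}

/* ustar checksum: unsigned sum of all 512 bytes, with the 8-byte chksum
 * field at offset 148 counted as ASCII spaces. */
static unsigned long tar_compute_chksum(const unsigned char *blk)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < TAR_BLOCK; i++)
		sum += (i >= 148 && i < 156) ? ' ' : blk[i];
	return sum;
}

/* Store the checksum as common writers do: six octal digits, NUL, space. */
static void tar_set_chksum(unsigned char *blk)
{
	char buf[8];
	int n = snprintf(buf, sizeof(buf), "%06lo", tar_compute_chksum(blk));

	assert(n == 6); /* the maximum sum, 512 * 255, fits in six digits */
	memcpy(blk + 148, buf, 6);
	blk[154] = '\0';
	blk[155] = ' ';
}

/* Classify a block purely by its contents, as any tar reader must. */
static enum tar_blk_kind tar_classify_block(const unsigned char *blk)
{
	if (tar_is_zero_block(blk))
		return TAR_BLK_END;
	if (tar_parse_octal(blk + 148, 8) == tar_compute_chksum(blk))
		return TAR_BLK_HEADER;
	return TAR_BLK_NOT_HEADER;
}
```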

This is because we lose track of how many bytes have been read from the
input stream in headerball (or ddtaridx) mode. We assumed that in these
modes no data would be read after the ustar header block, but for
entries such as PAX extended headers, considerably more data may be read
without tar->offset being incremented.

This patch fixes the count by always incrementing the offset counter,
and then decrementing it again in the one case where headerballs differ:
regular file data blocks are not present.
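The bookkeeping change can be modeled in a few lines of C. This is a simplified sketch, not code from lib/tar.c. Each entry is reduced to a typeflag and a declared size, payloads are rounded up to whole 512-byte blocks, and the names (struct tar_entry, stream_bytes, advance_old, advance_new, total) are hypothetical. stream_bytes() reflects what a headerball actually contains. advance_old() mirrors the pre-patch rule of never counting payload in headerball mode. advance_new() mirrors the patched rule of always counting the payload and subtracting regular-file data afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TAR_BLOCK 512ULL

/* Hypothetical, simplified view of one tar entry. */
struct tar_entry {
	char typeflag;	/* '0'/'7' regular, 'x' PAX header, '5' directory... */
	uint64_t size;	/* payload size declared in the header */
};

static uint64_t round_up_block(uint64_t n)
{
	assert(n <= UINT64_MAX - (TAR_BLOCK - 1)); /* avoid overflow */
	return (n + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
}

static bool is_regular(char typeflag)
{
	return typeflag == '0' || typeflag == '7' || typeflag == '\0';
}

/* Bytes the entry really occupies in the input. A headerball omits only
 * regular-file data; PAX header payloads are still present. */
static uint64_t stream_bytes(const struct tar_entry *e, bool headerball)
{
	uint64_t payload = (headerball && is_regular(e->typeflag)) ? 0 : e->size;

	return TAR_BLOCK + round_up_block(payload);
}

/* Pre-patch rule: in headerball mode, never count the payload. */
static uint64_t advance_old(const struct tar_entry *e, bool headerball)
{
	return TAR_BLOCK + (headerball ? 0 : round_up_block(e->size));
}

/* Patched rule: always count the payload, then subtract it again for
 * regular files with data, the one thing a headerball leaves out. */
static uint64_t advance_new(const struct tar_entry *e, bool headerball)
{
	uint64_t adv = TAR_BLOCK + round_up_block(e->size);

	if (headerball && is_regular(e->typeflag) && e->size)
		adv -= round_up_block(e->size);
	return adv;
}

typedef uint64_t (*advance_fn)(const struct tar_entry *, bool);

/* Sum one accounting rule over a sequence of entries. */
static uint64_t total(const struct tar_entry *ents, size_t n, bool headerball,
		      advance_fn f)
{
	uint64_t off = 0;

	for (size_t i = 0; i < n; i++)
		off += f(&ents[i], headerball);
	return off;
}
```

For a headerball holding a PAX 'x' header with a 100-byte payload, a 5000-byte regular file and a directory, the old rule accounts for 1536 bytes while the stream really holds 2048. The patched rule matches the stream, and for an ordinary tarball all three totals agree.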

Signed-off-by: Mike Baynton <mike at mbaynton.com>
---

I suspect headerball mode may be unfamiliar, since it exists because of
my prior contribution and my site is probably its only active user. As a
recap, the idea of a "headerball" is to offer a relatively easy to
produce, optimally efficient input format for constructing EROFS
filesystems that contain only inode metadata. A headerball is a PAX
tarball from which all 512-byte blocks containing regular file data
have been omitted.
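To make the format concrete, the following C sketch turns an in-memory tar stream into a headerball. It copies every header block, every non-regular payload (PAX 'x'/'g' records, for example) and the end-of-archive blocks, while skipping the data blocks of regular files. It is only an illustration under simplifying assumptions. make_headerball() and its hb_* helpers are hypothetical names. Sizes are read from the octal ustar size field only, so a PAX size record or a base-256 size would be ignored. Header checksums are not validated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAR_BLOCK 512

/* Hypothetical helper: true if the 512-byte block is all zeros. */
static bool hb_is_zero_block(const unsigned char *b)
{
	for (size_t i = 0; i < TAR_BLOCK; i++)
		if (b[i])
			return false;
	return true;
}

/* Parse an octal field up to the first non-octal byte. */
static uint64_t hb_parse_octal(const unsigned char *p, size_t len)
{
	uint64_t v = 0;

	for (size_t i = 0; i < len && p[i] >= '0' && p[i] <= '7'; i++)
		v = v * 8 + (uint64_t)(p[i] - '0');
	return v;
}

/* '0', '7' and legacy '\0' all denote regular files. */
static bool hb_is_regular(unsigned char typeflag)
{
	return typeflag == '0' || typeflag == '7' || typeflag == '\0';
}

/*
 * Hypothetical sketch: copy the tar stream `in` to `out`, dropping the data
 * blocks of regular files. `out` must hold at least `in_len` bytes.
 * Returns the number of bytes written, or (size_t)-1 if an entry's
 * payload runs past the end of the input.
 */
static size_t make_headerball(const unsigned char *in, size_t in_len,
			      unsigned char *out)
{
	size_t ip = 0, op = 0;

	assert(in != NULL && out != NULL);
	while (in_len - ip >= TAR_BLOCK) {
		const unsigned char *hdr = in + ip;

		/* Header and end-of-archive blocks are always kept. */
		memcpy(out + op, hdr, TAR_BLOCK);
		ip += TAR_BLOCK;
		op += TAR_BLOCK;
		if (hb_is_zero_block(hdr))
			continue;

		/* ustar size field: 12 bytes of octal at offset 124. */
		uint64_t size = hb_parse_octal(hdr + 124, 12);
		uint64_t padded = (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;

		if (padded > in_len - ip)
			return (size_t)-1;
		/* typeflag at offset 156: keep every payload except file data. */
		if (!hb_is_regular(hdr[156])) {
			memcpy(out + op, in + ip, (size_t)padded);
			op += (size_t)padded;
		}
		ip += (size_t)padded;
	}
	return op;
}
```

Fed a PAX header with a 30-byte payload, a 600-byte regular file and the two end-of-archive blocks (seven 512-byte blocks in all), it writes five: the file's two data blocks are the only thing dropped.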

For convenience, a small example "headerball" can be downloaded from
https://gist.github.com/mbaynton/bae60bf1044d83985956c6dc5b199cc3

With this patch, it produces a 67-inode EROFS image; without it, as I
recall, only about 3 inodes.

This patch also changes the behavior of the ddtaridx format, which sounds
like a similar use case, though its user is Alibaba. It _looks_ like
this format would also be affected, but without sample input I can't
confirm.

 lib/tar.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/tar.c b/lib/tar.c
index b32abd4..726e565 100644
--- a/lib/tar.c
+++ b/lib/tar.c
@@ -808,8 +808,7 @@ out_eot:
 	}
 
 	dataoff = tar->offset;
-	if (!(tar->headeronly_mode || tar->ddtaridx_mode))
-		tar->offset += st.st_size;
+	tar->offset += st.st_size;
 	switch(th->typeflag) {
 	case '0':
 	case '7':
@@ -1022,8 +1021,10 @@ new_inode:
 			memcpy(inode->i_link, eh.link, inode->i_size + 1);
 		} else if (inode->i_size) {
 			if (tar->headeronly_mode) {
+				tar->offset -= st.st_size; /* no data blocks in stream; undo advance above */
 				ret = erofs_write_zero_inode(inode);
 			} else if (tar->ddtaridx_mode) {
+				tar->offset -= st.st_size; /* no data blocks in stream; undo advance above */
 				dataoff = le64_to_cpu(*(__le64 *)(th->devmajor));
 				if (tar->rvsp_mode) {
 					inode->datasource = EROFS_INODE_DATA_SOURCE_RESVSP;
-- 
2.34.1



More information about the Linux-erofs mailing list