<div dir="ltr">Thanks for the direction, Gao Xiang.<br><br>Understood. Switching to the ZSTD streaming APIs (ZSTD_decompressStream()) <br>would eliminate the ZSTD_getFrameContentSize() / <br>ZSTD_getDecompressedSize() dependency entirely and align <br>erofs-utils with the kernel implementation.<br><br>I'll work on a v3 using the streaming approach.<br><br>- Utkal</div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Tue, 17 Mar 2026 at 15:23, Gao Xiang <<a href="mailto:hsiangkao@linux.alibaba.com">hsiangkao@linux.alibaba.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
<br>
On 2026/3/17 12:55, Utkal Singh wrote:<br>
> ZSTD_getFrameContentSize() reads the content size from the ZSTD<br>
> frame header in the compressed data. This is untrusted on-disk<br>
> metadata, independent from the extent map that provides<br>
> rq->decodedlength via z_erofs_map_blocks_iter().<br>
> <br>
> A crafted EROFS image can set the extent map to claim a decoded<br>
> length larger than the actual ZSTD frame content size. When this<br>
> happens, a buffer of the (smaller) frame content size is allocated<br>
> and decompressed into, but the subsequent memcpy copies<br>
> rq->decodedlength bytes from it -- a potential out-of-bounds read.<br>
> <br>
> Additionally, the ZSTD_getDecompressedSize() legacy fallback<br>
> returns 0 for frames without a content size field. This leads to<br>
> malloc(0) followed by out-of-bounds access on the returned pointer.<br>
> <br>
> Reject frames where the reported content size is zero or smaller<br>
> than the expected decoded length.<br>
> <br>
> Reproducer:<br>
> mkdir testdir<br>
> python3 -c "open('testdir/f','wb').write(b'A'*131072)"<br>
> mkfs.erofs -zzstd test.erofs testdir/<br>
> python3 -c "d=bytearray(open('test.erofs','rb').read());\<br>
> p=d.find(b'\x28\xb5\x2f\xfd');d[p+4]=0x20;d[p+5]=0x01;\<br>
> open('test.erofs','wb').write(d)"<br>
> fsck.erofs --extract=out test.erofs<br>
> # Expected: ZSTD frame content size 1 < decoded length 131072<br>
> <br>
> Signed-off-by: Utkal Singh <<a href="mailto:singhutkal015@gmail.com" target="_blank">singhutkal015@gmail.com</a>><br>
> ---<br>
> lib/decompress.c | 7 +++++++<br>
> 1 file changed, 7 insertions(+)<br>
> <br>
> diff --git a/lib/decompress.c b/lib/decompress.c<br>
> index 3e7a173..fb81039 100644<br>
> --- a/lib/decompress.c<br>
> +++ b/lib/decompress.c<br>
> @@ -48,7 +48,14 @@ static int z_erofs_decompress_zstd(struct z_erofs_decompress_req *rq)<br>
> #else<br>
> total = ZSTD_getDecompressedSize(src + inputmargin,<br>
> rq->inputsize - inputmargin);<br>
> + if (!total)<br>
> + return -EFSCORRUPTED;<br>
<br>
Hmm, this is where the kernel and erofs-utils implementations<br>
differ.<br>
<br>
The kernel uses the zstd streaming APIs, so it doesn't malloc()<br>
a new buffer in advance. I think erofs-utils should switch to<br>
the streaming APIs too, in order to avoid the<br>
<br>
ZSTD_getFrameContentSize() and ZSTD_getDecompressedSize()<br>
<br>
dependencies you described in the commit message.<br>
<br>
Thanks,<br>
Gao Xiang<br>
<br>
</blockquote></div>
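<div dir="ltr">For reference, the size check described in the quoted commit message can be sketched in isolation. This is only an illustration under stated assumptions, not the code from lib/decompress.c: the helper name sketch_check_frame_size() and the SKETCH_* constants are hypothetical, the two sentinel values mirror ZSTD_CONTENTSIZE_UNKNOWN and ZSTD_CONTENTSIZE_ERROR from zstd.h so the block builds without libzstd, and the error code stands in for -EFSCORRUPTED. A streaming-based v3 would not need this check, since it would not size a buffer from the frame header.</div>

```c
#include <stddef.h>

/*
 * Local mirrors of the sentinels defined in zstd.h
 * (ZSTD_CONTENTSIZE_UNKNOWN / ZSTD_CONTENTSIZE_ERROR), redefined here
 * only so that this sketch compiles without libzstd.
 */
#define SKETCH_CONTENTSIZE_UNKNOWN (0ULL - 1)
#define SKETCH_CONTENTSIZE_ERROR   (0ULL - 2)

/* Hypothetical stand-in for EFSCORRUPTED; the real value is an assumption. */
#define SKETCH_EFSCORRUPTED 117

/*
 * frame_size:    content size reported by the ZSTD frame header
 *                (untrusted on-disk data)
 * decodedlength: length claimed by the extent map
 *
 * Returns 0 if a buffer of frame_size bytes can safely serve a copy of
 * decodedlength bytes, or a negative error code otherwise.
 */
static int sketch_check_frame_size(unsigned long long frame_size,
                                   unsigned int decodedlength)
{
	/* The header carried no usable size, or could not be parsed. */
	if (frame_size == SKETCH_CONTENTSIZE_UNKNOWN ||
	    frame_size == SKETCH_CONTENTSIZE_ERROR)
		return -SKETCH_EFSCORRUPTED;

	/* A zero size would lead to malloc(0) and a later overread. */
	if (!frame_size)
		return -SKETCH_EFSCORRUPTED;

	/* Copying decodedlength bytes would run past the buffer. */
	if (frame_size < decodedlength)
		return -SKETCH_EFSCORRUPTED;

	return 0;
}
```

<div dir="ltr">With the values from the reproducer, sketch_check_frame_size(1, 131072) returns the error code, while a frame whose reported size matches the extent map, such as sketch_check_frame_size(131072, 131072), passes.</div>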