[PATCH v2] erofs-utils: lib: converted division to shift in z_erofs_load_compact_lcluster
Yifan Zhao
yifan.yfzhao at foxmail.com
Thu Feb 26 13:32:33 AEDT 2026
Hi Lee,
A quick search shows that x86 kernel implementation also use `div`
instruction
under Linux v6.19 and GCC 15.2.1, add GCC correctly generate shift
instruction
in my arm64 machine with GCC 14.2.0.
Could you also consider evaluate this optimization in kernel?
Thanks,
Yifan
On 2/26/2026 1:30 AM, Ashley Lee wrote:
> perf on fsck.erofs in gcc reports that z_erofs_load_compact_lcluster
> was spending 20% of its time doing the div instruction. While the
> function itself is ~40% of user runtime. In the source code, it seems
> that dividing by vcnt doesn't optimize to a shift despite the two
> possible states being powers of 2.
>
> Changing the division into a ilog2() function call encourages the
> compiler to recognize it as a power of 2. Thus performing a shift.
>
> Running a benchmark on lzma compressed freebsd code on x86, shows
> there is a ~4% increase in performance in gcc. While clang shows
> virtually no regression in performance. The tradeoff is slightly
> obfuscated source code.
>
> The following command was run locally on x86.
>
> $ hyperfine -w 5 -m 30 "./fsck.erofs ./bsd.erofs.lzma"
>
> patch on gcc 15.2.1
> Time (mean ± σ): 354.8 ms ± 6.0 ms \
> [User: 227.8 ms, System: 126.1 ms]
> Range (min … max): 345.8 ms … 366.2 ms 30 runs
>
> dev on gcc 15.2.1
> Time (mean ± σ): 370.7 ms ± 6.7 ms \
> [User: 246.5 ms, System: 123.4 ms]
> Range (min … max): 362.7 ms … 390.7 ms 30 runs
>
> patch on clang 21.1.8
> Time (mean ± σ): 371.9 ms ± 2.4 ms \
> [User: 247.2 ms, System: 123.9 ms]
> Range (min … max): 369.1 ms … 380.0 ms 30 runs
>
> dev on clang 21.1.8
> Time (mean ± σ): 371.0 ms ± 1.9 ms \
> [User: 245.5 ms, System: 124.5 ms]
> Range (min … max): 368.4 ms … 377.7 ms 30 runs
>
> Signed-off-by: Ashley Lee <yester1324 at gmail.com>
> ---
> v2: changed vdiv to ilog2 call
>
> lib/zmap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/zmap.c b/lib/zmap.c
> index baec278..3ac7fe9 100644
> --- a/lib/zmap.c
> +++ b/lib/zmap.c
> @@ -160,7 +160,7 @@ static int z_erofs_load_compact_lcluster(struct z_erofs_maprecorder *m,
> m->nextpackoff = round_down(pos, vcnt << amortizedshift) +
> (vcnt << amortizedshift);
> lobits = max(lclusterbits, ilog2(Z_EROFS_LI_D0_CBLKCNT) + 1U);
> - encodebits = ((vcnt << amortizedshift) - sizeof(__le32)) * 8 / vcnt;
> + encodebits = (((vcnt << amortizedshift) - sizeof(__le32)) * 8) >> ilog2(vcnt);
> bytes = pos & ((vcnt << amortizedshift) - 1);
> in -= bytes;
> i = bytes >> amortizedshift;
More information about the Linux-erofs
mailing list