[PATCH v2] erofs-utils: lib: converted division to shift in z_erofs_load_compact_lcluster
Gao Xiang
hsiangkao at linux.alibaba.com
Thu Feb 26 13:40:16 AEDT 2026
On 2026/2/26 10:32, Yifan Zhao wrote:
> Hi Lee,
>
>
> A quick search shows that x86 kernel implementation also use `div` instruction
>
> under Linux v6.19 and GCC 15.2.1, add GCC correctly generate shift instruction
Could we try more GCC versions and find the impacts?
Does it a recent GCC regression?
>
> in my arm64 machine with GCC 14.2.0.
>
> Could you also consider evaluate this optimization in kernel?
Yes, but I wonder if `div` is already used for these years on x86,
could you check this?
Thanks,
Gao Xiang
>
>
> Thanks,
>
> Yifan
>
> On 2/26/2026 1:30 AM, Ashley Lee wrote:
>> perf on fsck.erofs in gcc reports that z_erofs_load_compact_lcluster
>> was spending 20% of its time doing the div instruction. While the
>> function itself is ~40% of user runtime. In the source code, it seems
>> that dividing by vcnt doesn't optimize to a shift despite the two
>> possible states being powers of 2.
>>
>> Changing the division into a ilog2() function call encourages the
>> compiler to recognize it as a power of 2. Thus performing a shift.
>>
>> Running a benchmark on lzma compressed freebsd code on x86, shows
>> there is a ~4% increase in performance in gcc. While clang shows
>> virtually no regression in performance. The tradeoff is slightly
>> obfuscated source code.
>>
>> The following command was run locally on x86.
>>
>> $ hyperfine -w 5 -m 30 "./fsck.erofs ./bsd.erofs.lzma"
>>
>> patch on gcc 15.2.1
>> Time (mean ± σ): 354.8 ms ± 6.0 ms \
>> [User: 227.8 ms, System: 126.1 ms]
>> Range (min … max): 345.8 ms … 366.2 ms 30 runs
>>
>> dev on gcc 15.2.1
>> Time (mean ± σ): 370.7 ms ± 6.7 ms \
>> [User: 246.5 ms, System: 123.4 ms]
>> Range (min … max): 362.7 ms … 390.7 ms 30 runs
>>
>> patch on clang 21.1.8
>> Time (mean ± σ): 371.9 ms ± 2.4 ms \
>> [User: 247.2 ms, System: 123.9 ms]
>> Range (min … max): 369.1 ms … 380.0 ms 30 runs
>>
>> dev on clang 21.1.8
>> Time (mean ± σ): 371.0 ms ± 1.9 ms \
>> [User: 245.5 ms, System: 124.5 ms]
>> Range (min … max): 368.4 ms … 377.7 ms 30 runs
>>
>> Signed-off-by: Ashley Lee <yester1324 at gmail.com>
>> ---
>> v2: changed vdiv to ilog2 call
>>
>> lib/zmap.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/zmap.c b/lib/zmap.c
>> index baec278..3ac7fe9 100644
>> --- a/lib/zmap.c
>> +++ b/lib/zmap.c
>> @@ -160,7 +160,7 @@ static int z_erofs_load_compact_lcluster(struct z_erofs_maprecorder *m,
>> m->nextpackoff = round_down(pos, vcnt << amortizedshift) +
>> (vcnt << amortizedshift);
>> lobits = max(lclusterbits, ilog2(Z_EROFS_LI_D0_CBLKCNT) + 1U);
>> - encodebits = ((vcnt << amortizedshift) - sizeof(__le32)) * 8 / vcnt;
>> + encodebits = (((vcnt << amortizedshift) - sizeof(__le32)) * 8) >> ilog2(vcnt);
>> bytes = pos & ((vcnt << amortizedshift) - 1);
>> in -= bytes;
>> i = bytes >> amortizedshift;
More information about the Linux-erofs
mailing list