[PATCH 2/4] erofs-utils: add a built-in DEFLATE compressor

Mon Jul 10 13:01:16 AEST 2023

Hi Eric,

On 2023/7/10 10:33, Eric Biggers wrote:
> Hi Gao,
> 
> On Mon, Jul 10, 2023 at 02:25:09AM +0800, Gao Xiang wrote:
>>
>> Since there is _no fixed-sized output DEFLATE compression appoach_
>> available in public (fitblk is somewhat ineffective) and the original
>> zlib is quite slowly developping, let's work out one for our use cases.
>> Fortunately, it's only less than 1.5kLOC with lazy matching to match
>> the basic zlib ability.  Besides, near-optimal block splitting (based
>> on price function) doesn't support for now since it's not rush for us.
>>
>> In the future, there may be more built-in optimizers landed to fulfill
>> our needs even further.  In addition, I'd be quite happy to see more
>> popular encoders to support fixed-sized output compression too.
>>
>> [1] https://developer.apple.com/documentation/compression/compression_algorithm
>> [2] https://datatracker.ietf.org/doc/html/rfc1951
>> Signed-off-by: Gao Xiang <hsiangkao at linux.alibaba.com>
>> ---
>>   lib/Makefile.am    |    2 +
>>   lib/kite_deflate.c | 1267 ++++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 1269 insertions(+)
>>   create mode 100644 lib/kite_deflate.c
> 
> Have you considered simply running any standard compressor multiple times to
> find the maximum input size that compresses to less than or equal to the desired
> output size?  It may sound too slow, but it can be made to do the search in a

Thanks for your interest and reply!

Actually I've tried several ways before (just like fitblk.c or binary search),
but the compression ratios is sliently impacted and compression speed is really
impacted (maybe we need to develop segment-splitted multi-threadded compression
as well, and it's developping by a college student now but I'm not sure the
progress.)

I think we could use this way first for the new Zstd support (if they don't have
bandwidth to add a offical fixed-sized output approach), but considering deflate
algorithm is quite easy for me to understand so at least I could have a simple
built-in one to avoid external dependency (mainly considering zlib since it's
quite outdated).  Since EROFS is actually offline compression so the compressed
data correctness can always be checked in advance before image release.

Also I have more ideas to optimize the last symbol from matchfinders (even lazy
matching) for fixed-sized output approaches to get even better compression
ratios (especially for smaller filesystem cluster like 4kb.)

> smart way (binary search + heuristics).  Also, it would easily work with every
> compressor, and it would always produce the optimal input size.
> 
> I wrote a proof-of-concept patch that implements this idea in the benchmark
> program in libdeflate.  It finds the optimal input size in about 8 attempts on
> average for a target output size of 4096, or about 13 for a target output size
> of 65536.  Note, if an optimal solution is not needed, it could be sped up
> slightly by stopping when the output size is at, or just under (if desired), the
> target output size.  Also keep in mind that any compression level can be used.
> 
> The full code I tested is currently at
> https://github.com/ebiggers/libdeflate/tree/fixed-output-size, but below is the
> actual algorithm:

If libdeflate officically supports this mode, I'm very happy to integrate
libdeflate in this way as an alternative encoder (if you could merge this), and
if the compression speed is improved, I could switch it to the default encoder
then. In the other words, I really hope there could be an official deflate
encoder to support this mode as well.  Also I think the in-kernel zlib
decompression is somewhat slow just due to some non-technical reasons, I might
need to improve this eveutually but it's no rush for us compared to support
DEFLATE format first as well.

> 
> /*
>   * Compresses the largest prefix of 'in', with maximum size 'in_nbytes', whose
>   * compressed output is at most 'target_output_size' bytes.  The compressed size
>   * is returned as the return value, and the uncompressed size is returned in
>   * *actual_in_nbytes_ret.  The output buffer 'out' has size 'out_nbytes_avail',
>   * which should be at least 'target_output_size + 9'.  'tmpbuf' must be a buffer
>   * the same size as 'out'.
>   */
> static size_t
> do_compress_withfixedoutputsize(struct compressor *c,
> 				const u8 *in, size_t in_nbytes,
> 				u8 *out, size_t out_nbytes_avail,
> 				size_t target_output_size,
> 				size_t last_uncompressed_size,
> 				u8 *tmpbuf,
> 				size_t *actual_in_nbytes_ret)
> {
> 	size_t l = 0; /* largest input that fits so far */
> 	size_t l_csize = 0;
> 	size_t r = in_nbytes + 1; /* smallest input that doesn't fit so far */
> 	size_t m;
> 
> 	if (last_uncompressed_size)
> 		m = last_uncompressed_size * 15 / 16;
> 	else
> 		m = target_output_size * 4;
> 	for (;;) {
> 		size_t csize;
> 
> 		m = MAX(m, l + 1);
> 		m = MIN(m, r - 1);
> 
> 		csize = do_compress(c, in, m, tmpbuf, out_nbytes_avail);
> 		if (csize > 0 && csize <= target_output_size) {
> 			/* Fits */
> 			memcpy(out, tmpbuf, csize);
> 			l = m;
> 			l_csize = csize;
> 			if (r <= l + 1)
> 				break;
> 			/*
> 			 * Estimate needed input prefix size based on current
> 			 * compression ratio.
> 			 */
> 			m = (target_output_size * m) / csize;
> 		} else {
> 			/* Doesn't fit */
> 			r = m;
> 			if (r <= l + 1)
> 				break;
> 			m = (l + r) / 2;
> 		}
> 	}
> 	*actual_in_nbytes_ret = l;
> 	return l_csize;
> }
> 
> What do you think?

Thank you for the time and your efforts, I much appreciate if it could be
supported so I could add a alternative encoder for this.

Thanks,
Gao Xiang

> 
> - Eric