[PATCH 2/4] erofs-utils: add a built-in DEFLATE compressor

Eric Biggers ebiggers at kernel.org
Mon Jul 10 12:33:00 AEST 2023


Hi Gao,

On Mon, Jul 10, 2023 at 02:25:09AM +0800, Gao Xiang wrote:
> 
> Since there is _no fixed-sized output DEFLATE compression appoach_
> available in public (fitblk is somewhat ineffective) and the original
> zlib is quite slowly developping, let's work out one for our use cases.
> Fortunately, it's only less than 1.5kLOC with lazy matching to match
> the basic zlib ability.  Besides, near-optimal block splitting (based
> on price function) doesn't support for now since it's not rush for us.
> 
> In the future, there may be more built-in optimizers landed to fulfill
> our needs even further.  In addition, I'd be quite happy to see more
> popular encoders to support fixed-sized output compression too.
> 
> [1] https://developer.apple.com/documentation/compression/compression_algorithm
> [2] https://datatracker.ietf.org/doc/html/rfc1951
> Signed-off-by: Gao Xiang <hsiangkao at linux.alibaba.com>
> ---
>  lib/Makefile.am    |    2 +
>  lib/kite_deflate.c | 1267 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 1269 insertions(+)
>  create mode 100644 lib/kite_deflate.c

Have you considered simply running any standard compressor multiple times to
find the maximum input size that compresses to less than or equal to the desired
output size?  It may sound too slow, but it can be made to do the search in a
smart way (binary search + heuristics).  Also, it would easily work with every
compressor, and it would always produce the optimal input size.

I wrote a proof-of-concept patch that implements this idea in the benchmark
program in libdeflate.  It finds the optimal input size in about 8 attempts on
average for a target output size of 4096, or about 13 for a target output size
of 65536.  Note, if an optimal solution is not needed, it could be sped up
slightly by stopping when the output size is at, or just under (if desired), the
target output size.  Also keep in mind that any compression level can be used.

The full code I tested is currently at
https://github.com/ebiggers/libdeflate/tree/fixed-output-size, but below is the
actual algorithm:

/*
 * Compresses the largest prefix of 'in', with maximum size 'in_nbytes', whose
 * compressed output is at most 'target_output_size' bytes.  The compressed size
 * is returned as the return value, and the uncompressed size is returned in
 * *actual_in_nbytes_ret.  The output buffer 'out' has size 'out_nbytes_avail',
 * which should be at least 'target_output_size + 9'.  'tmpbuf' must be a buffer
 * the same size as 'out'.
 */
static size_t
do_compress_withfixedoutputsize(struct compressor *c,
				const u8 *in, size_t in_nbytes,
				u8 *out, size_t out_nbytes_avail,
				size_t target_output_size,
				size_t last_uncompressed_size,
				u8 *tmpbuf,
				size_t *actual_in_nbytes_ret)
{
	size_t l = 0; /* largest input that fits so far */
	size_t l_csize = 0;
	size_t r = in_nbytes + 1; /* smallest input that doesn't fit so far */
	size_t m;

	if (last_uncompressed_size)
		m = last_uncompressed_size * 15 / 16;
	else
		m = target_output_size * 4;
	for (;;) {
		size_t csize;

		m = MAX(m, l + 1);
		m = MIN(m, r - 1);

		csize = do_compress(c, in, m, tmpbuf, out_nbytes_avail);
		if (csize > 0 && csize <= target_output_size) {
			/* Fits */
			memcpy(out, tmpbuf, csize);
			l = m;
			l_csize = csize;
			if (r <= l + 1)
				break;
			/*
			 * Estimate needed input prefix size based on current
			 * compression ratio.
			 */
			m = (target_output_size * m) / csize;
		} else {
			/* Doesn't fit */
			r = m;
			if (r <= l + 1)
				break;
			m = (l + r) / 2;
		}
	}
	*actual_in_nbytes_ret = l;
	return l_csize;
}

What do you think?

- Eric


More information about the Linux-erofs mailing list