LZCL: An embedded compression library
Copyright Chris Lomont, Sept 2016
For a recent project, I needed to design a low resource compression scheme. Over the years I've designed many, and this time I decided to unify some code, clean it up, and make some tools to ease development of such algorithms (mostly for my own use). The result this time is several very low memory decoders (necessarily slow), including Huffman, arithmetic, LZ77, and my own new invention, LZCL, which handily outperforms the others. All are designed for minimal RAM usage.
LZ77 style compression creates a list of decisions (literal or token), a list of literals, and a list of tokens (each a distance and length pair). The lengths are skewed toward short values, and the distances slightly less so, so both are amenable to compression. Often Huffman coding is used for one or both.
My method, LZCL, tries many types of compression on each stream (decisions, literals, distances, lengths), as well as on packed tokens (token = (length-minLength)*(maxDistance+1)+distance) and on decision runs (a sort of run-length encoding for decisions), and selects the shortest for each. Compressors include Huffman, arithmetic, Golomb, and Binary Adaptive Sequential Coding (BASC). All header values are universal coded to avoid worrying about varying symbol lengths or other sizes.
All the choices are made for low memory decoding situations.
The library is a single C file, Decompressor.c, with a single header file, Decompressor.h, that implements multiple decompressors in around 750 lines.
<inttypes.h> is used to make integer sizes fixed by using, e.g., uint32_t wherever possible. Four main decompressors, Huffman, arithmetic, LZ77 (actually an LZSS variant), and my LZCL, are implemented. Each can be #ifdef'd out in the header.
All decompressors use variants of my Lomont1 universal coder to store varying length items such as data sizes, bit lengths, etc., since testing this against other universal coders showed mine worked better for common data uses.
Each decompressor comes in a one-pass-to-buffer form and an incremental form that uses as little memory as possible.
Compression is generally not very fast, since the algorithms are designed to be easily extended, munged, and used to create more formats if needed. Decompression code is designed to be small, not fast, for the use case I wrote this code for.
Decompressor.c fits all four decompression routines into a self-contained C file of roughly 750 lines (~100 for Huffman, ~200 for arithmetic, ~75 for LZ77, ~250 for LZCL, with the rest support code).
The library implements a few utility routines, such as a function to compute the bits required to store a non-negative integer $N$, $\beta(N)=\lfloor\log_2 N\rfloor+1$. There are a few functions to read bits from a bitstream, and a Lomont1 universal code decoder.
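The bit-length function can be sketched as below; this is a minimal version, and the name and its convention for $N=0$ are assumptions, not the library's actual API.

```c
#include <stdint.h>

/* Bits needed to store a non-negative integer N: beta(N) = floor(log2 N) + 1.
   Assumption: N = 0 is treated as needing 1 bit; the library's exact
   convention may differ. */
static uint32_t BitsNeeded(uint32_t n)
{
    uint32_t bits = 0;
    do {          /* do-while so n = 0 still yields 1 bit */
        bits++;
        n >>= 1;
    } while (n != 0);
    return bits;
}
```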
The Huffman coder is non-adaptive, in order to make low memory usage possible. Instead, the symbol table is stored (in canonical format, see my compression format) and is parsed to decode each symbol. This, of course, makes decompression slower than using an in-memory table, but it was the tradeoff I needed.
The header is stored in fields, each adaptively stored to handle any size:
- bits per symbol (usually 8, not always)
- bits per codelength (usually 4-6)
- min length of a codeword (usually 1-4)
- delta codeword length (usually 9-16)
Then comes the symbol table, which stores the number $N_j$ of codewords of length $L_j$, followed by the $N_j$ symbols for those codewords. This is enough to decode.
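Decoding a symbol against such a canonical table can be sketched as follows. The struct layout, names, and one-bit-per-byte reader are illustrative assumptions, not the library's actual API; the canonical property used is that codes of each length form a contiguous block starting at a computable first code.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative bit source: one bit per byte, for clarity. */
typedef struct {
    const uint8_t *bits;
    size_t pos;
} BitSource;

static int ReadBit(BitSource *s) { return s->bits[s->pos++]; }

/* The (length, count, symbols) tables described above. */
typedef struct {
    uint32_t numLengths;      /* number of distinct codeword lengths */
    const uint32_t *lengths;  /* L_j, strictly increasing            */
    const uint32_t *counts;   /* N_j codewords of length L_j         */
    const uint8_t *symbols;   /* symbols, grouped by codeword length */
} CanonicalTable;

static int DecodeSymbol(const CanonicalTable *t, BitSource *s)
{
    uint32_t code = 0, curLen = 0, firstCode = 0, symIndex = 0;
    for (uint32_t j = 0; j < t->numLengths; ++j) {
        uint32_t len = t->lengths[j];
        firstCode <<= (len - curLen);   /* first code of this length */
        while (curLen < len) {          /* read bits up to this length */
            code = (code << 1) | (uint32_t)ReadBit(s);
            curLen++;
        }
        if (code - firstCode < t->counts[j])   /* falls in this block? */
            return t->symbols[symIndex + (code - firstCode)];
        symIndex += t->counts[j];
        firstCode += t->counts[j];
    }
    return -1; /* corrupt stream */
}
```

Walking the table this way needs no decode tree in RAM, at the cost of a loop per symbol, matching the speed-for-memory tradeoff described above.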
The arithmetic coder, like the Huffman version, is designed to use as little RAM as possible, so the symbol table (which keeps the symbols and frequencies needed for arithmetic decoding) is stored in a special format. This table stores the min and max symbols present, then, for each symbol in that range, stores a Binary Adaptive Sequential Coded string of the values. The first value is Lomont1 encoded. When a value needs to be looked up during decoding, the BASC sequence is walked until it is found.
Another problem with arithmetic decoders is knowing when to stop. A common solution is to add an End Of File token to the probabilities, which increases the overhead a little. I chose instead to store the length of the output, and the decoder emits symbols until enough are decoded.
The LZ77 coder is a LZSS variant, using a bit to decide if the next item is a literal or a (distance,length) pair.
After the header, a bit is read; if it is 0, a literal symbol is decoded (of fixed minimal bit length), else a token is decoded, which is an integer, also of minimal bit length over the entire stream. The distance and length are recovered via
uint32_t length = token / (actualMaxDistance + 1) + actualMinLength;
uint32_t distance = token % (actualMaxDistance + 1);
Then a run of length bytes is output, starting distance bytes back in the buffer.
For low memory decoding, a user-supplied buffer of sufficient length is used cyclically.
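The cyclic-buffer output step can be sketched like this; the names and the flat output pointer are illustrative assumptions, not the library's actual interface.

```c
#include <stdint.h>

/* Sketch of emitting output through a small cyclic window: the window must
   be at least as large as the maximum match distance. */
typedef struct {
    uint8_t *window;
    uint32_t size;   /* window length in bytes */
    uint32_t pos;    /* next write position, wraps cyclically */
} Window;

static void PutByte(Window *w, uint8_t b, uint8_t **out)
{
    w->window[w->pos] = b;              /* remember for future matches */
    w->pos = (w->pos + 1) % w->size;
    *(*out)++ = b;                      /* produce the output byte     */
}

static void CopyMatch(Window *w, uint32_t distance, uint32_t length,
                      uint8_t **out)
{
    /* source starts `distance` bytes back, modulo the window size */
    uint32_t src = (w->pos + w->size - distance) % w->size;
    while (length--) {
        uint8_t b = w->window[src];
        src = (src + 1) % w->size;
        PutByte(w, b, out);   /* writing as we read handles overlapping matches */
    }
}
```

Note that a match longer than its distance (e.g. distance 3, length 5) works because each copied byte is written back into the window before it is needed again.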
LZCL is a LZ77 variant. As such, it has to deal with decisions, literals, and (distance,length) tokens.
- Decisions: 0/1 to tell next token type
- Literals: single bytes encoded
- Tokens: each is a distance, length pair, encoded in one of two methods
The compressor separates each into different streams, tries to compress each variant with each available compressor, and stores the minimal version.
The compression algorithms are currently
- Fixed length
- Huffman
- Arithmetic
- Golomb (see my Compression Notes article) with truncated coding
- Binary Adaptive Sequential Coding (BASC)
Decisions are considered either as 1 bit per decision, or as a list of runs of consecutive same values (basic run-length encoding). Each of these is tested across each compressor.
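Converting the decision bits into runs for the run-length variant might look like this sketch. The exact LZCL run format (e.g. how the starting bit value is recorded) is not shown, and the names are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* Turn a 0/1 decision stream into lengths of runs of consecutive equal
   values. Returns the total number of runs; at most maxRuns are written. */
static size_t DecisionsToRuns(const uint8_t *decisions, size_t n,
                              uint32_t *runs, size_t maxRuns)
{
    if (n == 0) return 0;
    size_t numRuns = 0;
    uint32_t runLen = 1;
    for (size_t i = 1; i < n; ++i) {
        if (decisions[i] == decisions[i - 1]) {
            runLen++;                     /* extend the current run */
        } else {
            if (numRuns < maxRuns) runs[numRuns] = runLen;
            numRuns++;
            runLen = 1;                   /* value flipped: new run */
        }
    }
    if (numRuns < maxRuns) runs[numRuns] = runLen;
    return numRuns + 1;
}
```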
Tokens, which are (distance,length) pairs, are considered either as two separate streams or as one stream with the pair packed as (length-minLength)*(maxDistance+1)+distance. All of these are tested against each compressor.
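The packed-token round trip, matching the decode formulas quoted in the LZ77 section above, can be written as follows (function names are illustrative):

```c
#include <stdint.h>

/* Pack a (distance, length) pair into a single integer:
   token = (length - minLength) * (maxDistance + 1) + distance. */
static uint32_t PackToken(uint32_t distance, uint32_t length,
                          uint32_t minLength, uint32_t maxDistance)
{
    return (length - minLength) * (maxDistance + 1) + distance;
}

/* Inverse: recover the pair, as in the LZ77 decoder. */
static void UnpackToken(uint32_t token, uint32_t minLength,
                        uint32_t maxDistance,
                        uint32_t *distance, uint32_t *length)
{
    *length   = token / (maxDistance + 1) + minLength;
    *distance = token % (maxDistance + 1);
}
```

Packing works because distance never exceeds maxDistance, so it fits exactly in the "digit" base maxDistance+1.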
Literals are similarly tested across each compressor.
The result is the minimal stream obtainable from the current methods.
All formats contribute to the LZCL results on the corpus tests below.
On the Calgary and Canterbury corpora (selected since I am using smaller files for the embedded work – for larger corpii/corpuses/corp[.]+ use larger datasets), here are the results:
| Filename | File size | Arithmetic ratio | Huffman ratio | LZ77 ratio | LZCL ratio |
|---|---|---|---|---|---|
Note that the LZ77 and LZCL output sizes can be further reduced by selecting an optimal (max distance, max run length) pair, but doing so optimally takes a bit of time, which I did not do for these tables. Additional savings can be on the order of 10% or so. The command line tools allow exploring this.
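That exploration could be a simple brute-force search over candidate parameter pairs. In this sketch the cost callback is a stand-in for running the real compressor, and all names are illustrative assumptions:

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for "compress with these parameters, return output size". */
typedef uint32_t (*CompressedSizeFn)(uint32_t maxDistance, uint32_t maxLength,
                                     void *ctx);

/* Try every (maxDistance, maxLength) candidate pair, keep the smallest. */
static void FindBestParams(CompressedSizeFn size, void *ctx,
                           const uint32_t *distances, size_t numDistances,
                           const uint32_t *lengths, size_t numLengths,
                           uint32_t *bestDistance, uint32_t *bestLength)
{
    uint32_t best = UINT32_MAX;
    for (size_t i = 0; i < numDistances; ++i)
        for (size_t j = 0; j < numLengths; ++j) {
            uint32_t s = size(distances[i], lengths[j], ctx);
            if (s < best) {
                best = s;
                *bestDistance = distances[i];
                *bestLength = lengths[j];
            }
        }
}

/* Toy cost for demonstration only: minimal at (4096, 32). A real search
   would invoke the compressor here. */
static uint32_t ToyCost(uint32_t d, uint32_t l, void *ctx)
{
    (void)ctx;
    return (d > 4096 ? d - 4096 : 4096 - d) + (l > 32 ? l - 32 : 32 - l);
}
```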
For these 28 files, LZCL made the following choices:
// choices and codec counts
codec used: decision runs ArithmeticCodec => 5
codec used: decision runs HuffmanCodec => 3
codec used: decisions ArithmeticCodec => 20
codec used: distances ArithmeticCodec => 20
codec used: distances FixedSizeCodec => 8
codec used: lengths ArithmeticCodec => 18
codec used: lengths GolombCodec => 7
codec used: lengths HuffmanCodec => 3
codec used: literals ArithmeticCodec => 19
codec used: literals FixedSizeCodec => 5
codec used: literals HuffmanCodec => 4
// bits saved compared to a fixed size coder (high) and against the second-best performing codec (low)
codec win ArithmeticCodec saved high => 8269290
codec win ArithmeticCodec saved low => 895235
codec win FixedSizeCodec saved high => 32582974
codec win FixedSizeCodec saved low => 991038
codec win GolombCodec saved high => 90887
codec win GolombCodec saved low => 747
codec win HuffmanCodec saved high => 18069716
codec win HuffmanCodec saved low => 865932
I also tested this on a PIC32, the target platform. The PIC32 sample code, running on a PIC32MX150F128B at 40 MHz, compiled with xc32-gcc v1.31 at -O3 optimization, obtains the following decompression rates.
This was tested on a compressed version of the 26217-byte file Decompressor.c, with both LZ variants set to allow decoding incrementally into a 257-byte buffer.
The result was a success – I was able to fit all the magic into our upcoming product for my company Hypnocube, LLC! Please drop me a line if this code is useful to you.