LZCL: An embedded compression library

Copyright Chris Lomont, Sept 2016
Version 1.0

Background

See my Compression Notes article for compression background useful for this article. The README at the GitHub page has usage details.

For a recent project, I needed to design a low-resource compression scheme. Over the years I’ve designed many, and this time I decided to unify some code, clean it up, and make some tools to ease development of such algorithms (mostly for my own use). The result this particular time is several very low memory decoders, necessarily slow, including Huffman, Arithmetic, LZ77, and my own new invention, LZCL, which handily outperforms the others. All are designed for minimal RAM usage.

Implementation

LZ77-style compression creates a list of decisions (literal or token), a list of literals, and a list of tokens (each a distance and length pair). The lengths are skewed toward short values, and the distances slightly less so, so both are amenable to further compression. Often Huffman coding is used for one or both.

My method, LZCL, tries many types of compression on each stream (decisions, literals, distances, lengths) as well as on packed tokens (token = (length-minLength)*(maxDistance+1)+distance) and on decision runs (a sort of run-length encoding for decisions), and selects the shortest for each. Compressors include Huffman, arithmetic, Golomb, and Binary Adaptive Sequential Coding (BASC). All header values are universal coded to avoid worrying about varying symbol lengths or other sizes.
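
As a concrete illustration of the token packing above (the names are mine, not the library's API), the pair can be combined and split back apart like this:

    #include <stdint.h>

    /* Pack a (distance,length) pair into one integer and unpack it.
       minLength and maxDistance are stream-wide constants stored in
       the header. */
    static uint32_t PackToken(uint32_t distance, uint32_t length,
                              uint32_t minLength, uint32_t maxDistance)
    {
        return (length - minLength) * (maxDistance + 1) + distance;
    }

    static void UnpackToken(uint32_t token, uint32_t minLength,
                            uint32_t maxDistance,
                            uint32_t *distance, uint32_t *length)
    {
        *distance = token % (maxDistance + 1);
        *length   = token / (maxDistance + 1) + minLength;
    }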

All the choices are made for low memory decoding situations.

Decompression

The library is a single C file, Decompressor.c, with a single header file, Decompressor.h, that implements multiple decompressors in around 750 lines. <inttypes.h> is used to make integer sizes fixed by using, e.g., uint32_t wherever possible. Four main decompressors are implemented: Huffman, Arithmetic, LZ77 (actually an LZSS variant), and my LZCL. Each can be #ifdef‘d out in the header.

All decompressors use variants of my Lomont1 universal coder to store variable-length items such as data sizes, bit lengths, etc., since testing against other universal coders showed mine worked better for common data uses.

Each decompressor comes in a one-pass decode-to-buffer form, and an incremental form using as little memory as possible.

Compression is generally not very fast, since the algorithms are designed to be easily extended, munged, and used to create more formats if needed. Decompression code is designed to be small, not fast, for the use case I wrote this code for. Decompressor.c fits all four decompression routines into a self-contained, roughly 750-line C file (~100 lines for Huffman, ~200 for arithmetic, ~75 for LZ77, ~250 for LZCL, and the rest support).

Common code

The library implements a few utility routines, such as a function to compute the bits required to store a non-negative integer $N$, $\beta(N)=\lfloor\log_2 N\rfloor+1$. There are a few functions to read bits from a bitstream, and a Lomont1 universal code decoder.
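
A minimal sketch of these utilities (my own naming, not necessarily the library's exact code; later sketches in this article reuse this BitStream reader):

    #include <stdint.h>

    /* beta(N) = floor(log2 N) + 1: bits needed to store N
       (returns 0 for N == 0). */
    static uint32_t BitsRequired(uint32_t n)
    {
        uint32_t bits = 0;
        while (n) { ++bits; n >>= 1; }
        return bits;
    }

    /* A simple MSB-first bit reader over a byte buffer. */
    typedef struct { const uint8_t *data; uint32_t bitPos; } BitStream;

    static uint32_t ReadBit(BitStream *bs)
    {
        uint32_t bit = (bs->data[bs->bitPos >> 3] >> (7 - (bs->bitPos & 7))) & 1;
        bs->bitPos++;
        return bit;
    }

    static uint32_t ReadBits(BitStream *bs, uint32_t count)
    {
        uint32_t value = 0;
        while (count--) value = (value << 1) | ReadBit(bs);
        return value;
    }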

Huffman

The Huffman coder is non-adaptive, in order to make low memory usage possible. Instead the symbol table is stored (in canonical form; see my Compression Notes article), and is parsed to decode each symbol. This, of course, makes decompression slower than using an in-memory decode table, but this was the tradeoff I needed.

The header is stored in fields, each universally coded to handle any size:

  • bits per symbol (usually 8, not always)
  • bits per codeword length (usually 4-6)
  • minimum codeword length (usually 1-4)
  • delta codeword length (usually 9-16)

Then comes the symbol table, which stores the number $N_j$ of codewords of each length $L_j$, followed by the $N_j$ symbols for those codewords. This is enough to decode.
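
A sketch of decoding one symbol by walking that table directly, with no decode table built in RAM (reusing the BitStream reader above; numCodes[len] and the flat symbols[] list are assumed parsed from the header, and the names are mine):

    /* Decode one symbol by walking the canonical code-length counts.
       numCodes[len] = number of codewords of bit length len,
       symbols[]    = the symbols listed in canonical order. */
    static int DecodeHuffmanSymbol(BitStream *bs, const uint32_t *numCodes,
                                   uint32_t maxLen, const uint8_t *symbols)
    {
        uint32_t code = 0, first = 0, index = 0;
        for (uint32_t len = 1; len <= maxLen; ++len) {
            code = (code << 1) | ReadBit(bs);  /* extend code by one bit */
            uint32_t count = numCodes[len];
            if (code < first + count)          /* codeword is complete */
                return symbols[index + (code - first)];
            index += count;                    /* skip this length's symbols */
            first = (first + count) << 1;      /* first code of next length */
        }
        return -1;                             /* malformed stream */
    }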

Arithmetic

Similar to the Huffman version, this is designed to use as little RAM as possible, so the symbol table (which keeps the symbols and frequencies needed for arithmetic decoding) is stored in a special format. This table stores the min and max symbols present, then, for each symbol in that range, stores a Binary Adaptive Sequential Coded string of the frequency values. The first value is Lomont1 encoded. When a value needs to be looked up during decoding, the BASC sequence is walked until it is found.
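
When the decoder needs a symbol, it effectively searches the cumulative frequencies for the range containing the current target count. A sketch of that walk (freq[] here is a plain array standing in for the BASC-coded values the library actually walks; the names are mine):

    #include <stdint.h>

    /* Find the symbol whose cumulative frequency range contains
       'target', returning the symbol and its [low,high) range. */
    static uint32_t FindSymbol(const uint32_t *freq, uint32_t minSym,
                               uint32_t maxSym, uint32_t target,
                               uint32_t *lowOut, uint32_t *highOut)
    {
        uint32_t low = 0;
        for (uint32_t s = minSym; s <= maxSym; ++s) {
            uint32_t high = low + freq[s - minSym];
            if (target < high) {
                *lowOut = low;
                *highOut = high;
                return s;
            }
            low = high;
        }
        *lowOut = *highOut = low;  /* unreachable for a valid stream */
        return maxSym;
    }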

Another problem with arithmetic decoders is knowing when to stop. A common solution is to add an End Of File token to the probabilities, which increases the overhead a little. I chose instead to store the length of the output, and the decoder outputs symbols until enough are decoded.

LZ77

The LZ77 coder is an LZSS variant, using a bit to decide if the next item is a literal or a (distance,length) pair.

After the header, a bit is read. If it is 0, a literal symbol is decoded (of fixed minimal bit length); if it is 1, a token is decoded, which is a single integer, also of minimal bit length over the entire stream. The distance and length are recovered by inverting the packing described earlier: $\text{distance} = \text{token} \bmod (\text{maxDistance}+1)$ and $\text{length} = \lfloor \text{token}/(\text{maxDistance}+1)\rfloor + \text{minLength}$.

Then a run of length bytes is output, starting distance bytes back in the buffer.

For low-memory use, a user-supplied buffer of sufficient length is used cyclically.
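
Putting the pieces together, here is a sketch of the decode loop with a cyclic window (reusing the BitStream reader above; Output() is a hypothetical stand-in for delivering a decoded byte, and all names are mine rather than Decompressor.c's):

    /* LZSS-style decode into a user-supplied cyclic window.
       bitsPerSymbol, bitsPerToken, minLength, and maxDistance come
       from the header. */
    static void DecodeLZ77(BitStream *bs, uint8_t *window, uint32_t windowSize,
                           uint32_t bitsPerSymbol, uint32_t bitsPerToken,
                           uint32_t minLength, uint32_t maxDistance,
                           uint32_t outputLength)
    {
        uint32_t pos = 0, produced = 0;
        while (produced < outputLength) {
            if (ReadBit(bs) == 0) {            /* literal */
                uint8_t c = (uint8_t)ReadBits(bs, bitsPerSymbol);
                window[pos] = c; pos = (pos + 1) % windowSize;
                Output(c); ++produced;
            } else {                           /* (distance,length) token */
                uint32_t token    = ReadBits(bs, bitsPerToken);
                uint32_t distance = token % (maxDistance + 1);
                uint32_t length   = token / (maxDistance + 1) + minLength;
                uint32_t src = (pos + windowSize - distance) % windowSize;
                while (length--) {             /* copy run through window */
                    uint8_t c = window[src]; src = (src + 1) % windowSize;
                    window[pos] = c; pos = (pos + 1) % windowSize;
                    Output(c); ++produced;
                }
            }
        }
    }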

LZCL

LZCL is an LZ77 variant. As such, it has to deal with decisions, literals, and (distance,length) tokens:

  • Decisions: 0/1 flags giving the type of the next item
  • Literals: single bytes to encode
  • Tokens: (distance,length) pairs, each encoded by one of two methods

The compressor separates each into different streams, tries to compress each variant with each available compressor, and stores the minimal version.

The compression algorithms currently tried are Huffman, arithmetic, Golomb, and BASC, as described above.

Decisions are considered either as 1 bit per decision, or as a list of runs of consecutive same values (basic run length encoding). Each of these is tested across each compressor.
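
As an illustration of the run form (my sketch, not the library's code), converting the decision bits into run lengths looks like this:

    #include <stdint.h>

    /* Convert 0/1 decisions into lengths of maximal same-value runs.
       The first run's bit value is decisions[0]; values alternate
       from there. Returns the number of runs written to runs[]. */
    static uint32_t DecisionsToRuns(const uint8_t *decisions, uint32_t n,
                                    uint32_t *runs)
    {
        uint32_t runCount = 0, i = 0;
        while (i < n) {
            uint32_t start = i;
            while (i < n && decisions[i] == decisions[start]) ++i;
            runs[runCount++] = i - start;
        }
        return runCount;
    }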

Tokens, which are (distance,length) pairs, are considered as two separate streams or as one stream with each pair packed as (length-minLength)*(maxDistance+1)+distance. All of these are tested against each compressor.

Literals are similarly tested across each compressor.

The result is the minimal stream obtainable from the current methods.
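
The selection step itself is simple: run every compressor over a stream and keep the winner. A sketch (function-pointer style; names and signatures are mine, not the library's):

    #include <stdint.h>

    /* A compressor takes a byte stream, writes output to dst, and
       returns the compressed size in bits. */
    typedef uint32_t (*CompressFn)(const uint8_t *src, uint32_t srcLen,
                                   uint8_t *dst);

    /* Try each compressor on one stream; return the index of the
       smallest result. Each call overwrites dst here for brevity;
       real code would keep the winning output (e.g., recompress the
       winner or use per-compressor buffers). */
    static int PickSmallest(const CompressFn *compressors, int count,
                            const uint8_t *src, uint32_t srcLen, uint8_t *dst)
    {
        uint32_t bestBits = UINT32_MAX;
        int bestIndex = -1;
        for (int i = 0; i < count; ++i) {
            uint32_t bits = compressors[i](src, srcLen, dst);
            if (bits < bestBits) { bestBits = bits; bestIndex = i; }
        }
        return bestIndex;
    }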

All formats contribute to the LZCL results on the corpus tests below.

Results

On the Calgary and Canterbury corpora (selected since I am targeting smaller files for the embedded work – for larger corpii/corpuses/corp[.]+ use larger datasets), here are the results:

Filename File size Arithmetic ratio Huffman ratio LZ77 ratio LZCL ratio
Calgary\bib 111261 0.651 0.655 0.574 0.435
Calgary\book1 768771 0.566 0.570 0.661 0.486
Calgary\book2 610856 0.599 0.603 0.576 0.420
Calgary\geo 102400 0.709 0.711 0.898 0.645
Calgary\news 377109 0.649 0.654 0.648 0.454
Calgary\obj1 21504 0.755 0.759 0.648 0.527
Calgary\obj2 246814 0.784 0.787 0.484 0.361
Calgary\paper1 53161 0.626 0.629 0.535 0.426
Calgary\paper2 82199 0.577 0.580 0.571 0.448
Calgary\paper3 46526 0.586 0.588 0.568 0.474
Calgary\paper4 13286 0.596 0.598 0.543 0.477
Calgary\paper5 11954 0.627 0.629 0.564 0.478
Calgary\paper6 38105 0.630 0.633 0.561 0.431
Calgary\pic 513216 0.152 0.208 0.153 0.103
Calgary\progc 39611 0.653 0.657 0.502 0.417
Calgary\progl 71646 0.598 0.601 0.362 0.277
Calgary\progp 49379 0.611 0.614 0.351 0.275
Calgary\trans 93695 0.693 0.697 0.412 0.312
Cantebury\alice29.txt 152089 0.572 0.577 0.595 0.439
Cantebury\asyoulik.txt 125179 0.602 0.606 0.652 0.471
Cantebury\cp.html 24603 0.660 0.662 0.501 0.423
Cantebury\fields.c 11150 0.637 0.638 0.389 0.322
Cantebury\grammar.lsp 3721 0.603 0.604 0.435 0.394
Cantebury\lcet10.txt 426754 0.584 0.587 0.576 0.419
Cantebury\plrabn12.txt 481861 0.567 0.572 0.655 0.481
Cantebury\ptt5 513216 0.152 0.208 0.153 0.103
Cantebury\sum 38240 0.673 0.678 0.562 0.412
Cantebury\xargs.1 4227 0.634 0.633 0.517 0.474
Summary 5032533 0.518 0.533 0.510 0.374

Note that the LZ77 and LZCL output sizes can be further reduced by selecting an optimal (max distance, max run length) pair, but doing so optimally takes some time, which I did not spend for these tables. Additional savings can be on the order of 10% or so. The command line tools allow exploring this.

For these 28 files, LZCL made a variety of compressor choices per stream.

I also tested decompression on a PIC32, the target platform. The sample code, running on a PIC32MX150F128B at 40 MHz and compiled with xc32-gcc v1.31 at -O3, obtains the following decompression rates:

Codec Rate (KB/s) Ratio
Huffman 74 67%
Arithmetic 3 66%
LZ77 327 53%
LZCL 16 42%
Incremental Huffman 74 67%
Incremental Arithmetic 3 66%
Incremental LZ77 285 53%
Incremental LZCL 16 42%

This was tested on a compressed version of the 26217-byte file Decompressor.c, with both LZ variants set to allow decoding incrementally into a 257-byte buffer.

Conclusion

The result was a success – I was able to fit all the magic into our upcoming product at my company Hypnocube, LLC! Please drop me a line if this code is useful to you.

Happy hacking!
Chris Lomont