84 lines
4.1 KiB
Plaintext
84 lines
4.1 KiB
Plaintext
![]() |
////////////////////////////////////////////////////////////////////////////
|
||
|
// **** LZW-AB **** //
|
||
|
// Adjusted Binary LZW Compressor/Decompressor //
|
||
|
// Copyright (c) 2016-2020 David Bryant //
|
||
|
// All Rights Reserved //
|
||
|
// Distributed under the BSD Software License (see license.txt) //
|
||
|
////////////////////////////////////////////////////////////////////////////
|
||
|
|
||
|
This is an implementation of the Lempel-Ziv-Welch general-purpose data
|
||
|
compression algorithm. It is targeted at embedded applications that require
|
||
|
high speed compression or decompression facilities where lots of RAM for
|
||
|
large dictionaries might not be available. I have used this in several
|
||
|
projects for storing compressed firmware images, and once I even coded the
|
||
|
decompressor in Z-80 assembly language for speed! Depending on the maximum
|
||
|
symbol size selected, the implementation can require from 2368 to 335616
|
||
|
bytes of RAM for decoding (and about half again more for encoding).
|
||
|
|
||
|
This is a streaming compressor in that the data is not divided into blocks
|
||
|
and no context information like dictionaries or Huffman tables are sent
|
||
|
ahead of the compressed data (except for one byte to signal the maximum
|
||
|
bit depth). This limits the maximum possible compression ratio compared to
|
||
|
algorithms that significantly preprocess the data, but with the help of
|
||
|
some enhancements to the LZW algorithm (described below) it is able to
|
||
|
compress better than the UNIX "compress" utility (which is also LZW) and
|
||
|
is in fact closer to and sometimes beats the compression level of "gzip".
|
||
|
|
||
|
The symbols are stored in "adjusted binary" which provides somewhat better
|
||
|
compression (with virtually no speed penalty) compared to the fixed word
|
||
|
sizes normally used. Once the dictionary is full, the encoder returns to
|
||
|
the beginning and recycles string codes that have not been used yet for
|
||
|
longer strings. In this way the dictionary constantly "churns" based on the
|
||
|
the incoming stream, thereby improving and adapting to optimal compression.
|
||
|
The compression performance is constantly monitored and a dictionary flush
|
||
|
is forced on stretches of negative compression which limits worst-case
|
||
|
performance to about 8% inflation.
|
||
|
|
||
|
LZW-AB consists of three standard C files: the library, a command-line
|
||
|
filter demo using pipes, and a command-line test harness. Each program
|
||
|
builds with a single command on most platforms. It has been designed with
|
||
|
maximum portability in mind and should work correctly on big-endian as well
|
||
|
as little-endian machines.
|
||
|
|
||
|
Linux:
|
||
|
% gcc -O3 lzwfilter.c lzwlib.c -o lzwfilter
|
||
|
% gcc -O3 lzwtester.c lzwlib.c -o lzwtester
|
||
|
|
||
|
Darwin/Mac:
|
||
|
% clang -O3 lzwfilter.c lzwlib.c -o lzwfilter
|
||
|
% clang -O3 lzwtester.c lzwlib.c -o lzwtester
|
||
|
|
||
|
MS Visual Studio:
|
||
|
cl -O2 lzwfilter.c lzwlib.c
|
||
|
cl -O2 lzwtester.c lzwlib.c
|
||
|
|
||
|
There are Windows binaries (built on MinGW) for the filter and the tester on the
|
||
|
GitHub release page (v3). The "help" display for the filter looks like this:
|
||
|
|
||
|
Usage: lzwfilter [-options] [< infile] [> outfile]
|
||
|
|
||
|
Operation: compression is default, use -d to decompress
|
||
|
|
||
|
Options: -d = decompress
|
||
|
-h = display this "help" message
|
||
|
-1 = maximum symbol size = 9 bits
|
||
|
-2 = maximum symbol size = 10 bits
|
||
|
-3 = maximum symbol size = 11 bits
|
||
|
-4 = maximum symbol size = 12 bits
|
||
|
-5 = maximum symbol size = 13 bits
|
||
|
-6 = maximum symbol size = 14 bits
|
||
|
-7 = maximum symbol size = 15 bits
|
||
|
-8 = maximum symbol size = 16 bits (default)
|
||
|
-v = verbose (display ratio and checksum)
|
||
|
|
||
|
Here's the "help" display for the tester:
|
||
|
|
||
|
Usage: lzwtester [options] file [...]
|
||
|
|
||
|
Options: -1 ... -8 = test using only specified max symbol size (9 - 16)
|
||
|
-0 = cycle through all maximum symbol sizes (default)
|
||
|
-e = exhaustive test (by successive truncation)
|
||
|
-f = fuzz test (randomly corrupt compressed data)
|
||
|
-q = quiet mode (only reports errors and summary)
|
||
|
|