Introduction to Data Compression∗
Guy E. Blelloch
Computer Science Department
Carnegie Mellon University
blelloch@cs.cmu.edu
September 25, 2010
Contents

1 Introduction
2 Information Theory
  2.1 Entropy
  2.2 The Entropy of the English Language
  2.3 Conditional Entropy and Markov Chains
3 Probability Coding
  3.1 Prefix Codes
    3.1.1 Relationship to Entropy
  3.2 Huffman Codes
    3.2.1 Combining Messages
    3.2.2 Minimum Variance Huffman Codes
  3.3 Arithmetic Coding
    3.3.1 Integer Implementation
4 Applications of Probability Coding
  4.1 Run-length Coding
  4.2 Move-To-Front Coding
  4.3 Residual Coding: JPEG-LS
  4.4 Context Coding: JBIG
  4.5 Context Coding: PPM
∗ This is an early draft of a chapter of a book I’m starting to write on “algorithms in the real world”. There are surely many mistakes; please feel free to point them out. In general, the lossless compression part is more polished than the lossy compression part. Some of the text and figures in the Lossy Compression sections are from scribe notes taken by Ben Liblit at UC Berkeley. Thanks for the many comments from students that helped improve the presentation. © 2000, 2001 Guy Blelloch