The Digital World-Lab 4
Assigned: February 13
Due: Monday, February 23
These are some exercises on data compression. You are to perform
a number of simple experiments with the sample text, image, and audio
files that I have provided, and then try as best you can to explain the
results. Huffman coding, which we explained in detail in class,
can provide a good intuition for this and help you to explain most of
the results in Exercise 1, but the actual lossless compression method
that you have built in to your computer (zip) works on different
principles. Zip uses a dictionary-based approach and will
compress well if it encounters long sequences of bytes that occur
repeatedly in the file. For example, if the bytes in the file are
(represented here in decimal)
0 1 2 .....255 0 1 2 ......255 0 1 2 ....255
repeated over and over again, zip will compress these to a very small
file, whereas Huffman coding will not compress at all, since it
gives equal weight to all possible byte values. (On the other
hand, Huffman coding in which each pair of successive bytes is treated
as a single symbol will perform very well).
1. The sample files for this exercise consist of (a) the text of The Telltale Heart; (b) a file
consisting of ten thousand randomly-generated ASCII characters; (c)
bitmapped images of three paintings, along with a bitmapped image that
is entirely blue; (d) our wave file of a brief passage played on the
flute; (e) a square wave (the solution to one of the problems in Lab 2).
Download each of these files, compress it, note both the original file
size and the size of the compressed file, and compute the compression
ratio. Try to explain the different compression ratios obtained
for the various text files, various image files, and various audio
files.
2. (a) What happens if you try to compress a file that already
has been compressed? Neither Mac OS nor Windows will let you take
a file of type .zip and try to compress it, but you can trick the
operating system by renaming the files---for instance, just change the
'.zip' to .txt'. This doesn't change the bytes in the file at
all, but you can now go ahead and apply zip again. Do this with
several of the zip files you created in Exercise 1. Note the
results, and try to account for them.
*(b) Explain why it is not possible to devise a data compression method
that is guaranteed to reduce the size of every file.
3. Suppose we applied Huffman coding to the .bmp file that is entirely
blue. What would the size of the compressed file be? (For
purposes of this problem, you can assume that the .bmp file consists
EXCLUSIVELY of the encodings of the pixels, and ignore the 54-bit
header.)