The Digital World-Lab 4

Assigned: February 13
Due: Monday, February 23

These are some exercises on data compression. You are to perform a number of simple experiments with the sample text, image, and audio files that I have provided, and then try as best you can to explain the results. Huffman coding, which we explained in detail in class, can provide a good intuition for this and help you to explain most of the results in Exercise 1, but the actual lossless compression method that you have built in to your computer (zip) works on different principles. Zip uses a dictionary-based approach and will compress well if it encounters long sequences of bytes that occur repeatedly in the file. For example, if the bytes in the file are (represented here in decimal)

0 1 2 .....255 0 1 2 ......255 0 1 2 ....255

repeated over and over again, zip will compress these to a very small file, whereas Huffman coding will not compress at all, since it gives equal weight to all possible byte values. (On the other hand, Huffman coding in which each pair of successive bytes is treated as a single symbol will perform very well).

1. The sample files for this exercise consist of (a) the text of The Telltale Heart; (b) a file consisting of ten thousand randomly-generated ASCII characters; (c) bitmapped images of three paintings, along with a bitmapped image that is entirely blue; (d) our wave file of a brief passage played on the flute; (e) a square wave (the solution to one of the problems in Lab 2).

Download each of these files, compress it, note both the original file size and the size of the compressed file, and compute the compression ratio. Try to explain the different compression ratios obtained for the various text files, various image files, and various audio files.

2. (a) What happens if you try to compress a file that already has been compressed? Neither Mac OS nor Windows will let you take a file of type .zip and try to compress it, but you can trick the operating system by renaming the files---for instance, just change the '.zip' to .txt'. This doesn't change the bytes in the file at all, but you can now go ahead and apply zip again. Do this with several of the zip files you created in Exercise 1. Note the results, and try to account for them.

*(b) Explain why it is not possible to devise a data compression method that is guaranteed to reduce the size of every file.

3. Suppose we applied Huffman coding to the .bmp file that is entirely blue. What would the size of the compressed file be? (For purposes of this problem, you can assume that the .bmp file consists EXCLUSIVELY of the encodings of the pixels, and ignore the 54-bit header.)