CS 074 The Digital World

Fall 2007
Computer Science Department
The College of Arts and Sciences
Boston College


Problem Set 4: Huffman Coding and Image Files


Assigned: Saturday October 6, 2007
Due: Sunday October 14, 2007
Points: 30

As with the preceding problem set, we have included a large number of problems at varying levels of difficulty. A "perfect" score is 30 points.

Uncompressed Text

  1. (3 Points) What is the following when interpreted with ASCII codes?

        01000101010000010101001101011001
       

Compression

Remember that the Huffman coding algorithm uses a frequency table and a "cut-in-line" queue to construct a Huffman coding tree, which can be used to compute short strings of bits for each of the items being compressed. When an item is put in such a queue, it cuts in front of the first item in line of heavier weight. This means that it goes behind items in line of equal weight.
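
The queue rule and tree construction described above can be sketched in Python. This is only an illustrative sketch, not the course's reference implementation: the names cut_in_line, build_tree, and codes are hypothetical, and two details are assumptions that may differ from the convention used in class, namely that the first item removed from the queue becomes the left child, and that left branches are labeled 0 and right branches 1.

```python
def cut_in_line(queue, item):
    """Insert (weight, tree) in front of the first strictly heavier item.

    Items of equal weight stay ahead of the newcomer.
    """
    weight = item[0]
    for i, (w, _) in enumerate(queue):
        if w > weight:
            queue.insert(i, item)
            return
    queue.append(item)


def build_tree(freqs):
    """Build a tree from a non-empty list of (symbol, count) pairs.

    Symbols are strings; the list order is the insertion order.
    Leaves are symbols, internal nodes are (left, right) tuples.
    """
    queue = []
    for symbol, count in freqs:
        cut_in_line(queue, (count, symbol))
    while len(queue) > 1:
        w1, left = queue.pop(0)    # assumption: first removed goes left
        w2, right = queue.pop(0)
        cut_in_line(queue, (w1 + w2, (left, right)))
    return queue[0][1]


def codes(tree, prefix=""):
    """Map each symbol to its bit string (assumption: left=0, right=1)."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}
    left, right = tree
    table = codes(left, prefix + "0")
    table.update(codes(right, prefix + "1"))
    return table


# Toy frequency table, unrelated to the problems below.
tree = build_tree([("a", 1), ("b", 1), ("c", 2)])
print(tree)         # ('c', ('a', 'b'))
print(codes(tree))  # {'c': '0', 'a': '10', 'b': '11'}
```

In the toy run, the merged node for a and b has weight 2, so it goes behind c (also weight 2) rather than in front of it, which is the equal-weight rule at work.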

You can write up your solutions to the following Huffman problems using the TextEditor.jar application. In particular, you can render Huffman coding trees in plain text, as suggested in the following:

         12
        /  \
       8    4
      / \  / \
  

  1. (3 Points) Construct a frequency table for the following text.

    
       Oh, that was Manny being Manny, all right. 
      

  2. (5 Points) Show a Huffman coding tree that would encode the information in the table constructed in the previous problem.

  3. (5 Points) What is the sequence of bits representing the compressed text? What is the space savings, as a percentage of the size of the original text?

  4. (12 Points) A piece of text has been compressed using the Huffman coding algorithm. As it processed the text, the algorithm computed the following frequency table:

    ABEF GHIN OTUV Y_!
    1131 2222 2411 161

    For example, the letter 'E' occurred 3 times in the original text. Assuming that the items are initially inserted in the queue in alphabetical order (with the space, depicted as '_', before '!'), what is the uncompressed text corresponding to the following compressed representation:

    100100101010110110011101111101100110011111010111011101000011101100110101001010110000110100
     
    (NB: The start of the compressed form is on the left.)
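
As a rough illustration of how a compressed bit string is read back and how space savings can be computed, here is a minimal Python sketch. The names decode and space_savings are hypothetical, the code table is a toy one that is unrelated to the problems above, and the original text is assumed to use 8 bits per character.

```python
def decode(bits, code_table):
    """Read bits left to right, emitting a symbol whenever a full code matches.

    This works because no Huffman code is a prefix of another.
    """
    inverse = {code: symbol for symbol, code in code_table.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("leftover bits do not form a complete code")
    return "".join(out)


def space_savings(num_chars, num_compressed_bits, bits_per_char=8):
    """Percentage saved relative to the uncompressed size.

    Assumption: the original text uses bits_per_char bits per character.
    """
    original_bits = num_chars * bits_per_char
    return 100.0 * (original_bits - num_compressed_bits) / original_bits


toy_table = {"c": "0", "a": "10", "b": "11"}  # toy codes, not from the problem set
print(decode("010110", toy_table))            # cabc
print(space_savings(4, 6))                    # 81.25
```

The savings figure here ignores the cost of storing the frequency table or tree alongside the compressed bits.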

Image Problems

Overview

The first two image problems use grayscale images. The grayscale images are not in any standard file format. They can only be viewed in BinEd, using the "Redraw in greyscale" option, and to view them you need to know the width of the image. In particular, your browser does not know how to display them, so your only option when you click on the links for these images is to save them to disk. See the special instructions below for converting these into standard .jpg format.

The color images are in standard .bmp format. Your browser will probably display these when you click on the link, but you should then save them to your disk. If you load such a file into BinEd and try to view it using the "Redraw in Color" option at the correct width, you will see a peculiar wraparound effect at the left-hand side of the image. This is due to the 54-byte header at the start of the file. You must first remove it by applying the formula

    [N+54]

to all lines before beginning subsequent processing.
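
To see what this formula does, here is a small Python sketch that mimics it on a byte string: the byte at line N is replaced by the byte at line N + 54, which shifts the pixel data to the front and drops the header. The function name strip_header is hypothetical, and the sketch assumes that lines past the end of the shifted data are simply discarded; BinEd itself may handle them differently.

```python
HEADER_SIZE = 54  # size of the .bmp header, as stated above


def strip_header(data):
    """Return the bytes obtained by replacing line N with line N + 54."""
    return bytes(data[n + HEADER_SIZE] for n in range(len(data) - HEADER_SIZE))


# Fake 60-byte "file": 54 header bytes followed by 6 bytes of pixel data.
fake_file = bytes(range(60))
print(list(strip_header(fake_file)))  # [54, 55, 56, 57, 58, 59]
```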

To save a color image: When you are satisfied that the new image you've created is what you want, click the "Save as Image" button. This will automatically attach the appropriate header to the image and save it as a standard .bmp file. Standard .bmp files are very large, however, so you will need to compress them before you put them in your submission folder. If you are using Windows, open the program Paint (Start -> Programs -> Accessories -> Paint), open the .bmp file, and then choose Save As... from the File menu to save it in JPEG format. Double-clicking on the original image icon may launch Paint directly, in which case you do not need to navigate to the program. On a Mac, double-clicking on the image icon launches the program Preview, and from there you can save the file in JPEG format.

To save a grayscale image: The simplest thing to do is to use the "Save As Binary File" option from the Save/Load page. Use the extension ".gs" on the name (this just helps the grader to identify the file). The grader will view your work in BinEd. If you have a lot of such files, it might be preferable to convert them to JPEG format. Unfortunately, this is a little complicated. You must first convert the file to standard .bmp format: apply the formula [N/3] to lines 0 to 3x-1, where x is the total number of lines (displayed at the bottom of the BinaryEditor page). Make sure it looks right when you "View in Color" at the correct width, and then "Save as Image". You can then convert the resulting .bmp file to JPEG as described above.
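
The effect of the [N/3] formula can be sketched in Python as follows. The function name gray_to_color is hypothetical, and the sketch assumes that a color pixel occupies three consecutive bytes, so that copying each gray byte three times gives all three color components the same value.

```python
def gray_to_color(gray):
    """Apply [N/3] to lines 0 .. 3x-1, where x is the number of gray bytes."""
    x = len(gray)
    return bytes(gray[n // 3] for n in range(3 * x))


print(list(gray_to_color(bytes([10, 200]))))  # [10, 10, 10, 200, 200, 200]
```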

Relating Line Numbers to Pixel Coordinates

The following two facts will be useful in some of the problems below.

  • The byte in line N is in row
    N / width
    (counting the bottom as row 0), and in column
    N % width
    (counting the leftmost column as column 0). Here / is integer division, which discards the remainder, and % gives the remainder.

  • The pixel in row R and column C is represented by the byte in line
    width * R + C

To see how this works, suppose we want to stretch the image by a factor of two in the vertical direction. We need to replace the pixel in row R and column C by the pixel in row R/2 and column C. So we will replace the pixel in line N by the pixel in line:

width * ((N / width) / 2) + (N % width).

Thus, if the image is 200 pixels wide, we would use the formula: [200 * ((N / 200) / 2) + (N % 200)].
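
The two facts and the stretch formula can be tried out in Python on a tiny grayscale image with one byte per pixel. This is only a sketch: the function names are hypothetical, Python's // operator plays the role of the integer division written as / above, and the output is assumed to be twice as tall as the input so that the whole stretched image fits.

```python
def row_col(n, width):
    """Row and column of the byte in line n (row 0 is the bottom row)."""
    return n // width, n % width


def line_number(r, c, width):
    """Line holding the pixel in row r and column c."""
    return width * r + c


def stretch_vertical(pixels, width):
    """Replace line N by line width * ((N / width) / 2) + (N % width).

    Assumption: the result has twice as many lines as the original.
    """
    return bytes(
        pixels[width * ((n // width) // 2) + (n % width)]
        for n in range(2 * len(pixels))
    )


# A 2-pixel-wide image with two rows: bottom row 1 2, next row 3 4.
image = bytes([1, 2, 3, 4])
print(row_col(3, 2))                      # (1, 1)
print(line_number(1, 1, 2))               # 3
print(list(stretch_vertical(image, 2)))   # [1, 2, 1, 2, 3, 4, 3, 4]
```

Each original row appears twice in the output, which is what stretching by a factor of two in the vertical direction means.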

Grayscale Problems

  1. (5 points) The image of Thomas Edison on the left below is linked to a grayscale file (.gs) that is 200 pixels wide. Download the grayscale file to your system and use the binary editor to crop the bottom 50 and the top 50 rows, as shown on the right. Explain your method.

  2. (12 points) The grayscale image below is 264 pixels wide. Rotate the grayscale image on the left 90 degrees clockwise, as shown below. We'll leave it to you to figure out the coordinates of the pixel that replaces the one in row r and column c. Draw a picture to figure this out. Explain your method.

Color Bitmaps

  1. (8 points) Manny Ramirez hit a walk-off home run last night against the Cleveland Indians. Manny's reception at home plate is on the left. An all blue version is on the right. Describe how the blue version could be made from the regular one.

Place the text document, along with all the files you create, in a folder named PS4-Your Last Name, create the zipped archive as you did in the previous labs, and submit through WebCT.