CS 074 The Digital World

Fall 2007
Computer Science Department
The College of Arts and Sciences
Boston College


Problem Set 4: Huffman Coding and Image Files


Assigned: Saturday October 6, 2007
Due: Sunday October 14, 2007
Points: 30

As with the preceding problem set, we have included a large number of problems at varying levels of difficulty. A "perfect" score is 30 points.

Uncompressed Text

  1. (3 Points) What is the following when interpreted with ASCII codes?

        01000101010000010101001101011001
       

    Answer: EASY
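The bit string can also be decoded mechanically by splitting it into 8-bit groups and looking up each group as an ASCII code. The following is a minimal Python sketch; the function name `decode_ascii` is an illustrative choice, not part of the course tools.

```python
def decode_ascii(bits):
    """Decode a string of 0s and 1s as consecutive 8-bit ASCII codes."""
    if len(bits) % 8 != 0:
        raise ValueError("expected a multiple of 8 bits")
    # Each 8-bit slice is parsed as a base-2 integer, then mapped to a character.
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))


print(decode_ascii("01000101010000010101001101011001"))  # prints EASY
```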

Compression

Remember that the Huffman coding algorithm uses a frequency table and a "cut-in-line" queue to construct a Huffman coding tree, which can then be used to compute a short string of bits for each of the items being compressed. When an item is put in such a queue, it cuts in front of the first item in line of heavier weight. This means that it goes behind any items in line of equal weight.
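The queue rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `frequency_table`, `cut_in_line` and `huffman_codes` are hypothetical, the first item taken off the queue is assumed to become the left child, and left branches are assumed to be labeled 0 and right branches 1.

```python
from collections import Counter


def frequency_table(text):
    """Count each character, writing the space as '_' as in the tables here."""
    return Counter(text.replace(" ", "_"))


def cut_in_line(queue, item):
    """Insert (weight, node) in front of the first strictly heavier entry."""
    weight = item[0]
    for i, (other_weight, _) in enumerate(queue):
        if other_weight > weight:
            queue.insert(i, item)
            return
    queue.append(item)  # nothing heavier: go to the back of the line


def huffman_codes(freqs):
    """freqs is a list of (symbol, count) pairs in their insertion order."""
    queue = []
    for symbol, count in freqs:
        cut_in_line(queue, (count, symbol))
    # Repeatedly merge the two items at the front of the line.
    while len(queue) > 1:
        w1, left = queue.pop(0)
        w2, right = queue.pop(0)
        cut_in_line(queue, (w1 + w2, (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf: a single symbol
            codes[node] = prefix

    walk(queue[0][1], "")
    return codes


example = huffman_codes([("a", 1), ("b", 1), ("c", 2)])
# example == {'c': '0', 'a': '10', 'b': '11'}
```

In the small example, the merged node for a and b has weight 2 and therefore goes behind c, which already has weight 2, so c ends up as the left child of the root.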

You can write up your solutions to the following Huffman problems using the TextEditor.jar application. In particular, you can use plain text to render Huffman coding trees, as suggested in the following:

         12
        /  \
       8    4
      / \  / \
  

  1. (3 Points) Construct a frequency table for the following text.

    
       Oh, that was Manny being Manny, all right. 
      

    Answer:

    MOab eghi lnrs twy_ ,.
    2151 1232 2511 3127 21

  2. (5 Points) Show a Huffman coding tree that would encode the information in the table constructed in the previous problem.

  3. (5 Points) What is the sequence of bits representing the compressed text? What is the space-savings as a percentage of the size of the original text?

  4. (12 Points) A piece of text has been compressed using the Huffman coding algorithm. As it processed the text, the algorithm computed the following frequency table:

    ABEF GHIN OTUV Y_!
    1131 2222 2411 161

    For example, the letter 'E' occurred 3 times in the original text. Assuming that the items are initially inserted in the queue in alphabetical order (with the space, depicted as '_', before '!'), what is the uncompressed text corresponding to the following compressed representation:

    10010010001100011100111100101010110110011101111101100110011111010111011101000011101100110101001010110000110100
     
    (NB: The start of the compressed form is on the left.)

    Answer: The queue evolves as follows:

    A-1, B-1, F-1, U-1, V-1, Y-1, !-1, G-2, H-2, I-2, N-2, O-2, E-3, T-4, _-6
    
                                                         2
                                                        / \
    F-1, U-1, V-1, Y-1, !-1, G-2, H-2, I-2, N-2, O-2, A-1 B-1, E-3, T-4, _-6
    
                                               2         2
                                              / \       / \
    V-1, Y-1, !-1, G-2, H-2, I-2, N-2, O-2, A-1 B-1,  F-1 U-1, E-3, T-4, _-6
    
                                     2         2         2
                                    / \       / \       / \
    !-1, G-2, H-2, I-2, N-2, O-2, A-1 B-1,  F-1 U-1,  V-1 Y-1, E-3, T-4, _-6
    
                           2         2         2              3
                          / \       / \       / \            / \
    H-2, I-2, N-2, O-2, A-1 B-1,  F-1 U-1,  V-1 Y-1,  E-3, !-1 G-2, T-4, _-6
    
                 2         2         2              3              4
                / \       / \       / \            / \            / \
    N-2, O-2, A-1 B-1,  F-1 U-1,  V-1 Y-1,  E-3, !-1 G-2, T-4,  H-2 I-2, _-6
    
       2         2         2              3              4        4
      / \       / \       / \            / \            / \      / \
    A-1 B-1,  F-1 U-1,  V-1 Y-1,  E-3, !-1 G-2, T-4,  H-2 I-2, N-2 O-2, _-6
    
                                                           4
                                                         /   \
       2              3              4        4        2       2
      / \            / \            / \      / \      / \     / \
    V-1 Y-1,  E-3, !-1 G-2, T-4,  H-2 I-2, N-2 O-2, A-1 B-1 F-1 U-1, _-6
    
                                            4                5
                                          /   \            /   \
       3              4        4        2       2        2     E-3
      / \            / \      / \      / \     / \      / \   
    !-1 G-2, T-4,  H-2 I-2, N-2 O-2, A-1 B-1 F-1 U-1, V-1 Y-1     , _-6
    
                             4                5                 7
                           /   \            /   \             /   \
       4        4        2       2        2     E-3         3     T-4
      / \      / \      / \     / \      / \               / \
    H-2 I-2, N-2 O-2, A-1 B-1 F-1 U-1, V-1 Y-1     , _-6, !-1 G-2
    
           4                5                 7            8
         /   \            /   \             /   \         /  \
       2       2        2     E-3         3     T-4     4      4
      / \     / \      / \               / \           / \    / \
    A-1 B-1 F-1 U-1, V-1 Y-1     , _-6, !-1 G-2    , H-2 I-2 N-2 O-2
    
                                                       9
                                                    /     \
                                                  /         \
                7            8                 4                5     
               /  \         /  \             /   \            /   \  
             3    T-4     4      4         2       2        2     E-3
            / \          / \    / \       / \     / \      / \       
     _-6, !-1 G-2    , H-2 I-2 N-2 O-2, A-1 B-1 F-1 U-1  V-1 Y-1
    
                                    9                    13
                                 /     \                /  \
                               /         \             /    \
          8                 4                5      _-6      7
         /  \             /   \            /   \            / \
       4      4         2       2        2     E-3         3  T-4
      / \    / \       / \     / \      / \               / \
    H-2 I-2 N-2 O-2, A-1 B-1 F-1 U-1  V-1 Y-1     ,     !-1 G-2 
    
                                       17
                                     /       \
                                   /            \
          13                     /                    9
         /  \                  /                  /      \
        /    \                /                 /           \            
      _-6      7            8                4                5    
              / \         /  \             /   \            /   \  
             3  T-4     4      4         2       2        2     E-3
            / \        / \    / \       / \     / \      / \       
          !-1 G-2  , H-2 I-2 N-2 O-2, A-1 B-1 F-1 U-1  V-1 Y-1     
    
                             30
                          /      \
                       /                17
                    /                /      \
                 /                 /           \
          13                     /                  9
         /  \                  /                  /    \
        /    \               /                  /          \            
      _-6      7            8                4                5    
              / \         /  \             /   \            /   \  
             3  T-4     4      4         2       2        2     E-3
            / \        / \    / \       / \     / \      / \       
          !-1 G-2  , H-2 I-2 N-2 O-2, A-1 B-1 F-1 U-1  V-1 Y-1     
    
    So the characters are assigned these binary codes (left branches are 0, right branches are 1):

    A 11000    G 0101    O 1011     Y 11101
    B 11001    H 1000    T 011      _ 00
    E 1111     I 1001    U 11011    ! 0100
    F 11010    N 1010    V 11100

    And the message is:

    I HAVE NOT YET BEGUN TO FIGHT!
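Decoding with a Huffman code table, and the space-savings calculation asked for in the earlier problem, can be sketched in Python. The code table below is copied from the answer above; the function names are illustrative, and the savings figure assumes the original text is stored at 8 bits per character.

```python
CODES = {
    "A": "11000", "B": "11001", "E": "1111", "F": "11010",
    "G": "0101", "H": "1000", "I": "1001", "N": "1010",
    "O": "1011", "T": "011", "U": "11011", "V": "11100",
    "Y": "11101", "_": "00", "!": "0100",
}


def encode(text, codes):
    """Concatenate the code of every character (spaces are written as '_')."""
    return "".join(codes[ch] for ch in text.replace(" ", "_"))


def decode(bits, codes):
    """Read bits left to right, emitting a symbol whenever a full code is seen.

    This works because no Huffman code is a prefix of another.
    """
    lookup = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out).replace("_", " ")


def space_savings(text, codes):
    """Fraction of space saved relative to 8 bits per character (an assumption)."""
    return 1 - len(encode(text, codes)) / (8 * len(text))


message = "I HAVE NOT YET BEGUN TO FIGHT!"
assert decode(encode(message, CODES), CODES) == message
```

For this message the encoded form is 110 bits against 240 bits uncompressed, so `space_savings` returns roughly 0.54. The figure ignores the cost of storing the code table itself.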

Image Problems

Overview

The first two image problems use grayscale images, which are not in any standard file format. They can only be viewed in BinEd, using the "Redraw in greyscale" option, and to view them you need to know the width of the image. In particular, your browser does not know how to display them, so your only option when you click on the links for these images is to save them to disk. See the special instructions below for converting them into standard .jpg format.

The color images are in standard .bmp format. Your browser will probably display these when you click on the link, but you then should save them to your disk. If you load such a file into BinEd and try to view it using the "Redraw in Color" option at the correct width, you will see a peculiar wraparound effect at the left-hand side of the image. This is due to the 54-byte header at the start of the file. You must first remove it by applying the formula:


[n+54]

to all lines before beginning subsequent processing.

To save a color image: When you are satisfied that the new image you've created is what you want, click the "Save as Image" button. This will automatically attach the appropriate header to the image and save it as a standard .bmp file. The problem is that standard .bmp files are very large, so you will need to compress them before you put them in your submission folder. If you are using Windows, open the program Paint (go to Start -> Programs -> Accessories -> Paint), open the .bmp file, and then choose Save As... from the File menu to save it in JPEG format. Double-clicking on the original image icon may launch Paint directly, so you may not need to navigate to the program. On a Mac, double-clicking on the image icon launches the program Preview, and from there you can save the file in JPEG format.

To save a grayscale image: The simplest thing to do is to use the "Save As Binary File" option from the Save/Load page. Use the extension ".gs" on the name (this just helps the grader to identify the file). The grader will view your work in BinEd. If you have a lot of such files, it might be preferable to convert them to JPEG format. Unfortunately, this is a little complicated. You must first convert the file to standard .bmp format: apply the formula [N/3] to lines 0 to 3x-1, where x is the total number of lines (displayed at the bottom of the BinaryEditor page). Make sure it looks right when you "View in Color" at the correct width, and then "Save as Image". You can then convert the resulting .bmp file to JPEG as described above.
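The effect of the [N/3] formula can be illustrated outside BinEd. The Python sketch below assumes the grayscale data is a plain sequence of bytes with no header; the name `gray_to_bgr` is hypothetical. Each gray byte is repeated three times, so the blue, green and red bytes of each pixel are equal and the pixel appears gray.

```python
def gray_to_bgr(gray):
    """Output line N takes the value of input line N // 3."""
    return bytes(gray[n // 3] for n in range(3 * len(gray)))


assert gray_to_bgr(bytes([5, 9])) == bytes([5, 5, 5, 9, 9, 9])
```

The result is still headerless pixel data; in BinEd it is the "Save as Image" step that attaches the .bmp header.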

Relating Line Numbers to Pixel Coordinates

The following two facts will be useful in some of the problems below.

  • The byte in line N is in row N / width (counting the bottom as row 0), and in column N % width (counting the leftmost column as column 0). Here / denotes integer division, which discards the remainder.

  • The pixel in row R and column C is represented by the byte in line (width * R + C).

To see how this works, suppose we want to stretch the image by a factor of two in the vertical direction. We need to replace the pixel in row r and column c by the pixel in row r/2 and column c. So we will replace the pixel in line N by the pixel in line:

width * ((N / width) / 2) + (N % width).

Thus, if the image is 200 pixels wide, we would use the formula: [200 * ((N / 200) / 2) + (N % 200)].
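The same index arithmetic can be tried in Python, where integer division is written `//`. This is a sketch under assumptions: the function name `stretch_vertically` is illustrative, the image is a flat list of bytes with the bottom row first, and the output is made twice as tall so the whole stretched image fits.

```python
def stretch_vertically(pixels, width):
    """Double the height: output line n copies the pixel in row (n // width) // 2."""
    height = len(pixels) // width
    return [
        pixels[width * ((n // width) // 2) + (n % width)]
        for n in range(2 * height * width)
    ]


# A 2-pixel-wide image with rows [1, 2] (bottom) and [3, 4] (top):
assert stretch_vertically([1, 2, 3, 4], 2) == [1, 2, 1, 2, 3, 4, 3, 4]
```

Each original row appears twice in the output, which is exactly the stretch by a factor of two.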

Grayscale Problems

  1. (5 points) The image of Thomas Edison on the left below is linked to a grayscale file (.gs) that is 200 pixels wide. Download the grayscale file to your system and use the binary editor to crop the bottom 50 and the top 50 rows as shown on the right. Explain your method.

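    Answer: The Edison picture is 200 pixels wide and 41,800 bytes long, so it has 209 rows. The bottom 50 rows occupy lines 0 through 200 x 50 - 1 = 9,999, and the top 50 rows occupy lines 41,800 - 200 x 50 = 31,800 through 41,799. So apply the formula 255 to both ranges of lines.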


  2. (12 points) The grayscale image below is 264 pixels wide. Rotate the grayscale on the left clockwise 90 degrees as shown below. We'll leave it to you to figure out the coordinates of the pixel that replaces the one in row r and column c. Draw a picture to figure this out. Explain your method.

    Answer: This one's tricky, even with the cheat sheet, because the width of the new image is now the height of the original image, and vice versa. Let's work through this carefully. Denote the width of the original image by w and the height by h. The pixel in line n of the new image is in row n/h and in column n%h. This pixel comes from column w-1-n/h and row n%h of the original image, and thus from line number w*(n%h)+(w-1-n/h). So our formula is:

        [w*(n%h)+w-1-n/h]
        

    and of course the result must be displayed at width h.
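Both grayscale answers can be checked on a tiny image in Python. This is a sketch under assumptions: pixels are stored as a flat list with the bottom row first, as in the line numbering described earlier, and the function names `whiten_rows` and `rotate_clockwise` are illustrative.

```python
def whiten_rows(pixels, width, rows):
    """Set the bottom `rows` rows and the top `rows` rows to white (255)."""
    out = list(pixels)
    span = width * rows
    for n in range(span):                       # bottom rows: lines 0 .. span-1
        out[n] = 255
    for n in range(len(out) - span, len(out)):  # top rows: the last `span` lines
        out[n] = 255
    return out


def rotate_clockwise(pixels, w):
    """New line n copies original line w*(n % h) + (w - 1 - n // h)."""
    h = len(pixels) // w
    return [pixels[w * (n % h) + (w - 1 - n // h)] for n in range(w * h)]


# Original, 3 wide and 2 high (bottom row first: a b c, then d e f):
#     d e f
#     a b c
# Rotated clockwise, 2 wide and 3 high:
#     a d
#     b e
#     c f
assert rotate_clockwise(list("abcdef"), 3) == list("cfbead")
```

The rotated list is read bottom row first, so it starts with the row "c f" and ends with the row "a d".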


Color Bitmaps

  1. (8 points) Manny Ramirez hit a walk-off home run last night against the Cleveland Indians. Manny's reception at home plate is on the left. An all blue version is on the right. Describe how the blue version could be made from the regular one.

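    Answer: The 24-bit color codes are BGR (i.e., blue, green and then red). Once the header has been removed, the blue byte of each pixel is the one whose line number is a multiple of 3. Keeping those bytes and setting the green and red bytes to zero gives an all-blue image, so the following formula works:

    [n] * ((3 - (n % 3)) / 3)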
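One way to obtain an all-blue image is to keep the blue byte of every pixel and set the green and red bytes to zero. The Python sketch below illustrates that idea; it assumes the 54-byte header has already been removed, that the bytes are in BGR order with no row padding, and the name `keep_blue` is hypothetical.

```python
def keep_blue(pixels):
    """Multiply each byte by 1 if it is a blue byte (n % 3 == 0), else by 0."""
    # (3 - n % 3) // 3 is 1 when n % 3 == 0 and 0 when n % 3 is 1 or 2.
    return bytes(b * ((3 - n % 3) // 3) for n, b in enumerate(pixels))


# Two pixels, each stored as blue, green, red:
assert keep_blue(bytes([10, 20, 30, 40, 50, 60])) == bytes([10, 0, 0, 40, 0, 0])
```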


Place the text document, along with all the files you create, in a folder named PS4-Your Last Name, create the zipped archive as you did in the previous labs, and submit through WebCT.