CS 074 The Digital World

Fall 2007
Computer Science Department
The College of Arts and Sciences
Boston College

Problem Set 2: Working with Bits

Assigned: Tuesday September 11, 2007
Due: Tuesday September 18, 2007
Points: 10 points.

Before beginning the exercises, review the instructions on the problem sets page on how to submit assignments, and the last section of this lab on what to submit. Don't be intimidated by the lengthy instructions for Exercise 3 --- we'll go over them in class. Exercise 4 is for dessert.

Exercise 1: File Sizes

Review the material on powers of two and the exact meaning of kilobyte, megabyte, etc. Examine the sizes of several files: large (>1 MB), medium (10 or 20 KB), and small (<2 KB). If you wish, you can make small text files by typing a few lines with the text editor, described in the next exercise.

In Windows you find the file size by right-clicking on the file icon and selecting "Properties" from the pop-up menu. You will see four numbers displayed--the "size" given both in units of KB or MB and as an exact number of bytes; and the "size on disk" similarly expressed in two different ways.

On a Mac, you get somewhat less information. You find the file size by clicking on the file icon and then selecting "Get Info" from the "File" menu. You will see the size on disk given in units of KB or MB, as well as the exact size in bytes.
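As a cross-check on the numbers these windows report, the exact size in bytes can also be read with a short program. The following is a minimal Python sketch and is not part of the course software. The function name `describe_size` and the throwaway 3000-byte file are made up for illustration. It uses the power-of-two convention (1 KB = 1024 bytes, 1 MB = 1024 KB) and does not report "size on disk".

```python
import os
import tempfile

KB = 2 ** 10  # 1024 bytes
MB = 2 ** 20  # 1,048,576 bytes

def describe_size(path):
    """Return (exact bytes, size in KB, size in MB) for the file at path."""
    n = os.path.getsize(path)
    return n, n / KB, n / MB

# Demonstration on a throwaway 3000-byte file (hypothetical example data).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 3000)
    demo_path = f.name

size_bytes, size_kb, size_mb = describe_size(demo_path)
print(size_bytes, "bytes =", round(size_kb, 2), "KB")
os.remove(demo_path)
```

To try it on one of your own files, pass that file's path to `describe_size` in place of `demo_path`.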

(a) Note down the results you see--you should get results from at least three different files. Pay close attention to the numbers you see for "size on disk".

(b) Describe as precisely as you can the relationship between (i) the exact size in bytes; (ii) the size expressed in KB or MB (if you're using Windows); (iii) the "size on disk" expressed in KB or MB.

(c) Suppose a file's exact size is 8100 bytes. What do you think will be its "size on disk" expressed in KB? What if the exact size is 8200 bytes? Explain how you found the answer.

Exercise 2: Play with the Text Editor

Review the material on ASCII and Unicode in the lecture notes. Although you don't really need it, you can find Unicode tables on the web; these give the two-byte Unicode encodings for many different character sets. The course software package includes a file called TextEditor.jar, a very simple text editor application. This permits you to create and edit text files in either ASCII or Unicode format. To launch the program, double-click on its icon.

When the editor window appears, type a few lines of text. Make sure to include some blank spaces, several different lines, and some tabs, but don't go overboard, as you will need to count all the characters you type. When you've finished, save the result (using the option "Save" from the "File" menu) in the format "US-ASCII-Basic Latin Characters" with the name "file1".
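To make the two save formats concrete, here is a small Python sketch (not part of TextEditor.jar) that encodes one short string both ways and prints the resulting bytes in hex. The sample string and the helper name `hex_bytes` are illustrative assumptions. The sketch also assumes that the editor's Unicode format stores two bytes per character, as the two-byte tables mentioned above suggest. It uses Python's big-endian UTF-16 variant without a byte-order mark, which may differ from what the editor actually writes.

```python
def hex_bytes(data):
    """Format a bytes object as space-separated two-digit hex values."""
    return " ".join(f"{b:02X}" for b in data)

sample = "Hi"  # hypothetical sample text

ascii_bytes = sample.encode("ascii")      # one byte per character
utf16_bytes = sample.encode("utf-16-be")  # two bytes per character here

print(len(ascii_bytes), hex_bytes(ascii_bytes))
print(len(utf16_bytes), hex_bytes(utf16_bytes))
```

The same characters produce different bytes in the two formats, so a file only makes sense when it is opened in the format it was saved in.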

(a) What is the exact relationship between the number of visible characters, spaces, tabs, and lines in the file, and the exact file size in bytes? Does a space count for one byte? no bytes? more? Does a tab (which looks like 6 or so spaces) count as 6 bytes? less? more? What about a blank line? You may have to experiment a little, revising the file, to be sure of the answer. (This is not a trick question; in the end, the answer is very simple and probably what you thought.)

(b) Type a file that consists only of the line of text

"a"b"c"d"e"f"g"h
and save it in the format "US-ASCII-Basic Latin Characters" with the name file2. (If you're using a Mac, or if you have Chinese fonts installed on your Windows computer, you might instead try typing OaObOcOdOeOfOgOh.)

Now select Open from the File menu, and open the very same file but in the format "UTF-16 Unicode".

What do you see? Pay particular attention to the number of characters you typed, and the number of characters that are actually displayed. How do you account for the result? Be as precise as possible.

Exercise 3: Edit the HTML Source of a Web Page

You may have already used fancy software for editing and creating Web pages. Since our goal in this course is understanding rather than productivity or efficiency, I will often have you work with plain and simple tools, so you can see in detail how things work. In this exercise you will use a text editor to edit a Web page.

Save a copy of the following web page to your CS074 folder. The "File" menu of your Web browser will offer you several alternative methods for saving it, either as a "complete Web page" or "HTML only"---exactly how the alternatives are described depends upon the browser you use. You should save in the minimal format, which will be described as something like "HTML only" or "Page Source".

You can open the saved copy of the page in your Web browser. Choose the "Open file" option from the "File" menu of the browser--again, exactly how this is phrased will depend on the browser you are using--and navigate to the copy that you saved. If you did the download correctly, you should see the web page, but without the picture.

Keep the browser window open, start up the Text Editor program, and use it to open the saved copy of the page. You'll see something completely different: The file consists entirely of text. This text file is in a language called HTML (which stands for HyperText Markup Language)---it consists of both the text that appears on the page, and instructions to the Web browser about how to display the page. These instructions are enclosed between < and >. Your Web browser will also allow you to view the HTML source of a page, but by opening it in the Text Editor, you can modify the page. In this exercise you will make a few changes to the page.

(a) Change my name to yours. Use the text editor to replace both occurrences of my name by yours and save the result. (Use the US-ASCII format.) To see the changes in the page, click the Reload button on the browser window.

What's hyper about HyperText is this business about links: the text you click on to go to another Web page. If you look at the HTML source of one of these links you'll see:

<a href="http://en.wikipedia.org/wiki/Ducks"> duck </a>

The tag a, with its href attribute, tells the browser that this is a link. The thing in quotation marks is the location of the file that contains the HTML code of the page being linked to. The phrase between the start tag and the end tag is the link text that is displayed on the page.

Now, links don't have to link to other web pages --- they can link to images or audio files or many other sorts of file. Your Web browser contains some facility for displaying or playing these. Also, what you click on doesn't have to be text. Both these things are illustrated in the link:

<a href="Duck01.jpg"><img src="Duck01.jpg" height="125" width="125"></a>

The link target is an image file (again, on the department Web server) called "Duck01.jpg". The thing you click on is the same image: the img tag, with its src attribute, tells the browser to display the image file "Duck01.jpg" on the page, and the height and width attributes tell the browser to fit the image to certain dimensions.
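One way to see that a browser treats these instructions as structured data rather than ordinary text is to let a program pick them apart. The following Python sketch uses the standard library's `html.parser` module on the two example links above. The class name `LinkLister` and the variable names are made up for illustration. A real browser does far more than this.

```python
from html.parser import HTMLParser

class LinkLister(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    def __init__(self):
        super().__init__()
        self.targets = []  # where each link leads
        self.images = []   # which image files are displayed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.targets.append(attrs["href"])
        elif tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])

# The two links quoted in this exercise.
snippet = (
    '<a href="http://en.wikipedia.org/wiki/Ducks"> duck </a>\n'
    '<a href="Duck01.jpg"><img src="Duck01.jpg" height="125" width="125"></a>'
)

lister = LinkLister()
lister.feed(snippet)
print("link targets:", lister.targets)
print("images shown:", lister.images)
```

The second link contributes an entry to both lists: one for the file you are taken to when you click, and one for the image that is displayed as the thing to click on.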

(b) Find two .jpg files (they could be pictures of you, but they don't have to be---you can get any .jpg files you like off of the web) and save copies of them into the same folder as your saved copy of the web page. Edit the link with the picture of me so that (i) the link shown on the page is one of your two pictures and (ii) when you click on this link, the other picture is displayed.

One of the instructions on the page specifies the background color. This specifies the color as six hex digits, in this case EEEEEE. But the way to read this is as the hexadecimal representations of three separate byte values: EE EE EE. These represent, respectively, the red, green and blue components of the color. Higher means brighter, and when all three components are the same, you get some shade of grey. FF FF FF (255 255 255 in decimal) gives you white. EE EE EE (238 238 238 in decimal) is a very light grey, and FF 00 00 (255 0 0) is a very rich red. (We'll have more to say in a couple of weeks about this way of representing colors.)
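The conversion from six hex digits to three decimal byte values can be checked with a few lines of Python. This is only a sketch for experimenting. The function name `hex_to_rgb` is made up for illustration, and the three codes are the ones discussed above.

```python
def hex_to_rgb(color):
    """Split a six-digit hex color into (red, green, blue) byte values."""
    color = color.lstrip("#")
    if len(color) != 6:
        raise ValueError("expected six hex digits")
    # Each pair of hex digits is one byte, so each value is 0 to 255.
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))

for code in ("FFFFFF", "EEEEEE", "FF0000"):
    print(code, "->", hex_to_rgb(code))
```

Feeding it other six-digit codes is a quick way to see which red, green and blue values a given color specification stands for.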

(c) Edit the hex encoding of the background color so that the background becomes yellow. (It's not obvious---not to me, at any rate---how to get yellow by mixing different proportions of red, green and blue. You can come up with the answer by guessing and trying it out. Another way is to hunt around in some standard application like Microsoft Word or PowerPoint that contains drawing tools or ways to change the color of text. These include some way of specifying a "custom color" in terms of its red, green and blue components.)

Exercise 4: Fun and Games

Amazing Telepathic Program

Computers are now capable of reading minds, as you can see from this on-line demonstration of computer ESP. The site also contains many efforts astonished users of this program have made to explain this extraordinary phenomenon. You might want to offer your own explanation. (Try the test once, then think hard about it. This doesn't have a whole lot to do with computer science, except that it's a good workout for your powers of logical reasoning, powers that come in pretty handy in this course.)

Twenty Questions

Here is another "mind-reading" program. In class I will demonstrate a twenty questions game. Try to stump it (but don't try too hard; think about relatively familiar objects) and provide a transcript of your exchange with the program. (You can copy this directly from the website and paste it into the answers you submit.)

This one does have quite a lot to do with computer science. We'll discuss in class how a program like this might work, and return to the point later on.

There is a Chisel in My Cat

In a course filled with marvelous things computers can do, it's refreshing to see something that they're really really bad at. You'll find several pages on the Web for language translation. This one is part of Google's site:

http://www.google.com/language_tools?hl=en

Type a phrase in English, use the tool to translate it into another language, and then translate it back into English. You don't have to know a foreign language, and you don't have to type anything--just copy and paste. If you use French and German, you can do a circle of three translations English-->French-->German-->English, or the other way around. These circles give more amusing results than just translating into one language and back again.

The goal of the exercise is to find perfectly reasonable English sentences that yield ridiculous translations when processed in this way. Submit your silliest example. When I translated "My cat has fleas" into Japanese and back into English again, I got the title of this exercise.

It's worth thinking about why this is so hard for computers. If you find a language translation website that you think does a better job than this one, let me know.

What to Submit and How to Submit it

The general instructions are contained on the problem sets page. For this problem set, your submission folder should contain: (a) a text file containing your answers to the questions in Exercises 1 and 2, your theory--if you have one--about the ESP program, your conversation with 20Q, and your ridiculous translation; (b) the edited HTML file for Exercise 3, along with the two image files.