CS074: The Digital World
Lab 1 Solutions
Exercise 1 - File Sizes
Review the material in the first lecture on powers of two and the exact
meaning of kilobyte, megabyte, etc.
(a)
Examine the sizes of several files: large (larger than 1MB in
size),
medium (10 or 20 KB) and small (less than 2KB). You can use the
TextEditor to create the small files, as described above. Note down the
results you see--you should get results from at least
three different files. Pay close attention to the numbers you see for
"size on disk". You should also indicate what the file is (e.g.,
an audio file in your iTunes library, a Microsoft Word document, the
program file that contains Microsoft Word itself, etc.)
Answer: I did this on a Mac. The
brief text file described in (c) below has 43 bytes; the same brief
text saved as a Microsoft Word .doc file has 22,528 bytes, and the file
containing the application code for Microsoft Word itself contains
62,586,055 bytes. The "size on disk" for each of these files was,
respectively, 4KB, 24KB, 63.7MB.
(b) Describe as precisely as you can the relationship between (i)
the exact size in
bytes; (ii) the size expressed in KB or MB (if you're using Windows);
(iii) the "size on disk" expressed in KB or MB. Suppose a file's exact
size is 8100 bytes. What do you think will be its
"size on disk" expressed in KB? What if the exact size is 8200
bytes? Explain how you found the answer.
Answer: The size on disk appears
to be the file size rounded up to the nearest multiple of 4KB.
(The reason is that the file system hands out disk space in 4KB
blocks.) To test this theory, we should consider what happens
with a file that is exactly 8100 bytes long, and one that is 8200 bytes
long. Since 8 KB = 8192 bytes, a file with 8100 bytes should
occupy 8KB on disk, but one with 8200 bytes should occupy 12KB on
disk. I tried this experiment adapting the procedure outlined in
(d) below to make large text files, and found that the size on disk was
as predicted.
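The rounding rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the 4096-byte block size is what I observed on my machine, not a universal constant, and the name size_on_disk is my own.

```python
BLOCK_SIZE = 4096  # assumed allocation unit (4 KB); other disks may differ


def size_on_disk(size_bytes, block_size=BLOCK_SIZE):
    """Round a file size up to the next whole multiple of block_size."""
    blocks = (size_bytes + block_size - 1) // block_size  # ceiling division
    return blocks * block_size


for size in (43, 22528, 8100, 8200):
    print(size, "bytes ->", size_on_disk(size) // 1024, "KB on disk")
```

For 8100 bytes this gives 8 KB, and for 8200 bytes it gives 12 KB, matching the experiment.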
Now, if you haven't already done so, use the Text Editor to create a
small file. Make sure to include some blank spaces, several
different lines, and
some tabs, but don't go overboard, as you will need to count all the
characters you type. When you've finished, save the result (using
the option "Save" from the "File" menu) in the format "US ASCII-Basic
Latin Characters" with the name "file1".
(c) What is the exact
relationship between the number of visible
characters, spaces, tabs, and lines in the file, and the exact file
size in bytes? Does a space count for one byte? no bytes?
more? Does a tab (which looks like 6 or so spaces) count as 6
bytes? less? more? What about a blank line? You may have to experiment
a little, revising the
file, to be sure of the answer. (This is not a trick question; in the
end, the answer is very simple and probably what you thought.)
My text file looks like this:
This is a sample file.
I hope you like
it.
Answer: The exact file size is 43
bytes. The file contains 33 visible characters (the letters and
the periods), 7 spaces, 1 tab, and 3 lines altogether. Each of the two
'Returns' I typed to advance to the next line added one byte to the
file, and each tab and space added one byte, so the total is 2 + 1 + 7
+ 33 = 43. Briefly, each character is a single byte, but
'character' includes such non-printing characters as spaces, tabs and
new lines.
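If you'd rather let the computer do the counting, here is a small Python sketch. The string below is my reconstruction of the sample file: I am assuming the single tab sits in front of "it." and that there is no Return after the last line, which is what makes the count come out to 43.

```python
# Assumed contents of file1: three lines, one tab before "it.", no final Return.
text = "This is a sample file.\nI hope you like\n\tit."

data = text.encode("ascii")  # one byte per character in US ASCII

visible = sum(1 for ch in text if not ch.isspace())  # letters and periods
spaces = text.count(" ")
tabs = text.count("\t")
newlines = text.count("\n")

print("visible:", visible, "spaces:", spaces, "tabs:", tabs, "newlines:", newlines)
print("total bytes:", len(data))
```

The total is simply the sum of the four counts, since every character, printing or not, takes one byte.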
(d) Use the text editor to
make a text file that is more than 1 MB in size, with as little typing
as possible. Tell me how you did it, but PLEASE don't mail me the
file!
Answer: Type a long line of text, say
100 characters. Then repeatedly select the whole document, copy,
and paste the copy at the end. Each time you do this, the document size doubles, so
that after n repetitions, the document is 100 x 2^n
bytes long. We get 1 MB when 2^n is
larger than 10,000 (actually, somewhat larger even than that, since 1MB
is 1,048,576 bytes, more than 1 million), so 14 repetitions are sufficient:
2^14 = 16,384, which gives 1,638,400 bytes.
If you really want to do as little typing
as possible, you can start with a single character, and do 20
repetitions, which gives exactly 2^20 = 1,048,576 bytes (1 MB); a 21st
repetition takes you past 1 MB.
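The doubling argument is easy to check with a short loop. This Python sketch uses a helper name of my own, doublings_needed, and takes 1 MB to be 2^20 bytes as in the lecture.

```python
MB = 2 ** 20  # 1,048,576 bytes


def doublings_needed(start_bytes, target_bytes):
    """Smallest n such that start_bytes * 2**n is at least target_bytes."""
    if start_bytes <= 0:
        raise ValueError("start_bytes must be positive")
    n = 0
    size = start_bytes
    while size < target_bytes:
        size *= 2  # one select-all, copy, paste
        n += 1
    return n


print(doublings_needed(100, MB))    # starting from a 100-character line
print(doublings_needed(1, MB))      # starting from one character
print(doublings_needed(1, MB + 1))  # strictly more than 1 MB
```

Starting from 100 characters it takes 14 doublings. Starting from a single character it takes 20 to reach 1 MB exactly, and 21 to go strictly beyond it.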
Exercise 2 - Make a Web Page
I'll just give the answers to (d) and
(e), since these are really the only places where you have to figure
something out.
One of the instructions on the page specifies the background
color. The color is given as six hex digits, in this case
EEEEEE. But the way to read this is as the hexadecimal
representations of three separate byte values: EE EE EE. These
represent, respectively, the red, green and blue components of the
color. Higher means brighter, and when all three components are the
same, you get some shade of grey. FF FF FF (255 255 255 in
decimal) gives you white. EE EE EE (238 238 238 in decimal) is a very
light grey, FF 00 00 (255 0 0) is a very rich red.
(I'll show you in class how to convert the hex digits to decimal, and
vice-versa.) We'll
have more to say in a couple of weeks about this way of representing
colors.
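For those who want to check the conversions before we do them by hand, here is a minimal Python sketch. The function name hex_to_rgb is my own; the only real work is done by the built-in int(..., 16), which reads a string as a base-16 number.

```python
def hex_to_rgb(hex_color):
    """Split six hex digits into (red, green, blue) decimal byte values."""
    if len(hex_color) != 6:
        raise ValueError("expected exactly six hex digits")
    # Take the digits two at a time and read each pair as a base-16 number.
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("FFFFFF"))  # (255, 255, 255)
print(hex_to_rgb("EEEEEE"))  # (238, 238, 238)
print(hex_to_rgb("FF0000"))  # (255, 0, 0)
```

Each pair of hex digits is 16 times the first digit plus the second, so EE is 16 x 14 + 14 = 238.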
(d) Change the Background Color.
Edit the hex encoding of the background color so that the
background becomes yellow.
(It's not obvious---not to me, at any
rate--how to get yellow by mixing different proportions of red, green
and blue. You can come up with the answer by guessing and trying it
out. Another way is to hunt around in some standard application
like Microsoft Word or Power Point that contains drawing tools or ways
to change the color of text. These include some way of specifying
a "custom color" in terms of its red, green and blue components.)
Answer: Yellow, somewhat
unexpectedly, is obtained by mixing green and red, so setting the background
color to FFFF00 will do it. You can reason it out like this, but you
can also find the hex encodings of many colors in applications that
contain drawing or painting tools. After poking around PowerPoint
on my Mac I found FFFF00 as the encoding of bright yellow, and FFFF33,
FFFFF9 for more muted yellows.
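Going the other way, from red, green and blue amounts to the six hex digits, can be sketched like this in Python. The function name rgb_to_hex is my own invention; the format code 02X just means "two uppercase hex digits, padded with a zero if needed".

```python
def rgb_to_hex(red, green, blue):
    """Encode three byte values (0-255) as six uppercase hex digits."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each component must be between 0 and 255")
    return "{:02X}{:02X}{:02X}".format(red, green, blue)


print(rgb_to_hex(255, 255, 0))   # full red + full green, no blue
print(rgb_to_hex(255, 255, 51))  # a little blue mixed in
```

Full red and full green with no blue comes out as FFFF00, the bright yellow; adding a little blue (51 is 33 in hex) gives FFFF33.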
(e) How do you...? Suppose
you
wanted the text of your web page to contain the symbols < and >.
The problem is that if you just typed them as text, your browser
would try to interpret them as part of tags and get all fouled up (try
it and see). So how do you do it? You can find the answer
by perusing an HTML manual, which is fine, but there's also a quicker
way to figure it out. Whichever way you choose, revise the text
of your web page so that it contains these symbols. In the
writeup of your solutions, explain how you did it.
Answer: I just used the View HTML
Source option in my browser for this page that contains the lab
exercise, and looked for how the paragraph above was encoded!
What I saw was:
Suppose
you wanted the text of your web page to contain the symbols &lt;
and &gt;.
Evidently, &lt; is used to encode <, and &gt;
is used to encode >. If you place these encodings in your
HTML, you'll get the desired result. (The letters 'lt' stand for 'less
than', and 'gt' for 'greater than'.)
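If you ever need to do this encoding for a lot of text, you don't have to do it by hand. As a small sketch, Python's standard html module does the same substitution; the sample string here is just for illustration.

```python
import html

raw = "the symbols < and >"

encoded = html.escape(raw)       # replaces < with &lt; and > with &gt;
decoded = html.unescape(encoded)  # turns the encodings back into the symbols

print(encoded)
print(decoded)
```

The encoded line is what you would put in your HTML file; the decoded line is what the browser shows.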
I did not include 'Answers' to the Fun and Games Exercises. Some
of you had amusing theories about the ESP program!!