The following extract from Edgar Allan Poe's short story 'The Gold-Bug' (1843) describes in detail the cryptanalysis of a monoalphabetic substitution cipher.
Here Legrand submitted the parchment to my inspection. The following
characters were rudely traced between the death's-head and the goat:
53‡‡†305))6*;4826)4‡.)4‡);806*;48†8
¶60))85;1‡(;:‡*8†83(88)5*†;46(;88*96
*?;8)*‡(;485);5*†2:*‡(;4956*2(5*—4)8
¶8*;4069285);)6†8)4‡‡;1(‡9;48081;8:8‡
1;48†85;4)485†528806*81(‡9;48;(88;4
(‡?34;48)4‡;161;:188;‡?;
"But," said I, returning him the slip, "I am as much in the dark as
ever. Were
all the jewels of Golconda awaiting me upon my solution of this
enigma, I am
quite sure that I should be unable to earn them."
"And yet," said Legrand, "the solution is by no means so difficult
as you might
be led to imagine from the first hasty inspection of the characters.
These
characters, as any one might readily guess, form a cipher — that is
to say, they
convey a meaning; but then, from what is known of Kidd, I could not
suppose him
capable of constructing any of the more abstruse cryptographs. I
made up my
mind, at once, that this was of a simple species — such, however, as
would
appear, to the crude intellect of the sailor, absolutely insoluble
without the key."
"And you really solved it?"
"Readily; I have solved others of an abstruseness ten thousand times
greater.
Circumstances, and a certain bias of mind, have led me to take
interest in such
riddles, and it may well be doubted whether human ingenuity can
construct an
enigma of the kind which human ingenuity may not, by proper
application,
resolve. In fact, having once established connected and legible
characters, I
scarcely gave a thought to the mere difficulty of developing their
import.
"In the present case — indeed in all cases of secret writing — the
first
question regards the language of the cipher; for the principles of
solution, so far,
especially, as the more simple ciphers are concerned, depend upon,
and are
varied by, the genius of the particular idiom. In general, there is
no alternative but
experiment (directed by probabilities) of every tongue known to him
who attempts
the solution, until the true one is attained. But, with the cipher
now before us, all
difficulty was removed by the signature. The pun upon the word
'Kidd' is
appreciable in no other language than the English. But for this
consideration I
should have begun my attempts with the Spanish and French, as the
tongues in
which a secret of this kind would most naturally have been written
by a pirate of
the Spanish main. As it was, I assumed the cryptograph to be
English.
"You observe there are no divisions between the words. Had there
been
divisions, the task would have been comparatively easy. In such case
I should
have commenced with a collation and analysis of the shorter words,
and, had a
word of a single letter occurred, as is most likely, (a or I, for
example,) I should
have considered this solution as assured. But, there being no
division, my first
step was to ascertain the predominant letters, as well as the least
frequent.
Counting all, I constructed a table, thus:
Of the character   8   there are   33.
                   ;       "       26.
                   4       "       19.
                 ‡ )       "       16.
                   *       "       13.
                   5       "       12.
                   6       "       11.
                 † 1       "        8.
                   0       "        6.
                 9 2       "        5.
                 : 3       "        4.
                   ?       "        3.
                   ¶       "        2.
                   —       "        1.
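Editorial aside: the tally Legrand describes can be reproduced mechanically. The Python sketch below counts every symbol in the cryptogram as transcribed above; the names CIPHER and frequency_table are illustrative and not part of the story. A mechanical count need not agree with the printed table row for row (the table, for instance, has no row for the symbol "(").

```python
from collections import Counter

# The cryptogram as printed above, one string per line of the parchment.
CIPHER = "".join([
    "53‡‡†305))6*;4826)4‡.)4‡);806*;48†8",
    "¶60))85;1‡(;:‡*8†83(88)5*†;46(;88*96",
    "*?;8)*‡(;485);5*†2:*‡(;4956*2(5*—4)8",
    "¶8*;4069285);)6†8)4‡‡;1(‡9;48081;8:8‡",
    "1;48†85;4)485†528806*81(‡9;48;(88;4",
    "(‡?34;48)4‡;161;:188;‡?;",
])


def frequency_table(text):
    """Return (symbol, count) pairs, most frequent first."""
    return Counter(text).most_common()


for symbol, count in frequency_table(CIPHER):
    print(symbol, count)
```

The three most frequent symbols come out as 8, ; and 4, in agreement with the head of Legrand's table.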
"Now, in English, the letter which most frequently occurs is e.
Afterwards, the
succession runs thus: a o i d h n r s t u y c f g l m w b k p q x z.
E predominates
so remarkably that an individual sentence of any length is rarely
seen, in which it
is not the prevailing character.
"Here, then, we have, in the very beginning, the groundwork for
something
more than a mere guess. The general use which may be made of the
table is
obvious — but, in this particular cipher, we shall only very
partially require its aid.
As our predominant character is 8, we will commence by assuming it
as the e of
the natural alphabet. To verify the supposition, let us observe if
the 8 be seen
often in couples — for e is doubled with great frequency in English
— in such
words, for example, as 'meet,' 'fleet,' 'speed,' 'seen,' 'been,'
'agree,' &c. In the
present instance we see it doubled no less than five times, although
the
cryptograph is brief.
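Editorial aside: the check for doubled symbols is a one-line search. This is a minimal Python sketch under the same transcription of the cryptogram; count_doubled is a hypothetical helper name.

```python
# The cryptogram as printed above, one string per line of the parchment.
CIPHER = "".join([
    "53‡‡†305))6*;4826)4‡.)4‡);806*;48†8",
    "¶60))85;1‡(;:‡*8†83(88)5*†;46(;88*96",
    "*?;8)*‡(;485);5*†2:*‡(;4956*2(5*—4)8",
    "¶8*;4069285);)6†8)4‡‡;1(‡9;48081;8:8‡",
    "1;48†85;4)485†528806*81(‡9;48;(88;4",
    "(‡?34;48)4‡;161;:188;‡?;",
])


def count_doubled(text, symbol):
    """Count non-overlapping places where `symbol` is written twice in a row."""
    return text.count(symbol * 2)


# Legrand reports that 8 is doubled five times.
print(count_doubled(CIPHER, "8"))
```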
"Let us assume 8, then, as e. Now, of all words in the language,
'the' is most
usual; let us see, therefore, whether there are not repetitions of
any three
characters, in the same order of collocation, the last of them being
8. If we
discover repetitions of such letters, so arranged, they will most
probably
represent the word 'the.' Upon inspection, we find no less than
seven such
arrangements, the characters being ;48. We may, therefore, assume
that ;
represents t, 4 represents h, and 8 represents e — the last being
now well
confirmed. Thus a great step has been taken.
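Editorial aside: the search for repeated three-symbol groups ending in 8 can be sketched as a sliding window over the cryptogram. The function name repeated_trigrams is an assumption made for illustration.

```python
from collections import Counter

# The cryptogram as printed above, one string per line of the parchment.
CIPHER = "".join([
    "53‡‡†305))6*;4826)4‡.)4‡);806*;48†8",
    "¶60))85;1‡(;:‡*8†83(88)5*†;46(;88*96",
    "*?;8)*‡(;485);5*†2:*‡(;4956*2(5*—4)8",
    "¶8*;4069285);)6†8)4‡‡;1(‡9;48081;8:8‡",
    "1;48†85;4)485†528806*81(‡9;48;(88;4",
    "(‡?34;48)4‡;161;:188;‡?;",
])


def repeated_trigrams(text, last):
    """Map each three-symbol run ending in `last` that occurs more than once to its count."""
    grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    return {g: n for g, n in grams.items() if g[-1] == last and n > 1}


hits = repeated_trigrams(CIPHER, "8")
for gram, n in sorted(hits.items(), key=lambda kv: -kv[1]):
    print(gram, n)
```

The group ;48 heads the list with seven occurrences, matching Legrand's count.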
"But, having established a single word, we are enabled to establish
a vastly
important point; that is to say, several commencements and
terminations of other
words. Let us refer, for example, to the last instance, but one, in
which the
combination ;48 occurs — not far from the end of the cipher. We know
that the ;
immediately ensuing is the commencement of a word, and, of the six
characters
succeeding this 'the,' we are cognizant of no less than five. Let us
set these
characters down, thus, by the letters we know them to represent,
leaving a space
for the one unknown —
t eeth.
"Here we are enabled, at once, to discard the 'th' as forming no
portion of the
word commencing with the first t; since, by experiment of the entire
alphabet for a
letter adapted to the vacancy, we perceive that no word can be
formed of which
this th can be a part. We are thus narrowed into
t ee,
and, going through the alphabet, if necessary, as before, we arrive
at the
word 'tree,' as the sole possible reading. We thus gain another
letter, r,
represented by (, with the words 'the tree' in juxtaposition.
"Looking beyond these words, for a short distance, we again see the
combination ;48, and employ it by way of termination to what
immediately
precedes. We have thus this arrangement:
the tree ;4(‡?34 the,
or, substituting the natural letters, where known, it reads thus:
the tree thr‡?3h the.
"Now, if, in place of the unknown characters, we leave blank spaces,
or
substitute dots, we read thus:
the tree thr...h the,
when the word 'through' makes itself evident at once. But this
discovery gives
us three new letters, o, u and g, represented by ‡ ? and 3.
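Editorial aside: filling the gaps in "thr...h" amounts to matching a pattern against a vocabulary. The sketch below assumes a tiny illustrative word list (CANDIDATES) in place of a real dictionary; match_fragment and KNOWN are likewise hypothetical names. It does not check that a repeated unknown symbol receives the same letter each time, which is not needed for this fragment.

```python
import re

# Key established so far in the narrative: "the" and the r from "tree".
KNOWN = {";": "t", "4": "h", "8": "e", "(": "r"}

# Illustrative word list only (an assumption); a real attempt would use a dictionary.
CANDIDATES = ["thrash", "thought", "through", "thrifty"]


def match_fragment(fragment, known, words):
    """Yield (word, new_pairs) for each word consistent with the partial key."""
    # Known symbols become their letters; unknown ones become the wildcard ".".
    pattern = "".join(known.get(c, ".") for c in fragment)
    for word in words:
        if not re.fullmatch(pattern, word):
            continue
        new_pairs = {c: w for c, w in zip(fragment, word) if c not in known}
        # An unknown symbol cannot stand for a letter the key already uses.
        if set(new_pairs.values()) & set(known.values()):
            continue
        yield word, new_pairs


for word, new_pairs in match_fragment(";4(‡?34", KNOWN, CANDIDATES):
    print(word, new_pairs)
```

With this word list only 'through' survives, yielding the three new pairings the text names.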
"Looking, now, narrowly, through the cipher for combinations of
known
characters, we find, not very far from the beginning, this
arrangement,
83(88, or egree,
which, plainly, is the conclusion of the word 'degree,' and gives us
another
letter, d, represented by †.
"Four letters beyond the word 'degree,' we perceive the combination
;46(;88.
"Translating the known characters, and representing the unknown by
dots, as
before, we read thus:
th.rtee.
an arrangement immediately suggestive of the word 'thirteen,' and
again
furnishing us with two new characters, i and n, represented by 6 and
*.
"Referring, now, to the beginning of the cryptograph, we find the
combination,
53‡‡†.
"Translating, as before, we obtain
. good,
which assures us that the first letter is A, and that the first two
words are 'A
good.'
"It is now time that we arrange our key, as far as discovered, in a
tabular
form, to avoid confusion. It will stand thus:
5   represents   a
†       "        d
8       "        e
3       "        g
4       "        h
6       "        i
*       "        n
‡       "        o
(       "        r
;       "        t
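Editorial aside: the ten-letter key in the table can be applied to the cryptogram directly, leaving a dot wherever a symbol is still unknown. A minimal Python sketch follows; apply_key, KEY and FIRST_LINE are illustrative names.

```python
# The partial key exactly as tabulated above.
KEY = {
    "5": "a", "†": "d", "8": "e", "3": "g", "4": "h",
    "6": "i", "*": "n", "‡": "o", "(": "r", ";": "t",
}

# First line of the cryptogram as printed above.
FIRST_LINE = "53‡‡†305))6*;4826)4‡.)4‡);806*;48†8"


def apply_key(text, key, unknown="."):
    """Substitute every symbol found in `key`; mark the rest with `unknown`."""
    return "".join(key.get(c, unknown) for c in text)


print(apply_key(FIRST_LINE, KEY))
```

Even with ten letters the opening already shows 'agood' and two instances of 'inthe', which is why Legrand considers the remaining details unnecessary.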
"We have, therefore, no less than ten of the most important letters
represented, and it will be unnecessary to proceed with the details
of the
solution. I have said enough to convince you that ciphers of this
nature are
readily soluble, and to give you some insight into the rationale of
their
development. But be assured that the specimen before us appertains
to the very
simplest species of cryptograph. It now only remains to give you the
full
translation of the characters upon the parchment, as unriddled. Here
it is:
'A good glass in the bishop's hostel in the devil's seat forty-one
degrees and
thirteen minutes northeast and by north main branch seventh limb
east side
shoot from the left eye of the death's-head a bee line from the tree
through the
shot fifty feet out.' "
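Editorial aside: Legrand stops at ten letters, but the rest of the key can be recovered by aligning the cryptogram with the translation, letter for letter. The sketch below does this and fails loudly if any symbol would have to stand for two different letters; derive_key and PLAIN are names assumed for illustration.

```python
# The cryptogram as printed above, one string per line of the parchment.
CIPHER = "".join([
    "53‡‡†305))6*;4826)4‡.)4‡);806*;48†8",
    "¶60))85;1‡(;:‡*8†83(88)5*†;46(;88*96",
    "*?;8)*‡(;485);5*†2:*‡(;4956*2(5*—4)8",
    "¶8*;4069285);)6†8)4‡‡;1(‡9;48081;8:8‡",
    "1;48†85;4)485†528806*81(‡9;48;(88;4",
    "(‡?34;48)4‡;161;:188;‡?;",
])

# The translation as given in the text.
PLAIN = (
    "A good glass in the bishop's hostel in the devil's seat forty-one "
    "degrees and thirteen minutes northeast and by north main branch "
    "seventh limb east side shoot from the left eye of the death's-head "
    "a bee line from the tree through the shot fifty feet out."
)


def derive_key(cipher, plain):
    """Pair each cipher symbol with the plaintext letter in the same position."""
    letters = [ch for ch in plain.lower() if ch.isalpha()]
    if len(letters) != len(cipher):
        raise ValueError("cipher and translation differ in length")
    key = {}
    for symbol, letter in zip(cipher, letters):
        # setdefault records the first pairing; every later one must agree.
        if key.setdefault(symbol, letter) != letter:
            raise ValueError(f"{symbol!r} maps to both {key[symbol]!r} and {letter!r}")
    return key


KEY = derive_key(CIPHER, PLAIN)
for symbol, letter in sorted(KEY.items(), key=lambda kv: kv[1]):
    print(symbol, letter)
```

The alignment is consistent throughout, which confirms that the cipher is a simple one-to-one substitution.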
"But," said I, "the enigma seems still in as bad a condition as
ever. How is it
possible to extort a meaning from all this jargon about 'devil's
seats,' 'death's
heads,' and 'bishop's hotels?' "