Let's begin with some plaintext:
p='tobeornottobethatisthequestionwhethertisnoblerinthemindtosuffertheslingsandarrowsofoutrageousfortuneortotakearmsagainstaseaoftroublesandbyopposingendthem'
print p
...and a key:
k='hamlet'
print k
The following function performs encryption:
def vigencrypt(key,plaintext):
keylength=len(key)
ciphertext=''
for j in range(len(plaintext)):
base=ord(plaintext[j])-ord('a')
shift=ord(key[j%keylength])-ord('a')
ciphertext+=chr(ord('a')+(base+shift)%26)
return ciphertext
Let's encrypt with the plaintext-key pair above:
c=vigencrypt(k,p)
print c
Decryption is the same as encryption, using the inverse key:
def inverse_key(key):
inv=''
for ch in key:
shift=ord(ch)-ord('a')
newshift=(-shift)%26
newch=chr(ord('a')+newshift)
inv+=newch
return inv
invk=inverse_key(k)
print invk
print vigencrypt(invk,c)
Let's plot the distribution of letters in the plaintext:
%matplotlib inline
import pylab
def plot_distribution(letterstring):
dist=[0]*26
for ch in letterstring:
dist[ord(ch)-ord('a')]+=1
pylab.xlim(-1,27)
pylab.ylim(0,20)
dist.sort()
pylab.stem(range(26),dist)
pylab.show()
The plot shows the frequencies of letters beginning with the least frequently occurring to the most frequently occurring. Even for this small sample, it has the characteristic skewed distribution of letters in English, with three letters accounting for 35% of the plaintext, and five letters not appearing at all.
plot_distribution(p)
We similarly plot the distribution of the letter frequencies in the ciphertext, scaled to the same height.
plot_distribution(c)
You can see that it's far more uniform. Every letter appears at least once in the ciphertext, and no letter appears more than 8% of the time. This makes a straightforward frequency analysis quite a bit more difficult to carry out.
The cryptanalysis takes place in two phases. In the first phase, we count how many ciphertext letters match the letter m positions further on, for m between 3 and 10. As argued in the notes, the value of m that gives the largest number of coincidences is the likely key length.
def count_coincidences(ciphertext,shift):
count = 0
for j in range(len(ciphertext)-11):
if ciphertext[j]==ciphertext[j+shift]:
count += 1
return count
for shift in range(3,11):
print shift,':',count_coincidences(c,shift)
There's no ambiguity about the result! The likely key length is 6.
This lets us partition the ciphertext into six subtexts. Each of the texts was obtained from the corresponding portion of the plaintext by encrypting with the same key letter, and so should have a frequency distribution resembling that of English. The function below returns a list of the subtexts:
def extract_subtexts(ciphertext,keylength):
subtexts=['']*keylength
for j in range(len(ciphertext)):
subtexts[j%keylength]+=ciphertext[j]
return subtexts
Print the subtexts. You should see that if you read the list vertically from top to bottom and then horizontally, you get back the original ciphertext.
s=extract_subtexts(c,6)
for j in range(6):
print s[j]
The second phase of cryptanalysis works on these subtexts. Each of them was obtained from the plaintext by shifting using a single letter of the key. For each subtext we want to find the key letter that gives the most English-like letter distribution to the resulting decryption.
We assign a score to a text by computing the sum
where
The first function below just computes a score for a
given text. The array
def score(text):
freq=[0.0817, 0.0153, 0.0224, 0.0470, 0.121, 0.0217, 0.0210, 0.0606, 0.0724, 0.0010, 0.0090, 0.0379, 0.0315, 0.0684, 0.0773, 0.0170, 0.0009, 0.0575, 0.0605, 0.0885, 0.0283, 0.0092, 0.0260, 0.0013, 0.0226, 0.0002]
dist=[0]*26
for ch in text:
dist[ord(ch)-ord('a')]+=1
result=0
for j in range(26):
result+=dist[j]*freq[j]
return result/len(text)
Just to see how this works, let's calculate scores for the original plaintext, and for some gibberish (the original ciphertext).
print score(p)
print score(c)
In principle, ordinary English should give a score of
about p
The next function takes a text, decrypts it under all possible one-letter keys, and returns both the maximum score and the key letter that led to the maximum score.
def maxscore(text):
maxval=0
maxletter='?'
for keyletter in 'abcdefghijklmnopqrstuvwxyz':
newscore=score(vigencrypt(inverse_key(keyletter),text))
if newscore>maxval:
maxval=newscore
maxletter=keyletter
return(maxletter,maxval)
Now let's apply this to each of our 6 subtexts:
for j in range(6):
print maxscore(s[j])
The cryptanalysis succeeded in recovering the key 'hamlet' starting only from the ciphertext.