Assignment 1 Solutions

Problem 1. It's a bunch of arithmetic, and I decided to let Python do the work for me. If you have 1 million dollars to spend, then you can buy 2000 processors and thus process 2 trillion keys a second. If the key length is k bits, then at worst you will have to examine all \(2^k\) keys, and on average, you'll have to examine half of them,or \(2^{k-1}\). I don't care which you use: the two answers will differ by a factor of 2, but this does not change things in terms of feasibility.

In [21]:
def how_long(keys_per_sec,bits_per_key):
    num_keys=2**bits_per_key
    sec=1.0*num_keys/keys_per_sec
    hou=sec/3600
    day=hou/24
    yea=day/365
    print bits_per_key," bits"
    print sec," seconds;",hou," hours;",day," days;",yea," years"
    

We'll compute these running times for both the original scenario of two trillion keys per second, and the enhanced scenario, which can test 100 times as many keys.

In [22]:
how_long(2.0e12,40)
print
how_long(2.0e12,56)
print
how_long(2.0e12,88)
print
how_long(2.0e12,128)
print
print
how_long(2.0e14,40)
print
how_long(2.0e14,56)
print
how_long(2.0e14,88)
print
how_long(2.0e14,128)
40  bits
0.549755813888  seconds; 0.000152709948302  hours; 6.36291451259e-06  days; 1.74326425003e-08  years

56  bits
36028.797019  seconds; 10.0079991719  hours; 0.416999965497  days; 0.0011424656589  years

88  bits
1.54742504911e+14  seconds; 42984029141.9  hours; 1791001214.24  days; 4906852.64176  years

128  bits
1.7014118346e+26  seconds; 4.72614398501e+22  hours; 1.96922666042e+21  days; 5.3951415354e+18  years


40  bits
0.00549755813888  seconds; 1.52709948302e-06  hours; 6.36291451259e-08  days; 1.74326425003e-10  years

56  bits
360.28797019  seconds; 0.100079991719  hours; 0.00416999965497  days; 1.1424656589e-05  years

88  bits
1.54742504911e+12  seconds; 429840291.419  hours; 17910012.1424  days; 49068.5264176  years

128  bits
1.7014118346e+24  seconds; 4.72614398501e+20  hours; 1.96922666042e+19  days; 5.3951415354e+16  years

There is a sharp divide in feasibility between 56 bits and 88 bits. The hundred-fold speedup in technology did not change this at all. (Nor would a hundred-fold slowdown.) That is the point of the problem.

Problem 2. If you know so much as a single plaintext character in the Caesar cipher then you immediately deduce the key from the corresponding ciphertext character. This observation applies to a chosen-plaintext attack as well: just ask for the encryption of 'a'.

For the Vigenere cipher, knowing a single plaintext character, is only a little help; you can launch the same attack as in the ciphertext-only case, and will have just slightly less work to do. But if you get r contiguous plaintext characters where r is the length of the key or more, than the whole key is revealed and the message decrypted. For a chosen plaintext attack, you can ask for the encryption of "AAAAAAAAAA". Since keys are assumed to be no more than ten characters long, this will reveal the entire key.

For the monoalphabetic substitution cipher, if just a small number of plaintext characters are known and they are the encryptions of rare letters, you get only a marginal advantage. For instance, you might be given that a particular letter is the encryption of the only 'Z' in the plaintext (a `Q' would be more helpful!) It is more help if the known plaintext consists of common letters. A chosen-plaintext attack completely breaks the cipher if you are able to request the encryption of 25 different letters. If you are only able to choose a few letters, ask for the encryptions of the most common ones: EATOIN.

Problem 3. This is a cookbook problem: We'll use the vigenere tools that were provided exactly as in the posted example to determine first the key length, then the key, and finally the plaintext.

In [23]:
import vigenere
ct="hdsfgvmkoowafweetcmfthskucaqbilgjofmaqlgspvatvxqbiryscpcfrmvswrvnqlszdmgaoqsakmlupsqforvtwvdfcjzvgsoaoqsacjkbrsevbelvbksarlscdcaarmnvrysywxqgvellcyluwwveoafgclazowafojdlhssfiksepsoywxafowlbfcsocylngqsyzxgjbmlvgrggokgfgmhlmejabsjvgmlnrvqzcrggcrghgeupcyfgtydycjkhqluhgxgzovqswpdvbwsffsenbxapasgazmyuhgsfhmftayjxmwznrsofrsoaopgauaaarmftqsmahvqecev"
for j in range(3,10):
    print j,vigenere.count_coincidences(ct,j)
3 15
4 25
5 13
6 13
7 15
8 23
9 15

4 and 8 appear as likely key lengths. Key length 4 makes more sense, since this would give almost all the same repetitions when we count the coincidences with a candidate key length of 8.

In [24]:
texts=vigenere.extract_subtexts(ct,4)
for t in texts:
    print vigenere.maxscore(t)
('n', 0.05290581395348836)
('o', 0.05828372093023256)
('e', 0.05823372093023257)
('s', 0.057975581395348835)

The likely key is 'noes'. We use this to recover the plaintext:

In [25]:
key=vigenere.inverse_key('noes')
print vigenere.vigencrypt(key,ct)
uponthisbasisiamgoingtoshowyouhowabunchofbrightyoungfolksdidfindachampionamanwithboysandgirlsofhisownamanofsodominatingandhappyindividualitythatyouthisdrawntohimasisaflytoasugarbowlitisastoryaboutasmalltownitisnotagossipyyarnnorisitadrymonotonousaccountfullofsuchcustomaryfillinsasromanticmoonlightcastingmurkyshadowsdownalongwindingcountryroad

Inserting some spacing and punctuation gives:

Upon this basis I am going to show you how a bunch of bright young folks did find a champion. A man with boys and girls of his own. A man of so dominating and happy individuality that youth is drawn to him as is a fly to a sugar bowl. It is a story about a small town. It is not a gossipy yarn, nor is it a dry monotonous account full of such custormary fillins as romantic moonght casting murky shadows down a long winding country road.

The key gives away what is unusual about this passage: There are no E's! The method of choosing the maximum score is robust enough that it picks out the correct decryption even when we distort the typical letter distribution profile in this way. There is a bit of story behind this: In 1937, Ernest Vincent Wright published an entire novel, 'Gadsby', that did not use the letter 'e'. The passage in the problem is, I believe, from this novel. Fifty years later, Georges Perec, a French writer, repeated this feat with 'La disparition' (The Disappearance). This is at least as difficult to do in French as in English. Perec's novel was translated into a number of different languages; the English translation, by Gilbert Adair, is called 'A Void'.

Let's do the second ciphertext:

In [26]:
ct="ocwyikoooniwugpmxwktzdwgtssayjzwyemdlbnqaaavsuwdvbrflauplooubfgqhgcscmgzlatoedcsdeidpbhtmuovpiekifpimfnoamvlpqfxejsmxmpgkccaykwfzpyuavtelwhrhmwkbbvgtguvtefjlodfefkvpxsgrsorvgtajbsauhzrzalkwuowhgedefnswmrciwcpaaavogpdnfpktdbalsisurlnpsjyeatcuceesohhdarkhwotikbroqrdfmzghgucebvgwcdqxgpbgqwlpbdaylooqdmuhbdqgmyweuik"
for j in range(3,11):
    print j,vigenere.count_coincidences(ct,j)
3 11
4 6
5 15
6 23
7 8
8 5
9 12
10 11

In [27]:
texts=vigenere.extract_subtexts(ct,6)
for t in texts:
    print vigenere.maxscore(t)
('h', 0.05613846153846155)
('o', 0.06448846153846154)
('l', 0.06032115384615385)
('m', 0.06014807692307693)
('e', 0.05932884615384615)
('s', 0.06268461538461538)

In [28]:
print vigenere.vigencrypt(vigenere.inverse_key('holmes'),ct)
holmeshadbeenseatedforsomehoursinsilencewithhislongthinbackcurvedoverachemicalvesselinwhichhewasbrewingaparticularlymalodorousproducthisheadwassunkuponhisbreastandhelookedfrommypointofviewlikeastrangelankbirdwithdullgreyplumageandablacktopknotsowatsonsaidhesuddenlyyoudonotproposetoinvestinsouthafricansecurities

Holmes had been seated for some hours in silence with his long thin back curved over a chemcal vessel in which was brewing a particularly malodorous product. His head was sunk upon his breast, and he looked from my point of view like a strange lank bird with dull grey plumage and a black topknot. 'So, Watson,' said he suddenly. 'You do not propose to invest in South African securities?"

(Evidently a Sherlock Holmes story.)

Problem 3a. The code below simply finds the highest-scoring decryption, using the score function from the Vigenere file. We use it to decrypt the sample ciphertext.

In [30]:
import caesarcipher

def caesar_crack(ciphertext):
    candidates=caesarcipher.all_decryptions(ciphertext)
    maxscore=0
    maxtext=''
    for text in candidates:
        newscore=vigenere.score(text)
        if newscore>maxscore:
            maxtext=text
            maxscore=newscore
    return maxtext
In [31]:
print caesar_crack("lzwkgdmlagfaktqfgewsfkkgvaxxaumdlskqgmeayzltwdwvlgaesyafwxjgelzwxajklzsklqafkhwulagfgxlzwuzsjsulwjk")
thesolutionisbynomeanssodifficultasyoumightbeledtoimaginefromthefirsthastyinspectionofthecharacters

The solution is by no means so difficult as you might be led to imagine from the first hasty inspection of the characters.

(An extract from 'The Gold Bug', by Edgar Allen Poe, where one of the characters describes the solution of a monoalphabetic substitution cipher.)

The following example, submitted by a student, shows how this method can fail, especially with a short ciphertext. The only possible decryption that is ordinary English is the ciphertext itself (i.e., a shift of 0), which reads, 'I need a nap'. But a different shift gives a text with a higher score.

In [32]:
print caesar_crack('ineedanap')
mriiheret

Problem 3b. Cracking Autokey cipher with known keylength.

If you know the key length, then you can proceed very much like the Vigenere cipher: Extract the subtexts as before, using the known key length 8 to determine how many subtexts to create. Now guess the first key letter. This reveals the first plaintext letter. This in turn is the key letter for the 9th character, which gives you the 9th plaintext letter, and so on for the 17th, 25th, etc. Put differently, each subtext is the encryption of the corresponding subtext of the plaintext, using an autokey cipher with a key length of 1. So we just need to try out all 26 possible first key letters for each of the 8 subtexts, decrypt it as an autokey-encrypted text with the given key letter, and find the choice that gives the best score. We can take the code for maxscore and tweak it a little bit. The details are in the code below, along with application to the sample ciphertext.

In [34]:
import autokey

def maxscore(text):
    maxval=0
    maxletter='?'
    for keyletter in 'abcdefghijklmnopqrstuvwxyz':
        newscore=vigenere.score(autokey.autokey_decrypt(text,keyletter))
        if newscore>maxval:
            maxval=newscore
            maxletter=keyletter
    return(maxletter,maxval)

def autokey_crack(ciphertext,length):
    subtexts=vigenere.extract_subtexts(ciphertext,length)
    derived_key=''
    for text in subtexts:
        v=maxscore(text)
        derived_key+=v[0]
    return autokey.autokey_decrypt(ciphertext,derived_key)
In [35]:
ct="craoshsvtsnvfjirmigvhzzijmbbvjzbomsaliipkvhfahhpvvuwiehgkmmmtmsavbcmligxohbpwhsfapypqjohtbvegkjqayvalgdrdsghuyaramjvwlvlinlqixplnxifkuflnesohimpopoyoagizlsvvostsckpgfyxyagyrjtrudoudooegvfdsebxosqelbrxzgnykkiuoukmdfwwlaswxmalswepofinlqiyvpdzvgrgzqvwfpxmpwfsglmlcfrdtullbvvhcybgnrfhjzqfobbpgaqjahffcuerkmgtwrarltwnljigkelbfprpexosfbrrftdaxeprktzapplslyyibyefm"
print autokey_crack(ct,8)
youobservetherearenodivisionsbetweenthewordshadtherebeendivisionsthetaskwouldhavebeencomparativelyeasyinsuchcaseishouldhavecommencedwithacollationandanalysisoftheshorterwordsandhadawordofasingleletteroccurredasismostlikelyishouldhaveconsideredthissolutionasassuredbuttherebeingnodivisionmyfirststepwastoascertainthepredominantlettersaswellastheleastfrequent

You observe there are no divisions between the words. Had there been divisions, the task would have been comparatively easy. In such case I should have commenced with a collation and analysis of the shorter words, and had a word of a single letter occurred, as is most likely, I should have considered this solution as assured. But there being no division, my first step was to ascertain the predominant letters as well as the least frequent.

More "Gold Bug"

Probem 3c. There's little more here than in 3b. Assuming a key length of no more than 10, we can apply the preceding method for every length key, and choose the solution that gives the best score.

In [36]:
def autokey_crack2(ciphertext):
    candidates=[autokey_crack(ciphertext,length) for length in range(1,11)]
    maxscore=0
    maxtext=''
    for text in candidates:
        newscore=vigenere.score(text)
        if newscore>maxscore:
            maxtext=text
            maxscore=newscore
    return maxtext
In [37]:
ct="ajametwsvzhghszvvcyhrrtgyrhsffxjmliaxzgnwngkhceefjkmmxwxtmrvydubrpwqktnqcgwrgdipfztswsnqnvvfhmyhkmzxzxdwibbmebiagbrjyuazvvxnljiqgtxealzmexbczsfmuigkbfmlhnoyfhrzuaqnhqgegnhkhlutewqgsgvhkzmnssbnqkwswdlskukqtippuaqnhqgeqnsvhrnynecifppgdamtaixzvtelwziv"
print autokey_crack2(ct)
ihavesolvedothersofanabstrusenesstenthousandtimesgreatercircumstancesandacertainbiasofmindhaveledmetotakeinterestinsuchriddlesanditmaywellbedoubtedwhetherhumaningenuitycanconstructanenigmaofthekindwhichhumaningenuitymaynotbyproperapplicationresolve

Still more "Gold Bug": I have solved others of an abstruseness ten thousand times greater. Circumstances, and a certain bias of mind, have led me to take and interest in such riddles, and it may well be doubted whether human ingenuity can construct an enigma of the kind which human ingenuity may not, by proper application, resolve.