Binary Representation of Data

Henceforth we view data not as sequences of letters, but as sequences of bytes. Our standard representation of a sequence of bytes in Python will be a list of integers in the range 0 to 255. Here is a way to generate a random sequence of ten bytes in Python.

In [2]:
import random
somebytes = [random.randint(0,255) for j in range(10)]
print somebytes
[202, 127, 23, 112, 13, 199, 25, 252, 58, 152]

Every printable ASCII character has a standard representation as a single byte (its ASCII code). The following function (defined in the file bytestuff.py) converts a character string into its standard representation.

In [3]:
def ascii_to_bytes(instring):
    return [ord(c) for c in instring]
In [4]:
print ascii_to_bytes("What's up, Doc?")
[87, 104, 97, 116, 39, 115, 32, 117, 112, 44, 32, 68, 111, 99, 63]

The package provides an inverse function to ascii_to_bytes. Keep in mind that not every byte is the ASCII code of a printable character, so if you apply this to an arbitrary byte sequence, the result will contain non-printable characters, which Python displays as escape codes such as \xca.

In [5]:
def bytes_to_ascii(bytelist):
    s=''
    for b in bytelist:
        s += chr(b)
    return s
In [6]:
bytes_to_ascii(somebytes)
Out[6]:
'\xca\x7f\x17p\r\xc7\x19\xfc:\x98'
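
As a quick check of the round trip, and of which bytes print cleanly, here is a small sketch. It restates the two conversion functions so that it runs on its own; the helper name is_printable_ascii is an assumption made for illustration and is not part of bytestuff.py.

```python
def ascii_to_bytes(instring):
    return [ord(c) for c in instring]

def bytes_to_ascii(bytelist):
    return ''.join(chr(b) for b in bytelist)

def is_printable_ascii(b):
    # Hypothetical helper: printable ASCII codes run from 32 (space) to 126 (~).
    return 32 <= b <= 126

# Converting text to bytes and back returns the original text.
text = "What's up, Doc?"
roundtrip = bytes_to_ascii(ascii_to_bytes(text))

# Only two of the ten sample bytes are printable: 112 ('p') and 58 (':').
somebytes = [202, 127, 23, 112, 13, 199, 25, 252, 58, 152]
printable = [b for b in somebytes if is_printable_ascii(b)]
```

This agrees with the output above, where p and : are the only characters not shown as escape codes.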

There are better ways of writing arbitrary byte sequences. One is to show the individual bits: the function bytes_to_bin below returns a string of 0's and 1's, grouped into blocks of 8.

In [10]:
def bin8(val):
    s=bin(val)
    leadingz='0'*(10-len(s))
    return leadingz+s[2:]

def bytes_to_bin(bytelist):
    result=''
    for b in bytelist:
        result+=bin8(b)+' '
    return result
In [11]:
print bytes_to_bin(somebytes)
11001010 01111111 00010111 01110000 00001101 11000111 00011001 11111100 00111010 10011000 
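
The same display can probably be produced more briefly with Python's built-in format function, and the bit string can be parsed back into bytes with int(block, 2). The following is a sketch under that assumption; the names bytes_to_bin_fmt and bin_to_bytes are made up for this example and do not come from bytestuff.py.

```python
def bytes_to_bin_fmt(bytelist):
    # The format spec '08b' means: binary, zero-padded to a width of 8.
    return ' '.join(format(b, '08b') for b in bytelist)

def bin_to_bytes(binstring):
    # Inverse direction: split on whitespace, parse each block as base 2.
    return [int(block, 2) for block in binstring.split()]

somebytes = [202, 127, 23, 112, 13, 199, 25, 252, 58, 152]
bits = bytes_to_bin_fmt(somebytes)
recovered = bin_to_bytes(bits)   # equals somebytes
```

Unlike bytes_to_bin, this version joins the blocks with single spaces, so there is no trailing space at the end of the string.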

It's actually quite hard to look at a lot of 0's and 1's and keep them straight. A more human-readable version is to represent each block of 4 bits by a single hex digit (0, 1, ..., 9, a, b, c, d, e, f), so that each byte is written as two hex digits.

In []:
def hex2(val):
    s=hex(val)
    leadingz='0'*(4-len(s))
    return leadingz+s[2:]

def bytes_to_hex(bytelist):
    result=''
    for b in bytelist:
        result+=hex2(b)+' '
    return result
In []:
print bytes_to_hex(somebytes)
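
To make the correspondence between 4-bit blocks and hex digits explicit, the sketch below splits each byte into its high and low halves and looks each half up in a table of digits. The names HEX_DIGITS and byte_to_hex_by_nibbles are assumptions for this illustration only.

```python
HEX_DIGITS = '0123456789abcdef'

def byte_to_hex_by_nibbles(b):
    high = b >> 4       # top 4 bits, a value from 0 to 15
    low = b & 0x0f      # bottom 4 bits, a value from 0 to 15
    return HEX_DIGITS[high] + HEX_DIGITS[low]

# Example: 202 is 1100 1010 in binary, i.e. digits 12 and 10, written 'ca'.
somebytes = [202, 127, 23, 112, 13, 199, 25, 252, 58, 152]
hexstring = ' '.join(byte_to_hex_by_nibbles(b) for b in somebytes)
# hexstring is 'ca 7f 17 70 0d c7 19 fc 3a 98'
```

For every byte value this gives the same two digits as the built-in format(b, '02x').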

A more compact string representation of an arbitrary byte sequence is base 64 encoding, where each of the upper- and lower-case letters and the ten digits, along with + and / (64 symbols in all), is used to represent a block of 6 bits. Padding symbols (= or ==) are added if the total number of bits in the byte sequence is not divisible by 6, that is, if the number of bytes is not a multiple of 3.

In []:
import base64
def bytes_to_b64(bytelist):
    return base64.b64encode(bytes_to_ascii(bytelist))
In []:
def b64_to_bytes(b64string):
    return ascii_to_bytes(base64.b64decode(b64string))
In []:
u=bytes_to_b64(somebytes)
print u
b64_to_bytes(u)
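
To see how the 6-bit grouping and the padding work, here is a hand-written encoder, offered as a sketch for study rather than as a replacement for the base64 module. It is written for Python 3, and the names B64_ALPHABET and b64_by_hand are assumptions introduced for this example.

```python
import base64

# Standard alphabet: A-Z are 0..25, a-z are 26..51, 0-9 are 52..61, then + and /.
B64_ALPHABET = ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                'abcdefghijklmnopqrstuvwxyz'
                '0123456789+/')

def b64_by_hand(bytelist):
    # Write out all the bits, then add zero bits up to a multiple of 6.
    bits = ''.join(format(b, '08b') for b in bytelist)
    bits += '0' * (-len(bits) % 6)
    # Each block of 6 bits selects one symbol from the alphabet.
    symbols = ''.join(B64_ALPHABET[int(bits[i:i + 6], 2)]
                      for i in range(0, len(bits), 6))
    # Add '=' signs so the output length is a multiple of 4.
    return symbols + '=' * (-len(symbols) % 4)

three = b64_by_hand([77, 97, 110])   # 'TWFu' (24 bits, no padding)
two = b64_by_hand([77, 97])          # 'TWE=' (16 bits, one =)
one = b64_by_hand([77])              # 'TQ==' (8 bits, two =)

# Compare with the library on the sample bytes (Python 3 call style).
somebytes = [202, 127, 23, 112, 13, 199, 25, 252, 58, 152]
library = base64.b64encode(bytes(bytearray(somebytes))).decode('ascii')
matches_library = (b64_by_hand(somebytes) == library)
```

Three bytes fill exactly four symbols, which is why padding appears only when the number of bytes is not a multiple of 3.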

We can also treat a sequence of bytes as the representation in binary of a single large integer. The module bytestuff contains functions for converting a byte sequence to a long integer, and a long integer back into a byte sequence.
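
The bytestuff functions themselves are not shown here, so the following is only a sketch of how such a conversion might look. The names bytes_to_int and int_to_bytes, the big-endian byte order, and the explicit length argument are all assumptions made for this illustration.

```python
def bytes_to_int(bytelist):
    # Assumed big-endian: the first byte is the most significant.
    n = 0
    for b in bytelist:
        n = (n << 8) | b
    return n

def int_to_bytes(n, length):
    # Peel off bytes from the least significant end, then reverse.
    # The length is passed explicitly because leading zero bytes
    # cannot be recovered from the integer alone.
    result = []
    for _ in range(length):
        result.append(n & 0xff)
        n >>= 8
    return result[::-1]

small = bytes_to_int([1, 0])          # 256
somebytes = [202, 127, 23, 112, 13, 199, 25, 252, 58, 152]
big = bytes_to_int(somebytes)
back = int_to_bytes(big, len(somebytes))   # equals somebytes
```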

One-time Pad revisited

For byte sequences, the addition of letters mod 26 used in the classical one-time pad is replaced by addition of bits mod 2, which is the exclusive-or operation. Bitwise exclusive-or is a built-in operator (^) in Python.

In []:
def xor(bytelist1,bytelist2):
    length=len(bytelist1)
    return [bytelist1[j]^bytelist2[j] for j in range(length)]
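
The claim that ^ is addition of bits mod 2 can be checked directly. The sketch below adds the bits of two bytes column by column and compares the result with ^; it also checks that xoring twice with the same key returns the original bytes. The helper name xor_bits_mod2 is an assumption for this example, and xor is restated so the block runs on its own.

```python
def xor(bytelist1, bytelist2):
    length = len(bytelist1)
    return [bytelist1[j] ^ bytelist2[j] for j in range(length)]

def xor_bits_mod2(a, b):
    # Add corresponding bits mod 2, most significant bit first.
    bits_a, bits_b = format(a, '08b'), format(b, '08b')
    summed = ''.join(str((int(x) + int(y)) % 2)
                     for x, y in zip(bits_a, bits_b))
    return int(summed, 2)

# 01001000 (72) plus 11001010 (202), column by column mod 2,
# gives 10000010 (130), the same as 72 ^ 202.
by_columns = xor_bits_mod2(72, 202)

# Xoring with the same key twice undoes itself.
message = [72, 105, 33]
key = [202, 127, 23]
twice = xor(xor(message, key), key)   # equals message
```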
      

Let's encrypt a string of text using the one-time pad. We'll generate a random sequence of bytes of the same length as the plaintext as the key, and display the result in base64 encoding. (You should be aware, however, that built-in random number generators like the one in Python's random module should NOT be used in cryptographic applications, because their output is predictable. So this is not really an illustration of a perfectly secret one-time pad.)

In []:
plain="This is just to say:  I have eaten the plums that were in the icebox, and which you were probably saving for breakfast. Forgive me.  They were delicious, so sweet and so cold."
key = [random.randint(0,255) for j in range(len(plain))]
bytestring=ascii_to_bytes(plain)
cipher=xor(bytestring,key)
cipher64=bytes_to_b64(cipher)
print cipher64
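
Regarding the caveat about random number generators: in Python 3.6 and later, the standard library's secrets module draws bytes from the operating system's cryptographic generator, so a key could be produced as sketched below. This is a Python 3 sketch under that assumption, and the function name random_key is made up for this example. It addresses only the choice of generator.

```python
import secrets

def random_key(length):
    # secrets.token_bytes returns `length` bytes from the OS's
    # cryptographically strong random source.
    return list(secrets.token_bytes(length))

def xor(bytelist1, bytelist2):
    length = len(bytelist1)
    return [bytelist1[j] ^ bytelist2[j] for j in range(length)]

plain = "Forgive me."
plain_bytes = [ord(c) for c in plain]
key = random_key(len(plain_bytes))
cipher = xor(plain_bytes, key)

# Decryption is the same xor with the same key.
recovered = ''.join(chr(b) for b in xor(cipher, key))
```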

Because addition and subtraction mod 2 are the same thing, decryption is identical to encryption: just xor with the key. Here we assume that the recipient has started with the base64 encoding of the ciphertext, so the decryption includes function calls to convert between the various representations.

In []:
decrypted=bytes_to_ascii(xor(b64_to_bytes(cipher64),key))
print decrypted