|The Tale of the|
Suppose someone (let's call her Alice) wants to send a message to her friend (let's call him Bob), but is worried that someone else (an eavesdropper, let's call her Eve) might intercept that message. Cryptography is all about how Alice can send messages (or data) that is encoded so that Bob can recover the original message, but Eve cannot (or else it would take Eve a very long time to decode the message). (This is a slightly vague and imprecise definition of cryptography: professional cryptographers (people who study and design new methods of encryption) and cryptanalysts (people who try to decipher messages without knowing how they've been encrypted) would talk about the difference between cryptography, cryptology, codes, ciphers, etc. But we're not going to worry about these details here!)
There are lots of situations where you want to send messages or data securely. If you are buying something on the internet from, say, Amazon, then your (or your parents!) credit card details have to be transmitted from your computer to Amazon's computer. These details are encoded whilst they are being transmitted so that, should some criminals intercept this data, then they cannot easily decrypt it to obtain your credit card details. In this case, the form of encryption used is often something called RSA encryption and uses some rather advanced mathematics.
However, there are many other places where a simpler form of encryption might be used. For example, there are several film-review forums on the internet where people encrypt possible plot spoilers so as not to spoil the film for people who haven't seen it, but is easily decipherable for others. One way of doing this is called ROT13, where every letter of the alphabet is advanced 13 places along (so A becomes N, B becomes O, etc); this is an example of what is called a Caesar cipher
Just to establish two important pieces of terminology: the original message that Alice wants to send is called the plaintext and she enciphers it as the ciphertext. Bob receives the ciphertext and wants to decipher it to obtain the plaintext.
The Atbash cipher
The Atbash cipher is one of the most well-known (and easiest) methods of encryption. Each `A' in the plaintext is replaced by `Z' in the ciphertext, each `B' in the plaintext in replaced by `Y' in the ciphertext, etc; i.e. the plaintext alphabet is reversed to give the ciphertext alphabet. (Sometimes the Atbash cipher is called the reversing cipher.)
The flyer for the competition had the following ciphertext on it:
DVOO WLMV LM WVXRKSVIRMT GSRH SRWWVM NVHHZTV! DSB MLG ULIN Z GVZN DRGS BLFI UIRVMWH ZMW GZPV KZIG RM GSRH BVZI'H XIBKGLTIZKSB XLNKVGRGRLM?
This is enciphered using the Atbash cipher. Now that you know what type of cipher is used, why not try to decipher the message yourself before you read the next paragraph.
The Atbash cipher is called a substitution cipher: each letter of the plaintext is replaced by a different letter in the ciphertext:
How could we have cracked this cipher if we didn't know that the Atbash cipher was being used?
The first thing a cryptanalyst should do is to look at the ciphertext and see if there are any patterns or any features that allow a way in to break the code. In the above text, it looks like each block of letters represent words and the spaces are, as normal, spaces between words. Note that the ciphertext contains the word `Z'. There are only two words in English that have one letter in: `I' and `A'. Let's guess that `Z' corresponds to `A'. Further in the text is the three-letter word `ZMW'. As `Z' stands for `A', this means that `ZMW' is a three-letter word that starts with `A'. A good guess for this would be that `ZMW' stands for `AND'.
This is starting to look like the Atbash cipher, and one might try to see if this guess is correct.
Alternatively, instead of looking at words, one could look at letters and use a method known as `frequency analysis'. In the ciphertext, the letters `V', `G' and `Z' frequently occur. The most commonly occuring letters in English are E, T and A. If we try V, G and Z standing for E, T and A, respectively, then we have:
This suggests that ZMW is probably AND, and this allows us to fill in more of the plaintext. GVZN is TEA? and the only common words of this form are TEAM and TEAR; hence Z is either M or R. The OO in the first word is a repeated letter that appears at the end of a word, so is probably LL. Continuing in this way we can eventually decode the message from the competition flyer as
Well done on deciphering this hidden message! Why not form a team with your friends and take part in this year's cryptography competition?
Other substitution ciphers
The Atbash cipher is very easy to break. Instead of reversing the alphabet, we could use a more general substitution cipher. For example, we could encrypt a message using the following substitutions
So for example the message `ATTACK AT DAWN' is enciphered as `QSSQTZ QS UQHM'.
This type of encryption is called a `monoalphabetic substitution cipher'. There is a very large number of different substitution ciphers (can you say how many?), far too many to be able to check each one. Instead, one needs to look for common words or use frequency analysis, as we did above.
There are other, more devious, variants of substitution ciphers. One was used in the Sherlock Holmes story `The Adventure of the Dancing Men' (which you can read for free on Project Gutenberg). Here Sherlock Holmes came across secret messages like the following:
(image reproduced from Project Gutenberg). This is encoded using a substitution cipher, where the ciphertext uses images of dancing men to represent letters of the alphabet. Instead of using a space, the end of a word is denoted by a man holding a flag.
There are other ways to make substitution ciphers harder to break. The main weakness in a substitution cipher is that (if the message is long enough) it can usually be broken by frequency analysis. This is because we are just using one symbol in the ciphertext to represent one letter in the plaintext, hence the frequencies of symbols in the ciphertext are the same as the frequencies of letters in the plaintext. If instead, one allowed different symbols to represent the same letter, then this makes frequency analysis much harder. For example:
|A||00 27 38||N||10 32 42|
|B||18||O||02 33 40|
|E||17 19 36 44||R||11 43|
|G||21||T||24 30 39|
|H||06 29||U||12 41|
then we could encipher `ELLIE' as `19 34 23 16 44'.
Simon Singh's The codebook is an excellent, and highly readable, account of cryptography and codebreaking.
A very thorough history of codebreaking up until the 1950s can be found in David Kahn's The Codebreakers.
You can read Sherlock Holmes and the Adventure of the Dancing Men on Project Gutenberg here.