Suppose someone (let's call her Alice) wants to send a message to her friend (let's call him Bob), but is worried that someone else (an eavesdropper, let's call her Eve) might intercept that message. Cryptography is all about how Alice can send messages (or data) that is encoded so that Bob can recover the original message, but Eve cannot (or else it would take Eve a very long time to decode the message). (This is a slightly vague and imprecise definition of cryptography: professional cryptographers (people who study and design new methods of encryption) and cryptanalysts (people who try to decipher messages without knowing how they've been encrypted) would talk about the difference between cryptography, cryptology, codes, ciphers, etc. But we're not going to worry about these details here!)
There are lots of situations where you want to send messages or data securely. If you are buying something on the internet from, say, Amazon, then your (or your parents!) credit card details have to be transmitted from your computer to Amazon's computer. These details are encoded whilst they are being transmitted so that, should some criminals intercept this data, then they cannot easily decrypt it to obtain your credit card details. In this case, the form of encryption used is often something called RSA encryption and uses some rather advanced mathematics.
However, there are many other places where a simpler form of encryption might be used. For example, there are several film-review forums on the internet where people encrypt possible plot spoilers so as not to spoil the film for people who haven't seen it, but is easily decipherable for others. One way of doing this is called ROT13, where every letter of the alphabet is advanced 13 places along (so A becomes N, B becomes O, etc); this is an example of what is called a Caesar cipher
Just to establish two important pieces of terminology: the original message that Alice wants to send is called the plaintext and she enciphers it as the ciphertext. Bob receives the ciphertext and wants to decipher it to obtain the plaintext.
The Caesar cipher
The Caesar cipher is one of the most well-known (and easiest) methods of encryption. Each letter of the plaintext is replaced by the letter that is some fixed number of places down the alphabet. For example, in a Caesar cipher with shift 3, each `A' in the plain text is replaced by `D', each `B' replaced by `E', and so on.
The flyer for the competition had the following ciphertext on it:
SAHH ZKJA KJ ZAYELDANEJC PDEO DEZZAJ IAOOWCA! SDU JKP BKNI W PAWI SEPD UKQN BNEAJZO WJZ PWGA LWNP EJ PDEO UAWN'O YNULPKCNWLDU YKILAPEPEKJ?
This is encoded using a Caesar cipher. Now that you know what the method of encryption is, why not try to decipher the ciphertext yourself before you read the next paragraph.
The message above is encrypted with a Caesar cipher with shift 22. Thus we have the following table:
There are 26 different Caesar ciphers (corresponding to each possible shift). We could have tried all 26 possible shifts and found the one that worked - but could we have been smarter? Or how could we have made a start if we did not know that a Caesar cipher had been used?
The first thing a cryptanalyst should do is to look at the ciphertext and see if there are any patterns or any features that allow a way in to break the code. In the above text, it looks like each block of letters represent words and the spaces are, as normal, spaces between words. Note that the ciphertext contains `W' and `WJZ' so these are likely to be common words. Now there are only two common English words with 1 letter in: `a' and `I'. So `W' probably represents `A' or `I'. Common 3 letter words in English include: `and' and `the'. So if `W' represent `A' then `WJZ' could be `AND'. This suggests:
This is starting to look like a Caesar cipher with shift 22, and one might try to see if this guess is correct.
How could we have cracked this cipher if we didn't know that a Caesar cipher was being used?
Alternatively, instead of looking at words, one could look at letters and use a method known as `frequency analysis'. In the ciphertext, the letters `A', `P' and `W' frequently occur. The most commonly occuring letters in English are E, T and A. If we try A, P and W standing for E, T and A, respectively, then we have:
This suggests that WJZ is probably AND and this allows us to fill in more of the plaintext. PAWI is TEA? and the only common words of this form are TEAM and TEAR; hence I is either M or R. OO occurs in the ciphertext and so probably represents SS, EE, TT, FF or LL as these are the most commonly repeated letters. Continuing in this way we can eventually decode the message from the competition advert as
Well done on deciphering this hidden message! Why not form a team with your friends and take part in this year's cryptography competition?
Other substitution ciphers
Caesar ciphers are very easy to break. Instead, we could use a more general substitution cipher. For example, we could encrypt a message using the following substitutions
So for example the message `ATTACK AT DAWN' is enciphered as `QSSQTZ QS UQHM'.
This type of encryption is called a `monoalphabetic substitution cipher'. There is a very large number of different substitution ciphers (can you say how many?), far too many to be able to check each one. Instead, one needs to look for common words or use frequency analysis, as we did above.
There are other, more devious, variants of substitution ciphers. One was used in the Sherlock Holmes story `The Adventure of the Dancing Men' (which you can read for free on Project Gutenberg). Here Sherlock Holmes came across secret messages like the following:
(image reproduced from Project Gutenberg). This is encoded using a substitution cipher, where the ciphertext uses images of dancing men to represent letters of the alphabet. Instead of using a space, the end of a word is denoted by a man holding a flag.
There are other ways to make substitution ciphers harder to break. The main weakness in a substitution cipher is that (if the message is long enough) it can usually be broken by frequency analysis. This is because we are just using one symbol in the ciphertext to represent one letter in the plaintext, hence the frequencies of symbols in the ciphertext are the same as the frequencies of letters in the plaintext. If instead, one allowed different symbols to represent the same letter, then this makes frequency analysis much harder. For example:
|A||00 27 38||N||10 32 42|
|B||18||O||02 33 40|
|E||17 19 36 44||R||11 43|
|G||21||T||24 30 39|
|H||06 29||U||12 41|
then we could encipher `ELLIE' as `19 34 23 16 44'.
Simon Singh's The Codebook is an excellent, and highly readable, account of cryptography and codebreaking.
A very thorough history of codebreaking up until the 1950s can be found in David Kahn's The Codebreakers.