# What is cryptography?

Cryptography can be defined as the study of how to send messages or data in a secure way so that the intended recipient can recover the original message but an eavesdropper cannot. (This is a slightly vague and imprecise definition: professional cryptographers (people who study and design new methods of encryption) and cryptanalysts (people who try to decipher messages without knowing how they've been encrypted) would talk about the difference between cryptography, cryptology, codes, ciphers, etc. But we're not going to worry about such matters here.)

There are lots of situations where you want to send messages or data securely. If you are buying something on the internet from, say, Amazon, then your (or your parents!) credit card details have to be transmitted from your computer to Amazon's computer. These details are encoded whilst they are being transmitted so that, should some criminals intercept this data, then they cannot easily decrypt it to obtain your credit card details. In this case, the form of encryption used is often something called RSA encryption and uses some rather advanced mathematics.

However, there are many other places where a simpler form of encryption might be used. For example, there are several film-review forums on the internet where people encrypt possible plot spoilers so as not to spoil the film for people who haven't seen it, but is easily decipherable for others. One way of doing this is called ROT13, where every letter of the alphabet is advanced 13 places along (so A becomes N, B becomes O, etc). This is an example of a Caeser cipher, and is an example of what is called a substitution cipher.

#### Some methods of encryption

**The Caeser cipher**

The Caeser cipher is one of the most well-known (and easiest) methods of encryption. Each letter of the plaintext is replaced by the letter that is some fixed number of places down the alphabet. For example, in a Caeser cipher with shift 3, each `A' in the plain text is replaced by `D', each `B' replaced by `E', and so on.

The flyer for the competition had the following ciphertext on it:

PBATENGHYNGVBAF. LBH'IR SBHAQ NAQ FBYIRQ N FRPERG PBQR. JUL ABG SBEZ N GRNZ JVGU LBHE SEVRAQF, ERTVFGRE SBE GUR PBZCRGVGVBA, NAQ WBVA ZVXR NAQ RYYVR SBE N GUEVYYVAT PELCGBTENCUVP NQIRAGHER SBE GUR GHEVAT GERNFHER.

This is encoded using a Caeser cipher. Now that you know what the method of encryption is, why not try to decipher the ciphertext yourself before you read the next paragraph.

The message above is encrypted with a Caeser cipher with shift 13. Thus we have the following table:

plaintext: | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |

ciphertext: | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | A | B | C | D | E | F | G | H | I | J | K | L | M |

There are 26 different Caeser ciphers (corresponding to each possible shift). We could have tried all 26 possible shifts and found the one that worked - but could we have been smarter? Or how could we have made a start if we did not know that a Caeser cipher had been used?

The first thing a cryptanalyst should do is to look at the ciphertext and see if there are any patterns or any features that allow a way in to break the code. In the above text, it looks like each block of letters represent words and the spaces are, as normal, spaces between words. Note that the ciphertext contains `N' and `NAQ' several times, so these are likely to be common words. Now there are only two common English words with 1 letter in: `a' and `I'. So `N' probably represents `A' or `I'. Common 3 letter words in English include: `and' and `the'. So if `N' represent `A' then `NAQ' could be `AND'. This suggests:

plaintext: | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |

ciphertext: | N | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | A | ? | ? | D | ? | ? | ? | ? | ? | ? | ? | ? | ? |

This is starting to look like a Caeser cipher with shift 13, and one might try to see if this guess is correct.

Alternatively, instead of looking at words, one could look at letters and use a method known as `frequency analysis'. In the ciphertext, the letters `R', `G' and `N' frequently occur. The most commonly occuring letters in English are E, T and A. If we try R, G and N standing for E, T and A, respectively, then we have:

plaintext: | P | B | A | T | E | N | G | H | Y | N | G | V | B | A | F | . | L | B | H | ' | I | R | S | B | H | A | Q | N | A | Q | F | B | Y | I | R | Q | N | F | R | P | E | R | G | P | B | Q | R | . | J | U | L | A | B | G | S | B | E | Z | ||||||||||

ciphertext: | ? | ? | ? | ? | ? | A | T | ? | ? | ? | T | ? | ? | ? | ? | . | ? | ? | ? | ' | ? | E | ? | ? | ? | ? | ? | A | ? | ? | ? | ? | ? | ? | E | ? | A | ? | E | ? | ? | E | T | ? | ? | ? | E | . | ? | ? | ? | ? | ? | T | ? | ? | ? | ? |

plaintext: | N | G | R | N | Z | J | V | G | U | L | B | H | E | S | E | V | R | A | Q | F | , | E | R | T | V | F | G | R | E | S | B | E | G | U | R | P | B | Z | C | R | G | V | G | V | B | A | , | N | A | Q | W | B | V | A | Z | V | X | R | |||||||||||

ciphertext: | A | T | E | A | ? | ? | ? | T | ? | ? | ? | ? | ? | ? | ? | ? | E | ? | ? | ? | , | ? | E | ? | ? | ? | T | E | ? | ? | ? | ? | T | ? | E | ? | ? | ? | ? | E | T | ? | T | ? | ? | ? | , | A | ? | ? | ? | ? | ? | ? | ? | ? | ? | E |

plaintext: | N | A | Q | R | Y | Y | V | R | S | B | E | N | G | U | E | V | Y | Y | V | A | T | P | E | L | C | G | B | T | E | N | C | U | V | P | N | Q | I | R | A | G | H | E | R | S | B | E | G | U | R | G | H | E | V | A | T | |||||||||

ciphertext: | A | ? | ? | E | ? | ? | ? | E | ? | ? | ? | A | T | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | T | ? | ? | ? | A | ? | ? | ? | ? | A | ? | ? | E | ? | T | ? | ? | E | ? | ? | ? | T | ? | E | T | ? | ? | ? | ? | ? |

plaintext: | G | E | R | N | F | H | E | R | . |

ciphertext: | T | ? | ? | ? | ? | ? | ? | E | . |

This suggests that GUR is probably THE and NAQ is probably AND, and this allows us to fill in more of the plaintext. GRNZ is TEA? and the only common words of this form are TEAM and TEAR; hence Z is either M or R. YY occurs in several places and so probably represents SS, EE, TT, FF or LL as these are the most commonly repeated letters. Continuing in this way we can eventually decode the message from the competition advert as

Congratulations. You've found and solved a secret code. Why not form a team with your friends, register for the competition, and join Mike and Ellie for a thrilling cryptographic adventure for the Turing treasure.

**Other substitution ciphers**

The Caeser cipher is very easy to break. There are only 26 different Caeser ciphers, one for each possible shift. To decrypt a message that you know is encrypted using a Caeser cipher you could just try all 26 possibilities to find the one that worked.

Instead of shifting the alphabet, we could use a more general substitution cipher. For example, we could encrypt a message using the following substitutions

plaintext: | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |

ciphertext: | Q | E | T | U | O | A | D | G | J | L | Z | C | B | M | W | R | Y | I | P | S | F | H | K | X | V | N |

So for example the message `ATTACK AT DAWN' is enciphered as `QSSQTZ QS UQHM'.

This type of encryption is called a `monoalphabetic substitution cipher'. With a Caeser cipher there are only 26 possible ciphers; but with a substitution cipher there are millions (can you say how many?), far too many to be able to check each one. Instead, one needs to look for common words or use frequency analysis, as we did above.

There are other, more devious, variants of substitution ciphers. One was used in the Sherlock Holmes story `The Adventure of the Dancing Men' (which you can read for free on Project Gutenberg). Here Sherlock Holmes came across secret messages like the following:

(image reproduced from Project Gutenberg). This is encoded using a substitution cipher, where the ciphertext uses images of dancing men to represent letters of the alphabet. Instead of using a space, the end of a word is denoted by a man holding a flag.

There are other ways to make substitution ciphers harder to break. The main weakness in a substitution cipher is that (if the message is long enough) it can usually be broken by frequency analysis. This is because we are just using one symbol in the ciphertext to represent one letter in the plaintext, hence the frequencies of symbols in the ciphertext are the same as the frequencies of letters in the plaintext. If instead, one allowed different symbols to represent the same letter, then this makes frequency analysis much harder. For example:

plaintext: | ciphertext: | plaintext: | ciphertext: |

A | 00 27 38 | N | 10 32 42 |

B | 18 | O | 02 33 40 |

C | 04 37 | P | 09 |

D | 20 28 | Q | 26 |

E | 17 19 36 44 | R | 11 43 |

F | 05 | S | 01 31 |

G | 21 | T | 24 30 39 |

H | 06 29 | U | 12 41 |

I | 16 35 | V | 03 |

J | 07 | W | 14 |

K | 15 | X | 13 |

L | 23 34 | Y | 08 |

M | 22 | Z | 25 |

then we could encipher `ELLIE' as `19 34 23 16 44'.

#### Further reading

Simon Singh's The codebook is an excellent, and highly readable, account of cryptography and codebreaking.

A very thorough history of codebreaking up until the 1950s can be found in David Kahn's The Codebreakers.

You can read Sherlock Holmes and the Adventure of the Dancing Men on Project Gutenberg here.