< PREV | NEXT > | INDEX | SITEMAP | GOOGLE | UPDATES | BLOG | CONTACT | $Donate? | HOME

[1.0] Classical Codes & Ciphers

v1.1.0 / chapter 1 of 4 / 01 may 23 / greg goebel

* Understanding 21st-century cryptology requires a little background on the evolution of the art. Humans have been figuring out ways of sending secret messages for as long as they've had writing, and eventually developed highly sophisticated machines for the job.

MODERN CRYPTOLOGY


[1.1] ORIGINS & DEFINITIONS
[1.2] SUBSTITUTION CIPHERS / FREQUENCY ANALYSIS
[1.3] TRANSPOSITION CIPHERS / MULTIPLE ANAGRAMMING
[1.4] VIGINERE CIPHER / CHECKERBOARD CIPHERS
[1.5] NAVAL CODES / ONE-TIME KEY
[1.6] BAZERIES CYLINDER / ENIGMA / TELECIPHER MACHINES

[1.1] ORIGINS & DEFINITIONS

* The oldest means of sending secret messages is to simply hide them by one trick or another. The ancient Greek historian Herodotus wrote that when the Persian Emperor Xerxes moved to attack Greece in 480 BCE, the Greeks were warned by a Greek named Demaratus who was living in exile in Persia. In those days, wooden tablets covered with wax were used for writing. Demaratus wrote a message on the wooden tablet itself and then covered it with wax, allowing the vital information to be smuggled out of the country.

The science of sending concealed messages is known as "steganography", Greek for "concealed writing". Steganography has a long history, leading to inventions such as invisible ink and "microdots", or highly miniaturized microfilm images that could be hidden almost anywhere. Microdots are a common feature in old spy movies and TV shows. However, steganography is not very secure by itself: If someone finds the hidden message, all its secrets are revealed. That led to the idea of manipulating the message so that it could not be read even if it were intercepted, and the result was "cryptography", Greek for "hidden writing".

Cryptography takes two forms: "codes" and "ciphers". The distinction between codes and ciphers is commonly misunderstood. A "code" is essentially a secret language invented to conceal the meaning of a message. The simplest form of a code is the "jargon code", in which a particular arbitrary phrase, for an arbitrary example:

   The nightingale sings at dawn.

-- corresponds to a particular predefined message that may not, in fact shouldn't have, anything to do with the jargon code phrase. The actual meaning of this might be:

   The supply drop will take place 
   at 0100 hours tomorrow.

Jargon codes have been used for a long time, most significantly in World War II, when they were used to send commands over broadcast radio to resistance fighters. However, from a cryptographic point of view they're not very interesting. A proper code might run something like this:

   BOXER SEVEN SEEK TIGER5 AT RED CORAL

This uses "codewords" to report that a friendly military force codenamed BOXER SEVEN is now hunting an enemy force codenamed TIGER5 at a location codenamed RED CORAL. This particular code is weak in that the "SEEK" and "AT" words provide information to a codebreaker on the structure of the message. In practice, traditional military codes were often defined using "codenumbers" instead of codewords, listed in a codebook that provides a dictionary of code numbers and their equivalent words. For example, this message might be coded as:

   85772 24799 10090 59980 12487

Codewords and codenumbers are referred to collectively as "codegroups". The words they represent are referred to as "plaintext" or, more infrequently, "cleartext", "plaincode", or "placode".

Codes were unsurprisingly defined by "codebooks", or dictionaries of codegroups, listed with their corresponding plaintext. Codes originally had the codegroups in the same order as their plaintext. For example, in a code based on codenumbers, a word starting with "a" would have a low-value codenumber, while one starting with "z" would have a high-value codenumber. This meant that the same codebook could be used to "encode" a plaintext message into a coded message or "codetext", and "decode" a codetext back into a plaintext message.

However, such "one-part" codes had a certain predictability that made it easier for outsiders to figure out the pattern and "crack" or "break" the message, revealing its secrets. In order to make life more difficult for codebreakers, codemakers then designed codes where there was no predictable relationship between the order of the codegroups and the order of the matching plaintext. This meant that two codebooks were required:

Trying to crack a code is like trying to translate a document written in an alien language, with the task basically amounting to building up a "dictionary" of the codegroups and the plaintext words they represent.

One fingerhold on a simple code is the fact, mentioned above, that some words are more common than others, such as "the" or "a" in English. In the days of telegraphic messages, the codegroup for "STOP" (end of sentence) was usually very common -- which helped define the structure of the message in terms of sentences, if not their meaning. Further progress can be made against a code by collecting many messages encrypted with the same code, and then obtaining intelligence background on the messages, such as the location from where a message is sent, and where it's being sent to; the time the message is sent; events occurring before and after the message is sent; and the normal habits of the people sending the coded messages.

The most obvious and, in principle at least, simplest way of cracking a code is to steal the codebook through bribery, burglary, or raids. Of course, once compromised by any means, a code has to be replaced. Constructing a new code is like devising a new language and writing a dictionary for it -- a big job. If a code is compromised, the whole task has to be done all over again, and that means a lot of work for both cryptographers and code users. In practice, in the heyday of codes, they were usually changed on a periodic basis. Once codes are created, their distribution is logistically clumsy, and the probability that the code will be compromised is high. Codes are also troublesome to mechanize, which is why they've been generally supplanted by ciphers.

* In contrast to a code, a cipher conceals a plaintext message by replacing or scrambling its letters. This process is known as "enciphering" and results in a "ciphertext" message. Converting a ciphertext message back to a plaintext message is known as "deciphering". Coded messages could be enciphered to improve their security, a process known as "superencipherment".

There are two classes of ciphers. A "substitution cipher" changes the letters in a message to another set of letters, or "cipher alphabet", while a "transposition cipher" shuffles the letters around. In some usages, the term "cipher" always means "substitution cipher", while "transpositions" are not referred to as ciphers at all. In this document, the term "cipher" will mean both substitution ciphers and transposition ciphers. It is useful to refer to them together, since the two approaches are often combined in the same cipher scheme. However, transposition ciphers will be referred to in specific as "transpositions" for simplicity.

Together, codes and ciphers are called "cryptosystems". "Encryption" covers both encoding and enciphering, while "decryption" covers both decoding and deciphering. This also implies the term "cryptotext" to cover both codetext and ciphertext, though the term "encicode" is sometimes seen instead. The science of creating codes and ciphers is known, as mentioned, as "cryptography", while the science of breaking them is known as "cryptanalysis", and the entire field defined as the science of "cryptology".

* As a footnote, with the development of means of encrypting documents, there was also a need to "authenticate" them, or to ensure that the documents were valid and not forgeries. Signatures were long used to that end -- as were "seals", which were stereotypically hand stamps with elaborate patterns that would leave an impression in a blob of wax used to, for example, seal an envelope.

Seals arose in effectively every culture with written documentation. Persons of importance could have "signet rings", a stamp as a ring, to validate documents sourced by them. Seals were not very secure, not always being particularly hard to copy -- though the authorities were inclined to take a very unforgiving view of forgery. Seals, and related schemes like watermarks and stamps, would remain standard for data authentication to modern times.

Indeed, some modern printers generate subtle codes, not visible to casual inspection, that mark documents printed on them to identify each specific printer. Much more to the point of this document, as discussed later, authentication technology has come a long way in the 21st century.

BACK_TO_TOP

[1.2] SUBSTITUTION CIPHERS / FREQUENCY ANALYSIS

* Simple substitution ciphers have been around for about as long as writing has. A simple substitution cipher in which the same cipher letter is always exchanged for the same plaintext letter is known as a "monoalphabetic substitution cipher". For example, we could define a cipher alphabet as follows:

   plaintext alphabet:   

     abcdefghijklmnopqrstuvwxyz

   ciphertext alphabet:  

     TDNUCBZROHLGYVFPWIXSEKAMQJ

Note that here, plaintext is printed in lowercase, while ciphertext is printed in uppercase. This convention will be followed in the rest of this document. Suppose Alice -- of the comedy team of Alice & Bob, who often show up in discussions of cryptology, physics, philosophy, and so on -- wants to send an encrypted message to Bob. Using this cipher alphabet, Alice can convert the plaintext:

   erase the tapes

-- into the ciphertext:

   CITXC SRC STPCX

This can be made even more cryptic by removing the spaces:

   CITXCSRCSTPCX

-- and it still remains more or less readable when translated back to plaintext:

   erasethetapes

A "keyphrase" or "key" can be used to define a cipher alphabet -- consisting of, say, a name, such as RICHARD MILHOUS NIXON, write it down while eliminating any redundant letters, and then complete the cipher by writing down the remaining letters of the alphabet in alphabetical order:

   plaintext alphabet:  

     abcdefghijklmnopqrstuvwxyz

   cipher alphabet:     

     RICHADMLOUSNXBEFGJKPQTVWYZ

This is a simple cipher algorithm, but even if a codebreaker knows that this general scheme was used, the message still cannot be read without the keyphrase, and a brute-force approach to cracking it is very difficult. This is a fundamental principle of cryptography, stated by a 19th-century Dutch linguist & cryptographer, Auguste Kerckhoffs von Niewenhof), and known as "Kerckhoffs' Principle": The security of a cipher should not depend on an enemy's ignorance of the enciphering algorithm, only the enemy's ignorance of the key. In fact, codebreaking is often focused on discovering keys, since the cipher scheme may be well understood.

* Given the large number of possible monoalphabetic substitution cipher alphabets, it might seem like a substitution cipher would be very hard to break. In reality, it's very easy if given a reasonably large ciphertext message to analyze, but it took over a thousand years to figure out how.

The basic approach for cracking a monoalphabetic substitution cipher was invented by a multi-talented medieval Arabic scholar named al-Kindi, and is now known as "frequency analysis". His work was an outgrowth of efforts by Arabs to perform textual analyses of religious texts to see if they actually were written by the Prophet.

Frequency analysis is a statistical method. In every language, some letters are used on the average more than others, and the percentages of letters in different languages tends to be constant. For example, in English text the three most common letters on the average are "e", "t", and "a", while the three least common are "x", "q", and "z". (Average letter frequencies will be different in different languages.) This means that if a codebreaker sees that "O", "G", and "B" are the most common letters in a ciphertext, then they are likely to represent "e", "t", and "a".

Frequencies of characters in any text may deviate from the average, so this mapping may not be perfectly accurate. However, frequency analysis can also be performed on pairs of letters, or "digraphs", in the ciphertext -- "ee" is common in English text, but not "aa". A codebreaker can also spot common triplets, or "trigraphs", or entire words -- "the" is the most common word in English text. Text may have predictable elements -- for example, Nazi correspondence might start with "Heil Hitler!" Such predictable phrases are known as "cribs". Military correspondence tends to follow standard formats and is often loaded with cribs.

Once given these clues, a codebreaker can then use educated guesswork to "fill in the blanks". For example, an incomplete word of the form "-u-m-ri-e" in text sent from a naval base is likely to be "submarine". This is the same skill that is used to solve crossword-puzzles, and is known to cryptologists as "anagramming". The usage of the word in cryptology is somewhat different from the popular usage, which refers to a scrambling of the letters of one word into a different word.

Of course, frequency analysis requires enough text to permit the construction of a useful table of letter frequencies. This is a general truth of cryptanalysis: the more cryptotext available, the easier the cryptotext is to crack; and on the other side of that coin, short or fragmentary cryptotexts can be difficult to crack even if they use a weak cryptographic scheme.

BACK_TO_TOP

[1.3] TRANSPOSITION CIPHERS / MULTIPLE ANAGRAMMING

* As for transposition ciphers, suppose Alice wants to send Bob the message:

   meet me near the clock tower 
   at twelve midnight tonite

One way to transpose this message is for Alice to "write in" the words vertically in five rows without any spaces as follows:

   m e e t m
   e n e a r 
   t h e c l 
   o c k t o
   w e r a t 
   t w e l v 
   e m i d n 
   i g h t t 
   o n i t e 

-- and then "read off" each column, top to bottom, as follows:

   metowteio  
   enhcewmgn  
   eeekreihi  
   tactaldtt  
   mrlotvnte

Bob, on the receiving end, then "writes in" the message in five parts:

   M E T O W T E I O
   E N H C E W M G N
   E E E K R E I H I
   T A C T A L D T T
   M R L O T V N T E

-- and then "reads off" the message from the columns:

   meet me near the clock tower 
   at twelve midnight tonite

A transposition of the form shown above is extremely easy to crack. A codebreaker just writes down the letters of the transposition in rows, increasing the length of the rows until he sees something that makes sense. However, Alice could make things more difficult for a codebreaker a bigger headache by reading off columns in an alternating "down" and "up" fashion, or by reading off the transposition in a "spiral" pattern -- "down" on the left side, "right" across the bottom, "up" on the right side, "left" across the top to the second-to-left column, "down" again, and so on until all letters were transposed.

Even more sophisticated transpositions use a "checkerboard" pattern -- such as a "knight's tour", a grid of numbers that specify the sequence of movements of a chess knight across the grid. As with ciphers, keys can be used to construct transpositions. One way of doing it would be to list the keyword, say "KANGAROO", numbering the letters by both their position in the keyword, and their relative rank in the alphabet:

   keyword:       K A N G A R O O
   position:      1 2 3 4 5 6 7 8
   rank:          3 1 4 2 1 6 5 5

The keyword is then transposed by re-ordering the characters according to the rank:

   scrambling:    A A G K N O 0 R
   position:      2 5 4 1 3 7 8 6
   rank:          1 1 2 3 4 5 5 6
   reposition:    1 2 3 4 5 6 7 8 

This gives a transposition pattern of:

   pattern:       25413786
   new_order:     12345678

A message is transposed in blocks of 8 letters, the 2nd letter in the block transposed to the 1st, the 5th to the 2nd, the 4th to the 3rd, the 1st to the 4th, and so on. It is best to write the blocks row-by-row, one 8-letter block per row, then scan down each of the 8 columns of the result, with the encrypted message becoming:

  column_1 column_2 column_3 ... column_8

Ciphers and transpositions can of course be combined. If the frequency analysis of a ciphertext shows a seeming match to normal text, then the cipher is likely to be a transposition. If frequency analysis seems to show a different mapping of characters from normal text, but trying to plug the proper characters back into the ciphertext gives absolutely no useful results, then the cipher is likely a combination substitution and transposition.

* A general technique for cracking transpositions called "multiple anagramming" was developed in the 1870s. It requires two messages that have been transposed in the same way -- for a simple example, consider two messages consisting only of one five-letter word each:

   TASPR
   PRSCO

A little examination shows that both of these two transpositions could be unscrambled into two different words:

   TASPR  ->  PARTS, TRAPS
   PRSCO  ->  CROPS, CORPS

Given each message on its own, there's no way to figure out which of the two unscramblings would be correct for each message. However, performing the same unscrambling on both messages in parallel shows that there's only one unscrambling that yields an intelligible answer for both messages:

   TASPR  ->  TRAPS   PRATS   PARTS
   PRSCO  ->  PORCS   CORPS   CROPS

   12345      15243   45213   42513
BACK_TO_TOP

[1.4] VIGINERE CIPHER / CHECKERBOARD CIPHERS

* The introduction of frequency analysis rendered monoalphabetic substitution ciphers insecure, and so led in turn to improved ciphers. One of the most significant developments was the "polyalphabetic substitution cipher", which was described in its definitive form in a paper published in 1586 by a French diplomat named Blaise de Vigenere. A "Vigenere cipher" uses 26 substitution ciphers, organized using a "Vigenere square" as shown below, with some spacing added here to make it more legible:

       a bcdef ghijk lmnop qrstu vwyxz

    1  A BCDEF GHIJK LMNOP QRSTU VWXYZ
    2  B CDEFG HIJKL MNOPQ RSTUV WXYZA
    3  C DEFGH IJKLM NOPQR STUVW XYZAB
    4  D EFGHI JKLMN OPQRS TUVWX YZABC
    5  E FGHIJ KLMNO PQRST UVWXY ZABCD
    6  F GHIJK LMNOP QRSTU VWXYZ ABCDE
    7  G HIJKL MNOPQ RSTUV WXYZA BCDEF
    8  H IJKLM NOPQR STUVW XYZAB CDEFG
    9  I JKLMN OPQRS TUVWX YZABC DEFGH
   10  J KLMNO PQRST UVWXY ZABCD EFGHI
   11  K LMNOP QRSTU VWXYZ ABCDE FGHIJ
   12  L MNOPQ RSTUV WXYZA BCDEF GHIJK
   13  M NOPQR STUVW XYZAB CDEFG HIJKL
   14  N OPQRS TUVWX YZABC DEFGH IJKLM
   15  O PQRST UVWXY ZABCD EFGHI JKLMN
   16  P QRSTU VWXYZ ABCDE FGHIJ KLMNO
   17  Q RSTUV WXYZA BCDEF GHIJK LMNOP
   18  R STUVW XYZAB CDEFG HIJKL MNOPQ
   19  S TUVWX YZABC DEFGH IJKLM NOPQR
   20  T UVWXY ZABCD EFGHI JKLMN OPQRS
   21  U VWXYZ ABCDE FGHIJ KLMNO PQRST
   22  V WXYZA BCDEF GHIJK LMNOP QRSTU
   23  W XYZAB CDEFG HIJKL MNOPQ RSTUV
   24  X YZABC DEFGH IJKLM NOPQR STUVW
   25  Y ZABCD EFGHI JKLMN OPQRS TUVWX
   26  Z ABCDE FGHIJ KLMNO PQRST UVWXY

       a bcdef ghijk lmnop qrstu vwyxz

This defines 26 different "shift ciphers", each of which is weak and predictable in itself, but which in combination result in a much more secure cipher. The idea in the Vigenere cipher is to use a cipher key to select different cipher alphabets in succession as letters are enciphered. Suppose Alice wants to encipher the phrase:

   use the force luke

-- with a Vigenere cipher, using the cipher keyword "WARTHOG". All she has to do is scan down the square defined above and match the cipher alphabet letter to a particular row, then select the cipher character matching the plaintext letter for that row:

   W / row 23 gives  u -> Q
   A / row 01 gives  s -> S
   R / row 18 gives  e -> V
   T / row 20 gives  t -> M
   H / row 08 gives  h -> O
   O / row 15 gives  e -> S
   G / row 07 gives  f -> L
   W / row 23 gives  o -> K
   A / row 01 gives  r -> R
   R / row 18 gives  c -> T
   T / row 20 gives  e -> X
   H / row 08 gives  l -> S
   O / row 15 gives  u -> I
   G / row 07 gives  k -> Q
   W / row 23 gives  e -> A

This gives:

   QSV MOS LKRTX SIQA

Simple frequency analysis cannot crack a Vigenere cipher, and the number of possible keys is so great that finding the right key by trial-and-error is effectively impossible. The longer and more unpredictable the keyphrase, the harder the Vigenere cipher is to crack. Despite the simplicity and elegance of the Vigenere cipher, it was mostly ignored for the next several centuries, since it was regarded as too cumbersome for general use.

The Viginere cipher could be made more manageable through the use of a "cipher disk", one of the first cipher machines, invented in the 15th century. The cipher disk consists of two nested disks, including an outer disk labeled with all the letters of the alphabet, and an inner disk also labeled with all the letters of the alphabet -- but not necessarily in the same order. The outer disk defines the plaintext alphabet, while the inner disk defines a monoalphabetic substitution cipher alphabet.

Suppose Alice takes her cipher disk and rotates the inner disk so that the letter "A" lines on the inner disk lines up with, say, the letter "q" on the outer disk. She can then encipher a message by taking each letter, looking up the plaintext letter on the outer disk, and then writing down the matching ciphertext letter next to it on the inner disk. The cipher disk can be also made in the form of a "slide", with two alphabets written on strips of cardboard, or whatever, that can be slid next to each other. Of course, each slide should have two repeating alphabets to permit them to be read conveniently when offset.

The cipher as described remains a feeble monoalphabetic substitution cipher, which is not only easy to crack, but also so easy to use that the cipher disk hardly seems very handy. The cipher disk, however, does simplify deciphering a Vigenere cipher.

For example, suppose Alice wants to encrypt a message using a Vigenere cipher with the cipher keyword "WARTHOG". To encipher the first letter, she moves the "W" on the inner disk to match up with the "a" on the outer disk, and then finds the ciphertext letter on the inner disk matching the desired plaintext letter on the outer disk. Next, Alice enciphers the second letter on the inner wheel by moving the "A" on the inner disc to match the "a" on the outer disk, then looking up the ciphertext letter on the inner disk that matches the desired plaintext letter on the outer disk. She repeats this process for the rest of the keyword, and then starts all over again at the beginning of the keyword.

Although the Vigenere cipher had been regarded as indecipherable for a long time, cryptanalysts were finally able to get a handle on it. For example, if multiple messages encrypted using a Vigenere cipher and the same cipher key were intercepted, frequency analysis could be used across the first letter of all the messages in parallel, the second letter, and so on.

Given knowledge of the length of a keyword, or at least the ability to assume the possible length of a keyword, cryptanalysts could then perform frequency analysis of the letters of a message on intervals of that length. For example, if the keyword were 8 characters long, characters 1, 9, 17, 25 ... and so on could be analyzed as a set; characters 2, 10, 18, 26 ... could be analyzed as a set; and so on, for a total of eight sets.

The Vigenere cipher, by modern standards, easy to crack with the right tools by people who know what they're doing. It is, however, also easy to implement in software, and is secure enough against casual intrusion. It becomes more secure if only small files are encrypted, each with a different keyword; and with long keywords that are hard to second-guess, but not too hard to remember. The Vigenere cipher becomes very secure, if still not impossible to crack, if a second level of encryption is added.

* From the 19th century, a number of new cipher techniques were developed based on the "Polybius square" or "checkerboard". This was a way of converting letters into pairs of numbers, devised in classical Greek times by a thinker named Polybius.

Updated into English, a Polybius square consists of the letters of the alphabet arranged as a 5-by-5 checkerboard grid with rows and columns numbered, and "I" and "J" assigned to the same grid location:

        1   2   3   4   5

   1    A   B   C   D   E
   2    F   G   H  I/J  K
   3    L   M   N   O   P
   4    Q   R   S   T   U
   5    V   W   X   Y   Z

The row-and-column index numbers are then substituted for letters. For example, "help" becomes "23 15 31 35". For added security, the arrangement of the letters in the grid can be scrambled, preferably by using a cipher keyword to determine the scrambling. For example, if we use the key "RICHARD MILHOUS NIXON", we get a checkerboard that looks like this:

        1   2   3   4   5

   1    R  I/J  C   H   A
   2    D   M   L   O   U
   3    S   N   X   B   E
   4    F   G   K   P   Q
   5    T   V   W   Y   Z

No matter how the grid is arranged, however, as stated it's just a monoalphabetic substitution cipher, with pairs of ciphertext digits in place of plaintext letters. It offers little security, though it can be used as a basis for "semaphore" codes, in which a signaler can hold two flags in five different positions each to send the full alphabet.

Such a checkerboard has much more potential. Each plaintext letter is represented by two ciphertext digits, and that gives a new option for encrypting the message. Suppose Alice has a checkerboard based on the key "RICHARD MILHOUS NIXON" as above, and the plaintext:

   down with big brother

She can then obtain row-and-column indexes for the letters in this plaintext from the grid as follows:

   d:   row = 2 / column = 1
   o:   row = 2 / column = 4
   w:   row = 5 / column = 3
   n:   row = 3 / column = 2
   w:   row = 5 / column = 3
   i:   row = 1 / column = 2
   t:   row = 5 / column = 1
   h:   row = 1 / column = 4
   b:   row = 3 / column = 4
   i:   row = 1 / column = 2
   g:   row = 4 / column = 2
   b:   row = 3 / column = 4
   r:   row = 1 / column = 1
   o:   row = 2 / column = 4
   t:   row = 5 / column = 1
   h:   row = 1 / column = 4
   e:   row = 3 / column = 5
   r:   row = 1 / column = 1

Now she divides the plaintext into blocks of, say, five letters, and writes the indexes vertically underneath the letters:

   downw ithbi gbrot her
   22535 15131 43125 131
   14323 21442 24141 451

To produce the final encryption, she concatenates the two rows on a block-by-block basis:

   2124533253 1251143412 4234112451 143511

The message is divided into blocks to make encryption and decryption more manageable; there's no particular reason to use a block size of five, any other reasonable value will work as well. This particular scheme breaks down each letter into two cipher letters, a concept known as "fractionation"; and then it breaks the message down into blocks and concatenates the two halves, a concept known as "seriation". This is a form of transposition; obviously, this greatly complicates frequency analysis.

This scheme is known as a "bifid" cipher, and was introduced by a Frenchman named Felix Marie Delastelle (1840:1902) in a book published in 1901. A number of elaborations on bifid ciphers were devised; and there were also "trifid" ciphers, in which letters were mapped to three numbers, not two. Some of these fractionating ciphers were hard to crack, even by modern standards.

BACK_TO_TOP

[1.5] NAVAL CODES / ONE-TIME KEY

* The 20th century introduced wireless communications, which had a big influence on cryptology. The First World War introduced wireless telegraphy into military operations, greatly extending the communications revolution begun by the telegraph. However, radio communications placed new demands on security, since it was so very easy to intercept messages sent over the airwaves.

To protect the communications of warships at sea, navies adopted code systems -- using number codes, usually with superencipherment. A representative naval code with superencipherment might be based on:

It was referred to as an "additive book" because these numbers were added to codegroups without using a numeric carry from digit to digit. For example:

   random number:              28506
   codegroup:                  58905
                               -----
   add without carries:        76401   -- NOT 87411!

Stripping out the superencipherment was then just a simple no-borrow subtraction:

   superenciphered code:       76401
   random number:              28506
                               -----
   subtract without borrows:   58905   -- codegroup

The superencipherment key was given as a page number and the row on the page, with successive pages used to encipher long messages. Naval codebooks were often bound with metal plates in the covers, so they could be thrown overboard in an emergency and sink. Since World War I on land quickly bogged down on land into immobile trench warfare, armies also adopted codes for frontline operations. There was relatively little problem in distributing the codebooks, the armies not being on the move, and "trench codes" came into common use.

* More sophisticated cipher systems were also developed. One of the most significant achievements in cryptography in the First World War was a cipher that was, and remains, uncrackable even in principle. As is typical of black magic, it had a major catch.

A Vigenere cipher based on a keyword becomes more secure with a longer keyword. It also becomes more secure if the keyword is made more unpredictable, to prevent a codebreaker from using guesswork to determine the keyword. That means that the most secure possible Vigenere cipher is one where the keyword is as long as the message, with the keyword made up of completely unpredictable random characters.

To implement such a cipher, sets of random keys could be printed on a pad of paper, with each page on the pad used once and then thrown away. As a result, this scheme is called a "one-time pad" or "one-time key" cipher. The interesting thing about the one-time key is that it is logically impossible to crack by analytic means. Imagine being given a page of completely random letters: it's impossible to "crack" because there's no message there, it's just noise. Taking a message and changing all the characters at random also results in a page of noise, and it's just as uncrackable. The only way the message can be extracted is by telling the recipient of the message with a one-time pad what random changes were made to the letters in the text.

A one-time key cipher has an interesting property: it is not so much true that no signal can be extracted from pure noise, but that any signal can be extracted from noise. Since there's no fixed pattern in the ciphertext or the key, a key can be easily synthesized to produce every possible message that will fit into the number of letters, such as instructions to attack the enemy, a shopping list, or dirty limericks.

The catch to the one-time key cipher is that a specific random key can only be used once to encipher one message. If two messages are enciphered using the same key, then it is no longer impossible to crack the cipher, since a (painful) exercise in multiple anagramming could be used to decipher the texts, trying every possible key on both messages until both messages are revealed. This makes the one-time key cipher logistically clumsy to deal with, worse than a code, requiring distribution of unique pads to every user -- and so it was only used in very high-security communications.

BACK_TO_TOP

[1.6] BAZERIES CYLINDER / ENIGMA / TELECIPHER MACHINES

* With the development of improved ciphers, improved cipher machines were developed as well. Late in the 19th century, a French military officer named Etienne Bazeries invented the "Bazeries cylinder". (The basic idea was actually invented by American President Thomas Jefferson a century earlier, but then forgotten.) A Bazeries cylinder consists of a set of roughly 20 to 30 numbered disks, with a different cipher alphabet on the edge of each disk, and a hole in the center of the disks to allow them to be stacked on an axle. The disks are removeable and can be mounted on the axle in any order desired. The order of the disks can be considered the cipher key for the Bazeries cylinder, with both Alice and Bob arranging the disks in the same predefined order.

To encrypt a message, Alice rotates the disks to produce the plaintext message along one "row" of the stack of disks, and then selects another row as the ciphertext. To decrypt the message, Bob rotates the disks on his cylinder to produce the ciphertext along a row. It is handy if both Alice and Bob know the offset of the row, but not really necessary since Bob can simply look around the cylinder to find a row that makes sense. It was also possible to implement the same scheme with strips containing cipher alphabets inserted on a frame.

Both schemes were used in World War I and well into World War II; they provided strong encryption, but they were hardly indecipherable, being vulnerable to cribs. Suppose the plaintext of a message corresponding to a message enciphered by a Bazeries cylinder was known, and the cipher alphabets for each of the cipher disks on the cylinder were also known. There would be matchings between a cipher character and the known plaintext character on each of the disks, with a table then assembled of all the possible matchings for all the disks. The table could be then sorted until a consistent set emerged, giving the proper order of cipher disks needed to establish the crib.

* After World War I, a number of much more sophisticated cipher machines were built, the most famous being the "Enigma", patented in 1918 by a German engineer named Arthur Scherbius.

As it emerged, the Enigma was a wooden box with a keyboard and a bank of lettered lights corresponding to the keys. To encrypt a message, a plaintext character was typed in, and after scrambling the appropriate light was turned on to give the ciphertext character. The ciphertext was then sent using Morse code over radiotelegraph. The operation of the machine was "reciprocal", in that if the "A" key was pressed and lit up the "Z" lamp, then pressing the "Z" key would light up the "A" lamp. That meant that at the receiving end, as long as the operator had an Enigma machine set up in the same way, he could just type in the ciphertext character to get the plaintext character: the procedure for encryption and decryption was the same.

The scramblings between the input and output were performed by what was called a "rotor" system. A rotor was a wheel with 26 electrical contacts on each side and scrambled wiring linking the contacts on the two sides. The scrambled wiring represented a mixed substitution cipher alphabet. There were three rotors in series fed into one another, with this assembly referred to as a "basket".

Every time a key was pressed, the first rotor would turn one position. Every time the first rotor went full circle (after 26 keypresses), the second rotor would turn one position. Every time the second rotor went full circle (after 26 * 26 = 676 keypresses), the third rotor would turn one position. When the third rotor went full circle (after 26 * 26 * 26 = 17,576 keypresses), the sequence would start all over again.

The third rotor was wired into a disk called the "reflector" that simply rerouted the connections on the output of the third disk back into themselves. In operation, the electrical signal from the input key ran up through the basket of rotors to the reflector, and then back down through the basket to the output lamp. This is how the Enigma achieved reciprocal operation, allowing both encryption and decryption using the same configuration.

Each of the rotors had a different wiring scheme. The three rotors were removeable and could be rearranged in six different ways. It was also possible in principle to have a stockpile of more than three rotors and select three from the set for use, further multiplying the possibilities. In addition, the Enigma had a plugboard that allowed patch cords to be swapped around to mix six different sets of letters. Setup of the Enigma involved selecting the order of the three rotors, selecting their starting positions, and selecting the patch cord arrangement. This setup amounted to the "key" or "daykey" for the Enigma cipher, and there were a vast number of possible daykeys.

The German military standardized on the Enigma for encryption before World War II. Despite the complexity of the Enigma cipher, Polish cryptanalysts were able to figure out patterns in the operation of the Enigma that allowed them to determine the daykey, using a set of Enigma simulators known for some obscure reason as a "bombe".

However, in 1938 the Germans improved their Enigma security by providing five rotors, increasing the number of possible configurations of three rotors selected from the set, and increased the number of pairs of letters swapped by the plugboard to 10. That greatly multiplied the possible numbers of daykey configurations, demanding a much more complicated and expensive bombe to figure out the settings. By this time, Nazi dictator Adolf Hitler was making threatening noises against Poland, and so the Poles got in touch with British intelligence to tell them how Enigma had been cracked.

The Nazis invaded Poland in September 1939, setting off World War II. They quickly overran the country. The British were hard-pressed by defeats early in the war, and to help fight back set up a sophisticated codebreaking operation at an estate north of London named Bletchley Park. Although the Germans had tightened up their procedures for Enigma operation, eliminating the loopholes that the Poles had exploited to ferret out daykeys, Bletchley Park cryptanalysts -- most significantly a brilliant Oxford mathematician named Alan Turing -- were able to develop a new, more sophisticated bombe system to once again crack Enigma. The Germans never realized Enigma had been cracked.

* The Enigma machine was not the most sophisticated cipher machine used during World War 2; the conflict also saw considerable use of "telecipher" systems for high-level secure communications.

A telecipher system was an encrypted teletypewriter. Teletypewriters used a sequence of on-off electrical signals to transmit and receive letters, using a scheme known as the "Baudot code". If the "on" signal is represented as a "1" and the "off" signal represented as a "0", then the Baudot code had the form of a set of "binary" numbers as follows:

   00011      A      -
   11001      B      ?
   01110      C      :
   ...

This scheme uses five "binary digits" or "bits" per letter; there were "shift" characters to permit use of a secondary set of numbers and punctuation characters. Incidentally, the Baudot code was one of the first binary codes used to define text characters; it has been followed by other code schemes. The best-known modern standard is known as "ASCII (American Standard Code For Information Interchange)". ASCII is an effective subset of a 16-bit scheme known as "Unicode", which more or less covers the world's character sets.

In any case, a telecipher machine scrambled the "stream" of Baudot plaindata using what was known in the old days as "modulo-two arithmetic" and is now known as an "exclusive OR" or "XOR" operation. The basic rules of the XOR operation are as follows:

   0 XOR 0  =  0
   0 XOR 1  =  1
   1 XOR 0  =  1
   1 XOR 1  =  0

In simpler terms, if two bits are XORed that have the same value, the result is a "0". If two bits are XORed that have a different value, the result is a "1". XOR can be performed with multibit values on a bit-by-bit basis -- for example, two five-bit values X and Y can be XORed to give a 5-bit value Z:

   X:  10011
   Y:  00110
       _____

   Z:  10101

The XOR operation has the interesting property of being "reversible", in that if two binary values X and Y are XORed together, then the result Z can be XORed with Y to give X again:

   Z:  10101
   Y:  00110
       _____

   X:  10011

A telecipher machine used an arrangement of electromagnetic relays to XOR the plaindata with a stream of seemingly random bits as a "mask":

   data:  10100 00001 00110 10010 00100
   mask:  01001 00101 11011 10110 00111
          _____________________________

 cipher:  11101 00100 11101 00100 00011

The enciphered binary stream could be deciphered by simply XORing it again with the same mask. The mask was generated by an arrangement of rotating "wheels", with sets of pins on them that could be set in or out. The wheels could generate very long streams of bits in an unpredictable fashion, and telecipher machine messages could be very hard to crack. In fact, the British developed one of the first electronic computers, named "Colossus" and built with vacuum tubes, to crack German telecipher messages.

BACK_TO_TOP
< PREV | NEXT > | INDEX | SITEMAP | GOOGLE | UPDATES | BLOG | CONTACT | $Donate? | HOME