[2.0] Refining The Art

v6.1.0 / chapter 2 of 10 / 01 jul 25 / greg goebel

* The invention of frequency analysis made simple monoalphabetic substitution ciphers much too easy to crack, and led cryptographers to design new and more formidable codes and ciphers over the next centuries; the contest between codemaker and codebreaker escalated to a higher level. In the meantime, the general public began to recognize the use of codes and ciphers, and simple cryptosystems came into popular use.

[2.1] CRYPTANALYSIS ARISES IN THE WEST
[2.2] POLYALPHABETIC SUBSTITUTION CIPHERS / THE VIGENERE CIPHER
[2.3] OTHER CIPHER REFINEMENTS
[2.4] CIPHERS GO PUBLIC

[2.1] CRYPTANALYSIS ARISES IN THE WEST

* The Arab world was well ahead of the West in cryptanalysis, but in European monasteries, monks engaged in analysis of Biblical texts kept interest in cryptology alive in the West. Their interest was provoked partly by the fact that the original Hebrew sources of the Old Testament actually include enciphered words, though more as a literary gimmick than to keep secrets. These enciphered words use a simple monoalphabetic substitution cipher known as "atbash", which involves a straightforward reversal of letters in the alphabet. For example, in English, atbash would involve exchanging "A" and "Z", then "B" and "Y", then "C" and "X", and so on. For a specific example, the Book of Jeremiah refers to the kingdom of "Sheshach", which in Hebrew script is the atbash ciphertext for "Babel".
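The English form of atbash described above can be sketched in a few lines of Python, assuming only the 26-letter Latin alphabet. The Hebrew original works over the 22-letter Hebrew alphabet and is not shown. The function name `atbash` is just an illustrative choice.

```python
import string

LOWER = string.ascii_lowercase
UPPER = string.ascii_uppercase
# Pair each letter with its mirror image: a<->z, b<->y, c<->x, and so on.
ATBASH = str.maketrans(LOWER + UPPER, LOWER[::-1] + UPPER[::-1])


def atbash(text: str) -> str:
    """Encipher or decipher; atbash is its own inverse. Non-letters pass through."""
    return text.translate(ATBASH)
```

For example, `atbash("Hello")` gives `"Svool"`, and applying `atbash` a second time restores `"Hello"`. Because the same operation both enciphers and deciphers, there is no key at all, which is one reason atbash offers no real security.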

By the 13th century, the English polymath and monk Roger Bacon had written the first known European book to discuss cryptography, EPISTLE ON THE SECRET WORKS OF ART AND THE NULLITY OF MAGIC, and by the next century ciphers were in common use by alchemists and scientists as a means of concealing their findings. These ciphers often used custom-designed symbol sets instead of letters, which was not particularly inconvenient in the days when documents were generally written by hand. Such symbol sets didn't provide much added security, since frequency analysis could crack ciphers based on them just as easily as ciphers based on ordinary letters. They still show up in modern puzzles.

By the 15th and 16th centuries, ciphers had become extremely important for diplomatic communications, and the art of frequency analysis had been reinvented in Europe. The first famous European codebreaker was Giovanni Soro, who was appointed as the Venetian cipher secretary in 1506. He earned a great reputation for cracking ciphers for Venice, the Vatican, and other Italian city-states.

The French codebreaking tradition began with Philibert Babou, who was followed by Francois Viete. Viete was so successful in cracking Spanish ciphers that the Spaniards asked the Vatican to put Viete on trial for being in league with the devil. The request led to widespread mockery of the backwardness of the Spaniards.

* Cracking ciphers by frequency analysis was still unusual in Europe. Many states continued to use simple monoalphabetic substitution ciphers, but there were cryptographers who realized the weakness of such ciphers and looked for something better.

They supplemented their ciphers with codewords, typically a limited number of them, which replaced a selected set of important words rather than every possible word. This reduced the effort of creating and distributing codebooks. This scheme is known as a "nomenclator", and at least in its original form it wasn't very secure, since once the rest of a message was decrypted, a clever cryptanalyst could often infer the codewords from context. Nomenclators remained very popular until well into the 19th century, evolving to include so many entries that they became very much like codes. Early nomenclators had a one-part organization, with codewords assigned in the same order as the words they stood for, but in time they acquired a two-part organization, with codewords assigned in scrambled order and separate lists for enciphering and deciphering, for greater security.
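A nomenclator of the kind just described can be sketched as a small codeword list layered over a letter-by-letter substitution cipher. Everything specific here is hypothetical: the codewords, the words they stand for, and the scrambled cipher alphabet were made up for illustration and are not taken from any historical nomenclator.

```python
import string

# Hypothetical codewords for a few sensitive words (illustrative only).
CODEWORDS = {"king": "77", "army": "54", "london": "31", "ship": "90"}
REVERSE = {code: word for word, code in CODEWORDS.items()}

# Hypothetical scrambled alphabet for every other word.
PLAIN = string.ascii_lowercase
CIPHER = "QWERTYUIOPASDFGHJKLZXCVBNM"
ENC = str.maketrans(PLAIN, CIPHER)
DEC = str.maketrans(CIPHER, PLAIN)


def encipher(message: str) -> str:
    out = []
    for word in message.lower().split():
        if word in CODEWORDS:
            out.append(CODEWORDS[word])      # sensitive word -> whole codeword
        else:
            out.append(word.translate(ENC))  # ordinary word -> letter by letter
    return " ".join(out)


def decipher(ciphertext: str) -> str:
    # Numeric tokens are looked up as codewords; everything else is unscrambled.
    return " ".join(REVERSE.get(tok, tok.translate(DEC)) for tok in ciphertext.split())
```

The weakness the text describes is easy to see here: once the letter-by-letter parts of `encipher("the army will sail for london")` are broken by frequency analysis, a message reading "the 54 will sail for 31" leaves little doubt about what 54 and 31 mean.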

* Codemakers also used misspelt words and "nulls" in messages. Nulls involve seeding a ciphertext with unused symbols. Suppose that a cipher uses the numbers 00 through 99 to represent text. Even if numbers and punctuation are enciphered, that leaves an unused subset of numbers, and these unused numbers or nulls could be littered through the ciphertext. They were simply ignored when the text was deciphered.
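The use of nulls in a two-digit numeric cipher can be sketched as follows. The letter-to-number assignment (a = 10 through z = 35) is a deliberately trivial, hypothetical choice so that the null mechanism stands out; this toy version also drops spaces and punctuation rather than enciphering them. Every one of the 74 remaining numbers from 00 to 99 is available as a null.

```python
import random
import string

# Hypothetical cipher: letters a..z become the numbers 10..35.
LETTER_TO_NUM = {ch: f"{i + 10:02d}" for i, ch in enumerate(string.ascii_lowercase)}
NUM_TO_LETTER = {num: ch for ch, num in LETTER_TO_NUM.items()}
# Every other two-digit number is unused, so it can serve as a null.
NULLS = [f"{n:02d}" for n in range(100) if f"{n:02d}" not in NUM_TO_LETTER]


def encipher(plaintext, null_rate=0.3, rng=random):
    out = []
    for ch in plaintext.lower():
        if ch not in LETTER_TO_NUM:
            continue  # this toy version drops spaces and punctuation
        out.append(LETTER_TO_NUM[ch])
        if rng.random() < null_rate:
            out.append(rng.choice(NULLS))  # seed a meaningless symbol
    return " ".join(out)


def decipher(ciphertext):
    # Nulls are simply ignored: any number with no letter assigned is skipped.
    return "".join(NUM_TO_LETTER[n] for n in ciphertext.split() if n in NUM_TO_LETTER)
```

The legitimate recipient loses nothing, since `decipher` just skips numbers that stand for no letter. A codebreaker, on the other hand, first has to work out which numbers are padding before the letter frequencies become meaningful.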

Misspelt words or nulls would slow down but not stop a cryptanalyst, and governments became increasingly aware that their ciphers were no longer secure. This lesson was brought home in 1587, when Mary, Queen of Scots, was executed on the orders of Queen Elizabeth I of England. Mary was condemned on the basis of evidence obtained from enciphered messages cracked by Thomas Phelippes, in the employ of Elizabeth's Principal Secretary, Sir Francis Walsingham. Phelippes was able to crack a cipher used by Mary and conspirators who wanted to place her on the English throne, even though the cipher contained nulls and codewords. Mary was beheaded, and the conspirators put to death by torture.

BACK_TO_TOP

[2.2] POLYALPHABETIC SUBSTITUTION CIPHERS / THE VIGENERE CIPHER

* More secure ciphers were in fact available by this time. The basic idea was devised sometime in the 1460s by a polymath from Florence, Italy, named Leon Battista Alberti. Alberti is a pivotal figure in the history of cryptology. He was the first Westerner to publish an explanation of the use of frequency analysis for breaking ciphers, and he was the first known person to suggest the concept of superencipherment, or enciphering a code. Most significantly, he was the first person to hit on the idea of enciphering a message using multiple cipher alphabets. For example, two substitution cipher alphabets could be defined and used on alternating letters of a plaintext, with all the even-numbered letters enciphered by one cipher alphabet and all the odd-numbered letters enciphered by the other.
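The two-alphabet example just given can be sketched in Python. This follows the simplified alternating scheme in the text, not Alberti's own cipher disk, and the two scrambled alphabets are hypothetical; any two different permutations of the alphabet would serve.

```python
import string

PLAIN = string.ascii_lowercase
# Two hypothetical cipher alphabets (the second is just the first reversed).
ALPHABET_ODD = "QWERTYUIOPASDFGHJKLZXCVBNM"   # used for letters 1, 3, 5, ...
ALPHABET_EVEN = "MNBVCXZLKJHGFDSAPOIUYTREWQ"  # used for letters 2, 4, 6, ...


def encipher(plaintext: str) -> str:
    letters = [ch for ch in plaintext.lower() if ch in PLAIN]
    out = []
    for position, ch in enumerate(letters, start=1):
        alphabet = ALPHABET_ODD if position % 2 else ALPHABET_EVEN
        out.append(alphabet[PLAIN.index(ch)])
    return "".join(out)


def decipher(ciphertext: str) -> str:
    out = []
    for position, ch in enumerate(ciphertext, start=1):
        alphabet = ALPHABET_ODD if position % 2 else ALPHABET_EVEN
        out.append(PLAIN[alphabet.index(ch)])
    return "".join(out)
```

The payoff shows up with repeated letters: `encipher("see")` gives `"LCT"`, so the doubled "ee" comes out as two different ciphertext letters, and a single "e" can be disguised as either "C" or "T" depending on its position. That is what blurs the letter frequencies a codebreaker relies on.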

The idea of such a "polyalphabetic substitution cipher" was refined for over a century, until a French diplomat named Blaise de Vigenere turned it into the simple and elegant cipher scheme that bears his name. A Vigenere cipher uses 26 substitution ciphers, organized using a "Vigenere square" as shown below, with some spacing added here to make it more legible:

:     a bcdef ghijk lmnop qrstu vwxyz
:
:  1  A BCDEF GHIJK LMNOP QRSTU VWXYZ
:  2  B CDEFG HIJKL MNOPQ RSTUV WXYZA
:  3  C DEFGH IJKLM NOPQR STUVW XYZAB
:  4  D EFGHI JKLMN OPQRS TUVWX YZABC
:  5  E FGHIJ KLMNO PQRST UVWXY ZABCD
:  6  F GHIJK LMNOP QRSTU VWXYZ ABCDE
:  7  G HIJKL MNOPQ RSTUV WXYZA BCDEF
:  8  H IJKLM NOPQR STUVW XYZAB CDEFG
:  9  I JKLMN OPQRS TUVWX YZABC DEFGH
: 10  J KLMNO PQRST UVWXY ZABCD EFGHI
: 11  K LMNOP QRSTU VWXYZ ABCDE FGHIJ
: 12  L MNOPQ RSTUV WXYZA BCDEF GHIJK
: 13  M NOPQR STUVW XYZAB CDEFG HIJKL
: 14  N OPQRS TUVWX YZABC DEFGH IJKLM
: 15  O PQRST UVWXY ZABCD EFGHI JKLMN
: 16  P QRSTU VWXYZ ABCDE FGHIJ KLMNO
: 17  Q RSTUV WXYZA BCDEF GHIJK LMNOP
: 18  R STUVW XYZAB CDEFG HIJKL MNOPQ
: 19  S TUVWX YZABC DEFGH IJKLM NOPQR
: 20  T UVWXY ZABCD EFGHI JKLMN OPQRS
: 21  U VWXYZ ABCDE FGHIJ KLMNO PQRST
: 22  V WXYZA BCDEF GHIJK LMNOP QRSTU
: 23  W XYZAB CDEFG HIJKL MNOPQ RSTUV
: 24  X YZABC DEFGH IJKLM NOPQR STUVW
: 25  Y ZABCD EFGHI JKLMN OPQRS TUVWX
: 26  Z ABCDE FGHIJ KLMNO PQRST UVWXY
:
:     a bcdef ghijk lmnop qrstu vwxyz

This defines 26 different Caesar shift ciphers -- each of which is weak in itself, but which in combination result in a much more secure cipher. The idea in the Vigenere cipher is to use a cipher key to select different cipher alphabets in succession as letters are enciphered. Suppose Alice wants to encipher the phrase:

: use the force luke

-- with a Vigenere cipher, using the cipher keyword "WARTHOG". To encipher the first letter, all she has to do is scan down the square defined above and look for the cipher alphabet starting with "W", which is number 23 in the list. Reading this cipher alphabet in the column under "u" in the header line, she finds that "u" is enciphered as "Q".

For the second letter, she looks for the row starting with "A", which is number 01 in the list. This cipher alphabet is the same as the plaintext alphabet, so "s" remains "S". Enciphering the entire message in this way, and starting over at the beginning of the keyword whenever she runs out of keyword letters:

: W / row 23 gives u -> Q
: A / row 01 gives s -> S
: R / row 18 gives e -> V
: T / row 20 gives t -> M
: H / row 08 gives h -> O
: O / row 15 gives e -> S
: G / row 07 gives f -> L
: W / row 23 gives o -> K
: A / row 01 gives r -> R
: R / row 18 gives c -> T
: T / row 20 gives e -> X
: H / row 08 gives l -> S
: O / row 15 gives u -> I
: G / row 07 gives k -> Q
: W / row 23 gives e -> A

-- gives:

: QSV MOS LKRTX SIQA
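Because row k of the square is just the plain alphabet shifted k-1 places, the lookup Alice does by hand amounts to adding letter positions modulo 26, counting "a" as 0. A minimal Python sketch under that reading follows; the function name `vigenere` and its `decrypt` flag are illustrative choices. Non-letters pass through unchanged and do not use up keyword letters, matching the worked example.

```python
import string

ALPHABET = string.ascii_lowercase


def vigenere(text: str, keyword: str, decrypt: bool = False) -> str:
    """Encipher (or decipher) letters with a repeating keyword."""
    shifts = [ALPHABET.index(k) for k in keyword.lower()]
    out = []
    i = 0  # position in the keyword; advances only on letters
    for ch in text.lower():
        if ch not in ALPHABET:
            out.append(ch)  # spaces and punctuation pass through
            continue
        shift = shifts[i % len(shifts)]  # wrap around when the keyword runs out
        if decrypt:
            shift = -shift
        letter = ALPHABET[(ALPHABET.index(ch) + shift) % 26]
        out.append(letter if decrypt else letter.upper())
        i += 1
    return "".join(out)
```

With the example above, `vigenere("use the force luke", "WARTHOG")` returns `"QSV MOS LKRTX SIQA"`, and `vigenere("QSV MOS LKRTX SIQA", "WARTHOG", decrypt=True)` returns the original phrase.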

Simple frequency analysis cannot crack a Vigenere cipher, and the number of possible keys is so great that finding the right key by trial-and-error is effectively impossible.
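To put a rough number on "so great", assume that any string of n letters may serve as the key. There are then 26 choices for each key letter, so the count of keys of length n, and of all keys up to seven letters long, works out as:

```latex
N(n) = 26^{n}, \qquad
N(7) = 26^{7} = 8\,031\,810\,176, \qquad
\sum_{n=1}^{7} 26^{n} = 8\,353\,082\,582
```

This is a count under that assumption only. A keyword chosen from a dictionary of real words, like WARTHOG, comes from a far smaller pool.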

Vigenere published a description of his cipher in 1586. Despite the simplicity and elegance of the Vigenere cipher, it was generally ignored for the next several centuries, mostly because it was regarded as too cumbersome for general use. Military forces in battle need to encipher and decipher messages quickly, and a Vigenere cipher was just too troublesome under the circumstances.

BACK_TO_TOP

[2.3] OTHER CIPHER REFINEMENTS

* Although polyalphabetic substitution ciphers were regarded as troublesome, monoalphabetic substitution ciphers were clearly too weak for any important use. Several compromises were developed.

One was the "homophonic substitution cipher", where "homophonic" means "same sound". The homophonic substitution cipher is a simple extension of the monoalphabetic substitution cipher. The basic concept is to use multiple substitution symbols for each letter, with the number of substitutions proportional to the frequency of the letter.

For example, since the letter "a" has an average frequency of about 8% in English text, eight symbols could be used to represent "a", with the different symbols used at random in the ciphertext. In principle, a homophonic substitution cipher could be designed so that none of the symbols in the ciphertext would have a frequency greater than 1%, defeating frequency analysis.

Of course, this means that individual letters can't be used as the substitution symbols. Two-digit, or if need be three-digit, numbers can be used instead. A sample homophonic substitution cipher for English is shown below:

: a: 12 29 25 43 71 80 89 95
: b: 05 92
: c: 19 37 36
: d: 23 41 61 66
: e: 16 30 47 59 72 83 90 60 69 88 99 00
: f: 17 49
: g: 02 31
: h: 04 45 55 63 76 82
: i: 15 34 56 97 77 86
: j: 03
: k: 11
: l: 24 38 48 64
: m: 65 46
: n: 26 42 53 70 73 98
: o: 10 44 50 94 78 85 91
: p: 06 39
: q: 52
: r: 21 35 54 20 74 87
: s: 01 40 57 68 79 81
: t: 13 28 51 67 75 84 33 27 22
: u: 08 62 58
: v: 07
: w: 18 32
: x: 96
: y: 09 93
: z: 14
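The sample table above can be exercised with a short Python sketch. It picks a homophone at random for each letter, which is what flattens the symbol frequencies; deciphering needs no randomness, because each number belongs to exactly one letter. The function names are illustrative, and spaces and punctuation are simply dropped, an assumption the text does not address.

```python
import random

# The sample homophone table from the text, one letter per line.
TABLE_TEXT = """\
a: 12 29 25 43 71 80 89 95
b: 05 92
c: 19 37 36
d: 23 41 61 66
e: 16 30 47 59 72 83 90 60 69 88 99 00
f: 17 49
g: 02 31
h: 04 45 55 63 76 82
i: 15 34 56 97 77 86
j: 03
k: 11
l: 24 38 48 64
m: 65 46
n: 26 42 53 70 73 98
o: 10 44 50 94 78 85 91
p: 06 39
q: 52
r: 21 35 54 20 74 87
s: 01 40 57 68 79 81
t: 13 28 51 67 75 84 33 27 22
u: 08 62 58
v: 07
w: 18 32
x: 96
y: 09 93
z: 14"""


def parse_table(text):
    """Map each plaintext letter to its list of two-digit homophones."""
    table = {}
    for line in text.splitlines():
        letter, codes = line.split(":")
        table[letter.strip()] = codes.split()
    return table


HOMOPHONES = parse_table(TABLE_TEXT)
# Reverse map for deciphering: every number from 00 to 99 belongs to one letter.
REVERSE = {code: letter for letter, codes in HOMOPHONES.items() for code in codes}


def encipher(plaintext, rng=random):
    """Replace each letter with a randomly chosen homophone; drop non-letters."""
    return " ".join(rng.choice(HOMOPHONES[ch]) for ch in plaintext.lower() if ch in HOMOPHONES)


def decipher(ciphertext):
    return "".join(REVERSE[code] for code in ciphertext.split())
```

Enciphering the same message twice will usually give two different ciphertexts, but `decipher` recovers the same letters either way, for example `"attackatdawn"` from `"attack at dawn"`.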

A homophonic substitution cipher is more secure than a simple monoalphabetic substitution cipher, but it is not absolutely secure. While it does tend to obscure frequency information, the patterns that letters form in text give a codebreaker fingerholds for attack. For example, in English the letter "q" is almost always followed by a "u". "Q" is rarely used in English text, and this is reflected in the table above, with only one substitution symbol for "q". In contrast, there are three substitution symbols for "u". This means the encrypted message will have distinctive symbol pairs, with one specific symbol followed by one of three other specific symbols and no others. Similar rules of letter association can help to further decrypt a homophonic substitution cipher.
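The "q" followed by "u" fingerhold can be sketched as a simple tally of which symbols follow which. The helper name `successor_sets` is hypothetical, and the sample ciphertext is "quick quiz" enciphered by hand with the table above (q = 52; u = 08, then 62; i = 15, then 34; c = 19; k = 11; z = 14).

```python
from collections import defaultdict


def successor_sets(ciphertext):
    """Map each ciphertext symbol to the set of symbols seen right after it."""
    symbols = ciphertext.split()
    followers = defaultdict(set)
    for current, nxt in zip(symbols, symbols[1:]):
        followers[current].add(nxt)
    return dict(followers)


# "quick quiz" enciphered by hand with the sample homophone table.
sample = "52 08 15 19 11 52 62 34 14"
```

Here `successor_sets(sample)["52"]` is `{"08", "62"}`. Over a long message, the set of symbols following 52 would never grow beyond the three homophones for "u", while the sets for common letters keep growing, so 52 and its three partners stand out.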

* Other improved ciphers included "digraphic" ciphers, in which substitutions were performed on pairs of letters rather than single letters, so that the substitution table had to cover 26 * 26 = 676 distinct pairs, and even "trigraphic" ciphers, with substitutions performed on triplets of letters, covering a total of 26 * 26 * 26 = 17,576 distinct triplets.
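A digraphic substitution can be sketched as a one-to-one mapping over all 676 letter pairs. The table here is a hypothetical random shuffle produced from a seed, not any historical digraphic table, and padding an odd-length message with "x" is an assumed convention.

```python
import itertools
import random
import string

LETTERS = string.ascii_lowercase
PAIRS = ["".join(p) for p in itertools.product(LETTERS, repeat=2)]  # "aa" .. "zz"


def make_table(seed):
    """Build a hypothetical digraphic table: a random permutation of the 676 pairs."""
    shuffled = PAIRS[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(PAIRS, shuffled))


def encipher(plaintext, table):
    letters = [ch for ch in plaintext.lower() if ch in LETTERS]
    if len(letters) % 2:
        letters.append("x")  # pad to an even length (assumed convention)
    pairs = ["".join(letters[i:i + 2]) for i in range(0, len(letters), 2)]
    return " ".join(table[p].upper() for p in pairs)


def decipher(ciphertext, table):
    reverse = {v: k for k, v in table.items()}
    return "".join(reverse[group.lower()] for group in ciphertext.split())
```

A frequency count of single ciphertext letters says little about such a cipher, since "th" and "te" are enciphered independently of each other. A codebreaker has to tally the 676 pairs instead, which needs far more ciphertext.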

Another refined cipher was developed by a 17th-century French team of father and son named Antoine and Bonaventure Rossignol, who worked as cipher experts for the French monarchs Louis XIII and Louis XIV. Their "Great Cipher", as it became known because it was so hard to crack, performed substitutions on syllables, and added various other tricks to confuse a would-be codebreaker, such as including a codegroup that meant: IGNORE THE PREVIOUS CODEGROUP.
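The Great Cipher's real table is not given here, so the numbers and syllables below are hypothetical, chosen only to show two of the mechanisms just described: numbers standing for whole syllables, and a codegroup that cancels the one before it, letting the sender slip decoys into a message.

```python
# Hypothetical toy table: numbers stand for whole syllables.
SYLLABLES = {
    "101": "les",
    "102": "en",
    "103": "ne",
    "104": "mi",
    "105": "s",
    "107": "ro",  # an ordinary syllable, used below as a decoy
}
DELETE_PREVIOUS = "999"  # hypothetical codegroup: IGNORE THE PREVIOUS CODEGROUP


def decode(groups):
    """Translate codegroups to syllables, honoring the delete-previous group."""
    out = []
    for group in groups.split():
        if group == DELETE_PREVIOUS:
            if out:
                out.pop()  # discard whatever the previous codegroup produced
        else:
            out.append(SYLLABLES[group])
    return "".join(out)
```

Decoding `"101 102 103 107 999 104 105"` gives `"lesennemis"`: the decoy 107 is cancelled by 999. A codebreaker who does not know what 999 means sees a spurious "ro" in the middle of the word.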

Although these ciphers were much more secure than a simple monoalphabetic substitution cipher, they were not impregnable. As discussed in the previous chapter, digraphs and even trigraphs have distinctive frequency distributions in specific languages, as do syllables. By this time, most nations had acquired cryptologic bureaus, known as "Black Chambers", staffed with professional, full-time codebreakers who could spend all their time deciphering other people's mail. The best of the Black Chambers could crack almost any code or cipher in time.

The Rossignols' Great Cipher was very tough, but was ultimately cracked. Many of the official records of the reign of Louis XIV were written in the Great Cipher. Eventually that cipher scheme fell into disuse and was forgotten, and nobody could read any of the records any longer. In 1890, a French army cryptographic expert named Commandant Etienne Bazeries (1846-1931) decided to try to crack the Great Cipher. He succeeded after three years of agonizing work, in one of the great epics of cryptanalysis.

BACK_TO_TOP

[2.4] CIPHERS GO PUBLIC

* By the middle of the 19th century, ciphers had become popular with the public in Britain and America. Partly this was due to the straitlaced nature of the societies of the times, particularly Victorian Britain, where young lovers had to conduct their romances in secret. Publishing ciphertext messages in the personals columns of newspapers provided a useful means of communication, and though for the most part the couples used simple monoalphabetic substitution ciphers, such means of encryption were at least reasonably secure against their parents.

Ciphers also showed up in popular fiction. In 1843, the American suspense writer Edgar Allan Poe wrote a short story called THE GOLD BUG, in which the fictional hero uses frequency analysis to crack a ciphertext that leads him to the buried treasure of the pirate Captain Kidd. Poe was actually fairly knowledgeable about codes and ciphers, and had something of a reputation as a "genius" for his ability to crack monoalphabetic substitution ciphers from ciphertexts submitted to him by readers.

Sherlock Holmes, the fictional detective genius created by Sir Arthur Conan Doyle, had among his many talents a mastery of cryptanalysis, demonstrated in stories like THE ADVENTURE OF THE DANCING MEN, where he cracks a cipher based on stick figures. French author Jules Verne also used cryptography in his novel JOURNEY TO THE CENTER OF THE EARTH, in which an ancient parchment is deciphered and tells the heroes the path to the Earth's interior.

* In most cases, these exercises in popular cryptography are of no strong technical interest, but there is one major exception. In 1885, a pamphlet titled THE BEALE CIPHERS was printed in Lynchburg, Virginia. It told a story related to the anonymous author by a Robert Morriss, who owned the Washington Hotel in Lynchburg.

In 1820 and 1822, according to the pamphlet, a fellow named Thomas Beale checked into the Washington Hotel and made friends with Morriss. When Beale left for the second time, he left a locked box with Morriss, along with written instructions that the box contained "papers of value and importance" and that it was not to be opened for ten years. The instructions also explained to Morriss that the papers would be "unintelligible without the aid of a key", and the "key" would be given to him by some unspecified third party when the ten years had passed.

1832 came and went. Morriss heard nothing from Beale, nor from the mysterious third party. Beale had judged Morriss trustworthy, and Morriss proved it by not opening the box until 1845. The box contained three ciphertext documents, consisting of lists of numbers, plus a plaintext note written by Beale.

The note explained that Beale and a number of colleagues had gone out West to Santa Fe, New Mexico. They set out north, and by accident stumbled onto a rich lode of gold. They mined a large quantity of gold, as well as some silver. Beale was entrusted to take the treasure back East and hide it for safekeeping. Beale made two such trips, burying the treasure in the Lynchburg area, and then left the enciphered documents with Morriss so that somebody would be able to arrange the distribution of the treasure to relatives if any disaster occurred to the adventurers.

Apparently such a disaster occurred, since the group vanished without a trace. Morriss puzzled over the ciphertexts for the better part of two decades, but in 1862 he knew he wouldn't be around much longer and told the author of the pamphlet the story. In the end, the author regretted being told of the treasure and the cipher. He spent over two decades trying to track down the Beale treasure, and reduced himself and his family to poverty in the attempt. To eliminate further temptation, he decided to write the pamphlet and make the matter public. To prevent people from harassing him, he did not name himself, and printed the pamphlet through an agent, a prominent local citizen named James B. Ward.

* While the author's efforts to find the Beale treasure failed, he did at least decipher one of the three ciphertexts. He realized that it was a "book cipher", in which the numbers in the ciphertext indicate the position or "index" of words in some external document, or "keytext", such as one of the books of the Bible. The first letter of the word is the actual plaintext letter associated with the cipher index value.

For example, suppose the keytext is some book titled A DARK AND STORMY NIGHT, and the first text in the book goes as follows:

It was a dark and stormy night. Every now and then, the ungodly quiet was broken by a crash of lightning that split the darkness outside. Inside the house, midnight approached.

This text is indexed as follows:

: index_value  keytext_word  first_letter
: -----------  ------------  ------------
:      1       It            i
:      2       was           w
:      3       a             a
:      4       dark          d
:      5       and           a
:      6       stormy        s
:      7       night.        n
:      8       Every         e
:      9       now           n
:     10       and           a
:     11       then,         t
:     12       the           t
:     13       ungodly       u
:     14       quiet         q
:     15       was           w
:     16       broken        b
:     17       by            b
:     18       a             a
:     19       crash         c
:     20       of            o
:     21       lightning     l
:     22       that          t
:     23       split         s
:     24       the           t
:     25       darkness      d
:     26       outside.      o
:     27       Inside        i
:     28       the           t
:     29       house,        h
:     30       midnight      m
:     31       approached.   a
: -----------  ------------  ------------

-- and so on. Obviously this list must continue until all the letters of the alphabet are matched to an index, and of course each common letter should be matched to several different indexes. If that were not the case, the book cipher would be just another simple monoalphabetic substitution cipher and extremely easy to crack, whether anyone knew what the keytext was or not. In effect, a book cipher resembles a homophonic substitution cipher.

Given this keytext, a simple plaintext message such as:

: hide all the loot

-- becomes the ciphertext:

: 29 27 4 8 18 21 21 11 29 8 21 26 26 12
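The indexing and lookup steps above can be sketched in Python, using the opening of the hypothetical book A DARK AND STORMY NIGHT as the keytext. Words are numbered from 1 by splitting on whitespace, so punctuation stays attached to words and does not affect their first letters. The helper names are illustrative, and `encipher` raises an error for any letter the keytext never starts a word with, which is why, as the text notes, a real keytext must cover the whole alphabet.

```python
import random

KEYTEXT = ("It was a dark and stormy night. Every now and then, the ungodly quiet "
           "was broken by a crash of lightning that split the darkness outside. "
           "Inside the house, midnight approached.")


def index_keytext(keytext):
    """Number the words from 1 and list which indexes start with each letter."""
    table = {}
    for index, word in enumerate(keytext.split(), start=1):
        table.setdefault(word[0].lower(), []).append(index)
    return table


def encipher(plaintext, keytext, rng=random):
    """Pick a random index for each letter, so repeated letters can vary."""
    table = index_keytext(keytext)
    return [rng.choice(table[ch]) for ch in plaintext.lower() if ch.isalpha()]


def decipher(numbers, keytext):
    """Each number selects a word; its first letter is the plaintext letter."""
    words = keytext.split()
    return "".join(words[n - 1][0].lower() for n in numbers)
```

Deciphering the example ciphertext, `decipher([29, 27, 4, 8, 18, 21, 21, 11, 29, 8, 21, 26, 26, 12], KEYTEXT)`, gives `"hidealltheloot"`. Choosing indexes at random in `encipher` spreads each letter over all of its homophones, rather than repeating 21 and 26 as the hand-made example does.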

The author of the pamphlet realized that the keytext was actually the American Declaration of Independence. The deciphered text indicated that the treasure amounted to a little over a tonne of gold, about two tonnes of silver, plus a relatively small amount of jewelry that had been obtained for portability and barter. The treasure was stored in iron pots and buried in a stone-lined vault somewhere in Bedford County, Virginia.

* The Beale ciphers have become the object of a cult. Since 1885, many people have spent their lives hunting for the Beale treasure, or trying to decipher the other two Beale ciphertexts, which to some is the more interesting goal.

They have never been cracked. Every now and then somebody claims they have succeeded in deciphering them, but these proclamations have always turned out to be either self-delusion or sheer fraud, and anyone who makes such a claim without persuasive proof would have to expect to be treated with great skepticism.

The difficulty of cracking the Beale ciphers is due to the fact that the amount of ciphertext is so small, and if the ciphertexts are actually based on a book cipher, the number of possible keytexts is effectively unlimited. Some cryptographers believe that the Beale ciphers have been a real blessing to the field, inspiring a great deal of work and cleverness.

Of course, the whole story of the Beale ciphers has the neat flavor of a Hardy Boys mystery and smells of a hoax. One of the suspicious aspects of the story was pointed out by a cipher hobbyist, who noticed that there were significant errors in the listing of the Declaration of Independence in the pamphlet compared to the actual text of the document, and that the cipher only worked properly if the listing in the pamphlet were used and not the actual text. In other words, the whole thing was "cooked". On the other hand, sketchy historical records do indicate the existence of a Thomas Beale whose movements seem to match those of the story, and analysis of the unbroken ciphertexts reveals patterns that are far from random.

The Beale ciphers have inspired both treasure hunters and conspiracy fanatics. Some suspect that the Beale ciphers did exist, but the author didn't put the true ciphertexts in the document -- hoping that the pamphlet might reach the mysterious third party, who would then come forward with the key, so they could cut a deal for the treasure. Some believe the US National Security Agency (NSA), which handles America's codebreaking efforts, cracked the Beale ciphers a long time ago and picked up the loot for its own use; no doubt aliens show up in that story somewhere along the line as well.

The final mystery is: if the Beale ciphers are for real, what tragedy occurred to Beale and his comrades? At this late date, even if the ciphers are for real and are broken, it seems hard to believe that anyone will ever know.

BACK_TO_TOP