Bayes' Theorem: The Story

The Battle of the Atlantic was the longest continuous military campaign of World War II. German U-boats, directed by encrypted radio traffic, were sinking Allied merchant ships faster than they could be replaced. Winston Churchill later wrote that the only thing that ever truly frightened him was the U-boat peril.

Every tactical order, every fleet position, every weather report was encrypted on the Enigma machine, a device the German high command believed to be mathematically unbreakable. They were right that brute force would fail. They were wrong about everything else.

"The Enigma was not broken by brilliance alone. It was broken by the systematic application of a 178-year-old theorem about conditional probability."

The people who broke it

Five figures you should know

Alan Turing

1912 – 1954

Mathematician · Head of Hut 8

Led the effort to break Naval Enigma. Designed the Bombe machine. Independently invented sequential Bayesian inference, calling it Banburismus, years before it appeared in the academic statistics literature.

"The weight-of-evidence framework Turing built at Bletchley is now a cornerstone of modern statistical inference."

Joan Clarke

1917 – 1996

Cryptanalyst · Deputy head of Hut 8

One of the finest cryptanalysts at Bletchley Park. Worked directly alongside Turing breaking Naval Enigma. Despite outperforming many of her male colleagues, she was officially graded as a 'linguist', the only category that permitted a woman to be paid at her level.

"Turing argued personally for her promotion and later said she was indispensable to the work of Hut 8."

I.J. Good

1916 – 2009

Statistician · Turing's assistant at Hut 8

Worked directly alongside Turing on Banburismus. After the war, he was the first to publish a rigorous account of Turing's Bayesian methods. Later pioneered Bayesian analysis in academic statistics.

"Introduced the term 'weight of evidence' and formalized the deciban framework that Turing used informally."

Hugh Alexander

1909 – 1974

Cryptanalyst · Head of Hut 8 from 1942

Two-time British chess champion who became one of the most effective operational cryptanalysts of the war. Succeeded Turing as head of Hut 8 and led the day-to-day breaking of Naval Enigma for the rest of the war. Later headed cryptanalysis at GCHQ.

"While Turing built the theory and the tools, Alexander ran the operation, turning mathematical methods into military intelligence under daily deadline pressure."

Gordon Welchman

1906 – 1985

Mathematician · Head of Hut 6

Broke Army and Air Force Enigma. Invented the 'diagonal board', a crucial enhancement to the Bombe that made operational codebreaking possible at scale.

"Without Welchman's diagonal board, the Bombe would have been too slow to be operationally useful."

How it worked

The Machine

Why brute force was impossible, and what the codebreakers could exploit

An Enigma machine looks like a typewriter. Press a key and a lamp lights up somewhere else on the keyboard: that is the encrypted letter. The path the signal takes depends on three things fitted into the machine:

Rotors

Three rotors chosen from a set of five, each wired to scramble the alphabet differently. They step forward as keys are pressed: the rightmost turns on every letter, the middle turns when the rightmost reaches its notch, and the left turns more rarely still. Like an odometer, but for substitution ciphers.

Reflector

After passing through the three rotors, the signal hits a fixed reflector that bounces it back through all three rotors again along a different path. This is what makes Enigma self-reciprocal: the same settings encrypt and decrypt. It also means a letter can never encrypt to itself. The fatal flaw.

Plugboard

Before and after the rotors, the signal passes through a plugboard that swaps 10 pairs of letters. The plugboard alone contributes more than 150 trillion configurations, the majority of Enigma's total keyspace.
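Of the three components, the reflector is the one whose consequences matter most later, so it is worth checking the two claims made about it. The sketch below is a toy model, not real Enigma wiring: it assumes the whole rotor stack at one instant can be treated as a single random scrambling (`random_permutation`, a hypothetical helper), ignores rotor stepping and the plugboard, and pairs the letters at random for the reflector.

```python
import random
import string

ALPHABET = string.ascii_uppercase


def random_permutation(rng):
    """Stand-in for the combined rotor wiring at one instant (made up, not historical)."""
    letters = list(ALPHABET)
    rng.shuffle(letters)
    return dict(zip(ALPHABET, letters))


def random_reflector(rng):
    """Pair up all 26 letters so every letter is swapped with a different one."""
    letters = list(ALPHABET)
    rng.shuffle(letters)
    wiring = {}
    for a, b in zip(letters[0::2], letters[1::2]):
        wiring[a] = b
        wiring[b] = a
    return wiring


def encrypt_letter(letter, rotors, reflector):
    """Forward through the rotors, off the reflector, then back through the rotors."""
    backwards = {out: inp for inp, out in rotors.items()}
    return backwards[reflector[rotors[letter]]]


rng = random.Random(0)
rotors = random_permutation(rng)
reflector = random_reflector(rng)

for letter in ALPHABET:
    cipher = encrypt_letter(letter, rotors, reflector)
    assert cipher != letter                                      # never encrypts to itself
    assert encrypt_letter(cipher, rotors, reflector) == letter   # same setting decrypts
```

Both properties hold for any scrambling and any reflector of this kind, because the reflector never maps a letter to itself and the return path exactly undoes the forward path.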

Settings reset at midnight

Each day, every Enigma operator in the German military received a printed settings sheet specifying which rotors to use, what ring settings to apply, and which 10 pairs of letters to connect on the plugboard. At midnight, the sheet changed. Every break Hut 8 achieved expired in eighteen hours. The work started over the next morning.

The combined effect of rotor choice, rotor order, starting positions, ring settings, and plugboard wiring produces roughly 10²³ possible configurations. Even a machine checking one setting per microsecond, far beyond anything available in the 1940s, would need roughly three billion years to exhaust them all. Brute force was never an option.
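The size of the keyspace can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: three rotors in order from a set of five, ten plugboard pairs, and ring settings counted as 26² (only two of the three rings are treated as changing the cipher, a common counting convention; other conventions give somewhat different totals).

```python
from math import factorial, perm


def plugboard_settings(pairs=10, letters=26):
    """Ways to choose `pairs` unordered, non-overlapping letter pairs."""
    unplugged = letters - 2 * pairs
    return factorial(letters) // (factorial(unplugged) * factorial(pairs) * 2 ** pairs)


rotor_orders = perm(5, 3)      # ordered choice of 3 rotors from 5
start_positions = 26 ** 3      # one starting letter per rotor
ring_settings = 26 ** 2        # assumption: two rings counted, see lead-in
plugboard = plugboard_settings()

keyspace = rotor_orders * start_positions * ring_settings * plugboard

SECONDS_PER_YEAR = 365.25 * 24 * 3600
years = keyspace / 1_000_000 / SECONDS_PER_YEAR   # one setting per microsecond

print(f"plugboard settings: {plugboard:,}")
print(f"total keyspace:     {keyspace:.2e}")
print(f"years to exhaust:   {years:.2e}")
```

The plugboard term dominates: it is larger than all the rotor-related factors multiplied together.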

The aha moment: "Heil Hitler"

German operators were required to end every operational message with HEIL HITLER. The codebreakers knew this. Aligned against the last ten letters of the ciphertext, those ten known plaintext letters became a crib: a suspected piece of plaintext at a known position.

Here is what made the crib devastating: because Enigma's reflector means a letter can never encrypt to itself, any setting where H maps to H, E maps to E, or any other crib letter maps to itself is immediately, provably impossible. A ten-letter crib typically eliminated more than 99% of all candidate settings before a single probability had been calculated.

The constraint in one sentence

For a candidate setting to survive, no letter in the crib may align with the same letter in the ciphertext. This is not a heuristic: it is an absolute consequence of the reflector's design. Settings that violate it are immediately ruled out.
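The constraint is simple enough to state as code. The sketch below assumes the crib and ciphertext are plain uppercase strings; the function names and the sample ciphertext are made up for illustration and are not taken from any historical intercept.

```python
def crib_fits(cipher_segment, crib):
    """True if no crib letter sits opposite the same letter in the ciphertext."""
    if len(cipher_segment) != len(crib):
        raise ValueError("segment and crib must be the same length")
    return all(c != p for c, p in zip(cipher_segment, crib))


def possible_positions(ciphertext, crib):
    """Offsets at which the crib could sit without any letter matching itself."""
    n = len(crib)
    return [
        start
        for start in range(len(ciphertext) - n + 1)
        if crib_fits(ciphertext[start:start + n], crib)
    ]


crib = "HEILHITLER"
ciphertext = "QFZWRWIVTYRESXBFOGKUHQBAISE"   # invented example, not a real intercept

# Check the crib against the end of the message, as described above.
tail = ciphertext[-len(crib):]
print("crib can sit at the end:", crib_fits(tail, crib))

# The same test also shows where else in the message the crib could sit.
print("all possible offsets:", possible_positions(ciphertext, crib))
```

A single coinciding letter is enough to reject an alignment outright, which is why the test could be applied by eye before any machine time was spent.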

The breakthrough

The Insight

Not searching the haystack. Making most of it disappear.

After the crib eliminated more than 99% of settings, what remained was a much smaller list, still large by human standards, but small enough to reason about systematically. The question shifted from "which of 10²³ is correct?" to "which of these surviving candidates looks most like German?"

Turing's method was this: for each surviving candidate setting, decrypt the message and ask how natural the result looks. German military prose uses letters in predictable patterns: E appears far more often than Q, common digrams like EN and ER appear far more often than rare ones. A setting that produces natural-looking German is far more likely to be correct than one that produces gibberish.
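One way to make "how natural the result looks" concrete is to score each trial decryption by its letter frequencies. The sketch below is an illustration rather than the wartime procedure: the frequency values are rough assumed figures for German, not a measured table, and the two candidate decryptions are invented.

```python
from math import log10

# Rough, assumed frequencies for the most common German letters (illustrative only).
GERMAN_FREQ = {
    "E": 0.17, "N": 0.10, "I": 0.08, "S": 0.07, "R": 0.07,
    "A": 0.065, "T": 0.06, "D": 0.05, "H": 0.05, "U": 0.04,
}
# Spread the remaining probability evenly over the other 16 letters.
OTHER = (1.0 - sum(GERMAN_FREQ.values())) / (26 - len(GERMAN_FREQ))
UNIFORM = 1 / 26


def german_likeness(text):
    """Sum of log10 likelihood ratios: German-like letters versus random letters."""
    score = 0.0
    for ch in text.upper():
        if ch.isalpha():
            score += log10(GERMAN_FREQ.get(ch, OTHER) / UNIFORM)
    return score


# Hypothetical trial decryptions under two candidate settings.
candidates = {
    "setting A": "ANGRIFFUMDREIUHR",
    "setting B": "QXJZKVWPYQXBMCGO",
}

for name, text in candidates.items():
    print(name, round(german_likeness(text), 2))

best = max(candidates, key=lambda name: german_likeness(candidates[name]))
print("most German-looking:", best)
```

A positive score means the text looks more like German than like random letters; a negative score means the opposite. Digram statistics would sharpen the test further but follow the same pattern.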

Each new intercepted message provided more evidence. With every letter examined, the probabilities shifted. Plausible settings became more plausible; implausible ones collapsed toward zero. Evidence accumulated, and the answer converged.

"They didn't search the haystack. They made most of it disappear."

Turing called his scoring unit a deciban: one tenth of a ban, a unit named after Banbury, the town where the paper sheets used in the procedure were printed. Every piece of evidence added to or subtracted from a running score. When a setting's score fell far enough below the leader, it was ruled out. When one setting pulled far enough ahead, the operators had their answer.
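The running score can be sketched as follows. A deciban is ten times the base-10 logarithm of a likelihood ratio, so independent pieces of evidence simply add. Everything else here is an assumption made for illustration: the function names, the 20-deciban drop margin, and the likelihood ratios fed in.

```python
from math import log10


def decibans(likelihood_ratio):
    """Weight of evidence: ten times the base-10 log of a likelihood ratio."""
    return 10 * log10(likelihood_ratio)


def run_scores(candidates, evidence, drop_margin=20.0):
    """Accumulate decibans per candidate, dropping any that trail the leader.

    evidence: list of dicts mapping each candidate to the likelihood ratio
    of one observation under that candidate (hypothetical inputs).
    """
    scores = {c: 0.0 for c in candidates}
    alive = set(candidates)
    for observation in evidence:
        for c in alive:
            scores[c] += decibans(observation[c])
        leader = max(scores[c] for c in alive)
        # Rule out anything too far behind the current leader.
        alive = {c for c in alive if leader - scores[c] < drop_margin}
    return scores, alive


# Invented evidence: each observation favors A, is neutral on B, counts against C.
evidence = [{"A": 4.0, "B": 1.0, "C": 0.25}] * 5
scores, alive = run_scores(["A", "B", "C"], evidence)

print({c: round(s, 1) for c, s in scores.items()})
print("still in contention:", sorted(alive))
```

Working in logarithms turns repeated multiplication of odds into addition, which is what made the bookkeeping practical with pencil and paper.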

He did not know, working in a requisitioned Victorian mansion in Buckinghamshire, that he was independently re-inventing a theorem first published in 1763, two years after the death of its author, Thomas Bayes. The mathematics was 178 years old. The application was entirely new.

Why this works without a formula

The core logic is intuitive: start with every possibility being equally likely. As you gather evidence, update your confidence in each one. Evidence that strongly favors a candidate pushes it up. Evidence that strongly contradicts it pushes it down. Keep going until one candidate is far enough ahead that you are willing to bet on it. That is Bayesian inference.
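The loop described above translates almost word for word into code. This is a minimal sketch, assuming four hypothetical candidates, invented likelihoods, and an arbitrary 95% stopping threshold; none of these numbers come from the historical record.

```python
def bayes_update(prior, likelihood):
    """Multiply each belief by how well it predicted the evidence, then renormalize."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence is impossible under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


hypotheses = ["A", "B", "C", "D"]

# Start with every possibility equally likely.
belief = {h: 1 / len(hypotheses) for h in hypotheses}

# Invented evidence: how probable each observation is under each hypothesis.
observations = [{"A": 0.6, "B": 0.2, "C": 0.1, "D": 0.1}] * 4

THRESHOLD = 0.95   # assumed stopping rule: bet once one candidate passes this
for step, likelihood in enumerate(observations, start=1):
    belief = bayes_update(belief, likelihood)
    leader = max(belief, key=belief.get)
    print(step, leader, round(belief[leader], 3))
    if belief[leader] > THRESHOLD:
        break
```

No single observation is decisive here; the leader only pulls clear because consistent evidence compounds from one update to the next.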

What happened next

The Outcome

Historians estimate that breaking Enigma shortened the war in Europe by two to four years. The intelligence produced at Bletchley Park, codenamed Ultra, guided Allied commanders at El Alamein, D-Day, and the Battle of the Atlantic. Churchill called the codebreakers his "geese that laid the golden eggs and never cackled."

Turing's Bombe, the electromechanical machine he built to automate the crib-based elimination process, became something remarkable: a machine designed to run through logical possibilities at mechanical speed, guided by rules encoded into its wiring. It was not a computer in the modern sense. But it pointed toward one.

In 1950, Turing published his famous paper asking "Can machines think?" He proposed the test that now bears his name. The conceptual lineage is direct: from Hut 8 to the Bombe to the stored-program computer to the question of machine intelligence. Every AI system running today is, in a sense, a distant descendant of the work done in that cold, damp hut in Buckinghamshire.

The work was classified until 1974. Turing himself never received public recognition for what he had done. He died in 1954. The world did not learn the full story for another twenty years.

The Bayesian method Turing used did not stay classified. It re-entered academic statistics through the postwar work of his colleagues and became, over the following decades, one of the dominant frameworks for reasoning under uncertainty. It underlies spam filters, diagnostic reasoning in medicine, and many of the probabilistic models at the heart of modern machine learning.

A 178-year-old theorem. A Victorian mansion in Buckinghamshire. Eighteen hours a day, every day, for four years. The mathematics was not what saved the Allies; the people were. But without the mathematics, the people would have had nothing to work with.