BLETCHLEY PARK - BUCKINGHAMSHIRE, ENGLAND - 1941
Every morning a stack of intercepted messages arrived. Thousands of letters. Meaningless. The men and women of Hut 8 had eighteen hours to find the needle, before midnight reset the haystack.
The haystack contained approximately 10²³ possible configurations.
The Battle of the Atlantic was the longest continuous military campaign of World War II. German U-boats, guided by encrypted radio traffic, were sinking Allied convoys faster than they could be replaced. Winston Churchill later wrote that the only thing that ever truly frightened him was the U-boat peril.
Every tactical order, every fleet position, every weather report was encrypted on the Enigma machine, a device the German high command believed to be mathematically unbreakable. They were right that brute force would fail. They were wrong about everything else.
"The Enigma was not broken by brilliance alone. It was broken by the systematic application of a 178-year-old theorem about conditional probability."
The people who broke it
Alan Turing
1912 – 1954
Mathematician · Head of Hut 8
Led the effort to break Naval Enigma. Designed the Bombe machine. Independently invented sequential Bayesian inference, calling it Banburismus, years before it appeared in the academic statistics literature.
"The weight-of-evidence framework Turing built at Bletchley is now a cornerstone of modern statistical inference."
Joan Clarke
1917 – 1996
Cryptanalyst · Deputy head of Hut 8
One of the finest cryptanalysts at Bletchley Park. Worked directly alongside Turing breaking Naval Enigma. Despite outperforming many of her male colleagues, she was officially graded as a 'linguist', the only category that permitted a woman to be paid at her level.
"Turing argued personally for her promotion and later said she was indispensable to the work of Hut 8."
I.J. Good
1916 – 2009
Statistician · Turing's assistant at Hut 8
Worked directly alongside Turing on Banburismus. After the war, he was the first to publish a rigorous account of Turing's Bayesian methods. Later pioneered Bayesian analysis in academic statistics.
"Introduced the term 'weight of evidence' and formalized the deciban framework that Turing used informally."
Hugh Alexander
1909 – 1974
Cryptanalyst · Head of Hut 8 from 1942
Two-time British chess champion who became one of the most effective operational cryptanalysts of the war. Succeeded Turing as head of Hut 8 and led the day-to-day breaking of Naval Enigma for the rest of the war. Later headed cryptanalysis at GCHQ.
"While Turing built the theory and the tools, Alexander ran the operation, turning mathematical methods into military intelligence under daily deadline pressure."
Gordon Welchman
1906 – 1985
Mathematician · Head of Hut 6
Broke Army and Air Force Enigma. Invented the 'diagonal board', a crucial enhancement to the Bombe that made operational codebreaking possible at scale.
"Without Welchman's diagonal board, the Bombe would have been too slow to be operationally useful."
Why brute force was impossible, and what the codebreakers could exploit
An Enigma machine looks like a typewriter. Press a key and a lamp lights up somewhere else on the keyboard — that is the encrypted letter. The path the signal takes depends on three things fitted into the machine:
Rotors
Three rotors chosen from a set of five, each wired to scramble the alphabet differently. They step forward with every keypress — the rightmost turns on every letter, the middle turns when the right notch is reached, the left turns more rarely. Like an odometer, but for substitution ciphers.
Reflector
After passing through the three rotors, the signal hits a fixed reflector that bounces it back through all three rotors again along a different path. This is what makes Enigma self-reciprocal: the same settings encrypt and decrypt. It also means a letter can never encrypt to itself — the fatal flaw.
Plugboard
Before and after the rotors, the signal passes through a plugboard that swaps 10 pairs of letters. The plugboard alone contributes more than 150 trillion configurations — the majority of Enigma's total keyspace.
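The 150-trillion figure is straightforward to verify: choose which 20 of the 26 letters are plugged, then pair them into 10 unordered pairs. A quick sketch using only the Python standard library:

```python
from math import comb, factorial

# 10 cables: choose the 20 plugged letters, then pair them into
# 10 unordered pairs (divide out the 10! pair orderings and the
# 2^10 within-pair swaps).
plugboard_settings = comb(26, 20) * factorial(20) // (factorial(10) * 2**10)
print(f"{plugboard_settings:,}")  # 150,738,274,937,250
```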
Each day, every Enigma operator in the German military received a printed settings sheet specifying which rotors to use, what ring settings to apply, and which 10 pairs of letters to connect on the plugboard. At midnight, the sheet changed. Every break Hut 8 achieved expired in eighteen hours. The work started over the next morning.
The combined effect of rotor choice, rotor order, starting positions, ring settings, and plugboard wiring produces roughly 10²³ possible configurations. A machine checking one setting per microsecond would need more than three billion years to exhaust them all. Brute force was never an option.
Many German operators ended their messages with HEIL HITLER. The codebreakers knew this. Aligned against the last ten letters of the ciphertext, those ten known plaintext letters became a crib — a suspected piece of plaintext at a known position.
Here is what made the crib devastating: because Enigma's reflector means a letter can never encrypt to itself, any setting where H maps to H, E maps to E, or any other crib letter maps to itself is immediately, provably impossible. A ten-character crib typically eliminated more than 99% of all candidate settings before a single probability had been calculated.
The constraint in one sentence
For a candidate setting to survive, no letter in the crib may align with the same letter in the ciphertext. This is not a heuristic — it is an absolute consequence of the reflector's design. Settings that violate it get a likelihood of zero and vanish from the posterior immediately.
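The constraint fits in a few lines of Python. The intercept below is made up for illustration; the rule is exactly the one stated above: reject any crib alignment where some crib letter coincides with the ciphertext letter at the same position.

```python
def alignment_possible(ciphertext: str, crib: str, offset: int) -> bool:
    """Enigma never encrypts a letter to itself, so any alignment where a
    crib letter equals the ciphertext letter in the same slot is impossible."""
    return all(crib[i] != ciphertext[offset + i] for i in range(len(crib)))

# Drag the crib along a (made-up) intercept; keep only surviving offsets.
ciphertext = "QWETTARHEILX"
crib = "WETTER"
viable = [k for k in range(len(ciphertext) - len(crib) + 1)
          if alignment_possible(ciphertext, crib, k)]
print(viable)  # the handful of offsets that survive the self-encryption test
```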
What remained after constraint elimination was still a large number — but small enough to score with probability. That is where Bayes' theorem came in.
How do you find a needle in a haystack when the haystack contains a hundred sextillion pieces of hay?
You don't search it. You eliminate it, using evidence and probability.
Reframing codebreaking as a question about probability, before writing a single formula
Here is the precise question the codebreakers faced each morning:
“Given this intercepted ciphertext, what is the probability that each possible Enigma setting produced it?”
That is a question of posterior probability. It has three ingredients:
Prior
P(setting)
How plausible is each setting before we see the message? With no information: all equally likely.
Likelihood
P(message | setting)
If this setting were correct, how probable is the message we observed?
Posterior
P(setting | message)
After seeing the message: updated probability that this setting is correct.
Before we write a formula, let's build the intuition. Each square below represents a candidate Enigma setting.
Before any evidence
All ~10²³ Enigma settings are equally plausible. We have no reason to prefer any one of them.
Derived from the Enigma problem, then proven on a simpler cipher
The probability that both a setting H is correct and we observe message E can be written two ways using the product rule:

P(H and E) = P(H | E) · P(E) = P(E | H) · P(H)

Since both expressions equal the same thing, divide both sides by P(E):

Bayes' Theorem

P(H | E) = P(E | H) · P(H) / P(E)

where P(E) = Σᵢ P(E | Hᵢ) · P(Hᵢ) sums over all hypotheses
| Term | Name | In our problem |
|---|---|---|
| P(H) | Prior | Uniform — all possible settings equally likely before any evidence |
| P(E | H) | Likelihood | If this setting is correct, how probable is the observed ciphertext? |
| P(H | E) | Posterior | Updated probability after seeing the ciphertext |
| P(E) | Normaliser | Total probability summed across all hypotheses (makes it sum to 1) |
Enigma is too complex to trace by hand. Let's first prove the theorem works on something you can follow letter by letter: a Caesar cipher, where the keyspace is just 26 possible shifts.
We intercept "KHOOR". We suspect it's English. We know nothing else. Click through the letters and watch the posterior converge on the correct shift.
Intercept: "KHOOR". Unknown shift. Each letter we reveal updates our probability over all 26 possible shifts. The correct shift is 3 (decrypts to "HELLO").
Uniform prior — all 26 shifts equally likely.
The formula has two inputs you control: how strongly you believed the hypothesis before, and how decisively the evidence favours it.
Adjust the sliders and watch Bayes' theorem update in real time.
Prior P(H)
10.0%
Likelihood ratio
10×
Posterior P(H|E)
52.6%
The evidence has substantially updated our belief.
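The arithmetic behind the sliders is three steps: convert the prior to odds, multiply by the likelihood ratio, convert back to a probability. A minimal sketch, using the slider defaults above (10% prior, 10× likelihood ratio):

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Bayes in odds form: posterior odds = likelihood ratio * prior odds."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

print(f"{posterior(0.10, 10):.1%}")  # -> 52.6%
```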
Turing's sequential extension, and the invention of the ban
The Caesar demo showed Bayesian updating with 26 hypotheses. Enigma has 10²³. Turing needed to combine many small pieces of evidence without multiplying chains of vanishingly small probabilities.
His solution: work in log-odds. In odds form, Bayesian updating is:

O(H | E) = [P(E | H) / P(E | ¬H)] × O(H)

Taking log₁₀ of both sides:

log₁₀ O(H | E) = log₁₀ [P(E | H) / P(E | ¬H)] + log₁₀ O(H)

Multiplication becomes addition. Each piece of evidence adds to the tally.
Turing called the unit a ban (named after Banbury, where the scoring sheets were printed). One ban = log₁₀(10) = 1, a factor of 10 in the odds. A deciban is one-tenth of a ban. When a setting's tally crossed 30 decibans (3 bans = 1000:1 odds), the team accepted it. Watch that process below:
Each letter of the crib "WETTER" adds weight of evidence in decibans (Turing's unit). Addition — not multiplication. The 3-ban threshold (30 decibans) is the acceptance line.
What Banburismus actually was
Banburismus was the specific procedure for determining which day-key was used for Naval Enigma by comparing pairs of messages sent on the same settings. Each shared letter added decibans to the score. When the tally crossed the threshold, the result was fed into the Bombe. I.J. Good described it in 1979 as "the first serious application of sequential Bayesian analysis to a real problem."
From the maths to working code: three parts, progressively deeper
Bayes' theorem in two lines of NumPy, applied letter-by-letter to crack the Caesar cipher from Section 3.
import numpy as np
# English letter frequencies (A=0 ... Z=25)
ENG_FREQ = np.array([
0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061,
0.070, 0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019,
0.001, 0.060, 0.063, 0.091, 0.028, 0.010, 0.024, 0.002,
0.020, 0.001,
])
def bayesian_update(prior: np.ndarray, likelihoods: np.ndarray) -> np.ndarray:
    """One step of Bayes: posterior = (likelihoods * prior) / sum."""
    unnormalised = likelihoods * prior
    return unnormalised / unnormalised.sum()

# -- Caesar cipher example --------------------------------------------------
# Intercepted: "KHOOR" (= "HELLO" encrypted with shift 3).
# Hypotheses: shift in {0, 1, ..., 25}. Prior: uniform.
prior = np.ones(26) / 26

for cipher_letter in "KHOOR":
    c = ord(cipher_letter) - ord("A")
    # P(cipher_letter | shift=k) = English freq of the decrypted letter
    likelihoods = ENG_FREQ[[(c - k) % 26 for k in range(26)]]
    prior = bayesian_update(prior, likelihoods)

best_shift = prior.argmax()
print(f"Most probable shift: {best_shift}")  # -> 3
print(f"Decrypts to: {''.join(chr((ord(c)-ord('A')-best_shift)%26+ord('A')) for c in 'KHOOR')}")  # -> HELLO

Now in log-odds (decibans). Each letter adds to the score. This is exactly Turing's insight, translated into Python.
import math
def to_decibans(likelihood_ratio: float) -> float:
    """Turing's unit: 10 * log10(likelihood ratio)."""
    return 10 * math.log10(likelihood_ratio)

def banburismus(ciphertext: str) -> list[float]:
    """
    Sequential Bayesian updating in log-odds (decibans).
    Returns the deciban score for each of the 26 possible Caesar shifts.
    Addition, not multiplication -- that is Turing's key insight.
    """
    N = 26
    # Prior: uniform -> odds of 1:(N-1), expressed in decibans so the
    # starting value and the increments share the same unit
    log_odds = [to_decibans(1 / (N - 1))] * N
    for cipher_letter in ciphertext.upper():
        c = ord(cipher_letter) - ord("A")
        for shift in range(N):
            plain = (c - shift) % 26
            p_given_H = ENG_FREQ[plain]  # ENG_FREQ from the previous snippet
            p_given_not_H = 1 / N
            # ADD decibans -- not multiply probabilities
            log_odds[shift] += to_decibans(p_given_H / p_given_not_H)
    return log_odds

scores = banburismus("KHOOR")
best = max(range(26), key=lambda k: scores[k])
print(f"Highest score: shift {best} ({scores[best]:.1f} decibans)")  # -> shift 3

Real Enigma machine, real rotor search, crib-dragging with constraint elimination. The Enigma machine below lets you generate ciphertext to feed into the decoder.
from src.enigma.machine import EnigmaMachine, EnigmaConfig
from src.bayes.decoder import BayesianDecoder
# The message -- encrypted with unknown settings.
# Operators opened every weather report with "WETTER": the perfect crib.
CIPHERTEXT = "ABCXYZPQRLMNOPQDEFGHIJKSTUVWRST" # replace with real intercept
# Decoder: search rotors I-III, all 26^3 positions, fixed reflector UKW-B
decoder = BayesianDecoder(rotor_choices=["I", "II", "III"])
# Run: Enigma constraint eliminates >99% instantly, Bayesian scoring resolves the rest
results = decoder.decode(CIPHERTEXT, crib="WETTER", top_n=5, verbose=True)
# Best result
print(results[0].decrypted) # -> most probable plaintext
print(results[0].config.rotors) # -> recovered rotor order
print(results[0].window) # -> recovered starting positions

Encrypt something, then paste the ciphertext into Notebook 03 and watch the Bayesian decoder recover the settings.
Click keys or type on your keyboard. Notice how the same letter never produces itself.
Notebook 01
The Enigma Machine
Build all components from scratch. Verify with real test cases.
Notebook 02
Bayes' Theorem
Derive, simulate, visualise. Caesar and Enigma worked examples.
Notebook 03
Cracking Enigma
Run the Bayesian decoder. Watch the posterior collapse.
# Get started
$ git clone https://github.com/vivekatsuperset/lutchet
$ uv sync && uv run jupyter lab
For undergrads: numerical stability, information theory, and why this is everywhere in modern ML
Enigma's prior probability for any single setting is about 10⁻²³. Multiplying 30 such numbers together produces values that underflow to zero in floating-point arithmetic. Working in log-odds, where we add instead of multiply, sidesteps this entirely. This is why modern machine learning frameworks compute log-probabilities by default.
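You can watch the underflow happen. The sketch below multiplies thirty probabilities of 10⁻²³ (float64 hits exactly zero by the fifteenth step) and then does the same accumulation in log space, where it stays stable:

```python
import math

p = 1e-23            # prior probability of one Enigma setting
product, log_sum = 1.0, 0.0
for _ in range(30):              # thirty pieces of evidence
    product *= p                 # underflows to exactly 0.0
    log_sum += math.log10(p)     # stays a perfectly ordinary float

print(product)   # 0.0 -- the information is gone
print(log_sum)   # about -690 -- the information survives
```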
Claude Shannon published his theory of information in 1948, eight years after Turing built the ban system. The two frameworks measure the same thing in different units:
| Turing (1940) | Shannon (1948) |
|---|---|
| 1 ban (a factor of 10 in the odds) | log₂(10) ≈ 3.32 bits of information |
| Decibans of evidence | Bits of mutual information |
| Prior log-odds to posterior log-odds | Entropy reduction H(X) to H(X|E) |
| 30-deciban acceptance threshold | ≈ 10 bits of evidence (3 bans × 3.32) |
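The conversion in the table's first row is one line of code. A quick sketch:

```python
import math

def decibans_to_bits(db: float) -> float:
    """1 ban = log2(10) ≈ 3.32 bits, so divide by 10 bans and rescale."""
    return (db / 10) * math.log2(10)

print(f"{decibans_to_bits(10):.2f} bits per ban")          # ~3.32
print(f"{decibans_to_bits(30):.2f} bits at the threshold")  # ~9.97
```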
Naive Bayes Classifier
Bletchley: Score each Enigma setting by multiplying per-letter likelihoods
Today: Spam filters, text classification — same per-word likelihood multiplication
Logistic Regression
Bletchley: Turing's deciban tally: additive log-likelihood scores
Today: The logit function is log-odds. Learning is adding log-likelihoods.
Sequential Testing
Bletchley: Add evidence until 3-ban threshold — then accept
Today: A/B testing with early stopping. Wald's SPRT (1945) independently formalised the same sequential-evidence idea.
Language Models
Bletchley: P(next letter | rotor setting and position)
Today: P(next token | all previous tokens) — sequential probabilistic prediction at scale
Notebook 04: Advanced Topics
Log-odds in depth, sequential Bayesian updating, Shannon entropy, KL divergence, and implementing a Naive Bayes language classifier that descends directly from Turing's methods.
notebooks/04_advanced.ipynb

Thomas Bayes died in 1761. Alan Turing died in 1954. Neither lived to see their ideas recognised as two expressions of the same truth.
"Sometimes it is the people no one imagines anything of who do the things that no one can imagine."
The theorem on your syllabus, the unit in Turing's notebook, the weights in a modern neural network: they are the same idea, found independently, by people trying to reason carefully under uncertainty.