BLETCHLEY PARK - BUCKINGHAMSHIRE, ENGLAND - 1941

Every morning a stack of intercepted messages arrived. Thousands of letters. Meaningless. The men and women of Hut 8 had eighteen hours to find the needle, before midnight reset the haystack.

The haystack contained approximately 10²³ possible configurations.

A lesson in mathematics and code

The Battle of the Atlantic was the longest continuous military campaign of World War II. German U-boats, guided by encrypted radio traffic, were sinking Allied convoys faster than they could be replaced. Winston Churchill later wrote that the only thing that ever truly frightened him was the U-boat peril.

Every tactical order, every fleet position, every weather report was encrypted on the Enigma machine, a device the German high command believed to be mathematically unbreakable. They were right that brute force would fail. They were wrong about everything else.

"The Enigma was not broken by brilliance alone. It was broken by the systematic application of a 178-year-old theorem about conditional probability."

The people who broke it

Five figures you should know

Alan Turing

1912 – 1954

Mathematician · Head of Hut 8

Led the effort to break Naval Enigma. Designed the Bombe machine. Independently invented sequential Bayesian inference, calling it Banburismus, years before it appeared in the academic statistics literature.

"The weight-of-evidence framework Turing built at Bletchley is now a cornerstone of modern statistical inference."

Joan Clarke

1917 – 1996

Cryptanalyst · Deputy head of Hut 8

One of the finest cryptanalysts at Bletchley Park. Worked directly alongside Turing breaking Naval Enigma. Despite outperforming many of her male colleagues, she was officially graded as a 'linguist', the only category that permitted a woman to be paid at her level.

"Turing argued personally for her promotion and later said she was indispensable to the work of Hut 8."

I.J. Good

1916 – 2009

Statistician · Turing's assistant at Hut 8

Worked directly alongside Turing on Banburismus. After the war, he was the first to publish a rigorous account of Turing's Bayesian methods. Later pioneered Bayesian analysis in academic statistics.

"Introduced the term 'weight of evidence' and formalized the deciban framework that Turing used informally."

Hugh Alexander

1909 – 1974

Cryptanalyst · Head of Hut 8 from 1942

Two-time British chess champion who became one of the most effective operational cryptanalysts of the war. Succeeded Turing as head of Hut 8 and led the day-to-day breaking of Naval Enigma for the rest of the war. Later headed cryptanalysis at GCHQ.

"While Turing built the theory and the tools, Alexander ran the operation, turning mathematical methods into military intelligence under daily deadline pressure."

Gordon Welchman

1906 – 1985

Mathematician · Head of Hut 6

Broke Army and Air Force Enigma. Invented the 'diagonal board', a crucial enhancement to the Bombe that made operational codebreaking possible at scale.

"Without Welchman's diagonal board, the Bombe would have been too slow to be operationally useful."

How it worked

The Machine

Why brute force was impossible, and what the codebreakers could exploit

An Enigma machine looks like a typewriter. Press a key and a lamp lights up on the lamp board above the keys — that is the encrypted letter. The path the signal takes depends on three things fitted into the machine:

Rotors

Three rotors chosen from a set of five, each wired to scramble the alphabet differently. They step forward with every keypress — the rightmost turns on every letter, the middle turns when the right notch is reached, the left turns more rarely. Like an odometer, but for substitution ciphers.

Reflector

After passing through the three rotors, the signal hits a fixed reflector that bounces it back through all three rotors again along a different path. This is what makes Enigma self-reciprocal: the same settings encrypt and decrypt. It also means a letter can never encrypt to itself — the fatal flaw.

Plugboard

Before and after the rotors, the signal passes through a plugboard that swaps 10 pairs of letters. The plugboard alone contributes more than 150 trillion configurations — the majority of Enigma's total keyspace.
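The 150-trillion figure is straightforward to verify: count the ways to choose 10 disjoint, unordered pairs from 26 letters.

```python
from math import factorial

# Ways to pick 10 unordered, disjoint pairs from 26 letters:
#   26! / (6! * 10! * 2^10)
# 6!   -> the six unplugged letters are interchangeable
# 10!  -> the order of the pairs does not matter
# 2^10 -> each pair is an unordered swap
plugboard_settings = factorial(26) // (factorial(6) * factorial(10) * 2**10)
print(f"{plugboard_settings:,}")  # 150,738,274,937,250
```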

Settings reset at midnight

Each day, every Enigma operator in the German military received a printed settings sheet specifying which rotors to use, what ring settings to apply, and which 10 pairs of letters to connect on the plugboard. At midnight, the sheet changed. Every break Hut 8 achieved expired in eighteen hours. The work started over the next morning.

The combined effect of rotor choice, rotor order, starting positions, ring settings, and plugboard wiring produces roughly 10²³ possible configurations. A machine checking one setting per microsecond would need over three billion years to exhaust them all. Brute force was never an option.
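The arithmetic behind that claim, with the one-check-per-microsecond rate assumed purely for illustration:

```python
KEYSPACE = 1e23              # approximate number of Enigma configurations
CHECKS_PER_SECOND = 1e6      # one setting per microsecond (illustrative)

seconds = KEYSPACE / CHECKS_PER_SECOND
years = seconds / (365.25 * 24 * 3600)
print(f"{years:.2e} years")  # roughly 3e9 -- billions of years
```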

The aha moment: "Heil Hitler"

German operators were required to end every operational message with HEIL HITLER. The codebreakers knew this. Aligned against the last ten letters of the ciphertext, those ten known plaintext letters became a crib — a suspected piece of plaintext at a known position.

Here is what made the crib devastating: because Enigma's reflector means a letter can never encrypt to itself, any setting where H maps to H, E maps to E, or any other crib letter maps to itself is immediately, provably impossible. A ten-character crib typically eliminated more than 99% of all candidate settings before a single probability had been calculated.

The constraint in one sentence

For a candidate setting to survive, no letter in the crib may align with the same letter in the ciphertext. This is not a heuristic — it is an absolute consequence of the reflector's design. Settings that violate it get a likelihood of zero and vanish from the posterior immediately.
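The same constraint drives crib-dragging: sliding the crib along the ciphertext and discarding every alignment where some letter faces itself. A minimal sketch — the function name is mine and the ciphertext is an arbitrary example string, not a real intercept:

```python
def crib_alignment_possible(ciphertext: str, crib: str, offset: int) -> bool:
    """An alignment survives only if no crib letter lines up with itself
    in the ciphertext -- an absolute consequence of the reflector."""
    segment = ciphertext[offset:offset + len(crib)]
    return all(p != c for p, c in zip(crib, segment))

# Drag the crib along the ciphertext, keeping only feasible offsets
ciphertext = "QFZWRWIVTYRESXBFOGKUHQBAISE"
crib = "WETTER"
feasible = [i for i in range(len(ciphertext) - len(crib) + 1)
            if crib_alignment_possible(ciphertext, crib, i)]
print(feasible)
```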

What remained after constraint elimination was still a large number — but small enough to score with probability. That is where Bayes' theorem came in.

How do you find a needle in a haystack when the haystack contains a hundred sextillion pieces of hay?

You don't search it. You eliminate it, using evidence and probability.

Section 2

The Math Problem

Reframing codebreaking as a question about probability, before writing a single formula

Here is the precise question the codebreakers faced each morning:

“Given this intercepted ciphertext, what is the probability that each possible Enigma setting produced it?”

That is a question of posterior probability. It has three ingredients:

Prior

P(setting)

How plausible is each setting before we see the message? With no information: all equally likely.

Likelihood

P(message | setting)

If this setting were correct, how probable is the message we observed?

Posterior

P(setting | message)

After seeing the message: updated probability that this setting is correct.

Before we write a formula, let's build the intuition. Each square below represents a candidate Enigma setting.

[Legend: candidate setting survives]

Before any evidence

All ~10²³ Enigma settings are equally plausible. We have no reason to prefer any one of them.

Section 3

Bayes' Theorem

Derived from the Enigma problem, then proven on a simpler cipher

Deriving the Formula

The probability that both a setting H is correct and we observe message E can be written two ways using the product rule:

P(H and E) = P(H | E) · P(E) = P(E | H) · P(H)

Since both expressions equal the same thing, divide both sides by P(E):

Bayes' Theorem

P(H | E) = P(E | H) · P(H) / P(E)

where P(E) = Σᵢ P(E | Hᵢ) · P(Hᵢ) sums over all hypotheses

Term Name In our problem
P(H) Prior Uniform — all possible settings equally likely before any evidence
P(E | H) Likelihood If this setting is correct, how probable is the observed ciphertext?
P(H | E) Posterior Updated probability after seeing the ciphertext
P(E) Normaliser Total probability summed across all hypotheses (makes it sum to 1)

Worked Example: Caesar Cipher

Enigma is too complex to trace by hand. Let's first prove the theorem works on something you can follow letter by letter: a Caesar cipher, where the keyspace is just 26 possible shifts.
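Before decoding it, it helps to see the cipher itself. A minimal implementation for uppercase A–Z only:

```python
def caesar(text: str, shift: int) -> str:
    """Shift each letter forward by `shift` positions, wrapping past Z."""
    return "".join(
        chr((ord(ch) - ord("A") + shift) % 26 + ord("A")) for ch in text
    )

print(caesar("HELLO", 3))   # KHOOR
print(caesar("KHOOR", -3))  # HELLO -- decryption is the negative shift
```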

We intercept "KHOOR". We suspect it's English. We know nothing else. Click through the letters and watch the posterior converge on the correct shift.

Caesar Cipher — Bayesian Decoder

Intercept: "KHOOR". Unknown shift. Each letter we reveal updates our probability over all 26 possible shifts. The correct shift is 3 (decrypts to "HELLO").

[Interactive demo: five hidden ciphertext letters; revealing each updates a bar chart of P(shift | evidence) across shifts 0 (A) to 25 (Z). Before any evidence, every shift ties at 1/26 ≈ 3.8%.]

Uniform prior — all 26 shifts equally likely.

How Prior and Likelihood Shape the Posterior

The formula has two inputs you control: how strongly you believed the hypothesis before, and how decisively the evidence favours it.

Prior × Likelihood → Posterior

Adjust the sliders and watch Bayes' theorem update in real time.

Example settings: prior P(H) = 10.0% (the slider runs from 1%, very unlikely, to 99%, very likely) and likelihood ratio 10¹ = 10× (from L = 1, no information, to L = 1000, strong evidence).

Prior P(H): 10.0%

Likelihood ratio: 10×

Posterior P(H|E): 52.6%

In odds form: prior odds = 0.1 / 0.9 ≈ 0.11 : 1; posterior odds = 10 × 0.11 ≈ 1.11 : 1, which is 52.6%.
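The slider arithmetic is a three-line odds-form update (the function name is illustrative, not from the project's notebooks):

```python
def posterior_from_odds(prior: float, likelihood_ratio: float) -> float:
    """Bayes in odds form: posterior odds = likelihood ratio * prior odds."""
    prior_odds = prior / (1 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)

print(f"{posterior_from_odds(0.10, 10):.1%}")  # 52.6%
```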

The evidence has substantially updated our belief.

Section 4

Banburismus

Turing's sequential extension, and the invention of the ban

The Caesar demo showed Bayesian updating with 26 hypotheses. Enigma has 10²³. Turing needed to combine many small pieces of evidence without multiplying chains of vanishingly small probabilities.

His solution: work in log-odds. In odds form, Bayesian updating is:

Ω₁ = Λ · Ω₀

where Ω₁ = posterior odds, Λ = likelihood ratio P(E|H) / P(E|¬H), and Ω₀ = prior odds.

Taking log₁₀ of both sides:

log₁₀ Ω₁ = log₁₀ Λ + log₁₀ Ω₀

Multiplication becomes addition. Each piece of evidence adds to the tally.

Turing called the unit a ban (named after Banbury, where the scoring sheets were printed). One ban is a factor of 10 in the odds; a deciban is one-tenth of a ban. When a setting's tally crossed 30 decibans (3 bans = 1000:1 odds), the team accepted it. Watch that process below:
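The unit conversions are one-liners (a small sketch; the helper names are mine):

```python
import math

def decibans(likelihood_ratio: float) -> float:
    """Weight of evidence in Turing's units: 10 * log10(likelihood ratio)."""
    return 10 * math.log10(likelihood_ratio)

def odds_factor(db: float) -> float:
    """Convert a deciban tally back to a multiplicative odds factor."""
    return 10 ** (db / 10)

print(round(decibans(2), 1))    # 3.0    -- evidence that doubles the odds
print(odds_factor(30))          # 1000.0 -- the 30-deciban acceptance threshold
print(round(math.log2(10), 2))  # 3.32   -- bits per ban (Shannon's units)
```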

Banburismus — Sequential Bayesian Updating

Each letter of the crib "WETTER" adds weight of evidence in decibans (Turing's unit). Addition — not multiplication. The 3-ban threshold (30 decibans) is the acceptance line.

[Interactive demo: step through the six letters of the crib WETTER one at a time. Six candidate rotor orders (II–I–III at KDW, I–II–III at AAA, I–III–II at MQV, III–I–II at ZAS, II–III–I at BPK, III–II–I at XRL) start at 0.0 db and accumulate decibans with every letter.]
30 decibans = 3 bans = 1000:1 odds — Turing's acceptance threshold
Uniform prior — no evidence yet. All candidates treated equally.

What Banburismus actually was

Banburismus was the specific procedure for determining which day-key was used for Naval Enigma by comparing pairs of messages sent on the same settings. Each shared letter added decibans to the score. When the tally crossed the threshold, the result was fed into the Bombe. I.J. Good described it in 1979 as "the first serious application of sequential Bayesian analysis to a real problem."
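The statistic behind the pair comparison can be sketched with the letter-frequency table from Part A of Section 5: two English plaintexts agree at a given position with probability Σpᵢ² ≈ 6.6% (the index of coincidence), versus 1/26 ≈ 3.8% for unrelated random text, so each observed coincidence is worth a couple of decibans. A simplified sketch — the real Banburismus statistics were more refined:

```python
import math

# English letter frequencies A..Z (the same table as bayes_caesar.py, Part A)
ENG_FREQ = [
    0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061,
    0.070, 0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019,
    0.001, 0.060, 0.063, 0.091, 0.028, 0.010, 0.024, 0.002,
    0.020, 0.001,
]

# P(two English texts show the same letter at one position) = sum of p_i^2
p_match_english = sum(p * p for p in ENG_FREQ)   # ~0.066
p_match_random = 1 / 26                          # ~0.038

lr = p_match_english / p_match_random            # likelihood ratio per match
print(f"{10 * math.log10(lr):.1f} decibans per coincidence")  # ~2.3
```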

Section 5

Python: Build It

From the maths to working code: three parts, progressively deeper

Part A: The Core Formula

Bayes' theorem in two lines of NumPy, applied letter-by-letter to crack the Caesar cipher from Section 3.

bayes_caesar.py Notebook 02
import numpy as np

# English letter frequencies (A=0 ... Z=25)
ENG_FREQ = np.array([
    0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061,
    0.070, 0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019,
    0.001, 0.060, 0.063, 0.091, 0.028, 0.010, 0.024, 0.002,
    0.020, 0.001,
])

def bayesian_update(prior: np.ndarray, likelihoods: np.ndarray) -> np.ndarray:
    """One step of Bayes: posterior = (likelihoods * prior) / sum."""
    unnormalised = likelihoods * prior
    return unnormalised / unnormalised.sum()

# -- Caesar cipher example --------------------------------------------------
# Intercepted: "KHOOR" (= "HELLO" encrypted with shift 3).
# Hypotheses: shift in {0, 1, ..., 25}. Prior: uniform.

prior = np.ones(26) / 26

for cipher_letter in "KHOOR":
    c = ord(cipher_letter) - ord("A")
    # P(cipher_letter | shift=k) = English freq of the decrypted letter
    likelihoods = ENG_FREQ[[(c - k) % 26 for k in range(26)]]
    prior = bayesian_update(prior, likelihoods)

best_shift = prior.argmax()
print(f"Most probable shift: {best_shift}")   # -> 3
print(f"Decrypts to: {''.join(chr((ord(c)-ord('A')-best_shift)%26+ord('A')) for c in 'KHOOR')}")  # -> HELLO

Part B: Banburismus in Code

Now in log-odds (decibans). Each letter adds to the score. This is exactly Turing's insight, translated into Python.

banburismus.py Notebook 02 → 03
import math

from bayes_caesar import ENG_FREQ   # letter-frequency table defined in Part A

def to_decibans(likelihood_ratio: float) -> float:
    """Turing's unit: 10 * log10(likelihood ratio)."""
    return 10 * math.log10(likelihood_ratio)

def banburismus(ciphertext: str) -> list[float]:
    """
    Sequential Bayesian updating in log-odds (decibans).
    Returns the deciban score for each of the 26 possible Caesar shifts.
    Addition, not multiplication -- that is Turing's key insight.
    """
    N = 26
    # Prior: uniform -> identical log-odds for every shift. Keep the whole
    # tally in decibans: 10 * log10(1/25), i.e. odds of 1 : 25 against.
    log_odds = [10 * math.log10(1 / (N - 1))] * N

    for cipher_letter in ciphertext.upper():
        c = ord(cipher_letter) - ord("A")
        for shift in range(N):
            plain = (c - shift) % 26
            p_given_H     = ENG_FREQ[plain]
            p_given_not_H = 1 / N
            # ADD decibans -- not multiply probabilities
            log_odds[shift] += to_decibans(p_given_H / p_given_not_H)

    return log_odds

scores = banburismus("KHOOR")
best   = max(range(26), key=lambda k: scores[k])
print(f"Highest score: shift {best} ({scores[best]:.1f} decibans)")  # -> shift 3

Part C: Full Enigma Simulation

Real Enigma machine, real rotor search, crib-dragging with constraint elimination. The Enigma machine below lets you generate ciphertext to feed into the decoder.

decoder_demo.py Notebook 03
from src.enigma.machine import EnigmaMachine, EnigmaConfig
from src.bayes.decoder  import BayesianDecoder

# The message -- encrypted with unknown settings.
# Operators opened every weather report with "WETTER": the perfect crib.
CIPHERTEXT = "ABCXYZPQRLMNOPQDEFGHIJKSTUVWRST"   # replace with real intercept

# Decoder: search rotors I-III, all 26^3 positions, fixed reflector UKW-B
decoder = BayesianDecoder(rotor_choices=["I", "II", "III"])

# Run: Enigma constraint eliminates >99% instantly, Bayesian scoring resolves the rest
results = decoder.decode(CIPHERTEXT, crib="WETTER", top_n=5, verbose=True)

# Best result
print(results[0].decrypted)          # -> most probable plaintext
print(results[0].config.rotors)      # -> recovered rotor order
print(results[0].window)             # -> recovered starting positions

The Machine: Live in Your Browser

Encrypt something, then paste the ciphertext into Notebook 03 and watch the Bayesian decoder recover the settings.

Enigma Machine — Interactive Demo

[Interactive machine: three rotors (left I, middle II, right III, each at position A), a QWERTY lamp board, a plaintext field ("Type or click keys…"), a ciphertext field ("Encrypted output…"), and a collapsible machine-settings panel.]

Click keys or type on your keyboard. Notice how the same letter never produces itself.

Notebook 01

The Enigma Machine

Build all components from scratch. Verify with real test cases.

Notebook 02

Bayes' Theorem

Derive, simulate, visualise. Caesar and Enigma worked examples.

Notebook 03

Cracking Enigma

Run the Bayesian decoder. Watch the posterior collapse.

# Get started

$ git clone https://github.com/vivekatsuperset/lutchet

$ uv sync && uv run jupyter lab

Section 6: Advanced

Going Deeper

For undergrads: numerical stability, information theory, and why this is everywhere in modern ML

Why Log-Odds? Numerical Stability

Enigma's prior probability for any single setting is about 10⁻²³. Multiplying 30 such numbers together produces values that underflow to zero in floating-point arithmetic. Working in log-odds, where we add instead of multiply, sidesteps this entirely. This is why modern machine learning frameworks compute log-probabilities by default.
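The underflow is easy to demonstrate: IEEE-754 doubles cannot represent positive values much below 10⁻³⁰⁸, so the product dies silently while the deciban tally is untroubled.

```python
import math

p = 1e-23            # prior probability of one Enigma setting
product = 1.0
for _ in range(30):
    product *= p     # heading for 10^-690, far below the smallest double

log_sum = sum(10 * math.log10(p) for _ in range(30))  # decibans add safely

print(product)   # 0.0 -- silent underflow
print(log_sum)   # ~ -6900 decibans, no precision problem
```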

Connection to Information Theory

Claude Shannon published his theory of information in 1948, eight years after Turing built the ban system. The two frameworks measure the same thing in different units:

Turing (1940) Shannon (1948)
1 ban = log₂(10) ≈ 3.32 bits of information
Decibans of evidence Bits of mutual information
Prior log-odds to posterior log-odds Entropy reduction H(X) to H(X|E)
30-deciban threshold ≈ 10 bits of evidence (3 bans × 3.32 bits)

The Modern Descendants

Naive Bayes Classifier

Bletchley: Score each Enigma setting by multiplying per-letter likelihoods

Today: Spam filters, text classification — same per-word likelihood multiplication

Logistic Regression

Bletchley: Turing's deciban tally: additive log-likelihood scores

Today: The logit function is log-odds. Learning is adding log-likelihoods.

Sequential Testing

Bletchley: Add evidence until 3-ban threshold — then accept

Today: A/B testing with early stopping. Wald's SPRT (1945) formalised the same accumulate-evidence-until-threshold idea.

Language Models

Bletchley: P(next letter | rotor setting and position)

Today: P(next token | all previous tokens) — Bayesian language modelling at scale

📓

Notebook 04: Advanced Topics

Log-odds in depth, sequential Bayesian updating, Shannon entropy, KL divergence, and implementing a Naive Bayes language classifier that descends directly from Turing's methods.

notebooks/04_advanced.ipynb

Thomas Bayes died in 1761. Alan Turing died in 1954. Neither lived to see their ideas recognised as two expressions of the same truth.

"Sometimes it is the people no one imagines anything of who do the things that no one can imagine."
- Alan Turing

The theorem on your syllabus, the unit in Turing's notebook, the weights in a modern neural network, they are the same idea, found independently, by people trying to reason carefully under uncertainty.