Soviet Rail, 1955

The Study on the Desk

Forty-four vertices, one hundred and five edges, and a classified stamp on the cover.

In late 1954, the United States Air Force asked the RAND Corporation to evaluate the capacity of the Soviet railway network to move supplies from the Soviet interior to Eastern Europe. The study was assigned to Theodore Harris, a mathematician at RAND, and Frank Ross, a retired Air Force general. They finished their report in October 1955 under a title that gave nothing away: Fundamentals of a Method for Evaluating Rail Net Capacities.

Inside was a graph. Forty-four vertices, each a major rail junction in the Soviet Union and its satellites. One hundred and five edges, each a section of track. Every edge had a number attached to it, the number of freight trains it could move per day, computed from track count, signalling capacity, grade, and the locomotives available to the district.

Harris and Ross asked two questions. First: what is the maximum tonnage per day that can move from the Soviet interior to the Eastern European border? Second, and this was the operational question: which small set of rail segments, if destroyed, would reduce that capacity the most?

The map is now public.

The Harris and Ross report was declassified in 1999. The map of the Soviet rail network they used, and the labelled "bottleneck" the study identified, are in the public domain. If you want to see the original, search for Harris Ross 1955 Soviet rail; the RAND archive hosts a PDF. The bottleneck they found ran through a corridor near the present-day Ukraine-Poland border.

The Question, Precisely

Directed edges, capacities, a source, a sink, and a conservation law at every other node.

A flow network is a directed graph in which every edge $u \to v$ carries a non-negative number $c (u, v)$ called its capacity. Two distinguished nodes are the source $s$ and the sink $t$ . A flow is an assignment of a number $f (u, v) \geq 0$ to every edge such that no edge exceeds its capacity and, at every node other than $s$ and $t$ , what flows in equals what flows out.

The maximum flow problem asks for the flow that pushes as many units as possible from $s$ to $t$ . An s-t cut is a partition of the nodes into two sets $S$ and $T$ with $s \in S$ and $t \in T$ . The capacity of a cut is the sum of the capacities of edges going from $S$ to $T$ . The minimum cut is the cut of smallest capacity.

Max flow is the question Harris and Ross asked first. Min cut is the question they cared about second, and more. The two questions turn out to have the same answer.

Find the Bottleneck

Ten cities, a westward flow, and a single question: which rails matter most?

Below is a stylised version of the Harris-Ross network. Moscow is the source, a notional Western Frontier is the sink, and supply flows westward through ten cities. Every edge shows flow / capacity. Gold edges carry some of the maximum flow; red edges are the min cut. Click an edge label to raise its capacity by five; shift-click to lower it by five. Watch the red edges move as you work.

Soviet Rail: Max Flow and Min Cut

Gold edges carry flow. Red edges are on the min cut. Click an edge label to add 5 units of capacity; shift-click to remove 5.

Maximum flow

Moscow to Frontier, summed across all paths.

Min cut (3 edges)

Warsaw → Prague25

Berlin → Frontier40

Budapest → Vienna20

Sum85

Two things worth trying. Raise the capacity of a red edge; the red set almost always shifts somewhere else, because cutting a bottleneck only helps until a new one becomes binding. Set every edge out of Moscow to zero in turn; the max flow collapses to zero. The min cut is never larger than the source's total outgoing capacity, because that is itself a cut.

The Duality: Max Flow Equals Min Cut

One of the cleanest theorems in combinatorial optimisation.

First, the easy direction. Any flow from $s$ to $t$ has to cross every s-t cut, so its value is at most the capacity of any cut:

∣ f ∣ \leq cap (S, T)

Read that inequality aloud. On the left, $∣ f ∣$ is the total value of whatever flow you happened to construct: how many units per day you successfully pushed from source to sink. On the right, $cap (S, T)$ is the total capacity of every pipe that crosses some wall you drew splitting the source side from the sink side. The claim is the obvious one: every single unit of flow you sent across the network had to, at some point, cross that wall, because the source is on one side and the sink is on the other. So your flow cannot possibly exceed the total carrying capacity of the pipes that cross it.

Different walls give different bounds. Draw a loose wall, one that crosses a wide section of the network where pipes are generous, and you get a loose, unhelpful upper bound on the flow. Draw a tight wall, one that slices through a narrow bottleneck, and you get a tight bound. Every wall you can draw is an upper bound on every flow you can push. Every flow you push is, by symmetry, a lower bound on the capacity of every possible wall. The two sides pin each other.

The miracle, sitting in the amber box below, is that the upper bounds and lower bounds actually meet. The tightest wall you can draw has the same capacity as the fattest flow you can push. Not close. Exactly the same.

Max-flow / min-cut theorem (Ford & Fulkerson, 1956)

In any flow network, the value of the maximum flow from $s$ to $t$ equals the capacity of the minimum $s$ - $t$ cut.

Why this matters: a question about shipping (how much can flow?) is answered by a question about sabotage (what is the cheapest way to break the network?). For Harris and Ross this was more than a curiosity. Their operational question was the dual of the shipping question they had been asked. The same computation answered both.

The proof is constructive. If you have a maximum flow, the algorithm below gives you a witness min cut for free: the set $S$ of nodes still reachable from $s$ in the residual graph, and the edges crossing from $S$ to its complement. That is exactly how the red edges in the demo are computed.

The Algorithm: Augmenting Paths

Push flow along a path. Repeat until there is no path left to push along.

The key construct is the residual graph. Start with a copy of the network. Whenever you push $δ$ units along an edge $u \to v$ , decrease its residual capacity by $δ$ and increase the residual capacity of the reverse edge $v \to u$ by the same amount. The reverse edge is what lets the algorithm "take back" a bad decision: a later augmenting path can route flow back through $v \to u$ , effectively cancelling part of an earlier choice.

The reverse-edge trick is the subtle part, so here is the intuition. Suppose at step three you pushed ten units through Kiev because it looked locally like a good idea. At step seven you realise that those ten units would have been better spent going through Warsaw instead. A naive algorithm would be stuck: it has already committed. The residual graph's reverse edges say: fine, push ten units backwards out of Kiev on this new path, which cancels the earlier commitment, and now you are free to route those same ten units the better way. You never actually run the rails backwards; the reverse edge is bookkeeping that lets the algorithm undo its own mistakes without restarting.

The Ford-Fulkerson method is then: find any path from $s$ to $t$ in the residual graph, push as much flow as the bottleneck edge allows, repeat. When no augmenting path exists, you are done.

Why BFS, not DFS

Ford and Fulkerson did not specify how to find the augmenting path. A depth-first search can, on networks with irrational capacities, fail to terminate; a famous three-edge example by Zwick converges to a flow value that is only part of the true maximum. The fix, from Jack Edmonds and Richard Karp in 1972, is to choose the shortest augmenting path each time, which a BFS gives you for free. The resulting Edmonds-Karp algorithm terminates in at most $O (V E^{2})$ iterations, independent of the capacity values.

Step through it on the Soviet-rail graph. Each blue path is the current BFS augmenting path. The label on each edge shows the cumulative flow. The number on the right is the total value of the flow so far. When the frontier stops expanding to the sink, the algorithm halts, and that number is the maximum.

Augmenting Paths, Step by Step

Blue highlights the current augmenting path. Gold edges already carry flow. Each edge shows flow / capacity.

Step 0 of 5

Initial state. No flow yet. BFS will look for the shortest augmenting path from Moscow to Frontier.

Total flow

Cumulative after this augmentation.

Notice that the paths BFS finds are the ones with the fewest edges, not the ones with the most capacity. That is fine: every augmentation increases the flow by a positive integer (given integer capacities), and no augmentation ever decreases it, so the algorithm converges in a bounded number of steps.

Python: Build It

A flow network in fifteen lines. Edmonds-Karp in thirty.

Part A: A flow network

A nested dictionary _cap[u][v] stores the capacity of each directed edge. Negative capacities and self-loops are rejected at insert time.

flow.py Notebook 08

from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class FlowNetwork:
    """Directed graph with non-negative edge capacities."""
    _cap: dict[str, dict[str, float]] = field(
        default_factory=lambda: defaultdict(dict)
    )

    def add_edge(self, u: str, v: str, capacity: float) -> None:
        if capacity < 0:
            raise ValueError(f"Negative capacity {capacity} not allowed.")
        if u == v:
            raise ValueError(f"Self-loop on {u} is not a flow edge.")
        self._cap[u][v] = self._cap[u].get(v, 0.0) + capacity

    def capacity(self, u: str, v: str) -> float:
        return self._cap.get(u, {}).get(v, 0.0)

    @property
    def nodes(self) -> list[str]:
        return list(self._cap.keys())

Part B: Edmonds-Karp

Build a residual graph, BFS for an augmenting path, push the bottleneck along it, repeat. The reverse-edge update is the one line that makes the whole thing correct.

flow.py Notebook 08

from collections import deque

def max_flow(net: FlowNetwork, source: str, sink: str):
    """Edmonds-Karp: Ford-Fulkerson with a BFS augmenting path."""
    residual = {u: {} for u in net.nodes}
    for u in net.nodes:
        for v, c in net._cap[u].items():
            residual[u][v] = residual[u].get(v, 0.0) + c
            residual[v].setdefault(u, 0.0)  # reverse edge, cap 0

    def bfs():
        prev = {source: None}
        q = deque([source])
        while q:
            u = q.popleft()
            if u == sink:
                break
            for v, cap in residual[u].items():
                if cap > 0 and v not in prev:
                    prev[v] = u
                    q.append(v)
        if sink not in prev:
            return None
        path, node = [], sink
        while node is not None:
            path.append(node)
            node = prev[node]
        return list(reversed(path))

    value = 0.0
    while (path := bfs()) is not None:
        bottleneck = min(residual[path[i]][path[i + 1]]
                         for i in range(len(path) - 1))
        for i in range(len(path) - 1):
            u, v = path[i], path[i + 1]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        value += bottleneck
    return value

📓

Notebook 08: Max flow and min cut

Build a toy flow network. Compute its max flow and min cut. Verify they match. Then build the Soviet-rail graph and see which edges the algorithm picks out as bottlenecks.

notebooks/08_graph_theory_flow.ipynb →

Why This Was Classified

The theorem was open from the start. The answer it computed was not.

Ford and Fulkerson's 1956 paper is a public document, published in the Canadian Journal of Mathematics. Anyone with a library card could read the theorem and the algorithm. The mathematics was never secret.

What was secret was the input. The Harris-Ross report carried a SECRET classification because it named specific Soviet rail junctions, attached specific capacities to them, and then pointed at which ones mattered. The report's central finding, the bottleneck in the Soviet supply chain, was the same object that Ford-Fulkerson would call a min cut. But knowing the cut existed in the abstract was harmless. Knowing where the cut was on a real map was a targeting recommendation.

The report was declassified in 1999, forty-four years after it was written, long after its operational relevance had expired. What remains is a beautiful historical document: a real-world problem, stated in the language of graphs, solved with an algorithm that was being invented in the same building at the same time.

What Flow Became

The theorem that answers shipping questions ended up everywhere shipping is not.

Max-flow / min-cut did become the workhorse of transportation modelling. But the more striking afterlife was as a primitive in problems that have nothing obviously to do with flow. Image segmentation, where pixels are nodes and a min cut separates foreground from background. Bipartite matching, where max flow counts the largest set of compatible pairs. Network reliability, where a min cut is the smallest set of edges whose failure disconnects a service from its users. The theorem keeps showing up because it is the combinatorial form of linear-programming duality, and duality is everywhere.

Flow is deterministic: you allocate exact capacities and compute exact values. The next story asks what happens when we let go of that determinism, and think of the graph as a process that moves something around probabilistically. The shift from flow to random walk is the shift from a single planned route to an expected distribution over routes, and the mathematics that falls out of it is the reason a word-probability model, trained on enough text, can be coaxed into naming citations.

Random walks

PageRank built Google this way. Modern LLMs retrieve knowledge this way.

Back to

Flows & cuts

Image segmentation, bipartite matching, network reliability, and why LP duality keeps showing up.

All topics

Four algorithms

One insight from 1736 that made all of them thinkable.