STANFORD · 1998 · AND ALSO TODAY
The web in 1996 was a hundred and fifty million pages with no library catalogue, no curator, and no one in charge. How do you rank something nobody is ranking for you?
The answer came from imagining a bored surfer clicking random links forever, and it is the same answer that keeps modern language models from making things up.
Part of Graph Theory: four standalone stories tracing the mathematics of connection.
Rank a set nobody is ranking. No editor, no librarian, no ground truth.
A library catalogue is ranked by a librarian. A medical journal is ranked by an editorial board. A restaurant guide is ranked by someone paid to eat. Every ranked list you have ever used ultimately rests on a human who decided. The web, by 1996, already did not. It had a hundred and fifty million documents, a doubling time of under a year, and no authority anyone recognised.
Early web search treated this as a text problem: score each page by keyword match and call it a day. The result was a kind of bleak democracy. Every sufficiently keyword-stuffed page was equally relevant. Every scam and link farm ranked as high as every trustworthy source. Users learned to scroll past half a screen of garbage to find anything useful.
Two Stanford graduate students noticed something else. The web was not only a collection of pages. It was also a collection of links between pages. A link is an endorsement, or a citation, or at least an acknowledgement. If you could rank pages by how they are linked to, rather than by what words they contain, you might get a ranking that text alone could not give.
Their first formulation sounds circular: a page is important if important pages link to it. That is either a perfect definition or nonsense, depending on whether it has a solution.
Imagine a bored surfer, infinite time, and no sense of purpose.
A bored surfer lands on a random page, picks one of its outgoing links uniformly at random, follows it, and repeats forever. Every so often she gets bored and teleports to a different random page, just to escape whatever corner of the graph she has wandered into. The probability of following a link is called the damping factor d. The probability of teleporting instead is 1 − d.
The long-run fraction of time she spends at page i is a number between 0 and 1. Call it PR(i). Then her behaviour satisfies the equation

PR(i) = d · Σ_{j→i} PR(j) / L(j) + (1 − d) / N

where the sum runs over every page j that links to i, L(j) is the number of outgoing links on page j, and N is the total number of pages.
Read that equation aloud, one piece at a time. The thing on the left, PR(i), is the importance of page i: the fraction of time our bored surfer spends standing there in the long run. The right-hand side says that importance comes from exactly two places: links, plus teleport.
Start with the sum, which is the "links" part. For every page j that points at i, page j hands over a share of its own importance. How big a share? Its total importance PR(j) divided by the number of outgoing links it has, because j splits its attention equally among every page it points to. A page that is linked to by one very important page inherits more rank than a page linked to by ten mediocre ones. Importance flows along arrows, diluted by the out-degree of the sender. Then the whole sum is multiplied by the damping factor d, because our surfer only follows a link a fraction d of the time.
The second term, (1 − d)/N, is the "teleport" part. The other 1 − d of the time, our surfer gives up on links entirely and jumps to a completely random page. That teleport probability is spread evenly over all N pages in the graph, so every page, no matter how obscure, receives exactly (1 − d)/N worth of mass from this route. Think of it as a democratic floor. No page ever has zero rank, because the teleport always sprinkles a little probability on everyone.
Put the two parts together and PageRank is, in one sentence: the recirculating echo of everyone who points at you, plus a small uniform baseline everyone gets for free. The walker is likely to be where important pages send her; important pages are those the walker tends to be at. The equation is the condition that says this self-referential definition has settled into a stable state.
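You can check that the equation really does describe the bored surfer by simulating her directly and comparing her visit counts against the fixed point. A minimal sketch, assuming an invented three-page graph (the pages and numbers are illustrative, not from the original paper):

```python
import random

# A tiny invented web: A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
nodes = list(links)
n = len(nodes)
d = 0.85  # damping factor: probability of following a link

# Fixed point of PR(i) = d * sum(PR(j)/L(j)) + (1 - d)/N, by repeated substitution.
ranks = {v: 1 / n for v in nodes}
for _ in range(100):
    new = {v: (1 - d) / n for v in nodes}
    for u, outs in links.items():
        for v in outs:
            new[v] += d * ranks[u] / len(outs)
    ranks = new

# The bored surfer herself: follow a random link with probability d, else teleport.
random.seed(0)
visits = {v: 0 for v in nodes}
page, steps = "A", 200_000
for _ in range(steps):
    visits[page] += 1
    page = random.choice(links[page]) if random.random() < d else random.choice(nodes)

empirical = {v: visits[v] / steps for v in nodes}
# The long-run visit fractions land within a percent or two of the fixed point.
```

The simulation and the equation agree because they are two descriptions of the same stationary distribution.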
The circular definition now has a cure. The equation above is a fixed-point equation. Start with any guess for the ranks, compute the right-hand side, use that as your new guess, repeat. For any damping factor strictly less than 1, this iteration converges to a unique solution, regardless of starting point. The Markov-chain result that makes this work is more than a century old; Brin and Page's contribution was realising that the web was an instance worth solving.
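The claim that the starting point does not matter is easy to check numerically. A sketch on an invented three-page graph, iterating once from the uniform vector and once from a deliberately lopsided one:

```python
def iterate(links: dict[str, list[str]], start: dict[str, float],
            d: float = 0.85, iters: int = 200) -> dict[str, float]:
    """Repeatedly apply the right-hand side of the PageRank equation."""
    nodes = list(links)
    ranks = dict(start)
    for _ in range(iters):
        new = {v: (1 - d) / len(nodes) for v in nodes}
        for u, outs in links.items():
            for v in outs:
                new[v] += d * ranks[u] / len(outs)
        ranks = new
    return ranks

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
from_uniform = iterate(links, {v: 1 / 3 for v in links})
from_lopsided = iterate(links, {"A": 1.0, "B": 0.0, "C": 0.0})
# Both starting guesses settle on the same fixed point.
```

The update is a contraction: every iteration shrinks the distance between any two guesses by a factor of d, which is why convergence is both guaranteed and fast.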
Why the teleport is essential
Without teleport, a walker can get stuck in a rank sink: a set of pages that link only to each other. The first time she wanders in, she never leaves, and all the probability mass collects there. Damping leaks a little mass back out to the rest of the graph on every step, which is enough to keep the fixed point unique and well-behaved. Brin and Page used d = 0.85, a value chosen empirically for fast convergence on the 1998 web.
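The rank-sink effect is easy to reproduce. In the invented four-page graph below, C and D link only to each other; with the teleport nearly switched off, almost all the mass pools in that pair, while at d = 0.85 every page keeps at least its (1 − d)/N floor. The graph and the helper are an illustrative sketch, not the article's implementation:

```python
def toy_pagerank(links: dict[str, list[str]], d: float,
                 iters: int = 2000) -> dict[str, float]:
    """Plain power iteration; assumes every page has at least one out-link."""
    nodes = list(links)
    n = len(nodes)
    ranks = {v: 1 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}
        for u, outs in links.items():
            for v in outs:
                new[v] += d * ranks[u] / len(outs)
        ranks = new
    return ranks

# C and D form a rank sink: once the walker arrives, only teleport gets her out.
links = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["C"]}

healthy = toy_pagerank(links, d=0.85)
starved = toy_pagerank(links, d=0.99)  # teleport nearly off
# At d = 0.85 the sink holds most of the mass but not all of it;
# at d = 0.99 it holds almost everything, and A survives on teleport alone.
```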
Eight pages, fourteen links, and a power iteration that settles in under a second.
Below is a small directed graph, loosely modelled on the eleven-node example in Brin and Page's paper. Each node is a stylised page; each arrow is a link. Step through the power iteration. Watch the ranks start uniform, then concentrate on the well-linked nodes as mass flows along the arrows. Drag the damping slider and watch the rankings rearrange: lower damping flattens the ranks (teleport dominates), higher damping sharpens them (link structure dominates).
Node size and saturation track the current rank, and the layout updates each iteration as mass flows along the arrows.
On the real web graph of 1998, this same iteration ran on a workstation Brin and Page built out of spare parts, with hard drives stacked in a Lego frame. Convergence on a graph of a hundred million pages took roughly fifty iterations. The whole thing was embarrassingly cacheable; one PageRank vector, one index, one ranking per query.
The name was a pun. PageRank was both a rank for pages, and a rank due to Larry Page. The company formed to run it was incorporated in 1998 and, at the time this lesson is being written, is worth around two trillion dollars.
A directed graph in twenty lines. PageRank in twenty more.
Plain adjacency list. No weights, no capacities, one edge per ordered pair. Self-loops are rejected because they would require special-casing the transition matrix for no conceptual gain.
from dataclasses import dataclass, field


@dataclass
class DirectedGraph:
    """Directed graph with at most one edge in each direction per pair."""

    _adj: dict[str, list[str]] = field(default_factory=dict)

    def add_edge(self, u: str, v: str) -> None:
        if u == v:
            raise ValueError(f"Self-loop on {u} is not allowed.")
        self._adj.setdefault(u, [])
        self._adj.setdefault(v, [])
        if v not in self._adj[u]:
            self._adj[u].append(v)

    def out_neighbours(self, node: str) -> list[str]:
        return list(self._adj.get(node, []))

    def in_neighbours(self, node: str) -> list[str]:
        return [u for u, vs in self._adj.items() if node in vs]

    @property
    def nodes(self) -> list[str]:
        return list(self._adj.keys())
Power iteration against the equation above. The dangling_mass term handles nodes with no out-edges by teleporting their mass; this is the standard fix in the literature and matches what the transition matrix would prescribe if you wrote it out.
def pagerank(
    g: DirectedGraph,
    damping: float = 0.85,
    tol: float = 1e-8,
    max_iter: int = 200,
) -> dict[str, float]:
    """Power iteration with uniform teleport. Returns ranks summing to 1."""
    nodes = g.nodes
    n = len(nodes)
    ranks = {v: 1 / n for v in nodes}
    in_ns = {v: g.in_neighbours(v) for v in nodes}
    out_d = {v: len(g.out_neighbours(v)) for v in nodes}
    dangling = [v for v in nodes if out_d[v] == 0]
    for _ in range(max_iter):
        dangling_mass = sum(ranks[u] for u in dangling)
        new = {}
        for v in nodes:
            incoming = sum(ranks[u] / out_d[u] for u in in_ns[v])
            new[v] = (
                damping * (incoming + dangling_mass / n)
                + (1 - damping) / n
            )
        if sum(abs(new[v] - ranks[v]) for v in nodes) < tol:
            return new
        ranks = new
    raise RuntimeError("PageRank did not converge.")

Notebook 09: Random walks and PageRank
Build the web graph. Run PageRank, inspect rankings, watch power iteration converge. Sweep the damping factor. Then build a small knowledge graph and watch personalised PageRank surface the nearest related concepts.
notebooks/09_graph_theory_random_walks.ipynb

The same algorithm, a different teleport vector, and a very different problem.
A large language model, asked a factual question, has two ways to respond. It can continue the pattern of text it has seen, hoping the continuation happens to be correct. Or it can retrieve relevant information from an external source and ground its answer in that. The first approach produces confident confabulation. The second, done well, produces answers with citations.
The retrieval step is a ranking problem. Given a query, or a seed concept, find the most relevant nodes in a knowledge graph where concepts are connected by edges like is-a, causes, cites, or authored-by. Keyword matching is not enough: two concepts such as "entropy" and "information" can be tightly connected without ever sharing a sentence. What you want is something that notices the shape of the neighbourhood around the seed.
The workhorse is personalised PageRank. Same power iteration, same damping, but instead of teleporting uniformly to any node the walker teleports back to the seed. The long-run distribution is then concentrated in whatever neighbourhood is reachable from the seed by short, dense sequences of edges. It is PageRank with a point of view.
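The change from ordinary PageRank is a one-line swap of the teleport target. A minimal sketch on an invented five-concept graph (the node names and edges are made up for illustration, and every node here has out-links, so no dangling-node handling is needed):

```python
def personalised_pagerank(links: dict[str, list[str]], seed: str,
                          d: float = 0.85, iters: int = 200) -> dict[str, float]:
    """Power iteration where every teleport returns to the seed node."""
    nodes = list(links)
    ranks = {v: float(v == seed) for v in nodes}
    for _ in range(iters):
        # Teleport mass all goes to the seed instead of being spread uniformly.
        new = {v: (1 - d) * (v == seed) for v in nodes}
        for u, outs in links.items():
            for v in outs:
                new[v] += d * ranks[u] / len(outs)
        ranks = new
    return ranks

# A toy knowledge graph: two small clusters joined through "shannon".
links = {
    "shannon":     ["entropy", "turing"],
    "entropy":     ["information"],
    "information": ["entropy"],
    "turing":      ["computation"],
    "computation": ["turing"],
}

from_shannon = personalised_pagerank(links, seed="shannon")
from_turing = personalised_pagerank(links, seed="turing")
# Seeding at "turing" leaves "shannon" with zero rank: nothing links back
# to it, and the teleport never lands there.
```

Moving the seed reshapes the whole ranking, which is exactly the "point of view" the text describes.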
Below is a small knowledge graph drawn from the characters and concepts you have met elsewhere on this site. Click any node to set it as the seed; the graph lights up the concepts most strongly related to it, weighted by the personalised walk. Try Shannon, then Euler, then Turing, and watch the graph interpret your query by reshaping itself around the seed.
Blue is the seed. Gold intensity shows how strongly each node is surfaced by a damped random walk from the seed.
The "GraphRAG" family of retrieval systems, and the knowledge-graph-augmented prompting used in production LLM pipelines, are built out of exactly this primitive. The thing that has changed between 1998 and now is not the mathematics. It is what counts as a node.
Pages, concepts, citations, entities. Nothing about PageRank cares which one.
The web is a directed graph with implicit endorsement semantics: a link is a soft vote of relevance. A knowledge graph is a directed graph with implicit endorsement semantics: an edge is a soft claim of relatedness. From the perspective of a random walker there is no difference. She has no idea whether she is on a 1998 web page about dog grooming or on a 2026 triple asserting that Shannon wrote a paper about entropy. She picks an outgoing edge uniformly, occasionally teleports, and lets the long-run distribution do the ranking for her.
This is the signature move of graph theory. The abstraction does not care what the nodes represent. Euler did not need to know what a bridge was, only that it connected two places. Dijkstra did not need to know what a kilometre felt like, only that it was a non-negative number. Ford and Fulkerson did not need to know what a freight car carried, only that its capacity was finite. Brin and Page did not need to know what a page said, only that it linked. Four algorithms, three centuries, one mathematics.
A parlour puzzle, a café, a classified study, a pair of graduate students.
Königsberg, 1736. Euler throws away the geography of a city and keeps only the way its land masses connect, and accidentally invents a new branch of mathematics.
Amsterdam, 1956. Dijkstra invents shortest paths in a café because he has no paper, and twelve years later every router on the ARPANET is running it.
Santa Monica, 1955 and 1956. One floor apart at RAND, Harris and Ross ask which rails to bomb while Ford and Fulkerson prove that the answer is the same as asking how much can flow.
Stanford, 1998 and beyond. Brin and Page let a bored surfer click random links, and the same construction built Google and now grounds language models in structured knowledge.
Different centuries, different problems, one mathematics of connection.
One conference paper, one book, one patent, and two modern papers that show where this is going.
Sergey Brin and Lawrence Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine (WWW7, 1998)
The paper that launched Google. PageRank is Section 2.1. Also worth reading for the matter-of-fact description of their homemade hardware and their frank estimate that ad-funded search engines would have deep conflicts of interest, an observation that has aged interestingly.
Lawrence Page, US Patent 6,285,999: Method for Node Ranking in a Linked Database (filed 1998, granted 2001)
The patent Stanford held for a decade. More precise than the WWW7 paper on the teleport vector, and the only place to find the algorithm described in legal prose. A historical curiosity and an unusually clean piece of technical writing.
Amy Langville and Carl Meyer, Google's PageRank and Beyond (Princeton, 2006)
The textbook on PageRank. Covers the Markov-chain background, convergence proofs, accelerated computation, and the spectrum of the web matrix. The right book if you want to understand why the algorithm works rather than just run it.
Jon Kleinberg, Authoritative Sources in a Hyperlinked Environment (SODA, 1998)
The HITS algorithm, published the same year as PageRank and solving a closely related problem with a different decomposition into hubs and authorities. Reading the two together clarifies what is essential to link-based ranking and what is a specific design choice.
Darren Edge et al., From Local to Global: A Graph RAG Approach to Query-Focused Summarization (Microsoft Research, 2024)
One of the first well-engineered descriptions of using graph-structured retrieval to ground language model output. Personalised-PageRank-style traversal is one of several primitives in the pipeline. A useful snapshot of where this story continues at the time of writing.
Rada Mihalcea and Paul Tarau, TextRank: Bringing Order into Texts (EMNLP, 2004)
An earlier example of the same move: apply PageRank to a graph whose nodes are not web pages. Here the nodes are sentences, and the algorithm extracts summaries. A nice reminder that the knowledge-graph use case is not new, just larger.
The web was a graph, and PageRank read it. A knowledge graph is a graph, and personalised PageRank reads it. Whatever comes next will also be a graph, because nearly everything is.
Graph theory began as a question about seven bridges in a city that no longer exists. It is now, in the most literal sense, the invisible mathematics of the age.