STANFORD · 1998 · AND ALSO TODAY
The web in 1996 was a hundred and fifty million pages with no library catalogue, no curator, and no one in charge. How do you rank something nobody is ranking for you?
The answer came from imagining a bored surfer clicking random links forever, and it is the same answer that keeps modern language models from making things up.
Part of Graph Theory: four standalone stories tracing the mathematics of connection.
Rank a set nobody is ranking. No editor, no librarian, no ground truth.
A library catalogue is ranked by a librarian. A medical journal is ranked by an editorial board. A restaurant guide is ranked by someone paid to eat. Every ranked list you have ever used ultimately rests on a human who decided. The web, by 1996, already did not. It had a hundred and fifty million documents, a doubling time of under a year, and no authority anyone recognised.
Early web search treated this as a text problem: score each page by keyword match and call it a day. The result was a kind of bleak democracy. Every sufficiently keyword-stuffed page was equally relevant. Every scam and link farm ranked as high as every trustworthy source. Users learned to scroll past half a screen of garbage to find anything useful.
Two Stanford graduate students noticed something else. The web was not only a collection of pages. It was also a collection of links between pages. A link is an endorsement, or a citation, or at least an acknowledgement. If you could rank pages by how they are linked to, rather than by what words they contain, you might get a ranking that text alone could not give.
Their first formulation sounds circular: a page is important if important pages link to it. That is either a perfect definition or nonsense, depending on whether it has a solution.
Imagine a bored surfer, infinite time, and no sense of purpose.
A bored surfer lands on a random page, picks one of its outgoing links uniformly at random, follows it, and repeats forever. Every so often she gets bored and teleports to a different random page, just to escape whatever corner of the graph she has wandered into. The probability of following a link is called the damping factor d. The probability of teleporting instead is 1 − d.
The long-run fraction of time she spends at page i is a number between 0 and 1. Call it PR(i). Then her behaviour satisfies the equation

PR(i) = d · Σ_{j→i} PR(j) / L(j) + (1 − d) / N

where the sum runs over every page j that links to i, L(j) is the number of outgoing links on page j, and N is the total number of pages.
Read that equation aloud, one piece at a time. The thing on the left, PR(i), is the importance of page i: the fraction of time our bored surfer spends standing there in the long run. The right-hand side says that importance comes from exactly two places: links, plus teleport.
Start with the sum, which is the "links" part. For every page j that points at i, page j hands over a share of its own importance. How big a share? Its total importance PR(j) divided by the number of outgoing links it has, because j splits its attention equally among every page it points to. A page that is linked to by one very important page inherits more rank than a page linked to by ten mediocre ones. Importance flows along arrows, diluted by the out-degree of the sender. Then the whole sum is multiplied by the damping factor d, because our surfer only follows a link a fraction d of the time.
The second term, (1 − d)/N, is the "teleport" part. The other 1 − d of the time, our surfer gives up on links entirely and jumps to a completely random page. That teleport probability is spread evenly over all N pages in the graph, so every page, no matter how obscure, receives exactly (1 − d)/N worth of mass from this route. Think of it as a democratic floor. No page ever has zero rank, because the teleport always sprinkles a little probability on everyone.
Put the two parts together and PageRank is, in one sentence: the recirculating echo of everyone who points at you, plus a small uniform baseline everyone gets for free. The walker is likely to be where important pages send her; important pages are those the walker tends to be at. The equation is the condition that says this self-referential definition has settled into a stable state.
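You can check that the equation really does describe the bored surfer by simulating her directly and comparing her visit counts against the fixed point. A minimal sketch, assuming an invented three-page graph (the pages and numbers are illustrative, not from the original paper):

```python
import random

# A tiny invented web: A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
nodes = list(links)
n = len(nodes)
d = 0.85  # damping factor: probability of following a link

# Fixed point of PR(i) = d * sum(PR(j)/L(j)) + (1 - d)/N, by repeated substitution.
ranks = {v: 1 / n for v in nodes}
for _ in range(100):
    new = {v: (1 - d) / n for v in nodes}
    for u, outs in links.items():
        for v in outs:
            new[v] += d * ranks[u] / len(outs)
    ranks = new

# The bored surfer herself: follow a random link with probability d, else teleport.
random.seed(0)
visits = {v: 0 for v in nodes}
page, steps = "A", 200_000
for _ in range(steps):
    visits[page] += 1
    page = random.choice(links[page]) if random.random() < d else random.choice(nodes)

empirical = {v: visits[v] / steps for v in nodes}
# The long-run visit fractions land within a percent or two of the fixed point.
```

The simulation and the equation agree because they are two descriptions of the same stationary distribution.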
The circular definition now has a cure. The equation above is a fixed-point equation. Start with any guess for the ranks, compute the right-hand side, use that as your new guess, repeat. For any damping factor strictly less than 1, this iteration converges to a unique solution, regardless of starting point. The Markov-chain result that makes this work is more than a century old; Brin and Page's contribution was realising that the web was an instance worth solving.
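The claim that the starting point does not matter is easy to check numerically. A sketch on an invented three-page graph, iterating once from the uniform vector and once from a deliberately lopsided one:

```python
def iterate(links: dict[str, list[str]], start: dict[str, float],
            d: float = 0.85, iters: int = 200) -> dict[str, float]:
    """Repeatedly apply the right-hand side of the PageRank equation."""
    nodes = list(links)
    ranks = dict(start)
    for _ in range(iters):
        new = {v: (1 - d) / len(nodes) for v in nodes}
        for u, outs in links.items():
            for v in outs:
                new[v] += d * ranks[u] / len(outs)
        ranks = new
    return ranks

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
from_uniform = iterate(links, {v: 1 / 3 for v in links})
from_lopsided = iterate(links, {"A": 1.0, "B": 0.0, "C": 0.0})
# Both starting guesses settle on the same fixed point.
```

The update is a contraction: every iteration shrinks the distance between any two guesses by a factor of d, which is why convergence is both guaranteed and fast.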
Why the teleport is essential
Without teleport, a walker can get stuck in a rank sink: a set of pages that link only to each other. The first time she wanders in, she never leaves, and all the probability mass collects there. Damping leaks a little mass back out to the rest of the graph on every step, which is enough to keep the fixed point unique and well-behaved. Brin and Page used d = 0.85, a value chosen empirically for fast convergence on the 1998 web.
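The rank-sink effect is easy to reproduce. In the invented four-page graph below, C and D link only to each other; with the teleport nearly switched off, almost all the mass pools in that pair, while at d = 0.85 every page keeps at least its (1 − d)/N floor. The graph and the helper are an illustrative sketch, not the article's implementation:

```python
def toy_pagerank(links: dict[str, list[str]], d: float,
                 iters: int = 2000) -> dict[str, float]:
    """Plain power iteration; assumes every page has at least one out-link."""
    nodes = list(links)
    n = len(nodes)
    ranks = {v: 1 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}
        for u, outs in links.items():
            for v in outs:
                new[v] += d * ranks[u] / len(outs)
        ranks = new
    return ranks

# C and D form a rank sink: once the walker arrives, only teleport gets her out.
links = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["C"]}

healthy = toy_pagerank(links, d=0.85)
starved = toy_pagerank(links, d=0.99)  # teleport nearly off
# At d = 0.85 the sink holds most of the mass but not all of it;
# at d = 0.99 it holds almost everything, and A survives on teleport alone.
```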
Eight pages, fourteen links, and a power iteration that settles in under a second.
Below is a small directed graph, loosely modelled on the eleven-node example in Brin and Page's paper. Each node is a stylised page; each arrow is a link. Step through the power iteration. Watch the ranks start uniform, then concentrate on the well-linked nodes as mass flows along the arrows. Drag the damping slider and watch the rankings rearrange: lower damping flattens the ranks (teleport dominates), higher damping sharpens them (link structure dominates).
Node size and saturation track the current rank, and the layout updates each iteration as mass flows along the arrows.
On the real web graph of 1998, this same iteration ran on a workstation Brin and Page built out of spare parts, with hard drives stacked in a Lego frame. Convergence on a graph of a hundred million pages took roughly fifty iterations. The whole thing was embarrassingly cacheable; one PageRank vector, one index, one ranking per query.
The name was a pun. PageRank was both a rank for pages, and a rank due to Larry Page. The company formed to run it was incorporated in 1998 and, at the time this lesson is being written, is worth around two trillion dollars.
A directed graph in twenty lines. PageRank in twenty more.
Plain adjacency list. No weights, no capacities, one edge per ordered pair. Self-loops are rejected because they would require special-casing the transition matrix for no conceptual gain.
from dataclasses import dataclass, field


@dataclass
class DirectedGraph:
    """Directed graph with at most one edge in each direction per pair."""

    _adj: dict[str, list[str]] = field(default_factory=dict)

    def add_edge(self, u: str, v: str) -> None:
        if u == v:
            raise ValueError(f"Self-loop on {u} is not allowed.")
        self._adj.setdefault(u, [])
        self._adj.setdefault(v, [])
        if v not in self._adj[u]:
            self._adj[u].append(v)

    def out_neighbours(self, node: str) -> list[str]:
        return list(self._adj.get(node, []))

    def in_neighbours(self, node: str) -> list[str]:
        return [u for u, vs in self._adj.items() if node in vs]

    @property
    def nodes(self) -> list[str]:
        return list(self._adj.keys())
Power iteration against the equation above. The dangling_mass term handles nodes with no out-edges by teleporting their mass; this is the standard fix in the literature and matches what the transition matrix would prescribe if you wrote it out.
def pagerank(
    g: DirectedGraph,
    damping: float = 0.85,
    tol: float = 1e-8,
    max_iter: int = 200,
) -> dict[str, float]:
    """Power iteration with uniform teleport. Returns ranks summing to 1."""
    nodes = g.nodes
    n = len(nodes)
    ranks = {v: 1 / n for v in nodes}
    in_ns = {v: g.in_neighbours(v) for v in nodes}
    out_d = {v: len(g.out_neighbours(v)) for v in nodes}
    dangling = [v for v in nodes if out_d[v] == 0]
    for _ in range(max_iter):
        dangling_mass = sum(ranks[u] for u in dangling)
        new = {}
        for v in nodes:
            incoming = sum(ranks[u] / out_d[u] for u in in_ns[v])
            new[v] = (
                damping * (incoming + dangling_mass / n)
                + (1 - damping) / n
            )
        if sum(abs(new[v] - ranks[v]) for v in nodes) < tol:
            return new
        ranks = new
    raise RuntimeError("PageRank did not converge.")

Notebook 09: Random walks and PageRank
Build the web graph. Run PageRank, inspect rankings, watch power iteration converge. Sweep the damping factor. Then build a small knowledge graph and watch personalised PageRank surface the nearest related concepts.
notebooks/09_graph_theory_random_walks.ipynb

The same algorithm, a different teleport vector, and a very different problem.
A large language model, asked a factual question, has two ways to respond. It can continue the pattern of text it has seen, hoping the continuation happens to be correct. Or it can retrieve relevant information from an external source and ground its answer in that. The first approach produces confident confabulation. The second, done well, produces answers with citations.
The retrieval step is a ranking problem. Given a query, or a seed concept, find the most relevant nodes in a knowledge graph where concepts are connected by edges like is-a, causes, cites, or authored-by. Keyword matching is not enough: two concepts such as "entropy" and "information" can be tightly connected without ever sharing a sentence. What you want is something that notices the shape of the neighbourhood around the seed.
The workhorse is personalised PageRank. Same power iteration, same damping, but instead of teleporting uniformly to any node the walker teleports back to the seed. The long-run distribution is then concentrated in whatever neighbourhood is reachable from the seed by short, dense sequences of edges. It is PageRank with a point of view.
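The change from ordinary PageRank is a one-line swap of the teleport target. A minimal sketch on an invented five-concept graph (the node names and edges are made up for illustration, and every node here has out-links, so no dangling-node handling is needed):

```python
def personalised_pagerank(links: dict[str, list[str]], seed: str,
                          d: float = 0.85, iters: int = 200) -> dict[str, float]:
    """Power iteration where every teleport returns to the seed node."""
    nodes = list(links)
    ranks = {v: float(v == seed) for v in nodes}
    for _ in range(iters):
        # Teleport mass all goes to the seed instead of being spread uniformly.
        new = {v: (1 - d) * (v == seed) for v in nodes}
        for u, outs in links.items():
            for v in outs:
                new[v] += d * ranks[u] / len(outs)
        ranks = new
    return ranks

# A toy knowledge graph: two small clusters joined through "shannon".
links = {
    "shannon":     ["entropy", "turing"],
    "entropy":     ["information"],
    "information": ["entropy"],
    "turing":      ["computation"],
    "computation": ["turing"],
}

from_shannon = personalised_pagerank(links, seed="shannon")
from_turing = personalised_pagerank(links, seed="turing")
# Seeding at "turing" leaves "shannon" with zero rank: nothing links back
# to it, and the teleport never lands there.
```

Moving the seed reshapes the whole ranking, which is exactly the "point of view" the text describes.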
Below is a small knowledge graph drawn from the characters and concepts you have met elsewhere on this site. Click any node to set it as the seed; the graph lights up the concepts most strongly related to it, weighted by the personalised walk. Try Shannon, then Euler, then Turing, and watch the graph interpret your query by reshaping itself around the seed.
Blue is the seed. Gold intensity shows how strongly each node is surfaced by a damped random walk from the seed.
The "GraphRAG" family of retrieval systems, and the knowledge-graph-augmented prompting used in production LLM pipelines, are built out of exactly this primitive. The thing that has changed between 1998 and now is not the mathematics. It is what counts as a node.
Pages, concepts, citations, entities. Nothing about PageRank cares which one.
The web is a directed graph with implicit endorsement semantics: a link is a soft vote of relevance. A knowledge graph is a directed graph with implicit endorsement semantics: an edge is a soft claim of relatedness. From the perspective of a random walker there is no difference. She has no idea whether she is on a 1998 web page about dog grooming or on a 2026 triple asserting that Shannon wrote a paper about entropy. She picks an outgoing edge uniformly, occasionally teleports, and lets the long-run distribution do the ranking for her.
This is the signature move of graph theory. The abstraction does not care what the nodes represent. Euler did not need to know what a bridge was, only that it connected two places. Dijkstra did not need to know what a kilometre felt like, only that it was a non-negative number. Ford and Fulkerson did not need to know what a freight car carried, only that its capacity was finite. Brin and Page did not need to know what a page said, only that it linked. Four algorithms, three centuries, one mathematics.
A parlour puzzle, a café, a classified study, a pair of graduate students.
Königsberg, 1736. Euler throws away the geography of a city and keeps only the way its land masses connect, and accidentally invents a new branch of mathematics.
Amsterdam, 1956. Dijkstra invents shortest paths in a café because he has no paper, and twelve years later every router on the ARPANET is running it.
Santa Monica, 1955 and 1956. One floor apart at RAND, Harris and Ross ask which rails to bomb while Ford and Fulkerson prove that the answer is the same as asking how much can flow.
Stanford, 1998 and beyond. Brin and Page let a bored surfer click random links, and the same construction built Google and now grounds language models in structured knowledge.
Different centuries, different problems, one mathematics of connection.
One conference paper, one book, one patent, and two modern papers that show where this is going.
Sergey Brin and Lawrence Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine (WWW7, 1998)
The paper that launched Google. PageRank is Section 2.1. Also worth reading for the matter-of-fact description of their homemade hardware and their frank estimate that ad-funded search engines would have deep conflicts of interest, an observation that has aged interestingly.
Lawrence Page, US Patent 6,285,999: Method for Node Ranking in a Linked Database (filed 1998, granted 2001)
The patent Stanford held for a decade. More precise than the WWW7 paper on the teleport vector, and the only place to find the algorithm described in legal prose. A historical curiosity and an unusually clean piece of technical writing.
Amy Langville and Carl Meyer, Google's PageRank and Beyond (Princeton, 2006)
The textbook on PageRank. Covers the Markov-chain background, convergence proofs, accelerated computation, and the spectrum of the web matrix. The right book if you want to understand why the algorithm works rather than just run it.
Jon Kleinberg, Authoritative Sources in a Hyperlinked Environment (SODA, 1998)
The HITS algorithm, published the same year as PageRank and solving a closely related problem with a different decomposition into hubs and authorities. Reading the two together clarifies what is essential to link-based ranking and what is a specific design choice.
Darren Edge et al., From Local to Global: A Graph RAG Approach to Query-Focused Summarization (Microsoft Research, 2024)
One of the first well-engineered descriptions of using graph-structured retrieval to ground language model output. Personalised-PageRank-style traversal is one of several primitives in the pipeline. A useful snapshot of where this story continues at the time of writing.
Rada Mihalcea and Paul Tarau, TextRank: Bringing Order into Texts (EMNLP, 2004)
An earlier example of the same move: apply PageRank to a graph whose nodes are not web pages. Here the nodes are sentences, and the algorithm extracts summaries. A nice reminder that the knowledge-graph use case is not new, just larger.
The web was a graph, and PageRank read it. A knowledge graph is a graph, and personalised PageRank reads it. Whatever comes next will also be a graph, because nearly everything is.
Graph theory began as a question about seven bridges in a city that no longer exists. It is now, in the most literal sense, the invisible mathematics of the age.