RAG · Aug 10, 2025 · 6 min · Tags: rag, llm, eval, search, graph

GraphRAG: Retrieval Over Knowledge Graphs

Ask a normal RAG system "what are the major themes across these 400 board meeting transcripts?" and watch it fail in an instructive way. It retrieves the five chunks most similar to the word "themes," which is nonsense, because "themes" doesn't appear near the actual themes, and then summarizes those five. The answer is confident, fluent, and drawn from at most 1.25% of the corpus (five chunks out of 400 transcripts, and less once each transcript is split into several chunks). It never had a chance, because the question isn't asking "find the passage that contains the answer." It's asking "understand the whole thing and tell me what's in it." No amount of top-k similarity search answers a question whose answer isn't located in any single chunk.

That's the gap GraphRAG was built for. And it's worth being precise about that gap before you go build a knowledge graph you may not need.

Two kinds of question

Vector RAG is a needle finder. For "What was Q3 revenue?", the answer lives in one place, and retrieval's whole job is to locate it. Every technique in this series so far makes the needle easier to find.

Some questions have no needle. "What are the main themes?" "How are these two business units connected?" "Summarize the relationships between the people named here." The answer isn't in the corpus as a retrievable span — it's distributed across it, and you only get it by connecting things that live in different documents. These are global, or multi-hop, or relational questions, and similarity search is structurally the wrong tool. You can't retrieve a connection that no single chunk states.

GraphRAG, the approach Microsoft Research published and open-sourced in 2024, restructures the corpus so those connections become first-class things you can traverse, instead of patterns you hope a single chunk happens to mention.

Building the graph

The expensive, clever part happens at index time, before any query shows up. Instead of (or alongside) chopping documents into chunks and embedding them, you run an LLM over the corpus to extract structure.

[Figure: Building a knowledge graph: extract entities and relations, detect communities, summarize each. Caption: Index-time work: structure the corpus so connections become traversable.]

Three moves, each doing work that pays off at query time:

Extract entities and relationships. The LLM reads each chunk and pulls out the nouns that matter (people, organizations, products, concepts) and the relationships between them: Acme acquired Beta, Dr. Chen reports to the CTO, Project X depends on Service Y. Nodes and edges. Do this across the whole corpus and a relationship stated once in document 12 and referenced obliquely in document 200 becomes a single edge, with both mentions now explicitly connected. That's the magic: the graph fuses information that was scattered.
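To make the fusing step concrete, here is a minimal sketch. It assumes the extractor hands back (subject, relation, object, source document) tuples; that tuple shape, the `build_graph` helper, and the document IDs are all made up for illustration, and a real pipeline would get the tuples from an LLM call rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical extractor output: (subject, relation, object, source_doc).
# In a real pipeline an LLM would produce these from each chunk.
triples = [
    ("Acme", "acquired", "Beta", "doc_12"),
    ("Dr. Chen", "reports_to", "CTO", "doc_40"),
    ("Acme", "acquired", "Beta", "doc_200"),  # same fact, restated elsewhere
    ("Project X", "depends_on", "Service Y", "doc_87"),
]

def build_graph(triples):
    """Merge triples into one edge per (subject, relation, object),
    keeping every document that supports the edge."""
    edges = defaultdict(set)
    for subj, rel, obj, doc in triples:
        edges[(subj, rel, obj)].add(doc)
    nodes = {n for subj, _, obj in edges for n in (subj, obj)}
    return nodes, dict(edges)

nodes, edges = build_graph(triples)
```

The two mentions of the acquisition collapse into one edge that remembers both source documents, which is the provenance a later traversal can hand back to the reader.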

Detect communities. A graph of thousands of entities is itself too big to reason over. So you run a community-detection algorithm (the Leiden method is the standard choice) that finds clusters of densely connected nodes. These clusters tend to correspond to actual topics: a cluster of entities all about one product line, another about one regulatory issue. The algorithm finds structure nobody labeled.
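This is not Leiden itself. It is a sketch of the kind of score such methods search over. Leiden-style algorithms are usually run to maximize a quality function such as modularity, which rewards partitions with many edges inside clusters and few between them. The toy graph and the `modularity` helper below are my own illustration, for an undirected, unweighted graph.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2
    for an undirected, unweighted graph with m edges."""
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles joined by a single bridge edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_blob = {n: 0 for n in "abcdef"}

q_split = modularity(edges, two_clusters)  # 5/14, about 0.357
q_blob = modularity(edges, one_blob)       # 0.0
```

Splitting the graph at the bridge scores higher than lumping everything together, which is the signal a community-detection algorithm follows when it "finds structure nobody labeled."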

Summarize bottom-up. For each community, an LLM writes a summary. Then it summarizes clusters of communities, building a hierarchy — fine-grained summaries at the bottom, sweeping ones at the top. Now "what are the major themes?" has an actual answer to retrieve: the top-level community summaries are the themes, pre-computed from the entire corpus rather than guessed from five chunks.
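A rough sketch of the bottom-up pass, assuming the community hierarchy is already a nested dict and that `summarize` is whatever LLM call you use. Both the dict layout and the `fake_llm_summarize` stand-in are hypothetical, and a couple of the leaf facts are invented to fill out the example.

```python
from typing import Callable

def summarize_hierarchy(node: dict, summarize: Callable[[list[str]], str]) -> str:
    """Summarize leaves first, then each parent from its children's summaries.
    Stores the result on the node under "summary" and returns it."""
    if node.get("children"):
        inputs = [summarize_hierarchy(child, summarize) for child in node["children"]]
    else:
        inputs = node["facts"]  # leaf community: raw entity/relationship statements
    node["summary"] = summarize(inputs)
    return node["summary"]

# Stand-in for an LLM call so the sketch runs offline.
def fake_llm_summarize(texts: list[str]) -> str:
    return " + ".join(texts)

tree = {
    "children": [
        {"facts": ["Acme acquired Beta", "Beta builds Service Y"]},
        {"facts": ["Regulator opened inquiry", "Dr. Chen testified"]},
    ]
}
top = summarize_hierarchy(tree, fake_llm_summarize)
```

The point of the recursion is that a parent never sees raw chunks, only its children's summaries, so the top-level summary stays small no matter how large the corpus is.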

Answering, two ways

Microsoft's framing splits queries into global and local, and the split is genuinely useful.

Global search is for the whole-corpus questions. To answer "what are the main themes," the system doesn't search chunks — it reads the community summaries (each already a synthesis of many documents), generates a partial answer from each, and combines them into one. You're effectively map-reducing over a structured digest of everything, which is why it can answer questions that touch the entire dataset. A plain RAG pipeline simply has no equivalent move.
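The map-reduce shape can be sketched in a few lines. This is my own simplification, not Microsoft's implementation: `map_answer` and `reduce_answers` stand in for two LLM prompts, and the fake versions below exist only so the example runs without a model.

```python
from typing import Callable

def global_search(
    question: str,
    community_summaries: list[str],
    map_answer: Callable[[str, str], str],
    reduce_answers: Callable[[str, list[str]], str],
) -> str:
    """Map: one partial answer per community summary. Reduce: combine them."""
    partials = [map_answer(question, summary) for summary in community_summaries]
    partials = [p for p in partials if p]  # drop summaries with nothing relevant
    return reduce_answers(question, partials)

# Offline stand-ins for the two LLM calls.
def fake_map(question: str, summary: str) -> str:
    return summary.split(":")[0] if "theme" in question.lower() else ""

def fake_reduce(question: str, partials: list[str]) -> str:
    return "; ".join(partials)

summaries = [
    "Acquisitions: Acme acquired Beta and integrated its products.",
    "Reliability: Project X depends on Service Y.",
]
answer = global_search("What are the main themes?", summaries, fake_map, fake_reduce)
```

Notice that no similarity search happens anywhere in this path. Every community summary gets a look, which is why the cost of a global query scales with the number of summaries rather than with k.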

Local search is for the connect-the-dots questions about specific entities. "How is person A related to project B?" starts at those nodes and walks the graph outward, gathering connected entities, their relationships, and the source chunks that mention them. Multi-hop reasoning becomes graph traversal: each hop is an edge, and the chain of edges is the answer. The retrieve-read-retrieve loop from the agentic post does this implicitly; the graph does it explicitly and reliably.
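Here is one way the "each hop is an edge" idea might look in code: a plain breadth-first search over an adjacency list. The graph contents, the chunk IDs, and the `find_path` helper are hypothetical, and a production local search would also pull in neighboring entities and rank the source chunks rather than return a single path.

```python
from collections import deque

# Hypothetical graph: entity -> list of (relation, neighbor, source_chunk).
# Edges are followed in the stored direction only.
graph = {
    "Dr. Chen": [("reports_to", "CTO", "chunk_7")],
    "CTO": [("sponsors", "Project X", "chunk_19")],
    "Project X": [("depends_on", "Service Y", "chunk_42")],
    "Service Y": [],
}

def find_path(graph, start, goal, max_hops=3):
    """Breadth-first search returning the shortest chain of edges
    from start to goal within max_hops, or None if there is none."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) == max_hops:
            continue  # hop budget spent on this branch
        for relation, neighbor, chunk in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor, chunk)]))
    return None

path = find_path(graph, "Dr. Chen", "Project X")
```

The returned path is both the answer and its citation trail: each edge carries the chunk that stated it, so the generator can quote sources instead of asserting a connection on faith.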

The cost nobody mentions in the demo

I'll be the one to say it: GraphRAG is expensive in a way the polished examples gloss over. Building the graph means running an LLM over every chunk to extract entities, then more LLM calls to summarize every community. On a large corpus that's a serious indexing bill — potentially many times the cost of just embedding everything — and it's not one-and-done. When documents change, you have to re-extract, re-cluster, and re-summarize the affected parts. Keeping a knowledge graph fresh is a real operational burden, not a checkbox.
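A back-of-envelope way to see where the bill comes from is to count tokens. Every number below is a placeholder I picked for illustration, not a measurement, and the two helper functions are hypothetical. Plug in your own chunk counts and prompt sizes.

```python
def graph_index_tokens(n_chunks, chunk_tokens, prompt_tokens, output_tokens,
                       n_communities, summary_input_tokens, summary_output_tokens):
    """Tokens pushed through an LLM to build the graph and its summaries."""
    extraction = n_chunks * (prompt_tokens + chunk_tokens + output_tokens)
    summarization = n_communities * (summary_input_tokens + summary_output_tokens)
    return extraction + summarization

def embedding_tokens(n_chunks, chunk_tokens):
    """Tokens pushed through an embedding model for plain vector RAG."""
    return n_chunks * chunk_tokens

# Placeholder sizes for illustration only, not measurements.
graph_total = graph_index_tokens(
    n_chunks=10_000, chunk_tokens=600, prompt_tokens=400, output_tokens=300,
    n_communities=500, summary_input_tokens=4_000, summary_output_tokens=500,
)
embed_total = embedding_tokens(n_chunks=10_000, chunk_tokens=600)
ratio = graph_total / embed_total
```

With these made-up inputs the graph build processes roughly two and a half times the tokens, and that is before pricing: generative models usually cost more per token than embedding models, so the gap in dollars tends to be wider than the gap in tokens.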

There's also quality risk concentrated at the extraction step. If the LLM extracts entities inconsistently — "IBM," "I.B.M.," and "International Business Machines" as three separate nodes — your graph fragments and traversal breaks. Entity resolution, the unglamorous work of merging duplicates, is where a lot of GraphRAG projects quietly lose their accuracy. The graph is only as good as the extraction that built it, and extraction is an LLM doing a fuzzy job at scale.
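To show what the merging work involves, here is a deliberately naive sketch that handles the IBM example with two heuristics: strip punctuation and case, then fold a multi-word name into an existing group when its initials match. The function names are mine, and the initials rule will over-merge on real data (any two names with the same initials collide), so treat it as an illustration of the problem rather than a solution.

```python
import re
from collections import defaultdict

def canonical_key(name: str) -> str:
    """Uppercase and strip everything except letters and digits: 'I.B.M.' -> 'IBM'."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())

def initials(name: str) -> str:
    """First letter of each word: 'International Business Machines' -> 'IBM'."""
    return "".join(word[0] for word in name.upper().split())

def resolve(names):
    """Group surface forms that share a canonical key, then fold multi-word
    names into an existing group when their initials match its key."""
    groups = defaultdict(set)
    multiword = []
    for name in names:
        if len(name.split()) > 1:
            multiword.append(name)
        else:
            groups[canonical_key(name)].add(name)
    for name in multiword:
        key = initials(name)
        groups[key if key in groups else canonical_key(name)].add(name)
    return dict(groups)

clusters = resolve(["IBM", "I.B.M.", "International Business Machines", "Acme"])
```

Real entity resolution usually layers more on top of this, such as embedding similarity, type checks, or an LLM adjudicating borderline pairs, which is exactly why it eats so much project time.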

The research community noticed the cost too — later 2024 variants like LazyGraphRAG pushed to defer the expensive summarization until query time, precisely because building the full hierarchy up front is so pricey. That direction tells you the cost is real enough that even its inventors went looking for ways around it.

When it earns the complexity

So here's the decision, stated plainly. Build a graph when your questions are genuinely global or relational — themes, connections, "how does all this fit together," multi-hop chains across documents — and the corpus is large enough and connected enough that those relationships actually span documents. Intelligence analysis, scientific literature, large internal knowledge bases, anything where the value is in the connections between facts more than the facts themselves: this is where GraphRAG pulls clearly ahead, and on whole-corpus sensemaking it isn't close.

Don't build a graph when your users ask needle questions. If 95% of queries are "what's the value of X" lookups, a knowledge graph is a heavy, expensive answer to a problem hybrid search and a reranker already solved for a fraction of the cost and effort. You'll have paid for relationships nobody asks about.

The most honest setups I've seen don't pick. They keep vector RAG for the needle questions and route the rare global question to a graph — which is, once again, the adaptive idea waiting in the next post: the pipeline shouldn't be one thing. It should be the right thing for the question in front of it. GraphRAG is a powerful tool that earns its keep on a specific shape of problem, and an expensive mistake on every other shape.
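The routing idea can be sketched with nothing fancier than keyword cues. The cue lists and route names here are invented for illustration, and a real router would more likely use a small classifier or an LLM call. The shape is the same either way: decide what kind of question it is before deciding how to retrieve.

```python
# Hypothetical cue lists; tune or replace with a classifier in practice.
GLOBAL_CUES = ("themes", "overall", "summarize", "across")
RELATIONAL_CUES = ("related", "connected", "relationship", "depends on")

def route(question: str) -> str:
    """Pick a retrieval path from surface cues in the question."""
    q = question.lower()
    if any(cue in q for cue in GLOBAL_CUES):
        return "graph_global"
    if any(cue in q for cue in RELATIONAL_CUES):
        return "graph_local"
    return "vector"  # default: cheap needle lookup

routes = [
    route("What are the major themes across these transcripts?"),
    route("How is person A related to project B?"),
    route("What was Q3 revenue?"),
]
```

Defaulting to the vector path matters: if most traffic is needle questions, the cheap path should be the one a question lands on when nothing says otherwise.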
