Routing

A chain assumes you already know the path. Routing is what you build when you don't.

Picture a support inbox. One message is "where's my refund," the next is "the export button does nothing," the next is "do you ship to Norway." Three different questions, three different backends, and no single linear pipeline that handles all of them without becoming a tangle of if statements pretending to be an agent. The fix is to look at the input first and send it somewhere. Classify, then dispatch. That's routing — the switch statement of an agent, except the switch can be as dumb as a keyword or as smart as a model.
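As a rough illustration of "classify, then dispatch," here is a minimal sketch for the inbox example, using the dumb end of the spectrum: regex rules. The handler names, labels, and patterns are all hypothetical, invented for this example rather than taken from any real system.

```python
import re
from typing import Callable, Dict

# Hypothetical handlers standing in for the three backends in the inbox example.
def handle_refund(msg: str) -> str:
    return "billing: looking up your refund"

def handle_bug(msg: str) -> str:
    return "support: filing a bug report"

def handle_shipping(msg: str) -> str:
    return "sales: checking shipping coverage"

def handle_unknown(msg: str) -> str:
    return "clarify: could you tell me a bit more?"

# Step 1: classify. Rules are checked in order and the first match wins.
RULES = [
    ("refund", re.compile(r"\brefund\b", re.I)),
    ("bug", re.compile(r"\b(button|crash|error|does nothing)\b", re.I)),
    ("shipping", re.compile(r"\bship(s|ping)?\b", re.I)),
]

def classify(msg: str) -> str:
    for label, pattern in RULES:
        if pattern.search(msg):
            return label
    return "unknown"

# Step 2: dispatch. A plain dict plays the role of the switch statement.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "refund": handle_refund,
    "bug": handle_bug,
    "shipping": handle_shipping,
    "unknown": handle_unknown,
}

def route(msg: str) -> str:
    return HANDLERS[classify(msg)](msg)
```

The two steps are deliberately separate: `classify` could later be swapped for an embedding or model-based function without touching the dispatch table.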

Four dispatchers, four price tags

There isn't one way to route. There are four, and choosing among them is mostly a question of what you're willing to pay per request.

| Mechanism | How it decides | The catch |
| --- | --- | --- |
| Rule-based | `if`/`switch`, keyword or regex match | Free and instant, but blind to paraphrase and intent |
| Embedding-based | Embed the query, match it to the nearest route's examples | Fast, meaning-aware, no LLM call, but routes need good example utterances |
| LLM-based | Ask a model to output a category | Handles nuance and messy input, but it's a whole extra model call |
| Learned classifier | A small trained model predicts the route | Cheap and accurate at inference, but you have to collect labels and train |

Notice the trap baked into that table. The most capable router — an LLM reading each query and naming the route — is also the slowest and most expensive, and it's the one people reach for first because it's the easiest to prototype. For a high-volume triage step, paying a full model call just to decide where to send a request is often the silliest line item in the system.

Figure: a router classifies an incoming query, then sends it to orders, product, support, or a clarify fallback, all merging into one reply. The router picks exactly one branch, and the "unclear" path catches low-confidence cases instead of guessing.

The router you reach for first is usually the wrong one

Before you wire up an LLM router, try the embedding router. You write a handful of example phrases per route, embed them once, and at request time you embed the incoming query and pick the closest match. No generation and no completion tokens per decision. With a local encoder the whole lookup takes single-digit milliseconds; a hosted encoder adds one embedding call on top. semantic-router is a widely used implementation:

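from semantic_router import Route, RouteLayer
from semantic_router.encoders import OpenAIEncoder

# Each route is defined by a few example utterances, not by rules.
orders = Route(name="orders", utterances=[
    "where is my package", "track my order", "did my order ship",
])
support = Route(name="support", utterances=[
    "the app keeps crashing", "I can't log in", "reset my password",
])

# OpenAIEncoder reads OPENAI_API_KEY from the environment.
# Newer semantic-router releases rename RouteLayer to SemanticRouter;
# check the version you have installed.
rl = RouteLayer(encoder=OpenAIEncoder(), routes=[orders, support])
rl("my parcel never arrived").name   # -> "orders", and no LLM was harmed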

That's it. The decision cost collapsed from a generation to a vector lookup. You graduate to an LLM router only when the routing decision genuinely needs reasoning the embedding can't capture — overlapping intents, multi-step requests, things where the meaning hinges on context rather than topic. Anthropic's Building Effective Agents frames routing as the pattern for "distinct categories that are better handled separately" — the emphasis is on distinct, because if your routes blur into each other no router will save you.
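To make the mechanism concrete, here is a from-scratch sketch of nearest-example routing. It is not how semantic-router is implemented. The `toy_embed` function is a deliberately crude stand-in (a bag-of-words count vector) so the example runs without any model, which means it matches on shared words rather than meaning. In practice you would pass a real encoder through the `embed` parameter. `TinyEmbeddingRouter` and its signature are hypothetical names for this illustration.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]

def toy_embed(text: str) -> Vector:
    """Toy stand-in for a real encoder: a bag-of-words count vector."""
    return dict(Counter(re.findall(r"[a-z']+", text.lower())))

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class TinyEmbeddingRouter:
    def __init__(self, routes: Dict[str, List[str]],
                 embed: Callable[[str], Vector] = toy_embed):
        self.embed = embed
        # Embed every example utterance once, up front.
        self.index: List[Tuple[str, Vector]] = [
            (name, embed(utterance))
            for name, utterances in routes.items()
            for utterance in utterances
        ]

    def __call__(self, query: str) -> Tuple[str, float]:
        # Embed the query, then return the route of the most similar
        # example utterance together with its similarity score.
        q = self.embed(query)
        return max(((name, cosine(q, vec)) for name, vec in self.index),
                   key=lambda pair: pair[1])

router = TinyEmbeddingRouter({
    "orders": ["where is my package", "track my order", "did my order ship"],
    "support": ["the app keeps crashing", "I can't log in", "reset my password"],
})
name, score = router("track my package")
```

Returning the score alongside the route name matters: it is the number a confidence threshold would later be applied to.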

Routing isn't only between branches — it's between models

There's a second kind of routing that quietly became one of the highest-ROI patterns of the year: routing the same query to different models by difficulty. Easy questions go to a small cheap model; hard ones go to a frontier model. RouteLLM trains exactly this kind of router on human preference data and reports cutting cost by more than half in some cases while holding roughly 95% of the strong model's quality. The 2025 follow-ups push further: xRouter trains the orchestration policy with reinforcement learning to balance cost against performance, and MasRouter extends the idea to choosing which agent in a multi-agent system should take a query. Red Hat even moved routing down into the inference-serving layer, deciding before a single token is generated.

The mental shift: routing isn't only "which workflow." It's also "which brain, at what price." Most teams are overpaying because every request hits their best model whether it needs it or not.
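A minimal sketch of the "which brain, at what price" decision might look like the following. Everything here is an assumption for illustration: the model names, the relative costs, the 0.5 threshold, and above all `heuristic_difficulty`, a hand-written score standing in for the learned router that a system like RouteLLM would train from preference data.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ModelChoice:
    name: str
    cost_per_call: float  # hypothetical relative cost units, not real prices

SMALL = ModelChoice("small-model", 1.0)
LARGE = ModelChoice("frontier-model", 20.0)

def heuristic_difficulty(query: str) -> float:
    """Hypothetical difficulty score in [0, 1]; a trained router would replace this."""
    # Longer queries contribute up to 0.5.
    score = min(len(query.split()) / 50.0, 1.0) * 0.5
    # Phrases that hint at multi-step reasoning contribute another 0.5.
    hard_markers = ("prove", "derive", "debug", "step by step", "why")
    if any(marker in query.lower() for marker in hard_markers):
        score += 0.5
    return min(score, 1.0)

def pick_model(query: str, threshold: float = 0.5,
               score: Callable[[str], float] = heuristic_difficulty) -> ModelChoice:
    # At or above the threshold the query goes to the expensive model.
    return LARGE if score(query) >= threshold else SMALL

easy = pick_model("What is the capital of France?")
hard = pick_model("Prove that the sum of two even numbers is even, step by step")
```

The threshold is the cost dial: raising it sends more traffic to the small model and saves money, at the risk of under-serving hard queries.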

The failure mode nobody plans for

A router that picks wrong doesn't fail loudly. It confidently sends a billing question to the troubleshooting flow, which gives a fluent, completely useless answer, and the user never knows a wrong turn was taken. Misrouting cascades — everything downstream inherits the mistake.

So build the route you hope never fires: the low-confidence fallback. When the classifier's top match is weak, don't force a guess — send it to a clarification step, or to a human, or to a general handler that at least won't pretend. An embedding router gives you a similarity score; threshold it. An LLM router can return "unsure"; let it. The teams that get burned are the ones whose router has no way to say I don't know, so it always says something.
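The thresholding idea can be sketched in a few lines. This assumes the router hands back `(route, score)` pairs with higher scores meaning a better match. The cutoff values are hypothetical and would need tuning on real traffic, and the margin check (falling back when the top two routes are nearly tied) is an extra safeguard added for this example, not something the text above prescribes.

```python
from typing import List, Tuple

def choose_route(scored: List[Tuple[str, float]],
                 min_score: float = 0.6,
                 min_margin: float = 0.1,
                 fallback: str = "clarify") -> str:
    """Return the top route, or the fallback when the match is weak or ambiguous."""
    if not scored:
        return fallback
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    top_name, top_score = ranked[0]
    # Weak top match: don't force a guess.
    if top_score < min_score:
        return fallback
    # Near-tie between the top two routes: treat it as "I don't know".
    if len(ranked) > 1 and top_score - ranked[1][1] < min_margin:
        return fallback
    return top_name

confident = choose_route([("billing", 0.82), ("troubleshooting", 0.40)])
weak = choose_route([("billing", 0.45), ("troubleshooting", 0.40)])
ambiguous = choose_route([("billing", 0.70), ("troubleshooting", 0.66)])
```

The same wrapper works for an LLM router if the model is asked to emit a confidence or an explicit "unsure" label that maps to the fallback.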

Routing earns its place the moment your agent faces inputs that don't share a path. It stops earning it the moment there's really only one sensible thing to do — at which point a router is just an if statement with a model attached, billing you for the privilege.
