A chain assumes you already know the path. Routing is what you build when you don't.
Picture a support inbox. One message is "where's my refund," the next is "the export button does nothing," the next is "do you ship to Norway." Three different questions, three different backends, and no single linear pipeline that handles all of them without becoming a tangle of if statements pretending to be an agent. The fix is to look at the input first and send it somewhere. Classify, then dispatch. That's routing — the switch statement of an agent, except the switch can be as dumb as a keyword or as smart as a model.
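A minimal sketch of "classify, then dispatch" for the inbox above. The handler names, the route labels, and the keyword classifier are all hypothetical stand-ins; the point is only the shape: one function names the route, and a lookup table sends the message there, with a general handler for anything unrecognized.

```python
from typing import Callable, Dict

# Hypothetical handlers standing in for the three backends in the example.
def handle_refund(msg: str) -> str:
    return "refunds backend"

def handle_bug(msg: str) -> str:
    return "bug tracker"

def handle_shipping(msg: str) -> str:
    return "shipping FAQ"

def handle_general(msg: str) -> str:
    return "general queue"

# The dispatch table: route label -> handler.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "refund": handle_refund,
    "bug": handle_bug,
    "shipping": handle_shipping,
}

def classify(msg: str) -> str:
    """Toy keyword classifier; a smarter classifier could replace it unchanged."""
    text = msg.lower()
    if "refund" in text:
        return "refund"
    if "button" in text or "export" in text:
        return "bug"
    if "ship" in text:
        return "shipping"
    return "unknown"

def route(msg: str) -> str:
    """Classify first, then dispatch; unknown labels go to the general handler."""
    handler = HANDLERS.get(classify(msg), handle_general)
    return handler(msg)
```

Because `route` only depends on the label that `classify` returns, swapping the classifier for something smarter does not touch the handlers.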
Four dispatchers, four price tags
There isn't one way to route. There are four, and choosing among them is mostly a question of what you're willing to pay per request.
| Mechanism | How it decides | The catch |
|---|---|---|
| Rule-based | if/switch, keyword or regex match | Free and instant — but blind to paraphrase and intent |
| Embedding-based | Embed the query, match it to the nearest route's examples | Fast, meaning-aware, no LLM call — but routes need good example utterances |
| LLM-based | Ask a model to output a category | Handles nuance and messy input — but it's a whole extra model call |
| Learned classifier | A small trained model predicts the route | Cheap and accurate at inference — but you have to collect labels and train |
Notice the trap baked into that table. The most capable router — an LLM reading each query and naming the route — is also the slowest and most expensive, and it's the one people reach for first because it's the easiest to prototype. For a high-volume triage step, paying a full model call just to decide where to send a request is often the silliest line item in the system.
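The catch in the rule-based row is easy to see in a few lines. This is an illustrative sketch with made-up route names and patterns, not a recommended rule set: the regex router answers instantly, but a paraphrase that avoids the keywords falls straight through.

```python
import re
from typing import Optional

# Hypothetical routes and patterns, for illustration only.
RULES = [
    ("billing", re.compile(r"\b(refund|invoice|charge[ds]?)\b", re.IGNORECASE)),
    ("shipping", re.compile(r"\b(ship(ping|ped|s)?|deliver(y|ed)?)\b", re.IGNORECASE)),
]

def rule_route(query: str) -> Optional[str]:
    """Return the first route whose pattern matches, or None if nothing matches."""
    for name, pattern in RULES:
        if pattern.search(query):
            return name
    return None

print(rule_route("Where's my refund?"))    # keyword present, so it matches
print(rule_route("I want my money back"))  # same intent, no keyword, no match
```

The second query is a billing question by any human reading, yet the router returns `None` because no pattern fires. That gap is what the meaning-aware mechanisms are paying to close.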
The router you reach for first is usually the wrong one
Before you wire up an LLM router, try the embedding router. You write a handful of example phrases per route, embed them once, and at request time you embed the incoming query and pick the closest match. No generation, and the vector comparison itself takes milliseconds. With a hosted encoder like the one below you still pay for one embedding call per query, but that is far cheaper and faster than a generation. semantic-router is a widely used implementation:
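```python
# Note: this uses the RouteLayer name from older semantic-router releases;
# newer releases rename it, so check the version you have installed.
from semantic_router import Route, RouteLayer
from semantic_router.encoders import OpenAIEncoder

# Each route is a name plus a few example utterances.
orders = Route(name="orders", utterances=[
    "where is my package", "track my order", "did my order ship",
])
support = Route(name="support", utterances=[
    "the app keeps crashing", "I can't log in", "reset my password",
])

# OpenAIEncoder reads OPENAI_API_KEY from the environment.
rl = RouteLayer(encoder=OpenAIEncoder(), routes=[orders, support])
rl("my parcel never arrived").name  # -> "orders", and no LLM was harmed
```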
That's it. The decision cost collapsed from a generation to a vector lookup. You graduate to an LLM router only when the routing decision genuinely needs reasoning the embedding can't capture — overlapping intents, multi-step requests, things where the meaning hinges on context rather than topic. Anthropic's Building Effective Agents frames routing as the pattern for "distinct categories that are better handled separately" — the emphasis is on distinct, because if your routes blur into each other no router will save you.
Routing isn't only between branches — it's between models
There's a second kind of routing that quietly became one of the highest-ROI patterns of the year: routing the same query to different models by difficulty. Easy questions go to a small cheap model; hard ones go to a frontier model. RouteLLM trains exactly this kind of router on human preference data and reports cutting cost by more than half while holding roughly 95% of the strong model's quality. The 2025 follow-ups push further — xRouter trains the orchestration policy with reinforcement learning to balance cost against performance, and MasRouter extends the idea to choosing which agent in a multi-agent system should take a query. Red Hat even moved routing down into the inference-serving layer, deciding before a single token is generated.
The mental shift: routing isn't only "which workflow." It's also "which brain, at what price." Most teams are overpaying because every request hits their best model whether it needs it or not.
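A rough sketch of the "which brain, at what price" decision. Everything here is an assumption for illustration: the model names, the relative costs, the threshold, and especially `toy_difficulty`, which is a crude heuristic standing in for the trained router that systems like RouteLLM actually learn from data.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ModelChoice:
    name: str
    relative_cost: float  # made-up units, only for comparing the two tiers

SMALL = ModelChoice("small-model", 1.0)        # hypothetical cheap model
STRONG = ModelChoice("frontier-model", 20.0)   # hypothetical expensive model

HARD_WORDS = {"prove", "derive", "debug", "optimize", "why"}

def toy_difficulty(query: str) -> float:
    """Stand-in difficulty score in [0, 1]; a real router would be a trained model."""
    words = [w.strip("?.,!") for w in query.lower().split()]
    length_score = min(len(words) / 50.0, 1.0)
    keyword_score = 1.0 if any(w in HARD_WORDS for w in words) else 0.0
    return max(length_score, keyword_score)

def pick_model(
    query: str,
    score: Callable[[str], float] = toy_difficulty,
    threshold: float = 0.5,
) -> ModelChoice:
    """Send the query to the strong model only when the score clears the threshold."""
    return STRONG if score(query) >= threshold else SMALL
```

The threshold is the cost dial: raise it and more traffic stays on the cheap model, lower it and more goes to the strong one. Where to set it is an empirical question that only an evaluation on your own traffic can answer.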
The failure mode nobody plans for
A router that picks wrong doesn't fail loudly. It confidently sends a billing question to the troubleshooting flow, which gives a fluent, completely useless answer, and the user never knows a wrong turn was taken. Misrouting cascades — everything downstream inherits the mistake.
So build the route you hope never fires: the low-confidence fallback. When the classifier's top match is weak, don't force a guess — send it to a clarification step, or to a human, or to a general handler that at least won't pretend. An embedding router gives you a similarity score; threshold it. An LLM router can return "unsure"; let it. The teams that get burned are the ones whose router has no way to say I don't know, so it always says something.
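The thresholding idea can be sketched without any library. This assumes you already have embedding vectors for the query and for each route's examples; the three-dimensional toy vectors, the 0.75 threshold, and the `clarify` fallback name are all hypothetical, and a real threshold would need tuning on labeled traffic.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def route_with_fallback(
    query_vec: Sequence[float],
    route_vecs: Dict[str, List[Sequence[float]]],
    threshold: float = 0.75,
    fallback: str = "clarify",
) -> Tuple[str, float]:
    """Pick the route with the most similar example, or the fallback if it is weak."""
    best_name, best_score = fallback, -1.0
    for name, examples in route_vecs.items():
        for example in examples:
            score = cosine(query_vec, example)
            if score > best_score:
                best_name, best_score = name, score
    if best_score < threshold:
        return fallback, best_score  # the router's way of saying "I don't know"
    return best_name, best_score

# Toy vectors standing in for real embeddings.
ROUTES = {"orders": [[1.0, 0.0, 0.0]], "support": [[0.0, 1.0, 0.0]]}
```

Returning the score alongside the route name makes it possible to log near-misses, which is one way to notice silent misrouting before users do.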
Routing earns its place the moment your agent faces inputs that don't share a path. It stops earning it the moment there's really only one sensible thing to do — at which point a router is just an if statement with a model attached, billing you for the privilege.