Three years ago, getting a model to reason meant tricking it. You appended "let's think step by step" and watched accuracy jump, half-amazed it worked. By early 2026 that trick is built into the models themselves — the "thinking" model variants that burn extra compute deliberating before they answer, no magic phrase required.
That shift matters, but it didn't make the old techniques obsolete. It changed where you apply them. Reasoning used to be something you coaxed out of a prompt; now part of it lives in the model and part of it lives in the scaffolding you build around it. Knowing which part is which is the difference between an agent that thinks and one that just spends tokens looking thoughtful.
The base move: make thinking visible
Chain-of-Thought was the unlock, and the insight behind it is almost embarrassingly simple. A model asked for nothing but the answer has to start committing with its very first output token, with no room to work. Ask it to show its reasoning first, and those intermediate tokens become a scratchpad the final answer can stand on. The model isn't smarter; it just gave itself room to think out loud before committing.
It works because of how these models generate: each token conditions on the ones before it, so a chain of reasoning tokens literally builds better context for the answer token. For the trained "thinking" models, this happens internally — they've been post-trained to produce long reasoning traces before answering. The lesson generalizes either way: hard problems need intermediate steps, whether you prompt for them or the model produces them on its own.
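Here is a minimal sketch of the prompting side of that idea. The function names and the "Answer:" convention are assumptions made up for illustration, not part of any library: one builder asks for the answer directly, the other asks for the reasoning first, and a small parser pulls the final answer off the end of the scratchpad.

```python
import re
from typing import Optional

# Hypothetical instruction; any wording that asks for the steps first would do.
COT_SUFFIX = (
    "Think through the problem step by step, then give the final answer "
    "on its own line as 'Answer: <value>'."
)


def build_direct_prompt(question: str) -> str:
    """Ask for the answer with no room for intermediate reasoning."""
    return f"{question}\nAnswer:"


def build_cot_prompt(question: str) -> str:
    """Ask for reasoning first, so the answer is conditioned on it."""
    return f"{question}\n{COT_SUFFIX}"


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last 'Answer: ...' line in a completion, or None if absent."""
    matches = re.findall(r"^Answer:\s*(.+)$", completion, flags=re.MULTILINE)
    return matches[-1].strip() if matches else None


# A hand-written completion standing in for real model output.
trace = "17 apples minus 5 is 12.\n12 split three ways is 4.\nAnswer: 4"
final = extract_final_answer(trace)
```

With a thinking model you would likely skip the suffix and keep only the parser, since the trace is produced without being asked for.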
But CoT is a single chain. It commits to one line of reasoning, and if step two goes wrong, every step after inherits the mistake. Everything more advanced is a different answer to the same question: what do you do about that single fragile chain?
Three answers to the fragile chain
Self-Consistency. Don't trust one chain: sample several, each at a nonzero temperature so the paths actually differ, and take the majority answer. The intuition: there are many wrong paths and they tend to disagree with each other, while correct reasoning tends to converge. If seven of ten sampled chains land on the same answer, that agreement is a real signal. The cost is literal, since you're paying for ten chains instead of one, but for problems where being right matters more than being cheap, it's one of the most reliable accuracy bumps there is.
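The vote itself is only a few lines. This is a sketch under one big assumption: `sample_answer` is a hypothetical callable that runs one full reasoning chain at nonzero temperature and returns just the final answer. Here it is replaced by canned answers so the example runs on its own.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistency(sample_answer: Callable[[], str], n_samples: int = 10) -> Tuple[str, float]:
    """Sample several chains and return (majority answer, share of votes)."""
    answers = [sample_answer() for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples


# Canned final answers standing in for ten sampled chains (not real model output).
canned = iter(["42", "42", "41", "42", "40", "42", "42", "39", "42", "42"])
answer, agreement = self_consistency(lambda: next(canned), n_samples=10)
print(answer, agreement)  # prints: 42 0.7
```

The agreement share is worth keeping around: a low value is a cheap hint that the question deserves a closer look.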
Tree of Thoughts. Instead of committing to one chain, branch. At each step generate several possible next thoughts, evaluate how promising each looks, and explore the good ones while pruning the dead ends — search, basically, over reasoning steps. ToT shines on problems with real branching structure: puzzles, planning, anything where you need to look ahead and back out of a bad path. It's also heavy — you're running an evaluation at every node — so it's overkill for anything a straight chain solves.
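To make the branch, evaluate, prune loop concrete, here is a toy sketch. It is a beam-style search over partial solutions, not the method from the paper: the `propose` and `score` functions are hypothetical stand-ins for what would be model calls in a real system, and the puzzle (reach 13 by adding steps of 7, 5, or 3 without overshooting) is invented for the example.

```python
from typing import Callable, Iterable, List, Optional, Tuple

State = Tuple[int, ...]  # the partial "chain of thoughts": steps chosen so far

TARGET = 13
STEPS = (7, 5, 3)


def propose(state: State) -> Iterable[State]:
    """Generate candidate next thoughts; overshooting moves are dropped."""
    return [state + (s,) for s in STEPS if sum(state) + s <= TARGET]


def score(state: State) -> int:
    """Heuristic stand-in for an evaluator: closer to the target looks better."""
    return sum(state) - TARGET


def is_goal(state: State) -> bool:
    return sum(state) == TARGET


def tree_of_thoughts(
    root: State,
    propose: Callable[[State], Iterable[State]],
    score: Callable[[State], int],
    is_goal: Callable[[State], bool],
    beam_width: int = 3,
    max_depth: int = 5,
) -> Optional[State]:
    """Expand every kept state, return a goal if found, else keep the best few."""
    frontier: List[State] = [root]
    for _ in range(max_depth):
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            return None  # every kept branch hit a dead end
        for candidate in candidates:
            if is_goal(candidate):
                return candidate
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return None


greedy = tree_of_thoughts((), propose, score, is_goal, beam_width=1)
branched = tree_of_thoughts((), propose, score, is_goal, beam_width=3)
print(greedy)    # prints: None
print(branched)  # prints: (7, 3, 3)
```

With a beam of one, the search is a single chain: it takes 7, then 5, reaches 12, and is stuck. With a beam of three, the less promising-looking branch through 7 and 3 survives long enough to finish. That is the whole argument for paying for the extra evaluations.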
ReAct. The other two reason in a closed room. ReAct opens a door to the world. It interleaves reasoning with acting: think, take an action (call a tool, run a search), observe the result, think again with that new fact in hand. This is the one that turns a reasoner into an agent, because pure reasoning can't tell you today's stock price or whether a file exists — it can only reason over what it already knows. ReAct lets reasoning and reality correct each other in a loop, which is why it underpins most tool-using agents you'll actually build.
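The loop is easier to see in code than in prose. What follows is a bare-bones sketch, not any framework's API: the `Thought:` / `Action:` / `Final Answer:` line format is one common convention assumed here, `scripted_model` is a hand-written stub standing in for a real model call, and the `file_exists` tool checks a fake in-memory file set.

```python
from typing import Callable, Dict, List, Optional

FAKE_FILES = {"notes.txt"}  # stand-in for a real filesystem


def file_exists(path: str) -> str:
    """Toy tool: report whether a path is in the fake file set."""
    return str(path in FAKE_FILES)


def scripted_model(prompt: str) -> str:
    """Stub model: acts first, then answers from what it observed."""
    if "Observation:" not in prompt:
        return "Thought: I can't know this without checking.\nAction: file_exists notes.txt"
    if "Observation: True" in prompt:
        return "Thought: The check settles it.\nFinal Answer: yes, notes.txt exists"
    return "Thought: The check settles it.\nFinal Answer: no, notes.txt is missing"


def react_loop(
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> Optional[str]:
    """Alternate model steps and tool calls until a final answer appears."""
    transcript: List[str] = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model("\n".join(transcript))
        transcript.append(step)
        for line in step.splitlines():
            if line.startswith("Final Answer:"):
                return line[len("Final Answer:"):].strip()
            if line.startswith("Action:"):
                name, _, arg = line[len("Action:"):].strip().partition(" ")
                tool = tools.get(name)
                result = tool(arg) if tool else f"unknown tool {name!r}"
                # Feed the result back so the next thought can use it.
                transcript.append(f"Observation: {result}")
                break
    return None  # step budget exhausted without an answer


result = react_loop(scripted_model, {"file_exists": file_exists}, "Does notes.txt exist?")
print(result)  # prints: yes, notes.txt exists
```

The `max_steps` cap matters more than it looks: without it, a model that never emits a final answer keeps calling tools forever.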
Picking one without overthinking it
The instinct after reading the papers is to reach for the fanciest technique. Resist it. These have a cost gradient, and you climb it only when forced.
- Easy, knowledge-based question? Just answer. Modern models don't need CoT for "what's the capital of France," and forcing it wastes tokens.
- Genuinely hard reasoning — math, logic, multi-step deduction? CoT, or a thinking model, which is CoT baked in.
- Right answer matters a lot and you can afford it? Self-Consistency on top — sample and vote.
- The problem needs exploration and backtracking? Tree of Thoughts, knowing you're paying for the search.
- The agent needs information or actions from outside itself? ReAct. This is the default for tool-using agents, full stop.
The mistake I see most is reaching for Tree of Thoughts on a problem a single chain handles fine — paying for a search tree to answer a question with one obvious path. Match the machinery to the problem's actual shape, not to how sophisticated you want to look.
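The checklist above can be written down as a tiny router. Treat it as a sketch: the `Task` flags and strategy names are made up for illustration, and the precedence when several flags are set is my assumption, not something the checklist specifies.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a request; in practice you'd infer these."""
    needs_outside_world: bool = False  # tools, search, fresh data
    needs_backtracking: bool = False   # real branching structure
    hard_reasoning: bool = False       # math, logic, multi-step deduction
    high_stakes: bool = False          # being right beats being cheap


def choose_strategy(task: Task) -> str:
    """Pick the cheapest technique that fits the problem's shape."""
    if task.needs_outside_world:
        return "react"
    if task.needs_backtracking:
        return "tree_of_thoughts"
    if task.hard_reasoning and task.high_stakes:
        return "cot+self_consistency"
    if task.hard_reasoning:
        return "cot"
    return "direct"


print(choose_strategy(Task()))                     # prints: direct
print(choose_strategy(Task(hard_reasoning=True)))  # prints: cot
```

The default is the cheapest option, and every step up has to be earned by a flag.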
The contrarian bit
Now the thing the technique catalog won't tell you: more reasoning is not free, and it is not always better.
Every one of these spends compute and latency to maybe buy accuracy, and the return curve flattens fast. There's a real failure mode, overthinking, where a model handed a trivial question and told to reason at length talks itself out of a correct first instinct and second-guesses its way to a worse answer. The thinking models make this concrete: point one at "what's 2+2" and it may burn a thousand tokens deliberating, latency and cost for nothing. The skill isn't applying maximum reasoning. It's calibrating reasoning to difficulty: cheap and fast on the easy stuff, heavy machinery only where the problem genuinely demands it.
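A back-of-envelope cost model makes the gradient visible. Everything here is an assumption for illustration: the token counts are invented, not measured, and the Tree of Thoughts figure is a loose upper bound that charges one generation and one evaluation per candidate at every level.

```python
def self_consistency_tokens(chain_tokens: int, n_samples: int) -> int:
    """n sampled chains cost n times one chain."""
    return chain_tokens * n_samples


def tot_tokens_upper_bound(
    thought_tokens: int, eval_tokens: int, branching: int, beam_width: int, depth: int
) -> int:
    """Each level: every kept state proposes `branching` thoughts, each evaluated."""
    return depth * beam_width * branching * (thought_tokens + eval_tokens)


# Hypothetical sizes, chosen only to show the shape of the curve.
single_chain = 300
voted = self_consistency_tokens(single_chain, n_samples=10)
searched = tot_tokens_upper_bound(
    thought_tokens=50, eval_tokens=20, branching=3, beam_width=3, depth=4
)
print(single_chain, voted, searched)  # prints: 300 3000 2520
```

Neither multiplier says anything about whether the accuracy gain shows up. That part you have to measure on your own tasks.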
Which is the same lesson the whole field keeps relearning in different clothes. The advance of the last few years wasn't a model that always thinks harder. It was getting a sense for when hard thinking pays — and these techniques are the dials, not the destination. A reasoning agent that reasons about everything is just a slow, expensive one. The good ones know when to stop and answer.