Series
Gen AI Foundations
Transformers, attention, prompting, and core concepts.
Cost and Latency Engineering
The cheapest, fastest token is the one you never generate.
LLM-as-Judge
Using one language model to grade another feels like asking the fox to audit the henhouse and trust…
Custom Evals Are the Moat
Your model scores 89 on MMLU.
Multimodal by Default
"A picture is worth a thousand words" is wrong by about an order of magnitude.
Why LLMs Hallucinate
Hallucination is not the model malfunctioning.
Structured Output and Constrained Decoding
There are two ways to get JSON out of a language model.
Context Windows and Lost-in-the-Middle
You bought a model with a giant context window.
The Transformer, Intuitively
Most explanations of the transformer open with a wall of matrices.
How LLMs Actually Generate Text
A language model does not write a sentence.