Series
Fine-tuning
Adapting models: LoRA, RLHF, DPO, and when to bother.
Efficient Training with Unsloth
Same GPU, same model, same LoRA config — and the run finishes in a third of the time using most of t
Data Preparation for Fine-Tuning
Nobody demos the data cleaning.
Knowledge and Chain-of-Thought Distillation
You proved the task is solvable.
GRPO, PPO, and KTO with TRL
DPO answered the common case.
DPO vs RLHF
For a couple of years, teaching a model to prefer good answers over bad ones meant running three mod
LoRA vs QLoRA vs DoRA vs Full Fine-Tuning
Four methods, one question: when you sit down to fine-tune, which do you reach for?
QLoRA: Fine-Tuning on One GPU
Try to full-fine-tune an 8B model on a single 24 GB consumer card and you won't get to the first tra
LoRA, Explained
A 7-billion-parameter model has 7 billion knobs.
The Ladder: Prompt, RAG, Fine-tune, Distill
Most fine-tuning projects should have stayed a prompt.