Four methods, one question: when you sit down to fine-tune, which do you reach for? The marketing answer is "it depends." The useful answer is a short decision and a clear sense of what each one is trading away. Let me give you both.
The four aren't really rivals on equal footing. Three of them — LoRA, QLoRA, DoRA — are variations on the same idea (freeze the base, train a small update), and full fine-tuning is the heavyweight baseline they're all trying to approximate cheaply. So the real question is two questions: do I need full fine-tuning at all, and if not, which of the cheap three.
The lineup, briefly
Full fine-tuning updates every parameter. Maximum capacity, maximum cost. You need memory for the weights, the gradients, and two optimizer copies (Adam's two moment estimates): call it 4× the model size if everything is held in 16-bit, more if the optimizer state is kept in 32-bit, and all of that before you even count activations. It's the gold standard for quality when the change is large and you have the hardware, and it's overkill for almost everything else.
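To put a number on that 4×, here's a back-of-the-envelope sketch. The function name and the 7B example are mine, not from any library, and it assumes every copy is stored at the same precision, which real mixed-precision setups often don't do.

```python
def full_ft_state_gb(n_params: float, bytes_per_value: int = 2,
                     optimizer_copies: int = 2) -> float:
    """Rough GB for weights + gradients + optimizer state.

    Assumption: every copy uses the same precision. Activations,
    buffers, and framework overhead are deliberately left out.
    """
    copies = 1 + 1 + optimizer_copies  # weights, gradients, optimizer moments
    return n_params * bytes_per_value * copies / 1e9


# Hypothetical 7B-parameter model with everything held in 16-bit (2 bytes):
print(full_ft_state_gb(7e9))  # 56.0
```

Treat the result as a floor, not a budget: 32-bit optimizer state and activations push the real figure higher.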
LoRA freezes the base and trains two low-rank matrices whose product is the weight update. Roughly 0.1–1% of the parameters train. Tiny adapters, cheap memory, mergeable at the end for zero inference cost. This is the default.
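A toy sketch of both claims, the small parameter count and the free merge, in plain Python so it runs anywhere. The helper names and the 2×2 matrices are illustrative only. The `alpha / r` scaling follows the usual LoRA convention, and the fraction is per adapted matrix, so the whole-model figure depends on which modules you target.

```python
def lora_trainable_fraction(d_out: int, d_in: int, r: int) -> float:
    """Adapter params (B: d_out x r, A: r x d_in) relative to the full matrix."""
    return r * (d_out + d_in) / (d_out * d_in)


def matmul(a, b):
    """Plain list-of-rows matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w0, b, a, alpha, r):
    """Fold the adapter into the base: W = W0 + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(b, a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]


# A 4096 x 4096 projection at rank 16 trains under 1% of that matrix.
print(lora_trainable_fraction(4096, 4096, 16))  # 0.0078125

# Toy rank-1 adapter on a 2 x 2 weight.
w0 = [[1.0, 0.0], [0.0, 1.0]]
b_mat = [[1.0], [2.0]]
a_mat = [[3.0, 4.0]]
merged = merge_lora(w0, b_mat, a_mat, alpha=2, r=1)
print(merged)  # [[7.0, 8.0], [12.0, 17.0]]
```

After merging there is one matrix and one matmul per layer, which is why the adapter adds nothing at inference time.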
QLoRA is LoRA with the frozen base stored in 4 bits. Same adapters, same training loop, a fraction of the resident memory. Slightly slower per step because the base is dequantized on the fly. This is the default when memory is tight.
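The "fraction of the resident memory" part is simple arithmetic, sketched below. The function name and the 70B figure are my own illustration. Real 4-bit formats also store quantization constants, so the true footprint sits a little above the flat 4 bits assumed here.

```python
def base_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """GB needed just to hold the frozen base weights.

    Assumption: a flat bits-per-weight cost, with no quantization
    metadata, adapters, gradients, or activations included.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical 70B-parameter base model:
print(base_weight_gb(70e9, 16))  # 140.0
print(base_weight_gb(70e9, 4))   # 35.0
```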
DoRA splits each weight into two parts, its magnitude (how big) and its direction (which way it points), and applies a LoRA-style low-rank update only to the direction, while learning the magnitude separately. The 2024 paper's argument is that full fine-tuning and LoRA change weights in measurably different patterns, and decoupling magnitude from direction lets a low-rank method behave more like full fine-tuning. In practice it tends to close part of the quality gap between LoRA and full fine-tuning, especially at low ranks, for a small extra cost.
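Here is a minimal sketch of that decomposition, written from the paper's description rather than from any library's internals. It assumes column-wise norms, initializes the magnitude to the base weight's column norms, and leaves out LoRA's `alpha / r` scaling for brevity. All names are illustrative.

```python
import math


def matmul(a, b):
    """Plain list-of-rows matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def column_norms(w):
    """L2 norm of each column of a list-of-rows matrix."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*w)]


def dora_weight(w0, b, a, m):
    """W' = m * (W0 + B @ A) / ||W0 + B @ A||, normalized column by column.

    m holds one learned magnitude per column; B @ A only steers direction.
    """
    delta = matmul(b, a)
    v = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(w0, delta)]
    norms = column_norms(v)
    return [[m_j * x / n_j for x, m_j, n_j in zip(row, m, norms)] for row in v]


w0 = [[3.0, 0.0], [4.0, 2.0]]
m = column_norms(w0)            # magnitude starts at the base column norms
b_zero = [[0.0], [0.0]]         # B starts at zero, as in LoRA
a_mat = [[1.0, 1.0]]

# With B at zero the layer reproduces the base weight.
print(dora_weight(w0, b_zero, a_mat, m))  # [[3.0, 0.0], [4.0, 2.0]]
```

Whatever the low-rank update does to the direction, each column of the result keeps norm `m`, which is the decoupling the paper is after.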
The honest comparison
| | Full FT | LoRA | QLoRA | DoRA |
|---|---|---|---|---|
| Trainable params | 100% | ~0.1–1% | ~0.1–1% | ~0.1–1% + magnitudes |
| Base precision | 16-bit | 16-bit | 4-bit | 16-bit (or 4-bit) |
| Relative memory | very high | low | lowest | low |
| Per-step speed | baseline | fast | slower (dequant) | slightly slower than LoRA |
| Quality ceiling | highest | high | high | high, often > LoRA at low rank |
| Mergeable to base | n/a | yes | yes | yes |
| Ship as small adapter | no | yes | yes | yes |
The columns that don't move are as telling as the ones that do. All three PEFT methods produce a small, mergeable adapter; all three train a sliver of the parameters. They differ on which resource they spend — QLoRA buys memory with compute, DoRA buys a little quality with a little compute and bookkeeping — not on the basic shape of the deal.
How I'd actually choose
Start at LoRA. Not because it's always best, but because it's the cheapest thing that's usually good enough, and you want a baseline number before you spend more. Train it, measure it against a real eval, and let the result push you to a different rung.
If you ran out of memory before quality was even the question — go to QLoRA. It's a one-config change and it's the reason you can fine-tune a 70B-class model on a single big card at all. Accept the slightly slower steps; you're trading time for the ability to run at all.
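For the record, the "one-config change" looks roughly like this with Hugging Face Transformers and bitsandbytes. Take it as a sketch: the NF4 and bfloat16 choices are common defaults I'm assuming rather than requirements, and the model id in the comment is a placeholder.

```python
import torch
from transformers import BitsAndBytesConfig

# Store the frozen base in 4-bit; compute still happens in bf16 after dequantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # assumed choice; "fp4" is the alternative
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed choice for the compute dtype
)

# Passed at load time; the LoRA config and training loop stay as they were:
# model = AutoModelForCausalLM.from_pretrained(
#     "your-org/your-base-model",           # hypothetical placeholder id
#     quantization_config=bnb_config,
# )
```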
If LoRA trained fine but the quality isn't there, and you suspect the low-rank update itself is the limit, try DoRA before you jump to full fine-tuning. In PEFT it's a single flag on the same config, so the experiment is nearly free, and low ranks are where DoRA most often earns its keep.
from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_dora=True,  # the entire difference between LoRA and DoRA here
    task_type="CAUSAL_LM",
)
If you've exhausted the adapter methods and the task genuinely needs more capacity than a low-rank update can carry — a large domain shift, a new modality of behavior, a model you're effectively re-specializing — and you have the GPUs, then full fine-tuning is the right tool and you should stop apologizing for using it. The mistake isn't doing full fine-tuning. The mistake is starting there.
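If it helps to see the ladder as code, here is one way to write it down. The function, its argument names, and the returned strings are mine, and the final branch (no hardware for full fine-tuning) is an assumption, since that case isn't covered above.

```python
from typing import Optional


def next_step(fits_16bit: bool,
              quality_ok: Optional[bool] = None,
              dora_tried: bool = False,
              full_ft_hardware: bool = False) -> str:
    """Return the next rung to try, following the order described above."""
    # Memory decides whether the base is loaded in 16-bit or 4-bit.
    base = "LoRA" if fits_16bit else "QLoRA"
    if quality_ok is None:
        return f"train a {base} baseline and measure it"
    if quality_ok:
        return f"ship the {base} adapter"
    if not dora_tried:
        return f"enable DoRA on the same {base} config"
    if full_ft_hardware:
        return "full fine-tuning"
    # Assumption: not covered in the text above.
    return "adapter options exhausted; full fine-tuning needs more hardware"


print(next_step(fits_16bit=True))
print(next_step(fits_16bit=False, quality_ok=False))
```

The point of writing it this way is the order of the checks: memory first, then a measured baseline, then the cheap quality experiment, and only then the expensive one.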
The trap in the middle
There's a tempting move I'd warn against: cranking LoRA's rank higher and higher to chase full-fine-tuning quality. Past a point you're spending the memory you were trying to save and you've left the regime where low-rank is a good fit, and you'd have been better off with DoRA at a sane rank or full fine-tuning outright. High rank is not a free dial toward quality; it's a signal that the method might be wrong for the change you're making.
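The cost side of that trap is easy to check by hand. This sketch only counts parameters for a single hypothetical 4096 × 4096 matrix and says nothing about quality. The function names are illustrative.

```python
def lora_fraction(d_out: int, d_in: int, r: int) -> float:
    """Adapter parameter count relative to the full d_out x d_in matrix."""
    return r * (d_out + d_in) / (d_out * d_in)


def break_even_rank(d_out: int, d_in: int) -> float:
    """Rank at which the adapter has as many parameters as the matrix itself."""
    return d_out * d_in / (d_out + d_in)


for r in (16, 64, 256, 1024):
    print(r, f"{lora_fraction(4096, 4096, r):.2%}")
# 16 0.78%
# 64 3.12%
# 256 12.50%
# 1024 50.00%

print(break_even_rank(4096, 4096))  # 2048.0
```

Trainable parameters grow linearly with rank, and gradients and optimizer state grow with them, so by rank 1024 this adapter already carries half of the full matrix's training state.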
So: four methods, but really one ladder of effort with two side-doors for memory and for quality. LoRA is the floor you measure from. QLoRA is the door you take when the card is too small. DoRA is the door you take when the rank is too low. Full fine-tuning is the room at the top you enter only after the cheaper doors are closed. Pick by the constraint that's actually binding you — not by whichever acronym shipped most recently.