Fine-tuning vs RAG vs prompt engineering: which one and when.

Three different tools for three different problems. Choosing the wrong one wastes months. Here's the decision tree.

Three techniques often get conflated in AI procurement: prompt engineering, retrieval-augmented generation (RAG), and fine-tuning. They solve different problems at very different costs, and picking the wrong one burns budget as well as months.

The decision tree

Q1: Is the model wrong about FACTS specific to your organization?
- Yes → RAG. Give the model access to your documents.
- No → go to Q2.

Q2: Is the model wrong about STYLE / VOICE / FORMAT specific to your organization?
- Yes → Fine-tune, but try strong system prompts first and fine-tune only if they aren't enough.
- No → go to Q3.

Q3: Is the model wrong because your prompt is unclear or missing structure?
- Yes → Prompt engineering.
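The tree is simple enough to write down as code. The sketch below is a hypothetical helper (`choose_fix` and the `Fix` enum are illustrative names, not from any library): it asks the three questions in order and returns on the first "yes". The tree as written has no outcome when all three answers are "no", so the sketch returns an explicit `NONE` instead of guessing.

```python
from enum import Enum


class Fix(Enum):
    """Possible outcomes of the three-question tree (illustrative names)."""
    RAG = "retrieval-augmented generation"
    FINE_TUNE = "fine-tuning (after trying strong system prompts)"
    PROMPT = "prompt engineering"
    NONE = "no fix indicated by this tree"


def choose_fix(wrong_about_org_facts: bool,
               wrong_about_org_style: bool,
               prompt_unclear: bool) -> Fix:
    """Walk Q1 -> Q2 -> Q3 in order; the first 'yes' decides."""
    if wrong_about_org_facts:   # Q1: facts specific to your organization
        return Fix.RAG
    if wrong_about_org_style:   # Q2: style / voice / format specific to your org
        return Fix.FINE_TUNE
    if prompt_unclear:          # Q3: prompt is unclear or missing structure
        return Fix.PROMPT
    return Fix.NONE             # the original tree has no branch for this case
```

For example, `choose_fix(False, True, False)` returns `Fix.FINE_TUNE`, whose value is a reminder to exhaust system prompts first.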

Why order matters

Most "we need fine-tuning" conversations are actually prompt engineering problems wearing a fine-tuning costume. Fine-tuning costs $5k-$50k for small models and $100k+ for large ones, and it takes weeks. Prompt engineering takes hours and costs next to nothing.

90% of the time, "the model doesn't understand our business" really means "we haven't given it a clear enough prompt." Fix the prompt before you touch the model.
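In practice, "a clear enough prompt" is mostly a matter of structure. As a minimal sketch, assuming a chat model that takes a single system prompt string, the hypothetical `build_system_prompt` below assembles explicitly labeled sections: role, context, task, rules, output format, and optional worked examples. The section names are one reasonable convention, not a standard.

```python
def build_system_prompt(role: str,
                        context: str,
                        task: str,
                        rules: list[str],
                        output_format: str,
                        examples: list[tuple[str, str]]) -> str:
    """Assemble a system prompt from labeled sections (hypothetical layout)."""
    parts = [
        f"# Role\n{role}",
        f"# Context\n{context}",
        f"# Task\n{task}",
        # One bullet per rule so constraints are easy to scan and test.
        "# Rules\n" + "\n".join(f"- {rule}" for rule in rules),
        f"# Output format\n{output_format}",
    ]
    if examples:
        # Optional input/output pairs (few-shot examples).
        shots = "\n\n".join(f"Input: {inp}\nOutput: {out}" for inp, out in examples)
        parts.append(f"# Examples\n{shots}")
    return "\n\n".join(parts)
```

Keeping each section separate makes the iterative loop easier: you can change one section, rerun your test cases, and see which change moved the score.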

RAG vs fine-tuning: the actual difference

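RAG changes what the model has access to. The model itself is unchanged; relevant context from your documents is added to the prompt at query time. Update the corpus and RAG reflects the change as soon as the new documents are indexed.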
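To make "model unchanged, context added at query time" concrete, here is a toy sketch. `TinyRetriever` and `build_rag_prompt` are hypothetical names, and the keyword-overlap score is a deliberately simple stand-in for the embedding search a production RAG system would use. It shows two properties from the text: replacing a document changes the very next retrieval with no retraining, and every retrieved chunk carries an ID the model can cite.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; crude but enough for a toy retriever."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyRetriever:
    """Keyword-overlap retriever standing in for a real vector index."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Adding or replacing a document affects the next query immediately;
        # no model weights change.
        self.docs[doc_id] = text

    def search(self, query: str, k: int = 2) -> list[tuple[str, str]]:
        q = Counter(tokenize(query))
        scored = []
        for doc_id, text in self.docs.items():
            d = Counter(tokenize(text))
            score = sum(min(q[t], d[t]) for t in q)  # count of shared terms
            if score > 0:
                scored.append((score, doc_id, text))
        scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, then by ID
        return [(doc_id, text) for _, doc_id, text in scored[:k]]


def build_rag_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    # Paste retrieved chunks with their IDs so the model can cite them.
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in hits)
    return ("Answer using only the sources below and cite their IDs.\n"
            f"{context}\n\nQuestion: {question}")
```

Swapping the overlap score for vector similarity changes retrieval quality, not the shape of the pipeline.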

Fine-tuning changes what the model knows by updating its weights. The training data is baked in, so updating that knowledge means retraining. It is slow and expensive, but it produces behavior ingrained more deeply than any prompt can achieve.

When to fine-tune:
- You need a specific writing style or voice that prompting can't easily match.
- You have a very specific task (e.g., classifying legal contracts into 12 categories) and have already maxed out prompting.
- You need cheaper inference: a small fine-tuned model can match a large general one on a narrow task.
- Compliance requires bounded behavior baked into the model, not merely promised by a prompt.
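The cheaper-inference argument comes down to a break-even calculation. The sketch below assumes a one-time fine-tuning cost and flat per-query prices; the numbers in it are hypothetical placeholders, not vendor quotes, and it ignores retraining and hosting costs, which push the break-even point further out.

```python
def break_even_queries(fine_tune_cost: float,
                       large_cost_per_query: float,
                       small_cost_per_query: float) -> float:
    """Queries needed before a cheaper fine-tuned small model repays its training cost."""
    savings_per_query = large_cost_per_query - small_cost_per_query
    if savings_per_query <= 0:
        return float("inf")  # the small model never pays for itself
    return fine_tune_cost / savings_per_query


# Hypothetical inputs for illustration only:
# $30,000 fine-tune, $0.02/query on a large model, $0.005/query on the small one.
print(f"{break_even_queries(30_000, 0.02, 0.005):,.0f}")  # 2,000,000
```

If your expected volume is far below the break-even figure, the cost case for fine-tuning does not hold on its own.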

When NOT to fine-tune (use RAG instead):
- You need the model to know your facts.
- Your knowledge changes regularly.
- You need citations or provenance.

The actual 2026 default

Build with strong prompt engineering plus RAG. Add fine-tuning only when you have evidence that prompt + RAG has hit a ceiling. 90% of enterprise AI deployments stop at prompt + RAG, with no fine-tuning at all.
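"Evidence that prompt + RAG has hit a ceiling" can be as simple as a scored test set rerun after every prompt or retrieval change. The sketch below is one way to encode that, assuming each iteration yields a list of pass/fail results; `pass_rate`, `has_plateaued`, and the default `window` and `min_gain` thresholds are hypothetical and should be tuned to your own evaluation.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of test cases that passed in one iteration."""
    if not results:
        raise ValueError("need at least one test case")
    return sum(results) / len(results)


def has_plateaued(pass_rates: list[float],
                  window: int = 3,
                  min_gain: float = 0.01) -> bool:
    """True if the best score in the last `window` iterations beats the best
    earlier score by less than `min_gain` (absolute)."""
    if len(pass_rates) <= window:
        return False  # not enough history to call a ceiling
    best_before = max(pass_rates[:-window])
    best_recent = max(pass_rates[-window:])
    return best_recent - best_before < min_gain
```

For example, pass rates of 0.62, 0.74, 0.81, 0.815, 0.812, 0.818 count as a plateau under the defaults, because the last three iterations beat the earlier best by less than one point.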

Common mistakes

1. Fine-tuning first. Spending $30k on a fine-tune that a 50-line system prompt would have matched.
2. Indexing badly. "We have RAG," but retrieval pulls irrelevant chunks 40% of the time. Retrieval quality IS the system.
3. Treating prompt engineering as one-shot. Real prompt engineering is iterative: write, test on 50 cases, refine, test, refine.
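Mistake 2 is measurable before it reaches users. Given a small set of test queries where a human has marked which chunks are relevant, the hypothetical `retrieval_report` helper below computes the share of queries that retrieve at least one relevant chunk (hit rate) and the share of retrieved chunks that are relevant (precision). The "irrelevant chunks 40% of the time" symptom would show up as an `irrelevant_share` near 0.4. The `retrieve` callable is an assumption: plug in whatever returns chunk IDs from your own pipeline.

```python
from typing import Callable


def retrieval_report(labeled: dict[str, set[str]],
                     retrieve: Callable[[str], list[str]]) -> dict[str, float]:
    """labeled: test query -> chunk IDs a human marked relevant.
    retrieve: your pipeline, returning the chunk IDs it pulls for a query."""
    hits = 0
    relevant_pulled = 0
    total_pulled = 0
    for query, relevant in labeled.items():
        pulled = retrieve(query)
        if relevant & set(pulled):
            hits += 1  # at least one relevant chunk came back
        relevant_pulled += sum(1 for chunk in pulled if chunk in relevant)
        total_pulled += len(pulled)
    precision = relevant_pulled / total_pulled if total_pulled else 0.0
    return {
        "hit_rate": hits / len(labeled),
        "precision": precision,
        "irrelevant_share": 1.0 - precision if total_pulled else 0.0,
    }
```

Running this on the same labeled set after every indexing or chunking change gives you a number to track instead of an impression.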

Apache-3's AI Readiness curriculum spends a full week on this decision tree because getting it wrong burns more agency money than nearly any other AI procurement mistake.