Retrieval-augmented generation (RAG) for working professionals.
RAG gives an AI access to your own data without retraining it. Cheaper than fine-tuning, faster to ship, easier to audit. Default architecture for enterprise AI in 2026.
RAG (retrieval-augmented generation) is the architecture that lets a stock AI model answer questions about YOUR documents without expensive retraining. Default for enterprise AI in 2026.
How it works
1. Index: take your documents. Split into chunks. Convert each chunk to a vector. Store in a vector database.
2. Retrieve: when a user asks a question, convert the question to a vector. Find the most similar chunks.
3. Generate: send question + retrieved chunks to the LLM as context. The LLM answers using the chunks as source of truth.
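The three steps above can be sketched in a few dozen lines of plain Python. This is a minimal illustration under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the final LLM call is left out (the sketch stops at the prompt you would send). The document names and contents are made up for the example.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=40):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): sparse word-count vector, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Step 1, Index: chunk each document and store (source, chunk, vector)."""
    index = []
    for source, text in documents.items():
        for piece in chunk(text):
            index.append((source, piece, embed(piece)))
    return index


def retrieve(index, question, k=2):
    """Step 2, Retrieve: rank stored chunks by similarity to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return ranked[:k]


def build_prompt(question, hits):
    """Step 3, Generate: assemble the context the LLM would receive."""
    context = "\n".join(f"[{source}] {piece}" for source, piece, _ in hits)
    return (
        "Answer using only the context below. "
        "If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical two-document corpus.
documents = {
    "pto_policy.txt": "Employees accrue 1.5 days of paid time off per month. "
                      "Unused days roll over up to 10 days.",
    "expense_policy.txt": "Meals are reimbursed up to 60 dollars per day while traveling. "
                          "Receipts are required.",
}
question = "How many days of paid time off do employees accrue?"
index = build_index(documents)
hits = retrieve(index, question, k=1)
prompt = build_prompt(question, hits)
```

Keeping the source name next to each chunk is what makes citation possible later: whatever the LLM answers, you can show which file the context came from.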
Why RAG instead of fine-tuning
| Question | RAG | Fine-tuning |
|---|---|---|
| Time to update | Minutes | Hours-days |
| Cost | $0.10/M tokens | $1000s+ |
| Citation/provenance | Built in | None |
| Handles new info | Yes (re-index) | No (frozen) |
| Reduces hallucination | Yes | Sometimes |
The 2026 default is RAG. Fine-tune only when you need a different writing style for the model itself.
Where RAG works
- Internal Q&A over policies, contracts, KB
- Code search ("how does the auth service handle token expiry?")
- Compliance research ("what does FAR 52.224-3 require?")
- Sales enablement (reps query product / pricing / competitive docs)
Where RAG fails
- Multi-hop questions ("compare 2023 vs 2025 churn by industry"). RAG retrieves chunks; doesn't compute joins. Use a SQL agent + RAG hybrid.
- Questions answered by structure not content ("how many policies do we have?"). Use deterministic queries.
- Bad retrievers. If retrieval pulls wrong chunks, LLM confidently answers wrong. Retrieval quality is THE bottleneck.
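For the "how many policies do we have?" case, a deterministic query might look like the sketch below. It assumes the policy metadata already lives in a relational table; the `policies` table, its columns, and the sample rows are hypothetical, and an in-memory SQLite database is used only so the example runs on its own.

```python
import sqlite3

# Hypothetical metadata table; in practice this would be your document registry.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policies (id INTEGER PRIMARY KEY, title TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO policies (title, department) VALUES (?, ?)",
    [("Paid time off", "HR"), ("Travel expenses", "Finance"), ("Remote work", "HR")],
)


def count_policies(conn, department=None):
    """Count policies exactly, optionally filtered by department."""
    if department is None:
        row = conn.execute("SELECT COUNT(*) FROM policies").fetchone()
    else:
        row = conn.execute(
            "SELECT COUNT(*) FROM policies WHERE department = ?", (department,)
        ).fetchone()
    return row[0]


total = count_policies(conn)
hr_total = count_policies(conn, "HR")
```

The count comes from the data itself, not from whichever chunks happened to be retrieved, so it stays correct no matter how the documents are split.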
The 2026 stack
- Vector DB: Pinecone, Weaviate, Postgres+pgvector, Supabase Vector
- Embedding model: OpenAI text-embedding-3, Cohere v3, Voyage AI
- LLM: Claude, GPT-4o/5, Gemini
- Orchestration: LangChain, LlamaIndex, or a 200-line Python script (most production RAG is the third)
How to evaluate
Build a test set of 50-100 questions with known correct answers. Score:
1. Did it cite the right source?
2. Did it answer correctly?
3. Did it refuse when the answer wasn't in the corpus?
Below 80% on all three, retrieval is broken. Above 90%, you're production-ready.
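One way to compute the three scores is sketched below. The record shapes (`expected_source`, `expected_answer`, `cited_source`, `answer`, `refused`) are hypothetical, and correctness is approximated by a case-insensitive substring match, which is an assumption: a real harness would more likely use human review or a grader model. Questions whose answer is not in the corpus are marked with `expected_source` set to `None`.

```python
def score(test_set, results):
    """Return citation, correctness, and refusal rates (each 0..1, or None if not measurable)."""
    pairs = list(zip(test_set, results))
    answerable = [(t, r) for t, r in pairs if t["expected_source"] is not None]
    unanswerable = [(t, r) for t, r in pairs if t["expected_source"] is None]

    def rate(hits, total):
        return hits / total if total else None

    # 1. Did it cite the right source?
    cited = sum(r["cited_source"] == t["expected_source"] for t, r in answerable)
    # 2. Did it answer correctly? (substring match is a crude stand-in)
    correct = sum(
        not r["refused"] and t["expected_answer"].lower() in r["answer"].lower()
        for t, r in answerable
    )
    # 3. Did it refuse when the answer wasn't in the corpus?
    refused = sum(r["refused"] for _, r in unanswerable)

    return {
        "citation": rate(cited, len(answerable)),
        "correctness": rate(correct, len(answerable)),
        "refusal": rate(refused, len(unanswerable)),
    }


# Hypothetical three-question test set and the system's outputs for it.
test_set = [
    {"question": "How many PTO days accrue per month?",
     "expected_source": "pto_policy.txt", "expected_answer": "1.5 days"},
    {"question": "What is the daily meal limit?",
     "expected_source": "expense_policy.txt", "expected_answer": "60 dollars"},
    {"question": "What is the parking policy?",
     "expected_source": None, "expected_answer": None},
]
results = [
    {"cited_source": "pto_policy.txt",
     "answer": "Employees accrue 1.5 days per month.", "refused": False},
    {"cited_source": "pto_policy.txt",
     "answer": "Meals are capped at 60 dollars per day.", "refused": False},
    {"cited_source": None,
     "answer": "I could not find that in the documents.", "refused": True},
]
scores = score(test_set, results)
```

Reporting the three rates separately matters: in this toy run the answers are all correct, yet one of them cites the wrong file, which a single blended accuracy number would hide.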
Apache-3 Inc.'s AI Operations Support helps agencies stand up RAG for internal Q&A over policy and SOP corpora.