Retrieval-augmented generation (RAG) for working professionals.

RAG lets a stock AI model answer questions about YOUR documents without retraining it. It is cheaper than fine-tuning, faster to ship, and easier to audit, which makes it the default architecture for enterprise AI in 2026.

How it works

1. Index: take your documents. Split into chunks. Convert each chunk to a vector. Store in a vector database.
2. Retrieve: when a user asks a question, convert the question to a vector. Find the most similar chunks.
3. Generate: send question + retrieved chunks to the LLM as context. The LLM answers using the chunks as source of truth.
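The three steps can be sketched in a few dozen lines of standard-library Python. This is a toy illustration, not a production design: the `embed` function below is a stand-in that counts words, where a real system would call an embedding model, and an in-memory list replaces the vector database. All function names and the two sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def chunk(text, max_words=40):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy embedding (assumption): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(docs):
    """Index step: docs maps source_id -> text. Returns (source_id, chunk, vector) rows."""
    return [(src, c, embed(c)) for src, text in docs.items() for c in chunk(text)]

def retrieve(index, question, k=2):
    """Retrieve step: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(src, c) for src, c, _ in ranked[:k]]

def build_prompt(question, hits):
    """Generate step (prompt only): tag each chunk with its source so the LLM can cite it."""
    context = "\n".join(f"[{src}] {c}" for src, c in hits)
    return ("Answer using only the sources below. If the answer is not there, say so.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}")

# Hypothetical two-document corpus.
docs = {
    "pto-policy": "Employees accrue 1.5 days of paid time off per month.",
    "expense-policy": "Meals over 50 dollars require a receipt and manager approval.",
}
question = "How much paid time off do employees accrue?"
index = build_index(docs)
hits = retrieve(index, question, k=1)
prompt = build_prompt(question, hits)  # this string is what you would send to the LLM
```

Tagging each chunk with its source ID in the prompt is what makes citation possible later: the model can only point at a source it was shown.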

Why RAG instead of fine-tuning

| Criterion | RAG | Fine-tuning |
|---|---|---|
| Time to update | Minutes | Hours to days |
| Cost | $0.10/M tokens | $1000s+ |
| Citation/provenance | Built in | None |
| Handles new info | Yes (re-index) | No (frozen) |
| Reduces hallucination | Yes | Sometimes |
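To make the cost row concrete, here is a rough back-of-the-envelope calculation. It assumes the $0.10/M tokens figure is the price of embedding text at index time, and the corpus size (10,000 documents of about 2,000 tokens each) is invented for illustration. Actual pricing varies by provider and model.

```python
# Assumption: the table's $0.10 per million tokens is the embedding price in USD.
EMBED_PRICE_PER_M_TOKENS = 0.10

def reindex_cost(num_docs, avg_tokens_per_doc, price_per_m=EMBED_PRICE_PER_M_TOKENS):
    """Estimate the cost in USD of embedding an entire corpus once."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * price_per_m

# Hypothetical corpus: 10,000 documents averaging 2,000 tokens each (20M tokens).
cost = reindex_cost(num_docs=10_000, avg_tokens_per_doc=2_000)
print(f"${cost:.2f}")  # prints $2.00
```

Under these assumptions a full re-index costs a couple of dollars, which is why updating a RAG corpus is treated as routine while a fine-tuning run is not.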

The 2026 default is RAG. Fine-tune only when you need to change how the model itself writes, not what it knows.

Where RAG works

- Internal Q&A over policies, contracts, KB
- Code search ("how does the auth service handle token expiry?")
- Compliance research ("what does FAR 52.224-3 require?")
- Sales enablement (reps query product / pricing / competitive docs)

Where RAG fails

- Multi-hop questions ("compare 2023 vs 2025 churn by industry"). RAG retrieves chunks; it doesn't compute joins. Use a SQL agent + RAG hybrid.
- Questions answered by structure, not content ("how many policies do we have?"). Use deterministic queries.
- Bad retrievers. If retrieval pulls the wrong chunks, the LLM confidently gives the wrong answer. Retrieval quality is THE bottleneck.
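One way to handle the structural case is a small router that sends counting questions to a deterministic query and everything else to RAG. The sketch below is a deliberately naive illustration: the `policies` table, its rows, and the single regex rule are all hypothetical, and a real router would need many more patterns or a classifier.

```python
import re
import sqlite3

# Hypothetical metadata store: one row per policy document.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policies (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO policies (title) VALUES (?)",
    [("PTO",), ("Expenses",), ("Remote work",)],
)

# Naive rule (assumption): only "how many policies ..." is treated as structural.
COUNT_POLICIES = re.compile(r"\bhow many policies\b", re.IGNORECASE)

def route(question):
    """Return 'sql' for the one counting question we recognize, 'rag' otherwise."""
    return "sql" if COUNT_POLICIES.search(question) else "rag"

def answer(question):
    """Answer structural questions exactly; defer content questions to RAG."""
    if route(question) == "sql":
        (count,) = conn.execute("SELECT COUNT(*) FROM policies").fetchone()
        return f"{count} policies"
    return "(hand off to the RAG pipeline)"

print(answer("How many policies do we have?"))
print(answer("What does the PTO policy say about carryover?"))
```

The count comes from the database rather than from retrieved text, so it stays exact no matter how many chunks the retriever would have returned.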

The 2026 stack

- Vector DB: Pinecone, Weaviate, Postgres+pgvector, Supabase Vector
- Embedding model: OpenAI text-embedding-3, Cohere v3, Voyage AI
- LLM: Claude, GPT-4o/5, Gemini
- Orchestration: LangChain, LlamaIndex, or a 200-line Python script (most production RAG is the third)

How to evaluate

Build a test set of 50-100 questions with known correct answers. Score:

1. Did it cite the right source?
2. Did it answer correctly?
3. Did it refuse when the answer wasn't in the corpus?

Below 80% on all three, retrieval is broken. Above 90% on all three, you're production-ready.
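A scoring harness for these three checks can be very small. The sketch below assumes each test question has already been graded (by a person or another script) into four booleans. The `Result` fields, the "needs work" middle band, and the choice to measure refusal only on questions whose answer is absent from the corpus are all assumptions made for illustration. The thresholds are read literally as applying to all three scores.

```python
from dataclasses import dataclass

@dataclass
class Result:
    answerable: bool      # is the answer actually in the corpus?
    cited_right: bool     # did it cite the right source?
    answered_right: bool  # did it answer correctly?
    refused: bool         # did it decline to answer?

def rate(hits, total):
    """Fraction of hits, or 0.0 when there is nothing to score."""
    return hits / total if total else 0.0

def score(results):
    """Compute the three scores; refusal is measured on unanswerable questions only."""
    answerable = [r for r in results if r.answerable]
    unanswerable = [r for r in results if not r.answerable]
    return {
        "citation": rate(sum(r.cited_right for r in answerable), len(answerable)),
        "correctness": rate(sum(r.answered_right for r in answerable), len(answerable)),
        "refusal": rate(sum(r.refused for r in unanswerable), len(unanswerable)),
    }

def verdict(scores):
    """Apply the 80% and 90% thresholds across all three scores."""
    if all(s < 0.80 for s in scores.values()):
        return "retrieval is broken"
    if all(s > 0.90 for s in scores.values()):
        return "production-ready"
    return "needs work"  # assumption: anything in between gets more tuning

# Hypothetical graded run: 10 answerable questions (9 fully right), 2 unanswerable.
results = (
    [Result(True, True, True, False)] * 9
    + [Result(True, False, False, False)]
    + [Result(False, False, False, True)] * 2
)
scores = score(results)
print(scores, verdict(scores))
```

Keeping the refusal score separate matters: a system that answers every question can look strong on correctness while failing every question it should have declined.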

Apache-3 Inc.'s AI Operations Support helps agencies stand up RAG for internal Q&A over policy and SOP corpora.