What is RAG (Retrieval-Augmented Generation)?
TL;DR: RAG combines a language model with a retrieval system, letting the AI search a knowledge base before answering — reducing hallucinations and keeping responses up to date.
The Problem RAG Solves
Standard LLMs have a fixed training cutoff. Ask ChatGPT about last week's news and it either confabulates or says it doesn't know. RAG patches this by retrieving real documents at query time.
How RAG Works (3 Steps)
1. Index: Documents are chunked and converted to vector embeddings stored in a vector database.
2. Retrieve: Your question is also embedded, and the nearest document chunks are fetched.
3. Generate: The LLM receives your question + the retrieved chunks as context and writes the answer.
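The three steps can be sketched in miniature. This is a toy: bag-of-words counts stand in for a learned embedding model, an in-memory list stands in for a vector database, and the example documents are made up for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts. A real RAG pipeline would
    # use a learned embedding model here instead.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# 1. Index: chunk documents and store (chunk, embedding) pairs --
#    our stand-in for a vector database.
chunks = [
    "RAG retrieves documents before the model answers.",
    "Fine-tuning bakes knowledge into model weights.",
    "Vector databases store embeddings for similarity search.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Retrieve: embed the question and fetch the nearest chunk.
question = "How does RAG use retrieved documents?"
q_emb = embed(question)
top = max(index, key=lambda pair: cosine(q_emb, pair[1]))

# 3. Generate: a real system would now send this prompt to an LLM.
prompt = f"Context: {top[0]}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```

Because the question shares the terms "RAG" and "documents" with the first chunk, that chunk wins the similarity search and lands in the prompt — the same mechanism, in miniature, that production systems run over millions of embedded chunks.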
RAG vs Fine-tuning
Fine-tuning bakes knowledge into model weights — expensive and static. RAG keeps knowledge external and updatable. Use RAG when your data changes frequently; fine-tune when you need a specific style or format the model must produce consistently.
Real-World RAG Examples
Perplexity.ai uses RAG to search the web before answering. GitHub Copilot Enterprise uses RAG over your company's private codebase. Notion AI uses RAG on your own workspace documents.