AiTechWorlds
Complete RAG pipeline — chunking, embedding, vector search, re-ranking, and LangChain implementation with code.
RAG (Retrieval-Augmented Generation) is an architecture that connects an LLM to an external knowledge source at inference time. Instead of relying solely on training data, the model retrieves relevant documents and uses them as context to answer queries.
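The retrieve-then-answer flow described above can be sketched in a few lines of plain Python. This is a toy illustration, not a production pipeline: word overlap stands in for a real embedding search, the final LLM call is left out, and every name here (`KNOWLEDGE_BASE`, `retrieve`, `build_prompt`) plus the sample data is made up for the example.

```python
# Toy retrieve-then-prompt loop. Word overlap stands in for embedding search,
# and the LLM call itself is omitted. All names and data are illustrative.

KNOWLEDGE_BASE = [
    "Refunds are available within 30 days of purchase.",
    "Shipping takes 3 to 5 business days.",
    "Support is open Monday through Friday.",
]

def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace("?", " ").replace(".", " ").split())

def retrieve(query, docs, k=1):
    """Return the k docs that share the most words with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, context_docs):
    """Place the retrieved docs in front of the question as context."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

query = "Are refunds available after purchase?"
top_docs = retrieve(query, KNOWLEDGE_BASE, k=1)
prompt = build_prompt(query, top_docs)
print(prompt)
```

The prompt, not the model weights, carries the retrieved knowledge, which is why the knowledge source can be updated without retraining.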
Core Problem RAG Solves:

    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Split documents into overlapping chunks.
    # `documents` is assumed to be a list of LangChain Document objects loaded earlier.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=512,
        chunk_overlap=64,  # overlap preserves context across chunks
        separators=["\n\n", "\n", ".", " "]
    )
    chunks = splitter.split_documents(documents)

    from langchain_openai import OpenAIEmbeddings
    from langchain_chroma import Chroma

    # Embed the chunks and persist them in a local Chroma store.
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory="./chroma_db"
    )

    from langchain_openai import ChatOpenAI
    from langchain.chains import RetrievalQA

    # Retrieve 5 chunks per query and let the LLM answer from them.
    retriever = vectorstore.as_retriever(
        search_type="mmr",  # Maximal Marginal Relevance: reduces redundancy
        search_kwargs={"k": 5}
    )
    qa_chain = RetrievalQA.from_chain_type(
        llm=ChatOpenAI(model="gpt-4o-mini"),
        retriever=retriever,
        return_source_documents=True
    )
    result = qa_chain.invoke({"query": "What is the refund policy?"})
    print(result["result"])

| Strategy | Best For | Chunk Size |
|---|---|---|
| Fixed-size (character) | Simple docs, quick setup | 256–1024 chars |
| Recursive (separator-based) | Mixed documents | 512–1024 chars |
| Sentence-level | QA, precise retrieval | 1–5 sentences |
| Semantic chunking | Long-form content | Variable |
| Document structure | PDFs, HTML, code | By section/function |
Overlap rule of thumb: 10–20% of chunk size to avoid cutting context at boundaries.
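To make the overlap rule concrete, here is a minimal sketch of fixed-size character chunking with overlap. The function name `chunk_text` and the 12.5% ratio are assumptions chosen to match the 512/64 settings used with `RecursiveCharacterTextSplitter`; real splitters also respect separators, which this sketch ignores.

```python
def chunk_text(text, chunk_size=512, overlap_ratio=0.125):
    """Fixed-size character chunks; each chunk repeats the tail of the previous one."""
    overlap = int(chunk_size * overlap_ratio)  # 64 chars for the defaults
    step = chunk_size - overlap                # how far the window advances
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

# Synthetic 1,200-character input, just to exercise the function.
text = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(text)
print(len(chunks), [len(c) for c in chunks])
```

Because consecutive chunks share their boundary characters, a sentence cut at the end of one chunk still appears whole at the start of the next, as long as it is shorter than the overlap.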
| Model | Dimensions | Best For | Free? |
|---|---|---|---|
| text-embedding-3-small | 1,536 | General RAG (OpenAI) | No |
| text-embedding-3-large | 3,072 | High accuracy (OpenAI) | No |
| nomic-embed-text | 768 | Local / open source | Yes |
| bge-m3 | 1,024 | Multilingual | Yes |
| all-MiniLM-L6-v2 | 384 | Fast, low resource | Yes |
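Dimension count also drives storage cost. As a rough sketch, assuming each vector is stored as float32 (4 bytes per dimension) and ignoring index overhead and metadata, the raw size of an index can be estimated as follows. The helper `index_size_mib` and the 100,000-chunk corpus are hypothetical.

```python
# Dimensions taken from the table above.
DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "nomic-embed-text": 768,
    "bge-m3": 1024,
    "all-MiniLM-L6-v2": 384,
}
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored uncompressed as float32

def index_size_mib(num_chunks, dims):
    """Raw vector storage in MiB, ignoring index overhead and metadata."""
    return num_chunks * dims * BYTES_PER_FLOAT32 / (1024 ** 2)

for model, dims in DIMENSIONS.items():
    print(f"{model}: {index_size_mib(100_000, dims):.1f} MiB")
```

Under these assumptions storage scales linearly with dimensions, so doubling the dimension count doubles the raw footprint.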
| Database | Deployment | Scale | Free Tier |
|---|---|---|---|
| Chroma | Local / self-hosted | Small-medium | Yes (local) |
| Pinecone | Cloud managed | Large | Yes (1 index) |
| Qdrant | Self-hosted / cloud | Large | Yes (cloud) |
| Weaviate | Self-hosted / cloud | Large | Yes |
| FAISS | In-memory local | Medium | Yes (library) |
| pgvector | PostgreSQL extension | Medium | Yes |
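Whatever the database, the core operation is the same: rank stored vectors by similarity to the query vector. The sketch below does this by brute force with cosine similarity. The three-dimensional vectors and the `toy_index` contents are invented for illustration, and real stores typically add approximate indexes to avoid scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """index maps chunk id -> vector; returns the k most similar (id, score) pairs."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Made-up 3-dimensional "embeddings" for three chunks.
toy_index = {
    "refunds": [0.9, 0.1, 0.0],
    "shipping": [0.1, 0.9, 0.0],
    "support": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.2, 0.0], toy_index, k=2)
print(results)
```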
| Technique | What It Does | When to Use |
|---|---|---|
| HyDE | Generate a hypothetical answer document and retrieve with its embedding | Vague/short queries |
| Multi-query retrieval | Generate multiple query variants | Low recall |
| Re-ranking (cross-encoder) | Re-score retrieved docs with a more powerful model | Precision-critical |
| Parent document retrieval | Retrieve full section when child chunk matches | Chunking loses context |
| Self-RAG | Model decides when to retrieve | Adaptive pipelines |
| RAPTOR | Hierarchical summarization tree | Very long documents |
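As one example from the table, parent document retrieval can be sketched as follows: match the query against small child chunks, then return the full parent section. This is a toy version in which keyword overlap stands in for vector similarity, and the `parents` data and helper names are hypothetical.

```python
# Toy parent-document retrieval: search over small child chunks, but hand
# the whole parent section to the LLM. All data and names are illustrative.

parents = {
    "refund-policy": "Refund policy. Items can be returned within the return window. "
                     "Refunds go back to the original payment method.",
    "shipping": "Shipping. Orders ship from the nearest warehouse. "
                "Tracking numbers are emailed after dispatch.",
}

# Each child chunk (one sentence here) remembers which parent it came from.
children = [
    (parent_id, sentence.strip())
    for parent_id, text in parents.items()
    for sentence in text.split(".")
    if sentence.strip()
]

def words(text):
    return set(text.lower().split())

def retrieve_parent(query):
    """Find the best-matching child chunk, then return its whole parent section."""
    q = words(query)
    parent_id, _ = max(children, key=lambda child: len(q & words(child[1])))
    return parent_id, parents[parent_id]

parent_id, section = retrieve_parent("original payment method")
print(parent_id)
print(section)
```

The query matches only the last sentence of the refund section, yet the returned context also includes the return-window sentence that a child-only lookup would have dropped.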
| Aspect | Naive RAG | Advanced RAG |
|---|---|---|
| Indexing | Fixed chunks | Hierarchical + metadata |
| Retrieval | Dense vector search only | Hybrid (dense + sparse) |
| Re-ranking | None | Cross-encoder re-ranker |
| Query | Raw user input | Query expansion/rewrite |
| Response | Direct LLM output | Cited, grounded output |
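The "Hybrid (dense + sparse)" row usually comes down to merging two ranked lists. One common way to do that is weighted reciprocal rank fusion, which is also the general approach LangChain's `EnsembleRetriever` is documented to use. The sketch below is an illustration rather than the library's code: the function name, the doc ids, and the constant `c=60` are assumptions.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists of doc ids with weighted reciprocal rank fusion.

    Each doc scores sum(weight / (rank + c)) over the lists it appears in,
    with rank starting at 1. c=60 is a commonly used smoothing constant.
    """
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_b", "doc_c"]   # sparse (e.g. BM25) ranking
semantic_hits = ["doc_b", "doc_d", "doc_a"]  # dense (vector) ranking
fused = weighted_rrf([keyword_hits, semantic_hits], weights=[0.4, 0.6])
print(fused)
```

Documents that rank well in both lists rise to the top, while the weights decide which retriever wins when the two disagree.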
    from langchain_community.retrievers import BM25Retriever  # requires the rank_bm25 package
    from langchain.retrievers import EnsembleRetriever

    # Combine keyword (BM25) and vector retrieval over the same chunks.
    bm25 = BM25Retriever.from_documents(chunks, k=5)
    vector = vectorstore.as_retriever(search_kwargs={"k": 5})
    hybrid = EnsembleRetriever(
        retrievers=[bm25, vector],
        weights=[0.4, 0.6]  # 60% semantic, 40% keyword
    )

| Metric | Measures | Tool |
|---|---|---|
| Context Recall | How much ground truth is retrieved | RAGAS |
| Context Precision | How relevant retrieved docs are | RAGAS |
| Faithfulness | Is answer grounded in retrieved docs? | RAGAS |
| Answer Relevancy | Does answer address the question? | RAGAS |
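To build intuition for the two retrieval metrics, here is a simplified set-based sketch. It assumes you already have labeled relevant chunk ids for a query. RAGAS itself typically relies on LLM judgments rather than exact id matching, so treat this as an approximation of the idea, not the tool's implementation; the function names and ids are hypothetical.

```python
def context_precision(retrieved_ids, relevant_ids):
    """Share of retrieved chunks that are actually relevant."""
    if not retrieved_ids:
        return 0.0
    hits = sum(1 for cid in retrieved_ids if cid in relevant_ids)
    return hits / len(retrieved_ids)

def context_recall(retrieved_ids, relevant_ids):
    """Share of relevant chunks that were retrieved."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for cid in relevant_ids if cid in retrieved_ids)
    return hits / len(relevant_ids)

retrieved = ["c1", "c2", "c3", "c4"]  # what the retriever returned
relevant = {"c1", "c3", "c9"}         # hand-labeled ground truth
precision = context_precision(retrieved, relevant)
recall = context_recall(retrieved, relevant)
print(precision, recall)
```

Raising `k` tends to improve recall at the cost of precision, which is one reason a re-ranking step is often added after retrieval.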