10 LangChain Text Splitters: Recursive, Markdown, Code (2026)
A practical guide to all 10 LangChain text splitters — Recursive, Markdown, Code, HTML, Semantic, Token — with comparison table and chunking best practices.
Chunking is the step everyone underestimates. I've seen teams spend weeks optimizing their embedding models and retrieval algorithms while leaving their text splitter at default settings — and then wonder why their RAG system keeps returning irrelevant or cut-off answers.
The truth is that poor chunking creates problems that no amount of retrieval optimization can fix. If a chunk cuts a sentence in half, the embedding captures noise. If a function is split across two chunks, code search breaks down. If you chunk a Markdown document without respecting headers, you lose the structural signal that makes document hierarchies useful.
LangChain ships with a surprisingly complete set of text splitters, each designed for a specific document type or splitting strategy. This guide covers all 10 in practical depth — what each one does, when to use it, and working Python code for each. I'll also include a comparison table and the key decisions that determine chunk quality.
Before diving in, if you're building a full retrieval pipeline around these chunks, the RAG system tutorial covers the storage and retrieval side in detail.
Why Chunking Strategy Matters More Than You Think
Here's a simple mental model: an embedding model compresses a text chunk into a single vector. If the chunk is incoherent — half a sentence, a code snippet ripped from its context, three unrelated paragraphs — the embedding averages out to something that doesn't represent anything clearly.
Coherent chunks produce embeddings that cluster reliably in the vector space. Incoherent chunks produce noise.
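A toy illustration of that averaging effect, using made-up 2-D vectors in place of real embeddings. The vectors and the `mean_vector` helper are assumptions for illustration only; a real embedding model does not literally average its inputs, but the intuition carries over.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Component-wise mean, standing in for 'one vector per chunk'."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Hypothetical topic directions (not real embeddings)
topic_a = [1.0, 0.0]
topic_b = [0.0, 1.0]

coherent_chunk = mean_vector([topic_a, topic_a])  # chunk about one topic
mixed_chunk = mean_vector([topic_a, topic_b])     # chunk mixing two topics

query = topic_a
coherent_score = cosine(query, coherent_chunk)
mixed_score = cosine(query, mixed_chunk)
print(f"coherent: {coherent_score:.3f}, mixed: {mixed_score:.3f}")
```

The mixed chunk scores lower against a query about either topic, which is the "noise" described above.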
Research from LlamaIndex's chunking benchmarks (2024) shows that semantic chunking approaches improve retrieval MRR (Mean Reciprocal Rank) by 12–18% compared to naive fixed-size splitting, especially for long-form documents with varied structure.
The other factor is chunk size. The rule of thumb I use: your chunk should be the smallest unit of text that can answer a question completely on its own. For prose, that's usually 2–4 paragraphs. For code, it's usually one function. For API documentation, it's usually one endpoint.
Comparison Table: All 10 Splitters
| Splitter | Chunk Coherence | Overlap Handling | Code Awareness | Speed | Best For |
|---|---|---|---|---|---|
| RecursiveCharacterTextSplitter | High | Yes | None | Fast | General prose, mixed documents |
| CharacterTextSplitter | Medium | Yes | None | Fastest | Simple text, quick prototyping |
| MarkdownHeaderTextSplitter | Very High | No (header-based) | None | Fast | Markdown docs, wikis |
| MarkdownTextSplitter | High | Yes | None | Fast | Long Markdown without strict headers |
| CodeTextSplitter (from_language) | Very High | Yes | Full | Fast | Source code (20+ languages) |
| HTMLHeaderTextSplitter | Very High | No (tag-based) | None | Fast | HTML docs, web content |
| HTMLSectionSplitter | High | No | None | Fast | HTML with section tags |
| TokenTextSplitter | Medium | Yes | None | Medium | LLM context window management |
| SentenceTransformersTokenTextSplitter | High | Yes | None | Slow | Semantic model alignment |
| SemanticChunker | Very High | Auto | None | Slowest | Long-form unstructured documents |
1. RecursiveCharacterTextSplitter
This is the one I reach for first with any general document. It tries a hierarchy of separators in order. With the list configured below, that means paragraph breaks (\n\n), then line breaks (\n), then sentence endings (. ), then spaces, then individual characters; the library's default list is the same minus the sentence separator. It only falls back to the next separator when a piece is still larger than chunk_size.
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", ". ", " ", ""],
    length_function=len,  # Can also use tiktoken for token-based sizing
)
text = """
LangChain is a framework for building LLM applications.
It provides abstractions for chains, agents, and memory.
The retrieval module supports multiple vector stores including
FAISS, Chroma, Pinecone, and LanceDB. Each has different
performance characteristics depending on dataset size.
Agent frameworks in LangChain support tool use, planning,
and multi-step reasoning patterns.
"""
chunks = splitter.split_text(text)
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1} ({len(chunk)} chars):")
    print(chunk)
    print("---")
# With documents
from langchain_core.documents import Document
docs = [Document(page_content=text, metadata={"source": "intro.txt"})]
doc_chunks = splitter.split_documents(docs)
print(f"Created {len(doc_chunks)} chunks")
For token-based sizing (better for LLM context management):
import tiktoken
from langchain_text_splitters import RecursiveCharacterTextSplitter
def tiktoken_len(text: str) -> int:
    """Return the number of cl100k_base tokens in text."""
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))
splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,  # tokens, not characters
    chunk_overlap=40,
    length_function=tiktoken_len,
)
Using tiktoken_len instead of len ensures chunks respect actual token counts, which prevents the silent truncation that happens when you embed more tokens than your model's context window.
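To make the separator fallback concrete, here is a simplified pure-Python sketch of the idea. It is not LangChain's implementation: `recursive_split` is a hypothetical helper that drops the separators and skips the merge and overlap steps the real splitter performs, so it only shows when the next separator gets tried.

```python
def recursive_split(text, chunk_size, separators):
    """Split text into pieces of at most chunk_size characters.

    Tries each separator in order and only falls back to the next one
    for pieces that are still too long. Toy version: no merging of small
    pieces, no overlap, separators are discarded.
    """
    if not text:
        return []
    if len(text) <= chunk_size:
        return [text]
    # Out of separators (or reached the "" catch-all): hard cut by length
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    first, remaining = separators[0], separators[1:]
    pieces = []
    for part in text.split(first):
        pieces.extend(recursive_split(part, chunk_size, remaining))
    return pieces

sample = "aaaa bbbb\n\ncccc"
print(recursive_split(sample, 9, ["\n\n", " ", ""]))   # paragraph split is enough
print(recursive_split(sample, 5, ["\n\n", " ", ""]))   # falls back to spaces
print(recursive_split("abcdefgh", 3, ["\n\n", " ", ""]))  # falls back to hard cuts
```

With a generous chunk_size the paragraph break alone is enough; shrinking it forces the fallback to spaces and finally to raw character cuts.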
2. CharacterTextSplitter
The simplest splitter: it splits on a single separator string and merges the pieces back together up to chunk_size. Useful when you know your document has a predictable structure.
from langchain_text_splitters import CharacterTextSplitter
# Split on double newlines (paragraph boundaries)
splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=1000,
    chunk_overlap=100,
)
chunks = splitter.split_text(text)
# Split on custom delimiter (e.g., horizontal rules in formatted docs)
hr_splitter = CharacterTextSplitter(
    separator="---",
    chunk_size=800,
    chunk_overlap=0,  # No overlap when sections are self-contained
)
I use CharacterTextSplitter mostly for preprocessed documents where sections are already clearly delimited. For anything organic, RecursiveCharacterTextSplitter handles edge cases better.
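One edge case worth seeing in code: with a single separator, a piece that is longer than chunk_size cannot be broken down any further. The sketch below is a simplified stand-in (the `split_and_merge` helper is hypothetical and ignores overlap), not the library's code, but it shows the split-then-merge behavior and the oversized-chunk case.

```python
def split_and_merge(text, separator, chunk_size):
    """Split on one separator, then greedily pack pieces up to chunk_size.

    A piece longer than chunk_size is emitted as-is, because there is no
    second separator to fall back on.
    """
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

print(split_and_merge("aaa\n\nbbb\n\ncccccc", "\n\n", 8))
oversized = split_and_merge("x" * 20, "\n\n", 8)
print(len(oversized), len(oversized[0]))  # one chunk, longer than chunk_size
```

That oversized chunk is the main reason to prefer the recursive splitter for text you haven't preprocessed.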
3. MarkdownHeaderTextSplitter
This is the right choice for any Markdown-structured content — documentation, wikis, README files. It splits on header levels and preserves the header hierarchy in chunk metadata.
from langchain.text_splitter import MarkdownHeaderTextSplitter
markdown_text = """
# Introduction
This guide covers LangChain fundamentals.
## Installation
Install with pip:
```bash
pip install langchain
```
## Core Concepts
### Chains
Chains connect multiple components together.
### Agents
Agents use LLMs to decide which tools to call.
## Configuration
Set your API key as an environment variable.
"""
# Define which headers to split on
headers_to_split_on = [
    ("#", "h1"),
    ("##", "h2"),
    ("###", "h3"),
]
splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on,
    strip_headers=False,  # Keep header text in chunk content
)
chunks = splitter.split_text(markdown_text)
for chunk in chunks:
    print("Content:", chunk.page_content[:100])
    print("Metadata:", chunk.metadata)
    print("---")
Each chunk's metadata includes the header path, so you know exactly where in the document hierarchy each chunk came from. This is invaluable for citation and source attribution in RAG responses.
For longer Markdown documents where header-based chunks are still too large, chain it with `RecursiveCharacterTextSplitter`:
```python
from langchain_text_splitters import (
    MarkdownHeaderTextSplitter,
    RecursiveCharacterTextSplitter,
)
# long_markdown is a placeholder: your Markdown document as a string
# First split by headers
header_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
)
header_chunks = header_splitter.split_text(long_markdown)
# Then split large sections further
char_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
)
final_chunks = char_splitter.split_documents(header_chunks)
# Metadata from header splitting is preserved!
```
This two-stage approach preserves structural metadata while keeping chunk sizes manageable.
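If it helps to see what "header path in metadata" means mechanically, here is a small stdlib-only sketch. The `split_by_headers` function is a hypothetical illustration, not the library's code: it tracks the current header at each level, resets deeper levels when a higher-level header appears, and skips `#` lines inside fenced code.

```python
import re

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")

def split_by_headers(markdown):
    """Return a list of {"metadata": header path, "content": body text}."""
    sections = []
    path = {}        # header level -> current title at that level
    buffer = []      # body lines collected since the last header
    in_fence = False

    def flush():
        body = "\n".join(buffer).strip()
        if body:
            metadata = {f"h{level}": title for level, title in sorted(path.items())}
            sections.append({"metadata": metadata, "content": body})
        buffer.clear()

    for line in markdown.splitlines():
        if line.strip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADER_RE.match(line)
        if match:
            flush()  # close the previous section under the old header path
            level = len(match.group(1))
            # A new header invalidates everything at the same or deeper level
            for stale in [lvl for lvl in path if lvl >= level]:
                del path[stale]
            path[level] = match.group(2).strip()
        else:
            buffer.append(line)
    flush()
    return sections

doc = "# Guide\nIntro text.\n## Install\nRun pip.\n## Usage\n### Chains\nChain text."
sections = split_by_headers(doc)
for section in sections:
    print(section["metadata"], "->", section["content"])
```

Each section carries its full ancestry, which is what lets a second-stage character splitter keep attribution intact.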
4. CodeTextSplitter
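Code is fundamentally different from prose. The natural unit is a function, method, or class, not a paragraph. LangChain has no class literally named CodeTextSplitter; the code-aware splitter is RecursiveCharacterTextSplitter.from_language(), which swaps in language-specific separators to respect code structure.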
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter
# Python code splitter
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=1000,
    chunk_overlap=100,
)
python_code = '''
import os
from typing import List
def load_documents(path: str) -> List[str]:
    """Load all text files from a directory."""
    documents = []
    for filename in os.listdir(path):
        if filename.endswith(".txt"):
            with open(os.path.join(path, filename)) as f:
                documents.append(f.read())
    return documents
class DocumentProcessor:
    def __init__(self, chunk_size: int = 500):
        self.chunk_size = chunk_size
    def process(self, documents: List[str]) -> List[str]:
        """Process and chunk documents."""
        return [doc[:self.chunk_size] for doc in documents]
def main():
    docs = load_documents("./data")
    processor = DocumentProcessor(chunk_size=400)
    chunks = processor.process(docs)
    print(f"Processed {len(chunks)} chunks")
'''
chunks = python_splitter.split_text(python_code)
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:")
    print(chunk[:200])
    print("---")
# Supported languages
print("Supported:", [lang.value for lang in Language])
# e.g. python, js, ts, java, go, rust, cpp, markdown, latex, html, ...
# (the exact list depends on the installed version)
The Python separators try to split on class definitions, function definitions, and method definitions first, so you rarely get a chunk that cuts through a function body.
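A rough stdlib-only sketch of the same idea: split only where a top-level `class` or `def` begins, so indented methods stay with their class. The `split_top_level` helper is a hypothetical simplification; the real splitter uses an ordered separator list and also enforces chunk_size, which this version does not.

```python
import re

# Zero-width match at the start of any line that opens a top-level class or def.
# Indented "def" lines (methods) do not match because of the leading spaces.
TOP_LEVEL = re.compile(r"^(?=(?:class|def)\s)", flags=re.MULTILINE)

def split_top_level(source):
    """Split Python source into blocks at top-level class/def boundaries."""
    return [block.strip() for block in TOP_LEVEL.split(source) if block.strip()]

sample = (
    "import os\n"
    "\n"
    "def load(path):\n"
    "    return os.listdir(path)\n"
    "\n"
    "class Processor:\n"
    "    def run(self):\n"
    "        return 1\n"
)
blocks = split_top_level(sample)
for block in blocks:
    print(block)
    print("---")
```

The imports, the function, and the class each land in their own block, and the `run` method stays inside `Processor`.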
For JavaScript/TypeScript (common in full-stack codebases):
js_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.JS,
    chunk_size=800,
    chunk_overlap=80,
)
ts_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.TS,
    chunk_size=800,
    chunk_overlap=80,
)
For a code search application built on this pattern, the build AI agent with LangChain post shows how code chunks can feed a tool-using agent.
5. HTMLHeaderTextSplitter
For HTML documents — scraped web pages, exported documentation, HTML reports — this splitter preserves the document's heading hierarchy in metadata, similar to MarkdownHeaderTextSplitter.
from langchain_text_splitters import HTMLHeaderTextSplitter
html_content = """
<!DOCTYPE html>
<html>
<body>
<h1>LangChain Documentation</h1>
<p>LangChain is a framework for building LLM applications.</p>
<h2>Getting Started</h2>
<p>Install LangChain with pip install langchain.</p>
<h2>Core Components</h2>
<h3>Chains</h3>
<p>Chains connect LLM calls with other components.</p>
<h3>Agents</h3>
<p>Agents use LLMs to make decisions about tool use.</p>
</body>
</html>
"""
headers_to_split_on = [
    ("h1", "Header 1"),
    ("h2", "Header 2"),
    ("h3", "Header 3"),
]
splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
chunks = splitter.split_text(html_content)
for chunk in chunks:
    print("Content:", chunk.page_content)
    print("Metadata:", chunk.metadata)
    print("---")
This splitter is especially useful for web scraping pipelines where you've collected HTML pages and want to preserve navigational context in each chunk's metadata.
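For intuition about how heading context ends up in metadata, here is a minimal sketch built on Python's standard `html.parser`. The `HeadingTracker` class is a hypothetical illustration that only understands flat `h1` to `h3` and `p` tags; it is not how the LangChain splitter is implemented.

```python
from html.parser import HTMLParser

class HeadingTracker(HTMLParser):
    """Attach the current h1/h2/h3 path to every paragraph's text."""

    def __init__(self):
        super().__init__()
        self.path = {}       # heading tag -> current title, e.g. {"h1": "Docs"}
        self.sections = []   # list of (metadata dict, paragraph text)
        self._tag = None     # the tracked tag we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "p"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._tag is None:
            return
        if self._tag == "p":
            self.sections.append((dict(self.path), text))
        else:
            level = int(self._tag[1])
            # A new heading clears headings at the same or a deeper level
            self.path = {k: v for k, v in self.path.items() if int(k[1]) < level}
            self.path[self._tag] = text

tracker = HeadingTracker()
tracker.feed(
    "<h1>Docs</h1><p>Intro.</p>"
    "<h2>Setup</h2><p>Install it.</p>"
    "<h2>Use</h2><h3>Chains</h3><p>Chain text.</p>"
)
for metadata, text in tracker.sections:
    print(metadata, "->", text)
```

Real pages have nested inline tags and messier markup, which is why a dedicated splitter is worth using over a hand-rolled parser like this one.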
6. HTMLSectionSplitter
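Similar to HTMLHeaderTextSplitter, but it groups content into sections delimited by the header tags you pass in headers_to_split_on, and it can apply an XSLT transform first (the xslt_path argument) to normalize markup before detecting sections. As far as I can tell it does not split on <section>, <article>, or <div> tags by themselves, which is why the example below still keys on h2. It needs lxml and BeautifulSoup installed.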
from langchain_text_splitters import HTMLSectionSplitter
html_with_sections = """
<article>
<section>
<h2>Introduction</h2>
<p>This covers the basics of LangChain.</p>
</section>
<section>
<h2>Advanced Usage</h2>
<p>Advanced patterns include custom chains and agents.</p>
</section>
</article>
"""
splitter = HTMLSectionSplitter(
    headers_to_split_on=[("h2", "section_title")]
)
chunks = splitter.split_text(html_with_sections)
7. TokenTextSplitter
When you need precise control over token counts — especially for LLM calls where you're managing context windows — TokenTextSplitter splits based on actual token counts rather than character counts.
from langchain_text_splitters import TokenTextSplitter
# Uses tiktoken under the hood
splitter = TokenTextSplitter(
    encoding_name="cl100k_base",  # OpenAI's encoding
    chunk_size=512,               # tokens
    chunk_overlap=50,             # tokens
)
long_text = "..." * 5000  # Your long document here
chunks = splitter.split_text(long_text)
print(f"Number of chunks: {len(chunks)}")
# Verify token counts
import tiktoken
enc = tiktoken.get_encoding("cl100k_base")
for i, chunk in enumerate(chunks[:3]):
    token_count = len(enc.encode(chunk))
    print(f"Chunk {i+1}: {token_count} tokens")
I use TokenTextSplitter primarily when I'm feeding chunks directly into LLM prompts with strict context limits — for example, when summarizing or classifying chunks in a batch job. For embedding-based retrieval, RecursiveCharacterTextSplitter with a tiktoken_len function is usually cleaner.
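The windowing logic behind token-based splitting is simple enough to sketch without any tokenizer. Below, a list of integers stands in for token IDs (in practice they would come from tiktoken or a model tokenizer), and `split_by_tokens` is a hypothetical helper written for illustration.

```python
def split_by_tokens(tokens, chunk_size, overlap):
    """Slide a window of chunk_size tokens, stepping by chunk_size - overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks

token_ids = list(range(10))  # stand-in for real token IDs
windows = split_by_tokens(token_ids, chunk_size=4, overlap=1)
for window in windows:
    print(window)
```

Every window is capped at chunk_size tokens, and each one repeats the last `overlap` tokens of the previous window.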
8. SentenceTransformersTokenTextSplitter
For projects using sentence-transformers models (BERT-based, not OpenAI), this splitter aligns chunk sizes with the model's tokenizer rather than tiktoken.
from langchain_text_splitters import SentenceTransformersTokenTextSplitter
# Sized for all-MiniLM-L6-v2, which truncates input beyond 256 word pieces by default
splitter = SentenceTransformersTokenTextSplitter(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    tokens_per_chunk=256,
    chunk_overlap=25,
)
chunks = splitter.split_text(text)
# Each chunk will fit within the model's context window
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(chunks)
print(f"Embedded {len(chunks)} chunks")
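This matters because sentence-transformers models have a fixed maximum sequence length that varies by model (256 word pieces for all-MiniLM-L6-v2, 384 for all-mpnet-base-v2, 512 for many BERT-based models). Chunks that exceed the limit get silently truncated during encoding, which degrades embedding quality. SentenceTransformersTokenTextSplitter prevents that.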
For more on open-source embedding models, the Hugging Face transformers tutorial covers the full model selection process.
9. SemanticChunker
This is the most computationally expensive splitter, and also the most intelligent. Instead of splitting on character counts or separators, SemanticChunker uses embedding similarity to find natural break points — places where the topic shifts.
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
splitter = SemanticChunker(
    embeddings=embeddings,
    breakpoint_threshold_type="percentile",  # or "standard_deviation", "interquartile"
    breakpoint_threshold_amount=95,  # Split at the 95th percentile of similarity drops
)
long_article = """
[Your long, multi-topic document here]
"""
chunks = splitter.split_text(long_article)
print(f"Created {len(chunks)} semantic chunks")
for i, chunk in enumerate(chunks):
    print(f"\nChunk {i+1} ({len(chunk)} chars):")
    print(chunk[:200])
Three breakpoint strategies:
"percentile": Splits where the similarity drop is in the top N percentile. Good for documents with clear topic shifts.
"standard_deviation": Splits where similarity drops more than N standard deviations below the mean. Good for consistent documents.
"interquartile": Uses the IQR method. More robust to outliers.
SemanticChunker is slower because it embeds every sentence to compute similarities. For a 10,000-word document, expect 5–15 seconds and a small embedding cost. Worth it for long-form content where topic coherence matters more than processing speed.
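The percentile strategy can be sketched in a few lines of plain Python. The 2-D vectors below are made-up stand-ins for sentence embeddings, and `semantic_groups` is a hypothetical helper: it measures the cosine distance between neighbouring sentences and starts a new group wherever that distance exceeds the chosen percentile. The real chunker also buffers neighbouring sentences before embedding, which this sketch omits.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def percentile(values, pct):
    """Linear-interpolation percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    low, high = math.floor(rank), math.ceil(rank)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)

def semantic_groups(sentences, vectors, pct=95):
    """Group consecutive sentences, breaking at unusually large distances."""
    if len(sentences) < 2:
        return [list(sentences)]
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)
    groups, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            groups.append(current)
            current = []
        current.append(sentence)
    groups.append(current)
    return groups

sentences = ["Cats purr.", "Cats nap a lot.", "GPUs run hot.", "GPUs need cooling."]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # hypothetical embeddings
groups = semantic_groups(sentences, toy_vectors, pct=95)
print(groups)
```

The only large distance sits between the second and third sentences, so that is where the break falls.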
10. LatexTextSplitter
For academic or scientific documents written in LaTeX, the dedicated splitter respects LaTeX structure:
from langchain_text_splitters import LatexTextSplitter
latex_text = r"""
\section{Introduction}
This paper presents a novel approach to ...
\subsection{Background}
Previous work on retrieval-augmented generation ...
\section{Methodology}
We propose the following architecture ...
"""
splitter = LatexTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(latex_text)
for chunk in chunks:
    print(chunk[:200])
    print("---")
Choosing the Right Splitter: Decision Guide
Here's the practical decision flow I follow:
Is your document Markdown?
→ MarkdownHeaderTextSplitter + RecursiveCharacterTextSplitter for oversized sections
Is your document HTML?
→ HTMLHeaderTextSplitter or HTMLSectionSplitter based on whether headings or sections dominate
Is your document source code?
→ CodeTextSplitter (via RecursiveCharacterTextSplitter.from_language())
Are you using sentence-transformers (not OpenAI)?
→ SentenceTransformersTokenTextSplitter to respect model token limits
Is token count precision critical (LLM prompts, not just embedding)?
→ TokenTextSplitter
Is your document long-form, unstructured, and topic-varied?
→ SemanticChunker if you can afford the latency/cost, otherwise RecursiveCharacterTextSplitter with a small chunk size
Everything else?
→ RecursiveCharacterTextSplitter with chunk_size=500, chunk_overlap=50
Production Pipeline: Multi-Format Document Ingestion
Real applications have to handle multiple document types simultaneously. Here's a routing pattern:
from pathlib import Path
from typing import List
from langchain_core.documents import Document
from langchain_text_splitters import (
    RecursiveCharacterTextSplitter,
    MarkdownHeaderTextSplitter,
    Language,
)
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings
EXTENSION_TO_SPLITTER = {
    ".md": "markdown",
    ".markdown": "markdown",
    ".py": "python",
    ".js": "javascript",
    ".ts": "typescript",
    ".html": "html",
    ".htm": "html",
    ".tex": "latex",
}
# Splitter types handled by language-specific separators.
# html and latex are included so they no longer fall through to the default.
LANGUAGE_SPLITTERS = {
    "python": Language.PYTHON,
    "javascript": Language.JS,
    "typescript": Language.TS,
    "html": Language.HTML,
    "latex": Language.LATEX,
}
def get_splitter_for_file(
    file_path: str,
    chunk_size: int = 500,
    chunk_overlap: int = 50,
    use_semantic: bool = False,
    embeddings=None
):
    """Return the appropriate text splitter based on file extension."""
    ext = Path(file_path).suffix.lower()
    splitter_type = EXTENSION_TO_SPLITTER.get(ext, "default")
    if splitter_type == "markdown":
        return MarkdownHeaderTextSplitter(
            headers_to_split_on=[
                ("#", "h1"), ("##", "h2"), ("###", "h3")
            ]
        )
    if splitter_type in LANGUAGE_SPLITTERS:
        return RecursiveCharacterTextSplitter.from_language(
            language=LANGUAGE_SPLITTERS[splitter_type],
            chunk_size=chunk_size,
            chunk_overlap=chunk_overlap
        )
    if use_semantic and embeddings is not None:
        return SemanticChunker(
            embeddings=embeddings,
            breakpoint_threshold_type="percentile",
            breakpoint_threshold_amount=95
        )
    return RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
def chunk_documents(
    documents: List[Document],
    chunk_size: int = 500,
    chunk_overlap: int = 50,
    use_semantic_for_long: bool = False
) -> List[Document]:
    """
    Chunk a list of documents using the appropriate splitter
    for each file type.
    """
    embeddings = OpenAIEmbeddings() if use_semantic_for_long else None
    all_chunks = []
    for doc in documents:
        source = doc.metadata.get("source", "")
        # For very short documents, skip chunking
        if len(doc.page_content) < 200:
            all_chunks.append(doc)
            continue
        # Use semantic chunking for long unstructured documents
        use_semantic = (
            use_semantic_for_long
            and len(doc.page_content) > 5000
            and Path(source).suffix.lower() not in EXTENSION_TO_SPLITTER
        )
        splitter = get_splitter_for_file(
            file_path=source,
            chunk_size=chunk_size,
            chunk_overlap=chunk_overlap,
            use_semantic=use_semantic,
            embeddings=embeddings
        )
        try:
            if hasattr(splitter, 'split_documents'):
                chunks = splitter.split_documents([doc])
            else:
                # MarkdownHeaderTextSplitter has no split_documents, and its
                # split_text returns Document objects rather than strings
                chunks = []
                for i, piece in enumerate(splitter.split_text(doc.page_content)):
                    if isinstance(piece, Document):
                        content, extra = piece.page_content, piece.metadata
                    else:
                        content, extra = piece, {}
                    chunks.append(Document(
                        page_content=content,
                        metadata={**doc.metadata, **extra, "chunk_index": i}
                    ))
            all_chunks.extend(chunks)
        except Exception as e:
            print(f"Warning: Chunking failed for {source}: {e}")
            # Fall back to default splitter
            fallback = RecursiveCharacterTextSplitter(
                chunk_size=chunk_size, chunk_overlap=chunk_overlap
            )
            chunks = fallback.split_documents([doc])
            all_chunks.extend(chunks)
    print(f"Chunked {len(documents)} documents into {len(all_chunks)} chunks")
    return all_chunks
# Usage example
if __name__ == "__main__":
    from collections import Counter
    from langchain_community.document_loaders import DirectoryLoader
    loader = DirectoryLoader(
        "./knowledge_base/",
        glob="**/*",
        show_progress=True
    )
    documents = loader.load()
    chunks = chunk_documents(
        documents=documents,
        chunk_size=400,
        chunk_overlap=40,
        use_semantic_for_long=False  # Set True for better quality, slower
    )
    # Group chunks by source type
    source_types = Counter(
        Path(c.metadata.get("source", "")).suffix
        for c in chunks
    )
    print("Chunks by file type:", dict(source_types))
This production pipeline automatically routes each document to the best splitter for its type, with a fallback for unknown formats.
For how these chunks feed into a larger agent system, the post on AI agent memory and planning covers the retrieval-memory connection.
Evaluating Chunk Quality
You can actually measure chunk quality before spending money on embeddings. A few quick checks:
from typing import List
import statistics
def analyze_chunks(chunks: List[str]) -> dict:
    """Quick quality analysis of chunk output."""
    # Nothing to measure (also avoids dividing by zero below)
    if not chunks:
        return {"total_chunks": 0}
    lengths = [len(c) for c in chunks]
    # Check for very short chunks (likely noise)
    short_chunks = [c for c in chunks if len(c) < 50]
    # Check for chunks that look cut off (end mid-sentence)
    def ends_cleanly(text: str) -> bool:
        # Trailing whitespace is stripped first, so only sentence
        # punctuation or a closing code fence counts as a clean ending
        return text.rstrip().endswith(('.', '?', '!', '```'))
    clean_endings = sum(1 for c in chunks if ends_cleanly(c))
    return {
        "total_chunks": len(chunks),
        "avg_length": statistics.mean(lengths),
        "median_length": statistics.median(lengths),
        "std_dev": statistics.stdev(lengths) if len(lengths) > 1 else 0,
        "short_chunks_pct": len(short_chunks) / len(chunks) * 100,
        "clean_endings_pct": clean_endings / len(chunks) * 100,
        "min_length": min(lengths),
        "max_length": max(lengths)
    }
# Use it
stats = analyze_chunks([c.page_content for c in chunks])
print(f"Quality stats: {stats}")
# Target: clean_endings_pct > 70%, short_chunks_pct < 5%
Target metrics for a healthy chunk set: clean_endings_pct above 70%, short_chunks_pct below 5%, and std_dev below 200 characters (consistent chunk sizes).
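If you want those targets enforced rather than eyeballed, a small check like the following could sit in an ingestion script. The `TARGETS` table and `failed_targets` helper are hypothetical names; the thresholds are the ones quoted above, and the stats dictionary is assumed to have the same keys that analyze_chunks() returns.

```python
# Thresholds from the text: "min" means the value must be above the limit,
# "max" means it must be below.
TARGETS = {
    "clean_endings_pct": ("min", 70.0),
    "short_chunks_pct": ("max", 5.0),
    "std_dev": ("max", 200.0),
}

def failed_targets(stats):
    """Return a human-readable list of the targets that stats misses."""
    failures = []
    for key, (kind, limit) in TARGETS.items():
        value = stats[key]
        passed = value > limit if kind == "min" else value < limit
        if not passed:
            direction = ">" if kind == "min" else "<"
            failures.append(f"{key}={value:.1f} (want {direction} {limit})")
    return failures

# Example stats, made up for illustration
example_stats = {"clean_endings_pct": 82.0, "short_chunks_pct": 9.5, "std_dev": 120.0}
problems = failed_targets(example_stats)
print(problems or "All targets met")
```

An empty list means the chunk set is within all three targets.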
For the embedding and storage step after chunking, see the semantic search tutorial for how chunk quality translates to retrieval performance.
Conclusion
Text splitting is the foundation everything else in your RAG pipeline sits on. Picking the wrong splitter means your embeddings are noisy, your retrieval is imprecise, and your LLM gets confused context — problems that are hard to debug because they're downstream from the root cause.
The choice comes down to document type: Markdown headers for structured docs, CodeTextSplitter for source code, SemanticChunker for long unstructured content, and RecursiveCharacterTextSplitter as the reliable default for everything else. The two-stage pattern — header splitting followed by character splitting for oversized sections — handles the most common real-world case (documentation) particularly well.
Use the analyze_chunks() function to validate your output before ingesting into a vector database. A 15-minute audit of your chunking output can save hours of debugging downstream retrieval failures.
For the next step after splitting, the LangChain tutorial 2025 walks through embedding and storing your chunks for retrieval.
FAQs
Which LangChain text splitter should I use for general documents? RecursiveCharacterTextSplitter is the best default choice for most documents. By default it tries paragraph breaks first, then line breaks, then word breaks, which keeps most chunks from cutting mid-sentence; add a sentence separator such as ". " to the separators list if you want sentence-level splits. Start with chunk_size=500 and chunk_overlap=50 and adjust from there.
What chunk size gives the best RAG retrieval performance? Research from Pinecone and LlamaIndex suggests that 300–600 tokens per chunk works best for most retrieval tasks. Smaller chunks (100–200 tokens) improve precision but lose context. Larger chunks (1000+ tokens) preserve context but dilute relevance scores. For code, the optimal unit is usually one function or class method.
Does chunk overlap actually improve retrieval quality? Yes, but the improvement has diminishing returns. An overlap of 10–15% of chunk size (e.g., 50 tokens for a 400-token chunk) typically prevents information loss at chunk boundaries. Going above 20% overlap wastes storage and embedding cost without proportional quality gains.
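To put rough numbers on the overlap trade-off, the arithmetic can be sketched as follows. The `chunk_count` helper is a hypothetical back-of-the-envelope model that assumes fixed-size sliding windows; real splitters break on separators, so actual counts will differ somewhat.

```python
import math

def chunk_count(total_tokens, chunk_size, overlap):
    """Number of fixed-size windows needed to cover total_tokens."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if total_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap  # new tokens contributed by each extra chunk
    return math.ceil((total_tokens - overlap) / stride)

baseline = chunk_count(10_000, 400, 0)
for overlap in (0, 50, 100):
    n = chunk_count(10_000, 400, overlap)
    print(f"overlap={overlap}: {n} chunks ({n / baseline - 1:.0%} more than no overlap)")
```

Under this model a 10,000-token document needs 25 chunks with no overlap, 29 with 50 tokens of overlap, and 33 with 100, so storage and embedding cost grow faster than the overlap percentage itself.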
AiTechWorlds Team