7 LangChain Callbacks for Token Usage and Cost Tracking
Track OpenAI API spend with LangChain callbacks — get_openai_callback, custom cost trackers, per-chain breakdowns, budget alerts, and monthly cost estimation.
Token costs have a way of sneaking up on you. A RAG pipeline looks cheap in testing — a few cents per query — and then you deploy it, the volume ramps up, and your end-of-month bill arrives. Or you build an autonomous agent with a loop, forget to add a termination condition, and watch it make 200 API calls before you notice.
LangChain's callback system gives you the hooks to track every token that flows through your chains. This guide covers seven approaches: from the one-liner built-in tracker to production-grade systems with per-user cost attribution, budget alerts, and database logging.
Understanding the Callback System
Before the code: LangChain callbacks fire at specific lifecycle events. The ones relevant to cost tracking are:
- on_llm_start: fires with the prompt before the API call
- on_llm_end: fires with the response, including token usage
- on_chain_start / on_chain_end: fire for each chain step
- on_tool_start / on_tool_end: fire when agents use tools
You can attach callbacks at three levels:
- Global: on the ChatOpenAI instance (affects all calls from that LLM)
- Chain: passed to .invoke() as {"callbacks": [...]}
- Request: RunnableConfig passed at invocation time
Understanding which level is correct for your use case saves a lot of debugging.
Callback 1: get_openai_callback (Built-in, Zero Setup)
For quick cost checks during development, this is the fastest approach:
from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template(
    "Write a brief analysis of {topic} in under 100 words."
)
chain = prompt | llm | StrOutputParser()
with get_openai_callback() as cb:
    result = chain.invoke({"topic": "serverless computing"})
print("Response:", result[:150])
# Token counts and cost from the API response
print(f"\nToken Usage:")
print(f" Prompt tokens: {cb.prompt_tokens:,}")
print(f" Completion tokens: {cb.completion_tokens:,}")
print(f" Total tokens: {cb.total_tokens:,}")
print(f" Total cost: ${cb.total_cost:.6f}")
The context manager accumulates totals across all calls within the block:
with get_openai_callback() as cb:
    # Multiple calls: all tracked together
    result1 = chain.invoke({"topic": "machine learning"})
    result2 = chain.invoke({"topic": "deep learning"})
    result3 = chain.invoke({"topic": "reinforcement learning"})
print(f"3 calls total: {cb.total_tokens:,} tokens = ${cb.total_cost:.6f}")
print(f"Average per call: {cb.total_tokens / 3:.0f} tokens")
Limitation: It only works with OpenAI and Azure OpenAI models. For other providers, you need the approaches below.
Callback 2: Custom Per-Model Cost Tracking Callback
A custom callback gives you cost tracking for any LLM provider and full control over what you log:
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
from typing import Any, Dict, List, Optional, Union
import time
import json
# Pricing table — update when providers change pricing
MODEL_PRICING = {
    "gpt-4o": {
        "prompt": 0.0025 / 1000,      # $2.50 per 1M tokens
        "completion": 0.010 / 1000    # $10 per 1M tokens
    },
    "gpt-4o-mini": {
        "prompt": 0.00015 / 1000,     # $0.15 per 1M tokens
        "completion": 0.0006 / 1000   # $0.60 per 1M tokens
    },
    "gpt-3.5-turbo": {
        "prompt": 0.0005 / 1000,      # $0.50 per 1M tokens
        "completion": 0.0015 / 1000   # $1.50 per 1M tokens
    },
    "text-embedding-3-small": {
        "prompt": 0.00002 / 1000,     # $0.02 per 1M tokens
        "completion": 0.0
    },
    "text-embedding-3-large": {
        "prompt": 0.00013 / 1000,     # $0.13 per 1M tokens
        "completion": 0.0
    },
    "claude-3-5-sonnet": {
        "prompt": 0.003 / 1000,       # $3 per 1M tokens
        "completion": 0.015 / 1000    # $15 per 1M tokens
    }
}
class CostTrackingCallback(BaseCallbackHandler):
    """Track token usage and compute cost across all LLM calls."""

    def __init__(self, name: str = "default"):
        super().__init__()
        self.name = name
        self.calls = []
        self.total_prompt_tokens = 0
        self.total_completion_tokens = 0
        self.total_cost = 0.0
        self._start_times = {}

    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs
    ) -> None:
        run_id = kwargs.get("run_id", "unknown")
        self._start_times[str(run_id)] = time.time()

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        run_id = str(kwargs.get("run_id", "unknown"))
        elapsed = time.time() - self._start_times.pop(run_id, time.time())
        # Extract model name from response metadata
        model = "unknown"
        if response.llm_output and "model_name" in response.llm_output:
            model = response.llm_output["model_name"]
        # Extract token usage
        usage = {}
        if response.llm_output and "token_usage" in response.llm_output:
            usage = response.llm_output["token_usage"]
        prompt_tokens = usage.get("prompt_tokens", 0)
        completion_tokens = usage.get("completion_tokens", 0)
        total_tokens = usage.get("total_tokens", prompt_tokens + completion_tokens)
        # Compute cost
        pricing = MODEL_PRICING.get(model, {"prompt": 0.0, "completion": 0.0})
        call_cost = (
            prompt_tokens * pricing["prompt"] +
            completion_tokens * pricing["completion"]
        )
        # Record this call
        call_record = {
            "model": model,
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": total_tokens,
            "cost": call_cost,
            "latency_seconds": round(elapsed, 3),
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        }
        self.calls.append(call_record)
        # Update totals
        self.total_prompt_tokens += prompt_tokens
        self.total_completion_tokens += completion_tokens
        self.total_cost += call_cost

    @property
    def summary(self) -> dict:
        return {
            "tracker_name": self.name,
            "total_calls": len(self.calls),
            "total_prompt_tokens": self.total_prompt_tokens,
            "total_completion_tokens": self.total_completion_tokens,
            "total_tokens": self.total_prompt_tokens + self.total_completion_tokens,
            "total_cost_usd": round(self.total_cost, 6),
            "avg_cost_per_call": round(self.total_cost / len(self.calls), 6) if self.calls else 0.0
        }

    def reset(self):
        self.calls.clear()
        self.total_prompt_tokens = 0
        self.total_completion_tokens = 0
        self.total_cost = 0.0
Using the callback:
tracker = CostTrackingCallback(name="content_pipeline")
llm_with_tracker = ChatOpenAI(
    model="gpt-4o-mini",
    callbacks=[tracker]
)
chain = (
    ChatPromptTemplate.from_template("Explain {concept} in simple terms.")
    | llm_with_tracker
    | StrOutputParser()
)
for concept in ["neural networks", "transformers", "reinforcement learning"]:
    result = chain.invoke({"concept": concept})
    print(f"{concept}: {result[:80]}...")
print("\nCost Summary:")
print(json.dumps(tracker.summary, indent=2))
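One thing to watch with a pricing table keyed by short names: the model name reported in the response is often a dated snapshot such as "gpt-4o-mini-2024-07-18", and an exact dictionary lookup on that string falls through to the zero-cost default. A minimal sketch of a more forgiving lookup follows. The helper name resolve_pricing and the small PRICING table are illustrative assumptions, not part of LangChain.

```python
from typing import Dict, Optional

# Illustrative per-token prices (USD), using the same layout as MODEL_PRICING
PRICING: Dict[str, Dict[str, float]] = {
    "gpt-4o": {"prompt": 2.50 / 1_000_000, "completion": 10.00 / 1_000_000},
    "gpt-4o-mini": {"prompt": 0.15 / 1_000_000, "completion": 0.60 / 1_000_000},
}


def resolve_pricing(
    model: str, table: Dict[str, Dict[str, float]]
) -> Optional[Dict[str, float]]:
    """Return pricing for an exact key, else for the longest key that prefixes `model`."""
    if model in table:
        return table[model]
    # Longest prefix wins, so "gpt-4o-mini-..." maps to "gpt-4o-mini", not "gpt-4o"
    matches = [key for key in table if model.startswith(key)]
    if not matches:
        return None  # Caller decides: log a warning, or fall back to a default price
    return table[max(matches, key=len)]


print(resolve_pricing("gpt-4o-mini-2024-07-18", PRICING))
print(resolve_pricing("some-unlisted-model", PRICING))
```

Prefix matching is a convenience, not a guarantee: a differently priced variant that happens to share a prefix with a listed model would be billed at the wrong rate, so returning None and logging unknown models is the safer default for billing.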
Callback 3: Per-Request Cost Attribution
In multi-user applications, you need to know which user incurred which costs:
from langchain_core.runnables import RunnableConfig
from typing import Dict
import threading
class PerUserCostTracker(BaseCallbackHandler):
    """Thread-safe per-user cost tracking."""

    def __init__(self):
        super().__init__()
        self._lock = threading.Lock()
        self._user_costs: Dict[str, dict] = {}
        self._call_contexts = {}

    def on_llm_start(self, serialized, prompts, **kwargs):
        run_id = str(kwargs.get("run_id", ""))
        # User ID comes from metadata in RunnableConfig.
        # metadata can be present but None, so guard before calling .get()
        metadata = kwargs.get("metadata") or {}
        user_id = metadata.get("user_id", "anonymous")
        self._call_contexts[run_id] = {"user_id": user_id, "start": time.time()}

    def on_llm_end(self, response: LLMResult, **kwargs):
        run_id = str(kwargs.get("run_id", ""))
        context = self._call_contexts.pop(run_id, {})
        user_id = context.get("user_id", "anonymous")
        usage = {}
        if response.llm_output:
            usage = response.llm_output.get("token_usage", {})
        model = response.llm_output.get("model_name", "unknown") if response.llm_output else "unknown"
        pricing = MODEL_PRICING.get(model, {"prompt": 0.0, "completion": 0.0})
        call_cost = (
            usage.get("prompt_tokens", 0) * pricing["prompt"] +
            usage.get("completion_tokens", 0) * pricing["completion"]
        )
        with self._lock:
            if user_id not in self._user_costs:
                self._user_costs[user_id] = {"calls": 0, "tokens": 0, "cost": 0.0}
            self._user_costs[user_id]["calls"] += 1
            self._user_costs[user_id]["tokens"] += usage.get("total_tokens", 0)
            self._user_costs[user_id]["cost"] += call_cost

    def get_user_cost(self, user_id: str) -> dict:
        with self._lock:
            return self._user_costs.get(user_id, {"calls": 0, "tokens": 0, "cost": 0.0})

    def get_all_user_costs(self) -> dict:
        with self._lock:
            return dict(self._user_costs)
# Attach to LLM globally
user_tracker = PerUserCostTracker()
tracked_llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[user_tracker])
chain = ChatPromptTemplate.from_template("Answer: {q}") | tracked_llm | StrOutputParser()
# Simulate multiple users
for user_id, question in [
    ("user_alice", "What is machine learning?"),
    ("user_bob", "Explain neural networks in detail with several examples please"),
    ("user_alice", "What is deep learning?"),
    ("user_charlie", "Brief overview of AI"),
]:
    config = RunnableConfig(metadata={"user_id": user_id})
    chain.invoke({"q": question}, config=config)
print("Per-user costs:")
for uid, costs in user_tracker.get_all_user_costs().items():
    print(f" {uid}: {costs['calls']} calls, {costs['tokens']:,} tokens, ${costs['cost']:.6f}")
Callback 4: Budget Alert Callback
Stop an agent before it blows through a cost limit:
class BudgetAlertCallback(BaseCallbackHandler):
    """Raise an exception when spending exceeds a budget threshold."""

    # Without this flag LangChain logs exceptions raised inside a handler
    # and carries on, so the hard stop below would never halt the chain
    raise_error = True

    def __init__(self, budget_usd: float, alert_at_percent: float = 0.8):
        super().__init__()
        self.budget = budget_usd
        self.alert_threshold = budget_usd * alert_at_percent
        self.spent = 0.0
        self.alerts_sent = []

    def on_llm_end(self, response: LLMResult, **kwargs):
        if not response.llm_output:
            return
        usage = response.llm_output.get("token_usage", {})
        model = response.llm_output.get("model_name", "gpt-4o-mini")
        pricing = MODEL_PRICING.get(model, {"prompt": 0.0001 / 1000, "completion": 0.0003 / 1000})
        call_cost = (
            usage.get("prompt_tokens", 0) * pricing["prompt"] +
            usage.get("completion_tokens", 0) * pricing["completion"]
        )
        self.spent += call_cost
        # Alert once when the warning threshold is crossed
        if self.spent >= self.alert_threshold and not self.alerts_sent:
            alert_msg = (
                f"BUDGET ALERT: Spent ${self.spent:.4f} of ${self.budget:.2f} budget "
                f"({self.spent/self.budget*100:.1f}%)"
            )
            self.alerts_sent.append(alert_msg)
            print(f"WARNING: {alert_msg}")
        # Hard stop at budget
        if self.spent >= self.budget:
            raise RuntimeError(
                f"Budget exceeded: spent ${self.spent:.4f} against ${self.budget:.2f} limit. "
                "Halting execution."
            )
# Usage with a tight budget for testing
budget_tracker = BudgetAlertCallback(budget_usd=0.01, alert_at_percent=0.5)
try:
    expensive_llm = ChatOpenAI(model="gpt-4o", callbacks=[budget_tracker])
    chain = ChatPromptTemplate.from_template("{q}") | expensive_llm | StrOutputParser()
    for question in ["Explain AI", "Explain ML", "Explain DL", "Explain RL", "Explain NLP"]:
        result = chain.invoke({"q": question})
        print(f"Q: {question} | Spent so far: ${budget_tracker.spent:.6f}")
except RuntimeError as e:
    print(f"Stopped: {e}")
Callback 5: Structured Logging to Database
For production monitoring, push cost events to a database:
import sqlite3
from dataclasses import dataclass
import threading
class DatabaseCostLogger(BaseCallbackHandler):
    """Log token usage to SQLite for analysis and billing."""

    def __init__(self, db_path: str = "llm_costs.db"):
        super().__init__()
        self.db_path = db_path
        self._local = threading.local()
        self._setup_db()
        self._pending_starts = {}

    def _get_conn(self):
        # One connection per thread: sqlite3 connections are not shared across threads
        if not hasattr(self._local, "conn"):
            self._local.conn = sqlite3.connect(self.db_path)
        return self._local.conn

    def _setup_db(self):
        conn = sqlite3.connect(self.db_path)
        conn.execute("""
            CREATE TABLE IF NOT EXISTS llm_calls (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp TEXT NOT NULL,
                model TEXT,
                prompt_tokens INTEGER,
                completion_tokens INTEGER,
                total_tokens INTEGER,
                cost_usd REAL,
                latency_seconds REAL,
                user_id TEXT,
                chain_name TEXT,
                session_id TEXT
            )
        """)
        conn.commit()
        conn.close()

    def on_llm_start(self, serialized, prompts, **kwargs):
        run_id = str(kwargs.get("run_id", ""))
        # metadata can be present but None, so guard before calling .get()
        metadata = kwargs.get("metadata") or {}
        self._pending_starts[run_id] = {
            "start_time": time.time(),
            "user_id": metadata.get("user_id", "system"),
            "chain_name": metadata.get("chain_name", "unknown"),
            "session_id": metadata.get("session_id", ""),
        }

    def on_llm_end(self, response: LLMResult, **kwargs):
        run_id = str(kwargs.get("run_id", ""))
        pending = self._pending_starts.pop(run_id, {})
        elapsed = time.time() - pending.get("start_time", time.time())
        usage = {}
        model = "unknown"
        if response.llm_output:
            usage = response.llm_output.get("token_usage", {})
            model = response.llm_output.get("model_name", "unknown")
        pricing = MODEL_PRICING.get(model, {"prompt": 0.0, "completion": 0.0})
        cost = (
            usage.get("prompt_tokens", 0) * pricing["prompt"] +
            usage.get("completion_tokens", 0) * pricing["completion"]
        )
        conn = self._get_conn()
        conn.execute("""
            INSERT INTO llm_calls
            (timestamp, model, prompt_tokens, completion_tokens, total_tokens,
             cost_usd, latency_seconds, user_id, chain_name, session_id)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            model,
            usage.get("prompt_tokens", 0),
            usage.get("completion_tokens", 0),
            usage.get("total_tokens", 0),
            round(cost, 8),
            round(elapsed, 3),
            pending.get("user_id", "system"),
            pending.get("chain_name", "unknown"),
            pending.get("session_id", "")
        ))
        conn.commit()

    def get_daily_report(self) -> dict:
        conn = self._get_conn()
        # Rows are stamped in UTC, so "today" must be the UTC date as well
        today = time.strftime("%Y-%m-%d", time.gmtime())
        row = conn.execute("""
            SELECT
                COUNT(*) as calls,
                SUM(total_tokens) as tokens,
                SUM(cost_usd) as total_cost,
                AVG(latency_seconds) as avg_latency
            FROM llm_calls
            WHERE timestamp LIKE ?
        """, (f"{today}%",)).fetchone()
        return {
            "date": today,
            "calls": row[0] or 0,
            "total_tokens": row[1] or 0,
            "total_cost_usd": round(row[2] or 0, 6),
            "avg_latency_seconds": round(row[3] or 0, 3)
        }
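Once calls are in a table, billing-style questions become plain SQL. As a rough sketch, the query below groups spend by user over a time window. It assumes the llm_calls columns defined above; the helper name cost_by_user and the sample rows are made up for illustration, and an in-memory database keeps the demo free of side effects.

```python
import sqlite3


def cost_by_user(conn: sqlite3.Connection, since_iso: str) -> list:
    """Return (user_id, calls, tokens, cost_usd) rows, most expensive user first."""
    return conn.execute(
        """
        SELECT user_id, COUNT(*), SUM(total_tokens), ROUND(SUM(cost_usd), 6)
        FROM llm_calls
        WHERE timestamp >= ?
        GROUP BY user_id
        ORDER BY SUM(cost_usd) DESC
        """,
        (since_iso,),
    ).fetchall()


# Demo against an in-memory table holding only the columns this query needs
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE llm_calls "
    "(timestamp TEXT, user_id TEXT, total_tokens INTEGER, cost_usd REAL)"
)
conn.executemany(
    "INSERT INTO llm_calls VALUES (?, ?, ?, ?)",
    [
        # Hypothetical rows, not real usage data
        ("2025-01-01T10:00:00Z", "alice", 100, 0.001),
        ("2025-01-01T11:00:00Z", "bob", 400, 0.004),
        ("2025-01-02T09:00:00Z", "alice", 200, 0.002),
    ],
)
report = cost_by_user(conn, "2025-01-01T00:00:00Z")
for user_id, calls, tokens, cost in report:
    print(f"{user_id}: {calls} calls, {tokens} tokens, ${cost:.6f}")
```

Because the timestamps are ISO 8601 strings in UTC, a string comparison in the WHERE clause sorts them chronologically, which is why no date parsing is needed here.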
Callback 6: Chain-Level Cost Breakdown
When you have a complex multi-step chain, knowing which step costs the most is essential:
from langchain_core.callbacks import BaseCallbackHandler
from collections import defaultdict
class ChainCostBreakdown(BaseCallbackHandler):
    """Track costs per chain step for pipeline optimization.

    Pass this handler at invoke time, e.g. config={"callbacks": [handler]},
    so it receives chain events as well as LLM events. A handler attached
    only to the LLM constructor never sees on_chain_start.
    """

    def __init__(self):
        super().__init__()
        self.step_costs = defaultdict(lambda: {"calls": 0, "tokens": 0, "cost": 0.0})
        self._active_chains = {}

    def on_chain_start(self, serialized, inputs, **kwargs):
        run_id = str(kwargs.get("run_id", ""))
        # Assumption: recent langchain-core releases pass the runnable name in
        # kwargs and may pass serialized=None, so check both before falling back
        serialized = serialized or {}
        chain_name = (
            kwargs.get("name")
            or serialized.get("name")
            or (serialized.get("id") or ["unknown"])[-1]
        )
        self._active_chains[run_id] = chain_name

    def on_llm_end(self, response: LLMResult, **kwargs):
        parent_run_id = str(kwargs.get("parent_run_id", ""))
        chain_name = self._active_chains.get(parent_run_id, "root_llm")
        usage = {}
        model = "unknown"
        if response.llm_output:
            usage = response.llm_output.get("token_usage", {})
            model = response.llm_output.get("model_name", "unknown")
        pricing = MODEL_PRICING.get(model, {"prompt": 0.0001 / 1000, "completion": 0.0003 / 1000})
        cost = (
            usage.get("prompt_tokens", 0) * pricing["prompt"] +
            usage.get("completion_tokens", 0) * pricing["completion"]
        )
        self.step_costs[chain_name]["calls"] += 1
        self.step_costs[chain_name]["tokens"] += usage.get("total_tokens", 0)
        self.step_costs[chain_name]["cost"] += cost

    def print_breakdown(self):
        print("\nChain Cost Breakdown:")
        print(f"{'Step':<30} {'Calls':>6} {'Tokens':>10} {'Cost':>12}")
        print("-" * 62)
        total_cost = sum(v["cost"] for v in self.step_costs.values())
        for step, data in sorted(self.step_costs.items(), key=lambda x: -x[1]["cost"]):
            pct = data["cost"] / total_cost * 100 if total_cost > 0 else 0
            print(f"{step:<30} {data['calls']:>6} {data['tokens']:>10,} "
                  f"${data['cost']:>10.6f} ({pct:.1f}%)")
        print("-" * 62)
        print(f"{'TOTAL':<30} {'':>6} {sum(v['tokens'] for v in self.step_costs.values()):>10,} "
              f"${total_cost:>10.6f}")
Callback 7: Monthly Cost Projection
Project your monthly bill based on recent usage patterns:
import sqlite3
from datetime import datetime, timedelta, timezone

def project_monthly_cost(db_path: str = "llm_costs.db") -> dict:
    """Project monthly cost based on the last 7 days of usage."""
    conn = sqlite3.connect(db_path)
    # Rows are stamped in UTC, so compute the cutoff in UTC too
    # (datetime.utcnow() is deprecated as of Python 3.12)
    seven_days_ago = (datetime.now(timezone.utc) - timedelta(days=7)).strftime("%Y-%m-%dT%H:%M:%SZ")
    daily_data = conn.execute("""
        SELECT
            DATE(timestamp) as date,
            SUM(cost_usd) as daily_cost,
            SUM(total_tokens) as daily_tokens,
            COUNT(*) as daily_calls
        FROM llm_calls
        WHERE timestamp > ?
        GROUP BY DATE(timestamp)
        ORDER BY date
    """, (seven_days_ago,)).fetchall()
    conn.close()
    if not daily_data:
        return {"error": "No usage data found in the last 7 days"}
    daily_costs = [row[1] for row in daily_data]
    # Averages over days that have data; days with no calls are not counted
    avg_daily_cost = sum(daily_costs) / len(daily_costs)
    projected_monthly = avg_daily_cost * 30
    return {
        "days_analyzed": len(daily_data),
        "avg_daily_cost_usd": round(avg_daily_cost, 4),
        "total_7d_cost_usd": round(sum(daily_costs), 4),
        "projected_monthly_usd": round(projected_monthly, 2),
        "daily_breakdown": [
            {"date": row[0], "cost": round(row[1], 4),
             "tokens": row[2], "calls": row[3]}
            for row in daily_data
        ]
    }
projection = project_monthly_cost()
print(f"Projected monthly cost: ${projection.get('projected_monthly_usd', 0):.2f}")
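The projection above divides by the number of days that have rows in the table, so a workload that only runs on some days is projected as if it ran every day. The sketch below compares that figure with one that divides by the full calendar window. The function name project_monthly and the sample costs are hypothetical, chosen only to make the difference visible.

```python
from typing import Dict


def project_monthly(
    daily_costs: Dict[str, float], window_days: int = 7, days_in_month: int = 30
) -> Dict[str, float]:
    """Compare two projections from per-day cost totals keyed by date string."""
    total = sum(daily_costs.values())
    active_days = len(daily_costs)
    # Average over days with usage only (what the function above does)
    per_active_day = total / active_days if active_days else 0.0
    # Average over the whole window, counting idle days as zero spend
    per_calendar_day = total / window_days
    return {
        "active_day_projection": round(per_active_day * days_in_month, 2),
        "calendar_projection": round(per_calendar_day * days_in_month, 2),
    }


# Hypothetical week: usage on three days out of seven
sample = {"2025-01-06": 4.0, "2025-01-07": 6.0, "2025-01-08": 4.0}
result = project_monthly(sample)
print(result)
```

Neither number is the right one in general: the active-day figure suits a service that is ramping up to daily use, while the calendar figure suits batch jobs that really do sit idle most days.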
OpenAI Model Pricing Reference (2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| gpt-4o | $2.50 | $10.00 | Complex reasoning, production |
| gpt-4o-mini | $0.15 | $0.60 | Classification, fast tasks |
| gpt-3.5-turbo | $0.50 | $1.50 | Simple Q&A, legacy apps |
| text-embedding-3-small | $0.02 | — | High-volume embedding |
| text-embedding-3-large | $0.13 | — | High-accuracy embedding |
| o1-mini | $3.00 | $12.00 | Multi-step reasoning |
| o1 | $15.00 | $60.00 | Complex research tasks |
The dual-model pattern (cheap model for classification, quality model for generation) described in the Build AI chatbot Python guide can cut costs by 60-80% for high-volume support applications.
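To see where savings of that kind come from, it helps to run the arithmetic on a concrete workload. The sketch below uses the gpt-4o and gpt-4o-mini prices from the table; the request volumes and token counts are hypothetical, so the resulting percentage is an illustration of the method and not a benchmark.

```python
# USD per 1M tokens, copied from the pricing table above
PRICE_PER_1M = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def step_cost(model: str, requests: int, input_tokens: int, output_tokens: int) -> float:
    """Monthly cost in USD for one pipeline step run `requests` times."""
    price = PRICE_PER_1M[model]
    per_request = (
        input_tokens * price["input"] + output_tokens * price["output"]
    ) / 1_000_000
    return requests * per_request


# Hypothetical monthly workload: every message is classified,
# but only one in five needs a generated answer
CLASSIFY = dict(requests=100_000, input_tokens=600, output_tokens=10)
GENERATE = dict(requests=20_000, input_tokens=800, output_tokens=300)

single_model = step_cost("gpt-4o", **CLASSIFY) + step_cost("gpt-4o", **GENERATE)
dual_model = step_cost("gpt-4o-mini", **CLASSIFY) + step_cost("gpt-4o", **GENERATE)
savings = 1 - dual_model / single_model

print(f"gpt-4o for both steps: ${single_model:,.2f}")
print(f"mini classifies, gpt-4o generates: ${dual_model:,.2f}")
print(f"savings: {savings:.1%}")
```

The saving is driven almost entirely by how much of the token volume sits in the classification step, so it is worth plugging in your own numbers from the cost logs before committing to the pattern.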
Combining Callbacks: Production Setup
from langchain_openai import ChatOpenAI
def create_tracked_llm(
    model: str = "gpt-4o-mini",
    budget_usd: Optional[float] = None,
    user_id: Optional[str] = None
) -> ChatOpenAI:
    """Create an LLM with full cost tracking configured."""
    callbacks = [
        CostTrackingCallback(name=f"{model}_tracker"),
        DatabaseCostLogger(db_path="production_costs.db"),
    ]
    if budget_usd is not None:
        callbacks.append(BudgetAlertCallback(budget_usd=budget_usd))
    return ChatOpenAI(
        model=model,
        callbacks=callbacks,
        metadata={"user_id": user_id or "system"}
    )

# Typical production usage
llm = create_tracked_llm(
    model="gpt-4o-mini",
    budget_usd=1.00,  # $1 per agent run
    user_id="prod_pipeline"
)
chain = (
    ChatPromptTemplate.from_template("Analyze: {input}")
    | llm
    | StrOutputParser()
)
result = chain.invoke({"input": "the latest trends in AI agent development"})
This setup pairs well with the OpenAI API integration guide which covers API key management and rate limiting. For agents that use tools and might run many iterations, combining budget callbacks with the checkpointing patterns from LangChain checkpointers helps you resume from checkpoints without re-paying for already-completed steps.
Testing Callbacks Without Incurring Costs
from langchain_core.outputs import LLMResult, ChatGeneration
from langchain_core.messages import AIMessage
import uuid
def simulate_llm_callback(callback: BaseCallbackHandler, model: str,
                          prompt_tokens: int, completion_tokens: int):
    """Simulate an LLM call for testing callbacks."""
    run_id = uuid.uuid4()
    # Simulate start
    callback.on_llm_start(
        serialized={"name": "ChatOpenAI"},
        prompts=["test prompt"],
        run_id=run_id
    )
    # Simulate end with fake usage data
    fake_response = LLMResult(
        generations=[[ChatGeneration(message=AIMessage(content="test response"))]],
        llm_output={
            "model_name": model,
            "token_usage": {
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": prompt_tokens + completion_tokens
            }
        }
    )
    callback.on_llm_end(fake_response, run_id=run_id)
# Test your callback without API calls
tracker = CostTrackingCallback(name="test")
simulate_llm_callback(tracker, "gpt-4o-mini", 150, 200)
simulate_llm_callback(tracker, "gpt-4o", 500, 300)
print(tracker.summary)
Key Takeaways
The seven callbacks in this guide cover the full spectrum from quick one-liners to production-grade monitoring systems. get_openai_callback is your fastest debugging tool. CostTrackingCallback gives you provider-agnostic tracking with configurable pricing tables. BudgetAlertCallback prevents runaway agent loops from generating surprise bills. DatabaseCostLogger gives you the audit trail and query capability that operational teams need.
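The most important pattern is combining them: use PerUserCostTracker for billing attribution, BudgetAlertCallback for safety, and DatabaseCostLogger for historical analysis, all attached to the same LLM instance. The in-memory callbacks add negligible overhead because they only update a few counters per call. DatabaseCostLogger runs a synchronous SQLite insert and commit after each response, which is normally small next to the API round trip; if it shows up in profiling, batch the writes or hand them to a background worker.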
For the RAG architectures discussed in the RAG system tutorial and for the agent patterns in Build AI agent with LangChain, these callbacks integrate at the LLM level and capture every token across every chain step automatically.
Frequently Asked Questions
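Does get_openai_callback work with GPT-4o and GPT-4o-mini? Yes. get_openai_callback reads the token counts reported by the OpenAI API, so it works with any chat or completion model that returns usage data, including GPT-4o, GPT-4o-mini, and GPT-3.5-turbo. The dollar figure comes from a pricing table bundled with LangChain, so a model missing from that table can report tokens with a cost of $0. Embedding calls do not pass through the LLM callbacks, so track embedding spend separately.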
Can I track costs for non-OpenAI models? Yes. Build a custom callback by subclassing BaseCallbackHandler and implementing on_llm_end. The response object contains the token counts reported by any model that provides them. For models that don't report tokens, you can use tiktoken to estimate costs from the prompt and response text.
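How do I set a hard spending limit that stops the agent? Use a custom callback that raises an exception inside on_llm_start or on_llm_end once the cumulative cost exceeds your limit (BudgetAlertCallback in this guide raises RuntimeError; a dedicated exception class works just as well). Set raise_error = True on the handler, because LangChain otherwise logs exceptions raised by callbacks and keeps running. With that flag set, the exception propagates up and halts the chain. Wrap your agent invocation in a try/except to handle it gracefully.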
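The stop-and-catch flow can be sketched without any LangChain dependency, which also makes it easy to unit test. Everything here is illustrative: BudgetExceededError, BudgetGuard, and run_agent_loop are hypothetical names, and the fixed cost per step stands in for whatever a real callback would compute from token usage.

```python
class BudgetExceededError(RuntimeError):
    """Raised when cumulative spend reaches the configured limit."""


class BudgetGuard:
    """Accumulates spend, warns once at a threshold, and raises at the limit."""

    def __init__(self, budget_usd: float, warn_fraction: float = 0.8):
        self.budget_usd = budget_usd
        self.warn_at = budget_usd * warn_fraction
        self.spent = 0.0
        self.warned = False

    def record(self, cost_usd: float) -> None:
        self.spent += cost_usd
        if not self.warned and self.spent >= self.warn_at:
            self.warned = True
            print(f"warning: ${self.spent:.2f} of ${self.budget_usd:.2f} spent")
        if self.spent >= self.budget_usd:
            raise BudgetExceededError(
                f"spent ${self.spent:.2f} against ${self.budget_usd:.2f} limit"
            )


def run_agent_loop(guard: BudgetGuard, cost_per_step: float, max_steps: int) -> int:
    """Stand-in for an agent loop; returns how many steps finished before the stop."""
    steps = 0
    try:
        for _ in range(max_steps):
            guard.record(cost_per_step)  # A real callback would do this in on_llm_end
            steps += 1
    except BudgetExceededError as exc:
        print(f"stopped: {exc}")
    return steps


guard = BudgetGuard(budget_usd=1.00)
completed = run_agent_loop(guard, cost_per_step=0.25, max_steps=200)
print(f"steps completed: {completed}")
```

Note that the check runs after a call has already been paid for, so the final spend can overshoot the limit by up to the cost of one call; set the budget slightly below the amount you are truly willing to lose.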
AiTechWorlds Team