7 LangChain Callbacks: Logging, Tracing, and Streaming (2026)
Master 7 LangChain callbacks including StdOutCallbackHandler, LangSmith tracing, custom callbacks, streaming tokens, and token usage monitoring with working Python examples.
The hardest part of debugging a LangChain application isn't the code — it's not being able to see what's happening inside. The LLM calls something, chains execute, tools run, and you get a final answer. But if the answer is wrong, you have no idea which step went wrong, what the model was actually thinking, or how many tokens you burned getting there. That opacity is what callbacks solve.
I've worked on production LangChain systems where callbacks were the difference between shipping in a week and spending two weeks staring at unhelpful error messages. This guide covers all seven callback types you'll actually use, from the basic stdout logger to custom monitoring integrations and LangSmith tracing.
If you're building agents that you'll need to monitor, Build AI agent with LangChain sets up the agent you'll instrument here. For the LCEL chains where callbacks plug in naturally, see the LCEL complete guide.
How LangChain Callbacks Work
Every major LangChain event fires a callback. LLM calls, chain starts and ends, tool executions, retriever calls — all of these emit events that your callback handlers can listen to. The callback system is essentially an observer pattern built into the LangChain runtime.
Callbacks can be attached at three levels:
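- Constructor callbacks: Passed when you create a component. Always active for that component, but scoped to that object only; child components do not inherit them.
- Runtime callbacks: Passed through config at invocation time. Active only for that call, and propagated to every child component that runs as part of it.
- Global callbacks: Active for everything in the application, such as tracing switched on through environment variables (see the LangSmith section below).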
from langchain_openai import ChatOpenAI
from langchain_core.callbacks import StdOutCallbackHandler
# Constructor level — always active
llm = ChatOpenAI(
    model="gpt-4o-mini",
    callbacks=[StdOutCallbackHandler()],
)
# Runtime level — only for this invocation
result = llm.invoke("Hello", config={"callbacks": [StdOutCallbackHandler()]})
# Both work — choose based on whether you want permanent or one-off logging
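As a rough mental model for the constructor and runtime attachment points, here is a toy dispatcher in plain Python. ToyComponent and its (event, payload) handler signature are hypothetical stand-ins invented for this sketch, not LangChain classes, and the real runtime does considerably more (run IDs, child propagation, error handling).

```python
from typing import Callable, List, Optional

Handler = Callable[[str, str], None]  # (event_name, payload) -> None


class ToyComponent:
    """Hypothetical stand-in for an LLM or chain; not a LangChain class."""

    def __init__(self, name: str, callbacks: Optional[List[Handler]] = None):
        self.name = name
        # Constructor-level handlers: attached once, fire on every call.
        self.callbacks = list(callbacks or [])

    def invoke(self, text: str, callbacks: Optional[List[Handler]] = None) -> str:
        # Runtime-level handlers: only live for this one invocation.
        handlers = self.callbacks + list(callbacks or [])
        for handler in handlers:
            handler("start", text)
        result = text.upper()  # placeholder for the real work
        for handler in handlers:
            handler("end", result)
        return result


permanent_log: List[str] = []
one_off_log: List[str] = []

component = ToyComponent(
    "demo",
    callbacks=[lambda event, payload: permanent_log.append(f"{event}:{payload}")],
)

component.invoke("first", callbacks=[lambda e, p: one_off_log.append(f"{e}:{p}")])
component.invoke("second")  # no runtime handler this time

print(permanent_log)  # records both invocations
print(one_off_log)    # records only the first invocation
```

The constructor-level handler sees every call, while the runtime handler sees only the call it was passed to, which is the same trade-off as permanent versus one-off logging.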
Callback 1: StdOutCallbackHandler
The simplest callback. It prints chain, tool, and agent events to stdout, which makes it a good fit for development when you want to watch each step run.
from langchain_core.callbacks import StdOutCallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = ChatPromptTemplate.from_template("Answer this question: {question}")
parser = StrOutputParser()
# Pass the handler at runtime so it applies to every step of this call
chain = prompt | llm | parser
result = chain.invoke(
    {"question": "What is LangChain?"},
    config={"callbacks": [StdOutCallbackHandler()]},
)
In practice, I prefer verbose=True on the chain components directly during development. Note that this example uses the legacy LLMChain class, which newer LangChain releases deprecate in favor of LCEL pipelines:
from langchain.chains import LLMChain
from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
# verbose=True on the chain itself and on the LLM it wraps
chain = LLMChain(
    llm=OpenAI(temperature=0, verbose=True),
    prompt=PromptTemplate(input_variables=["q"], template="Answer: {q}"),
    verbose=True,
)
Callback 2: FileCallbackHandler
When you need persistent logs — audit trails, debugging sessions, production logs — write to a file.
from datetime import datetime
from langchain_core.callbacks import FileCallbackHandler
# Create a timestamped log file
log_filename = f"langchain_logs_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"
file_callback = FileCallbackHandler(log_filename)
chain = prompt | llm | parser
# All chain events written to file
result = chain.invoke(
    {"question": "Explain neural networks"},
    config={"callbacks": [file_callback]},
)
print(f"Logs written to: {log_filename}")
For structured logging that integrates with your existing log infrastructure:
import json
import logging
from datetime import datetime
from typing import Any, Dict, List
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult

class StructuredFileCallbackHandler(BaseCallbackHandler):
    """Writes structured JSON logs for each chain event."""

    def __init__(self, log_file: str):
        self.log_file = log_file
        # Dedicated logger with its own file handler, so the root logger is left alone
        self.logger = logging.getLogger(f"{__name__}.{log_file}")
        self.logger.setLevel(logging.INFO)
        self.logger.propagate = False
        if not self.logger.handlers:
            handler = logging.FileHandler(log_file)
            handler.setFormatter(logging.Formatter("%(message)s"))
            self.logger.addHandler(handler)

    def _log(self, event_type: str, data: dict):
        entry = {"timestamp": datetime.now().isoformat(), "event": event_type, **data}
        self.logger.info(json.dumps(entry))

    def on_llm_start(self, serialized: Dict, prompts: List[str], **kwargs):
        self._log("llm_start", {
            "model": (serialized or {}).get("kwargs", {}).get("model_name", "unknown"),
            "prompt_preview": prompts[0][:200] if prompts else "",
        })

    def on_llm_end(self, response: LLMResult, **kwargs):
        generations = response.generations
        output = generations[0][0].text if generations else ""
        token_usage = (response.llm_output or {}).get("token_usage") or {}
        self._log("llm_end", {
            "output_preview": output[:200],
            "total_tokens": token_usage.get("total_tokens", 0),
            "prompt_tokens": token_usage.get("prompt_tokens", 0),
            "completion_tokens": token_usage.get("completion_tokens", 0),
        })

    def on_chain_start(self, serialized: Dict, inputs: Any, **kwargs):
        # In LCEL pipelines, step inputs and outputs are not always dicts
        self._log("chain_start", {
            "chain_type": ((serialized or {}).get("id") or ["unknown"])[-1],
            "input_keys": list(inputs.keys()) if isinstance(inputs, dict) else [],
        })

    def on_chain_end(self, outputs: Any, **kwargs):
        keys = list(outputs.keys()) if isinstance(outputs, dict) else []
        self._log("chain_end", {"output_keys": keys})

    def on_chain_error(self, error: Exception, **kwargs):
        self._log("chain_error", {"error": str(error), "error_type": type(error).__name__})

    def on_tool_start(self, serialized: Dict, input_str: str, **kwargs):
        self._log("tool_start", {
            "tool": (serialized or {}).get("name", "unknown"),
            "input": input_str[:200],
        })

    def on_tool_end(self, output: Any, **kwargs):
        # Tool output may be a message object rather than a plain string
        self._log("tool_end", {"output_preview": str(output)[:200]})

# Use it
structured_logger = StructuredFileCallbackHandler("app_traces.jsonl")
result = chain.invoke(
    {"question": "What is LCEL?"},
    config={"callbacks": [structured_logger]},
)
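One payoff of writing one JSON object per line is that the log can be aggregated with a few lines of standard-library code. The sketch below is a hypothetical helper (summarize_trace_lines is not part of LangChain) and assumes each line carries an "event" field and, for llm_end entries, a "total_tokens" field, as the handler above writes them.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def summarize_trace_lines(lines: Iterable[str]) -> Dict[str, object]:
    """Count events and sum token usage from JSON Lines text."""
    events: Counter = Counter()
    total_tokens = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        events[entry.get("event", "unknown")] += 1
        if entry.get("event") == "llm_end":
            total_tokens += entry.get("total_tokens", 0)
    return {"events": dict(events), "total_tokens": total_tokens}


# Sample lines shaped like the handler's output (values are made up)
sample = [
    '{"event": "chain_start", "input_keys": ["question"]}',
    '{"event": "llm_start", "model": "gpt-4o-mini"}',
    '{"event": "llm_end", "total_tokens": 120}',
    '{"event": "llm_end", "total_tokens": 80}',
]
summary = summarize_trace_lines(sample)
print(summary)
```

To run it against a real log, pass an open file object, since iterating a file yields its lines: `with open("app_traces.jsonl") as f: summarize_trace_lines(f)`.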
Callback 3: LangSmith Tracing
LangSmith is the production-grade observability platform for LangChain. It gives you a visual trace explorer, latency analytics, cost tracking, and the ability to replay and debug specific runs.
pip install langsmith
import os
# Enable LangSmith tracing via environment variables
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"
os.environ["LANGCHAIN_PROJECT"] = "my-production-app"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com" # default
# That's it — all LangChain runs are now traced automatically
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = ChatPromptTemplate.from_template("Explain {concept}") | llm | StrOutputParser()
# This run will appear in LangSmith
result = chain.invoke({"concept": "embeddings"})
print(result)
For more control over what gets traced:
from langsmith import traceable
from langsmith.run_helpers import get_current_run_tree

# Tag specific functions as traceable
@traceable(name="document_qa", tags=["production", "qa"])
def answer_question(question: str, document_id: str) -> str:
    """This function will appear as a named node in LangSmith traces."""
    chain = ChatPromptTemplate.from_template("{question}") | llm | StrOutputParser()
    # Add metadata to the current run; run metadata is kept under extra["metadata"]
    run_tree = get_current_run_tree()
    if run_tree:
        run_tree.extra.setdefault("metadata", {})["document_id"] = document_id
    return chain.invoke({"question": question})

result = answer_question("What is RAG?", "doc_123")
Creating datasets and running evaluations in LangSmith:
from langsmith import Client
client = Client()
# Create a dataset for evaluation
dataset = client.create_dataset(
    dataset_name="QA Evaluation Set",
    description="Test cases for document Q&A system",
)
# Add examples
examples = [
    {"question": "What is LangChain?", "expected": "LangChain is a framework..."},
    {"question": "What is LCEL?", "expected": "LCEL is LangChain Expression Language..."},
]
client.create_examples(
    inputs=[{"question": e["question"]} for e in examples],
    outputs=[{"answer": e["expected"]} for e in examples],
    dataset_id=dataset.id,
)
print(f"Dataset created: {dataset.name}")
Callback 4: Custom Callback Class
When you need to integrate LangChain monitoring with your own systems — Datadog, Sentry, Prometheus, custom dashboards — build a custom callback handler.
import json
import time
from typing import Any, Dict, List
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult

class MetricsCallbackHandler(BaseCallbackHandler):
    """
    Custom callback that collects metrics for monitoring.
    Tracks latency, token usage, and error rates.
    """

    def __init__(self, service_name: str = "langchain-app"):
        self.service_name = service_name
        self.metrics = {
            "total_calls": 0,
            "total_tokens": 0,
            "total_cost_estimate": 0.0,
            "error_count": 0,
            "latencies": [],
        }
        self._start_times = {}

    def on_llm_start(
        self,
        serialized: Dict[str, Any],
        prompts: List[str],
        run_id: Any = None,
        **kwargs: Any,
    ) -> None:
        self._start_times[str(run_id)] = time.time()
        self.metrics["total_calls"] += 1

    def on_llm_end(
        self,
        response: LLMResult,
        run_id: Any = None,
        **kwargs: Any,
    ) -> None:
        run_id_str = str(run_id)
        if run_id_str in self._start_times:
            latency = time.time() - self._start_times.pop(run_id_str)
            self.metrics["latencies"].append(latency)
        # Extract token usage
        if response.llm_output:
            usage = response.llm_output.get("token_usage") or {}
            total_tokens = usage.get("total_tokens", 0)
            self.metrics["total_tokens"] += total_tokens
            # Rough cost estimate (gpt-4o-mini pricing, in USD per 1K tokens)
            prompt_tokens = usage.get("prompt_tokens", 0)
            completion_tokens = usage.get("completion_tokens", 0)
            cost = (prompt_tokens * 0.00015 + completion_tokens * 0.0006) / 1000
            self.metrics["total_cost_estimate"] += cost

    def on_llm_error(self, error: Exception, run_id: Any = None, **kwargs: Any) -> None:
        self.metrics["error_count"] += 1
        # Drop the start time so failed runs do not accumulate in memory
        self._start_times.pop(str(run_id), None)
        # Here you'd integrate with Sentry, PagerDuty, etc.
        print(f"[ALERT] LLM Error in {self.service_name}: {error}")

    def get_summary(self) -> dict:
        latencies = self.metrics["latencies"]
        avg_latency = sum(latencies) / len(latencies) if latencies else 0
        return {
            "service": self.service_name,
            "total_calls": self.metrics["total_calls"],
            "total_tokens": self.metrics["total_tokens"],
            "estimated_cost_usd": round(self.metrics["total_cost_estimate"], 4),
            "error_count": self.metrics["error_count"],
            "avg_latency_seconds": round(avg_latency, 3),
            "error_rate": self.metrics["error_count"] / max(self.metrics["total_calls"], 1),
        }

# Use across your application
metrics = MetricsCallbackHandler(service_name="document-qa-api")
llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[metrics])
chain = prompt | llm | StrOutputParser()
# Run multiple queries
questions = [
    "What is RAG?",
    "Explain LangChain LCEL",
    "What are output parsers?",
]
for q in questions:
    chain.invoke({"question": q})
print("\nMetrics Summary:")
print(json.dumps(metrics.get_summary(), indent=2))
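The summary above reports only the average latency, which can hide a slow tail. As a possible extension, here is a small sketch of a nearest-rank percentile over the collected latencies. The percentile helper and the sample values are illustrative assumptions, not part of the handler or of LangChain.

```python
import math
import statistics
from typing import List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile for pct in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Made-up latencies in seconds: four fast calls and one slow outlier
latencies = [0.2, 0.3, 0.25, 1.8, 0.4]

mean_latency = statistics.mean(latencies)
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)

print(f"mean={mean_latency:.2f}s p50={p50:.2f}s p95={p95:.2f}s")
```

With these samples the mean sits well above the median because of one slow call, which is the kind of signal worth alerting on separately from the average.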
Callback 5: StreamingStdOutCallbackHandler for Real-Time Output
When users expect to see tokens as they're generated (like ChatGPT does), you need streaming. Here's a proper streaming callback:
from langchain_core.callbacks import StreamingStdOutCallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
# Built-in streaming to stdout
streaming_llm = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)
chain = ChatPromptTemplate.from_template("Write a short poem about {topic}") | streaming_llm | StrOutputParser()
# Tokens print as they arrive
print("Generating poem: ")
result = chain.invoke({"topic": "machine learning"})
For a custom streaming callback with more control:
import queue
import threading
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult

class QueueCallbackHandler(BaseCallbackHandler):
    """
    Streams tokens into a queue for consumption by another thread.
    Perfect for FastAPI Server-Sent Events or WebSocket streaming.
    """

    def __init__(self):
        self.token_queue = queue.Queue()
        self.done = False

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        """Called for each new token during streaming."""
        self.token_queue.put(token)

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        """Signal that streaming is complete."""
        self.token_queue.put(None)  # Sentinel value
        self.done = True

    def on_llm_error(self, error: Exception, **kwargs) -> None:
        self.token_queue.put(f"[ERROR: {error}]")
        self.token_queue.put(None)

def run_chain_streaming(question: str):
    """Run chain in thread, consume tokens from main thread."""
    callback = QueueCallbackHandler()
    streaming_llm = ChatOpenAI(
        model="gpt-4o-mini",
        streaming=True,
        callbacks=[callback],
    )
    chain = ChatPromptTemplate.from_template("{question}") | streaming_llm | StrOutputParser()

    # Run chain in background thread
    def run():
        try:
            chain.invoke({"question": question})
        finally:
            # Release the consumer even if the chain fails before the LLM starts
            callback.token_queue.put(None)

    thread = threading.Thread(target=run)
    thread.start()
    # Consume tokens from queue
    full_response = ""
    while True:
        try:
            token = callback.token_queue.get(timeout=30)
        except queue.Empty:
            break  # no token for 30 seconds; stop waiting
        if token is None:
            break
        full_response += token
        print(token, end="", flush=True)
    print()  # newline
    thread.join()
    return full_response

response = run_chain_streaming("Explain the attention mechanism in transformers.")
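The producer/consumer handoff is easier to reason about, and to test without an API key, when the LLM is swapped for a fake token source. The following is a minimal standard-library sketch of the same sentinel pattern: fake_token_source and stream_tokens are hypothetical names, and the producer simply replays a fixed token list.

```python
import queue
import threading
from typing import Iterator, List

_DONE = object()  # sentinel marking the end of the stream


def fake_token_source(tokens: List[str], out: "queue.Queue") -> None:
    """Hypothetical producer standing in for a streaming LLM call."""
    try:
        for token in tokens:
            out.put(token)
    finally:
        # Always signal completion, even if the producer raises.
        out.put(_DONE)


def stream_tokens(tokens: List[str], timeout: float = 5.0) -> Iterator[str]:
    """Run the producer in a worker thread and yield tokens as they arrive."""
    q: "queue.Queue" = queue.Queue()
    worker = threading.Thread(target=fake_token_source, args=(tokens, q), daemon=True)
    worker.start()
    while True:
        item = q.get(timeout=timeout)  # raises queue.Empty if the producer stalls
        if item is _DONE:
            break
        yield item
    worker.join()


streamed = "".join(stream_tokens(["Atten", "tion ", "is ", "all ", "you ", "need"]))
print(streamed)
```

Wrapping the consumer in a generator is a design choice that tends to fit streaming responses well, since a web framework can iterate it and forward each token as it is yielded.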
Callback 6: Token Usage Callback
Cost control is a production concern. This callback tracks token usage and can alert or block when limits are approached.
from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = ChatPromptTemplate.from_template("Explain {topic} in detail.") | llm | StrOutputParser()
# Context manager for token tracking
with get_openai_callback() as cb:
    result1 = chain.invoke({"topic": "transformers"})
    result2 = chain.invoke({"topic": "attention mechanisms"})
    result3 = chain.invoke({"topic": "RLHF"})
print(f"Total tokens used: {cb.total_tokens}")
print(f"Prompt tokens: {cb.prompt_tokens}")
print(f"Completion tokens: {cb.completion_tokens}")
print(f"Total cost: ${cb.total_cost:.4f}")
For a budget-enforcing callback:
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult

class BudgetCallbackHandler(BaseCallbackHandler):
    """Tracks costs and raises an exception when budget is exceeded."""

    # By default LangChain logs exceptions raised inside handlers and carries on;
    # raise_error = True makes the budget error propagate to the caller.
    raise_error = True

    # Pricing per 1M tokens (gpt-4o-mini as of 2025)
    PROMPT_COST_PER_1M = 0.15
    COMPLETION_COST_PER_1M = 0.60

    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.total_cost = 0.0
        self.total_tokens = 0

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        if response.llm_output:
            usage = response.llm_output.get("token_usage") or {}
            prompt_tokens = usage.get("prompt_tokens", 0)
            completion_tokens = usage.get("completion_tokens", 0)
            cost = (
                prompt_tokens * self.PROMPT_COST_PER_1M / 1_000_000
                + completion_tokens * self.COMPLETION_COST_PER_1M / 1_000_000
            )
            self.total_cost += cost
            self.total_tokens += usage.get("total_tokens", 0)
        # The check runs after the call that crossed the limit has already been billed
        if self.total_cost > self.budget_usd:
            raise RuntimeError(
                f"Budget exceeded! Spent ${self.total_cost:.4f} of ${self.budget_usd} budget. "
                f"Total tokens used: {self.total_tokens}"
            )

    @property
    def remaining_budget(self) -> float:
        return max(0, self.budget_usd - self.total_cost)

budget = BudgetCallbackHandler(budget_usd=0.10)  # $0.10 limit
llm_with_budget = ChatOpenAI(model="gpt-4o-mini", callbacks=[budget])
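To get a feel for what a $0.10 budget buys, the cost formula can be worked through on its own. This sketch reuses the per-1M-token prices from the handler above, which should be treated as assumptions to check against current pricing. estimate_cost and calls_within_budget are hypothetical helper names, and the token counts are made up.

```python
import math

# Assumed prices in USD per 1M tokens, copied from the handler; verify before relying on them
PROMPT_COST_PER_1M = 0.15
COMPLETION_COST_PER_1M = 0.60


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD for a single call."""
    return (
        prompt_tokens * PROMPT_COST_PER_1M
        + completion_tokens * COMPLETION_COST_PER_1M
    ) / 1_000_000


def calls_within_budget(budget_usd: float, prompt_tokens: int, completion_tokens: int) -> int:
    """How many identical calls fit inside the budget."""
    per_call = estimate_cost(prompt_tokens, completion_tokens)
    return math.floor(budget_usd / per_call) if per_call > 0 else 0


# A call with 1,000 prompt tokens and 500 completion tokens:
#   1000 * 0.15 / 1e6 + 500 * 0.60 / 1e6 = 0.00015 + 0.00030 = 0.00045 USD
per_call = estimate_cost(1_000, 500)
affordable = calls_within_budget(0.10, 1_000, 500)
print(f"per call: ${per_call:.5f}, calls within $0.10: {affordable}")
```

Under these assumed prices, completion tokens cost four times as much as prompt tokens, so capping output length is usually the more effective lever for staying inside a budget.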
Callback 7: Async Callbacks
For high-throughput applications, async callbacks prevent the monitoring overhead from blocking your main application:
import asyncio
import json
from datetime import datetime
import aiofiles  # third-party: pip install aiofiles
from langchain_core.callbacks import AsyncCallbackHandler
from langchain_core.outputs import LLMResult

class AsyncLoggingCallbackHandler(AsyncCallbackHandler):
    """Non-blocking async logging callback."""

    def __init__(self, log_file: str):
        self.log_file = log_file

    async def on_llm_start(
        self,
        serialized: dict,
        prompts: list,
        **kwargs,
    ) -> None:
        await self._write_log({
            "event": "llm_start",
            "model": (serialized or {}).get("kwargs", {}).get("model_name"),
            "timestamp": datetime.now().isoformat(),
        })

    async def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        tokens = {}
        if response.llm_output:
            tokens = response.llm_output.get("token_usage") or {}
        await self._write_log({
            "event": "llm_end",
            "tokens": tokens,
            "timestamp": datetime.now().isoformat(),
        })

    async def _write_log(self, data: dict) -> None:
        async with aiofiles.open(self.log_file, "a") as f:
            await f.write(json.dumps(data) + "\n")

# Use with async chains
async def run_async():
    async_logger = AsyncLoggingCallbackHandler("async_logs.jsonl")
    llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[async_logger])
    chain = ChatPromptTemplate.from_template("{q}") | llm | StrOutputParser()
    # Run multiple queries concurrently
    tasks = [
        chain.ainvoke({"q": "What is LangChain?"}),
        chain.ainvoke({"q": "What is LCEL?"}),
        chain.ainvoke({"q": "What are callbacks?"}),
    ]
    results = await asyncio.gather(*tasks)
    return results

results = asyncio.run(run_async())
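If adding aiofiles as a dependency is not an option, a similar non-blocking write can be sketched with the standard library alone. The example below assumes Python 3.9+ for asyncio.to_thread. AsyncJsonlWriter is a hypothetical helper rather than a LangChain class, and the asyncio.Lock is there so that concurrent tasks append whole lines one at a time.

```python
import asyncio
import json
import os
import tempfile


class AsyncJsonlWriter:
    """Appends JSON lines without blocking the event loop (hypothetical helper)."""

    def __init__(self, path: str):
        self.path = path
        self._lock = asyncio.Lock()  # one writer at a time keeps lines intact

    def _append(self, line: str) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line + "\n")

    async def write(self, record: dict) -> None:
        line = json.dumps(record)
        async with self._lock:
            # Blocking file I/O runs in a worker thread, not on the event loop.
            await asyncio.to_thread(self._append, line)


async def demo(path: str) -> list:
    writer = AsyncJsonlWriter(path)
    # Three concurrent writers, similar to three chains finishing at once
    await asyncio.gather(
        *(writer.write({"event": "llm_end", "call": i}) for i in range(3))
    )
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]


with tempfile.TemporaryDirectory() as tmp:
    records = asyncio.run(demo(os.path.join(tmp, "async_logs.jsonl")))

print(len(records))
```

An async handler's _write_log method could delegate to a writer like this. The trade-off is a thread hop per write, which is usually acceptable for logging but worth measuring under real load.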
Comparison Table: LangChain Callback Options
| Callback | Setup effort | Use case | Performance overhead | LangSmith integration |
|---|---|---|---|---|
| StdOutCallbackHandler | Zero | Development debugging | None | No |
| FileCallbackHandler | Minimal | Audit logs, debug sessions | Low | No |
| LangSmith Tracing | Env vars | Production monitoring | Low (async) | Yes, native |
| Custom BaseCallbackHandler | Medium | Custom monitoring, metrics | Varies | Manual |
| StreamingStdOutCallbackHandler | Low | Real-time UI streaming | None | No |
| Token Usage (get_openai_callback) | None | Cost tracking | Minimal | No |
| AsyncCallbackHandler | Medium | High-throughput production | Minimal | Manual |
According to LangSmith's documentation, teams using LangSmith reduce production debugging time by 60% compared to log-only setups, primarily because trace visualization shows the complete chain execution at a glance.
The Deploy AI model to production guide covers how to structure callbacks in a deployed service. For the agent architecture these callbacks monitor, AI agent memory and planning shows what's happening inside the traces you'll be watching.
Also worth reading: AI research agent build shows a multi-tool agent where callback monitoring is essential — with multiple tools firing, knowing which step consumed tokens is critical for cost optimization.
Conclusion
Callbacks are not optional for production LangChain applications — they're your window into what your application is doing. Start with LangSmith tracing from day one (it's just two environment variables), add a token usage callback to stay aware of costs, and build a custom metrics handler when you need to integrate with your existing monitoring stack.
The pattern I use in every production project: LangSmith for traces, a custom MetricsCallbackHandler for Datadog/Prometheus, and a BudgetCallbackHandler to prevent runaway costs. Together these give you visibility, alertability, and financial guardrails.
For the full picture of LangChain in production, pair this with LangChain tutorial 2025 and OpenAI API integration for the OpenAI-specific monitoring hooks.
Frequently Asked Questions
Do LangChain callbacks slow down my application?
Synchronous callbacks add minimal overhead — typically under 5ms per callback event. Asynchronous callbacks (using async handlers) have near-zero overhead. The main cost is I/O: writing to files or sending traces to LangSmith. For production, use async callbacks and batch trace uploads where possible.
Is LangSmith required for LangChain in production?
No, LangSmith is optional but highly recommended for production debugging. Without it, you're working with console logs when something goes wrong. LangSmith gives you visual traces showing exactly what happened inside each chain invocation, which saves hours of debugging time.
How do I build a custom LangChain callback for my monitoring system?
Subclass BaseCallbackHandler and override the event methods you need — on_llm_start, on_llm_end, on_chain_start, on_chain_end, on_tool_start, on_tool_end. Return None from each handler. Instantiate your callback and pass it in the callbacks parameter when invoking chains.
AiTechWorlds Team
The AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.