7 LangChain Callbacks: Logging, Tracing, and Streaming (2026)

The hardest part of debugging a LangChain application isn't the code; it's not being able to see what's happening inside. The LLM gets called, chains execute, tools run, and you get a final answer. But if the answer is wrong, you have no idea which step went wrong, what the model actually received and returned, or how many tokens you burned getting there. That opacity is what callbacks solve.

I've worked on production LangChain systems where callbacks were the difference between shipping in a week and spending two weeks staring at unhelpful error messages. This guide covers all seven callback types you'll actually use, from the basic stdout logger to custom monitoring integrations and LangSmith tracing.

If you're building agents that you'll need to monitor, Build AI agent with LangChain sets up the agent you'll instrument here. For the LCEL chains where callbacks plug in naturally, see the LCEL complete guide.

How LangChain Callbacks Work

Every major LangChain event fires a callback. LLM calls, chain starts and ends, tool executions, retriever calls — all of these emit events that your callback handlers can listen to. The callback system is essentially an observer pattern built into the LangChain runtime.

Callbacks can be attached at three levels:

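Constructor callbacks: Always active for that component, but scoped to that object only (child components do not inherit them)
Runtime callbacks: Active only for that specific invocation, and propagated to every child component of the run
Global callbacks: Active for everything in the application (for example, LangSmith tracing switched on through environment variables)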

from langchain_openai import ChatOpenAI
from langchain_core.callbacks import StdOutCallbackHandler

# Constructor level — always active
llm = ChatOpenAI(
    model="gpt-4o-mini",
    callbacks=[StdOutCallbackHandler()]
)

# Runtime level — only for this invocation
result = llm.invoke("Hello", config={"callbacks": [StdOutCallbackHandler()]})

# Both work — choose based on whether you want permanent or one-off logging
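
To make the observer pattern and the constructor/runtime distinction concrete, here is a minimal plain-Python sketch. It does not use LangChain at all: MiniHandler, RecordingHandler, and MiniRuntime are hypothetical names invented for illustration, and the real runtime dispatches many more event types.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


class MiniHandler:
    """Hypothetical handler base: every event method is a no-op by default."""

    def on_start(self, name: str, payload: Dict[str, Any]) -> None:
        pass

    def on_end(self, name: str, result: Any) -> None:
        pass


class RecordingHandler(MiniHandler):
    """Observer that simply remembers which events it saw."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str]] = []

    def on_start(self, name: str, payload: Dict[str, Any]) -> None:
        self.events.append(("start", name))

    def on_end(self, name: str, result: Any) -> None:
        self.events.append(("end", name))


class MiniRuntime:
    """Hypothetical runtime that notifies handlers around each step."""

    def __init__(self, handlers: Sequence[MiniHandler] = ()) -> None:
        # "Constructor level": these handlers see every run.
        self.handlers = list(handlers)

    def run(
        self,
        name: str,
        fn: Callable[[Dict[str, Any]], Any],
        payload: Dict[str, Any],
        extra_handlers: Sequence[MiniHandler] = (),
    ) -> Any:
        # "Runtime level": extra_handlers apply to this call only.
        handlers = self.handlers + list(extra_handlers)
        for handler in handlers:
            handler.on_start(name, payload)
        result = fn(payload)
        for handler in handlers:
            handler.on_end(name, result)
        return result


always_on = RecordingHandler()
one_off = RecordingHandler()
runtime = MiniRuntime(handlers=[always_on])
runtime.run("upper", lambda p: p["text"].upper(), {"text": "hi"}, extra_handlers=[one_off])
runtime.run("lower", lambda p: p["text"].lower(), {"text": "HI"})
```

After both runs, always_on has recorded four events and one_off only the two from the first run, which is the same permanent versus one-off trade-off as the constructor and runtime callbacks above.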

Callback 1: StdOutCallbackHandler

The simplest callback. It prints chain, tool, and agent events to stdout, which is useful during development when you want to see what's happening.

from langchain_core.callbacks import StdOutCallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = ChatPromptTemplate.from_template("Answer this question: {question}")
parser = StrOutputParser()

# Build the chain; the handler is attached at invocation time below
chain = prompt | llm | parser

result = chain.invoke(
    {"question": "What is LangChain?"},
    config={"callbacks": [StdOutCallbackHandler()]}
)

In practice, I prefer verbose=True on the chain components directly during development:

# Note: LLMChain is a legacy class, deprecated in favor of LCEL chains (prompt | llm)
from langchain.chains import LLMChain
from langchain_openai import OpenAI
from langchain_core.prompts import PromptTemplate

# verbose=True on the chain itself
chain = LLMChain(
    llm=OpenAI(temperature=0, verbose=True),
    prompt=PromptTemplate(input_variables=["q"], template="Answer: {q}"),
    verbose=True
)

Callback 2: FileCallbackHandler

When you need persistent logs — audit trails, debugging sessions, production logs — write to a file.

from langchain_core.callbacks import FileCallbackHandler
from datetime import datetime

# Create a timestamped log file
log_filename = f"langchain_logs_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"

file_callback = FileCallbackHandler(log_filename)

chain = prompt | llm | parser

# All chain events written to file
result = chain.invoke(
    {"question": "Explain neural networks"},
    config={"callbacks": [file_callback]}
)

print(f"Logs written to: {log_filename}")

For structured logging that integrates with your existing log infrastructure:

import logging
import json
from datetime import datetime
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
from typing import Any, Dict, List

class StructuredFileCallbackHandler(BaseCallbackHandler):
    """Writes structured JSON logs for each chain event."""
    
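    def __init__(self, log_file: str):
        self.log_file = log_file
        # Use a dedicated logger with its own file handler. logging.basicConfig
        # reconfigures the root logger and silently does nothing if the
        # application has already configured logging.
        self.logger = logging.getLogger(f"{__name__}.{log_file}")
        self.logger.setLevel(logging.INFO)
        self.logger.propagate = False
        if not self.logger.handlers:
            handler = logging.FileHandler(log_file)
            handler.setFormatter(logging.Formatter("%(message)s"))
            self.logger.addHandler(handler)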
    
    def _log(self, event_type: str, data: dict):
        entry = {
            "timestamp": datetime.now().isoformat(),
            "event": event_type,
            **data
        }
        self.logger.info(json.dumps(entry))
    
    def on_llm_start(self, serialized: Dict, prompts: List[str], **kwargs):
        self._log("llm_start", {
            "model": serialized.get("kwargs", {}).get("model_name", "unknown"),
            "prompt_preview": prompts[0][:200] if prompts else ""
        })
    
    def on_llm_end(self, response: LLMResult, **kwargs):
        generations = response.generations
        output = generations[0][0].text if generations else ""
        token_usage = response.llm_output.get("token_usage", {}) if response.llm_output else {}
        
        self._log("llm_end", {
            "output_preview": output[:200],
            "total_tokens": token_usage.get("total_tokens", 0),
            "prompt_tokens": token_usage.get("prompt_tokens", 0),
            "completion_tokens": token_usage.get("completion_tokens", 0)
        })
    
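    def on_chain_start(self, serialized: Dict, inputs: Any, **kwargs):
        # In LCEL chains the input may be a dict, a string, or a message object,
        # and serialized may be None, so guard both before reading them.
        self._log("chain_start", {
            "chain_type": (serialized or {}).get("id", ["unknown"])[-1],
            "input_keys": list(inputs.keys()) if isinstance(inputs, dict) else type(inputs).__name__
        })
    
    def on_chain_end(self, outputs: Any, **kwargs):
        # Outputs are not always a dict either (StrOutputParser returns a str).
        self._log("chain_end", {
            "output_keys": list(outputs.keys()) if isinstance(outputs, dict) else type(outputs).__name__
        })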
    
    def on_chain_error(self, error: Exception, **kwargs):
        self._log("chain_error", {"error": str(error), "error_type": type(error).__name__})
    
    def on_tool_start(self, serialized: Dict, input_str: str, **kwargs):
        self._log("tool_start", {
            "tool": serialized.get("name", "unknown"),
            "input": input_str[:200]
        })
    
    def on_tool_end(self, output: Any, **kwargs):
        # Tool output may be a string or a message object, so stringify first.
        self._log("tool_end", {"output_preview": str(output)[:200]})

# Use it
structured_logger = StructuredFileCallbackHandler("app_traces.jsonl")
result = chain.invoke(
    {"question": "What is LCEL?"},
    config={"callbacks": [structured_logger]}
)
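
Once events are written one JSON object per line, the file can be aggregated with a few lines of standard-library Python. The sketch below assumes the field names used by the handler above ("event" and "total_tokens"); summarize_trace_file is a hypothetical helper, not part of LangChain.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict


def summarize_trace_file(path: str) -> Dict[str, int]:
    """Count events and sum tokens in a JSON-lines trace file."""
    event_counts: Counter = Counter()
    total_tokens = 0
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        entry = json.loads(line)
        event = entry.get("event", "unknown")
        event_counts[event] += 1
        if event == "llm_end":
            total_tokens += int(entry.get("total_tokens", 0))
    summary = {f"count_{name}": count for name, count in event_counts.items()}
    summary["total_tokens"] = total_tokens
    return summary
```

Calling summarize_trace_file("app_traces.jsonl") after a few runs gives a quick per-event count and a token total without opening a dashboard.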

Callback 3: LangSmith Tracing

LangSmith is the production-grade observability platform for LangChain. It gives you a visual trace explorer, latency analytics, cost tracking, and the ability to replay and debug specific runs.

pip install langsmith

import os

# Enable LangSmith tracing via environment variables
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"
os.environ["LANGCHAIN_PROJECT"] = "my-production-app"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"  # default

# That's it — all LangChain runs are now traced automatically
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = ChatPromptTemplate.from_template("Explain {concept}") | llm | StrOutputParser()

# This run will appear in LangSmith
result = chain.invoke({"concept": "embeddings"})
print(result)
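
Because tracing is driven entirely by environment variables, a missing or misspelled variable fails silently: the chain runs, but nothing shows up in LangSmith. A small startup check can catch that early. This is a sketch under the assumption that LANGCHAIN_TRACING_V2 and LANGCHAIN_API_KEY are the two variables that must be set; missing_tracing_vars is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional


def missing_tracing_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of tracing variables that are unset or unusable."""
    env = os.environ if env is None else env
    problems: List[str] = []
    # Tracing is only switched on when the flag is literally "true".
    if env.get("LANGCHAIN_TRACING_V2", "").strip().lower() != "true":
        problems.append("LANGCHAIN_TRACING_V2")
    # An empty API key is as bad as a missing one.
    if not env.get("LANGCHAIN_API_KEY", "").strip():
        problems.append("LANGCHAIN_API_KEY")
    return problems


problems = missing_tracing_vars()
if problems:
    print(f"LangSmith tracing is not fully configured, check: {', '.join(problems)}")
```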

For more control over what gets traced:

from langsmith import Client, traceable
from langsmith.run_helpers import get_current_run_tree

client = Client()

# Tag specific functions as traceable
@traceable(name="document_qa", tags=["production", "qa"])
def answer_question(question: str, document_id: str) -> str:
    """This function will appear as a named node in LangSmith traces."""
    chain = ChatPromptTemplate.from_template("{question}") | llm | StrOutputParser()
    
    # Add metadata to the current run (stored under the run's "metadata" key)
    run_tree = get_current_run_tree()
    if run_tree:
        run_tree.add_metadata({"document_id": document_id})
    
    return chain.invoke({"question": question})

result = answer_question("What is RAG?", "doc_123")

Creating a dataset in LangSmith to run evaluations against:

from langsmith import Client

client = Client()

# Create a dataset for evaluation
dataset = client.create_dataset(
    dataset_name="QA Evaluation Set",
    description="Test cases for document Q&A system"
)

# Add examples
examples = [
    {"question": "What is LangChain?", "expected": "LangChain is a framework..."},
    {"question": "What is LCEL?", "expected": "LCEL is LangChain Expression Language..."}
]

client.create_examples(
    inputs=[{"question": e["question"]} for e in examples],
    outputs=[{"answer": e["expected"]} for e in examples],
    dataset_id=dataset.id
)

print(f"Dataset created: {dataset.name}")

Callback 4: Custom Callback Class

When you need to integrate LangChain monitoring with your own systems — Datadog, Sentry, Prometheus, custom dashboards — build a custom callback handler.

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
from typing import Any, Dict, List, Optional, Union
import time

class MetricsCallbackHandler(BaseCallbackHandler):
    """
    Custom callback that collects metrics for monitoring.
    Tracks latency, token usage, and error rates.
    """
    
    def __init__(self, service_name: str = "langchain-app"):
        self.service_name = service_name
        self.metrics = {
            "total_calls": 0,
            "total_tokens": 0,
            "total_cost_estimate": 0.0,
            "error_count": 0,
            "latencies": []
        }
        self._start_times = {}
    
    def on_llm_start(
        self, 
        serialized: Dict[str, Any], 
        prompts: List[str], 
        run_id: Any = None,
        **kwargs: Any
    ) -> None:
        self._start_times[str(run_id)] = time.time()
        self.metrics["total_calls"] += 1
    
    def on_llm_end(
        self, 
        response: LLMResult, 
        run_id: Any = None,
        **kwargs: Any
    ) -> None:
        run_id_str = str(run_id)
        if run_id_str in self._start_times:
            latency = time.time() - self._start_times.pop(run_id_str)
            self.metrics["latencies"].append(latency)
        
        # Extract token usage
        if response.llm_output:
            usage = response.llm_output.get("token_usage", {})
            total_tokens = usage.get("total_tokens", 0)
            self.metrics["total_tokens"] += total_tokens
            
            # Rough cost estimate. The constants are USD per 1K tokens
            # (gpt-4o-mini: $0.15 per 1M prompt tokens, $0.60 per 1M completion tokens)
            prompt_tokens = usage.get("prompt_tokens", 0)
            completion_tokens = usage.get("completion_tokens", 0)
            cost = (prompt_tokens * 0.00015 + completion_tokens * 0.0006) / 1000
            self.metrics["total_cost_estimate"] += cost
    
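    def on_llm_error(self, error: Exception, run_id: Any = None, **kwargs: Any) -> None:
        self.metrics["error_count"] += 1
        # Drop the start time so failed runs do not accumulate in memory
        self._start_times.pop(str(run_id), None)
        # Here you'd integrate with Sentry, PagerDuty, etc.
        print(f"[ALERT] LLM Error in {self.service_name}: {error}")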
    
    def get_summary(self) -> dict:
        latencies = self.metrics["latencies"]
        avg_latency = sum(latencies) / len(latencies) if latencies else 0
        
        return {
            "service": self.service_name,
            "total_calls": self.metrics["total_calls"],
            "total_tokens": self.metrics["total_tokens"],
            "estimated_cost_usd": round(self.metrics["total_cost_estimate"], 4),
            "error_count": self.metrics["error_count"],
            "avg_latency_seconds": round(avg_latency, 3),
            "error_rate": self.metrics["error_count"] / max(self.metrics["total_calls"], 1)
        }

# Use across your application
metrics = MetricsCallbackHandler(service_name="document-qa-api")

llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[metrics])
chain = prompt | llm | StrOutputParser()

# Run multiple queries
questions = [
    "What is RAG?",
    "Explain LangChain LCEL",
    "What are output parsers?"
]

for q in questions:
    chain.invoke({"question": q})

print("\nMetrics Summary:")
import json
print(json.dumps(metrics.get_summary(), indent=2))
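
The summary above reports only the average latency, which hides slow outliers. A percentile view is often more useful for alerting. Here is a minimal sketch using the nearest-rank method on the collected latencies list; latency_percentiles is a hypothetical helper, and a real monitoring backend would normally compute these for you.

```python
from typing import Dict, List, Sequence


def latency_percentiles(
    latencies: List[float], percentiles: Sequence[int] = (50, 95, 99)
) -> Dict[str, float]:
    """Nearest-rank percentiles over a list of latencies in seconds."""
    if not latencies:
        return {f"p{p}": 0.0 for p in percentiles}
    ordered = sorted(latencies)
    n = len(ordered)
    result: Dict[str, float] = {}
    for p in percentiles:
        # Integer ceiling of p% of n, clamped to at least rank 1.
        rank = max(1, (p * n + 99) // 100)
        result[f"p{p}"] = ordered[rank - 1]
    return result


print(latency_percentiles([0.2, 0.1, 0.4, 0.3]))
# {'p50': 0.2, 'p95': 0.4, 'p99': 0.4}
```

With the handler above, latency_percentiles(metrics.metrics["latencies"]) could be added to the dictionary returned by get_summary.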

Callback 5: StreamingCallbackHandler for Real-Time Output

When users expect to see tokens as they're generated (like ChatGPT does), you need streaming. Here's a proper streaming callback:

from langchain_core.callbacks import StreamingStdOutCallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Built-in streaming to stdout
streaming_llm = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)

chain = ChatPromptTemplate.from_template("Write a short poem about {topic}") | streaming_llm | StrOutputParser()

# Tokens print as they arrive
print("Generating poem: ")
result = chain.invoke({"topic": "machine learning"})

For a custom streaming callback with more control:

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
import queue
import threading

class QueueCallbackHandler(BaseCallbackHandler):
    """
    Streams tokens into a queue for consumption by another thread.
    Perfect for FastAPI Server-Sent Events or WebSocket streaming.
    """
    
    def __init__(self):
        self.token_queue = queue.Queue()
        self.done = False
    
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        """Called for each new token during streaming."""
        self.token_queue.put(token)
    
    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        """Signal that streaming is complete."""
        self.token_queue.put(None)  # Sentinel value
        self.done = True
    
    def on_llm_error(self, error: Exception, **kwargs) -> None:
        self.token_queue.put(f"[ERROR: {error}]")
        self.token_queue.put(None)

def run_chain_streaming(question: str):
    """Run chain in thread, consume tokens from main thread."""
    callback = QueueCallbackHandler()
    
    streaming_llm = ChatOpenAI(
        model="gpt-4o-mini",
        streaming=True,
        callbacks=[callback]
    )
    
    chain = ChatPromptTemplate.from_template("{question}") | streaming_llm | StrOutputParser()
    
    # Run chain in background thread
    def run():
        chain.invoke({"question": question})
    
    thread = threading.Thread(target=run)
    thread.start()
    
    # Consume tokens from queue
    full_response = ""
    while True:
        token = callback.token_queue.get(timeout=30)
        if token is None:
            break
        full_response += token
        print(token, end="", flush=True)
    
    print()  # newline
    thread.join()
    return full_response

response = run_chain_streaming("Explain the attention mechanism in transformers.")
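
One thing to watch in the consumer loop above: token_queue.get(timeout=30) raises queue.Empty if the producer stalls, and the sentinel never arrives if the background thread dies before on_llm_end fires. The sketch below isolates the consuming side and handles both cases. It uses a fake producer instead of a real model so it runs anywhere; drain_tokens and fake_producer are hypothetical names.

```python
import queue
import threading
from typing import List, Tuple

SENTINEL = None  # same end-of-stream marker as the handler above


def drain_tokens(token_queue: queue.Queue, timeout: float = 30.0) -> Tuple[str, bool]:
    """Collect tokens until the sentinel arrives or the queue goes quiet.

    Returns (text, completed); completed is False when we gave up waiting.
    """
    parts: List[str] = []
    while True:
        try:
            token = token_queue.get(timeout=timeout)
        except queue.Empty:
            return "".join(parts), False  # producer stalled or died
        if token is SENTINEL:
            return "".join(parts), True
        parts.append(token)


def fake_producer(token_queue: queue.Queue, tokens: List[str]) -> None:
    """Stand-in for the chain thread: emits tokens, then the sentinel."""
    for token in tokens:
        token_queue.put(token)
    token_queue.put(SENTINEL)


q: queue.Queue = queue.Queue()
worker = threading.Thread(target=fake_producer, args=(q, ["Hel", "lo"]))
worker.start()
text, completed = drain_tokens(q, timeout=1.0)
worker.join()
print(text, completed)  # Hello True
```

Returning a completed flag lets a web endpoint decide whether to close the stream normally or report a partial response.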

Callback 6: Token Usage Callback

Cost control is a production concern. This callback tracks token usage and can alert or block when limits are approached.

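from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser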

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = ChatPromptTemplate.from_template("Explain {topic} in detail.") | llm | StrOutputParser()

# Context manager for token tracking
with get_openai_callback() as cb:
    result1 = chain.invoke({"topic": "transformers"})
    result2 = chain.invoke({"topic": "attention mechanisms"})
    result3 = chain.invoke({"topic": "RLHF"})
    
    print(f"Total tokens used: {cb.total_tokens}")
    print(f"Prompt tokens: {cb.prompt_tokens}")
    print(f"Completion tokens: {cb.completion_tokens}")
    print(f"Total cost: ${cb.total_cost:.4f}")
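
A reported cost figure is, conceptually, token counts multiplied by per-token prices. The worked example below makes that arithmetic explicit, assuming the gpt-4o-mini prices quoted in this article ($0.15 per 1M prompt tokens, $0.60 per 1M completion tokens). Prices change, so treat the constants as placeholders; estimate_cost_usd is a hypothetical helper.

```python
PROMPT_USD_PER_1M = 0.15       # assumed price, taken from this article
COMPLETION_USD_PER_1M = 0.60   # assumed price, taken from this article


def estimate_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    prompt_price_per_1m: float = PROMPT_USD_PER_1M,
    completion_price_per_1m: float = COMPLETION_USD_PER_1M,
) -> float:
    """Estimate the USD cost of one call from its token counts."""
    return (
        prompt_tokens * prompt_price_per_1m
        + completion_tokens * completion_price_per_1m
    ) / 1_000_000


# 1,200 prompt tokens and 350 completion tokens:
# (1200 * 0.15 + 350 * 0.60) / 1,000,000 = 390 / 1,000,000
print(f"{estimate_cost_usd(1200, 350):.5f}")  # 0.00039
```

The per-1K form used in the metrics handler, (prompt_tokens * 0.00015 + completion_tokens * 0.0006) / 1000, is the same calculation with the prices expressed per thousand tokens.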

For a budget-enforcing callback:

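from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult

class BudgetCallbackHandler(BaseCallbackHandler):
    """Tracks costs and raises an exception when budget is exceeded."""
    
    # By default LangChain logs exceptions raised inside handlers and keeps going.
    # raise_error = True makes the RuntimeError below propagate to the caller.
    raise_error = True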
    
    # Pricing per 1M tokens (gpt-4o-mini as of 2025)
    PROMPT_COST_PER_1M = 0.15
    COMPLETION_COST_PER_1M = 0.60
    
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.total_cost = 0.0
        self.total_tokens = 0
    
    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        if response.llm_output:
            usage = response.llm_output.get("token_usage", {})
            prompt_tokens = usage.get("prompt_tokens", 0)
            completion_tokens = usage.get("completion_tokens", 0)
            
            cost = (
                prompt_tokens * self.PROMPT_COST_PER_1M / 1_000_000 +
                completion_tokens * self.COMPLETION_COST_PER_1M / 1_000_000
            )
            
            self.total_cost += cost
            self.total_tokens += usage.get("total_tokens", 0)
            
            if self.total_cost > self.budget_usd:
                raise RuntimeError(
                    f"Budget exceeded! Spent ${self.total_cost:.4f} of ${self.budget_usd} budget. "
                    f"Total tokens used: {self.total_tokens}"
                )
    
    @property
    def remaining_budget(self) -> float:
        return max(0, self.budget_usd - self.total_cost)

budget = BudgetCallbackHandler(budget_usd=0.10)  # $0.10 limit
llm_with_budget = ChatOpenAI(model="gpt-4o-mini", callbacks=[budget])

Callback 7: Async Callbacks

For high-throughput applications, async callbacks prevent the monitoring overhead from blocking your main application:

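from langchain_core.callbacks import AsyncCallbackHandler
from langchain_core.outputs import LLMResult
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from datetime import datetime
import asyncio
import aiofiles  # third-party: pip install aiofiles
import json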

class AsyncLoggingCallbackHandler(AsyncCallbackHandler):
    """Non-blocking async logging callback."""
    
    def __init__(self, log_file: str):
        self.log_file = log_file
    
    async def on_llm_start(
        self, 
        serialized: dict, 
        prompts: list, 
        **kwargs
    ) -> None:
        await self._write_log({
            "event": "llm_start",
            "model": serialized.get("kwargs", {}).get("model_name"),
            "timestamp": datetime.now().isoformat()
        })
    
    async def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        tokens = {}
        if response.llm_output:
            tokens = response.llm_output.get("token_usage", {})
        
        await self._write_log({
            "event": "llm_end",
            "tokens": tokens,
            "timestamp": datetime.now().isoformat()
        })
    
    async def _write_log(self, data: dict) -> None:
        async with aiofiles.open(self.log_file, "a") as f:
            await f.write(json.dumps(data) + "\n")

# Use with async chains
async def run_async():
    async_logger = AsyncLoggingCallbackHandler("async_logs.jsonl")
    llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[async_logger])
    chain = ChatPromptTemplate.from_template("{q}") | llm | StrOutputParser()
    
    # Run multiple queries concurrently
    tasks = [
        chain.ainvoke({"q": "What is LangChain?"}),
        chain.ainvoke({"q": "What is LCEL?"}),
        chain.ainvoke({"q": "What are callbacks?"}),
    ]
    
    results = await asyncio.gather(*tasks)
    return results

results = asyncio.run(run_async())
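
When several coroutines log to the same file concurrently, it helps to serialize the writes so lines cannot interleave. The sketch below shows one way to do that with only the standard library: an asyncio.Lock around a blocking append that runs in a worker thread via asyncio.to_thread (Python 3.9+). LockedJsonlWriter is a hypothetical class, offered as an alternative to aiofiles rather than a replacement for the handler above.

```python
import asyncio
import json
import os
import tempfile
from typing import Any, Dict


class LockedJsonlWriter:
    """Appends JSON lines to a file without blocking the event loop."""

    def __init__(self, path: str) -> None:
        # Create instances inside a running coroutine so the lock binds
        # to the right event loop on older Python versions.
        self.path = path
        self._lock = asyncio.Lock()

    def _append(self, line: str) -> None:
        # Plain blocking write; runs in a worker thread.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line)

    async def write(self, record: Dict[str, Any]) -> None:
        line = json.dumps(record) + "\n"
        async with self._lock:  # one writer at a time
            await asyncio.to_thread(self._append, line)


async def demo(path: str, count: int = 5) -> None:
    writer = LockedJsonlWriter(path)
    await asyncio.gather(
        *(writer.write({"event": "llm_end", "n": i}) for i in range(count))
    )


demo_path = os.path.join(tempfile.mkdtemp(), "async_logs.jsonl")
asyncio.run(demo(demo_path))
```

Inside an AsyncCallbackHandler, the _write_log method could delegate to a writer like this one.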

Comparison Table: LangChain Callback Options

Callback	Setup	Use Case	Performance	LangSmith Integration
StdOutCallbackHandler	Zero	Development debugging	No overhead	No
FileCallbackHandler	Minimal	Audit logs, debug sessions	Low	No
LangSmith Tracing	Env vars	Production monitoring	Low (async)	Yes, native
Custom BaseCallbackHandler	Medium	Custom monitoring, metrics	Varies	Manual
StreamingCallbackHandler	Low	Real-time UI streaming	None	No
Token Usage (get_openai_callback)	None	Cost tracking	Minimal	No
AsyncCallbackHandler	Medium	High-throughput production	Minimal	Manual

According to LangSmith's documentation, teams using LangSmith reduce production debugging time by 60% compared to log-only setups, primarily because trace visualization shows the complete chain execution at a glance.

The Deploy AI model to production guide covers how to structure callbacks in a deployed service. For the agent architecture these callbacks monitor, AI agent memory and planning shows what's happening inside the traces you'll be watching.

Also worth reading: AI research agent build shows a multi-tool agent where callback monitoring is essential — with multiple tools firing, knowing which step consumed tokens is critical for cost optimization.

Conclusion

Callbacks are not optional for production LangChain applications — they're your window into what your application is doing. Start with LangSmith tracing from day one (it's just two environment variables), add a token usage callback to stay aware of costs, and build a custom metrics handler when you need to integrate with your existing monitoring stack.

The pattern I use in every production project: LangSmith for traces, a custom MetricsCallbackHandler for Datadog/Prometheus, and a BudgetCallbackHandler to prevent runaway costs. Together these give you visibility, alertability, and financial guardrails.

For the full picture of LangChain in production, pair this with LangChain tutorial 2025 and OpenAI API integration for the OpenAI-specific monitoring hooks.

Frequently Asked Questions

Do LangChain callbacks slow down my application?

Synchronous callbacks add minimal overhead — typically under 5ms per callback event. Asynchronous callbacks (using async handlers) have near-zero overhead. The main cost is I/O: writing to files or sending traces to LangSmith. For production, use async callbacks and batch trace uploads where possible.
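
Overhead figures like these depend heavily on what a handler does, so it is worth measuring your own. The sketch below times a zero-argument callable with time.perf_counter; mean_call_overhead_ms is a hypothetical helper, and the lambda stands in for whatever work your handler performs per event.

```python
import time
from typing import Callable


def mean_call_overhead_ms(handler_call: Callable[[], None], repeats: int = 1000) -> float:
    """Average wall-clock time of one handler call, in milliseconds."""
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    start = time.perf_counter()
    for _ in range(repeats):
        handler_call()
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / repeats


# Example: time a handler body that only formats a small log entry.
overhead = mean_call_overhead_ms(lambda: str({"event": "llm_end", "tokens": 42}))
print(f"{overhead:.4f} ms per call")
```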

Is LangSmith required for LangChain in production?

No, LangSmith is optional but highly recommended for production debugging. Without it, you're working with console logs when something goes wrong. LangSmith gives you visual traces showing exactly what happened inside each chain invocation, which saves hours of debugging time.

How do I build a custom LangChain callback for my monitoring system?

Subclass BaseCallbackHandler and override the event methods you need — on_llm_start, on_llm_end, on_chain_start, on_chain_end, on_tool_start, on_tool_end. Return None from each handler. Instantiate your callback and pass it in the callbacks parameter when invoking chains.

How LangChain Callbacks Work

Callbacks can be attached at three levels:

Constructor callbacks: Always active for that component
Runtime callbacks: Active only for that specific invocation
Global callbacks: Active for everything in the application

from langchain_openai import ChatOpenAI
from langchain_core.callbacks import StdOutCallbackHandler

# Constructor level — always active
llm = ChatOpenAI(
    model="gpt-4o-mini",
    callbacks=[StdOutCallbackHandler()]
)

# Runtime level — only for this invocation
result = llm.invoke("Hello", config={"callbacks": [StdOutCallbackHandler()]})

# Both work — choose based on whether you want permanent or one-off logging

Callback 1: StdOutCallbackHandler

The simplest callback. Prints everything to stdout. Perfect for development when you want to see exactly what's happening.

from langchain_core.callbacks import StdOutCallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
prompt = ChatPromptTemplate.from_template("Answer this question: {question}")
parser = StrOutputParser()

# With verbose=True, you get detailed output
chain = prompt | llm | parser

result = chain.invoke(
    {"question": "What is LangChain?"},
    config={"callbacks": [StdOutCallbackHandler()]}
)

In practice, I prefer verbose=True on the chain components directly during development:

from langchain.chains import LLMChain
from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate

# verbose=True on the chain itself
chain = LLMChain(
    llm=OpenAI(temperature=0, verbose=True),
    prompt=PromptTemplate(input_variables=["q"], template="Answer: {q}"),
    verbose=True
)

Callback 2: FileCallbackHandler

When you need persistent logs — audit trails, debugging sessions, production logs — write to a file.

from langchain_community.callbacks import FileCallbackHandler
import logging
from datetime import datetime

# Create a timestamped log file
log_filename = f"langchain_logs_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"

file_callback = FileCallbackHandler(log_filename)

chain = prompt | llm | parser

# All chain events written to file
result = chain.invoke(
    {"question": "Explain neural networks"},
    config={"callbacks": [file_callback]}
)

print(f"Logs written to: {log_filename}")

For structured logging that integrates with your existing log infrastructure:

import logging
import json
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
from typing import Any, Dict, List, Union
from uuid import UUID

class StructuredFileCallbackHandler(BaseCallbackHandler):
    """Writes structured JSON logs for each chain event."""
    
    def __init__(self, log_file: str):
        self.log_file = log_file
        logging.basicConfig(
            filename=log_file,
            level=logging.INFO,
            format='%(message)s'
        )
        self.logger = logging.getLogger(__name__)
    
    def _log(self, event_type: str, data: dict):
        entry = {
            "timestamp": datetime.now().isoformat(),
            "event": event_type,
            **data
        }
        self.logger.info(json.dumps(entry))
    
    def on_llm_start(self, serialized: Dict, prompts: List[str], **kwargs):
        self._log("llm_start", {
            "model": serialized.get("kwargs", {}).get("model_name", "unknown"),
            "prompt_preview": prompts[0][:200] if prompts else ""
        })
    
    def on_llm_end(self, response: LLMResult, **kwargs):
        generations = response.generations
        output = generations[0][0].text if generations else ""
        token_usage = response.llm_output.get("token_usage", {}) if response.llm_output else {}
        
        self._log("llm_end", {
            "output_preview": output[:200],
            "total_tokens": token_usage.get("total_tokens", 0),
            "prompt_tokens": token_usage.get("prompt_tokens", 0),
            "completion_tokens": token_usage.get("completion_tokens", 0)
        })
    
    def on_chain_start(self, serialized: Dict, inputs: Dict, **kwargs):
        self._log("chain_start", {
            "chain_type": serialized.get("id", ["unknown"])[-1],
            "input_keys": list(inputs.keys())
        })
    
    def on_chain_end(self, outputs: Dict, **kwargs):
        self._log("chain_end", {"output_keys": list(outputs.keys())})
    
    def on_chain_error(self, error: Exception, **kwargs):
        self._log("chain_error", {"error": str(error), "error_type": type(error).__name__})
    
    def on_tool_start(self, serialized: Dict, input_str: str, **kwargs):
        self._log("tool_start", {
            "tool": serialized.get("name", "unknown"),
            "input": input_str[:200]
        })
    
    def on_tool_end(self, output: str, **kwargs):
        self._log("tool_end", {"output_preview": output[:200]})

# Use it
structured_logger = StructuredFileCallbackHandler("app_traces.jsonl")
result = chain.invoke(
    {"question": "What is LCEL?"},
    config={"callbacks": [structured_logger]}
)

Callback 3: LangSmith Tracing

LangSmith is the production-grade observability platform for LangChain. It gives you a visual trace explorer, latency analytics, cost tracking, and the ability to replay and debug specific runs.

pip install langsmith

import os

# Enable LangSmith tracing via environment variables
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"
os.environ["LANGCHAIN_PROJECT"] = "my-production-app"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"  # default

# That's it — all LangChain runs are now traced automatically
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = ChatPromptTemplate.from_template("Explain {concept}") | llm | StrOutputParser()

# This run will appear in LangSmith
result = chain.invoke({"concept": "embeddings"})
print(result)

For more control over what gets traced:

from langsmith import Client, traceable
from langsmith.run_helpers import get_current_run_tree

client = Client()

# Tag specific functions as traceable
@traceable(name="document_qa", tags=["production", "qa"])
def answer_question(question: str, document_id: str) -> str:
    """This function will appear as a named node in LangSmith traces."""
    chain = ChatPromptTemplate.from_template("{question}") | llm | StrOutputParser()
    
    # Add metadata to the current run
    run_tree = get_current_run_tree()
    if run_tree:
        run_tree.extra["document_id"] = document_id
    
    return chain.invoke({"question": question})

result = answer_question("What is RAG?", "doc_123")

Creating datasets and running evaluations in LangSmith:

from langsmith import Client

client = Client()

# Create a dataset for evaluation
dataset = client.create_dataset(
    dataset_name="QA Evaluation Set",
    description="Test cases for document Q&A system"
)

# Add examples
examples = [
    {"question": "What is LangChain?", "expected": "LangChain is a framework..."},
    {"question": "What is LCEL?", "expected": "LCEL is LangChain Expression Language..."}
]

client.create_examples(
    inputs=[{"question": e["question"]} for e in examples],
    outputs=[{"answer": e["expected"]} for e in examples],
    dataset_id=dataset.id
)

print(f"Dataset created: {dataset.name}")

Callback 4: Custom Callback Class

When you need to integrate LangChain monitoring with your own systems — Datadog, Sentry, Prometheus, custom dashboards — build a custom callback handler.

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
from typing import Any, Dict, List, Optional, Union
import time

class MetricsCallbackHandler(BaseCallbackHandler):
    """
    Custom callback that collects metrics for monitoring.
    Tracks latency, token usage, and error rates.
    """
    
    def __init__(self, service_name: str = "langchain-app"):
        self.service_name = service_name
        self.metrics = {
            "total_calls": 0,
            "total_tokens": 0,
            "total_cost_estimate": 0.0,
            "error_count": 0,
            "latencies": []
        }
        self._start_times = {}
    
    def on_llm_start(
        self, 
        serialized: Dict[str, Any], 
        prompts: List[str], 
        run_id: Any = None,
        **kwargs: Any
    ) -> None:
        self._start_times[str(run_id)] = time.time()
        self.metrics["total_calls"] += 1
    
    def on_llm_end(
        self, 
        response: LLMResult, 
        run_id: Any = None,
        **kwargs: Any
    ) -> None:
        run_id_str = str(run_id)
        if run_id_str in self._start_times:
            latency = time.time() - self._start_times.pop(run_id_str)
            self.metrics["latencies"].append(latency)
        
        # Extract token usage
        if response.llm_output:
            usage = response.llm_output.get("token_usage", {})
            total_tokens = usage.get("total_tokens", 0)
            self.metrics["total_tokens"] += total_tokens
            
            # Rough cost estimate (gpt-4o-mini pricing)
            prompt_tokens = usage.get("prompt_tokens", 0)
            completion_tokens = usage.get("completion_tokens", 0)
            cost = (prompt_tokens * 0.00015 + completion_tokens * 0.0006) / 1000
            self.metrics["total_cost_estimate"] += cost
    
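    def on_llm_error(self, error: BaseException, run_id: Any = None, **kwargs: Any) -> None:
        # Drop the start time so failed runs don't accumulate in _start_times
        self._start_times.pop(str(run_id), None)
        self.metrics["error_count"] += 1
        # Here you'd integrate with Sentry, PagerDuty, etc.
        print(f"[ALERT] LLM Error in {self.service_name}: {error}")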
    
    def get_summary(self) -> dict:
        latencies = self.metrics["latencies"]
        avg_latency = sum(latencies) / len(latencies) if latencies else 0
        
        return {
            "service": self.service_name,
            "total_calls": self.metrics["total_calls"],
            "total_tokens": self.metrics["total_tokens"],
            "estimated_cost_usd": round(self.metrics["total_cost_estimate"], 4),
            "error_count": self.metrics["error_count"],
            "avg_latency_seconds": round(avg_latency, 3),
            "error_rate": self.metrics["error_count"] / max(self.metrics["total_calls"], 1)
        }

# Use across your application
metrics = MetricsCallbackHandler(service_name="document-qa-api")

llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[metrics])
chain = prompt | llm | StrOutputParser()

# Run multiple queries
questions = [
    "What is RAG?",
    "Explain LangChain LCEL",
    "What are output parsers?"
]

for q in questions:
    chain.invoke({"question": q})

print("\nMetrics Summary:")
import json
print(json.dumps(metrics.get_summary(), indent=2))
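
To get a summary like this into one of the monitoring systems mentioned above, it usually has to be reshaped into that system's format. The sketch below renders a get_summary()-style dict as Prometheus text exposition lines. The function name to_prometheus_text, the langchain metric prefix, and the example values are assumptions for illustration, and label escaping is simplified to double quotes only. A real integration would more likely use an official client library.

```python
from typing import Dict


def to_prometheus_text(summary: Dict[str, object], prefix: str = "langchain") -> str:
    """Render a get_summary()-style dict as Prometheus text exposition lines."""
    # The service name becomes a label; every numeric field becomes one sample.
    service = str(summary.get("service", "unknown")).replace('"', '\\"')
    lines = []
    for key, value in summary.items():
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue  # skip non-numeric fields such as "service"
        lines.append(f'{prefix}_{key}{{service="{service}"}} {value}')
    return "\n".join(lines) + "\n"


# Hypothetical values in the same shape that get_summary() returns
example_summary = {
    "service": "document-qa-api",
    "total_calls": 3,
    "total_tokens": 1200,
    "estimated_cost_usd": 0.0004,
    "error_count": 0,
    "avg_latency_seconds": 1.25,
    "error_rate": 0.0,
}
print(to_prometheus_text(example_summary))
```

Keeping the handler itself format-agnostic and converting at the edge means the same MetricsCallbackHandler can feed more than one backend.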

Callback 5: StreamingCallbackHandler for Real-Time Output

When users expect to see tokens as they're generated (like ChatGPT does), you need streaming. The built-in StreamingStdOutCallbackHandler prints each token to stdout as it arrives:

from langchain_core.callbacks import StreamingStdOutCallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Built-in streaming to stdout
streaming_llm = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)

chain = ChatPromptTemplate.from_template("Write a short poem about {topic}") | streaming_llm | StrOutputParser()

# Tokens print as they arrive
print("Generating poem: ")
result = chain.invoke({"topic": "machine learning"})

For a custom streaming callback with more control:

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
import queue
import threading

class QueueCallbackHandler(BaseCallbackHandler):
    """
    Streams tokens into a queue for consumption by another thread.
    Perfect for FastAPI Server-Sent Events or WebSocket streaming.
    """
    
    def __init__(self):
        self.token_queue = queue.Queue()
        self.done = False
    
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        """Called for each new token during streaming."""
        self.token_queue.put(token)
    
    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        """Signal that streaming is complete."""
        self.token_queue.put(None)  # Sentinel value
        self.done = True
    
    def on_llm_error(self, error: Exception, **kwargs) -> None:
        self.token_queue.put(f"[ERROR: {error}]")
        self.token_queue.put(None)

def run_chain_streaming(question: str):
    """Run chain in thread, consume tokens from main thread."""
    callback = QueueCallbackHandler()
    
    streaming_llm = ChatOpenAI(
        model="gpt-4o-mini",
        streaming=True,
        callbacks=[callback]
    )
    
    chain = ChatPromptTemplate.from_template("{question}") | streaming_llm | StrOutputParser()
    
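    # Run chain in background thread. The finally block always enqueues the
    # sentinel, so the consumer loop cannot hang if the chain fails early.
    def run():
        try:
            chain.invoke({"question": question})
        finally:
            callback.token_queue.put(None)
    
    thread = threading.Thread(target=run)
    thread.start()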
    
    # Consume tokens from queue
    full_response = ""
    while True:
        token = callback.token_queue.get(timeout=30)
        if token is None:
            break
        full_response += token
        print(token, end="", flush=True)
    
    print()  # newline
    thread.join()
    return full_response

response = run_chain_streaming("Explain the attention mechanism in transformers.")
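
The handler's docstring mentions Server-Sent Events, so here is a minimal sketch of the consuming side: a generator that drains the token queue and yields SSE-formatted frames. The function name sse_events, the done and error event names, and the fake producer thread are assumptions for illustration. In the setup above, QueueCallbackHandler would fill the queue instead.

```python
import queue
import threading
from typing import Iterator


def sse_events(token_queue: "queue.Queue", timeout: float = 30.0) -> Iterator[str]:
    """Yield one SSE frame per token until the None sentinel arrives."""
    while True:
        try:
            token = token_queue.get(timeout=timeout)
        except queue.Empty:
            # Producer stalled: tell the client and stop instead of hanging.
            yield "event: error\ndata: timeout\n\n"
            return
        if token is None:
            yield "event: done\ndata: [DONE]\n\n"
            return
        # A data field cannot hold a raw newline, so split multi-line tokens.
        payload = "\n".join(f"data: {part}" for part in token.split("\n"))
        yield payload + "\n\n"


# Stand-in producer; in the article's setup the callback handler fills the queue.
demo_queue: "queue.Queue" = queue.Queue()


def fake_producer() -> None:
    for tok in ["Hello", ",", " world"]:
        demo_queue.put(tok)
    demo_queue.put(None)  # same sentinel the handler uses


producer = threading.Thread(target=fake_producer)
producer.start()
frames = list(sse_events(demo_queue))
producer.join()
print(frames)
```

In a FastAPI endpoint, a generator like this would typically be wrapped in a StreamingResponse with media_type="text/event-stream", with the chain running in a background thread as in run_chain_streaming.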

Callback 6: Token Usage Callback

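Cost control is a production concern. The built-in get_openai_callback context manager tracks token usage and estimated cost for every OpenAI call made inside its block; a custom handler can go further and alert or block when a limit is exceeded.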

from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = ChatPromptTemplate.from_template("Explain {topic} in detail.") | llm | StrOutputParser()

# Context manager for token tracking
with get_openai_callback() as cb:
    result1 = chain.invoke({"topic": "transformers"})
    result2 = chain.invoke({"topic": "attention mechanisms"})
    result3 = chain.invoke({"topic": "RLHF"})
    
    print(f"Total tokens used: {cb.total_tokens}")
    print(f"Prompt tokens: {cb.prompt_tokens}")
    print(f"Completion tokens: {cb.completion_tokens}")
    print(f"Total cost: ${cb.total_cost:.4f}")

For a budget-enforcing callback:

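from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult

class BudgetCallbackHandler(BaseCallbackHandler):
    """Tracks costs and raises an exception when budget is exceeded."""
    
    # Without this, LangChain logs handler exceptions instead of propagating them
    raise_error = True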
    
    # Pricing per 1M tokens (gpt-4o-mini as of 2025)
    PROMPT_COST_PER_1M = 0.15
    COMPLETION_COST_PER_1M = 0.60
    
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.total_cost = 0.0
        self.total_tokens = 0
    
    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        if response.llm_output:
            usage = response.llm_output.get("token_usage", {})
            prompt_tokens = usage.get("prompt_tokens", 0)
            completion_tokens = usage.get("completion_tokens", 0)
            
            cost = (
                prompt_tokens * self.PROMPT_COST_PER_1M / 1_000_000 +
                completion_tokens * self.COMPLETION_COST_PER_1M / 1_000_000
            )
            
            self.total_cost += cost
            self.total_tokens += usage.get("total_tokens", 0)
            
            if self.total_cost > self.budget_usd:
                raise RuntimeError(
                    f"Budget exceeded! Spent ${self.total_cost:.4f} of ${self.budget_usd} budget. "
                    f"Total tokens used: {self.total_tokens}"
                )
    
    @property
    def remaining_budget(self) -> float:
        return max(0, self.budget_usd - self.total_cost)

budget = BudgetCallbackHandler(budget_usd=0.10)  # $0.10 limit
llm_with_budget = ChatOpenAI(model="gpt-4o-mini", callbacks=[budget])
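
To get a feel for what a budget like this buys, the cost formula can be run on its own. The sketch below reuses the two per-1M prices from the handler and estimates how many calls of a given size fit in a budget. The helper names call_cost and calls_within_budget and the 500/300 token figures are assumptions for illustration, not measurements.

```python
PROMPT_COST_PER_1M = 0.15      # USD per 1M prompt tokens (figure used in the handler)
COMPLETION_COST_PER_1M = 0.60  # USD per 1M completion tokens


def call_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD of a single call, using the same formula as the handler."""
    return (
        prompt_tokens * PROMPT_COST_PER_1M
        + completion_tokens * COMPLETION_COST_PER_1M
    ) / 1_000_000


def calls_within_budget(budget_usd: float, prompt_tokens: int, completion_tokens: int) -> int:
    """How many identically sized calls fit in the budget (rounded down)."""
    per_call = call_cost(prompt_tokens, completion_tokens)
    return int(budget_usd // per_call) if per_call > 0 else 0


# Hypothetical call size: 500 prompt tokens and 300 completion tokens
per_call = call_cost(500, 300)
print(f"Cost per call: ${per_call:.6f}")
print(f"Calls within $0.10: {calls_within_budget(0.10, 500, 300)}")
```

Because the handler checks the budget in on_llm_end, the call that crosses the limit has already been paid for by the time the exception is raised; the limit stops further spending rather than capping it exactly.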

Callback 7: Async Callbacks

For high-throughput applications, async callbacks prevent the monitoring overhead from blocking your main application:

from langchain_core.callbacks import AsyncCallbackHandler
from langchain_core.outputs import LLMResult
from datetime import datetime
import asyncio
import aiofiles
import json

class AsyncLoggingCallbackHandler(AsyncCallbackHandler):
    """Non-blocking async logging callback."""
    
    def __init__(self, log_file: str):
        self.log_file = log_file
    
    async def on_llm_start(
        self, 
        serialized: dict, 
        prompts: list, 
        **kwargs
    ) -> None:
        await self._write_log({
            "event": "llm_start",
            "model": serialized.get("kwargs", {}).get("model_name"),
            "timestamp": datetime.now().isoformat()
        })
    
    async def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        tokens = {}
        if response.llm_output:
            tokens = response.llm_output.get("token_usage", {})
        
        await self._write_log({
            "event": "llm_end",
            "tokens": tokens,
            "timestamp": datetime.now().isoformat()
        })
    
    async def _write_log(self, data: dict) -> None:
        async with aiofiles.open(self.log_file, "a") as f:
            await f.write(json.dumps(data) + "\n")

# Use with async chains
async def run_async():
    async_logger = AsyncLoggingCallbackHandler("async_logs.jsonl")
    llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[async_logger])
    chain = ChatPromptTemplate.from_template("{q}") | llm | StrOutputParser()
    
    # Run multiple queries concurrently
    tasks = [
        chain.ainvoke({"q": "What is LangChain?"}),
        chain.ainvoke({"q": "What is LCEL?"}),
        chain.ainvoke({"q": "What are callbacks?"}),
    ]
    
    results = await asyncio.gather(*tasks)
    return results

results = asyncio.run(run_async())
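
The handler above opens the log file on every event. Under heavy concurrency, one alternative is to push events onto an in-memory queue and let a single background task do the writing. The sketch below shows that pattern with the standard library only. BufferedJsonlWriter and demo are hypothetical names, asyncio.to_thread requires Python 3.9+, and wiring log() into the handler's on_llm_start and on_llm_end methods is left out.

```python
import asyncio
import json
import os
import tempfile


class BufferedJsonlWriter:
    """Hypothetical helper: queue events in memory, write them from one task."""

    def __init__(self, path: str):
        self.path = path
        self._queue: asyncio.Queue = asyncio.Queue()

    async def log(self, event: dict) -> None:
        # Cheap for the caller: no file I/O happens here.
        await self._queue.put(event)

    async def run(self) -> None:
        # Single consumer, so lines from concurrent runs never interleave.
        while True:
            event = await self._queue.get()
            if event is None:  # sentinel enqueued by close()
                return
            line = json.dumps(event) + "\n"
            # The blocking write runs in a worker thread, off the event loop.
            await asyncio.to_thread(self._append, line)

    def _append(self, line: str) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line)

    async def close(self) -> None:
        await self._queue.put(None)


async def demo(path: str) -> int:
    writer = BufferedJsonlWriter(path)
    consumer = asyncio.create_task(writer.run())
    # Three concurrent "callbacks" logging at once
    await asyncio.gather(*(writer.log({"event": "llm_end", "n": i}) for i in range(3)))
    await writer.close()
    await consumer  # wait until everything queued has been written
    with open(path, encoding="utf-8") as f:
        return len(f.readlines())


log_path = os.path.join(tempfile.mkdtemp(), "async_logs.jsonl")
lines_written = asyncio.run(demo(log_path))
print(lines_written)
```

The trade-off is that events still in the queue are lost if the process dies before they are written, so this suits metrics and debug logs better than audit trails.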

Comparison Table: LangChain Callback Options

Callback	Setup Effort	Use Case	Performance Overhead	LangSmith Integration
StdOutCallbackHandler	None	Development debugging	None	No
FileCallbackHandler	Minimal	Audit logs, debug sessions	Low	No
LangSmith Tracing	Env vars	Production monitoring	Low (async)	Yes, native
Custom BaseCallbackHandler	Medium	Custom monitoring, metrics	Varies	Manual
StreamingStdOutCallbackHandler	Low	Real-time UI streaming	None	No
Token Usage (get_openai_callback)	None	Cost tracking	Minimal	No
AsyncCallbackHandler	Medium	High-throughput production	Minimal	Manual

Conclusion

For the full picture of LangChain in production, pair this with LangChain tutorial 2025 and OpenAI API integration for the OpenAI-specific monitoring hooks.

Frequently Asked Questions

Do LangChain callbacks slow down my application?

Is LangSmith required for LangChain in production?

How do I build a custom LangChain callback for my monitoring system?


AiTechWorlds Team

