AI Tips Prompting Python AI Tools Web Dev ChatGPT LLM Agent Dev Reviews Notes Free Books

AiTechWorlds

cost monitoring dashboard showing AI token usage — LangChain token tracking callbacks

7 LangChain Callbacks for Token Usage and Cost Tracking

⚡ Quick Answer

Track OpenAI API spend with LangChain callbacks — get_openai_callback, custom cost trackers, per-chain breakdowns, budget alerts, and monthly cost estimation.

AiTechWorlds Team May 31, 2026 13 min read

#LangChain #callbacks #token usage #cost monitoring #OpenAI

📚Part of the Langchain guide — explore all Langchain articles→

Share:Facebook Twitter/X LinkedIn Telegram WhatsApp

📱

Get more content like this on Telegram!

Daily AI tips, notes & resources — free

Join Free →

Token costs have a way of sneaking up on you. A RAG pipeline looks cheap in testing — a few cents per query — and then you deploy it, the volume ramps up, and your end-of-month bill arrives. Or you build an autonomous agent with a loop, forget to add a termination condition, and watch it make 200 API calls before you notice.

LangChain's callback system gives you the hooks to track every token that flows through your chains. This guide covers seven approaches: from the one-liner built-in tracker to production-grade systems with per-user cost attribution, budget alerts, and database logging.

Understanding the Callback System

Before the code: LangChain callbacks fire at specific lifecycle events. The ones relevant to cost tracking are:

on_llm_start — fires with the prompt before the API call
on_llm_end — fires with the response including token usage
on_chain_start / on_chain_end — fires for each chain step
on_tool_start / on_tool_end — fires when agents use tools

You can attach callbacks at three levels:

Global — on the ChatOpenAI instance (affects all calls from that LLM)
Chain — passed to .invoke() as {"callbacks": [...]}
Request — RunnableConfig passed at invocation time

Understanding which level is correct for your use case saves a lot of debugging.

Callback 1: get_openai_callback (Built-in, Zero Setup)

For quick cost checks during development, this is the fastest approach:

from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini")

prompt = ChatPromptTemplate.from_template(
    "Write a brief analysis of {topic} in under 100 words."
)
chain = prompt | llm | StrOutputParser()

with get_openai_callback() as cb:
    result = chain.invoke({"topic": "serverless computing"})
    print("Response:", result[:150])

# Token counts and cost from the API response
print(f"\nToken Usage:")
print(f"  Prompt tokens:     {cb.prompt_tokens:,}")
print(f"  Completion tokens: {cb.completion_tokens:,}")
print(f"  Total tokens:      {cb.total_tokens:,}")
print(f"  Total cost:        ${cb.total_cost:.6f}")

The context manager accumulates totals across all calls within the block:

with get_openai_callback() as cb:
    # Multiple calls — all tracked together
    result1 = chain.invoke({"topic": "machine learning"})
    result2 = chain.invoke({"topic": "deep learning"})
    result3 = chain.invoke({"topic": "reinforcement learning"})

print(f"3 calls total: {cb.total_tokens:,} tokens = ${cb.total_cost:.6f}")
print(f"Average per call: {cb.total_tokens / 3:.0f} tokens")

Limitation: It only works with OpenAI and Azure OpenAI models. For other providers, you need the approaches below.

Callback 2: Custom Per-Model Cost Tracking Callback

A custom callback gives you cost tracking for any LLM provider and full control over what you log:

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
from typing import Any, Dict, List, Optional, Union
import time
import json

# Pricing table — update when providers change pricing
MODEL_PRICING = {
    "gpt-4o": {
        "prompt": 0.0025 / 1000,      # $2.50 per 1M tokens
        "completion": 0.010 / 1000    # $10 per 1M tokens
    },
    "gpt-4o-mini": {
        "prompt": 0.00015 / 1000,     # $0.15 per 1M tokens
        "completion": 0.0006 / 1000   # $0.60 per 1M tokens
    },
    "gpt-3.5-turbo": {
        "prompt": 0.0005 / 1000,      # $0.50 per 1M tokens
        "completion": 0.0015 / 1000   # $1.50 per 1M tokens
    },
    "text-embedding-3-small": {
        "prompt": 0.00002 / 1000,     # $0.02 per 1M tokens
        "completion": 0.0
    },
    "text-embedding-3-large": {
        "prompt": 0.00013 / 1000,     # $0.13 per 1M tokens
        "completion": 0.0
    },
    "claude-3-5-sonnet": {
        "prompt": 0.003 / 1000,       # $3 per 1M tokens
        "completion": 0.015 / 1000    # $15 per 1M tokens
    }
}

class CostTrackingCallback(BaseCallbackHandler):
    """Track token usage and compute cost across all LLM calls."""

    def __init__(self, name: str = "default"):
        super().__init__()
        self.name = name
        self.calls = []
        self.total_prompt_tokens = 0
        self.total_completion_tokens = 0
        self.total_cost = 0.0
        self._start_times = {}

    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs
    ) -> None:
        run_id = kwargs.get("run_id", "unknown")
        self._start_times[str(run_id)] = time.time()

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        run_id = str(kwargs.get("run_id", "unknown"))
        elapsed = time.time() - self._start_times.pop(run_id, time.time())

        # Extract model name from response metadata
        model = "unknown"
        if response.llm_output and "model_name" in response.llm_output:
            model = response.llm_output["model_name"]

        # Extract token usage
        usage = {}
        if response.llm_output and "token_usage" in response.llm_output:
            usage = response.llm_output["token_usage"]
        
        prompt_tokens = usage.get("prompt_tokens", 0)
        completion_tokens = usage.get("completion_tokens", 0)
        total_tokens = usage.get("total_tokens", prompt_tokens + completion_tokens)

        # Compute cost
        pricing = MODEL_PRICING.get(model, {"prompt": 0.0, "completion": 0.0})
        call_cost = (
            prompt_tokens * pricing["prompt"] +
            completion_tokens * pricing["completion"]
        )

        # Record this call
        call_record = {
            "model": model,
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": total_tokens,
            "cost": call_cost,
            "latency_seconds": round(elapsed, 3),
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        }
        self.calls.append(call_record)

        # Update totals
        self.total_prompt_tokens += prompt_tokens
        self.total_completion_tokens += completion_tokens
        self.total_cost += call_cost

    @property
    def summary(self) -> dict:
        return {
            "tracker_name": self.name,
            "total_calls": len(self.calls),
            "total_prompt_tokens": self.total_prompt_tokens,
            "total_completion_tokens": self.total_completion_tokens,
            "total_tokens": self.total_prompt_tokens + self.total_completion_tokens,
            "total_cost_usd": round(self.total_cost, 6),
            "avg_cost_per_call": round(self.total_cost / len(self.calls), 6) if self.calls else 0.0
        }

    def reset(self):
        self.calls.clear()
        self.total_prompt_tokens = 0
        self.total_completion_tokens = 0
        self.total_cost = 0.0

Using the callback:

tracker = CostTrackingCallback(name="content_pipeline")

llm_with_tracker = ChatOpenAI(
    model="gpt-4o-mini",
    callbacks=[tracker]
)

chain = (
    ChatPromptTemplate.from_template("Explain {concept} in simple terms.")
    | llm_with_tracker
    | StrOutputParser()
)

for concept in ["neural networks", "transformers", "reinforcement learning"]:
    result = chain.invoke({"concept": concept})
    print(f"{concept}: {result[:80]}...")

print("\nCost Summary:")
print(json.dumps(tracker.summary, indent=2))

Callback 3: Per-Request Cost Attribution

In multi-user applications, you need to know which user incurred which costs:

from langchain_core.runnables import RunnableConfig
from typing import Dict
import threading

class PerUserCostTracker(BaseCallbackHandler):
    """Thread-safe per-user cost tracking."""

    def __init__(self):
        super().__init__()
        self._lock = threading.Lock()
        self._user_costs: Dict[str, dict] = {}
        self._call_contexts = {}

    def on_llm_start(self, serialized, prompts, **kwargs):
        run_id = str(kwargs.get("run_id", ""))
        # User ID comes from metadata in RunnableConfig
        user_id = kwargs.get("metadata", {}).get("user_id", "anonymous")
        self._call_contexts[run_id] = {"user_id": user_id, "start": time.time()}

    def on_llm_end(self, response: LLMResult, **kwargs):
        run_id = str(kwargs.get("run_id", ""))
        context = self._call_contexts.pop(run_id, {})
        user_id = context.get("user_id", "anonymous")

        usage = {}
        if response.llm_output:
            usage = response.llm_output.get("token_usage", {})

        model = response.llm_output.get("model_name", "unknown") if response.llm_output else "unknown"
        pricing = MODEL_PRICING.get(model, {"prompt": 0.0, "completion": 0.0})

        call_cost = (
            usage.get("prompt_tokens", 0) * pricing["prompt"] +
            usage.get("completion_tokens", 0) * pricing["completion"]
        )

        with self._lock:
            if user_id not in self._user_costs:
                self._user_costs[user_id] = {"calls": 0, "tokens": 0, "cost": 0.0}
            self._user_costs[user_id]["calls"] += 1
            self._user_costs[user_id]["tokens"] += usage.get("total_tokens", 0)
            self._user_costs[user_id]["cost"] += call_cost

    def get_user_cost(self, user_id: str) -> dict:
        with self._lock:
            return self._user_costs.get(user_id, {"calls": 0, "tokens": 0, "cost": 0.0})

    def get_all_user_costs(self) -> dict:
        with self._lock:
            return dict(self._user_costs)

# Attach to LLM globally
user_tracker = PerUserCostTracker()
tracked_llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[user_tracker])
chain = ChatPromptTemplate.from_template("Answer: {q}") | tracked_llm | StrOutputParser()

# Simulate multiple users
for user_id, question in [
    ("user_alice", "What is machine learning?"),
    ("user_bob", "Explain neural networks in detail with several examples please"),
    ("user_alice", "What is deep learning?"),
    ("user_charlie", "Brief overview of AI"),
]:
    config = RunnableConfig(metadata={"user_id": user_id})
    chain.invoke({"q": question}, config=config)

print("Per-user costs:")
for uid, costs in user_tracker.get_all_user_costs().items():
    print(f"  {uid}: {costs['calls']} calls, {costs['tokens']:,} tokens, ${costs['cost']:.6f}")

Callback 4: Budget Alert Callback

Stop an agent before it blows through a cost limit:

class BudgetAlertCallback(BaseCallbackHandler):
    """Raise an exception when spending exceeds a budget threshold."""

    def __init__(self, budget_usd: float, alert_at_percent: float = 0.8):
        super().__init__()
        self.budget = budget_usd
        self.alert_threshold = budget_usd * alert_at_percent
        self.spent = 0.0
        self.alerts_sent = []

    def on_llm_end(self, response: LLMResult, **kwargs):
        if not response.llm_output:
            return

        usage = response.llm_output.get("token_usage", {})
        model = response.llm_output.get("model_name", "gpt-4o-mini")
        pricing = MODEL_PRICING.get(model, {"prompt": 0.0001 / 1000, "completion": 0.0003 / 1000})

        call_cost = (
            usage.get("prompt_tokens", 0) * pricing["prompt"] +
            usage.get("completion_tokens", 0) * pricing["completion"]
        )
        self.spent += call_cost

        # Alert at threshold
        if self.spent >= self.alert_threshold and not self.alerts_sent:
            alert_msg = (
                f"BUDGET ALERT: Spent ${self.spent:.4f} of ${self.budget:.2f} budget "
                f"({self.spent/self.budget*100:.1f}%)"
            )
            self.alerts_sent.append(alert_msg)
            print(f"WARNING: {alert_msg}")

        # Hard stop at budget
        if self.spent >= self.budget:
            raise RuntimeError(
                f"Budget exceeded: spent ${self.spent:.4f} against ${self.budget:.2f} limit. "
                "Halting execution."
            )

# Usage with a tight budget for testing
budget_tracker = BudgetAlertCallback(budget_usd=0.01, alert_at_percent=0.5)

try:
    expensive_llm = ChatOpenAI(model="gpt-4o", callbacks=[budget_tracker])
    chain = ChatPromptTemplate.from_template("{q}") | expensive_llm | StrOutputParser()
    
    for question in ["Explain AI", "Explain ML", "Explain DL", "Explain RL", "Explain NLP"]:
        result = chain.invoke({"q": question})
        print(f"Q: {question} | Spent so far: ${budget_tracker.spent:.6f}")

except RuntimeError as e:
    print(f"Stopped: {e}")

Callback 5: Structured Logging to Database

For production monitoring, push cost events to a database:

import sqlite3
from dataclasses import dataclass
import threading

class DatabaseCostLogger(BaseCallbackHandler):
    """Log token usage to SQLite for analysis and billing."""

    def __init__(self, db_path: str = "llm_costs.db"):
        super().__init__()
        self.db_path = db_path
        self._local = threading.local()
        self._setup_db()
        self._pending_starts = {}

    def _get_conn(self):
        if not hasattr(self._local, "conn"):
            self._local.conn = sqlite3.connect(self.db_path)
        return self._local.conn

    def _setup_db(self):
        conn = sqlite3.connect(self.db_path)
        conn.execute("""
            CREATE TABLE IF NOT EXISTS llm_calls (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp TEXT NOT NULL,
                model TEXT,
                prompt_tokens INTEGER,
                completion_tokens INTEGER,
                total_tokens INTEGER,
                cost_usd REAL,
                latency_seconds REAL,
                user_id TEXT,
                chain_name TEXT,
                session_id TEXT
            )
        """)
        conn.commit()
        conn.close()

    def on_llm_start(self, serialized, prompts, **kwargs):
        run_id = str(kwargs.get("run_id", ""))
        self._pending_starts[run_id] = {
            "start_time": time.time(),
            "user_id": kwargs.get("metadata", {}).get("user_id", "system"),
            "chain_name": kwargs.get("metadata", {}).get("chain_name", "unknown"),
            "session_id": kwargs.get("metadata", {}).get("session_id", ""),
        }

    def on_llm_end(self, response: LLMResult, **kwargs):
        run_id = str(kwargs.get("run_id", ""))
        pending = self._pending_starts.pop(run_id, {})
        
        elapsed = time.time() - pending.get("start_time", time.time())
        
        usage = {}
        model = "unknown"
        if response.llm_output:
            usage = response.llm_output.get("token_usage", {})
            model = response.llm_output.get("model_name", "unknown")
        
        pricing = MODEL_PRICING.get(model, {"prompt": 0.0, "completion": 0.0})
        cost = (
            usage.get("prompt_tokens", 0) * pricing["prompt"] +
            usage.get("completion_tokens", 0) * pricing["completion"]
        )
        
        conn = self._get_conn()
        conn.execute("""
            INSERT INTO llm_calls 
            (timestamp, model, prompt_tokens, completion_tokens, total_tokens,
             cost_usd, latency_seconds, user_id, chain_name, session_id)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            model,
            usage.get("prompt_tokens", 0),
            usage.get("completion_tokens", 0),
            usage.get("total_tokens", 0),
            round(cost, 8),
            round(elapsed, 3),
            pending.get("user_id", "system"),
            pending.get("chain_name", "unknown"),
            pending.get("session_id", "")
        ))
        conn.commit()

    def get_daily_report(self) -> dict:
        conn = self._get_conn()
        today = time.strftime("%Y-%m-%d")
        row = conn.execute("""
            SELECT 
                COUNT(*) as calls,
                SUM(total_tokens) as tokens,
                SUM(cost_usd) as total_cost,
                AVG(latency_seconds) as avg_latency
            FROM llm_calls 
            WHERE timestamp LIKE ?
        """, (f"{today}%",)).fetchone()
        
        return {
            "date": today,
            "calls": row[0] or 0,
            "total_tokens": row[1] or 0,
            "total_cost_usd": round(row[2] or 0, 6),
            "avg_latency_seconds": round(row[3] or 0, 3)
        }

Callback 6: Chain-Level Cost Breakdown

When you have a complex multi-step chain, knowing which step costs the most is essential:

from langchain_core.callbacks import BaseCallbackHandler
from collections import defaultdict

class ChainCostBreakdown(BaseCallbackHandler):
    """Track costs per chain step for pipeline optimization."""

    def __init__(self):
        super().__init__()
        self.step_costs = defaultdict(lambda: {"calls": 0, "tokens": 0, "cost": 0.0})
        self._active_chains = {}

    def on_chain_start(self, serialized, inputs, **kwargs):
        run_id = str(kwargs.get("run_id", ""))
        chain_name = serialized.get("name", serialized.get("id", ["unknown"])[-1])
        self._active_chains[run_id] = chain_name

    def on_llm_end(self, response: LLMResult, **kwargs):
        parent_run_id = str(kwargs.get("parent_run_id", ""))
        chain_name = self._active_chains.get(parent_run_id, "root_llm")
        
        usage = {}
        model = "unknown"
        if response.llm_output:
            usage = response.llm_output.get("token_usage", {})
            model = response.llm_output.get("model_name", "unknown")
        
        pricing = MODEL_PRICING.get(model, {"prompt": 0.0001 / 1000, "completion": 0.0003 / 1000})
        cost = (
            usage.get("prompt_tokens", 0) * pricing["prompt"] +
            usage.get("completion_tokens", 0) * pricing["completion"]
        )
        
        self.step_costs[chain_name]["calls"] += 1
        self.step_costs[chain_name]["tokens"] += usage.get("total_tokens", 0)
        self.step_costs[chain_name]["cost"] += cost

    def print_breakdown(self):
        print("\nChain Cost Breakdown:")
        print(f"{'Step':<30} {'Calls':>6} {'Tokens':>10} {'Cost':>12}")
        print("-" * 62)
        
        total_cost = sum(v["cost"] for v in self.step_costs.values())
        
        for step, data in sorted(self.step_costs.items(), key=lambda x: -x[1]["cost"]):
            pct = data["cost"] / total_cost * 100 if total_cost > 0 else 0
            print(f"{step:<30} {data['calls']:>6} {data['tokens']:>10,} "
                  f"${data['cost']:>10.6f} ({pct:.1f}%)")
        
        print("-" * 62)
        print(f"{'TOTAL':<30} {'':>6} {sum(v['tokens'] for v in self.step_costs.values()):>10,} "
              f"${total_cost:>10.6f}")

Callback 7: Monthly Cost Projection

Project your monthly bill based on recent usage patterns:

import sqlite3
from datetime import datetime, timedelta

def project_monthly_cost(db_path: str = "llm_costs.db") -> dict:
    """Project monthly cost based on the last 7 days of usage."""
    conn = sqlite3.connect(db_path)
    
    seven_days_ago = (datetime.utcnow() - timedelta(days=7)).strftime("%Y-%m-%dT%H:%M:%SZ")
    
    daily_data = conn.execute("""
        SELECT 
            DATE(timestamp) as date,
            SUM(cost_usd) as daily_cost,
            SUM(total_tokens) as daily_tokens,
            COUNT(*) as daily_calls
        FROM llm_calls
        WHERE timestamp > ?
        GROUP BY DATE(timestamp)
        ORDER BY date
    """, (seven_days_ago,)).fetchall()
    
    conn.close()
    
    if not daily_data:
        return {"error": "No usage data found in the last 7 days"}
    
    daily_costs = [row[1] for row in daily_data]
    avg_daily_cost = sum(daily_costs) / len(daily_costs)
    projected_monthly = avg_daily_cost * 30
    
    return {
        "days_analyzed": len(daily_data),
        "avg_daily_cost_usd": round(avg_daily_cost, 4),
        "total_7d_cost_usd": round(sum(daily_costs), 4),
        "projected_monthly_usd": round(projected_monthly, 2),
        "daily_breakdown": [
            {"date": row[0], "cost": round(row[1], 4), 
             "tokens": row[2], "calls": row[3]}
            for row in daily_data
        ]
    }

projection = project_monthly_cost()
print(f"Projected monthly cost: ${projection.get('projected_monthly_usd', 0):.2f}")

OpenAI Model Pricing Reference (2026)

Model	Input (per 1M tokens)	Output (per 1M tokens)	Best For
gpt-4o	$2.50	$10.00	Complex reasoning, production
gpt-4o-mini	$0.15	$0.60	Classification, fast tasks
gpt-3.5-turbo	$0.50	$1.50	Simple Q&A, legacy apps
text-embedding-3-small	$0.02	—	High-volume embedding
text-embedding-3-large	$0.13	—	High-accuracy embedding
o1-mini	$3.00	$12.00	Multi-step reasoning
o1	$15.00	$60.00	Complex research tasks

The dual-model pattern (cheap model for classification, quality model for generation) described in the Build AI chatbot Python guide can cut costs by 60-80% for high-volume support applications.

Combining Callbacks: Production Setup

from langchain_openai import ChatOpenAI

def create_tracked_llm(
    model: str = "gpt-4o-mini",
    budget_usd: Optional[float] = None,
    user_id: Optional[str] = None
) -> ChatOpenAI:
    """Create an LLM with full cost tracking configured."""
    
    callbacks = [
        CostTrackingCallback(name=f"{model}_tracker"),
        DatabaseCostLogger(db_path="production_costs.db"),
    ]
    
    if budget_usd is not None:
        callbacks.append(BudgetAlertCallback(budget_usd=budget_usd))
    
    return ChatOpenAI(
        model=model,
        callbacks=callbacks,
        metadata={"user_id": user_id or "system"}
    )

# Typical production usage
llm = create_tracked_llm(
    model="gpt-4o-mini",
    budget_usd=1.00,  # $1 per agent run
    user_id="prod_pipeline"
)

chain = (
    ChatPromptTemplate.from_template("Analyze: {input}")
    | llm
    | StrOutputParser()
)

result = chain.invoke({"input": "the latest trends in AI agent development"})

This setup pairs well with the OpenAI API integration guide which covers API key management and rate limiting. For agents that use tools and might run many iterations, combining budget callbacks with the checkpointing patterns from LangChain checkpointers helps you resume from checkpoints without re-paying for already-completed steps.

Testing Callbacks Without Incurring Costs

from langchain_core.outputs import LLMResult, ChatGeneration
from langchain_core.messages import AIMessage
import uuid

def simulate_llm_callback(callback: BaseCallbackHandler, model: str, 
                           prompt_tokens: int, completion_tokens: int):
    """Simulate an LLM call for testing callbacks."""
    run_id = uuid.uuid4()
    
    # Simulate start
    callback.on_llm_start(
        serialized={"name": "ChatOpenAI"},
        prompts=["test prompt"],
        run_id=run_id
    )
    
    # Simulate end with fake usage data
    fake_response = LLMResult(
        generations=[[ChatGeneration(message=AIMessage(content="test response"))]],
        llm_output={
            "model_name": model,
            "token_usage": {
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": prompt_tokens + completion_tokens
            }
        }
    )
    callback.on_llm_end(fake_response, run_id=run_id)

# Test your callback without API calls
tracker = CostTrackingCallback(name="test")
simulate_llm_callback(tracker, "gpt-4o-mini", 150, 200)
simulate_llm_callback(tracker, "gpt-4o", 500, 300)

print(tracker.summary)

Key Takeaways

The seven callbacks in this guide cover the full spectrum from quick one-liners to production-grade monitoring systems. get_openai_callback is your fastest debugging tool. CostTrackingCallback gives you provider-agnostic tracking with configurable pricing tables. BudgetAlertCallback prevents runaway agent loops from generating surprise bills. DatabaseCostLogger gives you the audit trail and query capability that operational teams need.

The most important pattern is combining them: use PerUserCostTracker for billing attribution, BudgetAlertCallback for safety, and DatabaseCostLogger for historical analysis — all attached to the same LLM instance. None of these callbacks add measurable latency; they only execute after the API call completes.

For the RAG architectures discussed in the RAG system tutorial and for the agent patterns in Build AI agent with LangChain, these callbacks integrate at the LLM level and capture every token across every chain step automatically.

Frequently Asked Questions

Does get_openai_callback work with GPT-4o and GPT-4o-mini? Yes. get_openai_callback uses the token counts reported directly by the OpenAI API, so it works with any model that returns usage data — GPT-4o, GPT-4o-mini, GPT-3.5-turbo, and the embedding models.

Can I track costs for non-OpenAI models? Yes. Build a custom callback by subclassing BaseCallbackHandler and implementing on_llm_end. The response object contains the token counts reported by any model that provides them. For models that don't report tokens, you can use tiktoken to estimate costs from the prompt and response text.

How do I set a hard spending limit that stops the agent? Use a custom callback that raises a BudgetExceededException inside on_llm_start or on_llm_end. When the cumulative cost exceeds your limit, the exception propagates up and halts the chain. Wrap your agent invocation in a try/except to handle it gracefully.

Share this article:Facebook Twitter/X LinkedIn Telegram WhatsApp

💬 DiscussionPowered by GitHub Discussions

Frequently Asked Questions

Yes. get_openai_callback uses the token counts reported directly by the OpenAI API, so it works with any model that returns usage data — GPT-4o, GPT-4o-mini, GPT-3.5-turbo, and the embedding models.

AiTechWorlds Team

✓ Verified Writer

The AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.

📱 Follow on Telegram 🐦 Follow on X Learn More →

search relevance ranking showing scores — LangChain advanced RAG retrieval strategies

Agent Development

10 LangChain Retrieval Strategies for Better RAG Results

Go beyond basic similarity search with ParentDocumentRetriever, MultiQueryRetriever, EnsembleRetriever, HyDE, and 6 more LangChain retrieval strategies — with code for each.

May 31, 2026 13 min read

AI agent architecture with memory and tool connections — LangChain agent memory tools

Agent Development

Build a LangChain Agent with Memory and Tools (Full Example)

Build a complete LangChain conversational agent with persistent memory, multiple tools, and step-by-step trace — from setup to a production-ready implementation with code.

May 31, 2026 14 min read

developer coding AI agent decision loop — LangChain agent types ZeroShot ReAct Conversational

Agent Development

5 LangChain Agent Types Explained (ZeroShot, ReAct, and More)

Understand every major LangChain agent type — ZeroShotAgent, ReAct, ConversationalAgent, and more — with Python code and agent trace walkthroughs.

May 31, 2026 13 min read

FastAPI server running LangChain endpoint — deploy LangChain FastAPI REST streaming

Agent Development

How to Deploy a LangChain App as a FastAPI REST Endpoint

Serve a LangChain app as a production FastAPI REST endpoint with streaming, async chains, error handling, and Docker deployment — full Python code included.

May 31, 2026 11 min read

Go deeper on this topic

BookBuilding AI Apps: Developer's Guide NotesAI Agent Development Notes NotesRAG: Retrieval-Augmented Generation Guide BookChatGPT Mastery Guide BookAI Agent Development Guide CourseAI Agent Development Course

10K+ Members Growing Daily

Get Free AI Notes Daily

Join AiTechWorlds on Telegram and get daily AI tips, prompt engineering templates, coding resources, and exclusive content — 100% free!

📚 Free Study Notes🤖 AI Tips Daily⚡ Prompt Templates💻 Coding Resources

Join Free Channel

No spam. Leave anytime.

Langchain

7 LangChain Callbacks for Token Usage and Cost Tracking

⚡ Quick Answer

Track OpenAI API spend with LangChain callbacks — get_openai_callback, custom cost trackers, per-chain breakdowns, budget alerts, and monthly cost estimation.

AiTechWorlds Team May 31, 2026 13 min read

#LangChain #callbacks #token usage #cost monitoring #OpenAI

📚Part of the Langchain guide — explore all Langchain articles→

Share:Facebook Twitter/X LinkedIn Telegram WhatsApp

📱

Get more content like this on Telegram!

Daily AI tips, notes & resources — free

Join Free →

Understanding the Callback System

Before the code: LangChain callbacks fire at specific lifecycle events. The ones relevant to cost tracking are:

on_llm_start — fires with the prompt before the API call
on_llm_end — fires with the response including token usage
on_chain_start / on_chain_end — fires for each chain step
on_tool_start / on_tool_end — fires when agents use tools

You can attach callbacks at three levels:

Global — on the ChatOpenAI instance (affects all calls from that LLM)
Chain — passed to .invoke() as {"callbacks": [...]}
Request — RunnableConfig passed at invocation time

Understanding which level is correct for your use case saves a lot of debugging.

Callback 1: get_openai_callback (Built-in, Zero Setup)

For quick cost checks during development, this is the fastest approach:

from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini")

prompt = ChatPromptTemplate.from_template(
    "Write a brief analysis of {topic} in under 100 words."
)
chain = prompt | llm | StrOutputParser()

with get_openai_callback() as cb:
    result = chain.invoke({"topic": "serverless computing"})
    print("Response:", result[:150])

# Token counts and cost from the API response
print(f"\nToken Usage:")
print(f"  Prompt tokens:     {cb.prompt_tokens:,}")
print(f"  Completion tokens: {cb.completion_tokens:,}")
print(f"  Total tokens:      {cb.total_tokens:,}")
print(f"  Total cost:        ${cb.total_cost:.6f}")

The context manager accumulates totals across all calls within the block:

with get_openai_callback() as cb:
    # Multiple calls — all tracked together
    result1 = chain.invoke({"topic": "machine learning"})
    result2 = chain.invoke({"topic": "deep learning"})
    result3 = chain.invoke({"topic": "reinforcement learning"})

print(f"3 calls total: {cb.total_tokens:,} tokens = ${cb.total_cost:.6f}")
print(f"Average per call: {cb.total_tokens / 3:.0f} tokens")

Limitation: It only works with OpenAI and Azure OpenAI models. For other providers, you need the approaches below.

Callback 2: Custom Per-Model Cost Tracking Callback

A custom callback gives you cost tracking for any LLM provider and full control over what you log:

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
from typing import Any, Dict, List, Optional, Union
import time
import json

# Pricing table — update when providers change pricing
MODEL_PRICING = {
    "gpt-4o": {
        "prompt": 0.0025 / 1000,      # $2.50 per 1M tokens
        "completion": 0.010 / 1000    # $10 per 1M tokens
    },
    "gpt-4o-mini": {
        "prompt": 0.00015 / 1000,     # $0.15 per 1M tokens
        "completion": 0.0006 / 1000   # $0.60 per 1M tokens
    },
    "gpt-3.5-turbo": {
        "prompt": 0.0005 / 1000,      # $0.50 per 1M tokens
        "completion": 0.0015 / 1000   # $1.50 per 1M tokens
    },
    "text-embedding-3-small": {
        "prompt": 0.00002 / 1000,     # $0.02 per 1M tokens
        "completion": 0.0
    },
    "text-embedding-3-large": {
        "prompt": 0.00013 / 1000,     # $0.13 per 1M tokens
        "completion": 0.0
    },
    "claude-3-5-sonnet": {
        "prompt": 0.003 / 1000,       # $3 per 1M tokens
        "completion": 0.015 / 1000    # $15 per 1M tokens
    }
}

class CostTrackingCallback(BaseCallbackHandler):
    """Track token usage and compute cost across all LLM calls."""

    def __init__(self, name: str = "default"):
        super().__init__()
        self.name = name
        self.calls = []
        self.total_prompt_tokens = 0
        self.total_completion_tokens = 0
        self.total_cost = 0.0
        self._start_times = {}

    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs
    ) -> None:
        run_id = kwargs.get("run_id", "unknown")
        self._start_times[str(run_id)] = time.time()

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        run_id = str(kwargs.get("run_id", "unknown"))
        elapsed = time.time() - self._start_times.pop(run_id, time.time())

        # Extract model name from response metadata
        model = "unknown"
        if response.llm_output and "model_name" in response.llm_output:
            model = response.llm_output["model_name"]

        # Extract token usage
        usage = {}
        if response.llm_output and "token_usage" in response.llm_output:
            usage = response.llm_output["token_usage"]
        
        prompt_tokens = usage.get("prompt_tokens", 0)
        completion_tokens = usage.get("completion_tokens", 0)
        total_tokens = usage.get("total_tokens", prompt_tokens + completion_tokens)

        # Compute cost
        pricing = MODEL_PRICING.get(model, {"prompt": 0.0, "completion": 0.0})
        call_cost = (
            prompt_tokens * pricing["prompt"] +
            completion_tokens * pricing["completion"]
        )

        # Record this call
        call_record = {
            "model": model,
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": total_tokens,
            "cost": call_cost,
            "latency_seconds": round(elapsed, 3),
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        }
        self.calls.append(call_record)

        # Update totals
        self.total_prompt_tokens += prompt_tokens
        self.total_completion_tokens += completion_tokens
        self.total_cost += call_cost

    @property
    def summary(self) -> dict:
        return {
            "tracker_name": self.name,
            "total_calls": len(self.calls),
            "total_prompt_tokens": self.total_prompt_tokens,
            "total_completion_tokens": self.total_completion_tokens,
            "total_tokens": self.total_prompt_tokens + self.total_completion_tokens,
            "total_cost_usd": round(self.total_cost, 6),
            "avg_cost_per_call": round(self.total_cost / len(self.calls), 6) if self.calls else 0.0
        }

    def reset(self):
        self.calls.clear()
        self.total_prompt_tokens = 0
        self.total_completion_tokens = 0
        self.total_cost = 0.0

Using the callback:

tracker = CostTrackingCallback(name="content_pipeline")

llm_with_tracker = ChatOpenAI(
    model="gpt-4o-mini",
    callbacks=[tracker]
)

chain = (
    ChatPromptTemplate.from_template("Explain {concept} in simple terms.")
    | llm_with_tracker
    | StrOutputParser()
)

for concept in ["neural networks", "transformers", "reinforcement learning"]:
    result = chain.invoke({"concept": concept})
    print(f"{concept}: {result[:80]}...")

print("\nCost Summary:")
print(json.dumps(tracker.summary, indent=2))

Callback 3: Per-Request Cost Attribution

In multi-user applications, you need to know which user incurred which costs:

from langchain_core.runnables import RunnableConfig
from typing import Dict
import threading

class PerUserCostTracker(BaseCallbackHandler):
    """Thread-safe per-user cost tracking."""

    def __init__(self):
        super().__init__()
        self._lock = threading.Lock()
        self._user_costs: Dict[str, dict] = {}
        self._call_contexts = {}

    def on_llm_start(self, serialized, prompts, **kwargs):
        run_id = str(kwargs.get("run_id", ""))
        # User ID comes from metadata in RunnableConfig
        user_id = kwargs.get("metadata", {}).get("user_id", "anonymous")
        self._call_contexts[run_id] = {"user_id": user_id, "start": time.time()}

    def on_llm_end(self, response: LLMResult, **kwargs):
        run_id = str(kwargs.get("run_id", ""))
        context = self._call_contexts.pop(run_id, {})
        user_id = context.get("user_id", "anonymous")

        usage = {}
        if response.llm_output:
            usage = response.llm_output.get("token_usage", {})

        model = response.llm_output.get("model_name", "unknown") if response.llm_output else "unknown"
        pricing = MODEL_PRICING.get(model, {"prompt": 0.0, "completion": 0.0})

        call_cost = (
            usage.get("prompt_tokens", 0) * pricing["prompt"] +
            usage.get("completion_tokens", 0) * pricing["completion"]
        )

        with self._lock:
            if user_id not in self._user_costs:
                self._user_costs[user_id] = {"calls": 0, "tokens": 0, "cost": 0.0}
            self._user_costs[user_id]["calls"] += 1
            self._user_costs[user_id]["tokens"] += usage.get("total_tokens", 0)
            self._user_costs[user_id]["cost"] += call_cost

    def get_user_cost(self, user_id: str) -> dict:
        with self._lock:
            return self._user_costs.get(user_id, {"calls": 0, "tokens": 0, "cost": 0.0})

    def get_all_user_costs(self) -> dict:
        with self._lock:
            return dict(self._user_costs)

# Attach to LLM globally
user_tracker = PerUserCostTracker()
tracked_llm = ChatOpenAI(model="gpt-4o-mini", callbacks=[user_tracker])
chain = ChatPromptTemplate.from_template("Answer: {q}") | tracked_llm | StrOutputParser()

# Simulate multiple users
for user_id, question in [
    ("user_alice", "What is machine learning?"),
    ("user_bob", "Explain neural networks in detail with several examples please"),
    ("user_alice", "What is deep learning?"),
    ("user_charlie", "Brief overview of AI"),
]:
    config = RunnableConfig(metadata={"user_id": user_id})
    chain.invoke({"q": question}, config=config)

print("Per-user costs:")
for uid, costs in user_tracker.get_all_user_costs().items():
    print(f"  {uid}: {costs['calls']} calls, {costs['tokens']:,} tokens, ${costs['cost']:.6f}")

Callback 4: Budget Alert Callback

Stop an agent before it blows through a cost limit:

class BudgetAlertCallback(BaseCallbackHandler):
    """Raise an exception when spending exceeds a budget threshold."""

    def __init__(self, budget_usd: float, alert_at_percent: float = 0.8):
        super().__init__()
        self.budget = budget_usd
        self.alert_threshold = budget_usd * alert_at_percent
        self.spent = 0.0
        self.alerts_sent = []

    def on_llm_end(self, response: LLMResult, **kwargs):
        if not response.llm_output:
            return

        usage = response.llm_output.get("token_usage", {})
        model = response.llm_output.get("model_name", "gpt-4o-mini")
        pricing = MODEL_PRICING.get(model, {"prompt": 0.0001 / 1000, "completion": 0.0003 / 1000})

        call_cost = (
            usage.get("prompt_tokens", 0) * pricing["prompt"] +
            usage.get("completion_tokens", 0) * pricing["completion"]
        )
        self.spent += call_cost

        # Alert at threshold
        if self.spent >= self.alert_threshold and not self.alerts_sent:
            alert_msg = (
                f"BUDGET ALERT: Spent ${self.spent:.4f} of ${self.budget:.2f} budget "
                f"({self.spent/self.budget*100:.1f}%)"
            )
            self.alerts_sent.append(alert_msg)
            print(f"WARNING: {alert_msg}")

        # Hard stop at budget
        if self.spent >= self.budget:
            raise RuntimeError(
                f"Budget exceeded: spent ${self.spent:.4f} against ${self.budget:.2f} limit. "
                "Halting execution."
            )

# Usage with a tight budget for testing
budget_tracker = BudgetAlertCallback(budget_usd=0.01, alert_at_percent=0.5)

try:
    expensive_llm = ChatOpenAI(model="gpt-4o", callbacks=[budget_tracker])
    chain = ChatPromptTemplate.from_template("{q}") | expensive_llm | StrOutputParser()
    
    for question in ["Explain AI", "Explain ML", "Explain DL", "Explain RL", "Explain NLP"]:
        result = chain.invoke({"q": question})
        print(f"Q: {question} | Spent so far: ${budget_tracker.spent:.6f}")

except RuntimeError as e:
    print(f"Stopped: {e}")

Callback 5: Structured Logging to Database

For production monitoring, push cost events to a database:

import sqlite3
from dataclasses import dataclass
import threading

class DatabaseCostLogger(BaseCallbackHandler):
    """Log token usage to SQLite for analysis and billing."""

    def __init__(self, db_path: str = "llm_costs.db"):
        super().__init__()
        self.db_path = db_path
        self._local = threading.local()
        self._setup_db()
        self._pending_starts = {}

    def _get_conn(self):
        if not hasattr(self._local, "conn"):
            self._local.conn = sqlite3.connect(self.db_path)
        return self._local.conn

    def _setup_db(self):
        conn = sqlite3.connect(self.db_path)
        conn.execute("""
            CREATE TABLE IF NOT EXISTS llm_calls (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp TEXT NOT NULL,
                model TEXT,
                prompt_tokens INTEGER,
                completion_tokens INTEGER,
                total_tokens INTEGER,
                cost_usd REAL,
                latency_seconds REAL,
                user_id TEXT,
                chain_name TEXT,
                session_id TEXT
            )
        """)
        conn.commit()
        conn.close()

    def on_llm_start(self, serialized, prompts, **kwargs):
        run_id = str(kwargs.get("run_id", ""))
        self._pending_starts[run_id] = {
            "start_time": time.time(),
            "user_id": kwargs.get("metadata", {}).get("user_id", "system"),
            "chain_name": kwargs.get("metadata", {}).get("chain_name", "unknown"),
            "session_id": kwargs.get("metadata", {}).get("session_id", ""),
        }

    def on_llm_end(self, response: LLMResult, **kwargs):
        run_id = str(kwargs.get("run_id", ""))
        pending = self._pending_starts.pop(run_id, {})
        
        elapsed = time.time() - pending.get("start_time", time.time())
        
        usage = {}
        model = "unknown"
        if response.llm_output:
            usage = response.llm_output.get("token_usage", {})
            model = response.llm_output.get("model_name", "unknown")
        
        pricing = MODEL_PRICING.get(model, {"prompt": 0.0, "completion": 0.0})
        cost = (
            usage.get("prompt_tokens", 0) * pricing["prompt"] +
            usage.get("completion_tokens", 0) * pricing["completion"]
        )
        
        conn = self._get_conn()
        conn.execute("""
            INSERT INTO llm_calls 
            (timestamp, model, prompt_tokens, completion_tokens, total_tokens,
             cost_usd, latency_seconds, user_id, chain_name, session_id)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            model,
            usage.get("prompt_tokens", 0),
            usage.get("completion_tokens", 0),
            usage.get("total_tokens", 0),
            round(cost, 8),
            round(elapsed, 3),
            pending.get("user_id", "system"),
            pending.get("chain_name", "unknown"),
            pending.get("session_id", "")
        ))
        conn.commit()

    def get_daily_report(self) -> dict:
        conn = self._get_conn()
        today = time.strftime("%Y-%m-%d")
        row = conn.execute("""
            SELECT 
                COUNT(*) as calls,
                SUM(total_tokens) as tokens,
                SUM(cost_usd) as total_cost,
                AVG(latency_seconds) as avg_latency
            FROM llm_calls 
            WHERE timestamp LIKE ?
        """, (f"{today}%",)).fetchone()
        
        return {
            "date": today,
            "calls": row[0] or 0,
            "total_tokens": row[1] or 0,
            "total_cost_usd": round(row[2] or 0, 6),
            "avg_latency_seconds": round(row[3] or 0, 3)
        }

Callback 6: Chain-Level Cost Breakdown

When you have a complex multi-step chain, knowing which step costs the most is essential:

from langchain_core.callbacks import BaseCallbackHandler
from collections import defaultdict

class ChainCostBreakdown(BaseCallbackHandler):
    """Track costs per chain step for pipeline optimization."""

    def __init__(self):
        super().__init__()
        self.step_costs = defaultdict(lambda: {"calls": 0, "tokens": 0, "cost": 0.0})
        self._active_chains = {}

    def on_chain_start(self, serialized, inputs, **kwargs):
        run_id = str(kwargs.get("run_id", ""))
        chain_name = serialized.get("name", serialized.get("id", ["unknown"])[-1])
        self._active_chains[run_id] = chain_name

    def on_llm_end(self, response: LLMResult, **kwargs):
        parent_run_id = str(kwargs.get("parent_run_id", ""))
        chain_name = self._active_chains.get(parent_run_id, "root_llm")
        
        usage = {}
        model = "unknown"
        if response.llm_output:
            usage = response.llm_output.get("token_usage", {})
            model = response.llm_output.get("model_name", "unknown")
        
        pricing = MODEL_PRICING.get(model, {"prompt": 0.0001 / 1000, "completion": 0.0003 / 1000})
        cost = (
            usage.get("prompt_tokens", 0) * pricing["prompt"] +
            usage.get("completion_tokens", 0) * pricing["completion"]
        )
        
        self.step_costs[chain_name]["calls"] += 1
        self.step_costs[chain_name]["tokens"] += usage.get("total_tokens", 0)
        self.step_costs[chain_name]["cost"] += cost

    def print_breakdown(self):
        print("\nChain Cost Breakdown:")
        print(f"{'Step':<30} {'Calls':>6} {'Tokens':>10} {'Cost':>12}")
        print("-" * 62)
        
        total_cost = sum(v["cost"] for v in self.step_costs.values())
        
        for step, data in sorted(self.step_costs.items(), key=lambda x: -x[1]["cost"]):
            pct = data["cost"] / total_cost * 100 if total_cost > 0 else 0
            print(f"{step:<30} {data['calls']:>6} {data['tokens']:>10,} "
                  f"${data['cost']:>10.6f} ({pct:.1f}%)")
        
        print("-" * 62)
        print(f"{'TOTAL':<30} {'':>6} {sum(v['tokens'] for v in self.step_costs.values()):>10,} "
              f"${total_cost:>10.6f}")

Callback 7: Monthly Cost Projection

Project your monthly bill based on recent usage patterns:

import sqlite3
from datetime import datetime, timedelta

def project_monthly_cost(db_path: str = "llm_costs.db") -> dict:
    """Project monthly cost based on the last 7 days of usage."""
    conn = sqlite3.connect(db_path)
    
    seven_days_ago = (datetime.utcnow() - timedelta(days=7)).strftime("%Y-%m-%dT%H:%M:%SZ")
    
    daily_data = conn.execute("""
        SELECT 
            DATE(timestamp) as date,
            SUM(cost_usd) as daily_cost,
            SUM(total_tokens) as daily_tokens,
            COUNT(*) as daily_calls
        FROM llm_calls
        WHERE timestamp > ?
        GROUP BY DATE(timestamp)
        ORDER BY date
    """, (seven_days_ago,)).fetchall()
    
    conn.close()
    
    if not daily_data:
        return {"error": "No usage data found in the last 7 days"}
    
    daily_costs = [row[1] for row in daily_data]
    avg_daily_cost = sum(daily_costs) / len(daily_costs)
    projected_monthly = avg_daily_cost * 30
    
    return {
        "days_analyzed": len(daily_data),
        "avg_daily_cost_usd": round(avg_daily_cost, 4),
        "total_7d_cost_usd": round(sum(daily_costs), 4),
        "projected_monthly_usd": round(projected_monthly, 2),
        "daily_breakdown": [
            {"date": row[0], "cost": round(row[1], 4), 
             "tokens": row[2], "calls": row[3]}
            for row in daily_data
        ]
    }

projection = project_monthly_cost()
print(f"Projected monthly cost: ${projection.get('projected_monthly_usd', 0):.2f}")

OpenAI Model Pricing Reference (2026)

Model	Input (per 1M tokens)	Output (per 1M tokens)	Best For
gpt-4o	$2.50	$10.00	Complex reasoning, production
gpt-4o-mini	$0.15	$0.60	Classification, fast tasks
gpt-3.5-turbo	$0.50	$1.50	Simple Q&A, legacy apps
text-embedding-3-small	$0.02	—	High-volume embedding
text-embedding-3-large	$0.13	—	High-accuracy embedding
o1-mini	$3.00	$12.00	Multi-step reasoning
o1	$15.00	$60.00	Complex research tasks

The dual-model pattern (cheap model for classification, quality model for generation) described in the Build AI chatbot Python guide can cut costs by 60-80% for high-volume support applications.

Combining Callbacks: Production Setup

from langchain_openai import ChatOpenAI

def create_tracked_llm(
    model: str = "gpt-4o-mini",
    budget_usd: Optional[float] = None,
    user_id: Optional[str] = None
) -> ChatOpenAI:
    """Create an LLM with full cost tracking configured."""
    
    callbacks = [
        CostTrackingCallback(name=f"{model}_tracker"),
        DatabaseCostLogger(db_path="production_costs.db"),
    ]
    
    if budget_usd is not None:
        callbacks.append(BudgetAlertCallback(budget_usd=budget_usd))
    
    return ChatOpenAI(
        model=model,
        callbacks=callbacks,
        metadata={"user_id": user_id or "system"}
    )

# Typical production usage
llm = create_tracked_llm(
    model="gpt-4o-mini",
    budget_usd=1.00,  # $1 per agent run
    user_id="prod_pipeline"
)

chain = (
    ChatPromptTemplate.from_template("Analyze: {input}")
    | llm
    | StrOutputParser()
)

result = chain.invoke({"input": "the latest trends in AI agent development"})

Testing Callbacks Without Incurring Costs

from langchain_core.outputs import LLMResult, ChatGeneration
from langchain_core.messages import AIMessage
import uuid

def simulate_llm_callback(callback: BaseCallbackHandler, model: str, 
                           prompt_tokens: int, completion_tokens: int):
    """Simulate an LLM call for testing callbacks."""
    run_id = uuid.uuid4()
    
    # Simulate start
    callback.on_llm_start(
        serialized={"name": "ChatOpenAI"},
        prompts=["test prompt"],
        run_id=run_id
    )
    
    # Simulate end with fake usage data
    fake_response = LLMResult(
        generations=[[ChatGeneration(message=AIMessage(content="test response"))]],
        llm_output={
            "model_name": model,
            "token_usage": {
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": prompt_tokens + completion_tokens
            }
        }
    )
    callback.on_llm_end(fake_response, run_id=run_id)

# Test your callback without API calls
tracker = CostTrackingCallback(name="test")
simulate_llm_callback(tracker, "gpt-4o-mini", 150, 200)
simulate_llm_callback(tracker, "gpt-4o", 500, 300)

print(tracker.summary)

Key Takeaways

Frequently Asked Questions

Share this article:Facebook Twitter/X LinkedIn Telegram WhatsApp

💬 DiscussionPowered by GitHub Discussions

Frequently Asked Questions

AiTechWorlds Team

✓ Verified Writer

The AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.

📱 Follow on Telegram 🐦 Follow on X Learn More →

Agent Development

10 LangChain Retrieval Strategies for Better RAG Results

Go beyond basic similarity search with ParentDocumentRetriever, MultiQueryRetriever, EnsembleRetriever, HyDE, and 6 more LangChain retrieval strategies — with code for each.

May 31, 2026 13 min read

Agent Development

Build a LangChain Agent with Memory and Tools (Full Example)

Build a complete LangChain conversational agent with persistent memory, multiple tools, and step-by-step trace — from setup to a production-ready implementation with code.

May 31, 2026 14 min read

Agent Development

5 LangChain Agent Types Explained (ZeroShot, ReAct, and More)

Understand every major LangChain agent type — ZeroShotAgent, ReAct, ConversationalAgent, and more — with Python code and agent trace walkthroughs.

May 31, 2026 13 min read

Agent Development

How to Deploy a LangChain App as a FastAPI REST Endpoint

Serve a LangChain app as a production FastAPI REST endpoint with streaming, async chains, error handling, and Docker deployment — full Python code included.

May 31, 2026 11 min read

Go deeper on this topic

10K+ Members Growing Daily

Get Free AI Notes Daily

Join AiTechWorlds on Telegram and get daily AI tips, prompt engineering templates, coding resources, and exclusive content — 100% free!

📚 Free Study Notes🤖 AI Tips Daily⚡ Prompt Templates💻 Coding Resources

Join Free Channel

No spam. Leave anytime.

7 LangChain Callbacks for Token Usage and Cost Tracking

Understanding the Callback System

Callback 1: get_openai_callback (Built-in, Zero Setup)

Callback 2: Custom Per-Model Cost Tracking Callback

Callback 3: Per-Request Cost Attribution

Callback 4: Budget Alert Callback

Callback 5: Structured Logging to Database

Callback 6: Chain-Level Cost Breakdown

Callback 7: Monthly Cost Projection

OpenAI Model Pricing Reference (2026)

Combining Callbacks: Production Setup

Testing Callbacks Without Incurring Costs

Key Takeaways

Frequently Asked Questions

💬 DiscussionPowered by GitHub Discussions

Frequently Asked Questions

AiTechWorlds Team

Related Articles

10 LangChain Retrieval Strategies for Better RAG Results

Build a LangChain Agent with Memory and Tools (Full Example)

5 LangChain Agent Types Explained (ZeroShot, ReAct, and More)

How to Deploy a LangChain App as a FastAPI REST Endpoint

Go deeper on this topic

Get Free AI Notes Daily

7 LangChain Callbacks for Token Usage and Cost Tracking

Understanding the Callback System

Callback 1: get_openai_callback (Built-in, Zero Setup)

Callback 2: Custom Per-Model Cost Tracking Callback

Callback 3: Per-Request Cost Attribution

Callback 4: Budget Alert Callback

Callback 5: Structured Logging to Database

Callback 6: Chain-Level Cost Breakdown

Callback 7: Monthly Cost Projection

OpenAI Model Pricing Reference (2026)

Combining Callbacks: Production Setup

Testing Callbacks Without Incurring Costs

Key Takeaways

Frequently Asked Questions

💬 DiscussionPowered by GitHub Discussions

Frequently Asked Questions

AiTechWorlds Team

Related Articles

10 LangChain Retrieval Strategies for Better RAG Results

Build a LangChain Agent with Memory and Tools (Full Example)

5 LangChain Agent Types Explained (ZeroShot, ReAct, and More)

How to Deploy a LangChain App as a FastAPI REST Endpoint

Go deeper on this topic

Get Free AI Notes Daily