AI Tips Prompting Python AI Tools Web Dev ChatGPT LLM Agent Dev Reviews Notes Free Books

AiTechWorlds

OpenAI API response on developer screen — LangChain OpenAI integration function calling

How to Use LangChain with OpenAI API (ChatGPT + Tools)

⚡ Quick Answer

Set up LangChain with the OpenAI API — configure ChatOpenAI, implement function calling, bind tools, and run parallel tool calls in a production-ready setup.

AiTechWorlds Team May 31, 2026 13 min read

#LangChain #OpenAI #ChatGPT #Function Calling #Tool Binding

📚Part of the Langchain guide — explore all Langchain articles→

Share:Facebook Twitter/X LinkedIn Telegram WhatsApp

📱

Get more content like this on Telegram!

Daily AI tips, notes & resources — free

Join Free →

Getting LangChain to talk to OpenAI's API is about four lines of code. Getting it to work well in production — with function calling, parallel tool execution, proper error handling, and cost-conscious model selection — is a different conversation.

I've watched people burn through $200 in API credits debugging an agent because they didn't understand how LangChain manages token usage, or how to pick the right model for the task at hand. This guide covers both the setup and the production considerations.

We'll go from basic ChatOpenAI configuration through function calling, tool binding, and parallel tool execution. By the end, you'll have a solid mental model of the full LangChain/OpenAI stack and know which model to reach for in different scenarios.

For the foundation — if you haven't set up LangChain yet — start with the LangChain tutorial 2025. For the full picture of how OpenAI integrates with the broader AI stack, OpenAI API integration covers the lower-level details.

ChatOpenAI Configuration Deep Dive

Most tutorials show you the minimum ChatOpenAI setup. Let's go through every parameter worth knowing:

from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()

llm = ChatOpenAI(
    model="gpt-4o-mini",           # Model name
    temperature=0.7,                # 0=deterministic, 1=creative
    max_tokens=2048,                # Max output tokens (not input)
    max_retries=3,                  # Auto-retry on rate limits / errors
    request_timeout=60,             # Seconds before timeout
    streaming=False,                # Enable for real-time output
    
    # Advanced options
    model_kwargs={
        "top_p": 0.95,              # Nucleus sampling parameter
        "frequency_penalty": 0.1,   # Reduce repetition (0 to 2)
        "presence_penalty": 0.0,    # Topic diversity (0 to 2)
        "seed": 42,                 # Deterministic outputs (when temp=0)
    },
    
    # For Azure OpenAI deployments
    # openai_api_base="https://your-resource.openai.azure.com/",
    # openai_api_version="2024-02-01",
    # azure_deployment="gpt-4o-mini",
)

# Quick test
from langchain_core.messages import HumanMessage
response = llm.invoke([HumanMessage(content="Say 'hello' in five languages.")])
print(response.content)

Understanding Token Costs

max_tokens controls the output length, not the total context. The input tokens (your prompt + history + tool results) are charged separately. For long conversations or RAG contexts, your input can easily dwarf your output in token count.

To track usage:

from langchain_core.callbacks import UsageMetadataCallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# Track token usage
callback = UsageMetadataCallbackHandler()
llm = ChatOpenAI(model="gpt-4o-mini")

response = llm.invoke(
    [HumanMessage(content="Explain gradient descent in 100 words.")],
    config={"callbacks": [callback]}
)

print(response.content)
# Access usage data from the response directly
if hasattr(response, 'usage_metadata') and response.usage_metadata:
    print(f"Input tokens: {response.usage_metadata['input_tokens']}")
    print(f"Output tokens: {response.usage_metadata['output_tokens']}")
    print(f"Total tokens: {response.usage_metadata['total_tokens']}")

Function Calling: The Foundation

Function calling is OpenAI's mechanism for having models call external functions with structured JSON arguments. It's more reliable than asking the model to "return JSON" because the format is enforced at the API level.

Basic Function Calling

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
import json

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Define functions using OpenAI's JSON schema format
functions = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature units"
                }
            },
            "required": ["city"]
        }
    }
]

# Bind functions to the model
llm_with_functions = llm.bind(functions=functions)

response = llm_with_functions.invoke(
    [HumanMessage(content="What's the weather in Tokyo?")]
)

# Check if model wants to call a function
if response.additional_kwargs.get("function_call"):
    func_call = response.additional_kwargs["function_call"]
    print(f"Function name: {func_call['name']}")
    print(f"Arguments: {func_call['arguments']}")
    args = json.loads(func_call['arguments'])
    print(f"City: {args['city']}")

This works, but it's tedious. LangChain's tool binding does the same thing with cleaner code.

Tool Binding: The LangChain Way

LangChain's @tool decorator and bind_tools() method handle the JSON schema conversion automatically:

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from typing import Optional
from pydantic import BaseModel, Field

# Define tools with the @tool decorator
@tool
def get_weather(city: str, units: str = "celsius") -> str:
    """Get the current weather for a given city.
    
    Args:
        city: The city name (e.g., 'Tokyo', 'New York')
        units: Temperature units, either 'celsius' or 'fahrenheit'
    """
    # Real implementation would call a weather API
    return f"Weather in {city}: 22°C (clear sky)"

@tool
def search_products(
    query: str, 
    category: Optional[str] = None,
    max_price: Optional[float] = None
) -> str:
    """Search for products in the store catalog.
    
    Args:
        query: Search query string
        category: Optional product category filter
        max_price: Optional maximum price filter
    """
    return f"Found 5 products matching '{query}' in category '{category}'"

@tool
def calculate_shipping(
    weight_kg: float,
    destination_country: str,
    express: bool = False
) -> str:
    """Calculate shipping cost for an order.
    
    Args:
        weight_kg: Package weight in kilograms
        destination_country: Two-letter country code (e.g., 'US', 'DE', 'JP')
        express: Whether to use express shipping
    """
    base_cost = weight_kg * 2.5
    if express:
        base_cost *= 2.5
    return f"Shipping to {destination_country}: ${base_cost:.2f}"

# Bind multiple tools at once
tools = [get_weather, search_products, calculate_shipping]
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools(tools)

# The model now knows about all three tools
response = llm_with_tools.invoke(
    [HumanMessage(content="I want to buy a laptop under $800 and ship it to Germany. How much will shipping cost for a 2kg package?")]
)

print(f"Tool calls: {response.tool_calls}")

The model's response includes structured tool_calls — a list of tools it wants to invoke and the arguments for each. Your application code executes those tools and feeds the results back.

Building a Complete Tool-Calling Pipeline

Here's a full pipeline that handles the tool-calling loop:

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage, ToolMessage
from typing import Any
import json

# Define tools
@tool
def web_search(query: str) -> str:
    """Search the web for current information."""
    # Mock — real version uses DuckDuckGo, Serper, or Bing API
    return f"Search results for '{query}': [Result 1: Python 3.13 released...] [Result 2: ...]"

@tool
def run_python_code(code: str) -> str:
    """Execute Python code and return the output. Safe for math and string operations."""
    try:
        # In production, use a sandboxed environment
        result = eval(code)
        return str(result)
    except Exception as e:
        return f"Error: {e}"

@tool  
def save_note(content: str, title: str) -> str:
    """Save a note for later reference.
    
    Args:
        content: The note content
        title: A short descriptive title
    """
    # In production, write to a database or file
    print(f"[NOTE SAVED] {title}: {content[:50]}...")
    return f"Note '{title}' saved successfully."

tools = [web_search, run_python_code, save_note]
tool_map = {t.name: t for t in tools}

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools(tools)

def run_tool_loop(user_message: str) -> str:
    """Run the complete tool-calling loop."""
    messages = [HumanMessage(content=user_message)]
    
    while True:
        response = llm_with_tools.invoke(messages)
        messages.append(response)
        
        # If no tool calls, we have the final answer
        if not response.tool_calls:
            return response.content
        
        # Execute all requested tools
        for tool_call in response.tool_calls:
            tool_name = tool_call["name"]
            tool_args = tool_call["args"]
            
            if tool_name in tool_map:
                result = tool_map[tool_name].invoke(tool_args)
            else:
                result = f"Error: Unknown tool '{tool_name}'"
            
            # Add tool result to messages
            messages.append(
                ToolMessage(
                    content=str(result),
                    tool_call_id=tool_call["id"]
                )
            )
            
            print(f"[Tool: {tool_name}] Input: {tool_args}")
            print(f"[Tool: {tool_name}] Output: {result[:100]}")

# Test the loop
answer = run_tool_loop(
    "Search for the latest Python version, then calculate 2 to the power of the version number's major version, and save the result as a note."
)
print(f"\nFinal Answer: {answer}")

Parallel Tool Calls

OpenAI's API supports calling multiple tools simultaneously. LangChain exposes this natively:

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage, ToolMessage

@tool
def get_stock_price(ticker: str) -> str:
    """Get the current stock price for a ticker symbol."""
    prices = {"AAPL": "$182.50", "GOOGL": "$178.20", "MSFT": "$415.00"}
    return prices.get(ticker.upper(), f"Unknown ticker: {ticker}")

@tool
def get_company_info(ticker: str) -> str:
    """Get basic company information for a ticker."""
    info = {
        "AAPL": "Apple Inc. — Consumer electronics and software",
        "GOOGL": "Alphabet Inc. — Technology conglomerate",
        "MSFT": "Microsoft Corp. — Software and cloud services"
    }
    return info.get(ticker.upper(), f"No info for: {ticker}")

tools = [get_stock_price, get_company_info]
tool_map = {t.name: t for t in tools}

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools(tools)

# This query should trigger parallel calls for both tools
messages = [HumanMessage(content="Get the stock price AND company info for Apple, all at once.")]
response = llm_with_tools.invoke(messages)

print(f"Number of tool calls: {len(response.tool_calls)}")
# With gpt-4o or gpt-4o-mini, this should be 2 simultaneous calls

# Execute all tool calls (they can be parallelized in your code too)
import asyncio
from langchain_core.tools import BaseTool

async def execute_tool_calls_parallel(tool_calls: list) -> list[ToolMessage]:
    async def run_one(tc):
        tool = tool_map[tc["name"]]
        result = await asyncio.to_thread(tool.invoke, tc["args"])
        return ToolMessage(content=str(result), tool_call_id=tc["id"])
    
    return await asyncio.gather(*[run_one(tc) for tc in tool_calls])

tool_messages = asyncio.run(execute_tool_calls_parallel(response.tool_calls))
messages.append(response)
messages.extend(tool_messages)

final = llm_with_tools.invoke(messages)
print(f"\nFinal response: {final.content}")

Parallel tool calls can reduce multi-step agent latency significantly. A task that previously required three sequential LLM calls might complete in one call with three parallel tool executions. For user-facing applications with complex queries, this is worth implementing.

OpenAI Model Selection Guide

Picking the right model isn't just about cost — it's about matching capability to task.

Model	Speed	Cost (per 1M tokens in/out)	Reasoning	Context	Best For
gpt-4o	Medium	$2.50 / $10.00	Excellent	128K	Complex reasoning, highest accuracy
gpt-4o-mini	Fast	$0.15 / $0.60	Very Good	128K	Most production tasks (best value)
gpt-3.5-turbo	Very Fast	$0.50 / $1.50	Good	16K	Legacy; avoid for new projects
o3	Slow	$10.00 / $40.00	Outstanding	200K	Math, code, deep reasoning tasks
o4-mini	Medium	$1.10 / $4.40	Excellent	200K	Reasoning at lower cost than o3

My honest view: gpt-4o-mini is the right default for 90% of production tasks. It's fast, cheap, and handles most real-world workloads well. Move to gpt-4o when you're seeing quality failures on complex tasks. The o-series models are for specialized reasoning tasks where accuracy matters more than speed or cost.

Avoid gpt-3.5-turbo for new projects. The $0.35 per million token cost difference compared to gpt-4o-mini is not worth the quality regression.

Dynamic Model Selection

For applications with variable task complexity, you can route between models:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Route between models based on task type
fast_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
smart_llm = ChatOpenAI(model="gpt-4o", temperature=0)

classifier_prompt = ChatPromptTemplate.from_template(
    """Classify this task as 'simple' or 'complex'.
    Simple: factual Q&A, basic summaries, formatting
    Complex: multi-step reasoning, code generation, analysis
    
    Task: {task}
    
    Respond with ONE word: 'simple' or 'complex'"""
)

classifier = classifier_prompt | fast_llm | StrOutputParser()

def smart_route(task: str) -> str:
    complexity = classifier.invoke({"task": task}).strip().lower()
    
    if complexity == "complex":
        llm = smart_llm
        print("[Using gpt-4o for complex task]")
    else:
        llm = fast_llm
        print("[Using gpt-4o-mini for simple task]")
    
    task_prompt = ChatPromptTemplate.from_template("Complete this task:\n{task}")
    chain = task_prompt | llm | StrOutputParser()
    return chain.invoke({"task": task})

print(smart_route("What is 2 + 2?"))
print(smart_route("Analyze the time complexity of merge sort and compare it to quicksort with mathematical proof."))

Structured Output with OpenAI

Getting structured data back from OpenAI is cleaner with LangChain's with_structured_output() method:

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from typing import List

class ProductReview(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    score: int = Field(description="Score from 1-10", ge=1, le=10)
    key_points: List[str] = Field(description="3-5 key points from the review")
    topics_mentioned: List[str] = Field(description="Product aspects mentioned")
    would_recommend: bool = Field(description="Whether reviewer recommends the product")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(ProductReview)

review_text = """
This laptop is absolutely amazing. The battery life is incredible - I get 
12 hours easily. The keyboard feels great and the screen is stunning. 
Only minor complaint is it runs a bit warm under heavy load. 
Definitely recommending this to my friends.
"""

analysis = structured_llm.invoke(f"Analyze this review:\n\n{review_text}")
print(f"Sentiment: {analysis.sentiment}")
print(f"Score: {analysis.score}/10")
print(f"Key Points: {analysis.key_points}")
print(f"Would Recommend: {analysis.would_recommend}")

with_structured_output() uses function calling under the hood to guarantee Pydantic-validated output. No more regex parsing or hoping the model returns valid JSON.

Production Considerations

Rate Limiting and Backoff

from langchain_openai import ChatOpenAI
import time

# LangChain handles basic retries automatically
llm = ChatOpenAI(
    model="gpt-4o-mini",
    max_retries=5,
    request_timeout=60,
)

# For batch processing with rate limit awareness
async def process_batch_with_rate_limit(prompts: list, delay: float = 0.1):
    """Process a batch of prompts respecting rate limits."""
    import asyncio
    from langchain_core.messages import HumanMessage
    
    results = []
    for prompt in prompts:
        response = await llm.ainvoke([HumanMessage(content=prompt)])
        results.append(response.content)
        await asyncio.sleep(delay)  # Basic rate limiting
    return results

Caching for Cost Reduction

from langchain_community.cache import SQLiteCache
from langchain_core.globals import set_llm_cache

# Persistent cache — identical prompts won't hit the API
set_llm_cache(SQLiteCache(database_path=".openai_cache.db"))

For a production deployment guide that covers rate limits, caching, and observability in more detail, see deploy AI model to production. The LangChain callbacks guide covers how to add proper logging and tracing to your OpenAI calls.

Handling Content Policy Errors

from openai import BadRequestError
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o-mini")

def safe_invoke(message: str) -> str:
    try:
        response = llm.invoke([HumanMessage(content=message)])
        return response.content
    except BadRequestError as e:
        if "content_policy_violation" in str(e):
            return "I can't help with that request."
        raise  # Re-raise unexpected errors
    except Exception as e:
        return f"An error occurred: {str(e)}"

Conclusion

LangChain's OpenAI integration goes well beyond basic API calls. Function calling and tool binding give you a structured, reliable way for models to interact with external systems. Parallel tool calls reduce latency for multi-step tasks. Structured output eliminates the parsing hell that comes with asking models to "return JSON."

On model selection: default to gpt-4o-mini, it covers most production use cases effectively. Move to gpt-4o when you're seeing quality issues on complex tasks, and reserve the o-series models for specialized reasoning workloads.

The patterns here — tool binding, parallel execution, structured output — are what separate a prototype from something you can actually run in production. For the next step, explore LangChain agent types to see how these tool-calling primitives power full autonomous agent systems.

Frequently Asked Questions

What is the difference between function calling and tool binding in LangChain?

Function calling is the underlying OpenAI API feature. Tool binding is LangChain's abstraction on top of it — you define tools with @tool decorators and call bind_tools() on the model. LangChain handles the schema conversion and response parsing automatically.

Can I use LangChain with models other than OpenAI?

Yes. LangChain supports Anthropic Claude, Google Gemini, Mistral, Cohere, Ollama (local models), and dozens of others. You swap the model class but keep the same chain and tool code. Not all providers support function calling with the same interface.

What is the best OpenAI model to use with LangChain in 2026?

For most production tasks, gpt-4o-mini gives the best cost-to-quality ratio. For complex multi-step reasoning or tasks requiring the highest accuracy, use gpt-4o. Avoid gpt-3.5-turbo for new projects — the quality gap compared to gpt-4o-mini doesn't justify the modest cost saving.

Share this article:Facebook Twitter/X LinkedIn Telegram WhatsApp

💬 DiscussionPowered by GitHub Discussions

Frequently Asked Questions

Function calling is the underlying OpenAI API feature. Tool binding is LangChain's abstraction on top of it — you define tools with @tool decorators and call bind_tools() on the model. LangChain handles the schema conversion and response parsing automatically.

AiTechWorlds Team

✓ Verified Writer

The AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.

📱 Follow on Telegram 🐦 Follow on X Learn More →

search relevance ranking showing scores — LangChain advanced RAG retrieval strategies

Agent Development

10 LangChain Retrieval Strategies for Better RAG Results

Go beyond basic similarity search with ParentDocumentRetriever, MultiQueryRetriever, EnsembleRetriever, HyDE, and 6 more LangChain retrieval strategies — with code for each.

May 31, 2026 13 min read

AI agent architecture with memory and tool connections — LangChain agent memory tools

Agent Development

Build a LangChain Agent with Memory and Tools (Full Example)

Build a complete LangChain conversational agent with persistent memory, multiple tools, and step-by-step trace — from setup to a production-ready implementation with code.

May 31, 2026 14 min read

developer coding AI agent decision loop — LangChain agent types ZeroShot ReAct Conversational

Agent Development

5 LangChain Agent Types Explained (ZeroShot, ReAct, and More)

Understand every major LangChain agent type — ZeroShotAgent, ReAct, ConversationalAgent, and more — with Python code and agent trace walkthroughs.

May 31, 2026 13 min read

FastAPI server running LangChain endpoint — deploy LangChain FastAPI REST streaming

Agent Development

How to Deploy a LangChain App as a FastAPI REST Endpoint

Serve a LangChain app as a production FastAPI REST endpoint with streaming, async chains, error handling, and Docker deployment — full Python code included.

May 31, 2026 11 min read

Go deeper on this topic

BookChatGPT Mastery Guide BookBuilding AI Apps: Developer's Guide NotesPrompt Engineering Cheat Sheet NotesChatGPT Tips & Tricks Cheat Sheet NotesAI Agent Development Notes NotesRAG: Retrieval-Augmented Generation Guide

10K+ Members Growing Daily

Get Free AI Notes Daily

Join AiTechWorlds on Telegram and get daily AI tips, prompt engineering templates, coding resources, and exclusive content — 100% free!

📚 Free Study Notes🤖 AI Tips Daily⚡ Prompt Templates💻 Coding Resources

Join Free Channel

No spam. Leave anytime.

Langchain

How to Use LangChain with OpenAI API (ChatGPT + Tools)

⚡ Quick Answer

Set up LangChain with the OpenAI API — configure ChatOpenAI, implement function calling, bind tools, and run parallel tool calls in a production-ready setup.

AiTechWorlds Team May 31, 2026 13 min read

#LangChain #OpenAI #ChatGPT #Function Calling #Tool Binding

📚Part of the Langchain guide — explore all Langchain articles→

Share:Facebook Twitter/X LinkedIn Telegram WhatsApp

📱

Get more content like this on Telegram!

Daily AI tips, notes & resources — free

Join Free →

ChatOpenAI Configuration Deep Dive

Most tutorials show you the minimum ChatOpenAI setup. Let's go through every parameter worth knowing:

from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()

llm = ChatOpenAI(
    model="gpt-4o-mini",           # Model name
    temperature=0.7,                # 0=deterministic, 1=creative
    max_tokens=2048,                # Max output tokens (not input)
    max_retries=3,                  # Auto-retry on rate limits / errors
    request_timeout=60,             # Seconds before timeout
    streaming=False,                # Enable for real-time output
    
    # Advanced options
    model_kwargs={
        "top_p": 0.95,              # Nucleus sampling parameter
        "frequency_penalty": 0.1,   # Reduce repetition (0 to 2)
        "presence_penalty": 0.0,    # Topic diversity (0 to 2)
        "seed": 42,                 # Deterministic outputs (when temp=0)
    },
    
    # For Azure OpenAI deployments
    # openai_api_base="https://your-resource.openai.azure.com/",
    # openai_api_version="2024-02-01",
    # azure_deployment="gpt-4o-mini",
)

# Quick test
from langchain_core.messages import HumanMessage
response = llm.invoke([HumanMessage(content="Say 'hello' in five languages.")])
print(response.content)

Understanding Token Costs

To track usage:

from langchain_core.callbacks import UsageMetadataCallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# Track token usage
callback = UsageMetadataCallbackHandler()
llm = ChatOpenAI(model="gpt-4o-mini")

response = llm.invoke(
    [HumanMessage(content="Explain gradient descent in 100 words.")],
    config={"callbacks": [callback]}
)

print(response.content)
# Access usage data from the response directly
if hasattr(response, 'usage_metadata') and response.usage_metadata:
    print(f"Input tokens: {response.usage_metadata['input_tokens']}")
    print(f"Output tokens: {response.usage_metadata['output_tokens']}")
    print(f"Total tokens: {response.usage_metadata['total_tokens']}")

Function Calling: The Foundation

Basic Function Calling

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
import json

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Define functions using OpenAI's JSON schema format
functions = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature units"
                }
            },
            "required": ["city"]
        }
    }
]

# Bind functions to the model
llm_with_functions = llm.bind(functions=functions)

response = llm_with_functions.invoke(
    [HumanMessage(content="What's the weather in Tokyo?")]
)

# Check if model wants to call a function
if response.additional_kwargs.get("function_call"):
    func_call = response.additional_kwargs["function_call"]
    print(f"Function name: {func_call['name']}")
    print(f"Arguments: {func_call['arguments']}")
    args = json.loads(func_call['arguments'])
    print(f"City: {args['city']}")

This works, but it's tedious. LangChain's tool binding does the same thing with cleaner code.

Tool Binding: The LangChain Way

LangChain's @tool decorator and bind_tools() method handle the JSON schema conversion automatically:

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from typing import Optional
from pydantic import BaseModel, Field

# Define tools with the @tool decorator
@tool
def get_weather(city: str, units: str = "celsius") -> str:
    """Get the current weather for a given city.
    
    Args:
        city: The city name (e.g., 'Tokyo', 'New York')
        units: Temperature units, either 'celsius' or 'fahrenheit'
    """
    # Real implementation would call a weather API
    return f"Weather in {city}: 22°C (clear sky)"

@tool
def search_products(
    query: str, 
    category: Optional[str] = None,
    max_price: Optional[float] = None
) -> str:
    """Search for products in the store catalog.
    
    Args:
        query: Search query string
        category: Optional product category filter
        max_price: Optional maximum price filter
    """
    return f"Found 5 products matching '{query}' in category '{category}'"

@tool
def calculate_shipping(
    weight_kg: float,
    destination_country: str,
    express: bool = False
) -> str:
    """Calculate shipping cost for an order.
    
    Args:
        weight_kg: Package weight in kilograms
        destination_country: Two-letter country code (e.g., 'US', 'DE', 'JP')
        express: Whether to use express shipping
    """
    base_cost = weight_kg * 2.5
    if express:
        base_cost *= 2.5
    return f"Shipping to {destination_country}: ${base_cost:.2f}"

# Bind multiple tools at once
tools = [get_weather, search_products, calculate_shipping]
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools(tools)

# The model now knows about all three tools
response = llm_with_tools.invoke(
    [HumanMessage(content="I want to buy a laptop under $800 and ship it to Germany. How much will shipping cost for a 2kg package?")]
)

print(f"Tool calls: {response.tool_calls}")

The model's response includes structured tool_calls — a list of tools it wants to invoke and the arguments for each. Your application code executes those tools and feeds the results back.

Building a Complete Tool-Calling Pipeline

Here's a full pipeline that handles the tool-calling loop:

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage, ToolMessage
from typing import Any
import json

# Define tools
@tool
def web_search(query: str) -> str:
    """Search the web for current information."""
    # Mock — real version uses DuckDuckGo, Serper, or Bing API
    return f"Search results for '{query}': [Result 1: Python 3.13 released...] [Result 2: ...]"

@tool
def run_python_code(code: str) -> str:
    """Execute Python code and return the output. Safe for math and string operations."""
    try:
        # In production, use a sandboxed environment
        result = eval(code)
        return str(result)
    except Exception as e:
        return f"Error: {e}"

@tool  
def save_note(content: str, title: str) -> str:
    """Save a note for later reference.
    
    Args:
        content: The note content
        title: A short descriptive title
    """
    # In production, write to a database or file
    print(f"[NOTE SAVED] {title}: {content[:50]}...")
    return f"Note '{title}' saved successfully."

tools = [web_search, run_python_code, save_note]
tool_map = {t.name: t for t in tools}

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools(tools)

def run_tool_loop(user_message: str) -> str:
    """Run the complete tool-calling loop."""
    messages = [HumanMessage(content=user_message)]
    
    while True:
        response = llm_with_tools.invoke(messages)
        messages.append(response)
        
        # If no tool calls, we have the final answer
        if not response.tool_calls:
            return response.content
        
        # Execute all requested tools
        for tool_call in response.tool_calls:
            tool_name = tool_call["name"]
            tool_args = tool_call["args"]
            
            if tool_name in tool_map:
                result = tool_map[tool_name].invoke(tool_args)
            else:
                result = f"Error: Unknown tool '{tool_name}'"
            
            # Add tool result to messages
            messages.append(
                ToolMessage(
                    content=str(result),
                    tool_call_id=tool_call["id"]
                )
            )
            
            print(f"[Tool: {tool_name}] Input: {tool_args}")
            print(f"[Tool: {tool_name}] Output: {result[:100]}")

# Test the loop
answer = run_tool_loop(
    "Search for the latest Python version, then calculate 2 to the power of the version number's major version, and save the result as a note."
)
print(f"\nFinal Answer: {answer}")

Parallel Tool Calls

OpenAI's API supports calling multiple tools simultaneously. LangChain exposes this natively:

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage, ToolMessage

@tool
def get_stock_price(ticker: str) -> str:
    """Get the current stock price for a ticker symbol."""
    prices = {"AAPL": "$182.50", "GOOGL": "$178.20", "MSFT": "$415.00"}
    return prices.get(ticker.upper(), f"Unknown ticker: {ticker}")

@tool
def get_company_info(ticker: str) -> str:
    """Get basic company information for a ticker."""
    info = {
        "AAPL": "Apple Inc. — Consumer electronics and software",
        "GOOGL": "Alphabet Inc. — Technology conglomerate",
        "MSFT": "Microsoft Corp. — Software and cloud services"
    }
    return info.get(ticker.upper(), f"No info for: {ticker}")

tools = [get_stock_price, get_company_info]
tool_map = {t.name: t for t in tools}

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools(tools)

# This query should trigger parallel calls for both tools
messages = [HumanMessage(content="Get the stock price AND company info for Apple, all at once.")]
response = llm_with_tools.invoke(messages)

print(f"Number of tool calls: {len(response.tool_calls)}")
# With gpt-4o or gpt-4o-mini, this should be 2 simultaneous calls

# Execute all tool calls (they can be parallelized in your code too)
import asyncio
from langchain_core.tools import BaseTool

async def execute_tool_calls_parallel(tool_calls: list) -> list[ToolMessage]:
    async def run_one(tc):
        tool = tool_map[tc["name"]]
        result = await asyncio.to_thread(tool.invoke, tc["args"])
        return ToolMessage(content=str(result), tool_call_id=tc["id"])
    
    return await asyncio.gather(*[run_one(tc) for tc in tool_calls])

tool_messages = asyncio.run(execute_tool_calls_parallel(response.tool_calls))
messages.append(response)
messages.extend(tool_messages)

final = llm_with_tools.invoke(messages)
print(f"\nFinal response: {final.content}")

OpenAI Model Selection Guide

Picking the right model isn't just about cost — it's about matching capability to task.

Model	Speed	Cost (per 1M tokens in/out)	Reasoning	Context	Best For
gpt-4o	Medium	$2.50 / $10.00	Excellent	128K	Complex reasoning, highest accuracy
gpt-4o-mini	Fast	$0.15 / $0.60	Very Good	128K	Most production tasks (best value)
gpt-3.5-turbo	Very Fast	$0.50 / $1.50	Good	16K	Legacy; avoid for new projects
o3	Slow	$10.00 / $40.00	Outstanding	200K	Math, code, deep reasoning tasks
o4-mini	Medium	$1.10 / $4.40	Excellent	200K	Reasoning at lower cost than o3

Avoid gpt-3.5-turbo for new projects. The $0.35 per million token cost difference compared to gpt-4o-mini is not worth the quality regression.

Dynamic Model Selection

For applications with variable task complexity, you can route between models:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Route between models based on task type
fast_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
smart_llm = ChatOpenAI(model="gpt-4o", temperature=0)

classifier_prompt = ChatPromptTemplate.from_template(
    """Classify this task as 'simple' or 'complex'.
    Simple: factual Q&A, basic summaries, formatting
    Complex: multi-step reasoning, code generation, analysis
    
    Task: {task}
    
    Respond with ONE word: 'simple' or 'complex'"""
)

classifier = classifier_prompt | fast_llm | StrOutputParser()

def smart_route(task: str) -> str:
    complexity = classifier.invoke({"task": task}).strip().lower()
    
    if complexity == "complex":
        llm = smart_llm
        print("[Using gpt-4o for complex task]")
    else:
        llm = fast_llm
        print("[Using gpt-4o-mini for simple task]")
    
    task_prompt = ChatPromptTemplate.from_template("Complete this task:\n{task}")
    chain = task_prompt | llm | StrOutputParser()
    return chain.invoke({"task": task})

print(smart_route("What is 2 + 2?"))
print(smart_route("Analyze the time complexity of merge sort and compare it to quicksort with mathematical proof."))

Structured Output with OpenAI

Getting structured data back from OpenAI is cleaner with LangChain's with_structured_output() method:

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from typing import List

class ProductReview(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    score: int = Field(description="Score from 1-10", ge=1, le=10)
    key_points: List[str] = Field(description="3-5 key points from the review")
    topics_mentioned: List[str] = Field(description="Product aspects mentioned")
    would_recommend: bool = Field(description="Whether reviewer recommends the product")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(ProductReview)

review_text = """
This laptop is absolutely amazing. The battery life is incredible - I get 
12 hours easily. The keyboard feels great and the screen is stunning. 
Only minor complaint is it runs a bit warm under heavy load. 
Definitely recommending this to my friends.
"""

analysis = structured_llm.invoke(f"Analyze this review:\n\n{review_text}")
print(f"Sentiment: {analysis.sentiment}")
print(f"Score: {analysis.score}/10")
print(f"Key Points: {analysis.key_points}")
print(f"Would Recommend: {analysis.would_recommend}")

with_structured_output() uses function calling under the hood to guarantee Pydantic-validated output. No more regex parsing or hoping the model returns valid JSON.

Production Considerations

Rate Limiting and Backoff

from langchain_openai import ChatOpenAI
import time

# LangChain handles basic retries automatically
llm = ChatOpenAI(
    model="gpt-4o-mini",
    max_retries=5,
    request_timeout=60,
)

# For batch processing with rate limit awareness
async def process_batch_with_rate_limit(prompts: list, delay: float = 0.1):
    """Process a batch of prompts respecting rate limits."""
    import asyncio
    from langchain_core.messages import HumanMessage
    
    results = []
    for prompt in prompts:
        response = await llm.ainvoke([HumanMessage(content=prompt)])
        results.append(response.content)
        await asyncio.sleep(delay)  # Basic rate limiting
    return results

Caching for Cost Reduction

from langchain_community.cache import SQLiteCache
from langchain_core.globals import set_llm_cache

# Persistent cache — identical prompts won't hit the API
set_llm_cache(SQLiteCache(database_path=".openai_cache.db"))

Handling Content Policy Errors

from openai import BadRequestError
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o-mini")

def safe_invoke(message: str) -> str:
    try:
        response = llm.invoke([HumanMessage(content=message)])
        return response.content
    except BadRequestError as e:
        if "content_policy_violation" in str(e):
            return "I can't help with that request."
        raise  # Re-raise unexpected errors
    except Exception as e:
        return f"An error occurred: {str(e)}"

Conclusion

Frequently Asked Questions

What is the difference between function calling and tool binding in LangChain?

Can I use LangChain with models other than OpenAI?

What is the best OpenAI model to use with LangChain in 2026?

Share this article:Facebook Twitter/X LinkedIn Telegram WhatsApp

💬 DiscussionPowered by GitHub Discussions

Frequently Asked Questions

AiTechWorlds Team

✓ Verified Writer

The AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.

📱 Follow on Telegram 🐦 Follow on X Learn More →

Agent Development

10 LangChain Retrieval Strategies for Better RAG Results

Go beyond basic similarity search with ParentDocumentRetriever, MultiQueryRetriever, EnsembleRetriever, HyDE, and 6 more LangChain retrieval strategies — with code for each.

May 31, 2026 13 min read

Agent Development

Build a LangChain Agent with Memory and Tools (Full Example)

Build a complete LangChain conversational agent with persistent memory, multiple tools, and step-by-step trace — from setup to a production-ready implementation with code.

May 31, 2026 14 min read

Agent Development

5 LangChain Agent Types Explained (ZeroShot, ReAct, and More)

Understand every major LangChain agent type — ZeroShotAgent, ReAct, ConversationalAgent, and more — with Python code and agent trace walkthroughs.

May 31, 2026 13 min read

Agent Development

How to Deploy a LangChain App as a FastAPI REST Endpoint

Serve a LangChain app as a production FastAPI REST endpoint with streaming, async chains, error handling, and Docker deployment — full Python code included.

May 31, 2026 11 min read

Go deeper on this topic

10K+ Members Growing Daily

Get Free AI Notes Daily

Join AiTechWorlds on Telegram and get daily AI tips, prompt engineering templates, coding resources, and exclusive content — 100% free!

📚 Free Study Notes🤖 AI Tips Daily⚡ Prompt Templates💻 Coding Resources

Join Free Channel

No spam. Leave anytime.

How to Use LangChain with OpenAI API (ChatGPT + Tools)

ChatOpenAI Configuration Deep Dive

Understanding Token Costs

Function Calling: The Foundation

Basic Function Calling

Tool Binding: The LangChain Way

Building a Complete Tool-Calling Pipeline

Parallel Tool Calls

OpenAI Model Selection Guide

Dynamic Model Selection

Structured Output with OpenAI

Production Considerations

Rate Limiting and Backoff

Caching for Cost Reduction

Handling Content Policy Errors

Conclusion

Frequently Asked Questions

💬 DiscussionPowered by GitHub Discussions

Frequently Asked Questions

AiTechWorlds Team

Related Articles

10 LangChain Retrieval Strategies for Better RAG Results

Build a LangChain Agent with Memory and Tools (Full Example)

5 LangChain Agent Types Explained (ZeroShot, ReAct, and More)

How to Deploy a LangChain App as a FastAPI REST Endpoint

Go deeper on this topic

Get Free AI Notes Daily

How to Use LangChain with OpenAI API (ChatGPT + Tools)

ChatOpenAI Configuration Deep Dive

Understanding Token Costs

Function Calling: The Foundation

Basic Function Calling

Tool Binding: The LangChain Way

Building a Complete Tool-Calling Pipeline

Parallel Tool Calls

OpenAI Model Selection Guide

Dynamic Model Selection

Structured Output with OpenAI

Production Considerations

Rate Limiting and Backoff

Caching for Cost Reduction

Handling Content Policy Errors

Conclusion

Frequently Asked Questions

💬 DiscussionPowered by GitHub Discussions

Frequently Asked Questions

AiTechWorlds Team

Related Articles

10 LangChain Retrieval Strategies for Better RAG Results

Build a LangChain Agent with Memory and Tools (Full Example)

5 LangChain Agent Types Explained (ZeroShot, ReAct, and More)

How to Deploy a LangChain App as a FastAPI REST Endpoint

Go deeper on this topic

Get Free AI Notes Daily