How to Use LangChain with OpenAI API (ChatGPT + Tools)
Set up LangChain with the OpenAI API — configure ChatOpenAI, implement function calling, bind tools, and run parallel tool calls in a production-ready setup.
Getting LangChain to talk to OpenAI's API is about four lines of code. Getting it to work well in production — with function calling, parallel tool execution, proper error handling, and cost-conscious model selection — is a different conversation.
I've watched people burn through $200 in API credits debugging an agent because they didn't understand how LangChain manages token usage, or how to pick the right model for the task at hand. This guide covers both the setup and the production considerations.
We'll go from basic ChatOpenAI configuration through function calling, tool binding, and parallel tool execution. By the end, you'll have a solid mental model of the full LangChain/OpenAI stack and know which model to reach for in different scenarios.
For the foundation — if you haven't set up LangChain yet — start with the LangChain tutorial 2025. For the full picture of how OpenAI integrates with the broader AI stack, OpenAI API integration covers the lower-level details.
ChatOpenAI Configuration Deep Dive
Most tutorials show you the minimum ChatOpenAI setup. Let's go through every parameter worth knowing:
from dotenv import load_dotenv
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

load_dotenv()  # Loads OPENAI_API_KEY from a local .env file

llm = ChatOpenAI(
    model="gpt-4o-mini",     # Model name
    temperature=0.7,         # 0 = most deterministic; higher = more varied
    max_tokens=2048,         # Max output tokens (not input)
    max_retries=3,           # Auto-retry on rate limits / transient errors
    request_timeout=60,      # Seconds before timeout
    streaming=False,         # Enable for real-time output
    # Sampling options. Recent langchain-openai versions expose these as
    # explicit parameters and warn if they are passed through model_kwargs.
    top_p=0.95,              # Nucleus sampling parameter
    frequency_penalty=0.1,   # Reduce repetition (-2 to 2)
    presence_penalty=0.0,    # Topic diversity (-2 to 2)
    seed=42,                 # Best-effort reproducibility, not a guarantee
)

# For Azure OpenAI deployments, use AzureChatOpenAI instead of ChatOpenAI:
# from langchain_openai import AzureChatOpenAI
# llm = AzureChatOpenAI(
#     azure_endpoint="https://your-resource.openai.azure.com/",
#     api_version="2024-02-01",
#     azure_deployment="gpt-4o-mini",
# )

# Quick test
response = llm.invoke([HumanMessage(content="Say 'hello' in five languages.")])
print(response.content)
Understanding Token Costs
max_tokens controls the output length, not the total context. The input tokens (your prompt + history + tool results) are charged separately. For long conversations or RAG contexts, your input can easily dwarf your output in token count.
To track usage:
from langchain_core.callbacks import UsageMetadataCallbackHandler
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

# Track token usage across every call that receives this callback
callback = UsageMetadataCallbackHandler()
llm = ChatOpenAI(model="gpt-4o-mini")

response = llm.invoke(
    [HumanMessage(content="Explain gradient descent in 100 words.")],
    config={"callbacks": [callback]},
)
print(response.content)

# Per-call usage is available on the response itself
if response.usage_metadata:
    print(f"Input tokens: {response.usage_metadata['input_tokens']}")
    print(f"Output tokens: {response.usage_metadata['output_tokens']}")
    print(f"Total tokens: {response.usage_metadata['total_tokens']}")

# Usage aggregated per model across all calls made with this callback
print(callback.usage_metadata)
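Token counts only matter once you turn them into dollars. Here's a minimal sketch of that arithmetic, assuming the per-million-token prices from the model table later in this guide. The `PRICES_PER_MILLION` dict and `estimate_cost` helper are hypothetical names for illustration, not part of LangChain, and the `usage` dict simply mirrors the shape of `usage_metadata`.

```python
# USD per 1M tokens as (input, output), copied from the model table in this guide.
# Treat these as assumptions: prices change, so check current pricing before relying on them.
PRICES_PER_MILLION = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def estimate_cost(model: str, usage: dict) -> float:
    """Estimate the cost in USD of one call from its input/output token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    total = usage["input_tokens"] * input_price + usage["output_tokens"] * output_price
    return total / 1_000_000


# A RAG-style call: large input context, short answer (made-up numbers)
usage = {"input_tokens": 12_000, "output_tokens": 500, "total_tokens": 12_500}

for model_name in PRICES_PER_MILLION:
    print(f"{model_name}: ${estimate_cost(model_name, usage):.4f}")
```

Notice that with a 12,000-token prompt, the input side dominates the bill even though output tokens are priced higher per token.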
Function Calling: The Foundation
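Function calling is OpenAI's mechanism for having models call external functions with structured JSON arguments. It's more reliable than asking the model to "return JSON" in free text because you declare a JSON schema up front and the arguments come back in a dedicated field of the response rather than buried in prose. The arguments are still model-generated, so validate them before you use them.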
Basic Function Calling
import json

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Define functions using OpenAI's JSON schema format
functions = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature units"
                }
            },
            "required": ["city"]
        }
    }
]

# Bind functions to the model.
# Note: `functions` is OpenAI's legacy parameter; newer code uses `tools` (see the next section).
llm_with_functions = llm.bind(functions=functions)

response = llm_with_functions.invoke(
    [HumanMessage(content="What's the weather in Tokyo?")]
)

# Check if the model wants to call a function
if response.additional_kwargs.get("function_call"):
    func_call = response.additional_kwargs["function_call"]
    print(f"Function name: {func_call['name']}")
    print(f"Arguments: {func_call['arguments']}")

    # Arguments arrive as a JSON string, not a dict
    args = json.loads(func_call['arguments'])
    print(f"City: {args['city']}")
This works, but it's tedious. LangChain's tool binding does the same thing with cleaner code.
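One detail worth handling either way: the arguments are a JSON string written by the model, so a bare `json.loads` plus a dictionary lookup can blow up on malformed or incomplete output. Below is a small defensive-parsing sketch using only the standard library. `parse_function_args` is a hypothetical helper, and the `required` list is assumed to come from your own schema.

```python
import json


def parse_function_args(raw_arguments: str, required: list[str]) -> dict:
    """Parse model-generated arguments, raising ValueError with a clear message on bad input."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Arguments are not valid JSON: {exc}") from exc

    # The schema declares an object, so anything else is a model error
    if not isinstance(args, dict):
        raise ValueError("Arguments must be a JSON object")

    missing = [name for name in required if name not in args]
    if missing:
        raise ValueError(f"Missing required arguments: {missing}")
    return args


# Example with the get_weather schema from above
args = parse_function_args('{"city": "Tokyo", "units": "celsius"}', required=["city"])
print(args["city"])
```

Catching the `ValueError` lets you send the error text back to the model as the function result, which often gives it a chance to correct itself on the next turn.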
Tool Binding: The LangChain Way
LangChain's @tool decorator and bind_tools() method handle the JSON schema conversion automatically:
from typing import Optional

from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

# Define tools with the @tool decorator
@tool
def get_weather(city: str, units: str = "celsius") -> str:
    """Get the current weather for a given city.

    Args:
        city: The city name (e.g., 'Tokyo', 'New York')
        units: Temperature units, either 'celsius' or 'fahrenheit'
    """
    # Real implementation would call a weather API
    return f"Weather in {city}: 22°C (clear sky)"

@tool
def search_products(
    query: str,
    category: Optional[str] = None,
    max_price: Optional[float] = None
) -> str:
    """Search for products in the store catalog.

    Args:
        query: Search query string
        category: Optional product category filter
        max_price: Optional maximum price filter
    """
    return f"Found 5 products matching '{query}' in category '{category}'"

@tool
def calculate_shipping(
    weight_kg: float,
    destination_country: str,
    express: bool = False
) -> str:
    """Calculate shipping cost for an order.

    Args:
        weight_kg: Package weight in kilograms
        destination_country: Two-letter country code (e.g., 'US', 'DE', 'JP')
        express: Whether to use express shipping
    """
    base_cost = weight_kg * 2.5
    if express:
        base_cost *= 2.5
    return f"Shipping to {destination_country}: ${base_cost:.2f}"

# Bind multiple tools at once
tools = [get_weather, search_products, calculate_shipping]
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools(tools)

# The model now knows about all three tools
response = llm_with_tools.invoke(
    [HumanMessage(content="I want to buy a laptop under $800 and ship it to Germany. How much will shipping cost for a 2kg package?")]
)
print(f"Tool calls: {response.tool_calls}")
The model's response includes structured tool_calls — a list of tools it wants to invoke and the arguments for each. Your application code executes those tools and feeds the results back.
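If the "automatic schema conversion" feels like magic, it helps to see roughly what it involves. The sketch below is not LangChain's implementation. It's a simplified, standard-library-only illustration of deriving an OpenAI-style function schema from a function's signature and docstring. `sketch_tool_schema` and `_JSON_TYPES` are hypothetical names, and only a handful of primitive types are handled.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Simplified mapping from Python annotations to JSON Schema type names (illustrative only)
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_schema(func: Callable[..., Any]) -> dict:
    """Build an OpenAI-style function schema from a plain Python function."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Fall back to "string" for any annotation this sketch doesn't know about
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        # Parameters without a default value are treated as required
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        # Use the first docstring line as the description
        "description": (inspect.getdoc(func) or "").split("\n")[0],
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def calculate_shipping(weight_kg: float, destination_country: str, express: bool = False) -> str:
    """Calculate shipping cost for an order."""
    return ""


schema = sketch_tool_schema(calculate_shipping)
print(schema)
```

This is also why type hints and docstrings matter so much on tools: they are the only description of the function the model ever sees.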
Building a Complete Tool-Calling Pipeline
Here's a full pipeline that handles the tool-calling loop:
from langchain_core.messages import HumanMessage, ToolMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

# Define tools
@tool
def web_search(query: str) -> str:
    """Search the web for current information."""
    # Mock: a real version would call DuckDuckGo, Serper, or the Bing API
    return f"Search results for '{query}': [Result 1: Python 3.13 released...] [Result 2: ...]"

@tool
def run_python_code(code: str) -> str:
    """Evaluate a Python expression and return the result. Intended for math and string operations."""
    try:
        # WARNING: eval() on model-generated code is unsafe.
        # In production, use a sandboxed environment.
        result = eval(code)
        return str(result)
    except Exception as e:
        return f"Error: {e}"

@tool
def save_note(content: str, title: str) -> str:
    """Save a note for later reference.

    Args:
        content: The note content
        title: A short descriptive title
    """
    # In production, write to a database or file
    print(f"[NOTE SAVED] {title}: {content[:50]}...")
    return f"Note '{title}' saved successfully."

tools = [web_search, run_python_code, save_note]
tool_map = {t.name: t for t in tools}

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools(tools)

def run_tool_loop(user_message: str, max_iterations: int = 10) -> str:
    """Run the complete tool-calling loop."""
    messages = [HumanMessage(content=user_message)]

    # Cap the number of round trips so a confused model can't loop (and bill) forever
    for _ in range(max_iterations):
        response = llm_with_tools.invoke(messages)
        messages.append(response)

        # If no tool calls, we have the final answer
        if not response.tool_calls:
            return response.content

        # Execute all requested tools
        for tool_call in response.tool_calls:
            tool_name = tool_call["name"]
            tool_args = tool_call["args"]

            if tool_name in tool_map:
                result = tool_map[tool_name].invoke(tool_args)
            else:
                result = f"Error: Unknown tool '{tool_name}'"

            # Add tool result to messages, matched to the request by its ID
            messages.append(
                ToolMessage(
                    content=str(result),
                    tool_call_id=tool_call["id"]
                )
            )
            print(f"[Tool: {tool_name}] Input: {tool_args}")
            print(f"[Tool: {tool_name}] Output: {str(result)[:100]}")

    return "Stopped: reached max_iterations without a final answer."

# Test the loop
answer = run_tool_loop(
    "Search for the latest Python version, then calculate 2 to the power of the version number's major version, and save the result as a note."
)
print(f"\nFinal Answer: {answer}")
Parallel Tool Calls
OpenAI's chat models can request several tool calls in a single response, and LangChain surfaces all of them in response.tool_calls. Running those calls concurrently is up to your code:
import asyncio

from langchain_core.messages import HumanMessage, ToolMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def get_stock_price(ticker: str) -> str:
    """Get the current stock price for a ticker symbol."""
    prices = {"AAPL": "$182.50", "GOOGL": "$178.20", "MSFT": "$415.00"}
    return prices.get(ticker.upper(), f"Unknown ticker: {ticker}")

@tool
def get_company_info(ticker: str) -> str:
    """Get basic company information for a ticker."""
    info = {
        "AAPL": "Apple Inc. - Consumer electronics and software",
        "GOOGL": "Alphabet Inc. - Technology conglomerate",
        "MSFT": "Microsoft Corp. - Software and cloud services"
    }
    return info.get(ticker.upper(), f"No info for: {ticker}")

tools = [get_stock_price, get_company_info]
tool_map = {t.name: t for t in tools}

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools(tools)

# This query should trigger parallel calls for both tools
messages = [HumanMessage(content="Get the stock price AND company info for Apple, all at once.")]
response = llm_with_tools.invoke(messages)

print(f"Number of tool calls: {len(response.tool_calls)}")
# With gpt-4o or gpt-4o-mini, this should be 2 tool calls in one response

# Execute all tool calls concurrently in your own code
async def execute_tool_calls_parallel(tool_calls: list) -> list[ToolMessage]:
    async def run_one(tc):
        tool = tool_map[tc["name"]]
        # Run the synchronous tool in a worker thread so calls overlap
        result = await asyncio.to_thread(tool.invoke, tc["args"])
        return ToolMessage(content=str(result), tool_call_id=tc["id"])

    return await asyncio.gather(*[run_one(tc) for tc in tool_calls])

tool_messages = asyncio.run(execute_tool_calls_parallel(response.tool_calls))

messages.append(response)
messages.extend(tool_messages)

final = llm_with_tools.invoke(messages)
print(f"\nFinal response: {final.content}")
Parallel tool calls can cut multi-step agent latency noticeably. A task that would otherwise take three sequential request-a-tool round trips can instead be handled by one model response that requests all three tools, followed by one more call to produce the final answer. For user-facing applications with complex queries, this is worth implementing.
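To get a feel for the execution-side saving, here's a self-contained timing sketch with no API calls. `slow_tool` is a made-up stand-in that sleeps for 0.2 seconds to simulate network latency. The real numbers depend entirely on your tools.

```python
import asyncio
import time


def slow_tool(name: str, delay: float = 0.2) -> str:
    """Hypothetical blocking tool that simulates a slow network call."""
    time.sleep(delay)
    return f"{name} done"


def run_sequential(names: list[str]) -> list[str]:
    """Run each tool one after another."""
    return [slow_tool(n) for n in names]


async def run_parallel(names: list[str]) -> list[str]:
    """Run blocking tools in worker threads so their waits overlap."""
    results = await asyncio.gather(*(asyncio.to_thread(slow_tool, n) for n in names))
    return list(results)


names = ["price", "info", "news"]

start = time.perf_counter()
sequential_results = run_sequential(names)
sequential_seconds = time.perf_counter() - start

start = time.perf_counter()
parallel_results = asyncio.run(run_parallel(names))
parallel_seconds = time.perf_counter() - start

print(f"Sequential: {sequential_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

The gain only shows up for I/O-bound tools such as HTTP requests or database queries. CPU-bound work running in threads won't overlap the same way.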
OpenAI Model Selection Guide
Picking the right model isn't just about cost — it's about matching capability to task.
| Model | Speed | Cost (per 1M tokens in/out) | Reasoning | Context | Best For |
|---|---|---|---|---|---|
| gpt-4o | Medium | $2.50 / $10.00 | Excellent | 128K | Complex reasoning, highest accuracy |
| gpt-4o-mini | Fast | $0.15 / $0.60 | Very Good | 128K | Most production tasks (best value) |
| gpt-3.5-turbo | Very Fast | $0.50 / $1.50 | Good | 16K | Legacy; avoid for new projects |
| o3 | Slow | $10.00 / $40.00 | Outstanding | 200K | Math, code, deep reasoning tasks |
| o4-mini | Medium | $1.10 / $4.40 | Excellent | 200K | Reasoning at lower cost than o3 |
My honest view: gpt-4o-mini is the right default for 90% of production tasks. It's fast, cheap, and handles most real-world workloads well. Move to gpt-4o when you're seeing quality failures on complex tasks. The o-series models are for specialized reasoning tasks where accuracy matters more than speed or cost.
Avoid gpt-3.5-turbo for new projects. At the prices above it actually costs more than gpt-4o-mini ($0.35 more per million input tokens and $0.90 more per million output tokens), has a much smaller context window, and delivers lower quality.
Dynamic Model Selection
For applications with variable task complexity, you can route between models:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Route between models based on task type
fast_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
smart_llm = ChatOpenAI(model="gpt-4o", temperature=0)

classifier_prompt = ChatPromptTemplate.from_template(
    """Classify this task as 'simple' or 'complex'.
Simple: factual Q&A, basic summaries, formatting
Complex: multi-step reasoning, code generation, analysis
Task: {task}
Respond with ONE word: 'simple' or 'complex'"""
)
classifier = classifier_prompt | fast_llm | StrOutputParser()

def smart_route(task: str) -> str:
    complexity = classifier.invoke({"task": task}).strip().lower()

    # Substring match tolerates stray quotes or punctuation, e.g. "'complex'."
    if "complex" in complexity:
        llm = smart_llm
        print("[Using gpt-4o for complex task]")
    else:
        llm = fast_llm
        print("[Using gpt-4o-mini for simple task]")

    task_prompt = ChatPromptTemplate.from_template("Complete this task:\n{task}")
    chain = task_prompt | llm | StrOutputParser()
    return chain.invoke({"task": task})

print(smart_route("What is 2 + 2?"))
print(smart_route("Analyze the time complexity of merge sort and compare it to quicksort with mathematical proof."))
Structured Output with OpenAI
Getting structured data back from OpenAI is cleaner with LangChain's with_structured_output() method:
from typing import List

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class ProductReview(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    score: int = Field(description="Score from 1-10", ge=1, le=10)
    key_points: List[str] = Field(description="3-5 key points from the review")
    topics_mentioned: List[str] = Field(description="Product aspects mentioned")
    would_recommend: bool = Field(description="Whether reviewer recommends the product")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(ProductReview)

review_text = """
This laptop is absolutely amazing. The battery life is incredible - I get
12 hours easily. The keyboard feels great and the screen is stunning.
Only minor complaint is it runs a bit warm under heavy load.
Definitely recommending this to my friends.
"""

analysis = structured_llm.invoke(f"Analyze this review:\n\n{review_text}")
print(f"Sentiment: {analysis.sentiment}")
print(f"Score: {analysis.score}/10")
print(f"Key Points: {analysis.key_points}")
print(f"Would Recommend: {analysis.would_recommend}")
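with_structured_output() relies on function calling or OpenAI's JSON-schema structured output mode under the hood (which one depends on your langchain-openai version and the method argument) and parses the result into your Pydantic model, raising a validation error if the output doesn't fit. No more regex parsing or hoping the model returns valid JSON.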
Production Considerations
Rate Limiting and Backoff
import asyncio

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

# LangChain handles basic retries automatically
llm = ChatOpenAI(
    model="gpt-4o-mini",
    max_retries=5,
    request_timeout=60,
)

# For batch processing with rate limit awareness
async def process_batch_with_rate_limit(prompts: list, delay: float = 0.1):
    """Process a batch of prompts one at a time, pausing between requests."""
    results = []
    for prompt in prompts:
        response = await llm.ainvoke([HumanMessage(content=prompt)])
        results.append(response.content)
        await asyncio.sleep(delay)  # Basic rate limiting
    return results
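The fixed delay above is the simplest possible throttle. When you need control beyond `max_retries`, the usual pattern is exponential backoff with jitter. Here's a minimal, library-free sketch. `call_with_backoff` is a hypothetical helper, and catching bare `Exception` is a simplification: in real code you'd retry only on errors you know are transient, such as a rate-limit error.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying failures with exponentially growing, randomized waits."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Out of attempts: surface the original error to the caller
            if attempt == max_attempts - 1:
                raise
            # The wait ceiling doubles each attempt (1s, 2s, 4s, ...) up to max_delay
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": pick a random wait so many clients don't retry in lockstep
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("max_attempts must be at least 1")
```

Usage would look like `call_with_backoff(lambda: llm.invoke(messages))`. The `sleep` parameter is injectable purely so the function can be tested without actually waiting.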
Caching for Cost Reduction
from langchain_community.cache import SQLiteCache
from langchain_core.globals import set_llm_cache
# Persistent cache — identical prompts won't hit the API
set_llm_cache(SQLiteCache(database_path=".openai_cache.db"))
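Conceptually, a cache like this is a lookup keyed on the prompt plus the model configuration. The sketch below illustrates the idea with an in-memory dict. It is not how SQLiteCache is implemented, and `PromptCache` and its methods are hypothetical names.

```python
import hashlib
import json
from typing import Callable


class PromptCache:
    """Toy in-memory cache keyed on (model, prompt, temperature)."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str, temperature: float) -> str:
        # Any change to the model or its settings must produce a different key
        payload = json.dumps(
            {"model": model, "prompt": prompt, "temperature": temperature},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(
        self, model: str, prompt: str, temperature: float, call: Callable[[str], str]
    ) -> str:
        """Return the cached answer, or invoke call(prompt) and store its result."""
        key = self._key(model, prompt, temperature)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call(prompt)
        self._store[key] = result
        return result
```

The practical consequence is that caching only pays off for exactly repeated prompts. A one-character difference, or a different model setting, is a miss.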
For a production deployment guide that covers rate limits, caching, and observability in more detail, see deploy AI model to production. The LangChain callbacks guide covers how to add proper logging and tracing to your OpenAI calls.
Handling Content Policy Errors
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from openai import BadRequestError

llm = ChatOpenAI(model="gpt-4o-mini")

def safe_invoke(message: str) -> str:
    try:
        response = llm.invoke([HumanMessage(content=message)])
        return response.content
    except BadRequestError as e:
        if "content_policy_violation" in str(e):
            return "I can't help with that request."
        raise  # Re-raise unexpected errors
    except Exception as e:
        return f"An error occurred: {str(e)}"
Conclusion
LangChain's OpenAI integration goes well beyond basic API calls. Function calling and tool binding give you a structured, reliable way for models to interact with external systems. Parallel tool calls reduce latency for multi-step tasks. Structured output eliminates the parsing hell that comes with asking models to "return JSON."
On model selection: default to gpt-4o-mini, which covers most production use cases effectively. Move to gpt-4o when you're seeing quality issues on complex tasks, and reserve the o-series models for specialized reasoning workloads.
The patterns here — tool binding, parallel execution, structured output — are what separate a prototype from something you can actually run in production. For the next step, explore LangChain agent types to see how these tool-calling primitives power full autonomous agent systems.
Frequently Asked Questions
What is the difference between function calling and tool binding in LangChain?
Function calling is the underlying OpenAI API feature. Tool binding is LangChain's abstraction on top of it — you define tools with @tool decorators and call bind_tools() on the model. LangChain handles the schema conversion and response parsing automatically.
Can I use LangChain with models other than OpenAI?
Yes. LangChain supports Anthropic Claude, Google Gemini, Mistral, Cohere, Ollama (local models), and dozens of others. You swap the model class but keep the same chain and tool code. Not all providers support function calling with the same interface.
What is the best OpenAI model to use with LangChain in 2026?
For most production tasks, gpt-4o-mini gives the best cost-to-quality ratio. For complex multi-step reasoning or tasks requiring the highest accuracy, use gpt-4o. Avoid gpt-3.5-turbo for new projects: at the prices listed above, it is both lower quality and more expensive than gpt-4o-mini.
AiTechWorlds Team
The AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.