OpenAI API Integration: Complete Python Guide for Building AI Applications
OpenAI API integration guide — complete Python tutorial covering authentication, chat completions, function calling, assistants, embeddings, vision, and production best practices.
The OpenAI API is one of the best-designed developer APIs I've worked with: clear documentation, consistent behavior, and a Python SDK that handles the difficult parts. Getting your first API call working takes minutes.
Getting it production-ready, with proper error handling, cost management, streaming, and function calling, takes more work. This guide covers everything from the first API call to the patterns I use in real applications.
Setup and Authentication
# Install the OpenAI Python SDK:
#   pip install openai
# Never hardcode API keys in source code; keep them in the environment.

# Option 1 (recommended): set the environment variable in your shell
#   export OPENAI_API_KEY="sk-..."

# Option 2: keep the key in a .env file and load it with python-dotenv
#   pip install python-dotenv
from dotenv import load_dotenv
load_dotenv()  # Loads variables from a .env file into the environment

from openai import OpenAI

# Option 3: pass the key directly (only for quick local testing)
# client = OpenAI(api_key="sk-...")

# Recommended: let the SDK read the key from the environment
client = OpenAI()  # Reads OPENAI_API_KEY automatically

# Test your connection
response = client.models.list()
print([model.id for model in response.data[:5]])
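Because the key comes from the environment, a missing or empty variable tends to surface late, as a configuration error from the SDK. A minimal sketch of a fail-fast check is shown below; `require_api_key` is a hypothetical helper introduced here for illustration and is not part of the openai SDK.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper (not part of the openai SDK).
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell or add it to a .env file."
        )
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("API key found in environment")
    except RuntimeError as exc:
        print(f"Configuration error: {exc}")
```

Calling this once at startup, before constructing the client, keeps the error message close to the actual cause.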
Chat Completions: Core API
from openai import OpenAI
client = OpenAI()
# Basic completion
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the difference between a list and a tuple in Python?"}
    ],
    temperature=0.7,  # 0 = most deterministic, 2 = most random
    max_tokens=500,   # Maximum response length, in tokens
    n=1,              # Number of responses to generate
)
# Access the response
message = response.choices[0].message.content
print(message)
# Token usage
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
# Streaming
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain decorators in Python"}],
    stream=True
)
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()
# JSON mode: the response content is a JSON object (unless it is cut off by the token limit).
# The messages must mention "JSON" somewhere, or the API rejects the request.
import json

json_response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "Extract name, age, and email as a JSON object from: 'John Smith, 30, john@example.com'"
        }
    ],
    response_format={"type": "json_object"}
)
data = json.loads(json_response.choices[0].message.content)
print(data)  # e.g. {"name": "John Smith", "age": 30, "email": "john@example.com"}
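JSON mode constrains the format, but a response that stops at the token limit can still be cut off mid-object. A minimal sketch of a guard that checks `finish_reason` before parsing follows; `parse_json_choice` is a hypothetical helper, and the `SimpleNamespace` object only imitates the shape of a chat completion response so the example runs offline.

```python
import json
from types import SimpleNamespace


def parse_json_choice(response) -> dict:
    """Parse the first choice of a JSON-mode response, refusing truncated output."""
    choice = response.choices[0]
    if choice.finish_reason == "length":
        # Output hit the token limit, so the JSON is probably cut off mid-object
        raise ValueError("Response truncated; raise max_tokens or shorten the prompt")
    return json.loads(choice.message.content)


# Stand-in object shaped like a chat completion response (illustration only)
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            message=SimpleNamespace(content='{"name": "John Smith", "age": 30}'),
        )
    ]
)
print(parse_json_choice(fake_response)["name"])  # John Smith
```

With a real client, the object returned by `client.chat.completions.create(...)` would be passed in place of `fake_response`.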
Function Calling (Tool Use)
import json
from openai import OpenAI

client = OpenAI()

# Define tools (functions the model can call)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g. 'London, UK'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Search product database for items matching a query",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "max_results": {"type": "integer", "description": "Maximum results to return"}
                },
                "required": ["query"]
            }
        }
    }
]
# Actual function implementations
def get_current_weather(location: str, unit: str = "celsius") -> str:
    # In production: call a real weather API
    return f"The weather in {location} is 22°{unit[0].upper()}, partly cloudy."

def search_database(query: str, max_results: int = 5) -> str:
    # In production: query your database
    return f"Found {max_results} products matching '{query}': [Product 1, Product 2...]"

def execute_tool(tool_call) -> str:
    """Execute the appropriate function based on tool call."""
    function_name = tool_call.function.name
    arguments = json.loads(tool_call.function.arguments)
    if function_name == "get_current_weather":
        return get_current_weather(**arguments)
    elif function_name == "search_database":
        return search_database(**arguments)
    else:
        return f"Unknown function: {function_name}"
def agent_with_tools(user_message: str, max_turns: int = 5) -> str:
    """Simple agent loop with tool execution."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):  # Cap the loop so it cannot run forever
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=tools,
            tool_choice="auto"  # Model decides when to use tools
        )
        message = response.choices[0].message
        if not message.tool_calls:
            # Model answered without calling a tool (or stopped for another reason)
            return message.content or ""
        # Model wants to call one or more tools
        messages.append(message)  # Add assistant message with tool calls
        for tool_call in message.tool_calls:
            result = execute_tool(tool_call)
            # Add tool result to messages
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })
    raise RuntimeError("Agent did not finish within max_turns")
# Example
answer = agent_with_tools("What's the weather in Tokyo, and search for umbrellas?")
print(answer)
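The if/elif dispatch in `execute_tool` grows with every new tool, and a malformed argument string from the model would raise inside the agent loop. One possible alternative is a small registry, sketched below; `TOOL_REGISTRY`, `register_tool`, and `execute_tool_safely` are hypothetical names, and the `SimpleNamespace` object only imitates the shape of a tool call so the example runs offline.

```python
import json
from types import SimpleNamespace
from typing import Callable, Dict

# Hypothetical registry mapping tool names to Python callables
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {}


def register_tool(func: Callable[..., str]) -> Callable[..., str]:
    """Decorator that registers a function under its own name."""
    TOOL_REGISTRY[func.__name__] = func
    return func


@register_tool
def get_current_weather(location: str, unit: str = "celsius") -> str:
    # Placeholder implementation, as in the article
    return f"The weather in {location} is 22°{unit[0].upper()}, partly cloudy."


def execute_tool_safely(tool_call) -> str:
    """Run a tool call and report failures as text instead of raising."""
    func = TOOL_REGISTRY.get(tool_call.function.name)
    if func is None:
        return f"Unknown function: {tool_call.function.name}"
    try:
        arguments = json.loads(tool_call.function.arguments)
        return func(**arguments)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or unexpected/missing arguments from the model
        return f"Tool error: {exc}"


# Stand-in for a tool call object (illustration only)
call = SimpleNamespace(
    function=SimpleNamespace(
        name="get_current_weather",
        arguments='{"location": "Tokyo, Japan"}',
    )
)
print(execute_tool_safely(call))
```

Returning the error as the tool result, rather than raising, gives the model a chance to correct its arguments on the next turn.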
Embeddings API
# Generate embeddings for semantic search
response = client.embeddings.create(
    model="text-embedding-3-small",  # 1536 dimensions, $0.02/1M tokens
    input=[
        "Machine learning is a subset of AI.",
        "Python is great for data science.",
        "The weather is sunny today."
    ]
)
embeddings = [item.embedding for item in response.data]
print(f"Embedding dimension: {len(embeddings[0])}") # 1536
# Semantic similarity
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# ML and data science should be more similar than weather
sim_ml_ds = cosine_similarity(embeddings[0], embeddings[1])
sim_ml_weather = cosine_similarity(embeddings[0], embeddings[2])
print(f"ML ↔ Data Science: {sim_ml_ds:.3f}")  # Expect the higher score
print(f"ML ↔ Weather: {sim_ml_weather:.3f}")  # Expect the lower score
# Dimension reduction (Matryoshka embeddings)
response_reduced = client.embeddings.create(
    model="text-embedding-3-small",
    input="Machine learning explanation",
    dimensions=256  # Shorten from 1536 to 256 (smaller vectors to store and search, small quality loss)
)
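Pairwise similarity extends naturally to semantic search: embed the query, score it against every stored vector, and keep the best matches. A minimal sketch follows, written in plain Python so it runs without NumPy or an API key; `top_k` is a hypothetical helper, and the 3-dimensional vectors are toy stand-ins for real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    docs: Sequence[str],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k documents most similar to the query, best first."""
    scored = [(doc, cosine_similarity(query_vec, vec)) for doc, vec in zip(docs, doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (illustration only)
docs = ["ml", "data science", "weather"]
doc_vecs = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.0], [0.0, 0.1, 0.9]]
query_vec = [1.0, 0.0, 0.0]

for doc, score in top_k(query_vec, doc_vecs, docs):
    print(f"{doc}: {score:.3f}")
```

A linear scan like this is fine for a few thousand documents; larger collections usually call for a vector index or database.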
Vision API
import base64

def analyze_image(image_path: str, prompt: str) -> str:
    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{image_data}",
                            "detail": "high"  # or "low" for faster/cheaper
                        }
                    },
                    {"type": "text", "text": prompt}
                ]
            }
        ]
    )
    return response.choices[0].message.content

# Extract data from invoice
invoice_data = analyze_image(
    "invoice.jpg",
    "Extract all data as JSON: invoice number, date, items with prices, total."
)
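The `analyze_image` function above hardcodes `image/jpeg` in the data URL, which mislabels PNG or WebP files. A minimal sketch of a helper that derives the MIME type from the file extension is shown below; `image_to_data_url` is a hypothetical name, and the sample file contains placeholder bytes rather than a real image.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def image_to_data_url(image_path: str) -> str:
    """Build a base64 data URL whose MIME type matches the file extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {image_path}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.png"
        sample.write_bytes(b"abc")  # Placeholder bytes; only demonstrates the encoding
        print(image_to_data_url(str(sample)))
```

The returned string could then be used as the `url` value in the `image_url` content part.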
Error Handling and Retries
import time
from openai import OpenAI, RateLimitError, APIConnectionError, APIStatusError

# The SDK already retries failed requests with backoff; max_retries raises its default.
# The loop below runs on top of those SDK retries.
client = OpenAI(max_retries=5)

def robust_completion(messages: list, max_attempts: int = 3) -> str:
    """Completion with explicit error handling."""
    for attempt in range(max_attempts):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
                timeout=30.0  # 30 second timeout
            )
            return response.choices[0].message.content
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            wait_time = 2 ** attempt  # Exponential backoff
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
        except APIConnectionError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(1)
        except APIStatusError as e:
            if 400 <= e.status_code < 500:
                raise  # Client error (bad request, auth, not found): retrying won't help
            if attempt == max_attempts - 1:
                raise
            time.sleep(1)
    raise RuntimeError("Max attempts reached")
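A fixed `2 ** attempt` schedule means that many clients rate-limited at the same moment will all retry at the same moment too. One common variation adds random jitter to each delay. The sketch below illustrates that idea with hypothetical helpers (`backoff_delay`, `retry_call`) and a simulated failure, so it runs without network access; with the SDK, `retry_on` would be something like `(RateLimitError, APIConnectionError)`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter delay: a uniform random value in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def retry_call(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exception types with jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error
            sleep(backoff_delay(attempt))
    raise RuntimeError("max_attempts must be at least 1")


# Simulated flaky operation: fails twice, then succeeds (illustration only)
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


delays = []  # Record delays instead of actually sleeping
result = retry_call(flaky, retry_on=(ConnectionError,), sleep=delays.append)
print(result, calls["count"], len(delays))  # ok 3 2
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.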
Conclusion
The OpenAI API covers most AI application patterns: chat completions for conversations, function calling for tool-using agents, embeddings for semantic search, vision for image analysis, and the Assistants API for managed conversations with file access.
The core pattern for production: use the standard Chat Completions API with gpt-4o-mini for most tasks, add function calling when you need structured outputs or tool use, and switch to gpt-4o only when quality genuinely matters for the task.
For building complete AI applications with LangChain on top of this API, see our LangChain tutorial. For cost optimization strategies, see our LLM token pricing guide.
Frequently Asked Questions
How do I get started with the OpenAI API?
Create an account at platform.openai.com, generate an API key, run pip install openai, set the OPENAI_API_KEY environment variable, and call client.chat.completions.create(). A minimal working example is about 8 lines. Set a spending limit before going to production.
What is function calling in OpenAI API?
You describe your functions as JSON schemas. When the model decides to call one, it generates structured JSON arguments; your code executes the function and returns the result to the model. This enables structured extraction, tool-using agents, and connections between LLMs and external systems, and it is far more reliable than asking the model to return JSON in the message content.
What is the Assistants API and when should I use it?
The Assistants API manages conversation threads, file uploads, and tool execution server-side. Use it when you want the built-in Code Interpreter or managed file search (RAG), or when you would rather not manage conversation state yourself. Use Chat Completions for custom RAG, multi-model systems, or when you need full control.
How do I handle rate limits and errors?
The SDK retries automatically with exponential backoff; pass max_retries (for example, max_retries=5) to change how many times. For custom handling, catch RateLimitError for 429s, APIConnectionError for network issues, and APIStatusError for other HTTP errors. Never retry a 400 BadRequestError, since the same request will fail again.
How do I reduce OpenAI API costs in production?
Use gpt-4o-mini instead of gpt-4o where quality allows (about 33× cheaper at the time of writing; check current pricing). Control output length with explicit instructions and max_tokens. Use the Batch API for non-real-time work (50% discount). Cache responses for repeated queries. Send only relevant context via RAG instead of full documents.
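As an illustration of the caching advice, here is a minimal sketch of an in-memory, exact-match cache keyed on the model name and messages. `cache_key` and `CompletionCache` are hypothetical names, and `fake_complete` stands in for a real API call so the example runs offline. Exact-match caching only helps when identical requests repeat and identical answers are acceptable.

```python
import hashlib
import json
from typing import Callable, Dict, List


def cache_key(model: str, messages: List[dict]) -> str:
    """Stable hash of the request; sort_keys makes dict key order irrelevant."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CompletionCache:
    """In-memory cache wrapped around any function that produces a completion."""

    def __init__(self, complete: Callable[[str, List[dict]], str]):
        self._complete = complete
        self._store: Dict[str, str] = {}
        self.hits = 0

    def get(self, model: str, messages: List[dict]) -> str:
        key = cache_key(model, messages)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = self._complete(model, messages)  # Cache miss: do the real work
        self._store[key] = result
        return result


# Stand-in for a real API call (illustration only)
api_calls = {"count": 0}


def fake_complete(model: str, messages: List[dict]) -> str:
    api_calls["count"] += 1
    return f"answer from {model}"


cache = CompletionCache(fake_complete)
question = [{"role": "user", "content": "What is a tuple?"}]
first = cache.get("gpt-4o-mini", question)
second = cache.get("gpt-4o-mini", question)
print(first == second, api_calls["count"], cache.hits)  # True 1 1
```

In a real application the wrapped function would call `client.chat.completions.create(...)` and return the message content, and the dictionary would likely be replaced by a shared store with expiry.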
AiTechWorlds Team