How to Use LangChain with Azure OpenAI Service (Enterprise)
Connect LangChain to Azure OpenAI Service for enterprise deployments. Covers AzureChatOpenAI, managed identity, embeddings, content filtering, and a comparison table.
If your organization has already standardized on Azure, using the public OpenAI API is often off the table. Procurement, security, and compliance teams have questions: Where does the data go? Does it leave the EU? Who has access to our prompts? Azure OpenAI Service answers all of these with a Microsoft Enterprise Agreement, VNET support, data residency options, and the same SOC2/ISO27001/HIPAA compliance posture as the rest of Azure.
LangChain integrates with Azure OpenAI through dedicated AzureChatOpenAI and AzureOpenAIEmbeddings classes that expose the same interface as their non-Azure counterparts. This guide covers everything from initial setup to managed identity auth, content filtering, and enterprise patterns.
Azure OpenAI vs Direct OpenAI API
| Feature | Azure OpenAI | Direct OpenAI API |
|---|---|---|
| Data residency | Configurable (US, EU, Asia) | US-based |
| VNET integration | Yes | No |
| Managed identity auth | Yes (Microsoft Entra ID, formerly Azure AD) | API key only |
| Content filtering | Configurable with policy | Fixed |
| SLA | 99.9% enterprise SLA | Best effort |
| Model availability | Slight lag vs OpenAI | Latest first |
| Compliance | HIPAA, SOC2, ISO27001 | SOC2 |
| Rate limits | Per-deployment | Per-org |
| Cost | Same token pricing + Azure commitment | Pay as you go |
For enterprises in regulated industries — healthcare, finance, government — the Azure column addresses requirements that block public API usage entirely.
Prerequisites and Azure Resource Setup
You need an Azure OpenAI resource and at least one model deployment before writing Python code.
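# Install core dependencies
pip install langchain-openai langchain-community azure-identity openai
# Later sections also need: azure-search-documents, azure-storage-blob, and unstructured (RAG),
# fastapi (async API), and opencensus-ext-azure (monitoring)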
# Azure CLI commands to create resources (run once)
az login
az cognitiveservices account create \
--name "my-openai-resource" \
--resource-group "my-rg" \
--kind OpenAI \
--sku S0 \
--location "eastus2" \
--yes
# Deploy a model
az cognitiveservices account deployment create \
--name "my-openai-resource" \
--resource-group "my-rg" \
--deployment-name "gpt-4o-prod" \
--model-name "gpt-4o" \
--model-version "2024-11-20" \
--model-format OpenAI \
--sku-name "Standard" \
--sku-capacity 120 # TPM in thousands
Note the endpoint URL and deployment name — you need both in LangChain.
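Because both values feed every constructor in this guide, it can help to load and sanity-check them once at startup. The following is a minimal sketch using only the standard library. The `AzureOpenAISettings` class, the `load_settings` function, and the `AZURE_OPENAI_DEPLOYMENT` variable name are assumptions made for illustration; the guide itself hardcodes the deployment name.

```python
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlparse


@dataclass(frozen=True)
class AzureOpenAISettings:
    endpoint: str
    deployment: str


def load_settings(env: Mapping[str, str]) -> AzureOpenAISettings:
    """Read and validate the endpoint and deployment name from a mapping such as os.environ."""
    endpoint = env.get("AZURE_OPENAI_ENDPOINT", "").strip()
    # Hypothetical variable name; use whatever your configuration system provides.
    deployment = env.get("AZURE_OPENAI_DEPLOYMENT", "").strip()

    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"AZURE_OPENAI_ENDPOINT must be an https URL, got {endpoint!r}")
    if not deployment:
        raise ValueError("AZURE_OPENAI_DEPLOYMENT is empty; set it to your deployment name")

    # Normalize to a single trailing slash so later string handling is predictable.
    return AzureOpenAISettings(endpoint=endpoint.rstrip("/") + "/", deployment=deployment)


settings = load_settings({
    "AZURE_OPENAI_ENDPOINT": "https://my-openai-resource.openai.azure.com",
    "AZURE_OPENAI_DEPLOYMENT": "gpt-4o-prod",
})
print(settings.endpoint, settings.deployment)
```

Failing fast here tends to give a clearer error than a 404 from the first model call, which is the usual symptom of a mistyped deployment name.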
Basic Connection with API Key
import os
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_core.messages import HumanMessage
# Set these in your environment or .env file
# Never hardcode credentials
AZURE_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"] # https://my-resource.openai.azure.com/
AZURE_API_KEY = os.environ["AZURE_OPENAI_API_KEY"]
API_VERSION = "2024-02-01" # Check Azure docs for latest stable version
# Chat model
llm = AzureChatOpenAI(
azure_endpoint=AZURE_ENDPOINT,
api_key=AZURE_API_KEY,
api_version=API_VERSION,
azure_deployment="gpt-4o-prod", # Your deployment name, not the model name
temperature=0,
max_tokens=2000,
)
# Test it
response = llm.invoke([HumanMessage(content="Explain Azure OpenAI in one paragraph.")])
print(response.content)
# Embeddings model (deploy text-embedding-3-small separately)
embeddings = AzureOpenAIEmbeddings(
azure_endpoint=AZURE_ENDPOINT,
api_key=AZURE_API_KEY,
api_version=API_VERSION,
azure_deployment="text-embedding-3-small-prod",
)
vector = embeddings.embed_query("What is Azure OpenAI Service?")
print(f"Embedding dimension: {len(vector)}")
Managed Identity Authentication (Recommended for Production)
API keys in environment variables work, but enterprise deployments should use managed identity. This eliminates credentials entirely — the Azure compute environment handles authentication automatically.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from langchain_core.messages import HumanMessage
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
import os
AZURE_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
API_VERSION = "2024-02-01"
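# DefaultAzureCredential tries a chain of credential sources in sequence, including:
# - Environment variables (service principal settings, handy for CI and local dev)
# - Workload identity and managed identity (for Azure compute such as AKS or App Service)
# - Developer tool logins such as the Azure CLI (for developer machines)
# The exact order depends on your azure-identity version; check its documentation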
credential = DefaultAzureCredential()
# Create a token provider for the Azure OpenAI scope
token_provider = get_bearer_token_provider(
credential,
"https://cognitiveservices.azure.com/.default"
)
# Pass the token provider — no API key needed
llm = AzureChatOpenAI(
azure_endpoint=AZURE_ENDPOINT,
azure_ad_token_provider=token_provider,
api_version=API_VERSION,
azure_deployment="gpt-4o-prod",
temperature=0,
)
embeddings = AzureOpenAIEmbeddings(
azure_endpoint=AZURE_ENDPOINT,
azure_ad_token_provider=token_provider,
api_version=API_VERSION,
azure_deployment="text-embedding-3-small-prod",
)
# Same usage as API key auth
response = llm.invoke([HumanMessage(content="What security certifications does Azure OpenAI have?")])
print(response.content)
To enable managed identity on your Azure compute:
# Enable system-assigned managed identity on an App Service
az webapp identity assign \
--name "my-langchain-app" \
--resource-group "my-rg"
# Grant the identity access to the OpenAI resource
az role assignment create \
--role "Cognitive Services OpenAI User" \
--assignee-object-id "<identity-object-id>" \
--scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/my-openai-resource"
This is the right path for AKS deployments, Azure Functions, Azure Container Apps, and App Service. The Deploy AI model to production guide covers the full Azure deployment infrastructure.
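The reason `azure_ad_token_provider` takes a callable rather than a token string is that tokens expire, so the client needs a way to ask for a fresh one. The sketch below illustrates that idea with a fake credential and a controllable clock. It is not how azure-identity is implemented; `CachingTokenProvider`, `FakeToken`, and the 300-second refresh margin are all hypothetical, and no Azure calls are made.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FakeToken:
    token: str
    expires_on: float  # Unix timestamp of expiry


class CachingTokenProvider:
    """Return a cached token until it is close to expiry, then fetch a new one."""

    def __init__(self, fetch: Callable[[], FakeToken], refresh_margin: float = 300.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock
        self._cached: Optional[FakeToken] = None

    def __call__(self) -> str:
        now = self._clock()
        if self._cached is None or self._cached.expires_on - now <= self._margin:
            self._cached = self._fetch()
        return self._cached.token


# Demo with a fake credential and a fake clock stored in a one-element list.
now = [1000.0]
fetch_times = []


def fake_fetch() -> FakeToken:
    fetch_times.append(now[0])
    return FakeToken(token=f"token-{len(fetch_times)}", expires_on=now[0] + 3600)


provider = CachingTokenProvider(fake_fetch, clock=lambda: now[0])
first = provider()    # fetches a token
second = provider()   # served from the cache
now[0] += 3400        # 200 s before expiry, inside the refresh margin
third = provider()    # fetches again
print(first, second, third)
```

In real code, `get_bearer_token_provider` plays this role, so application code never needs to handle token lifetimes directly.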
Building a RAG Pipeline with Azure OpenAI
Once the LLM and embeddings are configured, the rest of your LangChain code works unchanged:
from langchain_community.vectorstores import AzureSearch
from langchain_community.document_loaders import AzureBlobStorageContainerLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
# Load documents from Azure Blob Storage
loader = AzureBlobStorageContainerLoader(
conn_str=os.environ["AZURE_STORAGE_CONNECTION_STRING"],
container="knowledge-base-docs"
)
documents = loader.load()
# Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = splitter.split_documents(documents)
# Store in Azure AI Search (formerly Cognitive Search)
vector_store = AzureSearch(
azure_search_endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
azure_search_key=os.environ["AZURE_SEARCH_KEY"],
index_name="langchain-rag-index",
embedding_function=embeddings.embed_query,
)
# Add documents
vector_store.add_documents(chunks)
print(f"Indexed {len(chunks)} chunks in Azure AI Search")
# Build RAG chain
retriever = vector_store.as_retriever(search_type="semantic_hybrid", k=5)
rag_prompt = ChatPromptTemplate.from_template("""
Use the following context to answer the question.
If the answer is not in the context, say so.
Context:
{context}
Question: {question}
""")
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| rag_prompt
| llm
| StrOutputParser()
)
answer = rag_chain.invoke("What is the data retention policy?")
print(answer)
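This pattern integrates with the full pipeline shown in the RAG system tutorial. Azure AI Search supports hybrid (keyword plus vector) search natively. Note that the semantic_hybrid search type used above also relies on the semantic ranker being enabled and a semantic configuration being present on the index; if that is not set up, search_type="hybrid" is the simpler choice.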
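To make the `chunk_size=800, chunk_overlap=100` setting above more concrete, here is a simplified fixed-window splitter. It is only an illustration of how overlap works: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, which this sketch does not attempt. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters, each sharing chunk_overlap with the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


small = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(small)

document = "x" * 2000
sizes = [len(c) for c in split_with_overlap(document, chunk_size=800, chunk_overlap=100)]
print(sizes)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some text twice.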
Tool Calling with AzureChatOpenAI
Tool calling works identically to the standard OpenAI API:
from langchain_core.tools import tool
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
@tool
def get_azure_resource_info(resource_group: str) -> str:
    """Get information about Azure resources in a resource group."""
    # In production, use the Azure SDK here
    return f"Resource group {resource_group} contains: 1 App Service, 2 Storage Accounts, 1 OpenAI resource"
@tool
def estimate_monthly_cost(resource_type: str, tier: str) -> str:
    """Estimate monthly cost for an Azure resource."""
    cost_data = {
        ("App Service", "B1"): "$13.14/month",
        ("OpenAI", "S0"): "Usage-based, ~$0.002/1K tokens",
        ("Storage", "LRS"): "$0.018/GB/month"
    }
    return cost_data.get((resource_type, tier), "Cost data not available")
tools = [get_azure_resource_info, estimate_monthly_cost]
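# bind_tools attaches the tool schemas for direct model calls.
# The agent below binds the tools itself, so it is given the plain llm
llm_with_tools = llm.bind_tools(tools)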
agent_prompt = ChatPromptTemplate.from_messages([
("system", "You are an Azure infrastructure assistant. Use tools to answer questions about Azure resources."),
("human", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
])
agent = create_openai_tools_agent(llm, tools, agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
result = agent_executor.invoke({
"input": "What's in the production resource group and what does it cost?"
})
print(result["output"])
This pattern connects directly to what is covered in Build AI agent with LangChain — the Azure-specific connection is transparent to the agent orchestration layer.
Structured Output with Azure OpenAI
from pydantic import BaseModel, Field
from typing import List, Optional
class AzureArchitectureReview(BaseModel):
    """Structured output for architecture review responses."""
    recommendations: List[str] = Field(description="List of architectural recommendations")
    security_concerns: List[str] = Field(description="Identified security risks")
    estimated_monthly_cost: Optional[str] = Field(description="Rough monthly cost estimate")
    priority: str = Field(description="Review priority: low, medium, high, critical")
# Azure OpenAI supports structured output via with_structured_output
structured_llm = llm.with_structured_output(AzureArchitectureReview)
review = structured_llm.invoke(
"Review this architecture: React frontend on App Service, FastAPI backend on AKS, "
"Azure SQL Database, Azure OpenAI integration, no VNET, public endpoints only."
)
print("Recommendations:")
for rec in review.recommendations:
    print(f" - {rec}")
print("\nSecurity Concerns:")
for concern in review.security_concerns:
    print(f" - {concern}")
print(f"\nPriority: {review.priority}")
print(f"Est. Cost: {review.estimated_monthly_cost}")
Content Filtering Configuration
Azure OpenAI's content filtering is configurable per deployment — you can tighten or relax filters based on your use case:
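from openai import AzureOpenAI
# Direct Azure OpenAI client (data plane), useful for inspecting raw responses.
# Filter policies themselves are configured on the deployment in Azure, not through this client
raw_client = AzureOpenAI(
    azure_endpoint=AZURE_ENDPOINT,
    api_key=AZURE_API_KEY,
    api_version=API_VERSION
)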
# Handle content filter errors gracefully in LangChain
from langchain_core.callbacks import BaseCallbackHandler
import time
class ContentFilterLogger(BaseCallbackHandler):
    """Log content filter triggers for compliance reporting."""
    def __init__(self):
        super().__init__()
        self.filter_events = []
    def on_llm_error(self, error: Exception, **kwargs) -> None:
        error_str = str(error)
        if "content_filter" in error_str.lower() or "ResponsibleAIPolicyViolation" in error_str:
            event = {
                # gmtime() keeps the timestamp in UTC so the trailing "Z" is accurate
                "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
                "error_type": "content_filter",
                "error_message": error_str[:500],
                "run_id": str(kwargs.get("run_id", ""))
            }
            self.filter_events.append(event)
            print(f"CONTENT FILTER: Request blocked - {event['timestamp']}")
filter_logger = ContentFilterLogger()
# Attach the logger to your LLM
llm_with_filter_log = AzureChatOpenAI(
azure_endpoint=AZURE_ENDPOINT,
azure_ad_token_provider=token_provider,
api_version=API_VERSION,
azure_deployment="gpt-4o-prod",
callbacks=[filter_logger],
temperature=0,
)
# Any content filter triggers will be logged to filter_logger.filter_events
# This is important for SOC2 compliance audit trails
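For an audit trail, in-memory events usually need to end up somewhere durable. The sketch below separates the two concerns in the callback above, detecting a filter error and serializing it as one JSON line, so each can be tested without calling Azure. The helper names and the sample error string are hypothetical; real error messages may be worded differently, so treat the marker list as a starting point.

```python
import json
from datetime import datetime, timezone

# Substrings that suggest a content filter block (compared in lowercase).
FILTER_MARKERS = ("content_filter", "responsibleaipolicyviolation")


def is_content_filter_error(error: Exception) -> bool:
    """Return True when the error text looks like a content filter block."""
    text = str(error).lower()
    return any(marker in text for marker in FILTER_MARKERS)


def to_audit_line(error: Exception, run_id: str) -> str:
    """Serialize a filter event as a single JSON line with a UTC timestamp."""
    record = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "error_type": "content_filter",
        "error_message": str(error)[:500],
        "run_id": run_id,
    }
    return json.dumps(record, sort_keys=True)


# Illustrative error objects, not captured from a real service response.
blocked = RuntimeError("Error code: 400 - {'error': {'code': 'content_filter'}}")
other = TimeoutError("request timed out")

print(is_content_filter_error(blocked), is_content_filter_error(other))
audit_line = to_audit_line(blocked, run_id="demo-run")
print(audit_line)
```

Each line can then be appended to a log file or shipped to whatever log store your compliance process already uses.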
Multi-Region Deployment for Resilience
For enterprise applications requiring high availability:
import os
from langchain_openai import AzureChatOpenAI
# Configure multiple Azure OpenAI deployments in different regions
DEPLOYMENTS = [
{
"endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT_EASTUS", ""),
"deployment": "gpt-4o-prod-eastus",
"region": "eastus2"
},
{
"endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT_WESTEU", ""),
"deployment": "gpt-4o-prod-westeu",
"region": "westeurope"
}
]
def create_regional_llm(deployment_config: dict) -> AzureChatOpenAI:
    return AzureChatOpenAI(
        azure_endpoint=deployment_config["endpoint"],
        azure_ad_token_provider=token_provider,
        api_version=API_VERSION,
        azure_deployment=deployment_config["deployment"],
        temperature=0,
        timeout=30,
    )
# Create LLMs for each configured region, keeping each one paired with its config
active_deployments = [d for d in DEPLOYMENTS if d["endpoint"]]
regional_llms = [create_regional_llm(d) for d in active_deployments]
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
def invoke_with_failover(input_data: dict) -> str:
    """Try primary region, fall back to secondary on failure."""
    prompt = ChatPromptTemplate.from_template("{question}")
    # zip over the filtered list so the region in the log matches the LLM that failed
    for deployment, regional_llm in zip(active_deployments, regional_llms):
        try:
            chain = prompt | regional_llm | StrOutputParser()
            return chain.invoke(input_data)
        except Exception as e:
            print(f"Region {deployment['region']} failed: {e}. Trying next region...")
    raise RuntimeError("All regional deployments failed")
# Use with_fallbacks for cleaner LangChain integration
if len(regional_llms) >= 2:
    resilient_llm = regional_llms[0].with_fallbacks(regional_llms[1:])
    resilient_chain = (
        ChatPromptTemplate.from_template("{question}")
        | resilient_llm
        | StrOutputParser()
    )
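The failover logic itself can be exercised without any Azure resources by swapping the regional LLMs for plain callables. The sketch below does that; `call_with_failover` and the two fake region functions are hypothetical stand-ins, and the point is only to show the control flow: try each region in order, remember why each one failed, and raise only after all of them have.

```python
from typing import Callable, Sequence


def call_with_failover(
    targets: Sequence[tuple[str, Callable[[str], str]]],
    question: str,
) -> tuple[str, str]:
    """Try each (region, callable) pair in order; return the first success with its region."""
    errors = []
    for region, call in targets:
        try:
            return region, call(question)
        except Exception as exc:
            errors.append(f"{region}: {exc}")
    raise RuntimeError("All regions failed: " + "; ".join(errors))


def failing_region(question: str) -> str:
    # Stand-in for a regional deployment that is currently unavailable.
    raise ConnectionError("simulated outage")


def healthy_region(question: str) -> str:
    # Stand-in for a regional deployment that responds normally.
    return f"answer to {question!r}"


region, answer = call_with_failover(
    [("eastus2", failing_region), ("westeurope", healthy_region)],
    "What is our SLA?",
)
print(region, answer)
```

Returning the region alongside the answer makes it easy to log which deployment actually served a request, which helps when investigating latency or quota issues later.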
Async Support for High-Throughput APIs
Azure OpenAI fully supports async through LangChain:
import asyncio
from fastapi import FastAPI
from langchain_core.messages import HumanMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from pydantic import BaseModel
app = FastAPI(title="Azure OpenAI Enterprise API")
# Async LLM — same setup, just used with await
async_llm = AzureChatOpenAI(
azure_endpoint=AZURE_ENDPOINT,
azure_ad_token_provider=token_provider,
api_version=API_VERSION,
azure_deployment="gpt-4o-prod",
temperature=0,
)
chain = (
ChatPromptTemplate.from_template("{question}")
| async_llm
| StrOutputParser()
)
class QuestionRequest(BaseModel):
question: str
user_id: str = "anonymous"
@app.post("/ask")
async def ask_question(request: QuestionRequest):
    try:
        answer = await chain.ainvoke({"question": request.question})
        return {"answer": answer, "model": "azure-gpt-4o"}
    except Exception as e:
        return {"error": str(e), "answer": None}
@app.post("/batch")
async def ask_batch(requests: list[QuestionRequest]):
    """Process multiple questions in parallel."""
    tasks = [chain.ainvoke({"question": req.question}) for req in requests]
    answers = await asyncio.gather(*tasks, return_exceptions=True)
    return [
        {"answer": a if not isinstance(a, Exception) else None,
         "error": str(a) if isinstance(a, Exception) else None}
        for a in answers
    ]
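One thing to watch with `asyncio.gather` on a large batch is that every request starts at once, which can run into the per-deployment rate limits mentioned in the comparison table. A common mitigation is to cap concurrency with a semaphore. The sketch below shows the idea with a fake coroutine in place of `chain.ainvoke`; `run_bounded`, `fake_llm_call`, and the limit of 2 are assumptions chosen for the demo.

```python
import asyncio
from typing import Awaitable, Callable

# Mutable counters so the demo can report how many calls overlapped.
stats = {"in_flight": 0, "peak": 0}


async def run_bounded(items: list[str], worker: Callable[[str], Awaitable[str]], limit: int) -> list:
    """Run worker over items concurrently, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: str):
        async with semaphore:
            return await worker(item)

    # return_exceptions=True keeps one failure from cancelling the rest of the batch.
    return await asyncio.gather(*(guarded(i) for i in items), return_exceptions=True)


async def fake_llm_call(question: str) -> str:
    # Stand-in for chain.ainvoke; sleeps briefly instead of calling a model.
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)
    stats["in_flight"] -= 1
    if question == "bad":
        raise ValueError("simulated failure")
    return question.upper()


results = asyncio.run(run_bounded(["a", "b", "bad", "c", "d"], fake_llm_call, limit=2))
print(results, "peak concurrency:", stats["peak"])
```

Results come back in input order, with exceptions in place of failed items, so the same response-shaping list comprehension used in the batch endpoint would still apply.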
Monitoring with Azure Monitor
Azure OpenAI integrates with Azure Monitor automatically. Add application-level monitoring:
from langchain_core.callbacks import BaseCallbackHandler
from opencensus.ext.azure.log_exporter import AzureLogHandler
import logging
import os
# Configure Azure Monitor logging
# Note: the OpenCensus exporter is a legacy SDK; for new projects, check Microsoft's
# current guidance on the Azure Monitor OpenTelemetry distro
logger = logging.getLogger("langchain_azure")
connection_string = os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING", "")
if connection_string:
    # Attach the handler only when configured; an empty connection string is rejected
    logger.addHandler(AzureLogHandler(connection_string=connection_string))
logger.setLevel(logging.INFO)
class AzureMonitorCallback(BaseCallbackHandler):
    """Send LLM metrics to Azure Application Insights."""
    def on_llm_end(self, response, **kwargs):
        if not response.llm_output:
            return
        usage = response.llm_output.get("token_usage", {})
        model = response.llm_output.get("model_name", "unknown")
        logger.info("LLM call completed", extra={
            "custom_dimensions": {
                "model": model,
                "prompt_tokens": usage.get("prompt_tokens", 0),
                "completion_tokens": usage.get("completion_tokens", 0),
                "total_tokens": usage.get("total_tokens", 0)
            }
        })
This monitoring approach pairs with the cost tracking patterns in the LangChain callbacks token usage guide and gives you Azure-native dashboards in Application Insights.
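Alongside per-call log records, it is often useful to keep running totals per model, for example to emit a periodic summary. The sketch below is a standalone aggregator that accepts the same `token_usage` dictionary shape read in the callback above. The `TokenUsageTotals` class is hypothetical and is not part of LangChain or the Azure SDKs.

```python
from collections import defaultdict


class TokenUsageTotals:
    """Accumulate token counts and call counts per model name."""

    def __init__(self):
        self._totals = defaultdict(
            lambda: {"prompt_tokens": 0, "completion_tokens": 0, "calls": 0}
        )

    def record(self, model: str, usage: dict) -> None:
        entry = self._totals[model]
        # Missing keys count as zero, matching the defensive .get() style used above.
        entry["prompt_tokens"] += usage.get("prompt_tokens", 0)
        entry["completion_tokens"] += usage.get("completion_tokens", 0)
        entry["calls"] += 1

    def summary(self) -> dict:
        # Return plain dict copies so callers cannot mutate the internal state.
        return {model: dict(values) for model, values in self._totals.items()}


totals = TokenUsageTotals()
totals.record("gpt-4o", {"prompt_tokens": 120, "completion_tokens": 30})
totals.record("gpt-4o", {"prompt_tokens": 80, "completion_tokens": 20})
totals.record("text-embedding-3-small", {"prompt_tokens": 50})
report = totals.summary()
print(report)
```

An instance could be updated from `on_llm_end` and flushed to the logger on a timer, which keeps dashboard queries cheap when call volume is high.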
Migration from Direct OpenAI API
If you have existing LangChain code using ChatOpenAI, migrating to Azure is mostly a configuration change:
# Before: Direct OpenAI API
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])
# After: Azure OpenAI (same interface, different connection)
from langchain_openai import AzureChatOpenAI
llm = AzureChatOpenAI(
azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
azure_ad_token_provider=token_provider,
api_version="2024-02-01",
azure_deployment="gpt-4o-prod", # deployment name, not model name
)
# Everything below this line stays identical
chain = some_prompt | llm | StrOutputParser()
result = chain.invoke(some_input)
The key difference is azure_deployment (the deployment name you chose in the Azure portal) vs model (the model identifier in OpenAI's API). Keep these separate in your configuration.
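One way to keep the two identifiers separate is to build the constructor arguments from configuration in a single place. The sketch below returns a provider label and a kwargs dictionary rather than instantiating a model, so it runs without LangChain installed. The function name and the `AZURE_OPENAI_DEPLOYMENT`, `AZURE_OPENAI_API_VERSION`, and `OPENAI_MODEL` variable names are assumptions for illustration.

```python
from typing import Mapping


def build_chat_model_config(env: Mapping[str, str]) -> tuple[str, dict]:
    """Pick Azure or direct OpenAI settings based on which variables are present."""
    endpoint = env.get("AZURE_OPENAI_ENDPOINT", "")
    if endpoint:
        deployment = env.get("AZURE_OPENAI_DEPLOYMENT", "")
        if not deployment:
            raise ValueError("Azure endpoint set but AZURE_OPENAI_DEPLOYMENT is missing")
        return "azure", {
            "azure_endpoint": endpoint,
            "azure_deployment": deployment,  # deployment name, not model name
            "api_version": env.get("AZURE_OPENAI_API_VERSION", "2024-02-01"),
        }
    # No Azure endpoint configured: fall back to the direct API's model identifier.
    return "openai", {"model": env.get("OPENAI_MODEL", "gpt-4o")}


azure_cfg = build_chat_model_config({
    "AZURE_OPENAI_ENDPOINT": "https://my-openai-resource.openai.azure.com/",
    "AZURE_OPENAI_DEPLOYMENT": "gpt-4o-prod",
})
openai_cfg = build_chat_model_config({})
print(azure_cfg)
print(openai_cfg)
```

The resulting kwargs could be passed to `AzureChatOpenAI(**kwargs, azure_ad_token_provider=token_provider)` or `ChatOpenAI(**kwargs)` respectively, leaving the rest of the chain untouched.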
The patterns shown here apply directly to the OpenAI API integration patterns — once you have the connection configured, all of LangChain's capabilities work the same way. For the full picture of enterprise AI deployment, the Deploy AI model to production guide covers Kubernetes deployment, auto-scaling, and observability alongside the Azure OpenAI integration.
Key Takeaways
Azure OpenAI is the right choice when your organization needs data residency guarantees, VNET integration, managed identity authentication, or enterprise compliance certifications that the public OpenAI API does not offer. LangChain's AzureChatOpenAI and AzureOpenAIEmbeddings give you all of Azure OpenAI's enterprise features through the same interface as ChatOpenAI, so the learning curve is minimal if you already know LangChain.
The managed identity pattern is the production-critical piece: no API keys stored in environment variables, no manual key rotation, automatic token refresh, and a full audit trail through Microsoft Entra ID (formerly Azure Active Directory).
Frequently Asked Questions
What is the difference between Azure OpenAI and the OpenAI API? Both expose the same GPT-4o and embedding models, but Azure OpenAI runs inside your Azure subscription with VNET integration, managed identity auth, Azure Policy compliance, data residency options, and Microsoft's enterprise SLA. The OpenAI API is a direct endpoint that lacks these Azure-native controls and, per the comparison table above, covers a narrower set of compliance certifications.
How do I authenticate LangChain with Azure OpenAI using managed identity? Install azure-identity, create a DefaultAzureCredential, wrap it with get_bearer_token_provider using the https://cognitiveservices.azure.com/.default scope, and pass the resulting token provider to the azure_ad_token_provider parameter of AzureChatOpenAI. This works in Azure VMs, App Service, AKS pods, and Azure Functions, with no API keys in environment variables.
Does LangChain's AzureChatOpenAI support streaming and tool calling? Yes. AzureChatOpenAI exposes the same interface as ChatOpenAI: it supports .stream(), .astream(), bind_tools(), and structured output via with_structured_output(). Existing LangChain chains that use ChatOpenAI work with AzureChatOpenAI once the constructor is swapped and given the Azure endpoint, deployment name, and API version.
AiTechWorlds Team