How to Use LangChain with Azure OpenAI Service (Enterprise)

⚡ Quick Answer

Connect LangChain to Azure OpenAI Service for enterprise deployments. Covers AzureChatOpenAI, managed identity, embeddings, content filtering, and a comparison table.

AiTechWorlds Team May 31, 2026 11 min read

#LangChain #Azure OpenAI #enterprise #managed identity #Azure

If your organization has already standardized on Azure, using the public OpenAI API is often off the table. Procurement, security, and compliance teams have questions: Where does the data go? Does it leave the EU? Who has access to our prompts? Azure OpenAI Service answers all of these with a Microsoft Enterprise Agreement, VNET support, data residency options, and the same SOC2/ISO27001/HIPAA compliance posture as the rest of Azure.

LangChain integrates with Azure OpenAI through dedicated AzureChatOpenAI and AzureOpenAIEmbeddings classes that expose the same interface as their non-Azure counterparts. This guide covers everything from initial setup to managed identity auth, content filtering, and enterprise patterns.

Azure OpenAI vs Direct OpenAI API

Feature	Azure OpenAI	Direct OpenAI API
Data residency	Configurable (US, EU, Asia)	US-based
VNET integration	Yes	No
Managed identity auth	Yes (Azure AD)	API key only
Content filtering	Configurable with policy	Fixed
SLA	99.9% enterprise SLA	Best effort
Model availability	Slight lag vs OpenAI	Latest first
Compliance	HIPAA, SOC2, ISO27001	SOC2
Rate limits	Per-deployment	Per-org
Cost	Same token pricing + Azure commitment	Pay as you go

For enterprises in regulated industries — healthcare, finance, government — the Azure column addresses requirements that block public API usage entirely.

Prerequisites and Azure Resource Setup

You need an Azure OpenAI resource and at least one model deployment before writing Python code.

# Install dependencies
pip install langchain-openai langchain-community azure-identity openai

# Azure CLI commands to create resources (run once)
az login
az cognitiveservices account create \
    --name "my-openai-resource" \
    --resource-group "my-rg" \
    --kind OpenAI \
    --sku S0 \
    --location "eastus2" \
    --yes

# Deploy a model
az cognitiveservices account deployment create \
    --name "my-openai-resource" \
    --resource-group "my-rg" \
    --deployment-name "gpt-4o-prod" \
    --model-name "gpt-4o" \
    --model-version "2024-11-20" \
    --model-format OpenAI \
    --sku-name "Standard" \
    --sku-capacity 120  # TPM in thousands

Note the endpoint URL and deployment name — you need both in LangChain.

Basic Connection with API Key

import os
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_core.messages import HumanMessage

# Set these in your environment or .env file
# Never hardcode credentials
AZURE_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]        # https://my-resource.openai.azure.com/
AZURE_API_KEY = os.environ["AZURE_OPENAI_API_KEY"]
API_VERSION = "2024-02-01"  # Check Azure docs for latest stable version

# Chat model
llm = AzureChatOpenAI(
    azure_endpoint=AZURE_ENDPOINT,
    api_key=AZURE_API_KEY,
    api_version=API_VERSION,
    azure_deployment="gpt-4o-prod",  # Your deployment name, not the model name
    temperature=0,
    max_tokens=2000,
)

# Test it
response = llm.invoke([HumanMessage(content="Explain Azure OpenAI in one paragraph.")])
print(response.content)

# Embeddings model (deploy text-embedding-3-small separately)
embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=AZURE_ENDPOINT,
    api_key=AZURE_API_KEY,
    api_version=API_VERSION,
    azure_deployment="text-embedding-3-small-prod",
)

vector = embeddings.embed_query("What is Azure OpenAI Service?")
print(f"Embedding dimension: {len(vector)}")

Managed Identity Authentication (Recommended for Production)

API keys in environment variables work, but enterprise deployments should use managed identity. This eliminates credentials entirely — the Azure compute environment handles authentication automatically.

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from langchain_core.messages import HumanMessage
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
import os

AZURE_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
API_VERSION = "2024-02-01"

# DefaultAzureCredential walks a chain of credential sources until one succeeds, including:
# - Environment variables (service principal settings, handy for local dev and CI)
# - Managed identity (on Azure compute)
# - Azure CLI login (on developer machines)
# The exact chain and its order depend on the installed azure-identity version.
credential = DefaultAzureCredential()

# Create a token provider for the Azure OpenAI scope
token_provider = get_bearer_token_provider(
    credential,
    "https://cognitiveservices.azure.com/.default"
)

# Pass the token provider — no API key needed
llm = AzureChatOpenAI(
    azure_endpoint=AZURE_ENDPOINT,
    azure_ad_token_provider=token_provider,
    api_version=API_VERSION,
    azure_deployment="gpt-4o-prod",
    temperature=0,
)

embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=AZURE_ENDPOINT,
    azure_ad_token_provider=token_provider,
    api_version=API_VERSION,
    azure_deployment="text-embedding-3-small-prod",
)

# Same usage as API key auth
response = llm.invoke([HumanMessage(content="What security certifications does Azure OpenAI have?")])
print(response.content)

To enable managed identity on your Azure compute:

# Enable system-assigned managed identity on an App Service
az webapp identity assign \
    --name "my-langchain-app" \
    --resource-group "my-rg"

# Grant the identity access to the OpenAI resource
az role assignment create \
    --role "Cognitive Services OpenAI User" \
    --assignee-object-id "<identity-object-id>" \
    --scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/my-openai-resource"

This is the right path for AKS deployments, Azure Functions, Azure Container Apps, and App Service. The Deploy AI model to production guide covers the full Azure deployment infrastructure.

Building a RAG Pipeline with Azure OpenAI

Once the LLM and embeddings are configured, the rest of your LangChain code works unchanged:

from langchain_community.vectorstores import AzureSearch
from langchain_community.document_loaders import AzureBlobStorageContainerLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Load documents from Azure Blob Storage
loader = AzureBlobStorageContainerLoader(
    conn_str=os.environ["AZURE_STORAGE_CONNECTION_STRING"],
    container="knowledge-base-docs"
)
documents = loader.load()

# Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = splitter.split_documents(documents)

# Store in Azure AI Search (formerly Cognitive Search)
vector_store = AzureSearch(
    azure_search_endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    azure_search_key=os.environ["AZURE_SEARCH_KEY"],
    index_name="langchain-rag-index",
    embedding_function=embeddings.embed_query,
)

# Add documents
vector_store.add_documents(chunks)
print(f"Indexed {len(chunks)} chunks in Azure AI Search")

# Build RAG chain
retriever = vector_store.as_retriever(search_type="semantic_hybrid", k=5)

rag_prompt = ChatPromptTemplate.from_template("""
Use the following context to answer the question. 
If the answer is not in the context, say so.

Context:
{context}

Question: {question}
""")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

answer = rag_chain.invoke("What is the data retention policy?")
print(answer)

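This pattern integrates with the full pipeline shown in the RAG system tutorial. Azure AI Search supports hybrid (keyword plus vector) search natively. The semantic_hybrid search type used above additionally relies on semantic ranking being set up for the search service and index; if it is not, search_type="hybrid" is the safer choice.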

Tool Calling with AzureChatOpenAI

Tool calling works identically to the standard OpenAI API:

from langchain_core.tools import tool
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

@tool
def get_azure_resource_info(resource_group: str) -> str:
    """Get information about Azure resources in a resource group."""
    # In production, use the Azure SDK here
    return f"Resource group {resource_group} contains: 1 App Service, 2 Storage Accounts, 1 OpenAI resource"

@tool
def estimate_monthly_cost(resource_type: str, tier: str) -> str:
    """Estimate monthly cost for an Azure resource."""
    cost_data = {
        ("App Service", "B1"): "$13.14/month",
        ("OpenAI", "S0"): "Usage-based, ~$0.002/1K tokens",
        ("Storage", "LRS"): "$0.018/GB/month"
    }
    return cost_data.get((resource_type, tier), "Cost data not available")

tools = [get_azure_resource_info, estimate_monthly_cost]

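# Bind tools for direct (non-agent) calls to the Azure OpenAI LLM.
# The agent below receives the plain llm and binds the tools itself.
llm_with_tools = llm.bind_tools(tools)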

agent_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an Azure infrastructure assistant. Use tools to answer questions about Azure resources."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

agent = create_openai_tools_agent(llm, tools, agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = agent_executor.invoke({
    "input": "What's in the production resource group and what does it cost?"
})
print(result["output"])

This pattern connects directly to what is covered in Build AI agent with LangChain — the Azure-specific connection is transparent to the agent orchestration layer.

Structured Output with Azure OpenAI

from pydantic import BaseModel, Field
from typing import List, Optional

class AzureArchitectureReview(BaseModel):
    """Structured output for architecture review responses."""
    recommendations: List[str] = Field(description="List of architectural recommendations")
    security_concerns: List[str] = Field(description="Identified security risks")
    estimated_monthly_cost: Optional[str] = Field(description="Rough monthly cost estimate")
    priority: str = Field(description="Review priority: low, medium, high, critical")

# Azure OpenAI supports structured output via with_structured_output
structured_llm = llm.with_structured_output(AzureArchitectureReview)

review = structured_llm.invoke(
    "Review this architecture: React frontend on App Service, FastAPI backend on AKS, "
    "Azure SQL Database, Azure OpenAI integration, no VNET, public endpoints only."
)

print("Recommendations:")
for rec in review.recommendations:
    print(f"  - {rec}")

print("\nSecurity Concerns:")
for concern in review.security_concerns:
    print(f"  - {concern}")

print(f"\nPriority: {review.priority}")
print(f"Est. Cost: {review.estimated_monthly_cost}")

Content Filtering Configuration

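Azure OpenAI's content filtering is configurable per deployment, so you can tighten or relax filters for your use case. The filter policies themselves are set on the Azure side; on the LangChain side, the job is to detect and log blocked requests: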

from openai import AzureOpenAI

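# Direct Azure OpenAI client for raw API calls (data plane only).
# Content filter policies are configured on the Azure resource, not through this client.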
management_client = AzureOpenAI(
    azure_endpoint=AZURE_ENDPOINT,
    api_key=AZURE_API_KEY,
    api_version=API_VERSION
)

# Handle content filter errors gracefully in LangChain
import time
from langchain_core.callbacks import BaseCallbackHandler

class ContentFilterLogger(BaseCallbackHandler):
    """Log content filter triggers for compliance reporting."""

    def __init__(self):
        super().__init__()
        self.filter_events = []

    def on_llm_error(self, error: Exception, **kwargs) -> None:
        error_str = str(error)
        if "content_filter" in error_str.lower() or "ResponsibleAIPolicyViolation" in error_str:
            event = {
                # gmtime() keeps the trailing "Z" (UTC) accurate
                "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
                "error_type": "content_filter",
                "error_message": error_str[:500],
                "run_id": str(kwargs.get("run_id", ""))
            }
            self.filter_events.append(event)
            print(f"CONTENT FILTER: Request blocked - {event['timestamp']}")

filter_logger = ContentFilterLogger()

# Attach the logger to your LLM
llm_with_filter_log = AzureChatOpenAI(
    azure_endpoint=AZURE_ENDPOINT,
    azure_ad_token_provider=token_provider,
    api_version=API_VERSION,
    azure_deployment="gpt-4o-prod",
    callbacks=[filter_logger],
    temperature=0,
)

# Any content filter triggers will be logged to filter_logger.filter_events
# This is important for SOC2 compliance audit trails
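
The logger above records blocked requests but still lets the exception propagate to the caller. A minimal sketch of handling that at the call site follows. The helper names is_content_filter_error and safe_invoke are hypothetical, the marker strings are the same ones the callback checks, and the exact error text Azure returns may vary by API version.

```python
CONTENT_FILTER_MARKERS = ("content_filter", "responsibleaipolicyviolation")


def is_content_filter_error(error: Exception) -> bool:
    """Return True if the error text looks like a content filter block."""
    text = str(error).lower()
    return any(marker in text for marker in CONTENT_FILTER_MARKERS)


def safe_invoke(invoke, prompt, fallback="This request was blocked by the content policy."):
    """Call invoke(prompt); return a fallback on filter blocks, re-raise anything else."""
    try:
        return invoke(prompt)
    except Exception as exc:
        if is_content_filter_error(exc):
            return fallback
        raise
```

With the objects defined earlier, a call could look like safe_invoke(llm_with_filter_log.invoke, "some prompt"). Unrelated failures such as timeouts are deliberately re-raised so they are not mistaken for policy blocks.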

Multi-Region Deployment for Resilience

For enterprise applications requiring high availability:

import os
from langchain_openai import AzureChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Configure multiple Azure OpenAI deployments in different regions
DEPLOYMENTS = [
    {
        "endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT_EASTUS", ""),
        "deployment": "gpt-4o-prod-eastus",
        "region": "eastus2"
    },
    {
        "endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT_WESTEU", ""),
        "deployment": "gpt-4o-prod-westeu",
        "region": "westeurope"
    }
]

def create_regional_llm(deployment_config: dict) -> AzureChatOpenAI:
    return AzureChatOpenAI(
        azure_endpoint=deployment_config["endpoint"],
        azure_ad_token_provider=token_provider,
        api_version=API_VERSION,
        azure_deployment=deployment_config["deployment"],
        temperature=0,
        timeout=30,
    )

# Skip regions without a configured endpoint, keeping configs and LLMs index-aligned
active_deployments = [d for d in DEPLOYMENTS if d["endpoint"]]
regional_llms = [create_regional_llm(d) for d in active_deployments]

def invoke_with_failover(input_data: dict) -> str:
    """Try primary region, fall back to secondary on failure."""
    prompt = ChatPromptTemplate.from_template("{question}")

    for config, regional_llm in zip(active_deployments, regional_llms):
        try:
            chain = prompt | regional_llm | StrOutputParser()
            return chain.invoke(input_data)
        except Exception as e:
            print(f"Region {config['region']} failed: {e}. Trying next region...")

    raise RuntimeError("All regional deployments failed")

# Use with_fallbacks for cleaner LangChain integration
if len(regional_llms) >= 2:
    resilient_llm = regional_llms[0].with_fallbacks(regional_llms[1:])
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import StrOutputParser
    
    resilient_chain = (
        ChatPromptTemplate.from_template("{question}")
        | resilient_llm
        | StrOutputParser()
    )
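
Both approaches above fail over on any exception, including errors such as a content filter block that would fail the same way in every region. A minimal, library-free sketch of narrowing failover to retryable errors is shown below. RetryableRegionError and invoke_first_available are hypothetical names, and deciding which real exceptions count as retryable (timeouts, rate limits, server errors) is left as an assumption. with_fallbacks also takes an exceptions_to_handle argument that can serve the same purpose; check the signature in your installed version.

```python
class RetryableRegionError(Exception):
    """Hypothetical marker for errors worth retrying in another region."""


def invoke_first_available(regions, payload):
    """Try each (name, callable) in order; fail over only on RetryableRegionError.

    Returns (region_name, result) so callers can log which region answered.
    Any other exception propagates immediately instead of being retried.
    """
    failures = []
    for name, call in regions:
        try:
            return name, call(payload)
        except RetryableRegionError as exc:
            failures.append(f"{name}: {exc}")
    raise RuntimeError("All regional deployments failed: " + "; ".join(failures))
```

Returning the region name alongside the result makes it straightforward to record how often the secondary region is actually serving traffic.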

Async Support for High-Throughput APIs

Azure OpenAI fully supports async through LangChain:

import asyncio
from fastapi import FastAPI
from langchain_core.messages import HumanMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from pydantic import BaseModel

app = FastAPI(title="Azure OpenAI Enterprise API")

# Async LLM — same setup, just used with await
async_llm = AzureChatOpenAI(
    azure_endpoint=AZURE_ENDPOINT,
    azure_ad_token_provider=token_provider,
    api_version=API_VERSION,
    azure_deployment="gpt-4o-prod",
    temperature=0,
)

chain = (
    ChatPromptTemplate.from_template("{question}")
    | async_llm
    | StrOutputParser()
)

class QuestionRequest(BaseModel):
    question: str
    user_id: str = "anonymous"

@app.post("/ask")
async def ask_question(request: QuestionRequest):
    try:
        answer = await chain.ainvoke({"question": request.question})
        return {"answer": answer, "model": "azure-gpt-4o"}
    except Exception as e:
        return {"error": str(e), "answer": None}

@app.post("/batch")
async def ask_batch(requests: list[QuestionRequest]):
    """Process multiple questions in parallel."""
    tasks = [chain.ainvoke({"question": req.question}) for req in requests]
    answers = await asyncio.gather(*tasks, return_exceptions=True)
    return [
        {"answer": a if not isinstance(a, Exception) else None,
         "error": str(a) if isinstance(a, Exception) else None}
        for a in answers
    ]
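
The /batch endpoint above starts every request at once, so a large batch can run into the per-deployment rate limits mentioned earlier. One way to cap the number of calls in flight is an asyncio.Semaphore, sketched here under assumptions: gather_bounded is a hypothetical helper and the default limit of 5 is arbitrary.

```python
import asyncio


async def gather_bounded(coro_factories, limit=5):
    """Run zero-argument coroutine factories with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(factory):
        async with semaphore:
            return await factory()

    # return_exceptions=True mirrors the /batch endpoint: one failure does not cancel the rest
    return await asyncio.gather(
        *(run_one(factory) for factory in coro_factories),
        return_exceptions=True,
    )
```

Inside ask_batch this could be called as await gather_bounded([lambda q=req.question: chain.ainvoke({"question": q}) for req in requests]). Results come back in input order, as with asyncio.gather.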

Monitoring with Azure Monitor

Azure OpenAI integrates with Azure Monitor automatically. Add application-level monitoring:

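# Requires: pip install opencensus-ext-azure
# Note: OpenCensus is being retired in favor of OpenTelemetry; check Microsoft's
# current guidance before adopting this exporter in new code.
import logging
import os
from langchain_core.callbacks import BaseCallbackHandler
from opencensus.ext.azure.log_exporter import AzureLogHandler

# Configure Azure Monitor logging
logger = logging.getLogger("langchain_azure")
logger.addHandler(AzureLogHandler(
    # Fail fast if unset: the handler raises at startup when no connection string is available
    connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"]
))
logger.setLevel(logging.INFO)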

class AzureMonitorCallback(BaseCallbackHandler):
    """Send LLM metrics to Azure Application Insights."""
    
    def on_llm_end(self, response, **kwargs):
        if not response.llm_output:
            return
        
        usage = response.llm_output.get("token_usage", {})
        model = response.llm_output.get("model_name", "unknown")
        
        logger.info("LLM call completed", extra={
            "custom_dimensions": {
                "model": model,
                "prompt_tokens": usage.get("prompt_tokens", 0),
                "completion_tokens": usage.get("completion_tokens", 0),
                "total_tokens": usage.get("total_tokens", 0)
            }
        })

This monitoring approach pairs with the cost tracking patterns in the LangChain callbacks token usage guide and gives you Azure-native dashboards in Application Insights.
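
For a quick in-process check before the dashboards are set up, the per-call token counts can also be summed per model. The sketch below is an assumption-laden illustration: TokenUsageTotals is a hypothetical class, and its input mirrors the token_usage keys the callback above reads.

```python
from collections import defaultdict


class TokenUsageTotals:
    """Accumulate token counts per model from token_usage-style dicts."""

    def __init__(self):
        self._totals = defaultdict(
            lambda: {"prompt_tokens": 0, "completion_tokens": 0, "calls": 0}
        )

    def record(self, model, usage):
        """Add one call's usage; missing keys count as zero."""
        entry = self._totals[model]
        entry["prompt_tokens"] += usage.get("prompt_tokens", 0)
        entry["completion_tokens"] += usage.get("completion_tokens", 0)
        entry["calls"] += 1

    def summary(self):
        """Return per-model totals, with total_tokens derived from the two counters."""
        return {
            model: {**entry, "total_tokens": entry["prompt_tokens"] + entry["completion_tokens"]}
            for model, entry in self._totals.items()
        }
```

Calling totals.record(model, usage) from on_llm_end, next to the logger.info call, would keep both views fed from the same data.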

Migration from Direct OpenAI API

If you have existing LangChain code using ChatOpenAI, migrating to Azure is mostly a configuration change:

# Before: Direct OpenAI API
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])

# After: Azure OpenAI (same interface, different connection)
from langchain_openai import AzureChatOpenAI
llm = AzureChatOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_ad_token_provider=token_provider,
    api_version="2024-02-01",
    azure_deployment="gpt-4o-prod",  # deployment name, not model name
)

# Everything below this line stays identical
chain = some_prompt | llm | StrOutputParser()
result = chain.invoke(some_input)

The key difference is azure_deployment (the deployment name you chose in the Azure portal) vs model (the model identifier in OpenAI's API). Keep these separate in your configuration.
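
One way to keep the two identifiers separate is a small settings object that yields the right keyword arguments for either constructor. This is a sketch, not a LangChain feature: LLMSettings and constructor_kwargs are hypothetical names, and authentication arguments are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical settings object; field names are illustrative."""
    provider: str                            # "openai" or "azure"
    model: str = "gpt-4o"                    # OpenAI model identifier
    azure_endpoint: Optional[str] = None
    azure_deployment: Optional[str] = None   # name chosen at deployment time, e.g. "gpt-4o-prod"
    api_version: Optional[str] = None

    def constructor_kwargs(self) -> dict:
        """Return keyword arguments for ChatOpenAI or AzureChatOpenAI (auth not included)."""
        if self.provider == "openai":
            return {"model": self.model}
        if self.provider == "azure":
            required = ("azure_endpoint", "azure_deployment", "api_version")
            missing = [name for name in required if getattr(self, name) is None]
            if missing:
                raise ValueError(f"Azure settings missing: {', '.join(missing)}")
            return {name: getattr(self, name) for name in required}
        raise ValueError(f"Unknown provider: {self.provider!r}")
```

The Azure branch would then be used as AzureChatOpenAI(azure_ad_token_provider=token_provider, **settings.constructor_kwargs()), so the deployment name never ends up in the model field by accident.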

The patterns shown here apply directly to the OpenAI API integration patterns — once you have the connection configured, all of LangChain's capabilities work the same way. For the full picture of enterprise AI deployment, the Deploy AI model to production guide covers Kubernetes deployment, auto-scaling, and observability alongside the Azure OpenAI integration.

Key Takeaways

Azure OpenAI is the right choice when your organization needs data residency guarantees, VNET integration, managed identity authentication, or enterprise compliance certifications that the public OpenAI API does not offer. LangChain's AzureChatOpenAI and AzureOpenAIEmbeddings give you all of Azure OpenAI's enterprise features through the same interface as ChatOpenAI, so the learning curve is minimal if you already know LangChain.

The managed identity pattern is the production-critical piece: no credentials in environment variables, no rotation risk, automatic token refresh, and a full audit trail through Microsoft Entra ID (formerly Azure Active Directory).

Frequently Asked Questions

What is the difference between Azure OpenAI and the OpenAI API? Both expose the same GPT-4o and embedding models, but Azure OpenAI runs inside your Azure subscription with VNET integration, managed identity auth, Azure Policy compliance, data residency guarantees, and Microsoft's enterprise SLA. The OpenAI API is a direct endpoint without those Azure-native controls.

How do I authenticate LangChain with Azure OpenAI using managed identity? Install azure-identity, wrap DefaultAzureCredential in get_bearer_token_provider with the https://cognitiveservices.azure.com/.default scope, and pass the resulting token provider to the azure_ad_token_provider parameter of AzureChatOpenAI. This works in Azure VMs, App Service, AKS pods, and Azure Functions, with no API keys in environment variables.

Does LangChain's AzureChatOpenAI support streaming and tool calling? Yes. AzureChatOpenAI exposes the same interface as ChatOpenAI: it supports .stream(), .astream(), bind_tools(), and structured output via with_structured_output(). Existing LangChain chains that use ChatOpenAI work with AzureChatOpenAI once the constructor is swapped for the Azure one, as shown in the migration section.

Langchain

How to Use LangChain with Azure OpenAI Service (Enterprise)

⚡ Quick Answer

Connect LangChain to Azure OpenAI Service for enterprise deployments. Covers AzureChatOpenAI, managed identity, embeddings, content filtering, and a comparison table.

AiTechWorlds Team May 31, 2026 11 min read

#LangChain #Azure OpenAI #enterprise #managed identity #Azure

📚Part of the Langchain guide — explore all Langchain articles→

Share:Facebook Twitter/X LinkedIn Telegram WhatsApp

📱

Get more content like this on Telegram!

Daily AI tips, notes & resources — free

Join Free →

Azure OpenAI vs Direct OpenAI API

Feature	Azure OpenAI	Direct OpenAI API
Data residency	Configurable (US, EU, Asia)	US-based
VNET integration	Yes	No
Managed identity auth	Yes (Azure AD)	API key only
Content filtering	Configurable with policy	Fixed
SLA	99.9% enterprise SLA	Best effort
Model availability	Slight lag vs OpenAI	Latest first
Compliance	HIPAA, SOC2, ISO27001	SOC2
Rate limits	Per-deployment	Per-org
Cost	Same token pricing + Azure commitment	Pay as you go

For enterprises in regulated industries — healthcare, finance, government — the Azure column addresses requirements that block public API usage entirely.

Prerequisites and Azure Resource Setup

You need an Azure OpenAI resource and at least one model deployment before writing Python code.

# Install dependencies
pip install langchain-openai langchain-community azure-identity openai

# Azure CLI commands to create resources (run once)
az login
az cognitiveservices account create \
    --name "my-openai-resource" \
    --resource-group "my-rg" \
    --kind OpenAI \
    --sku S0 \
    --location "eastus2" \
    --yes

# Deploy a model
az cognitiveservices account deployment create \
    --name "my-openai-resource" \
    --resource-group "my-rg" \
    --deployment-name "gpt-4o-prod" \
    --model-name "gpt-4o" \
    --model-version "2024-11-20" \
    --model-format OpenAI \
    --sku-name "Standard" \
    --sku-capacity 120  # TPM in thousands

Note the endpoint URL and deployment name — you need both in LangChain.

Basic Connection with API Key

import os
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_core.messages import HumanMessage

# Set these in your environment or .env file
# Never hardcode credentials
AZURE_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]        # https://my-resource.openai.azure.com/
AZURE_API_KEY = os.environ["AZURE_OPENAI_API_KEY"]
API_VERSION = "2024-02-01"  # Check Azure docs for latest stable version

# Chat model
llm = AzureChatOpenAI(
    azure_endpoint=AZURE_ENDPOINT,
    api_key=AZURE_API_KEY,
    api_version=API_VERSION,
    azure_deployment="gpt-4o-prod",  # Your deployment name, not the model name
    temperature=0,
    max_tokens=2000,
)

# Test it
response = llm.invoke([HumanMessage(content="Explain Azure OpenAI in one paragraph.")])
print(response.content)

# Embeddings model (deploy text-embedding-3-small separately)
embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=AZURE_ENDPOINT,
    api_key=AZURE_API_KEY,
    api_version=API_VERSION,
    azure_deployment="text-embedding-3-small-prod",
)

vector = embeddings.embed_query("What is Azure OpenAI Service?")
print(f"Embedding dimension: {len(vector)}")

Managed Identity Authentication (Recommended for Production)

from azure.identity import DefaultAzureCredential, ManagedIdentityCredential
from azure.identity import get_bearer_token_provider
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
import os

AZURE_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
API_VERSION = "2024-02-01"

# DefaultAzureCredential tries in order:
# 1. Environment variables (for local dev)
# 2. Managed identity (for Azure compute)
# 3. Azure CLI (for developer machines)
# 4. VS Code credentials
credential = DefaultAzureCredential()

# Create a token provider for the Azure OpenAI scope
token_provider = get_bearer_token_provider(
    credential,
    "https://cognitiveservices.azure.com/.default"
)

# Pass the token provider — no API key needed
llm = AzureChatOpenAI(
    azure_endpoint=AZURE_ENDPOINT,
    azure_ad_token_provider=token_provider,
    api_version=API_VERSION,
    azure_deployment="gpt-4o-prod",
    temperature=0,
)

embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=AZURE_ENDPOINT,
    azure_ad_token_provider=token_provider,
    api_version=API_VERSION,
    azure_deployment="text-embedding-3-small-prod",
)

# Same usage as API key auth
response = llm.invoke([HumanMessage(content="What security certifications does Azure OpenAI have?")])
print(response.content)

To enable managed identity on your Azure compute:

# Enable system-assigned managed identity on an App Service
az webapp identity assign \
    --name "my-langchain-app" \
    --resource-group "my-rg"

# Grant the identity access to the OpenAI resource
az role assignment create \
    --role "Cognitive Services OpenAI User" \
    --assignee-object-id "<identity-object-id>" \
    --scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/my-openai-resource"

This is the right path for AKS deployments, Azure Functions, Azure Container Apps, and App Service. The Deploy AI model to production guide covers the full Azure deployment infrastructure.

Building a RAG Pipeline with Azure OpenAI

Once the LLM and embeddings are configured, the rest of your LangChain code works unchanged:

from langchain_community.vectorstores import AzureSearch
from langchain_community.document_loaders import AzureBlobStorageContainerLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Load documents from Azure Blob Storage
loader = AzureBlobStorageContainerLoader(
    conn_str=os.environ["AZURE_STORAGE_CONNECTION_STRING"],
    container="knowledge-base-docs"
)
documents = loader.load()

# Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = splitter.split_documents(documents)

# Store in Azure AI Search (formerly Cognitive Search)
vector_store = AzureSearch(
    azure_search_endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    azure_search_key=os.environ["AZURE_SEARCH_KEY"],
    index_name="langchain-rag-index",
    embedding_function=embeddings.embed_query,
)

# Add documents
vector_store.add_documents(chunks)
print(f"Indexed {len(chunks)} chunks in Azure AI Search")

# Build RAG chain
retriever = vector_store.as_retriever(search_type="semantic_hybrid", k=5)

rag_prompt = ChatPromptTemplate.from_template("""
Use the following context to answer the question. 
If the answer is not in the context, say so.

Context:
{context}

Question: {question}
""")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

answer = rag_chain.invoke("What is the data retention policy?")
print(answer)

This pattern integrates with the full pipeline shown in the RAG system tutorial. The Azure AI Search backend provides native hybrid search without additional configuration.

Tool Calling with AzureChatOpenAI

Tool calling works identically to the standard OpenAI API:

from langchain_core.tools import tool
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

@tool
def get_azure_resource_info(resource_group: str) -> str:
    """Get information about Azure resources in a resource group."""
    # In production, use the Azure SDK here
    return f"Resource group {resource_group} contains: 1 App Service, 2 Storage Accounts, 1 OpenAI resource"

@tool
def estimate_monthly_cost(resource_type: str, tier: str) -> str:
    """Estimate monthly cost for an Azure resource."""
    cost_data = {
        ("App Service", "B1"): "$13.14/month",
        ("OpenAI", "S0"): "Usage-based, ~$0.002/1K tokens",
        ("Storage", "LRS"): "$0.018/GB/month"
    }
    return cost_data.get((resource_type, tier), "Cost data not available")

tools = [get_azure_resource_info, estimate_monthly_cost]

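# bind_tools attaches the tool schemas to the model for direct (non-agent) calls.
# create_openai_tools_agent below binds the tools itself, so it takes the plain llm.
llm_with_tools = llm.bind_tools(tools)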

agent_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an Azure infrastructure assistant. Use tools to answer questions about Azure resources."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

agent = create_openai_tools_agent(llm, tools, agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = agent_executor.invoke({
    "input": "What's in the production resource group and what does it cost?"
})
print(result["output"])

This pattern connects directly to what is covered in Build AI agent with LangChain — the Azure-specific connection is transparent to the agent orchestration layer.
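Conceptually, an executor of this kind resolves each tool call the model emits (a tool name plus JSON arguments) to a Python function and feeds the result back to the model. The sketch below shows only that dispatch step in plain Python, with a hard-coded tool call standing in for a model response. `TOOL_REGISTRY`, `dispatch_tool_call`, and the stub tool are hypothetical names, not LangChain APIs.

```python
import json
from typing import Callable

def get_resource_count(resource_group: str) -> str:
    """Stub tool: a real version would query the Azure SDK."""
    return f"{resource_group}: 4 resources"

# Hypothetical registry mapping tool names to callables
TOOL_REGISTRY: dict[str, Callable[..., str]] = {
    "get_resource_count": get_resource_count,
}

def dispatch_tool_call(tool_call: dict) -> str:
    """Look up the requested tool and call it with the decoded arguments."""
    name = tool_call["name"]
    if name not in TOOL_REGISTRY:
        # Returning the error as text lets the model see it and recover
        return f"Unknown tool: {name}"
    args = json.loads(tool_call["arguments"])
    return TOOL_REGISTRY[name](**args)

# Shape loosely modeled on an OpenAI-style tool call: name plus JSON-encoded args
fake_call = {
    "name": "get_resource_count",
    "arguments": '{"resource_group": "production"}',
}
print(dispatch_tool_call(fake_call))
```

Because the dispatch step only sees names and arguments, it behaves the same whether the model behind it is reached through Azure or the direct OpenAI API.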

Structured Output with Azure OpenAI

from pydantic import BaseModel, Field
from typing import List, Optional

class AzureArchitectureReview(BaseModel):
    """Structured output for architecture review responses."""
    recommendations: List[str] = Field(description="List of architectural recommendations")
    security_concerns: List[str] = Field(description="Identified security risks")
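    # default=None makes the field truly optional (Pydantic v2 otherwise treats it as required)
    estimated_monthly_cost: Optional[str] = Field(default=None, description="Rough monthly cost estimate")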
    priority: str = Field(description="Review priority: low, medium, high, critical")

# Azure OpenAI supports structured output via with_structured_output
structured_llm = llm.with_structured_output(AzureArchitectureReview)

review = structured_llm.invoke(
    "Review this architecture: React frontend on App Service, FastAPI backend on AKS, "
    "Azure SQL Database, Azure OpenAI integration, no VNET, public endpoints only."
)

print("Recommendations:")
for rec in review.recommendations:
    print(f"  - {rec}")

print("\nSecurity Concerns:")
for concern in review.security_concerns:
    print(f"  - {concern}")

print(f"\nPriority: {review.priority}")
print(f"Est. Cost: {review.estimated_monthly_cost}")

Content Filtering Configuration

Azure OpenAI's content filtering is configurable per deployment — you can tighten or relax filters based on your use case:

from datetime import datetime, timezone

from langchain_core.callbacks import BaseCallbackHandler
from openai import AzureOpenAI

# Direct Azure OpenAI client for calls outside LangChain (for example, probing
# how a deployment's filters respond). The filter policy itself is configured on
# the Azure side and assigned per deployment, not through this client.
management_client = AzureOpenAI(
    azure_endpoint=AZURE_ENDPOINT,
    api_key=AZURE_API_KEY,
    api_version=API_VERSION
)

# Handle content filter errors gracefully in LangChain
class ContentFilterLogger(BaseCallbackHandler):
    """Log content filter triggers for compliance reporting."""

    def __init__(self):
        super().__init__()
        self.filter_events = []

    def on_llm_error(self, error: Exception, **kwargs) -> None:
        error_str = str(error)
        if "content_filter" in error_str.lower() or "ResponsibleAIPolicyViolation" in error_str:
            event = {
                # UTC timestamp, so the trailing "Z" is accurate
                "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
                "error_type": "content_filter",
                "error_message": error_str[:500],
                "run_id": str(kwargs.get("run_id", ""))
            }
            self.filter_events.append(event)
            print(f"CONTENT FILTER: Request blocked - {event['timestamp']}")
filter_logger = ContentFilterLogger()

# Attach the logger to your LLM
llm_with_filter_log = AzureChatOpenAI(
    azure_endpoint=AZURE_ENDPOINT,
    azure_ad_token_provider=token_provider,
    api_version=API_VERSION,
    azure_deployment="gpt-4o-prod",
    callbacks=[filter_logger],
    temperature=0,
)

# Any content filter triggers will be logged to filter_logger.filter_events
# This is important for SOC2 compliance audit trails
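
Logging is one half of graceful handling; the other is returning something usable to the caller. Here is a minimal sketch, assuming a blocked request surfaces as an exception whose message contains one of the markers checked above. The helper name `invoke_with_filter_fallback`, the fallback text, and the stand-in `blocked_call` are all hypothetical.

```python
from typing import Callable

FILTER_MARKERS = ("content_filter", "responsibleaipolicyviolation")
FALLBACK_MESSAGE = "This request was blocked by the content policy."

def is_content_filter_error(error: Exception) -> bool:
    """Heuristic string match on the error text (case-insensitive)."""
    text = str(error).lower()
    return any(marker in text for marker in FILTER_MARKERS)

def invoke_with_filter_fallback(call: Callable[[str], str], prompt: str) -> str:
    """Run the call; turn content filter errors into a fixed user-facing message."""
    try:
        return call(prompt)
    except Exception as error:
        if is_content_filter_error(error):
            return FALLBACK_MESSAGE
        raise  # unrelated failures should still propagate

def blocked_call(prompt: str) -> str:
    # Stand-in for a model call that the service rejects
    raise ValueError("Error code: 400 - {'code': 'content_filter'}")

print(invoke_with_filter_fallback(blocked_call, "example prompt"))
```

With LangChain, `call` would typically wrap a chain's `invoke`, so the callback still records the event while the caller receives the fallback text.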

Multi-Region Deployment for Resilience

For enterprise applications requiring high availability:

import os

from langchain_openai import AzureChatOpenAI

# Configure multiple Azure OpenAI deployments in different regions
DEPLOYMENTS = [
    {
        "endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT_EASTUS", ""),
        "deployment": "gpt-4o-prod-eastus",
        "region": "eastus2"
    },
    {
        "endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT_WESTEU", ""),
        "deployment": "gpt-4o-prod-westeu",
        "region": "westeurope"
    }
]

def create_regional_llm(deployment_config: dict) -> AzureChatOpenAI:
    return AzureChatOpenAI(
        azure_endpoint=deployment_config["endpoint"],
        azure_ad_token_provider=token_provider,
        api_version=API_VERSION,
        azure_deployment=deployment_config["deployment"],
        temperature=0,
        timeout=30,
    )

# Create LLMs for each configured region, keeping each config paired with its LLM
active_deployments = [d for d in DEPLOYMENTS if d["endpoint"]]
regional_llms = [create_regional_llm(d) for d in active_deployments]

def invoke_with_failover(input_data: dict) -> str:
    """Try primary region, fall back to secondary on failure."""
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate

    prompt = ChatPromptTemplate.from_template("{question}")

    # Iterate over (config, llm) pairs so the logged region always matches the
    # LLM that failed, even when some deployments were skipped above
    for config, regional_llm in zip(active_deployments, regional_llms):
        try:
            chain = prompt | regional_llm | StrOutputParser()
            return chain.invoke(input_data)
        except Exception as e:
            print(f"Region {config['region']} failed: {e}. Trying next region...")

    raise RuntimeError("All regional deployments failed")

# Use with_fallbacks for cleaner LangChain integration
if len(regional_llms) >= 2:
    resilient_llm = regional_llms[0].with_fallbacks(regional_llms[1:])
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import StrOutputParser
    
    resilient_chain = (
        ChatPromptTemplate.from_template("{question}")
        | resilient_llm
        | StrOutputParser()
    )
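
Failover paths are hard to exercise against live endpoints. One way to check the ordering logic is to run the same try-next-on-error loop against stand-in callables. The sketch below does that in plain Python. `first_success` and the two fake regions are hypothetical and do not call Azure.

```python
from typing import Callable

def first_success(
    regions: list[tuple[str, Callable[[str], str]]],
    question: str,
) -> tuple[str, str]:
    """Return (region, answer) from the first region whose call succeeds."""
    errors = []
    for region, call in regions:
        try:
            return region, call(question)
        except Exception as error:
            # Keep every failure so the final error explains what was tried
            errors.append(f"{region}: {error}")
    raise RuntimeError("All regional deployments failed: " + "; ".join(errors))

def failing_region(question: str) -> str:
    raise TimeoutError("simulated outage")

def healthy_region(question: str) -> str:
    return f"answer to: {question}"

region, answer = first_success(
    [("eastus", failing_region), ("westeurope", healthy_region)],
    "What is the SLA?",
)
print(region, answer)
```

Collecting the per-region errors, rather than only printing them, makes the final exception more useful when every region is down.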

Async Support for High-Throughput APIs

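AzureChatOpenAI exposes the same async methods (ainvoke, astream, abatch) as other LangChain chat models, so it fits directly into async web frameworks: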

import asyncio

from fastapi import FastAPI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import AzureChatOpenAI
from pydantic import BaseModel

app = FastAPI(title="Azure OpenAI Enterprise API")

# Async LLM — same setup, just used with await
async_llm = AzureChatOpenAI(
    azure_endpoint=AZURE_ENDPOINT,
    azure_ad_token_provider=token_provider,
    api_version=API_VERSION,
    azure_deployment="gpt-4o-prod",
    temperature=0,
)

chain = (
    ChatPromptTemplate.from_template("{question}")
    | async_llm
    | StrOutputParser()
)

class QuestionRequest(BaseModel):
    question: str
    user_id: str = "anonymous"

@app.post("/ask")
async def ask_question(request: QuestionRequest):
    try:
        answer = await chain.ainvoke({"question": request.question})
        return {"answer": answer, "model": "azure-gpt-4o"}
    except Exception as e:
        return {"error": str(e), "answer": None}

@app.post("/batch")
async def ask_batch(requests: list[QuestionRequest]):
    """Process multiple questions in parallel."""
    tasks = [chain.ainvoke({"question": req.question}) for req in requests]
    answers = await asyncio.gather(*tasks, return_exceptions=True)
    return [
        {"answer": a if not isinstance(a, Exception) else None,
         "error": str(a) if isinstance(a, Exception) else None}
        for a in answers
    ]
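
One caveat with the /batch handler: asyncio.gather starts every request at once, which can run into per-deployment rate limits on large batches. A common mitigation is to cap concurrency with asyncio.Semaphore. The sketch below uses a stand-in coroutine instead of chain.ainvoke. `bounded_gather`, `fake_ainvoke`, and the limit of 2 are illustrative assumptions.

```python
import asyncio
from typing import Awaitable, Callable

async def bounded_gather(
    worker: Callable[[str], Awaitable[str]],
    items: list[str],
    limit: int = 2,
) -> list:
    """Run worker over items with at most `limit` calls in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: str):
        async with semaphore:
            return await worker(item)

    # return_exceptions=True returns failures as values instead of raising the first one
    return await asyncio.gather(*(run_one(i) for i in items), return_exceptions=True)

in_flight = 0
peak_in_flight = 0

async def fake_ainvoke(question: str) -> str:
    """Stand-in for chain.ainvoke that records peak concurrency."""
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    return question.upper()

results = asyncio.run(bounded_gather(fake_ainvoke, ["a", "b", "c", "d"]))
print(results, peak_in_flight)
```

Results come back in input order, as with plain gather, so the response list still lines up with the request list.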

Monitoring with Azure Monitor

Azure OpenAI integrates with Azure Monitor automatically. Add application-level monitoring:

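import logging
import os

from langchain_core.callbacks import BaseCallbackHandler
# Note: the OpenCensus exporter is deprecated; Microsoft now recommends the
# azure-monitor-opentelemetry distro for new projects.
from opencensus.ext.azure.log_exporter import AzureLogHandler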

# Configure Azure Monitor logging
logger = logging.getLogger("langchain_azure")
logger.addHandler(AzureLogHandler(
    connection_string=os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING", "")
))
logger.setLevel(logging.INFO)

class AzureMonitorCallback(BaseCallbackHandler):
    """Send LLM metrics to Azure Application Insights."""
    
    def on_llm_end(self, response, **kwargs):
        if not response.llm_output:
            return
        
        usage = response.llm_output.get("token_usage", {})
        model = response.llm_output.get("model_name", "unknown")
        
        logger.info("LLM call completed", extra={
            "custom_dimensions": {
                "model": model,
                "prompt_tokens": usage.get("prompt_tokens", 0),
                "completion_tokens": usage.get("completion_tokens", 0),
                "total_tokens": usage.get("total_tokens", 0)
            }
        })

This monitoring approach pairs with the cost tracking patterns in the LangChain callbacks token usage guide and gives you Azure-native dashboards in Application Insights.
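
For dashboards or chargeback, per-call log lines are often rolled up into totals per model. Below is a minimal sketch of that aggregation, assuming each record carries the same model name and token-count fields as the custom dimensions logged above. `TokenUsageTotals` and the sample numbers are hypothetical.

```python
from collections import defaultdict

class TokenUsageTotals:
    """Accumulate token counts per model from llm_output-style dicts."""

    def __init__(self) -> None:
        self.totals: dict[str, dict[str, int]] = defaultdict(
            lambda: {"prompt_tokens": 0, "completion_tokens": 0, "calls": 0}
        )

    def record(self, llm_output: dict) -> None:
        usage = llm_output.get("token_usage", {})
        entry = self.totals[llm_output.get("model_name", "unknown")]
        entry["prompt_tokens"] += usage.get("prompt_tokens", 0)
        entry["completion_tokens"] += usage.get("completion_tokens", 0)
        entry["calls"] += 1

totals = TokenUsageTotals()
# Sample payloads shaped like response.llm_output in the callback above
totals.record({"model_name": "gpt-4o", "token_usage": {"prompt_tokens": 120, "completion_tokens": 30}})
totals.record({"model_name": "gpt-4o", "token_usage": {"prompt_tokens": 80, "completion_tokens": 20}})
print(dict(totals.totals))
```

In a callback, `record` could be called from on_llm_end next to the logger call, giving an in-process running total alongside the Application Insights records.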

Migration from Direct OpenAI API

If you have existing LangChain code using ChatOpenAI, migrating to Azure is mostly a configuration change:

# Before: Direct OpenAI API
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])

# After: Azure OpenAI (same interface, different connection)
from langchain_openai import AzureChatOpenAI
llm = AzureChatOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_ad_token_provider=token_provider,
    api_version="2024-02-01",
    azure_deployment="gpt-4o-prod",  # deployment name, not model name
)

# Everything below this line stays identical
chain = some_prompt | llm | StrOutputParser()
result = chain.invoke(some_input)

The key difference is azure_deployment (the deployment name you chose in the Azure portal) vs model (the model identifier in OpenAI's API). Keep these separate in your configuration.
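
One way to keep the two apart is a small settings object that carries both values under distinct names. This is a sketch under assumptions: the environment variable names `AZURE_OPENAI_DEPLOYMENT`, `AZURE_OPENAI_MODEL`, and `AZURE_OPENAI_API_VERSION`, the class `AzureLLMSettings`, and the defaults are hypothetical, not names the library requires.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class AzureLLMSettings:
    endpoint: str
    deployment: str   # name chosen in the Azure portal, e.g. "gpt-4o-prod"
    model: str        # underlying model identifier, e.g. "gpt-4o"
    api_version: str

    @classmethod
    def from_env(cls, env=None) -> "AzureLLMSettings":
        # Accept a plain dict for tests; fall back to the process environment
        env = os.environ if env is None else env
        return cls(
            endpoint=env["AZURE_OPENAI_ENDPOINT"],
            deployment=env["AZURE_OPENAI_DEPLOYMENT"],
            model=env.get("AZURE_OPENAI_MODEL", "gpt-4o"),
            api_version=env.get("AZURE_OPENAI_API_VERSION", "2024-02-01"),
        )

# Placeholder values for illustration only
settings = AzureLLMSettings.from_env({
    "AZURE_OPENAI_ENDPOINT": "https://example.openai.azure.com/",
    "AZURE_OPENAI_DEPLOYMENT": "gpt-4o-prod",
})
print(settings.deployment, settings.model)
```

Here `settings.deployment` would be passed as azure_deployment, while `settings.model` stays available for logging or cost lookups.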
