
AiTechWorlds


How to Use LangChain with Vertex AI (Google Gemini 2026)

⚡ Quick Answer

Integrate LangChain with Google Vertex AI and Gemini models. Complete guide covering ChatVertexAI, embeddings, multimodal inputs, function calling, and cost comparison.

AiTechWorlds Team May 31, 2026 12 min read

#LangChain #Vertex AI #Gemini #Google Cloud #multimodal


Google's Gemini models on Vertex AI offer competitive performance, massive context windows, and first-class multimodal support. If your infrastructure already runs on Google Cloud — or if you need a 1-million-token context window for long-document processing — Vertex AI is the natural choice.

This guide covers everything you need to integrate LangChain with Vertex AI: authentication, the ChatVertexAI and VertexAI classes, embeddings, multimodal inputs, function calling, and a head-to-head cost comparison with OpenAI.

If you're building the same applications with OpenAI, see OpenAI API integration for comparison. The LangChain tutorial 2025 covers the shared LangChain patterns.

Why Vertex AI with LangChain?

Three reasons teams choose Vertex AI over other providers:

Context window: Gemini 1.5 Pro supports up to 2M tokens; Gemini 1.5 Flash supports 1M at lower cost
Google Cloud integration: Native access to BigQuery, Cloud Storage, and GCP IAM
Multimodal: Video, audio, image, and text in a single API call

The langchain-google-vertexai package provides near drop-in replacements for the OpenAI classes: swap ChatOpenAI for ChatVertexAI (and OpenAIEmbeddings for VertexAIEmbeddings) and most of your existing code keeps working.

Installation and Authentication

pip install langchain langchain-google-vertexai google-cloud-aiplatform langchain-community langchain-text-splitters chromadb pypdf

Authentication Option 1: Application Default Credentials (local development)

gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID

Authentication Option 2: Service Account (production)

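import os

# Set these before creating any Vertex AI client (or export them in your shell / deployment config)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"
os.environ["GOOGLE_CLOUD_PROJECT"] = "your-project-id"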

Authentication Option 3: Explicit credentials in code

from google.oauth2 import service_account
from langchain_google_vertexai import ChatVertexAI

credentials = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)

llm = ChatVertexAI(
    model_name="gemini-1.5-pro",
    credentials=credentials,
    project="your-project-id",
    location="us-central1"
)

Basic Usage: ChatVertexAI

from langchain_google_vertexai import ChatVertexAI
from langchain_core.messages import HumanMessage, SystemMessage

# Initialize Gemini 1.5 Pro
llm = ChatVertexAI(
    model_name="gemini-1.5-pro",
    project="your-project-id",
    location="us-central1",
    temperature=0.1,
    max_output_tokens=2048,
)

# Simple invocation
response = llm.invoke([
    SystemMessage(content="You are a helpful assistant specialized in cloud architecture."),
    HumanMessage(content="What are the key differences between microservices and serverless architectures?")
])
print(response.content)

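Available models:

Model	Context Window	Best For
gemini-1.5-pro	2,097,152 tokens (2M)	Complex reasoning, long docs
gemini-1.5-flash	1,048,576 tokens (1M)	Fast, cost-effective tasks
gemini-1.0-pro	32,768 tokens	Standard chat and generation

Gemini 1.5 models are natively multimodal, so gemini-1.5-pro and gemini-1.5-flash accept image, audio, and video input directly; there is no separate "-vision" model name as there was for Gemini 1.0. Google retires older model versions on a schedule, so check the Vertex AI model list for what is currently available in your region.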

VertexAI for Text Completion (Non-Chat)

For legacy completion-style workflows:

from langchain_google_vertexai import VertexAI

# Text completion (non-chat) model
text_llm = VertexAI(
    model_name="gemini-1.5-pro",
    project="your-project-id",
    location="us-central1",
    temperature=0,
    max_output_tokens=1024,
    top_p=0.8,
    top_k=40
)

# Simple completion
result = text_llm.invoke("Explain transformer self-attention in 3 sentences.")
print(result)

# Batch processing
results = text_llm.batch([
    "Explain gradient descent",
    "What is RLHF?",
    "Define embedding in ML"
])
for r in results:
    print(r[:200])

Vertex AI Embeddings

from langchain_google_vertexai import VertexAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document

# Initialize Vertex AI embeddings
embeddings = VertexAIEmbeddings(
    model_name="textembedding-gecko@003",  # or "text-embedding-004"
    project="your-project-id",
    location="us-central1"
)

# Embed a single text
embedding_vector = embeddings.embed_query("What is machine learning?")
print(f"Embedding dimension: {len(embedding_vector)}")  # 768 for gecko, 768 for text-embedding-004

# Embed multiple texts
texts = [
    "Machine learning is a subset of artificial intelligence",
    "Deep learning uses neural networks with many layers",
    "Reinforcement learning trains through reward signals"
]
embedded_docs = embeddings.embed_documents(texts)
print(f"Embedded {len(embedded_docs)} documents, dimension {len(embedded_docs[0])}")

# Use with ChromaDB (drop-in for OpenAI embeddings)
docs = [
    Document(page_content=text, metadata={"index": i})
    for i, text in enumerate(texts)
]

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name="vertex_demo"
)

results = vectorstore.similarity_search("neural networks", k=2)
for doc in results:
    print(doc.page_content)

Available embedding models:

Model	Dimensions	Best For
textembedding-gecko@003	768	General purpose
text-embedding-004	768	Latest, improved quality
textembedding-gecko-multilingual@001	768	100+ languages
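Under the hood, similarity_search embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below shows that ranking step with cosine similarity on toy 3-dimensional vectors (stand-ins for real 768-dimensional embeddings); `cosine_similarity` and `top_k` are illustrative helpers, not LangChain or Chroma APIs. Chroma's default distance is L2 rather than cosine, but for unit-length vectors the two produce the same ranking.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k corpus keys whose vectors are most similar to the query."""
    ranked = sorted(corpus, key=lambda key: cosine_similarity(query, corpus[key]), reverse=True)
    return ranked[:k]


# Toy vectors standing in for real embeddings
corpus = {
    "deep learning": [0.9, 0.1, 0.0],
    "neural networks": [0.8, 0.2, 0.1],
    "tax law": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]
print(top_k(query, corpus, k=2))
```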

Multimodal Inputs with Gemini Vision

One of Gemini's strongest advantages is native multimodal support:

from langchain_google_vertexai import ChatVertexAI
from langchain_core.messages import HumanMessage
import base64
import mimetypes

# Gemini 1.5 models are natively multimodal, so the regular model name works for images
vision_llm = ChatVertexAI(
    model_name="gemini-1.5-pro",
    project="your-project-id",
    location="us-central1",
    temperature=0
)

def encode_image_base64(image_path: str) -> str:
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode()

# Image analysis
def analyze_image(image_path: str, question: str) -> str:
    image_data = encode_image_base64(image_path)
    # Use the file's real MIME type (e.g. image/png for the chart below) instead of assuming JPEG
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"

    message = HumanMessage(
        content=[
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:{mime_type};base64,{image_data}"
                }
            },
            {
                "type": "text",
                "text": question
            }
        ]
    )
    
    response = vision_llm.invoke([message])
    return response.content

# Example: analyze a chart
result = analyze_image(
    "quarterly_revenue.png",
    "What trend do you see in this revenue chart? Identify any notable changes."
)
print(result)

# Video analysis: Gemini accepts video files natively
def analyze_video_from_gcs(gcs_uri: str, question: str) -> str:
    """Analyze a video stored in Google Cloud Storage."""
    message = HumanMessage(
        content=[
            {
                "type": "media",
                "file_uri": gcs_uri,  # gs://bucket/video.mp4
                "mime_type": "video/mp4"
            },
            {
                "type": "text",
                "text": question
            }
        ]
    )
    
    response = vision_llm.invoke([message])
    return response.content

# Analyze a product demo video
video_analysis = analyze_video_from_gcs(
    "gs://my-bucket/product-demo.mp4",
    "Summarize the main features demonstrated in this product video."
)
print(video_analysis)

Native video input is one of Gemini's distinguishing features: you can pass an entire video file (here, by Cloud Storage URI) and ask questions about it. This enables use cases like meeting summarization, training-video indexing, and product-demo analysis. OpenAI's GPT-4o API, by comparison, does not accept video files directly; the usual workaround is to extract frames (and transcribe the audio) yourself.
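The two content-part shapes used above (a base64 data URL for local images and a "media" part for Cloud Storage files) can be produced by one helper. This is a minimal sketch using only the standard library; `build_media_part` is a hypothetical name, MIME types are guessed from the file extension with `mimetypes`, and local files are restricted to images because they are sent as an "image_url" part.

```python
import base64
import mimetypes
from pathlib import Path


def build_media_part(source: str) -> dict:
    """Return a HumanMessage content part for a local image or a gs:// file."""
    mime_type, _ = mimetypes.guess_type(source)
    if mime_type is None:
        raise ValueError(f"Cannot infer a MIME type from {source!r}")

    if source.startswith("gs://"):
        # Cloud Storage files are referenced by URI, not uploaded inline
        return {"type": "media", "file_uri": source, "mime_type": mime_type}

    if not mime_type.startswith("image/"):
        raise ValueError(f"Local file {source!r} is not an image ({mime_type})")
    # Local images are inlined as a base64 data URL
    data = base64.b64encode(Path(source).read_bytes()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{data}"}}


# Usage: combine a media part with a text question in one message's content list
parts = [
    build_media_part("gs://my-bucket/product-demo.mp4"),
    {"type": "text", "text": "Summarize this video."},
]
print(parts[0])
```

The resulting list can be passed as `HumanMessage(content=parts)`; failing early on an unknown extension is cheaper than a rejected API call.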

Function Calling with ChatVertexAI

from langchain_core.tools import tool
from langchain_google_vertexai import ChatVertexAI
from langchain.agents import AgentExecutor, create_tool_calling_agent  # LangChain 1.x: import from langchain_classic.agents instead
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

@tool
def get_gcp_project_info(project_id: str) -> str:
    """Get information about a GCP project including billing and resources."""
    # Mock implementation
    return f"Project {project_id}: Region us-central1, Budget $500/month, Services: GCS, BigQuery, Vertex AI"

@tool
def query_bigquery(sql: str, project_id: str) -> str:
    """Execute a BigQuery SQL query and return results."""
    # Mock implementation — replace with actual BigQuery client
    return f"Query executed. Result: 1,247 rows returned. Sample: [('2026-01-01', 15234), ('2026-01-02', 16891)]"

@tool
def list_gcs_buckets(project_id: str) -> str:
    """List Cloud Storage buckets in a GCP project."""
    return f"Buckets in {project_id}: ml-training-data, model-artifacts, raw-data, processed-features"

# Build Vertex AI agent
llm = ChatVertexAI(
    model_name="gemini-1.5-pro",
    project="your-project-id",
    location="us-central1",
    temperature=0
)

tools = [get_gcp_project_info, query_bigquery, list_gcs_buckets]
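# Not needed by the agent below (create_tool_calling_agent binds the tools itself);
# use llm_with_tools when calling the model directly and reading response.tool_calls
llm_with_tools = llm.bind_tools(tools)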

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a Google Cloud assistant. Use the available tools to answer questions about GCP resources."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({
    "input": "What BigQuery data do we have in our project and how much storage are we using?",
    "chat_history": []
})
print(result["output"])
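If you prefer not to use AgentExecutor, you can call `llm.bind_tools(tools)` directly and handle the returned `ai_msg.tool_calls` yourself; each entry is a dict with `name`, `args`, and `id` keys. The sketch below is a framework-free dispatcher over that shape. The registry, the stand-in tool, and the sample call are hypothetical; in real code you would wrap each result in a `ToolMessage(content=..., tool_call_id=...)` and send it back to the model.

```python
from typing import Any, Callable


def list_gcs_buckets(project_id: str) -> str:
    # Hypothetical stand-in for the mock tool defined above
    return f"Buckets in {project_id}: ml-training-data, model-artifacts"


TOOL_REGISTRY: dict[str, Callable[..., str]] = {"list_gcs_buckets": list_gcs_buckets}


def dispatch_tool_calls(tool_calls: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Run each requested tool and return one result record per call."""
    results = []
    for call in tool_calls:
        func = TOOL_REGISTRY.get(call["name"])
        if func is None:
            # Report unknown tools back to the model instead of crashing
            output = f"Error: unknown tool {call['name']!r}"
        else:
            output = func(**call["args"])
        results.append({"tool_call_id": call["id"], "content": output})
    return results


# Shape of ai_msg.tool_calls after llm_with_tools.invoke(...); values are illustrative
sample_calls = [
    {"name": "list_gcs_buckets", "args": {"project_id": "demo"}, "id": "call_1", "type": "tool_call"}
]
print(dispatch_tool_calls(sample_calls))
```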

RAG Pipeline with Vertex AI

Build a complete RAG system using Vertex AI embeddings and Gemini for generation:

from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# Initialize Vertex AI components
embeddings = VertexAIEmbeddings(
    model_name="text-embedding-004",
    project="your-project-id",
    location="us-central1"
)

llm = ChatVertexAI(
    model_name="gemini-1.5-pro",
    project="your-project-id",
    location="us-central1",
    temperature=0,
    max_output_tokens=4096
)

# Load and index documents
loader = PyPDFLoader("technical_manual.pdf")
pages = loader.load()

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100
)
chunks = splitter.split_documents(pages)

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="vertex_rag"
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# RAG prompt for Gemini
rag_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a technical documentation assistant.
    Answer questions based strictly on the provided context.
    If the answer is not in the context, say "This information is not in the provided documentation."
    Cite specific sections when possible."""),
    ("human", """Context:
{context}

Question: {question}

Answer:""")
])

def format_docs(docs):
    return "\n\n---\n\n".join(
        f"[Page {doc.metadata.get('page', '?')}]\n{doc.page_content}"
        for doc in docs
    )

rag_chain = (
    RunnableParallel({
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    })
    | rag_prompt
    | llm
    | StrOutputParser()
)

# Query the RAG system
answer = rag_chain.invoke("What are the safety requirements for high-voltage operations?")
print(answer)
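The chunk_size and chunk_overlap settings above are measured in characters by default. As a rough illustration of what they control, here is a simplified fixed-window splitter (a hypothetical helper, not LangChain's algorithm): RecursiveCharacterTextSplitter additionally tries to break on paragraph, line, and word boundaries first, so its real chunk boundaries and overlaps are less uniform.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size character windows; each chunk repeats the last chunk_overlap chars of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
```

Overlap means a sentence that straddles a boundary still appears intact in at least one chunk, at the cost of storing and embedding some text twice.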

Long-Context Processing with Gemini 1.5 Pro

Gemini's 1M token context window enables "whole-document RAG" — feeding an entire document as context:

from langchain_google_vertexai import ChatVertexAI
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.prompts import ChatPromptTemplate

llm = ChatVertexAI(
    model_name="gemini-1.5-pro",
    project="your-project-id",
    location="us-central1",
    temperature=0,
    max_output_tokens=8192
)

# Load an entire book or long document
loader = PyPDFLoader("full_textbook.pdf")
all_pages = loader.load()
full_text = "\n\n".join(page.page_content for page in all_pages)

print(f"Document length: {len(full_text.split())} words")
# → For a 500-page book: ~125,000 words ≈ 166,000 tokens (well within 1M limit)

# Ask questions about the entire document at once
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are analyzing a complete technical document. Answer questions about its full content."),
    ("human", """Here is the complete document:

{document}

Question: {question}""")
])

chain = prompt | llm

# No chunking or retrieval needed for documents under ~700K words
# (~930K tokens at ~1.33 tokens/word, leaving headroom for the prompt and output)
answer = chain.invoke({
    "document": full_text,
    "question": "What are the three most important concepts introduced in Chapter 7, and how do they relate to each other?"
})
print(answer.content)

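This is a fundamentally different approach to document QA from traditional RAG. For documents under roughly 700K words, you can skip chunking and retrieval entirely and pass everything to Gemini. The trade-off is cost and latency: every question re-sends the whole document, and prompts longer than 128K tokens are billed at the higher rate shown in the pricing table below. The RAG system tutorial compares both approaches with benchmarks.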
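Before sending a whole document, it helps to check that it fits. This sketch reuses the rough ratio from the comment in the code above (about 1.33 tokens per word, from 125,000 words ≈ 166,000 tokens) and reserves room for the 8,192 output tokens configured on the model. The ratio and the `prompt_overhead` allowance are assumptions; for exact numbers you would use Vertex AI's token-counting endpoint instead.

```python
TOKENS_PER_WORD = 1.33  # rough heuristic (~125K words ≈ 166K tokens); real counts vary by text


def estimate_tokens(text: str) -> int:
    """Approximate token count from a whitespace word count."""
    return round(len(text.split()) * TOKENS_PER_WORD)


def fits_in_context(
    text: str,
    context_window: int = 1_000_000,
    reserved_for_output: int = 8192,  # matches max_output_tokens in the example above
    prompt_overhead: int = 500,  # assumed allowance for the system prompt and question
) -> bool:
    """True if the estimated document tokens fit alongside the prompt and the output."""
    budget = context_window - reserved_for_output - prompt_overhead
    return estimate_tokens(text) <= budget


book = "word " * 125_000  # stand-in for a ~500-page book
print(estimate_tokens(book), fits_in_context(book))
```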

Vertex AI vs OpenAI Pricing Comparison (2026)

Model	Input (per 1M tokens)	Output (per 1M tokens)	Context Window
Gemini 1.5 Pro (≤128K prompt)	$1.25	$5.00	2M tokens
Gemini 1.5 Pro (>128K prompt)	$2.50	$10.00	2M tokens
Gemini 1.5 Flash (≤128K prompt)	$0.075	$0.30	1M tokens
Gemini 1.5 Flash (>128K prompt)	$0.15	$0.60	1M tokens
GPT-4o	$2.50	$10.00	128K tokens
GPT-4o-mini	$0.15	$0.60	128K tokens
Claude 3.5 Sonnet	$3.00	$15.00	200K tokens

Cost analysis for 10K queries/day (RAG, 4K tokens in / 500 out), i.e. 10K × 4K = 40M input tokens and 10K × 500 = 5M output tokens per day:

Gemini 1.5 Flash: 40M × $0.075/M = $3.00 in + 5M × $0.30/M = $1.50 out = $4.50/day
GPT-4o: 40M × $2.50/M = $100 in + 5M × $10/M = $50 out = $150/day
GPT-4o-mini: 40M × $0.15/M = $6 in + 5M × $0.60/M = $3 out = $9/day

Gemini 1.5 Flash is the most cost-effective option for standard RAG. Gemini 1.5 Pro undercuts Claude 3.5 Sonnet on per-token price ($1.25 vs $3.00 input and $5.00 vs $15.00 output for prompts up to 128K tokens).
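The per-query arithmetic above is easy to script so you can plug in your own traffic. A minimal sketch: `PRICES` holds the ≤128K-prompt rows for two of the models in the table (input and output USD per 1M tokens), and `daily_cost` is a hypothetical helper; prompts longer than 128K tokens would need the higher Gemini rates.

```python
# (input, output) USD per 1M tokens, copied from the ≤128K-prompt rows of the table
PRICES = {
    "gemini-1.5-flash": (0.075, 0.30),
    "gpt-4o-mini": (0.15, 0.60),
}


def daily_cost(model: str, queries_per_day: int, input_tokens: int, output_tokens: int) -> float:
    """Daily spend: (total tokens / 1M) × price per 1M, for input and output separately."""
    in_price, out_price = PRICES[model]
    input_cost = queries_per_day * input_tokens / 1_000_000 * in_price
    output_cost = queries_per_day * output_tokens / 1_000_000 * out_price
    return input_cost + output_cost


for model in PRICES:
    print(f"{model}: ${daily_cost(model, 10_000, 4_000, 500):.2f}/day")
```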

Streaming Responses

from langchain_google_vertexai import ChatVertexAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatVertexAI(
    model_name="gemini-1.5-flash",
    project="your-project-id",
    location="us-central1",
    streaming=True  # Enable streaming
)

prompt = ChatPromptTemplate.from_template(
    "Write a detailed explanation of {topic} for a software engineer audience."
)

chain = prompt | llm | StrOutputParser()

# Synchronous streaming
print("Streaming response:")
for chunk in chain.stream({"topic": "transformer attention mechanisms"}):
    print(chunk, end="", flush=True)
print()

# Async streaming
import asyncio

async def stream_async(topic: str):
    print("\nAsync streaming:")
    async for chunk in chain.astream({"topic": topic}):
        print(chunk, end="", flush=True)
    print()

asyncio.run(stream_async("RLHF training process"))

Switching Between Providers

One of LangChain's best features is provider portability. Swap Vertex AI for OpenAI with minimal code changes:

import os

# Switch via environment variable
PROVIDER = os.getenv("LLM_PROVIDER", "vertex")

if PROVIDER == "vertex":
    from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings
    llm = ChatVertexAI(
        model_name="gemini-1.5-pro",
        project=os.environ["GOOGLE_CLOUD_PROJECT"],  # same variable as in the authentication section
        location="us-central1"
    )
    embeddings = VertexAIEmbeddings(
        model_name="text-embedding-004",
        project=os.environ["GOOGLE_CLOUD_PROJECT"],
        location="us-central1"
    )
elif PROVIDER == "openai":
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings  # pip install langchain-openai
    llm = ChatOpenAI(model="gpt-4o")
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
elif PROVIDER == "anthropic":
    from langchain_anthropic import ChatAnthropic  # pip install langchain-anthropic
    from langchain_openai import OpenAIEmbeddings  # Anthropic has no embedding model
    llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
else:
    raise ValueError(f"Unknown LLM_PROVIDER: {PROVIDER!r}")

# The rest of your RAG chain, agent code, etc. stays identical
# (but a vector store built with one embedding model must be re-indexed if you switch to another)

This pattern is invaluable for running A/B tests between providers or building provider-agnostic applications. The OpenAI API integration guide covers the OpenAI-specific features that don't map directly to Vertex AI.

Async Batch Processing on Vertex AI

import asyncio
from langchain_google_vertexai import ChatVertexAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatVertexAI(
    model_name="gemini-1.5-flash",
    project="your-project-id",
    location="us-central1",
    temperature=0
)

async def process_documents_async(documents: list[str]) -> list[str | BaseException]:
    """Process multiple documents concurrently."""
    prompt = ChatPromptTemplate.from_template(
        "Summarize this document in 2 sentences: {doc}"
    )
    chain = prompt | llm | StrOutputParser()

    # Cap in-flight requests so bursts stay under your project's requests-per-minute quota
    semaphore = asyncio.Semaphore(20)

    async def process_one(doc: str) -> str:
        async with semaphore:
            return await chain.ainvoke({"doc": doc})

    tasks = [process_one(doc) for doc in documents]
    # return_exceptions=True keeps one failed request from cancelling the rest;
    # failures come back as exception objects in the result list
    return await asyncio.gather(*tasks, return_exceptions=True)

# Process 100 documents concurrently (at most 20 in flight at a time)
documents = [f"Technical document {i} with content about ML systems..." for i in range(100)]
summaries = asyncio.run(process_documents_async(documents))
failures = [s for s in summaries if isinstance(s, BaseException)]
print(f"Processed {len(summaries)} documents, {len(failures)} failed")
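To see what the semaphore and `return_exceptions=True` do without calling Vertex AI, here is a self-contained simulation. `fake_request` is a hypothetical stand-in for `chain.ainvoke`: it sleeps briefly instead of making a network call and fails once to mimic a quota error. The function records the peak number of requests in flight, which should never exceed the semaphore limit.

```python
import asyncio


async def run_bounded(n_tasks: int, limit: int) -> tuple[list, int]:
    """Run n_tasks fake requests with at most `limit` in flight; return results and peak concurrency."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_request(i: int) -> str:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for network latency
            in_flight -= 1
            if i == 3:
                raise RuntimeError("simulated 429 quota error")
            return f"summary {i}"

    results = await asyncio.gather(
        *(fake_request(i) for i in range(n_tasks)), return_exceptions=True
    )
    return results, peak


results, peak = asyncio.run(run_bounded(n_tasks=50, limit=5))
failures = [r for r in results if isinstance(r, Exception)]
print(f"peak concurrency={peak}, failures={len(failures)}")
```

Results come back in input order, so a failed item can be matched to its document and retried.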

For large-scale document processing, Vertex AI's batch API is even more cost-effective — 50% discount on gemini-1.5-flash for asynchronous batch jobs. See Google's Vertex AI Batch Prediction docs for setup.

Production Considerations

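Quotas and Rate Limits (representative figures; limits vary by model, region, and project and change over time, so check the Quotas page in the Google Cloud console for yours):

Gemini 1.5 Pro: 360 requests/minute, 4M tokens/minute
Gemini 1.5 Flash: 1,000 requests/minute, 4M tokens/minute
Request quota increases from the Quotas page in the Google Cloud console before launching production workloads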
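Request and token quotas interact: a high requests-per-minute limit doesn't help if each request is large enough to exhaust the tokens-per-minute budget first. This sketch computes the sustainable rate as the smaller of the two, assuming both input and output tokens count toward the token quota (check how your quota is defined). The 4,500-token request size is the RAG example from the pricing section, and `max_sustainable_rpm` is a hypothetical helper.

```python
def max_sustainable_rpm(rpm_limit: int, tpm_limit: int, tokens_per_request: int) -> int:
    """Requests/minute you can sustain before hitting either the request or the token quota."""
    return min(rpm_limit, tpm_limit // tokens_per_request)


request_size = 4_000 + 500  # input + output tokens per RAG query
print("Gemini 1.5 Pro:", max_sustainable_rpm(360, 4_000_000, request_size))
print("Gemini 1.5 Flash:", max_sustainable_rpm(1_000, 4_000_000, request_size))
```

With these figures, Pro is bound by its request limit (360/min), while Flash is bound by tokens (888/min, below its 1,000-request limit).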

Regional Availability:

Models are available in us-central1, us-east4, europe-west4, and several others
Deploy in the same region as your other GCP services to minimize latency and egress costs

Logging and Monitoring:

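import logging
from typing import Any
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.outputs import LLMResult
from langchain_google_vertexai import ChatVertexAI

logging.basicConfig(level=logging.INFO)

# StdOutCallbackHandler prints chain/agent events, not individual model calls,
# so a small custom handler logs each call instead (works locally and on GCP)
class LLMCallLogger(BaseCallbackHandler):
    def on_llm_start(self, serialized: dict[str, Any], prompts: list[str], **kwargs: Any) -> None:
        logging.info("LLM call started (%d prompt chars)", sum(len(p) for p in prompts))
    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        logging.info("LLM call finished (%d generation lists)", len(response.generations))

llm = ChatVertexAI(
    model_name="gemini-1.5-pro",
    project="your-project-id",
    location="us-central1",
    callbacks=[LLMCallLogger()]  # Log every call made through this model
)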

For deploying Vertex AI applications to production, the Deploy AI model to production guide covers Cloud Run, GKE, and serverless deployment patterns. For agent architectures compatible with Vertex AI, see Build AI agent with LangChain and AI agent memory and planning.

Frequently Asked Questions

Do I need a Google Cloud account to use LangChain with Vertex AI? Yes. Vertex AI requires a Google Cloud project with billing enabled. You authenticate either via Application Default Credentials (gcloud auth application-default login) or a service account JSON key. New GCP accounts receive $300 in free credits, which covers substantial Vertex AI usage for testing.

How does Gemini 1.5 Pro compare to GPT-4o for RAG applications? Gemini 1.5 Pro has a context window of up to 2 million tokens (vs 128K for GPT-4o), which makes it better suited to whole-document RAG where you want to pass entire PDFs. GPT-4o has broader third-party tooling support and is often faster on short prompts, though latency depends on region and load, so measure on your own workload. For raw context size, Gemini wins.

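Can I use Vertex AI embeddings with Pinecone or ChromaDB in LangChain? Yes. VertexAIEmbeddings implements the same Embeddings interface as OpenAIEmbeddings, so it works with any LangChain vector store integration: pass VertexAIEmbeddings(model_name="text-embedding-004", ...) as the embedding parameter and the rest of your RAG code stays unchanged. One caveat: vectors from different embedding models are not comparable, so an existing index built with OpenAI embeddings must be re-embedded with the Vertex AI model.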


Langchain

How to Use LangChain with Vertex AI (Google Gemini 2026)

⚡ Quick Answer

Integrate LangChain with Google Vertex AI and Gemini models. Complete guide covering ChatVertexAI, embeddings, multimodal inputs, function calling, and cost comparison.

AiTechWorlds Team May 31, 2026 12 min read

#LangChain #Vertex AI #Gemini #Google Cloud #multimodal

📚Part of the Langchain guide — explore all Langchain articles→

Share:Facebook Twitter/X LinkedIn Telegram WhatsApp

📱

Get more content like this on Telegram!

Daily AI tips, notes & resources — free

Join Free →

If you're building the same applications with OpenAI, see OpenAI API integration for comparison. The LangChain tutorial 2025 covers the shared LangChain patterns.

Why Vertex AI with LangChain?

Three reasons teams choose Vertex AI over other providers:

Context window — Gemini 1.5 Pro supports 1M tokens; Gemini 1.5 Flash supports 1M at lower cost
Google Cloud integration — Native access to BigQuery, Cloud Storage, GCS data, and GCP IAM
Multimodal — Video, audio, image, and text in a single API call

The LangChain langchain-google-vertexai package provides drop-in replacements for OpenAI classes — swap ChatOpenAI → ChatVertexAI and most of your existing code continues working.

Installation and Authentication

pip install langchain langchain-google-vertexai google-cloud-aiplatform

Authentication Option 1: Application Default Credentials (local development)

gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID

Authentication Option 2: Service Account (production)

import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"
os.environ["GOOGLE_CLOUD_PROJECT"] = "your-project-id"

Authentication Option 3: Explicit credentials in code

from google.oauth2 import service_account
from langchain_google_vertexai import ChatVertexAI

credentials = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)

llm = ChatVertexAI(
    model_name="gemini-1.5-pro",
    credentials=credentials,
    project="your-project-id",
    location="us-central1"
)

Basic Usage: ChatVertexAI

from langchain_google_vertexai import ChatVertexAI
from langchain_core.messages import HumanMessage, SystemMessage

# Initialize Gemini 1.5 Pro
llm = ChatVertexAI(
    model_name="gemini-1.5-pro",
    project="your-project-id",
    location="us-central1",
    temperature=0.1,
    max_output_tokens=2048,
)

# Simple invocation
response = llm.invoke([
    SystemMessage(content="You are a helpful assistant specialized in cloud architecture."),
    HumanMessage(content="What are the key differences between microservices and serverless architectures?")
])
print(response.content)

Available models (2026):

Model	Context Window	Best For
gemini-1.5-pro	1,000,000 tokens	Complex reasoning, long docs
gemini-1.5-flash	1,000,000 tokens	Fast, cost-effective tasks
gemini-1.0-pro	32,768 tokens	Standard chat and generation
gemini-1.5-pro-vision	1,000,000 tokens	Image + text analysis

VertexAI for Text Completion (Non-Chat)

For legacy completion-style workflows:

from langchain_google_vertexai import VertexAI

# Text completion (non-chat) model
text_llm = VertexAI(
    model_name="gemini-1.5-pro",
    project="your-project-id",
    location="us-central1",
    temperature=0,
    max_output_tokens=1024,
    top_p=0.8,
    top_k=40
)

# Simple completion
result = text_llm.invoke("Explain transformer self-attention in 3 sentences.")
print(result)

# Batch processing
results = text_llm.batch([
    "Explain gradient descent",
    "What is RLHF?",
    "Define embedding in ML"
])
for r in results:
    print(r[:200])

Vertex AI Embeddings

from langchain_google_vertexai import VertexAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document

# Initialize Vertex AI embeddings
embeddings = VertexAIEmbeddings(
    model_name="textembedding-gecko@003",  # or "text-embedding-004"
    project="your-project-id",
    location="us-central1"
)

# Embed a single text
embedding_vector = embeddings.embed_query("What is machine learning?")
print(f"Embedding dimension: {len(embedding_vector)}")  # 768 for gecko, 768 for text-embedding-004

# Embed multiple texts
texts = [
    "Machine learning is a subset of artificial intelligence",
    "Deep learning uses neural networks with many layers",
    "Reinforcement learning trains through reward signals"
]
embedded_docs = embeddings.embed_documents(texts)
print(f"Embedded {len(embedded_docs)} documents, dimension {len(embedded_docs[0])}")

# Use with ChromaDB (drop-in for OpenAI embeddings)
docs = [
    Document(page_content=text, metadata={"index": i})
    for i, text in enumerate(texts)
]

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name="vertex_demo"
)

results = vectorstore.similarity_search("neural networks", k=2)
for doc in results:
    print(doc.page_content)

Available embedding models:

Model	Dimensions	Best For
textembedding-gecko@003	768	General purpose
text-embedding-004	768	Latest, improved quality
textembedding-gecko-multilingual@001	768	100+ languages

Multimodal Inputs with Gemini Vision

One of Gemini's strongest advantages is native multimodal support:

from langchain_google_vertexai import ChatVertexAI
from langchain_core.messages import HumanMessage
import base64
from pathlib import Path

# Initialize vision-capable model
vision_llm = ChatVertexAI(
    model_name="gemini-1.5-pro",
    project="your-project-id",
    location="us-central1",
    temperature=0
)

def encode_image_base64(image_path: str) -> str:
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode()

# Image analysis
def analyze_image(image_path: str, question: str) -> str:
    image_data = encode_image_base64(image_path)
    
    message = HumanMessage(
        content=[
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image_data}"
                }
            },
            {
                "type": "text",
                "text": question
            }
        ]
    )
    
    response = vision_llm.invoke([message])
    return response.content

# Example: analyze a chart
result = analyze_image(
    "quarterly_revenue.png",
    "What trend do you see in this revenue chart? Identify any notable changes."
)
print(result)

# Video analysis (Gemini 1.5 exclusive feature)
def analyze_video_from_gcs(gcs_uri: str, question: str) -> str:
    """Analyze a video stored in Google Cloud Storage."""
    message = HumanMessage(
        content=[
            {
                "type": "media",
                "file_uri": gcs_uri,  # gs://bucket/video.mp4
                "mime_type": "video/mp4"
            },
            {
                "type": "text",
                "text": question
            }
        ]
    )
    
    response = vision_llm.invoke([message])
    return response.content

# Analyze a product demo video
video_analysis = analyze_video_from_gcs(
    "gs://my-bucket/product-demo.mp4",
    "Summarize the main features demonstrated in this product video."
)

Function Calling with ChatVertexAI

from langchain_core.tools import tool
from langchain_google_vertexai import ChatVertexAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

@tool
def get_gcp_project_info(project_id: str) -> str:
    """Get information about a GCP project including billing and resources."""
    # Mock implementation
    return f"Project {project_id}: Region us-central1, Budget $500/month, Services: GCS, BigQuery, Vertex AI"

@tool
def query_bigquery(sql: str, project_id: str) -> str:
    """Execute a BigQuery SQL query and return results."""
    # Mock implementation — replace with actual BigQuery client
    return f"Query executed. Result: 1,247 rows returned. Sample: [('2026-01-01', 15234), ('2026-01-02', 16891)]"

@tool
def list_gcs_buckets(project_id: str) -> str:
    """List Cloud Storage buckets in a GCP project."""
    return f"Buckets in {project_id}: ml-training-data, model-artifacts, raw-data, processed-features"

# Build Vertex AI agent
llm = ChatVertexAI(
    model_name="gemini-1.5-pro",
    project="your-project-id",
    location="us-central1",
    temperature=0
)

tools = [get_gcp_project_info, query_bigquery, list_gcs_buckets]
llm_with_tools = llm.bind_tools(tools)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a Google Cloud assistant. Use the available tools to answer questions about GCP resources."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({
    # Ask something the tools can actually answer (buckets and budget)
    "input": "Which Cloud Storage buckets does project your-project-id have, and what is its monthly budget?",
    "chat_history": []
})
print(result["output"])
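Under the hood, the agent alternates between the model and your functions. Gemini returns one or more tool calls, each with a tool name, JSON arguments and an ID. The executor runs the matching Python function and sends the result back as a tool message, repeating until the model answers without requesting a tool. The dispatch step of that loop can be sketched in plain Python. This is a simplification, not AgentExecutor's actual implementation, which also handles output parsing, errors and stopping conditions. TOOL_REGISTRY and run_tool_calls are hypothetical names, and fake_calls is a hand-written stand-in for a real model response.

```python
from collections.abc import Callable
from typing import Any


# Plain-Python stand-ins for two of the @tool functions above (same mock data)
def list_gcs_buckets(project_id: str) -> str:
    return f"Buckets in {project_id}: ml-training-data, model-artifacts, raw-data, processed-features"


def get_gcp_project_info(project_id: str) -> str:
    return f"Project {project_id}: Region us-central1, Budget $500/month"


TOOL_REGISTRY: dict[str, Callable[..., str]] = {
    "list_gcs_buckets": list_gcs_buckets,
    "get_gcp_project_info": get_gcp_project_info,
}


def run_tool_calls(tool_calls: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Execute each requested call and return one tool-result message per call."""
    results = []
    for call in tool_calls:
        func = TOOL_REGISTRY.get(call["name"])
        if func is None:
            # Report unknown tools back to the model instead of crashing the loop
            output = f"Error: unknown tool {call['name']!r}"
        else:
            output = func(**call["args"])
        # tool_call_id lets the model match each result to the call that produced it
        results.append({"role": "tool", "tool_call_id": call["id"], "content": output})
    return results


# Hypothetical model output, shaped like the entries of AIMessage.tool_calls
fake_calls = [{"name": "list_gcs_buckets", "args": {"project_id": "demo-project"}, "id": "call_1"}]
for message in run_tool_calls(fake_calls):
    print(message["content"])
```

Returning an error string for an unknown tool, rather than raising, lets the model see the mistake and try again on its next turn.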

RAG Pipeline with Vertex AI

Build a complete RAG system using Vertex AI embeddings and Gemini for generation:

from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# Initialize Vertex AI components
embeddings = VertexAIEmbeddings(
    model_name="text-embedding-004",
    project="your-project-id",
    location="us-central1"
)

llm = ChatVertexAI(
    model_name="gemini-1.5-pro",
    project="your-project-id",
    location="us-central1",
    temperature=0,
    max_output_tokens=4096
)

# Load and index documents
loader = PyPDFLoader("technical_manual.pdf")
pages = loader.load()

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100
)
chunks = splitter.split_documents(pages)

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="vertex_rag"
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# RAG prompt for Gemini
rag_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a technical documentation assistant.
    Answer questions based strictly on the provided context.
    If the answer is not in the context, say "This information is not in the provided documentation."
    Cite specific sections when possible."""),
    ("human", """Context:
{context}

Question: {question}

Answer:""")
])

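def format_docs(docs):
    """Join retrieved chunks, labelling each with its 1-based PDF page number."""
    parts = []
    for doc in docs:
        page = doc.metadata.get("page")  # PyPDFLoader stores 0-based page indices
        label = page + 1 if isinstance(page, int) else "?"
        parts.append(f"[Page {label}]\n{doc.page_content}")
    return "\n\n---\n\n".join(parts)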

rag_chain = (
    RunnableParallel({
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    })
    | rag_prompt
    | llm
    | StrOutputParser()
)

# Query the RAG system
answer = rag_chain.invoke("What are the safety requirements for high-voltage operations?")
print(answer)
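In the splitter settings above, chunk_size=1000 and chunk_overlap=100 are measured in characters, because the splitter's default length function is len. To show what the two numbers mean, here is a simplified fixed-window splitter. It is not RecursiveCharacterTextSplitter's algorithm: the real splitter tries to break on "\n\n", then "\n", then " " before falling back to single characters, so its chunks vary in length and consecutive chunks share up to, rather than exactly, chunk_overlap characters. split_fixed_window is a hypothetical name.

```python
def split_fixed_window(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Each chunk starts chunk_size - chunk_overlap characters after the previous one,
    so consecutive chunks share exactly chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


# 2,500 characters of a repeating alphabet, so the overlap is visible
text = "".join(chr(ord("a") + i % 26) for i in range(2500))
chunks = split_fixed_window(text)
print(len(chunks), [len(c) for c in chunks])  # 3 [1000, 1000, 700]
print(chunks[0][-100:] == chunks[1][:100])    # True: 100 shared characters
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, as long as it is shorter than chunk_overlap. The cost is about 10% more text to embed and store at these settings.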

Long-Context Processing with Gemini 1.5 Pro

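Gemini's 1M-token context window makes a simpler alternative to RAG possible for many documents. You can skip chunking and retrieval entirely and pass the whole document as context ("whole-document" prompting):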

from langchain_google_vertexai import ChatVertexAI
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.prompts import ChatPromptTemplate

llm = ChatVertexAI(
    model_name="gemini-1.5-pro",
    project="your-project-id",
    location="us-central1",
    temperature=0,
    max_output_tokens=8192
)

# Load an entire book or long document
loader = PyPDFLoader("full_textbook.pdf")
all_pages = loader.load()
full_text = "\n\n".join(page.page_content for page in all_pages)

print(f"Document length: {len(full_text.split())} words")
# → For a 500-page book: ~125,000 words ≈ 166,000 tokens (well within 1M limit)

# Ask questions about the entire document at once
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are analyzing a complete technical document. Answer questions about its full content."),
    ("human", """Here is the complete document:

{document}

Question: {question}""")
])

chain = prompt | llm

# No chunking or retrieval needed for documents under ~700K words
answer = chain.invoke({
    "document": full_text,
    "question": "What are the three most important concepts introduced in Chapter 7, and how do they relate to each other?"
})
print(answer.content)
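Before sending a whole document, it helps to check whether it fits or whether it needs the chunked RAG pipeline instead. Below is a rough pre-check. It assumes about 4/3 tokens per English word, the ratio behind the estimate above of 125,000 words at about 166,000 tokens, and the 1M-token input limit stated in this article. Word counts only approximate Gemini's tokenizer, so use the model's token-counting API when you need exact numbers. The names, the prompt overhead and the 10% safety margin are illustrative assumptions.

```python
import math

# Assumptions: ~4/3 tokens per English word and a 1M-token input limit
TOKENS_PER_WORD = 4 / 3
INPUT_TOKEN_LIMIT = 1_000_000


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count."""
    return math.ceil(len(text.split()) * TOKENS_PER_WORD)


def fits_in_context(text: str, prompt_overhead: int = 1_000, safety_margin: float = 0.1) -> bool:
    """True if document + prompt overhead stays under the limit minus a safety margin."""
    budget = INPUT_TOKEN_LIMIT * (1 - safety_margin)
    return estimate_tokens(text) + prompt_overhead <= budget


book = "word " * 125_000          # a 500-page book, as in the example above
print(estimate_tokens(book))      # 166667
print(fits_in_context(book))      # True
print(fits_in_context("word " * 800_000))  # False: fall back to chunked RAG
```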

Vertex AI vs OpenAI Pricing Comparison (2026)

Model	Input (per 1M tokens)	Output (per 1M tokens)	Context Window
Gemini 1.5 Pro (≤128K)	$1.25	$5.00	1M tokens
Gemini 1.5 Pro (>128K)	$2.50	$10.00	1M tokens
Gemini 1.5 Flash (≤128K)	$0.075	$0.30	1M tokens
Gemini 1.5 Flash (>128K)	$0.15	$0.60	1M tokens
GPT-4o	$5.00	$15.00	128K tokens
GPT-4o-mini	$0.15	$0.60	128K tokens
Claude 3.5 Sonnet	$3.00	$15.00	200K tokens

Cost analysis for 10K queries/day (RAG, 4K input tokens and 500 output tokens per query):

Gemini 1.5 Flash: input $0.075/M × 40M tokens (4K × 10K) = $3.00/day; output $0.30/M × 5M tokens (500 × 10K) = $1.50/day; total $4.50/day
GPT-4o: input $5.00/M × 40M tokens = $200/day; output $15.00/M × 5M tokens = $75/day; total $275/day
GPT-4o-mini: input $0.15/M × 40M tokens = $6/day; output $0.60/M × 5M tokens = $3/day; total $9/day

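Of the models compared, Gemini 1.5 Flash is the cheapest option for this RAG workload, at half the cost of GPT-4o-mini ($4.50 vs $9 per day). Gemini 1.5 Pro competes with Claude 3.5 Sonnet at lower per-token prices for prompts up to 128K tokens: $1.25 vs $3.00 per 1M input tokens and $5.00 vs $15.00 per 1M output tokens.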
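The daily-cost arithmetic above generalizes to any traffic profile: multiply tokens per query by queries per day, then by the per-million-token price, separately for input and output. The sketch below uses the per-token prices from the table above (the ≤128K tier for Gemini). daily_cost and PRICES are illustrative names. Swap in your own volumes and current prices.

```python
def daily_cost(queries_per_day: int, tokens_in: int, tokens_out: int,
               price_in_per_m: float, price_out_per_m: float) -> float:
    """Daily spend in USD, given prices per 1M input and output tokens."""
    input_cost = price_in_per_m * tokens_in * queries_per_day / 1_000_000
    output_cost = price_out_per_m * tokens_out * queries_per_day / 1_000_000
    return input_cost + output_cost


# (input, output) USD per 1M tokens, taken from the pricing table above
PRICES = {
    "Gemini 1.5 Flash": (0.075, 0.30),
    "Gemini 1.5 Pro": (1.25, 5.00),
    "GPT-4o": (5.00, 15.00),
    "GPT-4o-mini": (0.15, 0.60),
    "Claude 3.5 Sonnet": (3.00, 15.00),
}

# 10K queries/day, 4K input tokens and 500 output tokens per query
for model, (p_in, p_out) in PRICES.items():
    print(f"{model}: ${daily_cost(10_000, 4_000, 500, p_in, p_out):,.2f}/day")
```

Because input tokens outnumber output tokens 8:1 in this workload, the input price dominates. Trimming retrieved context (a smaller k or shorter chunks) is usually the biggest cost lever.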

Streaming Responses

from langchain_google_vertexai import ChatVertexAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatVertexAI(
    model_name="gemini-1.5-flash",
    project="your-project-id",
    location="us-central1",
    streaming=True  # Enable streaming
)

prompt = ChatPromptTemplate.from_template(
    "Write a detailed explanation of {topic} for a software engineer audience."
)

chain = prompt | llm | StrOutputParser()

# Synchronous streaming
print("Streaming response:")
for chunk in chain.stream({"topic": "transformer attention mechanisms"}):
    print(chunk, end="", flush=True)
print()

# Async streaming
import asyncio

async def stream_async(topic: str):
    print("\nAsync streaming:")
    async for chunk in chain.astream({"topic": topic}):
        print(chunk, end="", flush=True)
    print()

asyncio.run(stream_async("RLHF training process"))
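With StrOutputParser at the end of the chain, each streamed chunk is a plain string delta, so joining the chunks reproduces the full answer. This is useful when you want to show tokens live and also keep the complete text for logging or caching. In the sketch below, fake_stream is a hypothetical async generator that stands in for chain.astream(...), so the example runs without GCP credentials. With the real chain you would pass chain.astream({"topic": ...}) to collect().

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_stream(text: str, chunk_size: int = 8) -> AsyncIterator[str]:
    """Hypothetical stand-in for chain.astream(): yields the response as string deltas."""
    for start in range(0, len(text), chunk_size):
        await asyncio.sleep(0)  # hand control back to the event loop, like a network read
        yield text[start:start + chunk_size]


async def collect(stream: AsyncIterator[str]) -> str:
    """Print each delta as it arrives and return the assembled response."""
    parts: list[str] = []
    async for chunk in stream:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts)  # concatenated deltas == full response


full_text = asyncio.run(collect(fake_stream("Attention lets each token weigh every other token.")))
```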

Switching Between Providers

LangChain's chat models and embeddings share common interfaces, so you can switch between Vertex AI, OpenAI, and Anthropic by changing only how the model objects are constructed:

import os

# Switch via environment variable
PROVIDER = os.getenv("LLM_PROVIDER", "vertex")

if PROVIDER == "vertex":
    from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings
    llm = ChatVertexAI(
        model_name="gemini-1.5-pro",
        project=os.environ["GCP_PROJECT"],
        location="us-central1"
    )
    embeddings = VertexAIEmbeddings(
        model_name="text-embedding-004",
        project=os.environ["GCP_PROJECT"],
        location="us-central1"
    )
elif PROVIDER == "openai":
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings
    llm = ChatOpenAI(model="gpt-4o")
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
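elif PROVIDER == "anthropic":
    from langchain_anthropic import ChatAnthropic
    from langchain_openai import OpenAIEmbeddings  # Anthropic has no embedding model
    llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
else:
    raise ValueError(f"Unsupported LLM_PROVIDER: {PROVIDER!r}")

# The rest of your RAG chain, agent code, etc. stays identical.
# Note: vectors from different embedding models are not comparable, so switching
# the embedding provider means re-indexing any existing vector store.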

Async Batch Processing on Vertex AI

import asyncio
from langchain_google_vertexai import ChatVertexAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatVertexAI(
    model_name="gemini-1.5-flash",
    project="your-project-id",
    location="us-central1",
    temperature=0
)

async def process_documents_async(documents: list[str]) -> list[str | BaseException]:
    """Summarize documents concurrently; failed requests come back as exception objects."""
    prompt = ChatPromptTemplate.from_template(
        "Summarize this document in 2 sentences: {doc}"
    )
    chain = prompt | llm | StrOutputParser()
    
    # Cap the number of in-flight requests so a large batch stays within your
    # project's per-minute quota; 20 is a starting point, tune it to your quota
    semaphore = asyncio.Semaphore(20)
    
    async def process_one(doc: str) -> str:
        async with semaphore:
            return await chain.ainvoke({"doc": doc})
    
    tasks = [process_one(doc) for doc in documents]
    # return_exceptions=True keeps one failed request from discarding the other results
    return await asyncio.gather(*tasks, return_exceptions=True)

# Process 100 documents concurrently, at most 20 at a time
documents = [f"Technical document {i} with content about ML systems..." for i in range(100)]
summaries = asyncio.run(process_documents_async(documents))
print(f"Processed {len(summaries)} documents")
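The semaphore pattern above can be checked without calling Vertex AI. In this sketch, fake_summarize is a hypothetical stand-in for chain.ainvoke. It records how many calls are in flight and fails once on purpose to imitate a quota error. This confirms two properties: concurrency never exceeds the limit, and with return_exceptions=True a failure comes back as an exception object in its original position instead of discarding the other results. bounded_gather is an illustrative name, not a LangChain API.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable
from typing import Any


async def bounded_gather(items: Iterable[Any], worker: Callable[[Any], Awaitable[Any]],
                         limit: int) -> list[Any]:
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: Any) -> Any:
        async with semaphore:  # released even if worker raises
            return await worker(item)

    return await asyncio.gather(*(run_one(item) for item in items), return_exceptions=True)


stats = {"in_flight": 0, "peak": 0}


async def fake_summarize(doc: str) -> str:
    """Hypothetical stand-in for chain.ainvoke that tracks concurrency."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    try:
        await asyncio.sleep(0.01)  # simulated network latency
        if doc == "doc-13":
            raise RuntimeError("simulated quota error")
        return f"summary of {doc}"
    finally:
        stats["in_flight"] -= 1


results = asyncio.run(bounded_gather((f"doc-{i}" for i in range(100)), fake_summarize, limit=20))
errors = [r for r in results if isinstance(r, BaseException)]
print(f"peak concurrency: {stats['peak']}, ok: {len(results) - len(errors)}, failed: {len(errors)}")
```

In real code you would typically retry the failed items with a backoff rather than simply counting them.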

Production Considerations

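Quotas and Rate Limits (typical defaults; limits vary by project, region, and model, so check the Quotas page in the Google Cloud console for your current values):

Gemini 1.5 Pro: 360 requests/minute, 4M tokens/minute
Gemini 1.5 Flash: 1,000 requests/minute, 4M tokens/minute
For production workloads, request quota increases through the Quotas page in the Google Cloud console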
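These quotas also tell you how large a semaphore is useful for batch jobs. Little's law says that the number of requests in flight equals throughput times latency, so once the binding quota fixes the throughput, extra concurrency only produces quota errors. The sketch below takes the smaller of the request quota and the token quota (divided by tokens per request). It uses the quota figures listed above, 4,500 tokens per request (the 4K-in/500-out RAG workload from the pricing section) and an assumed 2-second average latency. Measure your own latency before relying on the result. max_useful_concurrency is an illustrative name.

```python
import math


def max_useful_concurrency(rpm_quota: int, tpm_quota: int,
                           tokens_per_request: int, avg_latency_s: float) -> int:
    """Little's law: requests in flight = throughput (requests/s) x latency (s)."""
    # Whichever quota binds first sets the sustainable request rate
    effective_rpm = min(rpm_quota, tpm_quota // tokens_per_request)
    return max(1, math.floor(effective_rpm / 60 * avg_latency_s))


# Gemini 1.5 Pro: the 360 RPM quota binds (4M TPM / 4,500 tokens allows 888 RPM)
print(max_useful_concurrency(360, 4_000_000, 4_500, 2.0))    # 12
# Gemini 1.5 Flash: the token quota binds (888 RPM < 1,000 RPM)
print(max_useful_concurrency(1_000, 4_000_000, 4_500, 2.0))  # 29
```

Under these assumptions, Semaphore(20) suits Flash but would push Gemini 1.5 Pro past its request quota. Slower responses raise the useful limit and faster ones lower it.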

Regional Availability:

Models are available in us-central1, us-east4, europe-west4, and several others
Deploy in the same region as your other GCP services to minimize latency and egress costs

Logging and Monitoring:

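from langchain_core.tracers import ConsoleCallbackHandler
from langchain_google_vertexai import ChatVertexAI

# In production on GCP, monitor Vertex AI usage with Cloud Logging and Cloud Monitoring.
# For local development, use a LangChain callback handler to print each call.
llm = ChatVertexAI(
    model_name="gemini-1.5-pro",
    project="your-project-id",
    location="us-central1",
    # ConsoleCallbackHandler prints the inputs and outputs of every run, including
    # LLM calls (StdOutCallbackHandler only reports chain, agent, and tool events)
    callbacks=[ConsoleCallbackHandler()]
)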


AiTechWorlds Team


The AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.
