How to Use LangChain with Weaviate (Hybrid Search 2026)
Connect LangChain to Weaviate for hybrid vector and keyword search. Covers local and cloud setup, nearText, BM25, metadata filtering, and a comparison table.
Most vector database tutorials stop at cosine similarity. You embed your query, find the nearest neighbors, and call it done. That works reasonably well until you hit a common failure mode: a user types an exact product code, a legal citation, or a technical acronym that the embedding model has never seen. The semantic search returns loosely related results when what the user wanted was an exact keyword match.
Weaviate's hybrid search solves this by running both a vector search and a BM25 keyword search simultaneously, then blending the scores. The result is a retrieval system that is both semantically aware and keyword-precise. This guide shows you how to wire it up with LangChain, starting from a local Docker instance and ending with a production-ready pattern.
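To make the keyword half of that blend concrete, here is a minimal pure-Python BM25 sketch. It is an illustration only: the whitespace tokenizer, the sample documents, and the `bm25_scores` helper are assumptions made for this example, and Weaviate's own tokenization and scoring internals may differ.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical documents for illustration
docs = [
    "Error E-4021-B occurs when the network interface times out",
    "General guidance on network troubleshooting and timeouts",
    "How to reset the admin password",
]
scores = bm25_scores("E-4021-B", docs)
print(scores)
```

Only the first document contains the literal token, so it is the only one with a non-zero score. That exact-match behavior is what the keyword side contributes when an embedding model has no useful representation for an identifier.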
What Makes Weaviate Different
Before diving into code, it is worth understanding what Weaviate brings to the table compared to simpler vector stores. The vector database guide covers the landscape, but Weaviate specifically offers:
- Native hybrid search: BM25 + vector in a single query, not two separate requests merged in Python
- Multi-tenancy: isolate collections per user or tenant without running separate instances
- Generative search: pipe retrieval results directly to an LLM in a single Weaviate query
- HNSW + Product Quantization: memory-efficient indexing for large collections
- Schema flexibility: optional strict schema or auto-schema creation
Industry data from the 2025 Weaviate benchmark report shows hybrid search achieving 18-23% higher NDCG@10 scores compared to pure vector search on enterprise document collections with mixed content types.
Environment Setup
Start a local Weaviate instance with Docker:
docker run -d \
-p 8080:8080 \
-p 50051:50051 \
--name weaviate \
-e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
-e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
-e ENABLE_MODULES="text2vec-openai,generative-openai" \
-e OPENAI_APIKEY=$OPENAI_API_KEY \
cr.weaviate.io/semitechnologies/weaviate:1.25.0
Install Python dependencies:
pip install langchain-weaviate weaviate-client langchain-openai langchain-community
Connecting LangChain to Weaviate
import weaviate
from weaviate.classes.init import Auth
from langchain_weaviate.vectorstores import WeaviateVectorStore
from langchain_openai import OpenAIEmbeddings
# Local connection
client = weaviate.connect_to_local(
host="localhost",
port=8080,
grpc_port=50051
)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Create or connect to a collection
vector_store = WeaviateVectorStore(
client=client,
index_name="DocumentChunk",
text_key="content",
embedding=embeddings,
attributes=["source", "doc_type", "created_at", "chunk_id"]
)
print("Connected to Weaviate:", client.is_ready())
For Weaviate Cloud Services (WCS):
import weaviate
from weaviate.classes.init import Auth
# Cloud connection
client = weaviate.connect_to_weaviate_cloud(
cluster_url="https://your-cluster.weaviate.network",
auth_credentials=Auth.api_key("your-weaviate-api-key"),
headers={"X-OpenAI-Api-Key": "your-openai-api-key"}
)
vector_store = WeaviateVectorStore(
client=client,
index_name="DocumentChunk",
text_key="content",
embedding=embeddings
)
The connection swap is the only change between local and cloud; the rest of your code stays identical.
Ingesting Documents
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_weaviate.vectorstores import WeaviateVectorStore
# Load and split documents
loader = PyPDFLoader("technical_manual.pdf")
documents = loader.load()
splitter = RecursiveCharacterTextSplitter(
chunk_size=800,
chunk_overlap=100,
separators=["\n\n", "\n", ". ", " ", ""]
)
chunks = splitter.split_documents(documents)
# Add metadata to each chunk
for i, chunk in enumerate(chunks):
    chunk.metadata.update({
        "chunk_id": f"chunk_{i:04d}",
        "doc_type": "technical_manual",
        "created_at": "2026-05-31"
    })
# Ingest with auto-generated embeddings
ids = vector_store.add_documents(chunks)
print(f"Ingested {len(ids)} chunks into Weaviate")
Pure Vector Search with nearText
# Simple similarity search
query = "How do I configure the network interface?"
results = vector_store.similarity_search(query, k=5)
for doc in results:
    print(f"Source: {doc.metadata.get('source', 'unknown')}")
    print(f"Content: {doc.page_content[:200]}")
    print("---")
With scores:
results_with_scores = vector_store.similarity_search_with_score(query, k=5)
for doc, score in results_with_scores:
    print(f"Score: {score:.4f} | {doc.page_content[:150]}")
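The score is documented as cosine distance, where lower is closer. Recent langchain-weaviate releases route this call through Weaviate's hybrid query, so confirm the score direction on your installed version before applying a threshold. For semantic concepts this works well, but notice what happens with exact product codes: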
# This often fails with pure vector search
exact_query = "Error code E-4021-B network timeout"
results = vector_store.similarity_search(exact_query, k=3)
# Results may be thematically related but miss the exact code
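For reference, cosine distance can be computed by hand as one minus cosine similarity. The sketch below uses tiny made-up vectors (real embeddings have hundreds or thousands of dimensions) and a hypothetical `cosine_distance` helper, just to show why identical directions score 0 and unrelated directions score 1.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy 2-dimensional vectors for illustration only
query_vec = [1.0, 0.0]
same_direction = [2.0, 0.0]   # longer, but pointing the same way
orthogonal = [0.0, 3.0]       # unrelated direction

print(cosine_distance(query_vec, same_direction))
print(cosine_distance(query_vec, orthogonal))
```

Vector length does not matter, only direction. An unseen product code tends to land near generic "error" or "network" text rather than near the one chunk that contains the code.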
Hybrid Search: Combining nearText and BM25
This is where Weaviate's hybrid search shines. The alpha parameter controls the blend:
- alpha=1.0: pure vector search
- alpha=0.0: pure BM25 keyword search
- alpha=0.5: equal weight to both (recommended starting point)
# Hybrid search via LangChain retriever
retriever = vector_store.as_retriever(
search_type="similarity",
search_kwargs={
"k": 10,
"alpha": 0.5 # 50/50 blend of vector and BM25
}
)
# Query that benefits from hybrid: has both semantic and keyword components
query = "Error code E-4021-B network timeout troubleshooting steps"
hybrid_results = retriever.invoke(query)
for doc in hybrid_results:
    print(doc.page_content[:200])
    print("---")
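To see what the alpha blend does numerically, the following sketch fuses two score sets by hand. It assumes min-max normalization of each side before weighting, which is in the spirit of a relative-score fusion; the `min_max` and `blend` helpers and all scores are hypothetical, and Weaviate's exact normalization may differ.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def blend(vector_scores, keyword_scores, alpha=0.5):
    """alpha=1.0 keeps only the vector side, alpha=0.0 only the keyword side."""
    v, k = min_max(vector_scores), min_max(keyword_scores)
    fused = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * k.get(doc_id, 0.0)
        for doc_id in set(v) | set(k)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores: doc_b contains the exact error code, doc_a is only on-topic
vector_scores = {"doc_a": 0.82, "doc_b": 0.78, "doc_c": 0.40}
keyword_scores = {"doc_b": 7.1, "doc_c": 1.3}

ranking = blend(vector_scores, keyword_scores, alpha=0.5)
print(ranking[0][0])
```

With alpha=1.0 the on-topic document wins. At alpha=0.5 the document with the exact code moves to the top because it scores well on both sides.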
For direct Weaviate hybrid queries with full control:
from weaviate.classes.query import HybridFusion
# Using the Weaviate client directly for maximum control
collection = client.collections.get("DocumentChunk")
response = collection.query.hybrid(
    query=query,
    # LangChain embeds client-side, so the collection may have no server-side
    # vectorizer; supply the query vector explicitly.
    vector=embeddings.embed_query(query),
    alpha=0.5,
    fusion_type=HybridFusion.RELATIVE_SCORE,  # or RANKED
    limit=10,
    return_properties=["content", "source", "doc_type"],
    return_metadata=["score", "explain_score"]
)
for obj in response.objects:
    print(f"Score: {obj.metadata.score:.4f}")
    print(f"Content: {obj.properties['content'][:200]}")
    print(f"Explanation: {obj.metadata.explain_score}")
    print("---")
The explain_score field is invaluable for debugging: it tells you which component (vector or BM25) contributed how much to each result's final score.
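The ranked alternative to relative-score fusion can be sketched in a few lines. This version assumes a reciprocal-rank formula with a constant of 60 and zero-based ranks; both are assumptions for illustration, not a statement of Weaviate's exact implementation. The per-component breakdown it returns is a hand-rolled stand-in for the kind of information explain_score surfaces.

```python
def ranked_fusion(vector_ranking, keyword_ranking, alpha=0.5, k=60):
    """Fuse two ranked lists of doc ids using weighted reciprocal-rank scores."""
    contributions = {}
    for weight, ranking, label in (
        (alpha, vector_ranking, "vector"),
        (1 - alpha, keyword_ranking, "keyword"),
    ):
        for rank, doc_id in enumerate(ranking):
            part = contributions.setdefault(doc_id, {"vector": 0.0, "keyword": 0.0})
            part[label] = weight / (rank + k)

    fused = {doc_id: p["vector"] + p["keyword"] for doc_id, p in contributions.items()}
    order = sorted(fused, key=fused.get, reverse=True)
    return [(doc_id, fused[doc_id], contributions[doc_id]) for doc_id in order]


# Hypothetical rankings: doc_a only appears in the vector results
results = ranked_fusion(["doc_a", "doc_b", "doc_c"], ["doc_b", "doc_c"], alpha=0.5)
for doc_id, score, parts in results:
    print(doc_id, round(score, 5), parts)
```

Documents that appear in both lists accumulate two contributions, so they outrank a document that tops only one list. Only positions matter here, not raw scores, which is the main practical difference from relative-score fusion.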
Metadata Filtering
One of Weaviate's strengths is combining hybrid search with metadata filters. You get semantic + keyword matching constrained to a specific subset of your data.
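from weaviate.classes.query import Filter, MetadataQuery
# Filter by doc_type then hybrid search within that subset
filtered_query = "network configuration error"
response = collection.query.hybrid(
    query=filtered_query,
    vector=embeddings.embed_query(filtered_query),  # embed client-side with the LangChain embeddings
    alpha=0.6,
    limit=5,
    filters=Filter.by_property("doc_type").equal("technical_manual"),
    return_metadata=MetadataQuery(score=True)  # without this, metadata.score is None
)
for obj in response.objects:
    print(f"DocType: {obj.properties.get('doc_type')} | Score: {obj.metadata.score:.4f}")
    print(obj.properties["content"][:150])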
Complex filter combinations:
from weaviate.classes.query import Filter
# Multiple filter conditions
compound_filter = (
Filter.by_property("doc_type").equal("technical_manual") &
Filter.by_property("created_at").greater_than("2026-01-01")
)
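compound_query = "installation requirements"
response = collection.query.hybrid(
    query=compound_query,
    vector=embeddings.embed_query(compound_query),  # embed client-side with the LangChain embeddings
    alpha=0.5,
    filters=compound_filter,
    limit=8
)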
Via LangChain's interface:
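# langchain-weaviate forwards extra keyword arguments to the Weaviate query,
# so pass a Filter object rather than a v3-style where_filter dict.
results = vector_store.similarity_search(
    query="installation requirements",
    k=8,
    filters=Filter.by_property("doc_type").equal("technical_manual")
)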
Building a Hybrid RAG Chain
Now let's integrate hybrid search into a full RAG pipeline:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Hybrid retriever with tuned alpha
hybrid_retriever = vector_store.as_retriever(
search_type="similarity",
search_kwargs={
"k": 6,
"alpha": 0.55 # Slightly favor vector for most use cases
}
)
prompt = ChatPromptTemplate.from_template("""
You are a technical support specialist. Use the following document excerpts to answer the question accurately.
Context:
{context}
Question: {question}
Provide a clear, step-by-step answer. If the information is not in the context, say so explicitly.
""")
def format_docs(docs):
    return "\n\n".join(
        f"[Source: {doc.metadata.get('source', 'unknown')}]\n{doc.page_content}"
        for doc in docs
    )
hybrid_rag_chain = (
{"context": hybrid_retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
# Test with both semantic and keyword-heavy queries
response = hybrid_rag_chain.invoke("What are the steps to resolve error E-4021-B?")
print(response)
response = hybrid_rag_chain.invoke("Explain the general approach to network troubleshooting")
print(response)
This pairs naturally with the patterns in the RAG system tutorial and builds on what you learn in the LangChain tutorial 2025.
Multi-Tenant Search
Weaviate's multi-tenancy feature is useful when you host a document search service for multiple customers and need strict data isolation.
import weaviate
from weaviate.classes.config import Configure, Property, DataType
# Create a multi-tenant collection
client.collections.create(
name="CustomerDocuments",
multi_tenancy_config=Configure.multi_tenancy(enabled=True),
properties=[
Property(name="content", data_type=DataType.TEXT),
Property(name="doc_type", data_type=DataType.TEXT),
Property(name="source", data_type=DataType.TEXT),
]
)
# Add tenants
collection = client.collections.get("CustomerDocuments")
collection.tenants.create([
weaviate.classes.tenants.Tenant(name="acme_corp"),
weaviate.classes.tenants.Tenant(name="globex_inc"),
])
# Ingest data for a specific tenant
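acme_collection = collection.with_tenant("acme_corp")
policy_text = "ACME Corp internal network policy document content here..."
acme_collection.data.insert(
    properties={
        "content": policy_text,
        "doc_type": "policy",
        "source": "acme_intranet"
    },
    # The collection above defines no vectorizer, so supply the vector ourselves
    vector=embeddings.embed_documents([policy_text])[0]
)
# Query is automatically isolated to the tenant
tenant_query = "network policy"
response = acme_collection.query.hybrid(
    query=tenant_query,
    vector=embeddings.embed_query(tenant_query),
    alpha=0.5,
    limit=5
)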
With LangChain, you can create a tenant-scoped vector store per user request:
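def get_tenant_retriever(tenant_id: str) -> object:
    """Return a retriever scoped to a specific tenant."""
    tenant_store = WeaviateVectorStore(
        client=client,
        index_name="CustomerDocuments",
        text_key="content",
        embedding=embeddings,
        use_multi_tenancy=True
    )
    # langchain-weaviate takes the tenant per call rather than in the constructor,
    # so pass it through search_kwargs (verify against your installed version)
    return tenant_store.as_retriever(
        search_kwargs={"k": 5, "alpha": 0.5, "tenant": tenant_id}
    )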
# Per-request retriever creation
acme_retriever = get_tenant_retriever("acme_corp")
globex_retriever = get_tenant_retriever("globex_inc")
This isolation model is critical for enterprise deployments. The Build AI agent with LangChain guide covers how to route requests to tenant-specific components.
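Building a retriever on every request is cheap but not free, so one option is to cache them per tenant and reject unknown tenant IDs before touching the database. The sketch below is self-contained: `build_retriever` is a stand-in for a function like get_tenant_retriever, and `ALLOWED_TENANTS` and `cached_retriever` are hypothetical names introduced for this example.

```python
from functools import lru_cache

# Hypothetical allow-list; in practice this would come from your auth layer
ALLOWED_TENANTS = {"acme_corp", "globex_inc"}


def build_retriever(tenant_id):
    """Stand-in for an expensive constructor such as get_tenant_retriever."""
    return {"tenant": tenant_id, "k": 5, "alpha": 0.5}


@lru_cache(maxsize=128)
def cached_retriever(tenant_id):
    """Return one shared retriever per tenant, refusing unknown tenants."""
    if tenant_id not in ALLOWED_TENANTS:
        raise PermissionError(f"unknown tenant: {tenant_id}")
    return build_retriever(tenant_id)


first = cached_retriever("acme_corp")
second = cached_retriever("acme_corp")
print(first is second)
```

The allow-list check matters more than the cache: the tenant ID should come from authenticated session data, never from free-form user input.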
Alpha Tuning: Finding the Right Blend
The right alpha value depends on your content and query patterns. Here is a systematic approach to tuning:
from langchain_core.runnables import RunnableLambda
import statistics
def evaluate_alpha(alpha: float, test_queries: list, ground_truth: list) -> dict:
    """Evaluate retrieval quality at a given alpha value."""
    retriever = vector_store.as_retriever(
        search_kwargs={"k": 5, "alpha": alpha}
    )
    hit_rates = []
    for query, relevant_chunks in zip(test_queries, ground_truth):
        results = retriever.invoke(query)
        retrieved_contents = [r.page_content for r in results]
        hits = sum(1 for gt in relevant_chunks if any(gt in r for r in retrieved_contents))
        hit_rate = hits / len(relevant_chunks) if relevant_chunks else 0
        hit_rates.append(hit_rate)
    return {
        "alpha": alpha,
        "avg_hit_rate": statistics.mean(hit_rates),
        "min_hit_rate": min(hit_rates),
        "max_hit_rate": max(hit_rates)
    }
test_queries = [
"Error code E-4021-B",
"How does the authentication system work?",
"network timeout configuration"
]
ground_truth = [
["Error E-4021-B occurs when", "E-4021-B network timeout"],
["authentication uses JWT tokens", "the auth flow validates"],
["timeout_seconds parameter", "network interface timeout setting"]
]
# Test a range of alpha values
for alpha in [0.0, 0.25, 0.5, 0.75, 1.0]:
    result = evaluate_alpha(alpha, test_queries, ground_truth)
    print(f"Alpha {alpha:.2f}: Hit Rate = {result['avg_hit_rate']:.3f}")
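Once the sweep has run, you still need a rule for choosing a winner. One reasonable sketch is to prefer the highest average hit rate and break ties on the worst-case query. The `hit_rate` and `pick_best_alpha` helpers are hypothetical, and the numbers in `sweep` are invented placeholders, not measurements.

```python
def hit_rate(retrieved_texts, relevant_snippets):
    """Fraction of expected snippets found in any retrieved text."""
    if not relevant_snippets:
        return 0.0
    hits = sum(
        1 for snippet in relevant_snippets
        if any(snippet in text for text in retrieved_texts)
    )
    return hits / len(relevant_snippets)


def pick_best_alpha(results):
    """Choose by average hit rate, breaking ties on the worst-case query."""
    best = max(results, key=lambda r: (r["avg_hit_rate"], r["min_hit_rate"]))
    return best["alpha"]


# Invented placeholder numbers in the shape returned by an alpha sweep
sweep = [
    {"alpha": 0.0, "avg_hit_rate": 0.50, "min_hit_rate": 0.0},
    {"alpha": 0.5, "avg_hit_rate": 0.83, "min_hit_rate": 0.5},
    {"alpha": 1.0, "avg_hit_rate": 0.67, "min_hit_rate": 0.0},
]
print(pick_best_alpha(sweep))
```

Three test queries are enough to show the mechanics, but too few to trust the result. A few dozen queries drawn from real logs give a far more stable signal.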
Weaviate Search Mode Comparison
| Search Mode | Semantic Understanding | Exact Keyword Match | Metadata Filter | Speed |
|---|---|---|---|---|
| nearText (vector only) | Excellent | Poor | Yes | Fast |
| BM25 (keyword only) | Poor | Excellent | Yes | Very fast |
| Hybrid (alpha=0.5) | Good | Good | Yes | Fast |
| Hybrid (alpha=0.75) | Very good | Moderate | Yes | Fast |
| Hybrid (alpha=0.25) | Moderate | Very good | Yes | Fast |
| Generative search | Excellent | Good | Yes | Slowest |
This mirrors findings in the semantic search tutorial: there is no single best configuration, only best configurations for specific query distributions.
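If your traffic mixes both query styles, one option is to pick alpha per query instead of fixing it globally. The heuristic below is a deliberately crude sketch: it treats any token mixing letters and digits as an identifier and leans toward keyword search when it finds one. The function names and the 0.25 / 0.75 values are assumptions for illustration (they match two rows of the table above) and should be validated against your own evaluation set.

```python
def looks_like_identifier(token):
    """True for tokens mixing letters and digits, e.g. 'E-4021-B' or 'RFC2616'."""
    has_digit = any(c.isdigit() for c in token)
    has_alpha = any(c.isalpha() for c in token)
    return has_digit and has_alpha


def choose_alpha(query, keyword_alpha=0.25, semantic_alpha=0.75):
    """Lean toward BM25 when the query contains an identifier-like token."""
    tokens = query.replace("?", " ").replace(",", " ").split()
    if any(looks_like_identifier(token) for token in tokens):
        return keyword_alpha
    return semantic_alpha


print(choose_alpha("What are the steps to resolve error E-4021-B?"))
print(choose_alpha("Explain the general approach to network troubleshooting"))
```

The chosen value can then be passed as the alpha entry in search_kwargs or directly to a hybrid query.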
Streaming Results with LangChain
For a better user experience in chat interfaces, stream the RAG response:
from langchain_core.callbacks import StreamingStdOutCallbackHandler
streaming_llm = ChatOpenAI(
model="gpt-4o-mini",
streaming=True,
callbacks=[StreamingStdOutCallbackHandler()]
)
streaming_chain = (
{"context": hybrid_retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| streaming_llm
| StrOutputParser()
)
# Tokens stream to stdout as they arrive
for chunk in streaming_chain.stream("How do I reset the admin password?"):
    pass  # StreamingStdOutCallbackHandler handles printing
Weaviate Backend Cleanup
Always close the Weaviate client connection when your application shuts down:
import atexit
@atexit.register
def cleanup():
    client.close()
    print("Weaviate connection closed.")
# Or use as a context manager in scripts:
with weaviate.connect_to_local() as client:
    store = WeaviateVectorStore(
        client=client,
        index_name="DocumentChunk",
        text_key="content",
        embedding=embeddings
    )
    results = store.similarity_search("test query", k=3)
# Connection closes automatically when the block exits
Combining Hybrid Search with Agent Tools
Wrapping the hybrid retriever as a LangChain tool makes it accessible to agents built with the patterns in Build AI agent with LangChain:
from langchain.tools.retriever import create_retriever_tool
hybrid_search_tool = create_retriever_tool(
retriever=hybrid_retriever,
name="hybrid_document_search",
description=(
"Search the technical documentation using hybrid vector and keyword search. "
"Use this for both broad conceptual questions and specific error codes or product names."
)
)
# Add to an agent
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
agent_prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful technical support assistant."),
MessagesPlaceholder(variable_name="chat_history"),
("human", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
])
agent = create_openai_tools_agent(llm, [hybrid_search_tool], agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=[hybrid_search_tool], verbose=True)
result = agent_executor.invoke({
"input": "I'm seeing error E-4021-B, what should I do?",
"chat_history": []
})
print(result["output"])
This combination of hybrid retrieval and agent orchestration connects directly to what you see in the AI research agent build for more complex retrieval workflows.
Key Takeaways
Weaviate's hybrid search is not a gimmick; it addresses a real failure mode in production RAG systems where users type exact identifiers, codes, and acronyms that embedding models handle poorly. The alpha parameter gives you fine-grained control over the blend, and the metadata filtering capabilities let you constrain searches to relevant subsets of your data.
The LangChain integration abstracts away Weaviate's GraphQL API for common operations while still letting you drop down to the native client when you need advanced features like explain_score, RELATIVE_SCORE fusion, or multi-tenancy configuration.
For agents that need reliable document retrieval, the OpenAI API integration guide shows how to manage the embedding costs that come with large Weaviate collections, and the Deploy AI model to production guide covers the infrastructure patterns for running Weaviate at scale.
Frequently Asked Questions
What is hybrid search in Weaviate? Hybrid search combines vector similarity search (nearText or nearVector) with keyword-based BM25 search. Weaviate scores both results and blends them using a configurable alpha parameter, giving you the best of semantic and lexical retrieval.
Do I need a Weaviate Cloud account to follow this tutorial? No. All core examples run against a local Weaviate instance started with Docker. The cloud section shows how to swap the connection string for Weaviate Cloud Services when you are ready to deploy.
How does LangChain's WeaviateVectorStore compare to other vector store integrations? WeaviateVectorStore is one of the richer integrations. It exposes nearText, hybrid search, metadata filters, multi-tenancy, and HNSW configuration through a consistent LangChain interface, so you get Weaviate's advanced features without writing raw GraphQL.
AiTechWorlds Team