7 Vector Stores Compatible with LangChain (Ranked 2026)
Compare Pinecone, Weaviate, FAISS, Chroma, Milvus, Qdrant, and PGVector for LangChain RAG — with code snippets, cost breakdown, and honest recommendations.
Choosing the wrong vector database for your RAG system is a decision you'll regret at scale. I've seen teams build on FAISS because it was the first example in the tutorial, then spend weeks migrating when they hit production requirements. I've also seen teams choose Pinecone for a weekend project and spend more on the database than they spent on the LLM API.
The choice matters. And it's genuinely non-trivial because the options vary significantly in hosting model, cost structure, query performance, metadata filtering capabilities, and operational complexity.
This guide covers seven major vector stores that work with LangChain in 2026: Pinecone, Weaviate, FAISS, Chroma, Milvus, Qdrant, and PGVector. Each section includes LangChain integration code and my honest take on when that store makes sense. A head-to-head comparison table follows the seven sections.
For context on how vector stores fit into a complete RAG pipeline, see our RAG system tutorial. For LangChain fundamentals, the LangChain tutorial 2025 is the right starting point.
What Makes a Vector Store Good for LangChain
Before the comparisons, here's what actually matters when picking a vector store for LangChain RAG:
Query performance: How fast is approximate nearest neighbor (ANN) search? Most stores use HNSW (Hierarchical Navigable Small World) graphs. Query latency under 100ms on collections of up to a million vectors is the bar.
Metadata filtering: Can you filter by document type, date, user ID, or other attributes alongside vector search? This matters enormously for multi-tenant apps or when you only want to search a subset of documents.
Hybrid search: Does it combine dense vector search with BM25 keyword search? Hybrid search outperforms pure vector search on most RAG benchmarks.
Update flexibility: Can you add and delete documents without rebuilding the index? FAISS is built for batch-style workloads, and its trained index types (such as IVF) are usually rebuilt after large changes. Others (Chroma, Qdrant) support real-time updates.
Operational simplicity: Do you want managed hosting (no ops) or self-hosted control? This is usually the deciding factor for small teams.
According to the ANN Benchmarks project, HNSW-based indexes are consistently among the best performers on recall versus query speed for high-dimensional embeddings, which is why most modern vector stores use them.
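To make the first two criteria concrete, here is a minimal sketch of what a filtered vector query computes: an exact (brute-force) cosine search over the records that pass a metadata filter. ANN indexes such as HNSW approximate this result much faster. The toy vectors, the `tenant` field, and the function names are hypothetical and chosen only for illustration; this is not how any particular store implements search.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filtered_search(query, records, k=2, where=None):
    """Exact top-k search: apply the metadata filter, then rank by similarity.

    records: list of (vector, metadata) pairs.
    where:   dict of metadata key -> required value (all must match).
    """
    where = where or {}
    candidates = [
        (vec, meta)
        for vec, meta in records
        if all(meta.get(key) == value for key, value in where.items())
    ]
    scored = [(cosine_similarity(query, vec), meta) for vec, meta in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical values)
records = [
    ([1.0, 0.0, 0.0], {"id": "a", "tenant": "t1"}),
    ([0.9, 0.1, 0.0], {"id": "b", "tenant": "t2"}),
    ([0.0, 1.0, 0.0], {"id": "c", "tenant": "t1"}),
]

hits = filtered_search([1.0, 0.0, 0.0], records, k=2, where={"tenant": "t1"})
print([meta["id"] for _, meta in hits])
```

Record `b` is close to the query but belongs to another tenant, so it never reaches the ranking step. That is the behavior you want from a store's metadata filter in a multi-tenant app.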
1. Chroma: The Developer's First Choice
Chroma is where most LangChain RAG projects start. It's local-first, zero-config, and has first-class LangChain support.
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create or load a persistent collection
vectorstore = Chroma(
    collection_name="my_documents",
    embedding_function=embeddings,
    persist_directory="./chroma_db",
    collection_metadata={"hnsw:space": "cosine"},  # Use cosine similarity
)

# Add documents with metadata
documents = [
    Document(
        page_content="LangChain is a framework for LLM applications.",
        metadata={"source": "docs", "category": "framework", "year": 2026},
    ),
    Document(
        page_content="Vector databases store embeddings for similarity search.",
        metadata={"source": "tutorial", "category": "database", "year": 2025},
    ),
]
ids = vectorstore.add_documents(documents)
print(f"Added documents with IDs: {ids}")

# Similarity search
results = vectorstore.similarity_search("LLM frameworks", k=2)

# Filtered search: only documents from 2026
filtered_results = vectorstore.similarity_search(
    "AI frameworks",
    k=5,
    filter={"year": 2026},
)

# Search with scores
results_with_scores = vectorstore.similarity_search_with_score("LLM tools", k=3)
for doc, score in results_with_scores:
    print(f"Score: {score:.4f} | {doc.page_content[:80]}")

# Use as retriever (MMR fetches 20 candidates, then returns 4 diverse ones)
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20},
)
Verdict: Best for development and projects under ~500K documents. The hosted Chroma Cloud service extends it to production. The SQLite backend it uses by default doesn't scale to millions of documents.
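The retriever in the Chroma example uses search_type="mmr" with k and fetch_k. As a rough illustration of what those settings mean, the sketch below implements the general maximal marginal relevance idea in plain Python: take a pool of candidates (the fetch_k results) and greedily pick k of them, trading relevance to the query against similarity to what has already been picked. It is a simplified sketch, not LangChain's exact implementation, and the 2-dimensional vectors are made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query, candidates, k=2, lambda_mult=0.5):
    """Return indices of k candidates chosen by maximal marginal relevance.

    lambda_mult=1.0 ranks purely by relevance; lower values favor diversity.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Penalize similarity to anything already selected
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.1],   # 0: most relevant
    [1.0, 0.12],  # 1: near-duplicate of 0
    [1.0, -0.5],  # 2: less relevant, but different from 0
]

print(mmr(query, candidates, k=2))                   # diversity-aware pick
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # pure relevance
```

With the default blend, the near-duplicate is skipped in favor of the more distinct candidate. With lambda_mult=1.0, the two most relevant candidates win regardless of overlap.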
2. FAISS: Maximum Speed, Zero Dependencies
FAISS (Facebook AI Similarity Search) is a C++ library with Python bindings. No server, no network latency, just raw performance.
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create from texts directly
texts = [
    "Python is a programming language",
    "LangChain builds LLM applications",
    "Embeddings convert text to vectors",
    "RAG retrieves context for LLMs",
]
metadatas = [{"id": i, "source": "test"} for i in range(len(texts))]
vectorstore = FAISS.from_texts(texts=texts, embedding=embeddings, metadatas=metadatas)

# Save to disk
vectorstore.save_local("./faiss_index")

# Load from disk (fast, no re-embedding needed)
loaded_store = FAISS.load_local(
    "./faiss_index",
    embeddings=embeddings,
    # Required flag: the docstore is pickled, so only load index files you trust
    allow_dangerous_deserialization=True,
)

# Similarity search
results = loaded_store.similarity_search("vector databases", k=3)

# Merge two FAISS stores (useful for incremental builds).
# Pass explicit ids if you want to delete by ID later; otherwise
# LangChain assigns random UUIDs.
store1 = FAISS.from_texts(
    ["doc1", "doc2"], embedding=embeddings, ids=["doc_id_1", "doc_id_2"]
)
store2 = FAISS.from_texts(["doc3", "doc4"], embedding=embeddings)
store1.merge_from(store2)

# Delete specific documents by ID (raises ValueError for unknown IDs)
store1.delete(["doc_id_1", "doc_id_2"])
Verdict: Best for local deployments where all data fits in RAM. Exceptional query speed. Not suited for real-time updates at scale or distributed deployments.
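Whether "all data fits in RAM" holds is easy to estimate up front. The sketch below computes the raw size of uncompressed float32 vectors at the 1536 dimensions of text-embedding-3-small, the embedding model used throughout this guide. It deliberately ignores index overhead, stored document text, and metadata, so treat the numbers as a lower bound. The function names are illustrative.

```python
def flat_index_bytes(num_vectors, dimension, bytes_per_value=4):
    """Raw storage for uncompressed float32 vectors (no index overhead)."""
    return num_vectors * dimension * bytes_per_value


def to_gib(num_bytes):
    """Convert bytes to gibibytes."""
    return num_bytes / (1024 ** 3)


for n in (100_000, 1_000_000, 10_000_000):
    size = flat_index_bytes(n, 1536)
    print(f"{n:>10,} vectors x 1536 dims ~ {to_gib(size):.2f} GiB")
```

One million 1536-dimensional vectors come to roughly 5.7 GiB before any overhead, which fits on a single machine. Ten times that is where a compressed index or a distributed store starts to look attractive.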
3. Pinecone: Managed Scale
Pinecone is the managed vector database leader. Zero infrastructure, global distribution, and consistent performance at any scale.
import os

from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone, ServerlessSpec

# Initialize Pinecone
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

# Create a serverless index (pay per query, not per hour)
index_name = "langchain-rag"
if index_name not in [i.name for i in pc.list_indexes()]:
    pc.create_index(
        name=index_name,
        dimension=1536,  # text-embedding-3-small dimension
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Connect LangChain to Pinecone
vectorstore = PineconeVectorStore(
    index=pc.Index(index_name),
    embedding=embeddings,
    text_key="text",
)

# Add documents
docs = [
    Document(
        page_content="Pinecone provides managed vector search at scale.",
        metadata={"source": "pinecone_docs", "category": "database", "tenant_id": "user_001"},
    ),
]
vectorstore.add_documents(docs)

# Multi-tenant filtered search
results = vectorstore.similarity_search(
    "managed databases",
    k=5,
    filter={"tenant_id": {"$eq": "user_001"}},
)

# Namespace-based multi-tenancy
tenant_store = PineconeVectorStore(
    index=pc.Index(index_name),
    embedding=embeddings,
    namespace="tenant_user_001",  # Namespace isolates data
)
Verdict: Best for production SaaS applications, multi-tenant RAG, and teams that don't want to manage infrastructure. The cost adds up at scale — budget carefully.
4. Qdrant: Best Self-Hosted Option
Qdrant is my personal recommendation for production self-hosted deployments. It has excellent metadata filtering, hybrid search, and a clean REST/gRPC API.
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    FieldCondition,
    Filter,
    MatchValue,
    Range,
    VectorParams,
)

# Connect to Qdrant (local, Docker, or cloud)
# Local: QdrantClient(path="./qdrant_db")
# Docker: QdrantClient(host="localhost", port=6333)
# Cloud: QdrantClient(url="https://...", api_key="...")
client = QdrantClient(path="./qdrant_local")

# Create the collection only if it is missing.
# (recreate_collection is deprecated and drops any existing data.)
collection_name = "langchain_docs"
if not client.collection_exists(collection_name):
    client.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = QdrantVectorStore(
    client=client,
    collection_name=collection_name,
    embedding=embeddings,
)

# Add documents
docs = [
    Document(
        page_content="Qdrant is a high-performance vector search engine.",
        metadata={
            "source": "qdrant_docs",
            "category": "database",
            "language": "en",
            "published_year": 2024,
        },
    ),
]
vectorstore.add_documents(docs)

# Advanced filtered search using Qdrant's filter syntax.
# LangChain stores Document.metadata under the "metadata" payload key,
# so filter keys need the "metadata." prefix.
results = vectorstore.similarity_search(
    "vector database performance",
    k=5,
    filter=Filter(
        must=[
            FieldCondition(key="metadata.language", match=MatchValue(value="en")),
            FieldCondition(key="metadata.published_year", range=Range(gte=2023)),
        ]
    ),
)

# Hybrid search (dense + sparse); FastEmbedSparse needs the fastembed package
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")
hybrid_store = QdrantVectorStore.from_documents(
    docs,
    embedding=embeddings,
    sparse_embedding=sparse_embeddings,
    location=":memory:",
    collection_name="hybrid_collection",
    retrieval_mode=RetrievalMode.HYBRID,
)
Verdict: Best for teams that need production performance, excellent filtering, and hybrid search without the Pinecone price tag. Docker deployment is straightforward.
5. Weaviate: Multi-Modal and Hybrid
Weaviate stands out for multi-modal search (text + images + audio) and built-in BM25 hybrid search. The schema-based approach is more structured than other stores.
import weaviate
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_weaviate import WeaviateVectorStore

# Connect to Weaviate
client = weaviate.connect_to_local()  # Docker local
# Or: weaviate.connect_to_weaviate_cloud(cluster_url="...", auth_credentials=...)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = WeaviateVectorStore(
    client=client,
    index_name="LangchainDocs",
    text_key="text",
    embedding=embeddings,
)

docs = [
    Document(
        page_content="Weaviate supports hybrid search combining vectors and BM25.",
        metadata={"source": "weaviate_docs", "author": "AiTechWorlds"},
    ),
]
vectorstore.add_documents(docs)

# Hybrid search
results = vectorstore.similarity_search(
    "hybrid vector search BM25",
    k=3,
    alpha=0.5,  # 0=pure BM25, 1=pure vector, 0.5=equal blend
)

client.close()
Verdict: Best for multi-modal RAG or when you need strong hybrid search out of the box. The schema system has a steeper learning curve than other options.
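The alpha parameter in the hybrid query is easier to reason about with a small worked example. The sketch below blends a vector score and a keyword score per document after scaling each to the 0 to 1 range, so that alpha=1 is pure vector ranking and alpha=0 is pure keyword ranking. This is a simplified illustration of score blending under that assumption, not Weaviate's actual fusion code, and the document IDs and scores are invented.

```python
def min_max(scores):
    """Scale a dict of scores into [0, 1]; constant inputs map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def blend(vector_scores, keyword_scores, alpha=0.5):
    """Rank documents by alpha * vector + (1 - alpha) * keyword score."""
    v = min_max(vector_scores)
    kw = min_max(keyword_scores)
    docs = set(v) | set(kw)
    # A document missing from one result list contributes 0 for that side
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in docs}
    # Sort by fused score, breaking ties by document ID for stable output
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical scores: cosine similarities and BM25 scores
vector_scores = {"doc_a": 0.92, "doc_b": 0.80, "doc_c": 0.40}
keyword_scores = {"doc_b": 7.5, "doc_c": 9.0, "doc_d": 3.0}

for alpha in (1.0, 0.5, 0.0):
    print(alpha, blend(vector_scores, keyword_scores, alpha=alpha))
```

At alpha=0.5 the document that scores reasonably well on both sides (doc_b) overtakes the top result of either individual ranking. That is the usual argument for hybrid search.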
6. Milvus: High-Throughput Enterprise
Milvus is designed for high-throughput, large-scale deployments. It's more complex to operate than Qdrant but scales to billions of vectors.
from langchain_core.documents import Document
from langchain_milvus import Milvus
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create an empty Milvus-backed store; langchain_milvus opens the
# connection itself from connection_args.
vectorstore = Milvus(
    embedding_function=embeddings,
    collection_name="langchain_rag",
    connection_args={"uri": "http://localhost:19530"},
    index_params={
        "metric_type": "COSINE",
        "index_type": "HNSW",
        "params": {"M": 8, "efConstruction": 64},
    },
    auto_id=True,  # let Milvus assign primary keys when no ids are passed
)

docs = [
    Document(
        page_content="Milvus handles billions of vectors with high throughput.",
        metadata={"source": "milvus_docs", "type": "technical"},
    ),
]
vectorstore.add_documents(docs)

# Plain similarity search (partitions for large-scale multi-tenancy
# are configured separately and are not shown here)
results = vectorstore.similarity_search("high performance vector search", k=5)

# Also available as managed: Zilliz Cloud (Milvus-compatible SaaS)
Verdict: Best for enterprise deployments requiring billions of vectors and multi-hundred QPS throughput. Overkill for most projects under 10 million documents.
7. PGVector: If You Already Have PostgreSQL
PGVector adds vector search as a Postgres extension. If your application already runs on Postgres, this is the path of least resistance.
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Connection string format
connection_string = "postgresql+psycopg://user:password@localhost:5432/mydb"

vectorstore = PGVector(
    embeddings=embeddings,
    collection_name="langchain_vectors",
    connection=connection_string,
    use_jsonb=True,  # Better metadata filtering with JSONB
)

docs = [
    Document(
        page_content="PGVector adds vector search to PostgreSQL databases.",
        metadata={
            "source": "pgvector_docs",
            "user_id": "u001",
            "document_type": "tutorial",
        },
    ),
]
vectorstore.add_documents(docs)

# Metadata filtering: the filter dict is matched against the JSONB metadata column
results = vectorstore.similarity_search(
    "PostgreSQL vector extension",
    k=5,
    filter={"user_id": "u001"},
)

# Full-text hybrid search combining pgvector and tsvector
# (Requires custom SQL; PGVector supports this via raw queries)
Verdict: Best when you already run PostgreSQL and want to avoid adding a new database to your stack. Performance is solid for small to medium scale. Not competitive with dedicated vector stores at large scale.
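The manual hybrid search mentioned in the PGVector code comment comes down to running two queries (one vector, one full-text) and merging the two ranked lists in application code. One common way to merge them is reciprocal rank fusion (RRF), sketched below. RRF is my choice for the illustration rather than anything the PGVector integration provides. The document IDs are made up, and k=60 is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused scores rank first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Break ties by document ID so the output is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a pgvector query and a tsvector full-text query
vector_hits = ["d1", "d2", "d3"]
keyword_hits = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Because RRF only uses ranks, it sidesteps the problem that cosine distances and full-text ranking scores live on different scales.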
Head-to-Head Comparison Table
| Database | Hosting | Monthly Cost (est.) | ANN Algorithm | Metadata Filtering | Hybrid Search | Best For |
|---|---|---|---|---|---|---|
| Chroma | Local / Cloud | Free (self-host) | HNSW | Good | No | Dev, small prod |
| FAISS | Local only | Free | Flat (LangChain default) / IVF / HNSW | Limited | No | Speed-critical local |
| Pinecone | Cloud only | $0–$70+ | Proprietary | Excellent | Yes (managed) | SaaS, managed prod |
| Qdrant | Local + Cloud | Free (self-host) | HNSW | Excellent | Yes | Self-hosted prod |
| Weaviate | Local + Cloud | Free (self-host) | HNSW | Good | Yes (built-in) | Multi-modal, hybrid |
| Milvus | Local + Cloud | Free (self-host) | HNSW / IVF | Good | Limited | Enterprise scale |
| PGVector | Local (Postgres) | Free (self-host) | HNSW / IVF | Excellent | Manual | Postgres-first teams |
Switching Between Vector Stores
One of LangChain's advantages is that the vector store interface is largely consistent. Switching stores is mostly about the initialization code, not the rest of your application:
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.vectorstores import VectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings


def build_rag_chain(vectorstore: VectorStore):
    """This function works with ANY LangChain vector store."""
    retriever = vectorstore.as_retriever(
        search_type="mmr",
        search_kwargs={"k": 4, "fetch_k": 20},
    )
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer using context:\n{context}"),
        ("human", "{question}"),
    ])
    return (
        {
            # Join the retrieved documents into a single context string
            "context": retriever | (lambda docs: "\n\n".join(d.page_content for d in docs)),
            "question": RunnablePassthrough(),
        }
        | prompt
        | llm
        | StrOutputParser()
    )


# Use with any store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Swap these out without changing build_rag_chain
chroma_store = Chroma(
    collection_name="my_documents",  # must match the collection you indexed into
    persist_directory="./chroma_db",
    embedding_function=embeddings,
)
chain = build_rag_chain(chroma_store)
This portability is one of the best reasons to use LangChain's vector store abstractions rather than vendor SDKs directly.
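One way to keep the swap in a single place is a small factory keyed by a config value, sketched below. The `STORE_FACTORIES` registry and `make_vectorstore` helper are hypothetical names, not part of LangChain. The constructor calls inside them reuse the Chroma and FAISS snippets from earlier in this guide. Imports are done lazily inside each factory so that only the selected backend's package needs to be installed.

```python
def _make_chroma(embeddings):
    # Lazy import: only needed when the Chroma backend is selected
    from langchain_chroma import Chroma

    return Chroma(
        collection_name="my_documents",
        embedding_function=embeddings,
        persist_directory="./chroma_db",
    )


def _make_faiss(embeddings):
    # Lazy import: only needed when the FAISS backend is selected
    from langchain_community.vectorstores import FAISS

    return FAISS.load_local(
        "./faiss_index",
        embeddings=embeddings,
        allow_dangerous_deserialization=True,  # only for index files you trust
    )


# Hypothetical registry: add one entry per backend you support
STORE_FACTORIES = {
    "chroma": _make_chroma,
    "faiss": _make_faiss,
}


def make_vectorstore(name, embeddings):
    """Build the configured vector store, failing loudly on unknown names."""
    try:
        factory = STORE_FACTORIES[name]
    except KeyError:
        raise ValueError(
            f"Unknown vector store {name!r}; expected one of {sorted(STORE_FACTORIES)}"
        ) from None
    return factory(embeddings)
```

With this in place, the rest of the application would call something like build_rag_chain(make_vectorstore("chroma", embeddings)) and read the store name from an environment variable or config file.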
My Recommendations by Use Case
You're building a prototype or internal tool (< 100K documents): Chroma. Zero setup, good filtering, works on your laptop.
You need production RAG and don't want to manage infrastructure: Pinecone. It's more expensive than self-hosted but you get consistent performance and no ops burden.
You need production RAG and have DevOps resources: Qdrant. Best combination of performance, filtering, hybrid search, and cost for self-hosted deployments.
You already run PostgreSQL and your data is under 1M documents: PGVector. The operational simplicity of staying in one database is worth a lot.
You need multi-modal search (images, audio, text): Weaviate. It's the only option here with native multi-modal support.
You need billions of vectors at high throughput: Milvus or a managed Pinecone enterprise tier.
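The recommendations above can be written down as a small decision function, which is handy if you want to keep the reasoning next to your infrastructure config. The thresholds come straight from the list. The function name, the argument names, and the order in which the rules are checked are my own assumptions, so adjust the priority to your situation.

```python
def recommend_store(
    num_documents,
    *,
    managed=False,
    has_devops=False,
    uses_postgres=False,
    multimodal=False,
):
    """Map the use cases in this guide to a store name.

    Rule order is an assumption: special requirements first, then scale,
    then hosting preference.
    """
    if multimodal:
        return "Weaviate"
    if num_documents >= 1_000_000_000:
        # The guide also names a managed Pinecone enterprise tier here
        return "Milvus"
    if num_documents < 100_000:
        return "Chroma"
    if uses_postgres and num_documents < 1_000_000:
        return "PGVector"
    if managed:
        return "Pinecone"
    if has_devops:
        return "Qdrant"
    # Default when nothing else applies: start simple
    return "Chroma"


print(recommend_store(50_000))
print(recommend_store(5_000_000, managed=True))
print(recommend_store(500_000, uses_postgres=True))
```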
For more context on the semantic search fundamentals that underpin all of these stores, the semantic search tutorial covers embedding models, distance metrics, and ANN algorithms. The LangChain RAG pipeline guide shows how any of these stores fits into a complete retrieval system.
Conclusion
There's no universally best vector database — just the right one for your specific requirements. The table and recommendations above should narrow it down. If you're unsure, start with Chroma, get your RAG pipeline working end-to-end, then evaluate whether you need to migrate based on actual performance data from your use case.
The good news: switching stores with LangChain is a day of work, not a week. Build your chain against the interface, test with Chroma first, and upgrade when you have real reasons to. Don't over-engineer your database choice before you even know what your query patterns look like.
Pair this guide with the LangChain advanced RAG strategies post to get the most out of whichever store you choose.
Frequently Asked Questions
Which vector store is best for a beginner LangChain RAG project?
Start with Chroma. It requires zero infrastructure setup, runs locally, persists to disk, and integrates with LangChain in three lines. Switch to Qdrant or Pinecone when you outgrow it.
Can I switch vector stores without rewriting my LangChain code?
Mostly yes. LangChain's vector store interface is standardized: similarity_search(), as_retriever(), and add_documents() work the same way across providers. You'll need to re-index your documents in the new store, which usually means re-embedding them, and translate any store-specific filter syntax, but your chain code changes minimally.
Does FAISS work for production RAG systems?
FAISS works well for production if your data fits in RAM and you don't need real-time updates or multi-server deployments. Many companies run FAISS in production for read-heavy workloads with periodic batch rebuilds. It's not ideal for dynamic datasets that update frequently.
AiTechWorlds Team
The AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.