Vector Database Guide: Pinecone, Weaviate, Chroma, and pgvector Compared
Vector database guide 2025 — compare Pinecone, Weaviate, Chroma, pgvector and Qdrant by features, performance, cost, and use cases for production AI applications.
The first production RAG system I shipped used Chroma. By month three, it was too slow for our growing dataset. I migrated to Pinecone. Six months later, we were paying $400/month for what PostgreSQL with pgvector could handle for $20.
Every vector database has a sweet spot. Understanding the tradeoffs before you commit to one saves significant pain. This guide covers the major options with honest assessments of where each wins and loses.
The Vector Database Landscape
Category 1: Managed Cloud (zero ops)
- Pinecone: easiest, most mature managed offering
- Weaviate Cloud: managed Weaviate
- Qdrant Cloud: managed Qdrant
Category 2: Self-Hosted Open-Source
- Qdrant: Rust-based, excellent performance, great docs
- Weaviate: full-featured, multimodal, active community
- Milvus: enterprise-grade, complex but scalable
Category 3: Embedded (no separate server)
- Chroma: Python-first, great for development
- LanceDB: columnar storage, good for local use
Category 4: PostgreSQL Extensions
- pgvector: vector search in existing PostgreSQL
- pg_embedding: Neon's HNSW extension, since deprecated in favor of pgvector
Category 5: Search Platforms with Vector Support
- Elasticsearch: full-text + vector hybrid
- OpenSearch: Elasticsearch fork originally started by AWS
Comparison Matrix
| Database | Type | Typical scale (vectors) | Hybrid search | Setup time | Cost |
|---|---|---|---|---|---|
| Pinecone | Managed | 10M-1B+ | ✓ | Minutes | $70+/mo |
| Weaviate Cloud | Managed | 1M-100M+ | ✓ | Minutes | $0-$25+/mo |
| Qdrant Cloud | Managed | 1M-100M | ✓ | Minutes | $0-$50+/mo |
| Chroma | Embedded | <1M | ✗ | Seconds | Free |
| pgvector | Extension | <10M | Via PostgreSQL | Minutes | Existing DB |
| Qdrant (self-host) | Self-hosted | 1M-1B | ✓ | 1 hour | Infra cost |
| Weaviate (self-host) | Self-hosted | 1M-1B | ✓ | 1-2 hours | Infra cost |
Chroma: Development Default
# pip install chromadb
import chromadb
from chromadb.utils import embedding_functions
# In-memory client (lost when process ends)
client = chromadb.Client()
# Persistent client
client = chromadb.PersistentClient(path="./chroma_db")
# Create collection with embedding function
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small",
)
collection = client.get_or_create_collection(
    name="documents",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"},  # Distance metric
)
# Add documents (Chroma handles embedding)
collection.add(
    documents=[
        "The quick brown fox jumps over the lazy dog.",
        "Machine learning is a subset of artificial intelligence.",
        "Python is a versatile programming language.",
    ],
    metadatas=[
        {"source": "sample.txt", "category": "text"},
        {"source": "ml_intro.txt", "category": "ai"},
        {"source": "python_intro.txt", "category": "programming"},
    ],
    ids=["doc1", "doc2", "doc3"],
)
# Query
results = collection.query(
    query_texts=["What is AI?"],
    n_results=2,
    where={"category": "ai"},  # Metadata filtering
)
for doc, meta, distance in zip(
    results["documents"][0],
    results["metadatas"][0],
    results["distances"][0],
):
    print(f"Distance: {distance:.3f} | Source: {meta['source']}")
    print(f"Content: {doc[:100]}\n")
# Update and delete
collection.update(ids=["doc1"], documents=["Updated text content"])
collection.delete(ids=["doc1"])
print(f"Collection count: {collection.count()}")
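A note on reading the numbers: with `"hnsw:space": "cosine"`, Chroma reports cosine distance (lower is closer), while Pinecone and Qdrant report cosine similarity as the score (higher is closer), and pgvector's `<=>` operator is also a distance. The two are related by distance = 1 - similarity. A small pure-Python sketch of both; the toy vectors are illustrative only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance, the quantity Chroma's "cosine" space and pgvector's <=> return."""
    return 1.0 - cosine_similarity(a, b)


query = [1.0, 0.0]
doc = [1.0, 1.0]
sim = cosine_similarity(query, doc)   # higher means more similar
dist = cosine_distance(query, doc)    # lower means more similar
print(f"similarity={sim:.3f} distance={dist:.3f}")
```

This is why the pgvector example below computes `1 - (embedding <=> %s)` when it wants a similarity.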
Pinecone: Managed Production
# pip install pinecone  (published as pinecone-client in older releases)
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="your-pinecone-api-key")
# Create index
if "my-index" not in pc.list_indexes().names():
    pc.create_index(
        name="my-index",
        dimension=1536,  # Must match your embedding model
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index("my-index")
# Upsert vectors (with metadata)
from openai import OpenAI
client = OpenAI()
def embed(texts: list[str]) -> list[list[float]]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    return [item.embedding for item in response.data]
documents = [
    {"id": "doc1", "text": "Introduction to machine learning concepts."},
    {"id": "doc2", "text": "Python data structures and algorithms."},
    {"id": "doc3", "text": "Deep learning with PyTorch tutorial."},
]
texts = [d["text"] for d in documents]
embeddings = embed(texts)
# Batch upsert
vectors = [
    {
        "id": doc["id"],
        "values": emb,
        "metadata": {"text": doc["text"], "category": "tech"},
    }
    for doc, emb in zip(documents, embeddings)
]
index.upsert(vectors=vectors, namespace="production")
# Query
query_embedding = embed(["how does machine learning work?"])[0]
results = index.query(
    namespace="production",
    vector=query_embedding,
    top_k=3,
    include_values=False,
    include_metadata=True,
    filter={"category": {"$eq": "tech"}},  # Metadata filter
)
for match in results.matches:
    print(f"Score: {match.score:.3f} | ID: {match.id}")
    print(f"Text: {match.metadata['text']}\n")
# Index statistics
stats = index.describe_index_stats()
print(f"Total vectors: {stats.total_vector_count}")
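The example above sends all vectors in a single `upsert` call, which is fine for three documents. Pinecone caps the size of each upsert request, so larger corpora are usually upserted in chunks. A minimal sketch of a chunking helper; the name `batched` and the batch size of 100 are assumptions, not part of the Pinecone API (Python 3.12's `itertools.batched` does the same job but yields tuples).

```python
from collections.abc import Iterator
from typing import Any


def batched(items: list[dict[str, Any]], batch_size: int = 100) -> Iterator[list[dict[str, Any]]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical usage with the index and vectors from the Pinecone example:
# for batch in batched(vectors, batch_size=100):
#     index.upsert(vectors=batch, namespace="production")

demo = [{"id": f"doc{i}", "values": [0.0, 1.0]} for i in range(250)]
sizes = [len(batch) for batch in batched(demo, batch_size=100)]
print(sizes)
```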
pgvector: PostgreSQL Integration
# pip install psycopg2-binary pgvector
import psycopg2
from pgvector.psycopg2 import register_vector
import numpy as np
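# Connect to PostgreSQL (the pgvector extension must be installed on the server)
conn = psycopg2.connect("postgresql://user:password@localhost/mydb")
cur = conn.cursor()
# Enable the extension before register_vector(), which looks up the vector type
# and raises an error if it does not exist yet
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
conn.commit()
register_vector(conn)
# Create table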
cur.execute("""
CREATE TABLE IF NOT EXISTS documents (
id SERIAL PRIMARY KEY,
content TEXT NOT NULL,
category VARCHAR(50),
embedding vector(1536) -- Match your model's dimensions
)
""")
# Create HNSW index for fast approximate search
cur.execute("""
CREATE INDEX IF NOT EXISTS documents_embedding_idx
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64)
""")
conn.commit()
# Insert with embeddings
def insert_document(content: str, category: str, embedding: list[float]):
    cur.execute(
        "INSERT INTO documents (content, category, embedding) VALUES (%s, %s, %s)",
        (content, category, np.array(embedding)),
    )
    conn.commit()
# Similarity search
def search(query_embedding: list[float], top_k: int = 5, category: str | None = None):
    vec = np.array(query_embedding)
    if category:
        cur.execute("""
            SELECT content, category, 1 - (embedding <=> %s) AS similarity
            FROM documents
            WHERE category = %s
            ORDER BY embedding <=> %s
            LIMIT %s
        """, (vec, category, vec, top_k))
    else:
        cur.execute("""
            SELECT content, category, 1 - (embedding <=> %s) AS similarity
            FROM documents
            ORDER BY embedding <=> %s
            LIMIT %s
        """, (vec, vec, top_k))
    return cur.fetchall()
# Hybrid search (semantic + keyword)
def hybrid_search(query: str, query_embedding: list[float], top_k: int = 5):
    # Named parameters avoid repeating each value. A literal % in this SQL
    # (even inside a comment) must be written as %%, because psycopg2 treats % as a placeholder.
    # Neither WHERE condition can use the HNSW index, so expect a sequential scan on large tables.
    cur.execute("""
        SELECT content, category,
               ts_rank(to_tsvector(content), plainto_tsquery(%(q)s)) AS keyword_score,
               1 - (embedding <=> %(emb)s) AS semantic_score,
               -- Combine with equal weights: 0.5 * keyword + 0.5 * semantic
               (ts_rank(to_tsvector(content), plainto_tsquery(%(q)s)) * 0.5 +
                (1 - (embedding <=> %(emb)s)) * 0.5) AS combined_score
        FROM documents
        WHERE to_tsvector(content) @@ plainto_tsquery(%(q)s)
           OR (1 - (embedding <=> %(emb)s)) > 0.7
        ORDER BY combined_score DESC
        LIMIT %(k)s
    """, {"q": query, "emb": np.array(query_embedding), "k": top_k})
    return cur.fetchall()
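The hybrid query adds raw `ts_rank` values to raw cosine similarities. The two live on different scales (`ts_rank` values are often small fractions, while cosine similarities of relevant documents tend to bunch together near the top of their range), so a fixed 50/50 weight can let one signal dominate. One common remedy, sketched here in plain Python as an assumption rather than as part of the SQL, is to min-max normalize each score set before weighting. The function names and sample scores are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(
    keyword: dict[str, float],
    semantic: dict[str, float],
    keyword_weight: float = 0.5,
) -> list[tuple[str, float]]:
    """Combine normalized keyword and semantic scores; a missing score counts as 0."""
    kw, sem = min_max(keyword), min_max(semantic)
    docs = set(kw) | set(sem)
    combined = {
        d: keyword_weight * kw.get(d, 0.0) + (1 - keyword_weight) * sem.get(d, 0.0)
        for d in docs
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


keyword_scores = {"a": 0.06, "b": 0.03, "c": 0.02}   # e.g. ts_rank values
semantic_scores = {"a": 0.71, "b": 0.83, "c": 0.79}  # e.g. 1 - cosine distance
print(weighted_fusion(keyword_scores, semantic_scores))
```

Min-max normalization is sensitive to outliers in a single result set; it is a starting point to tune against your own queries, not a universal fix.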
Qdrant: High-Performance Self-Hosted
# pip install qdrant-client
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct,
    Filter, FieldCondition, MatchValue, Range,
)
# Connect to local Qdrant (docker run -p 6333:6333 qdrant/qdrant)
client = QdrantClient(host="localhost", port=6333)
# Or cloud
# client = QdrantClient(url="https://your-cluster.qdrant.io", api_key="your-key")
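# Create the collection, dropping any previous copy
# (recreate_collection() is deprecated in recent qdrant-client releases)
if client.collection_exists("articles"):
    client.delete_collection("articles")
client.create_collection(
    collection_name="articles",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)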
# Upsert points
points = [
    PointStruct(
        id=1,
        vector=[0.1] * 1536,  # Your actual embedding
        payload={"text": "Article about ML", "category": "ai", "views": 1500},
    ),
    PointStruct(
        id=2,
        vector=[0.2] * 1536,
        payload={"text": "Python tutorial", "category": "programming", "views": 3000},
    ),
]
client.upsert(collection_name="articles", points=points)
# Search with filters (query_points() supersedes the older search() method)
results = client.query_points(
    collection_name="articles",
    query=[0.1] * 1536,
    limit=5,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="ai")),
            FieldCondition(key="views", range=Range(gte=1000)),
        ]
    ),
    with_payload=True,
).points
for r in results:
    print(f"Score: {r.score:.3f} | Text: {r.payload['text']}")
Choosing the Right Database
Decision Tree:
Q: Is this for development/prototyping?
YES → Chroma (zero setup, Python-native)
Q: Do you already use PostgreSQL?
YES → pgvector (no new infrastructure)
Q: Do you need zero ops managed cloud?
YES → Pinecone (easiest) or Weaviate Cloud
Q: Do you need maximum performance + self-hosted?
YES → Qdrant (best performance/cost ratio for self-hosted)
Q: Do you need multimodal (images + text) vectors?
YES → Weaviate (native multimodal support)
Q: Do you need full-text + vector hybrid (existing Elasticsearch)?
YES → Stay in Elasticsearch with kNN
Scale thresholds:
< 100K vectors: any option works, choose by ops preference
100K-10M: Qdrant, Pinecone, Weaviate all scale fine
10M+: Pinecone or Qdrant with proper instance sizing
1B+: Pinecone, Milvus, or Elasticsearch
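Instance sizing at these thresholds starts with simple arithmetic: a float32 embedding takes 4 bytes per dimension, so raw vector storage is vectors × dimensions × 4 bytes. A back-of-the-envelope sketch, assuming float32 storage and ignoring index structures (such as the HNSW graph), metadata, and replication, all of which add to the real footprint.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone (4 bytes per float32 value).

    Excludes index structures, metadata, and per-row overhead.
    """
    return num_vectors * dimensions * bytes_per_value


for n in (100_000, 10_000_000, 1_000_000_000):
    gib = raw_vector_bytes(n, 1536) / 2**30
    print(f"{n:>13,} vectors x 1536 dims ~ {gib:,.1f} GiB of raw vectors")
```

At 1536 dimensions, every million vectors needs about 6 GB before any index overhead, which is why the 10M+ tier calls for deliberate sizing.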
Conclusion
Vector databases have matured rapidly. For most applications, Chroma during development and Qdrant or pgvector in production is enough, at a fraction of the cost of a managed service.
Pinecone wins on operational simplicity — no servers, no maintenance, scales automatically. If your team's time is worth more than the $70+/month difference versus self-hosting, it's often the right choice.
For the retrieval application layer that uses these databases, see our RAG system tutorial. For understanding the embeddings stored in these databases, see our embeddings explained guide.
Frequently Asked Questions
What is a vector database?
A vector database stores embedding vectors and supports fast similarity search using ANN (approximate nearest neighbor) algorithms such as HNSW. Instead of the exact matches a typical SQL query returns, it finds the "most similar" embeddings, often in milliseconds across millions of vectors. That makes it a core component of RAG, semantic search, and recommendation systems.
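To make "approximate" concrete: the exact alternative is a brute-force scan that scores the query against every stored vector. A minimal pure-Python sketch of that exact search; the toy three-dimensional vectors are made up, and it assumes no stored vector is all zeros.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def exact_top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Score every stored vector by cosine similarity and keep the k best.

    This costs O(N * d) per query; ANN indexes such as HNSW exist to avoid the full scan.
    """
    q = normalize(query)
    scored = (
        (doc_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for doc_id, vec in vectors.items()
    )
    return heapq.nlargest(k, scored, key=lambda item: item[1])


store = {"cat": [0.9, 0.1, 0.0], "dog": [0.8, 0.3, 0.1], "car": [0.0, 0.2, 0.9]}
print(exact_top_k([1.0, 0.2, 0.0], store, k=2))
```

ANN indexes return nearly the same top-k while touching only a small fraction of the stored vectors, trading a little recall for large speedups.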
Which vector database should I use?
Development: Chroma. Existing PostgreSQL: pgvector. Managed cloud: Pinecone (easiest) or Qdrant Cloud. Self-hosted high-performance: Qdrant. Full-featured with multimodal: Weaviate. Start simple — don't over-engineer for scale you don't have yet.
What is the difference between HNSW and IVF indexing?
HNSW: graph-based, fastest queries (~1ms), memory-intensive, modern default. IVF: cluster-based, more memory-efficient, slightly slower, better for very large datasets on limited RAM. HNSW is the right choice for most use cases under 50M vectors.
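IVF's cluster-based approach fits in a few dozen lines: k-means partitions the vectors into `nlist` clusters, and a query scans only the `nprobe` clusters whose centroids are closest (in pgvector's ivfflat index these knobs are the `lists` index option and the `ivfflat.probes` setting). This is a toy pure-Python illustration using Euclidean distance, not any library's implementation; the random data and parameter values are arbitrary.

```python
import random


def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance (enough for ranking, no sqrt needed)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: list[list[float]], k: int, iters: int = 10, seed: int = 0) -> list[list[float]]:
    """Plain Lloyd's k-means; the centroids define IVF's clusters."""
    rng = random.Random(seed)
    centroids = [p[:] for p in rng.sample(points, k)]
    for _ in range(iters):
        groups: list[list[list[float]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class IVFIndex:
    def __init__(self, vectors: list[list[float]], nlist: int):
        self.vectors = vectors
        self.centroids = kmeans(vectors, nlist)
        # Inverted lists: cluster number -> ids of the vectors assigned to it
        self.lists: list[list[int]] = [[] for _ in range(nlist)]
        for i, v in enumerate(vectors):
            self.lists[self._closest_clusters(v, 1)[0]].append(i)

    def _closest_clusters(self, v: list[float], n: int) -> list[int]:
        order = sorted(range(len(self.centroids)), key=lambda c: sq_dist(v, self.centroids[c]))
        return order[:n]

    def search(self, query: list[float], k: int, nprobe: int = 1) -> list[int]:
        """Rank only the vectors stored in the nprobe closest clusters."""
        candidates = [i for c in self._closest_clusters(query, nprobe) for i in self.lists[c]]
        return sorted(candidates, key=lambda i: sq_dist(query, self.vectors[i]))[:k]


rng = random.Random(42)
data = [[rng.random(), rng.random()] for _ in range(500)]
ivf = IVFIndex(data, nlist=10)
query = [0.5, 0.5]
exact = sorted(range(len(data)), key=lambda i: sq_dist(query, data[i]))[:5]
print("nprobe=1: ", ivf.search(query, k=5, nprobe=1))
print("nprobe=10:", ivf.search(query, k=5, nprobe=10), "exact:", exact)
```

Raising `nprobe` trades speed for recall; with `nprobe == nlist` every cluster is scanned and the search becomes exact.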
What is hybrid search in vector databases?
Hybrid search combines dense vector search (semantic similarity) with sparse BM25 keyword search. Dense retrieval finds semantically related content; sparse retrieval catches exact keywords and proper nouns. Hybrid often outperforms either method alone, though the size of the gain depends on your data and queries. Use it for production RAG; Weaviate, Pinecone, and Qdrant support it natively.
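Besides weighted score sums, a common way to merge dense and sparse results is Reciprocal Rank Fusion (RRF), which uses only each document's rank in each list, so the two score scales never need to match. A minimal sketch; k = 60 is the constant proposed in the original RRF paper and a common default, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Score each document by the sum of 1 / (k + rank) over all ranked lists (rank starts at 1)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense = ["doc3", "doc1", "doc7"]    # semantic (vector) ranking
sparse = ["doc1", "doc9", "doc3"]   # BM25 keyword ranking
print(reciprocal_rank_fusion([dense, sparse]))
```

Documents that rank well in both lists (doc1, doc3) rise to the top, while documents found by only one retriever still make the fused list.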
How do I choose the right vector dimension?
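Your embedding model determines the dimension, and the index must be created to match it. OpenAI text-embedding-3-small = 1536; text-embedding-3-large = 3072; all-MiniLM-L6-v2 = 384. The text-embedding-3 models also accept a `dimensions` parameter that returns shorter embeddings. Higher dimensions usually mean better quality but higher storage and compute cost; 768-1536 dimensions are sufficient for most RAG applications.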
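OpenAI's embeddings guide describes two ways to get shorter text-embedding-3 vectors: request them with the `dimensions` parameter (the suggested route), or shorten an existing embedding yourself by truncating it and re-normalizing to unit length. A minimal sketch of the manual route; the function name and toy vector are illustrative, and the index dimension (for example `vector(1536)` in pgvector or `dimension=1536` in Pinecone) would have to match the shortened size.

```python
import math


def shorten_embedding(embedding: list[float], dims: int) -> list[float]:
    """Keep the first `dims` values, then rescale to unit length.

    Re-normalizing matters: after truncation the vector is no longer unit length,
    which would distort dot-product scores.
    """
    if not 0 < dims <= len(embedding):
        raise ValueError("dims must be between 1 and the original dimension")
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


full = [0.6, 0.0, 0.8, 0.0]  # toy unit vector standing in for a real 1536-d embedding
short = shorten_embedding(full, 2)
print(short, math.sqrt(sum(x * x for x in short)))
```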
AiTechWorlds Team
The AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.