Q: Which vector database should I use?

By use case:
  - Development/prototyping → Chroma (zero setup, runs in-process)
  - Small-to-medium production (< 10M vectors) → Qdrant (open-source, self-hosted or cloud, excellent performance)
  - Large-scale managed cloud → Pinecone (easiest ops, most mature managed service)
  - Existing PostgreSQL users → pgvector (no new infrastructure, familiar tooling)
  - Full-featured open-source with hybrid search and multimodal support → Weaviate

The #1 mistake is over-engineering for scale you don't have. Start with Chroma or pgvector, and move to Pinecone or Qdrant when you need the scale.

Q: What is the difference between HNSW and IVF indexing?

HNSW (Hierarchical Navigable Small World) builds a layered graph over the vectors, and a query walks that graph toward its nearest neighbors. It is very fast at query time (often around a millisecond per query on millions of vectors, depending on hardware and index parameters) but memory-intensive, because the graph is held in RAM alongside the vectors. Good for: datasets under ~50M vectors where you want fast queries.

IVF (Inverted File) partitions the vectors into clusters (Voronoi cells) around learned centroids. A query finds the nearest centroids, then searches only the vectors in those clusters. It is more memory-efficient than HNSW. It is typically slower at the same recall, but it can handle larger datasets on limited RAM.

HNSW is the modern default for most use cases: Qdrant, Weaviate, Chroma, and pgvector all offer HNSW indexes. IVF is available in Faiss, Milvus, and pgvector (as IVFFlat), and shows up in some large-scale deployments.
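
To make the IVF idea concrete, here is a toy sketch in pure Python: a few rounds of k-means to build the cells, then a search that only probes the nearest cells. The function names (`build_ivf`, `ivf_search`) and the parameters (`n_clusters`, `n_probe`) are made up for this illustration and are not any library's API. Real implementations add quantization, SIMD, and careful tuning.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Put each vector index into the cell of its nearest centroid."""
    cells = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        cells[nearest].append(idx)
    return cells


def build_ivf(vectors, n_clusters, n_iters=10, seed=0):
    """Run a few k-means iterations and return (centroids, cells)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    dim = len(vectors[0])
    for _ in range(n_iters):
        cells = assign(vectors, centroids)
        for c, members in enumerate(cells):
            if members:  # an empty cell keeps its previous centroid
                centroids[c] = [
                    sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    return centroids, assign(vectors, centroids)


def ivf_search(query, vectors, centroids, cells, k=3, n_probe=2):
    """Search only the n_probe cells whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))[:n_probe]
    candidates = [i for c in probe for i in cells[c]]
    return sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:k]


def exact_search(query, vectors, k=3):
    """Brute-force baseline: compare the query against every vector."""
    return sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:k]


rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
centroids, cells = build_ivf(vectors, n_clusters=10)

query = vectors[0]
approx = ivf_search(query, vectors, centroids, cells, k=5, n_probe=3)
exact = exact_search(query, vectors, k=5)
print("IVF  :", approx)
print("Exact:", exact)
```

Raising `n_probe` trades speed for recall. Probing every cell reproduces the exact result, while probing a single cell is fastest but most likely to miss neighbors that sit just across a cell boundary.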

Q: How do I choose the right vector dimension?

Vector dimension is determined by your embedding model; you can't pick it freely. OpenAI text-embedding-3-small produces 1536-dimensional vectors and text-embedding-3-large produces 3072. BAAI/bge-large produces 1024, and all-MiniLM-L6-v2 produces 384. Higher dimensions generally mean better retrieval quality but higher storage and query costs. The exception is models trained with Matryoshka representation learning, such as OpenAI's text-embedding-3 family, whose vectors can be shortened through the API's dimensions parameter: text-embedding-3-large can be reduced to 256 dimensions with only minor quality loss. Rule of thumb: for most RAG applications, 768-1536 dimensions are sufficient. Only use 3072 if you need maximum retrieval precision.
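
The storage side of that tradeoff is simple arithmetic. The sketch below assumes float32 storage (4 bytes per component) and ignores index overhead and metadata, so treat the numbers as a lower bound. The `shorten` helper is a hypothetical illustration of truncate-and-renormalize, which is only meaningful for models trained to support it.

```python
import math

BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32


def raw_storage_gb(n_vectors: int, dims: int) -> float:
    """Raw vector storage in GB, excluding index structures and metadata."""
    return n_vectors * dims * BYTES_PER_FLOAT32 / 1e9


def shorten(embedding: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale to unit length."""
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


for dims in (384, 1024, 1536, 3072):
    print(f"{dims:>4} dims x 1M vectors = {raw_storage_gb(1_000_000, dims):.2f} GB")
```

At one million vectors the gap between 384 and 3072 dimensions is roughly 1.5 GB versus 12.3 GB of raw vectors, before any index is built on top.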

Q: What is hybrid search in vector databases?

Hybrid search combines dense vector search (semantic similarity) with sparse keyword search (BM25/TF-IDF). Dense search finds semantically similar content even without exact word overlap. Sparse search finds exact keyword matches, which is critical for product codes, proper nouns, and technical terms. Hybrid retrieval often outperforms either method alone. The gain is often quoted as 5-20%, though it depends heavily on the dataset and the metric. Supported natively by: Weaviate (BM25 fusion weighted by an alpha parameter), Pinecone (sparse-dense vectors), Qdrant (sparse vectors), and Elasticsearch/OpenSearch (built-in). For production RAG systems, hybrid search is a sensible default over pure dense retrieval.
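
One common way to merge the two result lists is reciprocal rank fusion (RRF), which only needs the rank of each document in each list, not comparable scores. This is a minimal sketch of that technique, not the fusion method of any particular database listed above. The document IDs are made up, and `k=60` is a commonly used constant rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each appearance adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector-search results
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical BM25 results

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # → ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that appear in both lists (doc_a, doc_c) rise to the top, which is the behavior you usually want from a hybrid retriever.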

Vector Database Guide: Pinecone, Weaviate, Chroma, and pgvector Compared

Q: What is a vector database?

A vector database stores high-dimensional numerical vectors (embeddings) and supports fast similarity search — finding the most similar vectors to a query vector. Unlike traditional databases that search by exact match or range queries, vector databases use approximate nearest neighbor (ANN) algorithms (HNSW, IVF) to find similar embeddings in milliseconds even across millions of vectors. They're essential for RAG systems (finding relevant document chunks), semantic search (finding similar content without exact keywords), recommendation systems, and any application where you need to 'find things with similar meaning to this.'
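
For intuition, this is what similarity search does before any ANN index gets involved: score every stored vector against the query and keep the best few. The three-dimensional vectors and their labels below are invented for illustration. Real embeddings have hundreds or thousands of dimensions, which is why the brute-force loop stops scaling and indexes like HNSW take over.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 2):
    """Exact nearest-neighbor search: compare the query against every vector."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in corpus.items()]
    return sorted(scored, reverse=True)[:k]


corpus = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "stocks": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

for score, doc_id in top_k(query, corpus):
    print(f"{doc_id}: {score:.3f}")
```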

⚡ Quick Answer

Vector database guide 2025 — compare Pinecone, Weaviate, Chroma, pgvector and Qdrant by features, performance, cost, and use cases for production AI applications.

AiTechWorlds Team May 27, 2026 7 min read

The first production RAG system I shipped used Chroma. By month three, it was too slow for our growing dataset. I migrated to Pinecone. Six months later, we were paying $400/month for what PostgreSQL with pgvector could handle for $20.

Every vector database has a sweet spot. Understanding the tradeoffs before you commit to one saves significant pain. This guide covers the major options with honest assessments of where each wins and loses.

The Vector Database Landscape

Category 1: Managed Cloud (zero ops)
  - Pinecone: easiest, most mature managed offering
  - Weaviate Cloud: managed Weaviate

Category 2: Self-Hosted Open-Source
  - Qdrant: Rust-based, excellent performance, great docs
  - Weaviate: full-featured, multimodal, active community
  - Milvus: enterprise-grade, complex but scalable

Category 3: Embedded (no separate server)
  - Chroma: Python-first, great for development
  - LanceDB: columnar storage, good for local use

Category 4: PostgreSQL Extensions
  - pgvector: vector search in existing PostgreSQL
  - pg_embedding: Neon's HNSW extension, since deprecated in favor of pgvector

Category 5: Search Platforms with Vector Support
  - Elasticsearch: full-text + vector hybrid
  - OpenSearch: AWS's Elasticsearch fork

Comparison Matrix

Database             | Type        | Scale (vectors) | Hybrid Search  | Setup     | Cost
Pinecone             | Managed     | 10M-1B+         | ✓              | Minutes   | $70+/mo
Weaviate Cloud       | Managed     | 1M-100M+        | ✓              | Minutes   | $0-$25+/mo
Qdrant Cloud         | Managed     | 1M-100M         | ✓              | Minutes   | $0-$50+/mo
Chroma               | Embedded    | <1M             | ✗              | Seconds   | Free
pgvector             | Extension   | <10M            | Via PostgreSQL | Minutes   | Existing DB
Qdrant (self-host)   | Self-hosted | 1M-1B           | ✓              | 1 hour    | Infra cost
Weaviate (self-host) | Self-hosted | 1M-1B           | ✓              | 1-2 hours | Infra cost

Chroma: Development Default

# pip install chromadb

import chromadb
from chromadb.utils import embedding_functions

# Option 1: in-memory client (data is lost when the process ends)
# client = chromadb.Client()

# Option 2: persistent client (data is stored on disk under ./chroma_db)
client = chromadb.PersistentClient(path="./chroma_db")

# Create collection with embedding function
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small"
)

collection = client.get_or_create_collection(
    name="documents",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"}  # Distance metric
)

# Add documents (Chroma handles embedding)
collection.add(
    documents=[
        "The quick brown fox jumps over the lazy dog.",
        "Machine learning is a subset of artificial intelligence.",
        "Python is a versatile programming language.",
    ],
    metadatas=[
        {"source": "sample.txt", "category": "text"},
        {"source": "ml_intro.txt", "category": "ai"},
        {"source": "python_intro.txt", "category": "programming"},
    ],
    ids=["doc1", "doc2", "doc3"]
)

# Query
results = collection.query(
    query_texts=["What is AI?"],
    n_results=2,
    where={"category": "ai"}  # Metadata filtering
)

for doc, meta, distance in zip(
    results["documents"][0],
    results["metadatas"][0],
    results["distances"][0]
):
    print(f"Distance: {distance:.3f} | Source: {meta['source']}")
    print(f"Content: {doc[:100]}\n")

# Update and delete
collection.update(ids=["doc1"], documents=["Updated text content"])
collection.delete(ids=["doc1"])
print(f"Collection count: {collection.count()}")

Pinecone: Managed Production

# pip install pinecone openai
# (the SDK was published as "pinecone-client" before being renamed to "pinecone")

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-pinecone-api-key")

# Create index
if "my-index" not in [idx.name for idx in pc.list_indexes()]:
    pc.create_index(
        name="my-index",
        dimension=1536,  # Must match your embedding model
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

index = pc.Index("my-index")

# Upsert vectors (with metadata)
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> list[list[float]]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    return [item.embedding for item in response.data]

documents = [
    {"id": "doc1", "text": "Introduction to machine learning concepts."},
    {"id": "doc2", "text": "Python data structures and algorithms."},
    {"id": "doc3", "text": "Deep learning with PyTorch tutorial."},
]

texts = [d["text"] for d in documents]
embeddings = embed(texts)

# Batch upsert
vectors = [
    {
        "id": doc["id"],
        "values": emb,
        "metadata": {"text": doc["text"], "category": "tech"}
    }
    for doc, emb in zip(documents, embeddings)
]

index.upsert(vectors=vectors, namespace="production")

# Query
query_embedding = embed(["how does machine learning work?"])[0]

results = index.query(
    namespace="production",
    vector=query_embedding,
    top_k=3,
    include_values=False,
    include_metadata=True,
    filter={"category": {"$eq": "tech"}}  # Metadata filter
)

for match in results.matches:
    print(f"Score: {match.score:.3f} | ID: {match.id}")
    print(f"Text: {match.metadata['text']}\n")

# Index statistics
stats = index.describe_index_stats()
print(f"Total vectors: {stats.total_vector_count}")
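
The upsert above sends every vector in a single request, which is fine for three documents. For larger loads, one option is to split the list into fixed-size batches first. The helper below is a generic sketch: the name `batched` and the batch size of 100 are assumptions for illustration, so check Pinecone's current request limits before picking a number.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable[dict], batch_size: int = 100) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Placeholder vectors standing in for real embeddings
fake_vectors = [{"id": f"doc{i}", "values": [0.0] * 4} for i in range(250)]

batch_sizes = [len(batch) for batch in batched(fake_vectors, batch_size=100)]
print(batch_sizes)  # → [100, 100, 50]

# With a real index, each batch would be sent separately:
# for batch in batched(vectors):
#     index.upsert(vectors=batch, namespace="production")
```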

pgvector: PostgreSQL Integration

# pip install psycopg2-binary pgvector

import psycopg2
from pgvector.psycopg2 import register_vector
import numpy as np

# Connect to PostgreSQL with pgvector
conn = psycopg2.connect("postgresql://user:password@localhost/mydb")
register_vector(conn)

cur = conn.cursor()

# Enable extension and create table
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")

cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        content TEXT NOT NULL,
        category VARCHAR(50),
        embedding vector(1536)  -- Match your model's dimensions
    )
""")

# Create HNSW index for fast approximate search
cur.execute("""
    CREATE INDEX IF NOT EXISTS documents_embedding_idx
    ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64)
""")

conn.commit()

# Insert with embeddings
def insert_document(content: str, category: str, embedding: list[float]):
    cur.execute(
        "INSERT INTO documents (content, category, embedding) VALUES (%s, %s, %s)",
        (content, category, np.array(embedding))
    )
    conn.commit()

# Similarity search
def search(query_embedding: list[float], top_k: int = 5, category: str | None = None):
    if category:
        cur.execute("""
            SELECT content, category, 1 - (embedding <=> %s) AS similarity
            FROM documents
            WHERE category = %s
            ORDER BY embedding <=> %s
            LIMIT %s
        """, (np.array(query_embedding), category, np.array(query_embedding), top_k))
    else:
        cur.execute("""
            SELECT content, category, 1 - (embedding <=> %s) AS similarity
            FROM documents
            ORDER BY embedding <=> %s
            LIMIT %s
        """, (np.array(query_embedding), np.array(query_embedding), top_k))
    
    return cur.fetchall()

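# Hybrid search (semantic + keyword).
# Caveats: ts_rank is not normalized to [0, 1], so the 50/50 weighting is only a
# starting point, and the OR filter on similarity cannot use the HNSW index.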
def hybrid_search(query: str, query_embedding: list[float], top_k: int = 5):
    cur.execute("""
        SELECT content, category,
               ts_rank(to_tsvector(content), plainto_tsquery(%s)) AS keyword_score,
               1 - (embedding <=> %s) AS semantic_score,
               -- Combine: 50% keyword + 50% semantic
               (ts_rank(to_tsvector(content), plainto_tsquery(%s)) * 0.5 +
                (1 - (embedding <=> %s)) * 0.5) AS combined_score
        FROM documents
        WHERE to_tsvector(content) @@ plainto_tsquery(%s)
           OR (1 - (embedding <=> %s)) > 0.7
        ORDER BY combined_score DESC
        LIMIT %s
    """, (query, np.array(query_embedding), query, np.array(query_embedding), 
          query, np.array(query_embedding), top_k))
    
    return cur.fetchall()

Qdrant: High-Performance Self-Hosted

# pip install qdrant-client

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct,
    Filter, FieldCondition, MatchValue
)

# Connect to local Qdrant (docker run -p 6333:6333 qdrant/qdrant)
client = QdrantClient(host="localhost", port=6333)

# Or cloud
# client = QdrantClient(url="https://your-cluster.qdrant.io", api_key="your-key")

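# Create collection. Note: recreate_collection drops any existing collection with
# this name, so previously stored points are lost. Newer qdrant-client releases
# deprecate it in favor of collection_exists() plus create_collection().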
client.recreate_collection(
    collection_name="articles",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Upsert points
points = [
    PointStruct(
        id=1,
        vector=[0.1] * 1536,  # Your actual embedding
        payload={"text": "Article about ML", "category": "ai", "views": 1500}
    ),
    PointStruct(
        id=2,
        vector=[0.2] * 1536,
        payload={"text": "Python tutorial", "category": "programming", "views": 3000}
    ),
]

client.upsert(collection_name="articles", points=points)

# Search with filters
results = client.search(
    collection_name="articles",
    query_vector=[0.1] * 1536,
    limit=5,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="ai")),
            FieldCondition(key="views", range={"gte": 1000})
        ]
    ),
    with_payload=True
)

for r in results:
    print(f"Score: {r.score:.3f} | Text: {r.payload['text']}")

Choosing the Right Database

Decision Tree:

Q: Is this for development/prototyping?
  YES → Chroma (zero setup, Python-native)

Q: Do you already use PostgreSQL?
  YES → pgvector (no new infrastructure)

Q: Do you need zero ops managed cloud?
  YES → Pinecone (easiest) or Weaviate Cloud

Q: Do you need maximum performance + self-hosted?
  YES → Qdrant (best performance/cost ratio for self-hosted)

Q: Do you need multimodal (images + text) vectors?
  YES → Weaviate (native multimodal support)

Q: Do you need full-text + vector hybrid (existing Elasticsearch)?
  YES → Stay in Elasticsearch with kNN

Scale thresholds:
  < 100K vectors: any option works, choose by ops preference
  100K-10M: Qdrant, Pinecone, Weaviate all scale fine
  10M+: Pinecone or Qdrant with proper instance sizing
  1B+: Pinecone, Milvus, or Elasticsearch
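
The same rules can be written down as a small function, which is handy if you want the choice documented in code next to your infrastructure config. This is only a sketch of the decision tree above: the function names and flags are made up, and treating the questions as first-match-wins in the order listed is an assumption, not something the tree itself dictates.

```python
def recommend_database(
    prototyping: bool = False,
    uses_postgres: bool = False,
    wants_managed: bool = False,
    self_hosted_performance: bool = False,
    multimodal: bool = False,
    uses_elasticsearch: bool = False,
) -> str:
    """Walk the decision tree top to bottom and return the first match."""
    if prototyping:
        return "Chroma"
    if uses_postgres:
        return "pgvector"
    if wants_managed:
        return "Pinecone or Weaviate Cloud"
    if self_hosted_performance:
        return "Qdrant"
    if multimodal:
        return "Weaviate"
    if uses_elasticsearch:
        return "Elasticsearch with kNN"
    return "No rule matched; choose by scale and ops preference"


def options_for_scale(n_vectors: int) -> str:
    """Map a vector count onto the scale thresholds listed above."""
    if n_vectors < 100_000:
        return "Any option works"
    if n_vectors < 10_000_000:
        return "Qdrant, Pinecone, or Weaviate"
    if n_vectors < 1_000_000_000:
        return "Pinecone or Qdrant with proper instance sizing"
    return "Pinecone, Milvus, or Elasticsearch"


print(recommend_database(uses_postgres=True))
print(options_for_scale(5_000_000))
```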

Conclusion

Vector databases have matured rapidly. Chroma during development and Qdrant or pgvector in production cover the large majority of use cases at a fraction of managed-service costs.

Pinecone wins on operational simplicity — no servers, no maintenance, scales automatically. If your team's time is worth more than the $70+/month difference versus self-hosting, it's often the right choice.

For the retrieval application layer that uses these databases, see our RAG system tutorial. For understanding the embeddings stored in these databases, see our embeddings explained guide.
